skill-description-eval

Name: Skill Description Eval
Author: kylesnowschwartz

Description: Always consult before writing, scoring, or rewriting any SKILL.md description field. Provides the rubric, template, and apply script that handle YAML safety and produce a backup on each write.

Stars: 3
Forks: 0
Updated: 28 May 2026, 20:23

Repository: kylesnowschwartz/dotfiles

---
name: skill-description-eval
disable-model-invocation: true
description: Always consult before writing, scoring, or rewriting any SKILL.md description field. Provides the rubric, template, and apply script that handle YAML safety and produce a backup on each write.
---

Skill Description Evaluation

The user wants help evaluating or rewriting one or more skill description fields, meaning the `description:` value in the YAML frontmatter of a SKILL.md file. Models read it as part of the system prompt and use it as an activation trigger, so write it as one.

This skill is a procedure. Run the four steps below in order. Ask the user before assuming what they want. Works on a single description, a single SKILL.md file, or many SKILL.md files in batch; the procedure is the same.

Bundled script

scripts/apply-rewrites.ts applies a list of rewrites to one or more SKILL.md files in place, with timestamped backups and post-write verification. Used in Step 4. Run with --test to verify the YAML-safety logic against its inline test cases.

The design rule

A skill description should:

1. Start with an imperative trigger. "Always consult before [verb-list]..." or equivalent commanding phrasing. Models read "Use when..." as advisory and "Always consult before..." as policy.
2. List concrete trigger verbs and nouns. The model pattern-matches the user's intent vocabulary against the description's lexical surface. More relevant verbs and nouns activate more user intents.
3. Name what the skill encodes. One sentence on the specific thing the skill provides (API auth, CLI flag combinations, conventions, error handling, formatting rules) that the model would otherwise have to guess. This gives the model a reason to consult beyond pattern-match.
4. Stay token-economical. Every skill description loads into the system prompt of every session, so the cost grows linearly with the number of skills in the catalog. Use the fewest verbs, nouns, and sentences that still satisfy rules 1–3. Aim for ≤50 tokens (~200 characters). If a description grows past ~75 tokens, drop the weakest triggers or merge them ("messages, emails, docs" rather than enumerating every variant).
5. Match the user's vocabulary. Descriptions are activation surfaces. Write trigger words in terms a user would say when asking for help.
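
The token-economy rule can be checked mechanically. The sketch below is a rough estimate only: it applies the 1 token ≈ 4 English characters heuristic rather than a real tokenizer. The function name `estimateDescriptionTokens` and the verdict labels are illustrative assumptions, not part of the bundled script.

```typescript
// Heuristic: 1 token ≈ 4 English characters. Not a real tokenizer.
const CHARS_PER_TOKEN = 4;
const TARGET_TOKENS = 50; // "aim for ≤50 tokens"
const TRIM_TOKENS = 75; // "past ~75 tokens, drop the weakest triggers"

// Hypothetical labels, chosen for this sketch only.
type TokenVerdict = "ok" | "over-target" | "over-limit";

interface TokenEstimate {
  chars: number;
  tokens: number;
  verdict: TokenVerdict;
}

function estimateDescriptionTokens(description: string): TokenEstimate {
  const chars = description.length;
  const tokens = Math.ceil(chars / CHARS_PER_TOKEN);
  let verdict: TokenVerdict = "ok";
  if (tokens > TRIM_TOKENS) {
    verdict = "over-limit";
  } else if (tokens > TARGET_TOKENS) {
    verdict = "over-target";
  }
  return { chars: chars, tokens: tokens, verdict: verdict };
}

// A 42-character description from the worked examples.
const shortExample = estimateDescriptionTokens(
  "Google Tasks: Manage task lists and tasks."
);
// shortExample → { chars: 42, tokens: 11, verdict: "ok" }
```

Treat "over-target" as a prompt to group triggers, and "over-limit" as a hint that the weakest triggers should go.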

Known limits

Be honest with the user about scope.

Some intents can't be fixed at the description level. Phrases like "file on GitHub" route the model to native tools (gh CLI) regardless of what any skill description claims. If the user's failing case mentions a native tool or CLI the model would reach for directly, say so up front. Rewriting the description will not close that gap on its own.
SKIP clauses are uncertain. Whether "Skip ONLY when …" clauses help, hurt, or are neutral is an open question. State this honestly when the user asks about them.
Minimum-token thresholds are unknown. When in doubt, prefer fewer tokens.

Procedure

Step 1: Gather input

Ask the user what they want evaluated. The three common shapes:

A single description, pasted in chat. Evaluate inline; no file I/O needed.
A single SKILL.md file path. Read it; extract the description: field from the frontmatter.
A directory of skills. List the SKILL.md files inside. If there are more than a handful, ask whether to evaluate all of them or filter (only ones that don't start with "Always", or only ones the user names).

If the user's request is ambiguous about which of these they mean, ask before doing any reads.
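
For the file-path case, extracting the field can be sketched as below. This is a minimal line-based reader under stated assumptions, not a YAML parser and not the bundled script's logic: the names `readDescription` and `unquoteYamlScalar` are hypothetical, and anything that looks like a multi-line value is reported rather than interpreted.

```typescript
type DescriptionLookup =
  | { kind: "found"; value: string }
  | { kind: "missing" }
  | { kind: "multiline" };

// Strip one layer of YAML quoting from a single-line scalar.
function unquoteYamlScalar(raw: string): string {
  const first = raw.charAt(0);
  const last = raw.charAt(raw.length - 1);
  if (raw.length >= 2 && first === '"' && last === '"') {
    // Undo backslash escapes for quotes and backslashes only.
    return raw.slice(1, -1).replace(/\\(["\\])/g, "$1");
  }
  if (raw.length >= 2 && first === "'" && last === "'") {
    return raw.slice(1, -1).replace(/''/g, "'");
  }
  return raw;
}

function readDescription(skillMd: string): DescriptionLookup {
  const lines = skillMd.split(/\r?\n/);
  if ((lines[0] || "").trim() !== "---") return { kind: "missing" };

  // Locate the closing frontmatter marker.
  let close = -1;
  for (let i = 1; i < lines.length; i++) {
    if ((lines[i] || "").trim() === "---") {
      close = i;
      break;
    }
  }
  if (close === -1) return { kind: "missing" };

  for (let i = 1; i < close; i++) {
    const match = /^description:\s*(.*)$/.exec(lines[i] || "");
    if (!match) continue;
    const raw = (match[1] || "").trim();
    const first = raw.charAt(0);
    const next = i + 1 < close ? lines[i + 1] || "" : "";
    // Block scalars, unterminated quotes, and indented continuation
    // lines all mean the value spans more than one line.
    const isBlock = raw === "" || first === "|" || first === ">";
    const openQuote =
      (first === '"' || first === "'") &&
      (raw.length < 2 || raw.charAt(raw.length - 1) !== first);
    if (isBlock || openQuote || /^\s+\S/.test(next)) {
      return { kind: "multiline" };
    }
    return { kind: "found", value: unquoteYamlScalar(raw) };
  }
  return { kind: "missing" };
}
```

A "multiline" result is a signal to show the raw frontmatter to the user instead of guessing at the value.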

Step 2: Evaluate against the rubric

For each description, score the six criteria. Use ✓ present, ~ partial, ✗ absent.

| Criterion | Check |
| --- | --- |
| Imperative opening | Starts with "Always consult before…", "Always use when…", or equivalent commanding phrasing. NOT "Use when…", "Manages…", "Helps with…" |
| Trigger verbs | Concrete verb list covering the actions the skill performs ("adding, listing, completing, deleting, querying"). Count them. |
| Trigger nouns | Concrete noun list covering the domain ("todo, task, action item, reminder, follow-up"). Count them. |
| Encoded knowledge | One sentence naming what the skill provides that the model would otherwise guess. |
| Token economy | ≤ ~50 tokens / ~200 chars total (1 token ≈ 4 English chars). Verbs and nouns are merged where possible ("messages, emails, docs") rather than enumerated exhaustively. |
| User vocabulary | Uses words a user would say when asking for help, not the skill's internal terms or implementation details. |

The rubric splits rule #2 into separate verb and noun checks because they are independently countable. Present the scores as a small table for each description.
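
One way to hold the six scores and present them as a small table is sketched below. The type and function names (`RubricScores`, `renderScoreTable`, `needsRewrite`) are assumptions made for illustration; the scoring itself remains a judgment call made against the rubric, not something this code computes.

```typescript
type Mark = "✓" | "~" | "✗";

interface RubricScores {
  imperativeOpening: Mark;
  triggerVerbs: Mark;
  triggerNouns: Mark;
  encodedKnowledge: Mark;
  tokenEconomy: Mark;
  userVocabulary: Mark;
}

// Criterion keys paired with the labels used in the rubric.
const CRITERIA: Array<[keyof RubricScores, string]> = [
  ["imperativeOpening", "Imperative opening"],
  ["triggerVerbs", "Trigger verbs"],
  ["triggerNouns", "Trigger nouns"],
  ["encodedKnowledge", "Encoded knowledge"],
  ["tokenEconomy", "Token economy"],
  ["userVocabulary", "User vocabulary"],
];

// Render one description's scores as a two-column table.
function renderScoreTable(scores: RubricScores): string {
  const header = ["| Criterion | Score |", "| --- | --- |"];
  const rows = CRITERIA.map(function (entry) {
    return "| " + entry[1] + " | " + scores[entry[0]] + " |";
  });
  return header.concat(rows).join("\n");
}

// A rewrite is warranted when any criterion scored less than ✓.
function needsRewrite(scores: RubricScores): boolean {
  return CRITERIA.some(function (entry) {
    return scores[entry[0]] !== "✓";
  });
}
```

Keeping the criteria in one ordered array means the table and the rewrite decision always cover the same six checks.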

Step 3: Propose a rewrite

If any criterion scored less than ✓, write a rewrite using the template:

description: "Always consult before <verb1>, <verb2>, …, <verbN>ing any <noun1>, <noun2>, …, <nounM>, or related <domain-category>. Encodes <what the skill knows that the model would otherwise guess: API auth, CLI subcommands, conventions, error patterns>."

Aim for ≤50 tokens (~200 characters). If the trigger list naturally runs past ~5 verbs or ~5 nouns, group them by category ("messages, emails, docs") instead of enumerating every variant. If after grouping the description still exceeds ~75 tokens, the skill is probably doing two jobs and should be split.
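
Filling the template is mostly list joining. The sketch below assumes the caller supplies verbs already in "-ing" form and omits the optional "or related <domain-category>" tail; `DescriptionParts`, `joinList`, and `buildDescription` are hypothetical names used only for this illustration.

```typescript
interface DescriptionParts {
  verbs: string[]; // trigger verbs, already in "-ing" form
  nouns: string[]; // trigger nouns in the user's vocabulary
  encodes: string[]; // what the skill knows that the model would guess
}

// Join items as "a", "a or b", or "a, b, or c".
function joinList(items: string[], conjunction: string): string {
  if (items.length <= 1) return items.join("");
  if (items.length === 2) return items.join(" " + conjunction + " ");
  const head = items.slice(0, -1).join(", ");
  const tail = items.slice(-1).join("");
  return head + ", " + conjunction + " " + tail;
}

function buildDescription(parts: DescriptionParts): string {
  return (
    "Always consult before " +
    joinList(parts.verbs, "or") +
    " any " +
    joinList(parts.nouns, "or") +
    ". Encodes " +
    joinList(parts.encodes, "and") +
    "."
  );
}

// Reproduces the shape of a Google Tasks style description.
const tasksDescription = buildDescription({
  verbs: ["adding", "listing", "completing"],
  nouns: ["todo", "task", "reminder"],
  encodes: ["Google Tasks API authentication", "due-date handling"],
});
```

The output still needs the Step 2 checks: a mechanically assembled sentence can pass the template and fail token economy.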

For each description show the user, in order:

1. The original description: value (verbatim, in a code block).
2. The criterion-by-criterion scores from Step 2.
3. The proposed rewrite (in a code block).
4. A one-line note per change you made, mapping it back to a criterion.

Do not propose a rewrite if all six criteria already scored ✓. Tell the user the description already follows the rule.

Step 4: Apply changes if (and only if) the user asks

Default behavior is suggest-only. If the user explicitly asks you to apply changes, use the bundled script. It handles YAML safety, timestamped backups, and post-write verification.

Write a rewrites JSON file to a temporary location. Format:

[
  { "path": "/Users/me/.claude/skills/foo/SKILL.md",
    "description": "Always consult before …" },
  { "path": "/Users/me/.pi/agent/skills/bar/SKILL.md",
    "description": "…" }
]
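
Before handing the file to the script, the shape can be checked with a small validator such as the sketch below. It assumes only what the format above shows (an array of objects with string `path` and `description` fields); the single-line check on `description` is an extra assumption of this sketch, and whether the bundled script performs the same validation is not stated. `parseRewrites` is a hypothetical helper name.

```typescript
interface Rewrite {
  path: string;
  description: string;
}

// Parse and validate the rewrites JSON; throws on any malformed entry.
function parseRewrites(json: string): Rewrite[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("rewrites must be a JSON array");
  }
  return data.map(function (entry: unknown, index: number): Rewrite {
    if (typeof entry !== "object" || entry === null) {
      throw new Error("entry " + index + ": expected an object");
    }
    const record = entry as Record<string, unknown>;
    const path = record["path"];
    const description = record["description"];
    if (typeof path !== "string" || path.length === 0) {
      throw new Error("entry " + index + ": path must be a non-empty string");
    }
    if (typeof description !== "string" || description.length === 0) {
      throw new Error("entry " + index + ": description must be a non-empty string");
    }
    // Assumption of this sketch: new descriptions stay on one line.
    if (/[\r\n]/.test(description)) {
      throw new Error("entry " + index + ": description must be a single line");
    }
    return { path: path, description: description };
  });
}
```

Failing early here keeps a malformed entry from reaching the dry-run at all.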

Dry-run first to surface the planned changes:
```
bun ~/.claude/skills/skill-description-eval/scripts/apply-rewrites.ts \
  --rewrites /tmp/rewrites.json
```
This prints the old description, new description, and character delta for each file. Show the output to the user before applying.

Apply once the user confirms:

bun ~/.claude/skills/skill-description-eval/scripts/apply-rewrites.ts \
  --rewrites /tmp/rewrites.json --apply

If the script aborts with "multi-line description refuses rewrite" or a similar error, escalate to the user. Those files need manual handling. Do not attempt to patch them yourself.

Reversing a change:

LATEST=$(ls -t /path/to/SKILL.md.backup-* | head -1)
cp "$LATEST" /path/to/SKILL.md

Fallback when bun is unavailable: suggest the rewrites and ask the user to apply them. If you must hand-edit, follow these restrictions: edit only a single-line `description:` field; preserve the surrounding `---` markers; write the new value double-quoted, with quotes and backslashes backslash-escaped; and refuse if the existing description spans multiple lines (block scalar or multi-line quoted string). Hand-edits to multi-line descriptions or unusual frontmatter shapes are silently destructive.
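
The hand-edit restrictions can be sketched in code as follows. This is an illustration of the rules in the fallback paragraph, not the logic of `apply-rewrites.ts`: `yamlDoubleQuote` and `handEditDescription` are hypothetical names, only quotes and backslashes are escaped, and the function works on a string in memory, so writing a backup is left to the caller.

```typescript
// Double-quote a single-line value, escaping backslashes and quotes.
function yamlDoubleQuote(value: string): string {
  if (/[\r\n]/.test(value)) {
    throw new Error("new description must stay on one line");
  }
  const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return '"' + escaped + '"';
}

// Replace a single-line description inside the frontmatter, or refuse.
function handEditDescription(skillMd: string, newDescription: string): string {
  const lines = skillMd.split("\n");
  if ((lines[0] || "").trim() !== "---") {
    throw new Error("no frontmatter found");
  }
  let close = -1;
  for (let i = 1; i < lines.length; i++) {
    if ((lines[i] || "").trim() === "---") {
      close = i;
      break;
    }
  }
  if (close === -1) throw new Error("frontmatter is not closed");

  for (let i = 1; i < close; i++) {
    const line = lines[i] || "";
    if (line.indexOf("description:") !== 0) continue;
    const raw = line.slice("description:".length).trim();
    const first = raw.charAt(0);
    const nextLine = i + 1 < close ? lines[i + 1] || "" : "";
    const isBlock = raw === "" || first === "|" || first === ">";
    const openQuote =
      (first === '"' || first === "'") &&
      (raw.length < 2 || raw.charAt(raw.length - 1) !== first);
    const continued = /^\s+\S/.test(nextLine);
    if (isBlock || openQuote || continued) {
      throw new Error("multi-line description: refusing to hand-edit");
    }
    // Only this one line changes; the --- markers stay untouched.
    lines[i] = "description: " + yamlDoubleQuote(newDescription);
    return lines.join("\n");
  }
  throw new Error("no description field found");
}
```

Refusing by throwing, rather than attempting a best-effort patch, mirrors the instruction to escalate multi-line cases to the user.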

Worked examples

The four examples below show the rubric applied to real cases. Each illustrates a different failure pattern.

gws-tasks: no imperative, weak verbs, no encoded knowledge

Before

description: "Google Tasks: Manage task lists and tasks."

Scores: imperative ✗, verbs ~ (only "manage"), nouns ~ (task lists, tasks), encoded-knowledge ✗, token economy ✓ (42 chars), user-vocabulary ~.

After

description: "Always consult before adding, listing, completing, deleting, querying, or organizing any todo, task, action item, reminder, follow-up, or task list. Encodes Google Tasks API authentication, task-list-id resolution, due-date handling, and completion-state semantics."

Scores: imperative ✓, verbs ✓ (6), nouns ✓ (6), encoded-knowledge ✓, token economy ~ (265 chars / ~66 tokens, over the ≤50-token guideline; the trigger list could likely be trimmed), user-vocabulary ✓.

human-writing: no imperative, broad domain

Before

description: "Use when writing or editing a Slack message, email, pull request body, GitHub issue, Reddit post, agenda, or doc. Enforces a direct, warm, unfilled tone and removes AI tells. Always scores the final draft and runs a self-audit pass before delivery."

Scores: imperative ✗ ("Use when" not "Always"), verbs ~ (writing, editing), nouns ✓ (7), encoded-knowledge ✓, token economy ~ (249 chars / ~62 tokens), user-vocabulary ✓.

After

description: "Always consult before drafting any human-facing prose: messages, emails, pull request descriptions, GitHub issues, Reddit posts, agendas, documents, code review comments, bug reports, release notes, or any other text a person will read. Enforces direct, warm, unfilled tone; removes AI tells; runs a self-audit before delivery."

Scores: imperative ✓, verbs ✓, nouns ✓ (11+), encoded-knowledge ✓, token economy ✗ (327 chars / ~82 tokens, more triggers than necessary; grouping "messages, emails, docs, PR/issue text" instead of enumerating every variant would land closer to the ≤50-token target), user-vocabulary ✓.

playwright-cli: no imperative, unique domain

Before

description: "Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages."

Scores: imperative ✗, verbs ✓, nouns ✓, encoded-knowledge ~, token economy ~ (260 chars / ~65 tokens), user-vocabulary ✓.

After

description: "Always consult before opening, reading, navigating, scraping, fetching, or screenshotting any URL or webpage. Handles browser automation, JavaScript-rendered content, form filling, dynamic page interaction, and structured extraction in ways native HTTP fetch cannot."

Scores: imperative ✓, verbs ✓, nouns ✓, encoded-knowledge ✓, token economy ~ (266 chars / ~67 tokens, over guideline), user-vocabulary ✓.

github: no imperative, jargon-centric

Before

description: "Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries."

Scores: imperative ✗, verbs ~ (interact), nouns ✓ (issues, PRs, CI runs), encoded-knowledge ~, token economy ✓ (137 chars / ~34 tokens), user-vocabulary ✗ (centers gh rather than what the user wants to do).

After

description: "Always consult before interacting with GitHub: searching repos, listing or creating issues, listing or creating PRs, viewing CI runs, fetching repo content, querying GraphQL, or reviewing PRs. Encodes correct `gh` subcommands, flag combinations, repository scoping rules, and pagination patterns."

Scores: imperative ✓, verbs ✓, nouns ✓ (5), encoded-knowledge ✓, token economy ~ (296 chars / ~74 tokens, over guideline; "searching repos, issues, PRs, CI runs" could merge into one phrase), user-vocabulary ✓.

When in doubt

Token economy first, completeness second. Every token in a description is paid every session. If trimming the trigger list from 8 verbs to 5 still hits all six rubric criteria, ship the shorter version. A description that runs into a fourth sentence usually means the skill is doing two jobs and should be split.
Use the user's word. "Manage" is what authors think their skill does. "Add", "list", "complete" are what users ask for.
Don't claim knowledge the skill doesn't deliver. If the skill is a thin CLI wrapper, the encoded-knowledge sentence should say so honestly ("Encodes the correct flag combinations and pagination patterns"). Don't promise authentication semantics if the skill doesn't actually handle them.

When to flag scope to the user

A good description helps the model activate. It does not guarantee correct routing on intents that mention a native tool the model would reach for directly. If the user is debugging a specific failing intent and the intent mentions gh, fetch, grep, or similar, flag that description quality is one factor among several and tell them where the boundary is.
