mit einem Klick
skill-description-eval
// Always consult before writing, scoring, or rewriting any SKILL.md description field. Provides the rubric, template, and apply script that handle YAML safety and produce a backup on each write.
// Always consult before writing, scoring, or rewriting any SKILL.md description field. Provides the rubric, template, and apply script that handle YAML safety and produce a backup on each write.
Extract design DNA from a website into a reusable specification with steerable variables
YOU MUST use this skill for anything involving past Claude Code conversations in ~/.claude/projects/: when the conversation starts with 'this session is being continued from a previous conversation that ran out of context', or to recover context from a previous conversation, or to discover/catalog which conversations exist (by date, project, message count), or to search and read past conversation logs. One entry point for cataloging, searching, and rendering Claude Code JSONL transcripts.
Always consult before any Google Workspace operation via the `gws` CLI: sending or reading Gmail and triaging the inbox, Calendar events and agendas, Docs, Drive files and folders, Sheets, Slides, Forms, and adding, listing, or completing Tasks, reminders, and to-dos. Routes to per-service reference files for auth, flags, and command syntax.
Always consult before drafting any human-facing prose: messages, emails, pull request descriptions, GitHub issues, Reddit posts, agendas, documents, code review comments, bug reports, release notes, or any other text a person will read. Enforces direct, warm, unfilled tone; removes AI tells; runs a self-audit before delivery. Skip ONLY when the task is operational (calling an API, running a CLI) rather than composing prose.
Always consult before opening, reading, navigating, scraping, fetching, or screenshotting any URL or webpage. Handles browser automation, JavaScript-rendered content, form filling, dynamic page interaction, and structured extraction in ways native HTTP fetch cannot. Skip ONLY for raw-text or JSON endpoints where a single WebFetch suffices.
This skill should be used when the user asks to "QA changes", "verify this works", "test the build", "check if this runs", "validate changes in tmux", or wants end-to-end verification of code changes by running builds, tests, and applications in tmux.
| name | skill-description-eval |
| disable-model-invocation | true |
| description | Always consult before writing, scoring, or rewriting any SKILL.md description field. Provides the rubric, template, and apply script that handle YAML safety and produce a backup on each write. |
The user wants help evaluating or rewriting one or more skill description fields. That is the description: value in the YAML frontmatter of a SKILL.md file. Models read it as part of the system prompt and use it as an activation trigger. Write it as one.
This skill is a procedure. Run the four steps below in order. Ask the user before assuming what they want. Works on a single description, a single SKILL.md file, or many SKILL.md files in batch; the procedure is the same.
scripts/apply-rewrites.ts applies a list of rewrites to one or more SKILL.md files in place, with timestamped backups and post-write verification. Used in Step 4. Run with --test to verify the YAML-safety logic against its inline test cases.
A skill description should:
Be honest with the user about scope.
gh CLI) regardless of what any skill description claims. If the user's failing case mentions a native tool or CLI the model would reach for directly, say so up front. Rewriting the description will not close that gap on its own.Skip ONLY when … clauses help, hurt, or are neutral is an open question. State this honestly when the user asks about them.Ask the user what they want evaluated. The three common shapes:
description: field from the frontmatter.If the user's request is ambiguous about which of these they mean, ask before doing any reads.
For each description, score the six criteria. Use ✓ present, ~ partial, ✗ absent.
| Criterion | Check |
|---|---|
| Imperative opening | Starts with "Always consult before…", "Always use when…", or equivalent commanding phrasing. NOT "Use when…", "Manages…", "Helps with…" |
| Trigger verbs | Concrete verb list covering the actions the skill performs ("adding, listing, completing, deleting, querying"). Count them. |
| Trigger nouns | Concrete noun list covering the domain ("todo, task, action item, reminder, follow-up"). Count them. |
| Encoded knowledge | One sentence naming what the skill provides that the model would otherwise guess. |
| Token economy | ≤ ~50 tokens / ~200 chars total (1 token ≈ 4 English chars). Verbs and nouns are merged where possible ("messages, emails, docs") rather than enumerated exhaustively. |
| User vocabulary | Uses words a user would say when asking for help, not the skill's internal terms or implementation details. |
The rubric splits rule #2 into separate verb and noun checks because they are independently countable. Present the scores as a small table for each description.
If any criterion scored less than ✓, write a rewrite using the template:
description: "Always consult before <verb1>, <verb2>, …, <verbN>ing any <noun1>, <noun2>, …, <nounM>, or related <domain-category>. Encodes <what the skill knows that the model would otherwise guess: API auth, CLI subcommands, conventions, error patterns>."
Aim for ≤50 tokens (~200 characters). If the trigger list naturally runs past ~5 verbs or ~5 nouns, group them by category ("messages, emails, docs") instead of enumerating every variant. If after grouping the description still exceeds ~75 tokens, the skill is probably doing two jobs and should be split.
For each description show the user, in order:
description: value (verbatim, in a code block).Do not propose a rewrite if all six criteria already scored ✓. Tell the user the description already follows the rule.
Default behavior is suggest-only. If the user explicitly asks you to apply changes, use the bundled script. It handles YAML safety, timestamped backups, and post-write verification.
Write a rewrites JSON at a temp location. Format:
[
{ "path": "/Users/me/.claude/skills/foo/SKILL.md",
"description": "Always consult before …" },
{ "path": "/Users/me/.pi/agent/skills/bar/SKILL.md",
"description": "…" }
]
Dry-run first to surface the planned changes:
bun ~/.claude/skills/skill-description-eval/scripts/apply-rewrites.ts \
--rewrites /tmp/rewrites.json
Prints old description, new description, and char delta per file. Show this to the user before apply.
Apply once the user confirms:
bun ~/.claude/skills/skill-description-eval/scripts/apply-rewrites.ts \
--rewrites /tmp/rewrites.json --apply
If the script aborts with multi-line description refuses rewrite or similar, escalate to the user. Those files need manual handling. Do not attempt to patch them yourself.
Reversing a change:
LATEST=$(ls -t /path/to/SKILL.md.backup-* | head -1)
cp "$LATEST" /path/to/SKILL.md
Fallback when bun is unavailable: suggest the rewrites and ask the user to apply them. If you must hand-edit, restrict to: single-line description: only; preserve the surrounding --- markers; write the new value double-quoted with backslash-escaped quotes and backslashes; refuse if the existing description spans multiple lines (block scalar or quoted multiline). Hand-edits to multi-line descriptions or unusual frontmatter shapes are silently destructive.
The four examples below show the rubric applied to real cases. Each illustrates a different failure pattern.
Before
description: "Google Tasks: Manage task lists and tasks."
Scores: imperative ✗, verbs ~ (only "manage"), nouns ~ (task lists, tasks), encoded-knowledge ✗, token economy ✓ (42 chars), user-vocabulary ~.
After
description: "Always consult before adding, listing, completing, deleting, querying, or organizing any todo, task, action item, reminder, follow-up, or task list. Encodes Google Tasks API authentication, task-list-id resolution, due-date handling, and completion-state semantics."
Scores: imperative ✓, verbs ✓ (6), nouns ✓ (7), encoded-knowledge ✓, token economy ~ (270 chars / ~70 tokens, over the ≤50-token guideline; the trigger list could likely be trimmed), user-vocabulary ✓.
Before
description: "Use when writing or editing a Slack message, email, pull request body, GitHub issue, Reddit post, agenda, or doc. Enforces a direct, warm, unfilled tone and removes AI tells. Always scores the final draft and runs a self-audit pass before delivery."
Scores: imperative ✗ ("Use when" not "Always"), verbs ~ (writing, editing), nouns ✓ (7), encoded-knowledge ✓, token economy ~ (249 chars / ~62 tokens), user-vocabulary ✓.
After
description: "Always consult before drafting any human-facing prose: messages, emails, pull request descriptions, GitHub issues, Reddit posts, agendas, documents, code review comments, bug reports, release notes, or any other text a person will read. Enforces direct, warm, unfilled tone; removes AI tells; runs a self-audit before delivery."
Scores: imperative ✓, verbs ✓, nouns ✓ (11+), encoded-knowledge ✓, token economy ✗ (430 chars / ~108 tokens, more triggers than necessary; grouping "messages, emails, docs, PR/issue text" instead of enumerating every variant would land closer to the ≤50-token target), user-vocabulary ✓.
Before
description: "Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages."
Scores: imperative ✗, verbs ✓, nouns ✓, encoded-knowledge ~, token economy ~ (260 chars / ~65 tokens), user-vocabulary ✓.
After
description: "Always consult before opening, reading, navigating, scraping, fetching, or screenshotting any URL or webpage. Handles browser automation, JavaScript-rendered content, form filling, dynamic page interaction, and structured extraction in ways native HTTP fetch cannot."
Scores: imperative ✓, verbs ✓, nouns ✓, encoded-knowledge ✓, token economy ~ (341 chars / ~85 tokens, over guideline), user-vocabulary ✓.
Before
description: "Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries."
Scores: imperative ✗, verbs ~ (interact), nouns ✓ (issues, PRs, CI runs), encoded-knowledge ~, token economy ✓ (137 chars / ~34 tokens), user-vocabulary ✗ (centers gh rather than what the user wants to do).
After
description: "Always consult before interacting with GitHub: searching repos, listing or creating issues, listing or creating PRs, viewing CI runs, fetching repo content, querying GraphQL, or reviewing PRs. Encodes correct `gh` subcommands, flag combinations, repository scoping rules, and pagination patterns."
Scores: imperative ✓, verbs ✓, nouns ✓ (5), encoded-knowledge ✓, token economy ~ (401 chars / ~100 tokens, over guideline; "searching repos, issues, PRs, CI runs" could merge into one phrase), user-vocabulary ✓.
A good description helps the model activate. It does not guarantee correct routing on intents that mention a native tool the model would reach for directly. If the user is debugging a specific failing intent and the intent mentions gh, fetch, grep, or similar, flag that description quality is one factor among several and tell them where the boundary is.