skill-authoring
Create, optimize, and test Claude Code skills, including metadata refinement and trigger benchmarking.
| Field | Value |
|---|---|
| name | skill-authoring |
| description | Create, optimize, and test Claude Code skills, including metadata refinement and trigger benchmarking. |
| allowed-tools | Read, Edit, Write, Grep, Glob |
| when_to_use | Use when building new capabilities, fixing skill routing, or optimizing trigger signals. Do NOT use for general code authoring or project planning. |
context: fork when this skill orchestrates multiple steps, produces heavy parallel output, or builtin agent behavior matches the goal.
IF creating a new skill within this plugin's structure → use WORKFLOW mode
IF writing descriptions that trigger reliably → use METHODOLOGY mode
IF optimizing existing trigger performance → use METHODOLOGY mode
Creating new skills in this plugin's structure
Before committing a skill, verify:
claude -p)
Skills fall into five categories. Each has a different purpose and design pattern.
| Category | Purpose | Test |
|---|---|---|
| Constraint/Guardrail | Override the agent's DEFAULT behavior | "Does this change what the agent produces by default?" |
| Orchestration | Route work to the right specialist at the right time | "Does this define WHEN to delegate?" |
| Domain Expertise | Provide deep knowledge the base model lacks | "Would plausible-but-wrong outputs result without this?" |
| Quality Assurance | Enforce verification gates BEFORE completion | "Does this define what evidence must exist?" |
| Creative Direction | Break output convergence toward generic responses | "Does this prevent AI slop?" |
Real examples:
How categories map to Policy vs. Mechanism:
Policy = when a skill should trigger (the routing decision)
Mechanism = what the skill teaches and how it guides behavior
A skill conflating policy and mechanism produces:
Good skill design separates:
Skills share the context window with everything else. Every token competes with the user's request, conversation history, and other loaded content. Skills use progressive disclosure across three levels:
Level 1 (Metadata): Always loaded at startup. Put routing signals here.
Level 2 (Instructions): Loaded when triggered via bash. Put essential principles in SKILL.md.
Level 3 (Resources & Code): Loaded as needed via bash. Put details, schemas, and executable scripts here to consume zero tokens until accessed.
Assume Claude is smart. Don't explain obvious things.
If a line doesn't earn its keep, delete it.
Design principles for skills:
| Metric | Limit | Why |
|---|---|---|
| Description length | 1024 chars max | Official limit; use the full space |
| when_to_use length | 200 chars max | Longer = context bloat, not better routing |
| Skill body | 500 lines max | Beyond = principle dilution; split or reference |
| Tools allowed | 7 max | Beyond = coordination overhead |
| Spawn prompt length | 1500 tokens max | Reliability drop beyond this |
Split signal: If a skill needs >7 tools, >500 lines, or covers multiple concerns — split into focused skills.
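Three of these limits can be checked mechanically before committing. A minimal bash sketch, assuming a single-line `description` and a comma-separated `allowed-tools` value as in this skill's own frontmatter; `check_skill_limits` is a hypothetical helper and does not cover the `when_to_use` or spawn-prompt limits.

```shell
# check_skill_limits FILE (hypothetical helper)
# Prints one line per exceeded limit and returns non-zero if any is exceeded.
# Assumes: frontmatter delimited by '---' lines, single-line description,
# comma-separated allowed-tools.
check_skill_limits() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening frontmatter fence
    in_fm && $0 == "---"   { in_fm = 0; next }   # closing frontmatter fence
    in_fm {
      if ($0 ~ /^description:/) {
        v = $0; sub(/^description:[ ]*/, "", v)
        sub(/^"/, "", v); sub(/"$/, "", v)       # drop surrounding quotes
        desc_len = length(v)
      }
      if ($0 ~ /^allowed-tools:/) {
        v = $0; sub(/^allowed-tools:[ ]*/, "", v)
        tools = split(v, parts, /,[ ]*/)         # count comma-separated tools
      }
      next
    }
    { body++ }                                   # everything after frontmatter
    END {
      bad = 0
      if (desc_len > 1024) { print "description: " desc_len " chars (max 1024)"; bad = 1 }
      if (tools > 7)       { print "allowed-tools: " tools " tools (max 7)"; bad = 1 }
      if (body > 500)      { print "body: " body " lines (max 500)"; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

Usage: `check_skill_limits skill-name/SKILL.md` prints nothing and succeeds when the skill is within these limits.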
Simple skills (one focused task) → single SKILL.md file
Complex skills (multiple workflows, extensive domain knowledge) → use folders:
skill-name/
├── SKILL.md # Principles, routing
├── agents/ # Bundled subagent prompts
├── references/ # Step-by-step procedures + domain knowledge
├── templates/ # Output structures
└── scripts/ # Executable code
WARNING - Discovery Depth Limit: Automatic skill discovery only scans 1 level deep. Do not nest skills inside category folders (e.g., skills/category/skill-name/SKILL.md), or they will not be found by the Skill tool and will require manual Glob scanning to locate.
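To catch this before it bites, look for `SKILL.md` files more than one directory below the skills root. A bash sketch; `find_nested_skills` is a hypothetical helper, and the depth rule encodes only the one-level limit stated in the warning.

```shell
# find_nested_skills SKILLS_DIR (hypothetical helper)
# SKILLS_DIR/<skill-name>/SKILL.md is depth 2; anything deeper will not be
# discovered automatically. Prints offenders and returns 1 if any exist.
find_nested_skills() {
  local found
  found=$(find "$1" -mindepth 3 -name SKILL.md -print)
  [ -z "$found" ] && return 0
  printf '%s\n' "$found"
  return 1
}
```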
When to use agents/ vs inline:
agents/ when: reused by multiple skills, needs independent versioning, or must be portable
Headless trigger validation:
claude -p "<test-query>" \
--output-format stream-json \
--verbose \
--include-partial-messages \
2>&1 | grep -o '"skill-name"'
Train/test split: Keep 4-5 held-out cases. If 100% on training but fails on held-out → overfit. Rebuild with genuinely different queries.
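To score held-out cases instead of eyeballing them, loop the headless command over a query file. A sketch, assuming one query per line and the same grep-for-the-quoted-name heuristic as the command above (so any output that happens to contain the quoted name counts as a hit); `trigger_rate` is hypothetical. `--verbose` is included because print mode requires it alongside `stream-json` output, and `</dev/null` stops `claude` from consuming the remaining queries on the loop's stdin.

```shell
# trigger_rate SKILL_NAME QUERIES_FILE (hypothetical helper)
# Runs each non-empty query headlessly and prints "hits/total".
trigger_rate() {
  local skill=$1 queries=$2 hits=0 total=0 query
  while IFS= read -r query || [ -n "$query" ]; do
    [ -z "$query" ] && continue
    total=$((total + 1))
    if claude -p "$query" --output-format stream-json --verbose \
         --include-partial-messages 2>&1 </dev/null | grep -q "\"$skill\""; then
      hits=$((hits + 1))
    fi
  done < "$queries"
  echo "$hits/$total"
}
```

Run it once on the training queries and once on the held-out file; a large gap between the two is the overfit signal described above.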
Collect test queries:
| Wrong | Right | Why |
|---|---|---|
| "Helps with coding tasks" | "Reviews Python code for security vulnerabilities" | Vague triggers nothing; specific routes correctly |
| "Use when writing Go code" (too broad) | "Use when user adopts library/foo" | Restrict to single concern |
| Generic name "helper" | Domain noun "security-audit" | "helper" has no semantic anchor |
| Version metadata per file | Git history as source of truth | Version numbers never control anything |
Trigger optimization, context:fork patterns, frontmatter precision
The description is a routing prompt, not a keyword tag. Write for the model's linguistic reasoning — a good description triggers for "generate a presentation" even without the word "pptx".
Key rules:
"...") not block scalar (> or |-)
when_to_use — prevents false triggering
description: "[Verb] [artifact] for [domain]. Use when user [trigger1], [trigger2], or [trigger3]."
when_to_use: |
  Do NOT use for [exclusion1], [exclusion2].
# Commit Writer
description: "Writes commit messages from staged changes. Use when user asks to commit, types 'commit this', or 'write a commit message'."
when_to_use: |
  Do NOT use for pushing, merging, or checking status.
# Deploy (side-effect skill)
description: "Deploys to staging or production. Use when user types /deploy, says 'deploy', 'ship', or 'push to prod'."
when_to_use: |
  Do NOT use for building, testing, or code review.
disable-model-invocation: true
# Test Generator
description: "Creates unit tests with edge cases. Use when user asks to 'write tests', 'add test coverage', or 'generate tests'."
when_to_use: |
  Do NOT use for running existing tests.
# Code Review
description: "Reviews code for bugs, security, logic errors. Use when user asks 'review code', 'check PR', 'find bugs', or 'audit this'."
when_to_use: |
  Do NOT use for formatting, linting, or running tests.
# Security Audit
description: "Audits code for OWASP Top 10 security flaws. Use when user mentions 'security', 'vulnerabilities', 'SQL injection', or 'XSS'."
when_to_use: |
  Do NOT use for general code review without security focus.
effort: high
| Agent type | Builtin behavior | Use when |
|---|---|---|
| Explore | "Investigate. Do not edit. Summarize." | Codebase exploration |
| Plan | "Analyze architecture. Plan steps." | Implementation planning |
| general-purpose | None | Complex orchestration |
Use context: fork when: skill orchestrates multiple steps, produces heavy parallel output, or builtin agent behavior matches the goal.
Don't use when: skill is simple (single tool call), you need intermediate results in main conversation, or builtin behavior contradicts the skill's purpose.
context:fork vs skills: field:
context:fork — Agent provides the system prompt (role/persona), skill is the task to execute
skills: field — You write the system prompt directly, skill is reference knowledge to inject at startup
Why this matters: Custom agent definitions require complex configuration and aren't portable. A simpler pattern:
# Research Analyst Persona">
agents/*.md
general-purpose subagent with that content
# Research Analyst Persona
## Your Role
You are a research specialist focused on finding recent, credible sources.
## Your Approach
1. Start with web search for sources from the last 12 months
2. Cross-reference claims across at least 3 independent sources
3. Identify consensus vs disagreement in the field
## Output Format
Return findings as markdown with inline citations like [source](url)
Template for new agent files:
# [Role Name]
## Your Role
[Brief description of what this agent does]
## Your Approach
[Step-by-step methodology or key principles]
## Output Format
[Expected structure for results]
The Trigger Testing Loop: Write candidate → Test with 10 queries → Analyze misses → Refine → Repeat
Build three test categories:
Description tuning knobs:
| Symptom | Fix |
|---|---|
| Never triggers | Add "Use when user says 'X'" with explicit phrases |
| Triggers too often | Add NOT clause |
| Triggers on wrong intent | Narrow the action verb ("Generate" vs "Review" vs "Fix") |
Measuring success: Trigger rate >90%, false positive rate <10%
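Both thresholds can be computed from a labeled results file. A sketch, assuming a hypothetical tab-separated format `query<TAB>should_trigger<TAB>did_trigger` with 1/0 flags; trigger rate is hits over should-trigger queries, and false positive rate is triggers over should-not-trigger queries. `score_triggers` is hypothetical.

```shell
# score_triggers RESULTS_FILE (hypothetical helper)
# Prints both rates and succeeds only if trigger rate > 90% and FP rate < 10%.
score_triggers() {
  awk -F'\t' '
    $2 == 1 { pos++; if ($3 == 1) tp++ }   # should trigger
    $2 == 0 { neg++; if ($3 == 1) fp++ }   # should not trigger
    END {
      tr  = pos ? 100 * tp / pos : 0
      fpr = neg ? 100 * fp / neg : 0
      printf "trigger rate %.0f%%, false positive rate %.0f%%\n", tr, fpr
      exit !(tr > 90 && fpr < 10)
    }
  ' "$1"
}
```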
Split when:
Combine when:
skill-name/
├── SKILL.md # YAML frontmatter + body (<500 lines)
├── agents/ # Prompt templates for portable delegation
├── scripts/ # Only for deterministic/fragile operations. Code never enters context, only output does. Install dependencies locally.
├── references/ # One level deep — schemas, cheatsheets
└── assets/ # Templates, JSON schemas
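A hypothetical `new_skill` helper that creates this layout and seeds `SKILL.md` with the description template from the frontmatter section. The placeholders are left for you to fill in; remove any folders a simple skill does not need.

```shell
# new_skill NAME [ROOT] (hypothetical helper)
# Creates ROOT/NAME with the folder layout above; refuses to overwrite.
new_skill() {
  local name=$1 root=${2:-.}
  local dir="$root/$name"
  [ -e "$dir" ] && { echo "refusing to overwrite $dir" >&2; return 1; }
  mkdir -p "$dir"/{agents,scripts,references,assets}
  # Frontmatter follows the double-quoted description template.
  cat > "$dir/SKILL.md" <<EOF
---
name: $name
description: "[Verb] [artifact] for [domain]. Use when user [trigger1], [trigger2], or [trigger3]."
when_to_use: |
  Do NOT use for [exclusion1], [exclusion2].
---

# $name
EOF
}
```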
Writing style:
Three canonical rules:
Path resolution: Any path within a skill that points to its own supporting content resolves within that skill's folder. Use clean relative paths like references/file.md, agents/template.md. Never use complex variables like {baseDir} or ${CLAUDE_SKILL_DIR}.
No parent traversal: File paths MUST NOT use relative parent paths (../) to traverse outside the skill directory. Skills are self-contained. Cross-skill references must be semantic (citing a skill or role by name) rather than path-based.
Centralized routing: ONLY SKILL.md cites supporting files. Reference files must never cross-cite other reference files. The SKILL.md is the sole router — all citations flow through it, never peer-to-peer between references.
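These three rules can be approximated with grep. A rough bash sketch; `lint_skill_paths` is hypothetical and heuristic: it also flags lines that merely mention a forbidden pattern (the rule text above would trip it), and it treats any `references/` string inside a reference file as a peer citation.

```shell
# lint_skill_paths SKILL_DIR (hypothetical helper)
# Prints offending lines; returns 1 if any rule is violated.
lint_skill_paths() {
  local dir=$1 status=0
  # Parent traversal and path variables anywhere in the skill.
  if grep -rnE '\.\./|\{baseDir\}|\$\{CLAUDE_SKILL_DIR\}' "$dir"; then
    status=1
  fi
  # Reference files citing other reference files (peer-to-peer routing).
  if [ -d "$dir/references" ] && grep -rn 'references/' "$dir/references"; then
    status=1
  fi
  return $status
}
```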
Citation language — deterministic over passive:
references/format.md BEFORE writing any code. Do not proceed or make assumptions without reading this file."
Passive citations are ignored by LLMs 99% of the time. Every reference must be a strict imperative with explicit dependency.
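A quick scan for passive phrasings such as "You can read" and "See reference" (the ones named in the pitfalls table). `find_passive_citations` is hypothetical and the phrase list is illustrative, not exhaustive.

```shell
# find_passive_citations FILE... (hypothetical helper)
# Prints matching lines (case-insensitive); returns 1 if none are found.
find_passive_citations() {
  grep -niE 'you can read|see reference' "$@"
}
```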
The brittle-tool-name problem: Skills that say "Use the Agent tool" or "Use the Task tool" break when the underlying API migrates. The Task tool was renamed to the Agent tool — a skill that hardcoded "Task tool" would fail silently or produce wrong behavior.
The solution: Use semantic natural language that delegates to whatever is currently available.
| Brittle (breaks on API rename) | Native (forward-compatible) |
|---|---|
| Use the Agent tool to spawn | Use your native tools to spawn a subagent |
| Use the Task tool | delegate work via your native tools |
| Use the Write tool | Use your native tools to write the file |
| Use the Edit tool | Use your native tools to make the change |
| Use the Read tool | Use your native tools to read the file |
| Use the Bash tool | Use your native tools to run shell commands |
| Spawn a subagent with Write tool access | Spawn a subagent with write access |
Why "native tools" works: The phrase "use your native tools" forces the model to actively consult its dynamically injected tool registry rather than blindly executing a hardcoded string. It acts as a cognitive anchor — the model must enumerate what tools it actually has available and select from that live list, not from a static string in the prompt.
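To audit existing skills for hardcoded tool names, grep for the brittle phrasings in the left column. A sketch; `find_brittle_tool_names` is hypothetical and the pattern covers only the tool names listed above. Documentation that quotes these phrases as counter-examples, like the table itself, will also match.

```shell
# find_brittle_tool_names FILE... (hypothetical helper)
# Prints lines that hardcode a tool name; returns 1 if none are found.
find_brittle_tool_names() {
  grep -niE 'use the (agent|task|write|edit|read|bash) tool|with (agent|task|write|edit|read|bash) tool access' "$@"
}
```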
Principles:
BigQuery:bigquery_schema) must stay exact — those are server-level identities, not model tools.
Standard fields only — no custom metadata blocks:
| Field | Purpose |
|---|---|
| name | Display name. Max 64 chars, lowercase letters/numbers/hyphens only, no XML, no "claude"/"anthropic" |
| description | Primary routing signal. Semantic intent, not keywords. ≤150 chars (max 1024), no XML |
| when_to_use | Additional trigger contexts. ≤200 chars |
| argument-hint | Autocomplete hint after /skill-name |
| arguments | Named positional args → $name substitution in order |
| disable-model-invocation | true = only you invoke. Breaks skills: preloading |
| user-invocable | false = hidden from /, auto-loads by relevance |
| allowed-tools | Tools pre-approved during skill. Does NOT restrict |
| model | Override session model: sonnet/opus/haiku/full-ID/inherit |
| effort | Thinking budget: low/medium/high/xhigh/max |
| context | Set fork to run body in isolated subagent context |
| agent | Subagent type when context:fork: Explore/Plan/general-purpose |
| hooks | Hooks scoped to skill lifecycle |
| paths | Glob patterns for auto-activation by file path |
| shell | Shell for command execution: bash (default) or powershell |
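The `name` constraints are easy to enforce in a pre-commit check. A bash sketch; `valid_skill_name` is hypothetical. It accepts lowercase letters, digits, and hyphens (Anthropic's Agent Skills documentation permits digits in names), rejects anything over 64 characters, and rejects the reserved words; the character class also excludes XML tags.

```shell
# valid_skill_name NAME (hypothetical helper)
# Succeeds only if NAME satisfies the name rules for skills.
valid_skill_name() {
  local name=$1
  [ -n "$name" ] && [ ${#name} -le 64 ] || return 1   # non-empty, <= 64 chars
  [[ $name =~ ^[a-z0-9-]+$ ]] || return 1             # allowed characters only
  case $name in
    *claude*|*anthropic*) return 1 ;;                  # reserved words
  esac
}
```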
| Pitfall | Fix |
|---|---|
| Guidelines-only context:fork | A forked skill without actionable tasks wastes the subagent dispatch. Use forks for executing workflows, not injecting reference knowledge. |
| Vague description | "Deploy to production via bin/deploy.sh. Use when user says 'deploy' or 'ship'." |
| Missing guard rail | Set disable-model-invocation: true for destructive skills |
| Brittle path reference | Use clean relative paths (e.g., references/file.md) — paths resolve within the skill's folder by default |
| Relying on allowed-tools for security | allowed-tools only partially blocks; tools like Edit and Agent completely bypass it. Use disallowed-tools instead. |
| Undeclared dependency | Document required MCP servers in .mcp.json (see knowledge/raw/official/plugins/plugins-reference.md §MCP servers); plugin agents silently ignore mcpServers frontmatter — declare all MCP servers in the plugin's .mcp.json or plugin.json, not in agent definition files |
| Global package installation | Install packages locally to avoid interfering with the user's computer environment |
| Unsafe network calls | Audit external sources; external dependencies can change and become malicious |
| Recursive trigger | Let descriptions route. No cross-references in bodies. |
| Passive file citations | Never use "You can read" or "See reference" — write "You MUST read references/X.md BEFORE proceeding. Do not make assumptions without this file." |
Symptom: "Claude never uses my skill"
disable-model-invocation: true (prevents auto-trigger)
Symptom: "Claude uses my skill at the wrong time"
paths: to restrict to specific file types
Symptom: "Skill loads but doesn't do what I want"
allowed-tools: if skill needs specific tools without prompting
/skill-name test arg and check $0 expansion
You MUST read the relevant reference file BEFORE working on that aspect. Do not proceed or make assumptions without reading the applicable reference.
| Reference | Purpose | When to Read (BEFORE this action) |
|---|---|---|
| references/context-management.md | Context window principles, SKILL.md vs references/ load strategy | You MUST read this BEFORE deciding what belongs in body vs references/. Do NOT split, merge, or restructure a skill approaching 500 lines or 7 tools without first understanding the three-tier progressive disclosure pattern. |
| references/skill-self-testing.md | YAML validation, threshold checks, trigger testing | You MUST read this BEFORE committing any new skill. Do NOT mark a skill complete without running all threshold checks, YAML validation, and trigger tests documented here. |
| references/cross-skill-discovery.md | Skill routing, description patterns, name conventions | You MUST read this BEFORE writing or debugging a description that triggers incorrectly. Do NOT modify description or when_to_use without first understanding the routing patterns documented here. |
| references/frontmatter-complete.md | Full frontmatter field reference with examples | You MUST read this BEFORE writing complex frontmatter. Do NOT guess at field types, defaults, or valid values — ALWAYS verify against this canonical reference first. |
Skill creation follows phases: requirements → draft → verify → integrate. Each phase is a decision point — proceed when criteria are met, not on a schedule.
Principle over procedure: A skill about coordinated work should demonstrate coordination. Frame each phase as a tracked task with clear completion criteria.
For pre-commit verification of threshold checks, ALWAYS spawn a reviewer subagent and loop until no HIGH findings.
Spawn tp-skill-auditor to audit frontmatter validity, description triggers, and structure.
Spawn tp-grader to evaluate teaching effectiveness across the four weighted dimensions.
This verifies routing signal density, delta clarity, and anti-pattern quality before the skill enters the loading pool.