| name | skill-enhancer |
| description | Use when you need to audit, fix, or improve an existing agent skill to meet Gold Standard compliance. |
| tier | 2 |
| version | 1.3 |
Skill Enhancer
Purpose: This meta-skill analyzes other skills for compliance with TDD, CSO, and Script-First standards, guiding the agent through upgrades.
1. Red Flags (Anti-Rationalization)
STOP and READ THIS if you are thinking:
- "I'll just add the sections blindly" -> WRONG. You must understand why the skill fails before fixing it.
- "The description is close enough" -> WRONG. It must start with "Use when".
- "Examples are optional" -> WRONG. "Rich Skills" mandate examples.
- "It's just a small 20-line example" -> WRONG. Inline blocks > 12 lines are prohibited. Extract them.
- "I'll instruct the agent to parse the file line-by-line in text" -> WRONG. Use "Script-First".
2. Capabilities
- Audit: Detect gaps (missing Red Flags, inline blocks > 12 lines, poor CSO, weak language) using analyze_gaps.py.
- Execution Policy Audit: Detect missing Execution Mode, Script Contract, Safety Boundaries, and Validation Evidence sections.
- Security Remediation: Fix vulnerabilities flagged by skill-validator (e.g., curl | bash, secrets, weak permissions).
- Plan: Propose specific content improvements using references/refactoring_patterns.md.
- Execute: Apply refactoring patterns to upgrade the skill.
2.5. Execution Mode
- Mode: hybrid
- Rationale: gap triage and refactoring decisions are prompt-driven, while gap detection is script-driven.
2.6. Script Contract
- Primary Command: python3 scripts/analyze_gaps.py <target-skill-path> [--json]
- Inputs: target skill path + optional output mode.
- Outputs: structured gap list and pass/fail status.
- Failure Semantics: non-zero exit when gaps exist (for deterministic gate behavior).
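A gate script could consume the contract above roughly as follows. This is a minimal sketch, not part of the skill: `GateResult` and `run_gap_gate` are hypothetical names, and because the shape of the `--json` payload is not specified here, the payload is parsed but not interpreted.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class GateResult:
    passed: bool           # True only when the command exited with status 0
    exit_code: int
    report: Optional[Any]  # parsed JSON payload, or None if stdout was not JSON
    raw_output: str


def run_gap_gate(command: Sequence[str]) -> GateResult:
    """Run an analyzer command and map its exit status to pass/fail."""
    proc = subprocess.run(list(command), capture_output=True, text=True)
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        report = None  # plain-text mode, or output this sketch cannot interpret
    return GateResult(proc.returncode == 0, proc.returncode, report, proc.stdout)


# Hypothetical usage, following the Primary Command above:
# result = run_gap_gate(
#     ["python3", "scripts/analyze_gaps.py", "path/to/skill", "--json"]
# )
# if not result.passed: stop the pipeline and show result.raw_output
```

Treating the exit code, not the text output, as the source of truth keeps the gate deterministic even if the report wording changes.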
2.7. Safety Boundaries
- Scope: apply edits only to explicitly selected target skill.
- Default Exclusions: do not refactor unrelated skills or global docs by default.
- Destructive Actions: full-file overwrite is prohibited unless explicitly requested and reviewed.
2.8. Validation Evidence
- Primary Evidence: before/after analyze_gaps.py output.
- Secondary Evidence: targeted diffs proving that each reported gap was addressed.
- Quality Gate: no unresolved critical structure gaps after refactor.
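The before/after comparison can be sketched with plain set arithmetic. The assumptions here are that each gap can be reduced to a stable string identifier and that the caller supplies its own notion of "critical"; `compare_gap_runs` and `quality_gate` are hypothetical helpers, not part of analyze_gaps.py.

```python
from typing import Callable, Dict, Iterable, List


def compare_gap_runs(before: Iterable[str], after: Iterable[str]) -> Dict[str, List[str]]:
    """Classify gaps as resolved, remaining, or newly introduced."""
    before_set, after_set = set(before), set(after)
    return {
        "resolved": sorted(before_set - after_set),
        "remaining": sorted(before_set & after_set),
        "introduced": sorted(after_set - before_set),
    }


def quality_gate(after: Iterable[str], is_critical: Callable[[str], bool]) -> bool:
    """Pass only when no gap left after the refactor is critical."""
    return not any(is_critical(gap) for gap in after)
```

Reporting "introduced" separately helps catch a refactor that fixes one gap while creating another.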
3. Instructions
Phase 1: Audit
- Run Analyzer: python3 scripts/analyze_gaps.py <target-skill-path>.
- Manual Checks:
  - Graduated Language Review: Check instruction language against the graduated approach:
    - Safety-critical steps (data loss, destructive ops): Must use MUST/ALWAYS plus an explanation of why. If the explanation is missing, add it.
    - Behavioral steps (formatting, style): Apply explain-why + imperative style. If a bare MUST lacks a rationale, add the rationale; if the wording is a weak "should"/"could", strengthen it to imperative + reason.
    - Do NOT blindly replace every "should" with "MUST". First evaluate whether the instruction is safety-critical or behavioral.
  - Script-First Gap: Identify complex logic steps (> 5 lines of text) that MUST be converted to a script in scripts/.
- Review Gaps: Read the analyzer output and your manual findings.
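The Graduated Language Review above can be pre-screened with a small heuristic before the manual pass. This is only a sketch: the cue words in `RATIONALE` are an assumed, incomplete list, and `review_language` is a hypothetical helper. Its findings are candidates for human judgment, not automatic replacements.

```python
import re
from typing import List, Tuple

WEAK = re.compile(r"\b(should|could)\b", re.IGNORECASE)
STRONG = re.compile(r"\b(MUST|ALWAYS)\b")  # case-sensitive on purpose
# Assumed cue words suggesting that a rationale accompanies the instruction.
RATIONALE = re.compile(
    r"\b(because|so that|to prevent|to avoid|otherwise)\b", re.IGNORECASE
)


def review_language(text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, finding, line) for lines that deserve a second look."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        has_reason = bool(RATIONALE.search(line))
        if STRONG.search(line) and not has_reason:
            findings.append((number, "bare-strong", line.strip()))
        elif WEAK.search(line):
            findings.append((number, "weak", line.strip()))
    return findings
```

Whether a flagged line is safety-critical or behavioral still has to be decided by the reviewer; the sketch only locates the lines.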
Phase 1.5: Execution-Policy Audit
- Verify Execution Mode section exists and is explicit (prompt-first, script-first, or hybrid).
- If skill uses scripts/, verify Script Contract section defines command, inputs, outputs, and exit behavior.
- Verify Safety Boundaries section defines scope limits and non-default destructive behavior.
- Verify Validation Evidence section defines objective verification outputs.
- Mark missing pieces as migration gaps (warning-first for legacy skills).
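The checks in this phase could be approximated with a presence test over the skill text. This sketch assumes that section names appear verbatim in the file and uses naive substring matching; `audit_execution_policy` is a hypothetical helper, and it returns warnings rather than failing, in line with the warning-first approach for legacy skills.

```python
from typing import List

REQUIRED_SECTIONS = ["Execution Mode", "Safety Boundaries", "Validation Evidence"]
EXECUTION_MODES = ("prompt-first", "script-first", "hybrid")


def audit_execution_policy(skill_text: str) -> List[str]:
    """Return warning strings for missing execution-policy pieces."""
    lowered = skill_text.lower()
    required = list(REQUIRED_SECTIONS)
    if "scripts/" in lowered:
        # A Script Contract is only expected when the skill ships scripts.
        required.insert(1, "Script Contract")
    warnings = [
        f"missing section: {name}" for name in required if name.lower() not in lowered
    ]
    if "execution mode" in lowered and not any(m in lowered for m in EXECUTION_MODES):
        warnings.append(
            "Execution Mode does not name prompt-first, script-first, or hybrid"
        )
    return warnings
```

A substring test cannot tell a section heading from a passing mention, so a clean result here still warrants a quick manual read.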
Phase 1.7: Behavioral Analysis (If Usage Logs Available)
If transcripts or logs from real skill usage exist, analyze them for patterns:
- Repeated Code: Did the agent write the same helper script across multiple runs? → Extract to scripts/.
- Repeated Questions: Did the agent ask the same questions or re-discover the same context? → Add to references/.
- Excessive Token Usage: Did the agent spend tokens reading large inline blocks? → Plan extraction to external files.
- Unused Sections: Did the agent skip reading certain sections entirely? → Consider trimming or consolidating.
If no usage logs are available, skip this phase — it will become relevant after the skill is deployed.
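As one possible way to spot the "Repeated Code" pattern, the sketch below counts code snippets that recur across runs. It assumes the snippets have already been extracted from the transcripts into one list per run; the log format is not defined here, and `normalize` and `repeated_snippets` are hypothetical helpers.

```python
from collections import Counter
from typing import List


def normalize(snippet: str) -> str:
    """Strip indentation and blank lines so reformatted copies compare equal."""
    lines = (line.strip() for line in snippet.strip().splitlines())
    return "\n".join(line for line in lines if line)


def repeated_snippets(runs: List[List[str]], min_runs: int = 2) -> List[str]:
    """Return normalized snippets that appear in at least `min_runs` distinct runs."""
    seen_in = Counter()
    for snippets in runs:
        # The set ensures a snippet repeated within one run is counted once.
        for key in {normalize(s) for s in snippets}:
            seen_in[key] += 1
    return sorted(key for key, count in seen_in.items() if count >= min_runs)
```

Exact matching after normalization misses near-duplicates, so treat an empty result as "nothing obvious" rather than proof that no helper is worth extracting.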
Phase 2: Plan
- Read Target Skill: Read the content of the target skill.
- Draft Improvements:
  - Token Efficiency: Identify blocks > 12 lines and plan extraction to examples/, assets/, or references/.
  - Script-First: Identify logic blocks > 5 lines and plan extraction to scripts/.
  - Execution Policy: Add missing policy sections and scope constraints.
  - Graduated Language: Replace weak words using the graduated approach: MUST + why for safety, explain-why + imperative for behavioral.
  - Red Flags: Identify 2-3 likely agent excuses for this specific task.
  - CSO & Pushiness: Rewrite description to "Use when [TRIGGER]...". Check if description is "pushy" enough to prevent under-triggering; add edge-case triggers and phrases like "even if the user doesn't explicitly ask for…".
  - Generalization: Check if instructions are overfitted to specific examples. A skill must work across many prompts, not just the test cases it was developed with.
- Confirm: Ensure improvements align with the "Skills as Code" philosophy.
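Two of the planning checks above, the 12-line limit on inline blocks and the "Use when" description rule, are mechanical enough to sketch. The sketch assumes inline blocks are delimited by triple-backtick fences and that the limit applies to the lines between the fences; `oversized_blocks` and `is_trigger_based` are hypothetical helpers and do not describe how analyze_gaps.py works.

```python
from typing import List, Tuple

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block
MAX_INLINE_LINES = 12


def oversized_blocks(text: str, limit: int = MAX_INLINE_LINES) -> List[Tuple[int, int]]:
    """Return (start_line, body_length) for fenced blocks longer than `limit` lines."""
    results = []
    start = None
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip().startswith(FENCE):
            continue
        if start is None:
            start = number  # opening fence
        else:
            length = number - start - 1  # lines between the two fences
            if length > limit:
                results.append((start, length))
            start = None  # closing fence; look for the next block
    return results


def is_trigger_based(description: str) -> bool:
    """Check the rule that a description must start with "Use when"."""
    return description.strip().startswith("Use when")
```

Each reported start line marks a block to move into examples/, assets/, or references/ during the Execute phase.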
Phase 3: Execute
- Update File: Edit the target SKILL.md to insert the new sections.
  - CRITICAL: Use replace_file_content or multi_replace_file_content.
  - DO NOT use write_to_file to overwrite existing content (Data Loss Risk).
  - Tip: Use references/refactoring_patterns.md for the style guide.
- Verify: Re-run analyze_gaps.py. Expect output "No Gaps Found".
Phase 3.5: Security Repair (If triggered by Validator)
- Analyze Report: Read the skill-validator JSON output.
- Consult Guide: Use references/security_refactoring.md to find safe alternatives for flagged patterns.
- Apply Fixes:
  - Shell Injection: Replace direct execution with argument arrays.
  - Downloads: Replace curl | bash with download -> inspect -> execute.
  - Secrets: Move hardcoded keys to environment variables.
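Two of the fixes above, argument arrays and environment-based secrets, might look like this in a Python helper script. It is a minimal sketch under the assumption that the target skill's scripts are written in Python; `run_with_args` and `read_secret` are hypothetical names, and references/security_refactoring.md remains the authority on the actual patterns.

```python
import os
import subprocess
from typing import List


def run_with_args(program: str, args: List[str]) -> str:
    """Run a program with an argument array; no shell ever parses the arguments."""
    completed = subprocess.run(
        [program, *args], capture_output=True, text=True, check=True
    )
    return completed.stdout


def read_secret(name: str) -> str:
    """Fetch a secret from the environment instead of hardcoding it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

Because the command is passed as a list and `shell=True` is never used, input such as `x; echo pwned` reaches the program as one literal argument instead of being interpreted as a second command.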
Phase 4: Final VDD Check
- Read Checklist: Open references/vdd_checklist.md.
- Self-Correction: Verify your work against the 5 criteria (Data Safety, Anti-Laziness, etc.).
- Refine: If any check fails (e.g., found "TODO", found unmotivated "should"), fix it immediately.
- Test Coverage: Verify the skill has at least 2-3 test prompts, either in evals/evals.json or documented inline. If none exist, create them based on the skill's intended use cases.
4. Best Practices
| DO THIS | DO NOT DO THIS |
|---|---|
| Specific Red Flags: "Don't skip tests" | Generic Red Flags: "Don't be lazy" |
| Trigger-Based Desc: "Use when debugging race conditions" | Summary Desc: "Guide for debugging" |
| Strong Verbs: "MUST", "EXECUTE", "VERIFY" | Weak Verbs: "should", "consider", "try" |
Rationalization Table
| Agent Excuse | Reality / Counter-Argument |
|---|---|
| "The skill is too simple for Red Flags" | Simple skills are skipped most often. Explicit rules prevent this. |
| "I don't have time to write examples" | Examples save time by preventing hallucinations later. |
| "It's easier to write logic in text" | Text logic is unreliable. Scripts are deterministic. |
5. Examples (Few-Shot)
Tip: See examples/usage_example.md for a complete Before & After walkthrough of upgrading a legacy skill.
Input:
python3 scripts/analyze_gaps.py ../target-skill
Output:
⚠️ Gaps Detected...
Recommendation: Run 'Execute Improvement Plan'...
6. Resources
- scripts/analyze_gaps.py: The gap detection tool.
- references/writing_skills_best_practices_anthropic.md: The authoritative "Gold Standard" guide used to verify compliance.
- references/testing-skills-with-subagents.md: Methodology for verifying fixes using TDD (Red-Green-Refactor).
- ../skill-creator/agents/grader.md: Prompt for evaluating skill execution results against expectations.
- ../skill-creator/agents/comparator.md: Prompt for blind A/B comparison of two skill outputs.
- ../skill-creator/agents/analyzer.md: Prompt for post-hoc analysis that identifies why one skill version outperforms another.