| name | skill-reviewer |
| description | Review SKILL.md files for correct structure, section-purpose compliance, and absence of duplication. Evaluates whether knowledge, capabilities, rules, and examples sections each serve their intended purpose. Use when users request feedback, a quality assessment, or want to improve, fix, or ensure a copilot skill file triggers correctly. |
- User asks to review a skill file (SKILL.md)
- User asks to improve or fix a skill file (SKILL.md)
- User asks whether a SKILL.md is correctly structured
- User asks for feedback on section placement, duplication, or capability format in a skill file
- User asks whether a skill will trigger or activate correctly
Section-purpose table and common structural violations. Load **reference/section-semantics.md** for the full rubric.
Criteria for description trigger clarity and `<when-to-use-this-skill>` consistency. Load **reference/trigger-correctness.md** for the full rubric.
| Level | Symbol | When to use |
|---|---|---|
| Blocker | 🚫 | The skill cannot be used in production as-is — it will fail to load, never activate, or produce incorrect output in all or most realistic scenarios |
| Major | 🔴 | A structural violation or coverage gap that causes the agent to behave incorrectly for at least one realistic scenario |
| Minor | 🟡 | A pattern deviation that reduces quality or maintainability but does not break the skill for the common case |
| Nit | 🟢 | Cosmetic or trivial issue — naming consistency, minor wording, style alignment |
| Inconsistency | ⚠️ | Two conflicting patterns that cannot be auto-resolved; present both variants and ask the user to decide |
Blocker vs. Major: Use 🚫 Blocker when the agent cannot complete the task at all in the normal case (e.g., no capability defined, frontmatter missing entirely, <when-to-use-this-skill> absent so the skill never self-confirms activation). Use 🔴 Major for violations that break the skill only in specific — but realistic — scenarios.
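As an illustration only, the severity levels and the Blocker vs. Major distinction can be sketched in Python. The labels and symbols come from the table above; the names `Severity`, `blocker_or_major`, and `format_finding` are hypothetical and not part of any skill tooling, and the boolean input stands in for a judgment the reviewer still has to make.

```python
from enum import Enum


class Severity(Enum):
    # (label, symbol) pairs copied from the severity table
    BLOCKER = ("Blocker", "🚫")
    MAJOR = ("Major", "🔴")
    MINOR = ("Minor", "🟡")
    NIT = ("Nit", "🟢")
    INCONSISTENCY = ("Inconsistency", "⚠️")

    @property
    def label(self) -> str:
        return self.value[0]

    @property
    def symbol(self) -> str:
        return self.value[1]


def blocker_or_major(fails_in_normal_case: bool) -> Severity:
    """Hypothetical helper: Blocker when the skill fails in the normal case,
    Major when it breaks only in specific but realistic scenarios."""
    return Severity.BLOCKER if fails_in_normal_case else Severity.MAJOR


def format_finding(severity: Severity, message: str) -> str:
    """Render one finding line as '<symbol> <label>: <message>'."""
    return f"{severity.symbol} {severity.label}: {message}"
```

For example, `format_finding(blocker_or_major(True), "no capability defined")` produces a line that starts with the 🚫 Blocker marker.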
**Objective**: Evaluate a SKILL.md file for correct section structure, separation of concerns, and absence of duplication.
Note: Do not modify the skill file during review. Suggest changes with clear descriptions or patch-style snippets.
Steps:
- Read the full skill file to understand its domain and all sections.
a. Verify all expected top-level sections are present: frontmatter YAML, <when-to-use-this-skill>, <knowledge>, and <capabilities>; flag any missing required section as 🔴 Major.
b. Verify sections appear in the correct order: frontmatter → <when-to-use-this-skill> → <knowledge> → <capabilities> → <rules> (if present); flag out-of-order sections as 🟡 Minor.
- Check description quality and trigger consistency — load reference/trigger-correctness.md first:
a. Verify the frontmatter description follows the two-part template (domain summary + trigger phrase); load reference/description-template.md; flag a missing trigger phrase as 🔴 Major.
b. Score the description using the five-dimension quality metric — load reference/description-scoring.md; report the score (x/10) and flag a score ≤5 as 🔴 Major; flag a score of 6–8 as 🟡 Minor.
c. Check that the intent verbs and key scenarios in the trigger phrase match the bullets in <when-to-use-this-skill> (bidirectional): flag any <when-to-use-this-skill> scenario whose keyword or intent verb is absent from description as 🟡 Minor (under-coverage); flag any trigger scenario in description absent from <when-to-use-this-skill> as 🟡 Minor (over-triggering or undocumented scope).
d. If <when-to-use-this-skill> is missing entirely, flag as 🔴 Major.
e. Flag any direct contradiction between the scope described in description and the bullets in <when-to-use-this-skill> as 🔴 Major.
- For each capability section, verify it describes how to do something as ordered steps; flag any that are fact lists, reference tables, or constraint bullets (those belong in <knowledge>).
- For each rule, verify it answers "when scenario X → use capability Y"; flag any rule that re-states content already in a capability (duplication). If the skill has only one capability and no <rules> section, do not flag its absence.
- Check that a <knowledge> section exists and contains all reference material (tables, layouts, API signatures, platform constraints) that capabilities currently cite inline. Also check that large reference rubrics are not embedded directly in SKILL.md; they should be in reference/ files loaded on demand. Flag inline rubrics as 🔴 Major.
- Check capability section names use action verbs; flag noun-named sections. Also verify that <knowledge> subsection names use descriptive noun phrases: a subsection named with an action verb (e.g., <check-constraints>) signals that procedural content has leaked into <knowledge>.
- Check that on-demand context (examples, reference rubrics) is exposed via a <context-loading-guide> entry inside <knowledge> (preferred) rather than a standalone <examples> section. If a bare <examples> section exists instead, flag it as 🟡 Minor. If a <context-loading-guide> exists but uses a description-first `Scenario | Reference` format instead of a condition-first `Load when | Provides | File` format, flag it as 🟡 Minor: the first column must state the decision condition, not describe the file's content. If the guide is written as a bullet list, flag it as 🟡 Minor. Either way, verify that all referenced content is linked by file path, not embedded inline, and flag inline content as 🔴 Major.
- Assess example coverage: cross-reference each named capability against the linked examples. Flag capabilities with no corresponding example as 🔴 Major; flag skills where examples cover only a subset of scenarios as 🟡 Minor. Load reference/example-coverage-criteria.md for the full rubric.
- Load and review each linked example file:
a. Verify the file has a clear scenario heading that names the trigger condition and the capability being demonstrated — flag missing or vague descriptions as 🟡 Minor.
b. Verify the example output structure matches what the capability's steps would produce — flag structural drift as 🔴 Major.
c. Check the scenario is realistic and non-trivial relative to the capability's complexity — flag toy/hello-world inputs for complex capabilities as 🟡 Minor.
d. Check the example does not contradict any rule or knowledge entry in the parent skill — flag contradictions as 🔴 Major.
e. Check that the example references the current capability name; flag stale names that no longer match the skill as 🟢 Nit.
Load reference/example-quality-criteria.md for the full rubric.
- Surface inconsistencies: mixed styles within a section type, two conflicting patterns, or differing levels of procedural detail across capabilities of the same kind. Present both variants with file/line references and ask the user which should be canonical — do not silently pick one.
- Include a Positive Highlights section that acknowledges at least one well-structured aspect of the skill.
- Include a Risks & Assumptions section that states any assumptions made about the intended skill format (e.g., four-section semantics) and notes that no runtime evaluation was performed.
- Format findings with severity levels (🚫 Blocker, 🔴 Major, 🟡 Minor, 🟢 Nit, ⚠️ Inconsistency) and load examples/skill-file-review.md for output structure guidance.
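Two of the mechanical checks in the steps above, the section-order check and the description-score thresholds, can be sketched in Python. This is a minimal illustration and not part of the skill. It assumes sections are delimited by opening tags such as `<knowledge>`, it ignores frontmatter, and the function names `section_order_findings` and `description_score_severity` are hypothetical.

```python
import re
from typing import List, Optional

# Expected order from the steps above; <rules> is optional.
EXPECTED_ORDER = ["when-to-use-this-skill", "knowledge", "capabilities", "rules"]


def section_order_findings(skill_text: str) -> List[str]:
    """Flag top-level sections that appear out of the expected order."""
    positions = {}
    for name in EXPECTED_ORDER:
        match = re.search(rf"<{re.escape(name)}>", skill_text)
        if match:
            positions[name] = match.start()
    # Sections that exist, listed in the order they should appear
    present = [name for name in EXPECTED_ORDER if name in positions]
    # The same sections, listed in the order they actually appear
    actual = sorted(present, key=positions.get)
    if actual != present:
        return [f"🟡 Minor: sections out of order: found {' → '.join(actual)}"]
    return []


def description_score_severity(score: int) -> Optional[str]:
    """Map a description score (x/10) to a severity, or None for no finding."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 5:
        return "🔴 Major"
    if score <= 8:
        return "🟡 Minor"
    return None  # scores of 9 or 10 produce no finding
```

Missing sections are deliberately not reported here; the sketch only compares the relative order of the sections it finds.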
When the user submits a SKILL.md file for review or asks to improve or fix a skill file, use **review-skill-file**.
When the user asks whether a skill will trigger or activate correctly, or whether its description matches its scenarios, use **review-skill-file** and focus on step 2 (trigger correctness).