| name | review-prompt |
| description | Review LLM prompts against the prompt-engineering skill's gap-calibration principles, reporting issues without modifying files. Use when reviewing prompt quality, auditing a prompt, evaluating a system prompt, or checking whether prompt issues are high-confidence and fixable. |
| user-invocable | true |
Review LLM prompts. Report findings without modifying files — this skill diagnoses; it never edits the prompt.
First: invoke the prompt-engineering skill to load the principles. Evaluate the prompt against them.
Input: if no prompt is given (file path or inline text), ask before analyzing — don't assume.
Report format:
Assessment: {Excellent | Good with Minor Issues | Needs Work}
Strengths — what's working, what the author should preserve.
Issues:
| Issue | Severity | Fix | Tag |
|---|
| {description} | High / Medium / Low | {concrete recommendation} | NEEDS_USER_INPUT or AUTO_FIXABLE |
Priority: the highest-impact change first.
Severity:
- High — the prompt actively misbehaves or breaks a contract. Examples: contradiction between two rules that can't both hold; missing the goal entirely; absolute used on a judgment call that observably misfires; the agent declares a need for a tool it doesn't have, or omits a tool it actually uses.
- Medium — the prompt works but drifts toward known failure modes. Examples: vague directive that produces inconsistent behavior across runs; restated model default adding noise the model has to wade through; missing a gap-closer that the discipline says should be there; arbitrary numbers without a rubric; boundary failures — naming a harness-bound primitive, a rule-scope qualifier that silently excludes valid cases, mechanism stated as the only path, or one principle split across multiple places.
- Low — minor friction with no functional impact. Examples: duplication that doesn't change behavior; awkward phrasing where the meaning is still unambiguous; stylistic-only cleanup.
Tag (parsed as control flow by /auto-optimize-prompt — do not rename or repurpose):
NEEDS_USER_INPUT — only the author can resolve: missing context, unclear intent, ambiguity the model can't infer.
AUTO_FIXABLE — a clear fix exists per the loaded principles.
Only flag high-confidence issues. Low-confidence findings are noise — skip style preferences, minor wording, and unverified hunches.