recipe-eval-prompt
Compares original and optimized prompts by parallel execution in git worktrees. Use when evaluating prompt improvement effects or learning prompt engineering through concrete examples.
Analyzes and optimizes prompts using BP-001~008 patterns and 3-step flow (detect, optimize, balance). Use when "optimize this prompt", "review prompt quality", "analyze prompt issues", or creating/reviewing rashomon skill content.
Project-specific prompt optimization knowledge management. Use when storing or retrieving learned patterns from comparisons. Provides schema, extraction criteria, capacity management, and retention scoring.
Git worktree management for isolated parallel prompt execution. Use when creating isolated environments for prompt comparison or managing worktree lifecycle. Provides creation, cleanup, and orphan detection scripts.
Creates or updates Claude Code skills through interactive dialog, then evaluates effectiveness by parallel execution comparison. Use when creating new skills, updating existing skills, or evaluating skill quality.
| name | recipe-eval-prompt |
| description | Compares original and optimized prompts by parallel execution in git worktrees. Use when evaluating prompt improvement effects or learning prompt engineering through concrete examples. |
| disable-model-invocation | true |
Purpose: Provide accurate feedback on prompt optimization effects, enabling users to learn effective prompting through concrete comparison results.
Core Identity: "I route information between specialized agents. I pass user input to analyzers. I present agent outputs to users."
Pass-through Principle: User requests flow directly to agents. Agent outputs flow directly to users. Both prompts execute under identical conditions.
Execution Protocol:
No user confirmation required between phases unless explicitly requested. Each phase must complete all required outputs before proceeding.
The user provides a natural language request. Pass it directly to prompt-analyzer.
Exception: If the request lacks any identifiable target (no file, function, or scope mentioned at all), ask ONE question to establish scope, then pass through.
Extended timeout: If the user mentions needing more time, use up to 1800 seconds (default: 300 seconds).
Task Registration: Register execution steps via TaskCreate and proceed systematically.
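The pass-through rules above (forward the request unchanged, ask one scoping question only when no target is identifiable, extend the timeout when asked) can be sketched in TypeScript as a single planning step. This is a minimal sketch, not the recipe's implementation: `planPassThrough`, the "identifiable target" regular expressions, and the "more time" phrases are all hypothetical heuristics. In practice the orchestrating agent makes these judgments itself.

```typescript
// Hypothetical sketch of the pass-through preparation step.
// The regexes below are illustrative heuristics, not part of the recipe.

const DEFAULT_TIMEOUT_SECONDS = 300;
const EXTENDED_TIMEOUT_SECONDS = 1800;

interface PassThroughPlan {
  request: string; // forwarded verbatim to prompt-analyzer
  askScopeQuestion: boolean; // true only when no target is identifiable
  timeoutSeconds: number;
}

// Assumed signals of an identifiable target: a file name, a camelCase
// identifier, or an explicit scope word.
const TARGET_PATTERNS: RegExp[] = [
  /[\w./-]+\.[a-z]{1,4}\b/i, // e.g. geminiService.ts, SKILL.md
  /\b[a-z]+[A-Z]\w*\b/, // e.g. generateResponse
  /\b(module|pipeline|function|class|component|directory|file)\b/i,
];

// Assumed phrases meaning "this needs more time".
const MORE_TIME_PATTERNS: RegExp[] = [/\btake a while\b/i, /\bmore time\b/i, /\blong[- ]running\b/i];

function planPassThrough(request: string): PassThroughPlan {
  const hasTarget = TARGET_PATTERNS.some((re) => re.test(request));
  const wantsMoreTime = MORE_TIME_PATTERNS.some((re) => re.test(request));
  return {
    request, // never rewritten: the pass-through principle
    askScopeQuestion: !hasTarget,
    timeoutSeconds: wantsMoreTime ? EXTENDED_TIMEOUT_SECONDS : DEFAULT_TIMEOUT_SECONDS,
  };
}

// Usage
const plan = planPassThrough("Refactor the message pipeline for readability. This may take a while.");
console.log(plan.askScopeQuestion, plan.timeoutSeconds); // false 1800
```

Against the example invocations at the end of this recipe, this heuristic finds a target in all three requests, and only the "This may take a while" request selects 1800 seconds.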
Run worktree-execution skill.
Invoke: prompt-analyzer agent
Input:
Output:
Quality Gate:
Execute environment setup per worktree-execution skill "Creation" section.
Invoke: Two prompt-executor agents simultaneously (single message, parallel Task calls)
Subagent 1:
  agent: prompt-executor
  working_directory: {worktree_original_path}
  prompt: {original_request}
Subagent 2:
  agent: prompt-executor
  working_directory: {worktree_optimized_path}
  prompt: {optimized_request}
Each subagent executes the prompt as a development task within its isolated worktree.
CRITICAL: Both Task tool calls MUST be in the same message to achieve true parallel execution.
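The parallel step can be pictured with Promises: both runs are started before either is awaited, and both get the same executor and the same timeout, which is what "identical conditions" asks for. A sketch under assumptions: `Executor` is a hypothetical stand-in for a prompt-executor Task call (it is not a Claude Code API), and the timeout here only stops waiting; terminating the real subagent is left to the orchestrator.

```typescript
// Hypothetical stand-in for a prompt-executor Task call (not a real API).
type Executor = (workingDirectory: string, prompt: string) => Promise<string>;

type RunOutcome =
  | { status: "ok"; output: string }
  | { status: "failed"; error: string }
  | { status: "timeout" };

// Never rejects: converts success, failure, and timeout into a RunOutcome.
// Note: the timeout stops waiting; it does not cancel the underlying run.
function runWithTimeout(run: Promise<string>, timeoutSeconds: number): Promise<RunOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<RunOutcome>((resolve) => {
    timer = setTimeout(() => resolve({ status: "timeout" }), timeoutSeconds * 1000);
  });
  const work = run.then(
    (output): RunOutcome => ({ status: "ok", output }),
    (err: unknown): RunOutcome => ({ status: "failed", error: String(err) }),
  );
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

async function executeBoth(
  execute: Executor,
  original: { worktree: string; prompt: string },
  optimized: { worktree: string; prompt: string },
  timeoutSeconds: number,
): Promise<[RunOutcome, RunOutcome]> {
  // Both runs start here, before anything is awaited (the analogue of
  // "both Task calls in the same message"). Same executor, same timeout.
  const first = runWithTimeout(execute(original.worktree, original.prompt), timeoutSeconds);
  const second = runWithTimeout(execute(optimized.worktree, optimized.prompt), timeoutSeconds);
  return Promise.all([first, second]);
}
```

Because each run is wrapped individually, a rejected executor shows up as a `failed` outcome instead of aborting the other run.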
Execute worktree cleanup per worktree-execution skill "Cleanup" section.
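Creation and cleanup bracket the parallel run, so cleanup has to happen even when an executor fails or times out. The sketch below shows that lifecycle with `try`/`finally`, assuming a hypothetical injected `runGit` function and a made-up `<baseDir>/<runId>-original` / `-optimized` layout; only the `git worktree add --detach`, `git worktree remove --force`, and `git worktree prune` subcommands are real. The worktree-execution skill's own scripts may work differently.

```typescript
// Hypothetical git runner, injected so the lifecycle can be exercised without a repo.
type RunGit = (args: string[]) => Promise<void>;

interface ComparisonWorktrees {
  original: string;
  optimized: string;
}

// Assumed directory layout, for illustration only.
function worktreePaths(baseDir: string, runId: string): ComparisonWorktrees {
  return {
    original: `${baseDir}/${runId}-original`,
    optimized: `${baseDir}/${runId}-optimized`,
  };
}

async function withComparisonWorktrees<T>(
  runGit: RunGit,
  baseDir: string,
  runId: string,
  ref: string,
  body: (paths: ComparisonWorktrees) => Promise<T>,
): Promise<T> {
  const paths = worktreePaths(baseDir, runId);
  const created: string[] = [];
  try {
    // Both worktrees are detached checkouts of the same commit, so each
    // executor starts from an identical tree. A failing `add` propagates
    // (the "worktree creation fails" case) after cleaning up what exists.
    for (const path of [paths.original, paths.optimized]) {
      await runGit(["worktree", "add", "--detach", path, ref]);
      created.push(path);
    }
    return await body(paths);
  } finally {
    // Runs on success, failure, and timeout alike; --force discards the
    // executors' uncommitted changes.
    for (const path of created) {
      await runGit(["worktree", "remove", "--force", path]);
    }
    await runGit(["worktree", "prune"]);
  }
}
```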
Invoke: report-generator agent
Input:
Output:
Quality Gate:
Trigger: Report generation completes
Action: Ask user for feedback on comparison results, then delegate to knowledge-optimizer agent
Apply the execution quality criteria from the prompt-optimization skill.
| Classification | Definition | Interpretation |
|---|---|---|
| Structural | Prompt structure, clarity, specificity improvements | Prompt writing technique |
| Context Addition | Project-specific information added from codebase investigation | Information advantage |
| Expressive | Different phrasing, equivalent substance | Neutral |
| Variance | Within LLM probabilistic variance | Original prompt sufficient |
Key Principle: Distinguish between prompt writing improvements (Structural) and information additions (Context Addition).
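The key principle can be made concrete by encoding the table as data and attributing an observed improvement to writing technique, added information, both, or neither. This is an illustrative sketch only: it assumes each difference between the two runs has already been labelled with one classification, and the `Finding` shape and `attribute` function are hypothetical, not part of report-generator.

```typescript
type Classification = "Structural" | "Context Addition" | "Expressive" | "Variance";

// The Interpretation column of the table above.
const INTERPRETATION: Record<Classification, string> = {
  Structural: "Prompt writing technique",
  "Context Addition": "Information advantage",
  Expressive: "Neutral",
  Variance: "Original prompt sufficient",
};

// Hypothetical shape for one labelled difference between the two runs.
interface Finding {
  description: string;
  classification: Classification;
}

type Attribution = "prompt writing" | "added information" | "both" | "no substantive difference";

// Keeps Structural (writing) and Context Addition (information) apart,
// so an information advantage is never credited to prompt technique.
function attribute(findings: Finding[]): Attribution {
  const writing = findings.some((f) => f.classification === "Structural");
  const info = findings.some((f) => f.classification === "Context Addition");
  if (writing && info) return "both";
  if (writing) return "prompt writing";
  if (info) return "added information";
  return "no substantive difference"; // only Expressive and/or Variance findings
}
```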
Present the report-generator's complete output to the user. The optimized prompt must appear in full, since it is the core learning value of the report.
The report includes (defined in report-generator):
| Scenario | Behavior |
|---|---|
| One subagent fails | Continue with successful result, report as "partial" |
| Both subagents fail | Report full failure with diagnostics |
| Timeout | Terminate, capture partial results, cleanup |
| Worktree creation fails | Report git error, suggest checking repository state |
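The subagent failure scenarios reduce to a small mapping from the two executor outcomes to a report status. A sketch with hypothetical `RunOutcome` and `ReportStatus` types; worktree creation failures are assumed to surface earlier, before any executor runs, so they are not modelled here.

```typescript
// Hypothetical outcome of one prompt-executor run.
type RunOutcome =
  | { status: "ok"; output: string }
  | { status: "failed"; error: string }
  | { status: "timeout"; partialOutput?: string };

type ReportStatus =
  | { kind: "complete"; original: string; optimized: string }
  | { kind: "partial"; available: "original" | "optimized"; output: string; diagnostics: string[] }
  | { kind: "failure"; diagnostics: string[] };

// Returns a diagnostic line for a non-ok outcome, or null when the run succeeded.
function diagnose(label: string, outcome: RunOutcome): string | null {
  if (outcome.status === "failed") return `${label}: failed (${outcome.error})`;
  if (outcome.status === "timeout") {
    return `${label}: timed out; partial output: ${outcome.partialOutput ?? "(none)"}`;
  }
  return null; // "ok": nothing to report
}

function toReportStatus(original: RunOutcome, optimized: RunOutcome): ReportStatus {
  const diagnostics = [diagnose("original", original), diagnose("optimized", optimized)].filter(
    (d): d is string => d !== null,
  );
  if (original.status === "ok" && optimized.status === "ok") {
    return { kind: "complete", original: original.output, optimized: optimized.output };
  }
  // One subagent failed: continue with the successful result, report as partial.
  if (original.status === "ok") {
    return { kind: "partial", available: "original", output: original.output, diagnostics };
  }
  if (optimized.status === "ok") {
    return { kind: "partial", available: "optimized", output: optimized.output, diagnostics };
  }
  // Both failed: full failure with diagnostics.
  return { kind: "failure", diagnostics };
}
```

Here a timed-out run is treated like a failed one for status purposes, with any partial output kept in the diagnostics; that grouping is a choice of this sketch rather than something the table prescribes.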
/recipe-eval-prompt
Add error handling to generateResponse in geminiService.ts. Handle 429, timeout, and invalid responses.
/recipe-eval-prompt
Generate code following this skill: .claude/skills/my-skill/SKILL.md
For complex tasks:
/recipe-eval-prompt
Refactor the message pipeline for readability. This may take a while.