recipe-eval-prompt

Name: Recipe Eval Prompt
Author: shinpr

// Compares original and optimized prompts by parallel execution in git worktrees. Use when evaluating prompt improvement effects or learning prompt engineering through concrete examples.

stars:10

forks:0

updated:March 27, 2026 at 07:35

SKILL.md

name: recipe-eval-prompt
description: Compares original and optimized prompts by parallel execution in git worktrees. Use when evaluating prompt improvement effects or learning prompt engineering through concrete examples.
disable-model-invocation: true

Prompt Evaluation

Orchestrator Definition

Purpose: Provide accurate feedback on prompt optimization effects, enabling users to learn effective prompting through concrete comparison results.

Core Identity: "I route information between specialized agents. I pass user input to analyzers. I present agent outputs to users."

Pass-through Principle: User requests flow directly to agents. Agent outputs flow directly to users. Both prompts execute under identical conditions.

Execution Protocol:

Delegate all work to sub-agents (orchestrator role only)
Register all steps via TaskCreate before starting, update status via TaskUpdate upon completion

Phase Boundaries

No user confirmation required between phases unless explicitly requested. Each phase must complete all required outputs before proceeding.

Input

The user provides a natural language request. Pass it directly to prompt-analyzer.

Exception: If the request lacks any identifiable target (no file, function, or scope mentioned at all), ask ONE question to establish scope, then pass through.

Extended timeout: If the user indicates that the task needs more time, raise the timeout to at most 1800 seconds (default: 300 seconds).
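
The timeout rule can be illustrated with a small sketch. The two limits come from the text above; the trigger phrases and the function name `resolve_timeout` are assumptions made for illustration, since the skill only says the user "mentions needing more time" and leaves detection to the orchestrator.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 300   # default stated in the skill
MAX_TIMEOUT_S = 1800      # upper bound stated in the skill

# Hypothetical trigger phrases; the skill does not define a fixed list.
MORE_TIME_HINTS = ("take a while", "more time", "long-running")


def resolve_timeout(request: str, requested_seconds: Optional[int] = None) -> int:
    """Return the timeout in seconds for one evaluation run."""
    wants_more_time = any(hint in request.lower() for hint in MORE_TIME_HINTS)
    if not wants_more_time:
        return DEFAULT_TIMEOUT_S
    if requested_seconds is None:
        return MAX_TIMEOUT_S
    # Clamp an explicit value into the documented range.
    return max(DEFAULT_TIMEOUT_S, min(requested_seconds, MAX_TIMEOUT_S))
```

A request without any hint keeps the 300-second default, while one ending in "This may take a while." would be allowed the full 1800 seconds under these assumptions.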

Execution Flow

Task Registration: Register every execution step via TaskCreate before starting, then work through the steps in order.

Step 1. Run Required Skills

Run worktree-execution skill.

Step 2. Prompt Analysis and Optimization

Invoke: prompt-analyzer agent

Input:

User's exact request text

Output:

Analysis results (detected patterns)
Optimized prompt
Applied optimizations list

Quality Gate:

Input contains user's request text only
Output presented to user matches agent's output

Step 3. Execution Environment Setup

Execute environment setup per worktree-execution skill "Creation" section.
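
The "Creation" section of the worktree-execution skill is not reproduced here, so the following is only a minimal sketch of what creating the two isolated environments could look like, assuming plain `git worktree add --detach`. The function names, the `root` directory, and the `run_id` naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def worktree_add_command(path: Path, base_ref: str = "HEAD") -> list[str]:
    """Build `git worktree add --detach <path> <ref>` for one isolated checkout."""
    return ["git", "worktree", "add", "--detach", str(path), base_ref]


def create_worktrees(root: Path, run_id: str, base_ref: str = "HEAD") -> dict[str, Path]:
    """Create one worktree per prompt variant, both from the same base ref."""
    paths = {
        "original": root / f"{run_id}-original",
        "optimized": root / f"{run_id}-optimized",
    }
    for path in paths.values():
        # check=True raises CalledProcessError if git reports a failure.
        subprocess.run(worktree_add_command(path, base_ref), check=True)
    return paths
```

Checking out both worktrees from the same ref is what keeps the comparison fair: each prompt starts from an identical copy of the repository.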

Step 4. Parallel Execution

Invoke: Two prompt-executor agents simultaneously (single message, parallel Task calls)

Subagent 1:
  agent: prompt-executor
  working_directory: {worktree_original_path}
  prompt: {original_request}

Subagent 2:
  agent: prompt-executor
  working_directory: {worktree_optimized_path}
  prompt: {optimized_request}

Each subagent executes the prompt as a development task within its isolated worktree.

CRITICAL: Both Task tool calls MUST be in the same message to achieve true parallel execution.
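
In Claude Code the parallelism comes from issuing both Task calls in one message, which cannot be shown in ordinary code. As an analogy only, the sketch below submits both runs before waiting on either one. The `execute` callable is a hypothetical stand-in for launching a prompt-executor subagent with a working directory and a prompt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-in: (working_directory, prompt) -> execution result.
Executor = Callable[[str, str], str]


def run_both(
    execute: Executor,
    original: tuple[str, str],
    optimized: tuple[str, str],
) -> dict[str, str]:
    """Submit both runs before waiting on either, so they overlap in time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            "original": pool.submit(execute, *original),
            "optimized": pool.submit(execute, *optimized),
        }
        # result() blocks only after both tasks have been submitted.
        return {name: future.result() for name, future in futures.items()}
```

Submitting the two tasks one after the other and waiting in between would serialize them, which is the situation the single-message rule is meant to prevent.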

Step 5. Environment Cleanup

Execute worktree cleanup per worktree-execution skill "Cleanup" section.
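
The skill's own cleanup scripts are not shown here. A minimal sketch of an equivalent cleanup, assuming the worktrees are simply discarded, could build the following git commands. The helper name `cleanup_commands` is hypothetical. Note that `git worktree remove` is newer than `git worktree add`, so older Git versions may need the directories deleted manually before `git worktree prune`.

```python
from pathlib import Path


def cleanup_commands(paths: list[Path]) -> list[list[str]]:
    """Build the git commands that discard the given worktrees."""
    # --force also removes worktrees that contain uncommitted changes.
    commands = [["git", "worktree", "remove", "--force", str(p)] for p in paths]
    # prune drops stale administrative entries left by deleted worktrees.
    commands.append(["git", "worktree", "prune"])
    return commands
```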

Step 6. Report Generation

Invoke: report-generator agent

Input:

Original and optimized prompts
Execution results from both subagents
Applied optimizations list

Output:

Comparison report (markdown)
Improvement classification (structural / context addition / expressive / variance)

Quality Gate:

Output presented to user matches agent's output

Step 7. Retrospective

Trigger: Report generation completes

Action: Ask user for feedback on comparison results, then delegate to knowledge-optimizer agent

Improvement Classification

Apply the execution quality criteria from the prompt-optimization skill.

| Classification | Definition | Interpretation |
| --- | --- | --- |
| Structural | Prompt structure, clarity, specificity improvements | Prompt writing technique |
| Context Addition | Project-specific information added from codebase investigation | Information advantage |
| Expressive | Different phrasing, equivalent substance | Neutral |
| Variance | Within LLM probabilistic variance | Original prompt sufficient |

Key Principle: Distinguish between prompt writing improvements (Structural) and information additions (Context Addition).
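
The four classifications and their interpretations can be captured as a small lookup. This is an illustrative sketch, not the report-generator's actual data model; the names `Improvement`, `INTERPRETATION`, and `parse_improvement` are assumptions.

```python
from enum import Enum


class Improvement(Enum):
    STRUCTURAL = "structural"
    CONTEXT_ADDITION = "context addition"
    EXPRESSIVE = "expressive"
    VARIANCE = "variance"


# Interpretation column of the classification table, keyed by classification.
INTERPRETATION = {
    Improvement.STRUCTURAL: "Prompt writing technique",
    Improvement.CONTEXT_ADDITION: "Information advantage",
    Improvement.EXPRESSIVE: "Neutral",
    Improvement.VARIANCE: "Original prompt sufficient",
}


def parse_improvement(label: str) -> Improvement:
    """Map a report label such as 'Context Addition' to its enum member."""
    return Improvement(label.strip().lower())
```

Keeping Structural and Context Addition as separate members mirrors the key principle: a gain from better prompt writing is not the same as a gain from extra project information.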

Final Output to User

Present report-generator's complete output to the user. The optimized prompt must appear in full, because it is the core learning value of the report.

The report includes (defined in report-generator):

Input Prompts (original and optimized full text)
Optimizations Applied
Execution Results
Comparison Analysis
Learning Points
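
One way to check the two requirements above (all five sections present, optimized prompt included in full) is a simple text check on the report. This is a sketch under the assumption that the section titles appear verbatim in the markdown; the function name `report_problems` is hypothetical.

```python
REQUIRED_SECTIONS = (
    "Input Prompts",
    "Optimizations Applied",
    "Execution Results",
    "Comparison Analysis",
    "Learning Points",
)


def report_problems(report: str, optimized_prompt: str) -> list[str]:
    """List what is missing from a report; an empty list means it passes."""
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in report
    ]
    # The optimized prompt must be present verbatim, not summarized.
    if optimized_prompt not in report:
        problems.append("optimized prompt is not included in full")
    return problems
```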

Error Handling

| Scenario | Behavior |
| --- | --- |
| One subagent fails | Continue with successful result, report as "partial" |
| Both subagents fail | Report full failure with diagnostics |
| Timeout | Terminate, capture partial results, clean up |
| Worktree creation fails | Report git error, suggest checking repository state |
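
The first two scenarios reduce to counting how many subagents produced a result. The sketch below shows that reduction; only the label "partial" comes from the skill, while "complete" and "failed" are hypothetical labels chosen for illustration. Timeout and worktree failures are not modeled here.

```python
from typing import Optional


def overall_status(original: Optional[str], optimized: Optional[str]) -> str:
    """Summarize a run; None means that subagent produced no result."""
    succeeded = sum(result is not None for result in (original, optimized))
    if succeeded == 2:
        return "complete"  # hypothetical label
    if succeeded == 1:
        return "partial"   # label used in the skill
    return "failed"        # hypothetical label
```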

Prerequisites

Git repository (git 2.5+ for worktree support)
Claude Code subagent execution permissions
Sufficient disk space for worktree copies

Usage Examples

/recipe-eval-prompt
Add error handling to generateResponse in geminiService.ts. Handle 429, timeout, and invalid responses.

/recipe-eval-prompt
Generate code following this skill: .claude/skills/my-skill/SKILL.md

For complex tasks:

/recipe-eval-prompt
Refactor the message pipeline for readability. This may take a while.

related-skills.json

same repository

prompt-optimization.md

from "shinpr/rashomon"

Analyzes and optimizes prompts using BP-001~008 patterns and 3-step flow (detect, optimize, balance). Use when "optimize this prompt", "review prompt quality", "analyze prompt issues", or creating/reviewing rashomon skill content.

2026-05-18 10

knowledge-base.md

from "shinpr/rashomon"

Project-specific prompt optimization knowledge management. Use when storing or retrieving learned patterns from comparisons. Provides schema, extraction criteria, capacity management, and retention scoring.

2026-05-18 10

worktree-execution.md

from "shinpr/rashomon"

Git worktree management for isolated parallel prompt execution. Use when creating isolated environments for prompt comparison or managing worktree lifecycle. Provides creation, cleanup, and orphan detection scripts.

2026-05-18 10

recipe-eval-skill.md

from "shinpr/rashomon"

Creates or updates Claude Code skills through interactive dialog, then evaluates effectiveness by parallel execution comparison. Use when creating new skills, updating existing skills, or evaluating skill quality.

2026-03-28 10

package.json

"author": "shinpr"

"repository": "shinpr/rashomon"


