Compare two eval runs and report what changed. Reads both runs' events, transcripts, and produced artifacts. Writes a short markdown summary classifying differences as regression, improvement, or neutral.
Generalized recursive iteration loop. Runs parallel sub-agents against a target, scores deterministically, diagnoses instruction gaps, applies fixes, and recurses until the stop condition is met or max depth is reached.
Run an evaluation of a given eval using a specific config
Show evaluation results and comparisons
Test-driven development of a Hono/Bun WebSocket application. Read requirements, read tests, build server, verify, iterate until all tests pass.
Scaffold a new evaluation
| name | handoff |
| description | Write a handoff document so a new chat can continue the work |
| argument-hint | <name> |
Write a focused handoff document that a fresh Claude session can execute without needing this conversation's context.
$1: short name for the handoff file (e.g. token-efficiency-runs). If omitted, derive from the current task.
Summarize what was done in this session: key decisions, files created/modified, what's working.
Define what's left for the next chat: specific tasks, in what order, with exact commands where possible.
State constraints: rules the next chat must follow (e.g. "one agent at a time", "don't modify configs", "run in background").
Write the handoff to docs/handoff/<name>.md:
# <Title>
## What Was Done
<Brief summary of completed work, key files, decisions made>
## Task
<What the next chat should do, step by step>
## Commands
<Exact commands to run, in order>
## Constraints
<Rules and boundaries>
Start the new chat with: "read docs/handoff/<name>.md and execute it"
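The steps above can be sketched in Python as a rough illustration. It assumes the four template sections are supplied as plain strings; `slugify`, `render_handoff`, and `write_handoff` are hypothetical helper names and not part of the command itself. Only the `docs/handoff/<name>.md` path and the section headings come from the text above.

```python
import re
from pathlib import Path

# Section headings taken from the handoff template, in order.
SECTIONS = ("What Was Done", "Task", "Commands", "Constraints")


def slugify(text: str) -> str:
    """Hypothetical helper: turn a name or task description into a file-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "handoff"  # fall back to a generic name if nothing usable remains


def render_handoff(title: str, sections: dict[str, str]) -> str:
    """Render the handoff markdown; raises KeyError if a section is missing."""
    parts = [f"# {title}"]
    for heading in SECTIONS:
        parts.append(f"## {heading}\n{sections[heading].strip()}")
    return "\n\n".join(parts) + "\n"


def write_handoff(name: str, title: str, sections: dict[str, str],
                  root: Path = Path(".")) -> Path:
    """Write the document to <root>/docs/handoff/<name>.md and return its path."""
    path = root / "docs" / "handoff" / f"{slugify(name)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_handoff(title, sections), encoding="utf-8")
    return path
```

Requiring every section (a missing one raises `KeyError`) is a deliberate choice in this sketch: a handoff without a Task or Constraints section would leave the next chat guessing.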