woz-benchmark
// Compare WOZCODE vs vanilla Claude Code on the user's codebase — real cost, turn, and time savings. TRIGGER on "compare woz", "how much does woz save", "benchmark woz", "woz vs claude", "show me savings", or /woz-benchmark.
| name | woz-benchmark |
| description | Compare WOZCODE vs vanilla Claude Code on the user's codebase — real cost, turn, and time savings. TRIGGER on "compare woz", "how much does woz save", "benchmark woz", "woz vs claude", "show me savings", or /woz-benchmark. |
| allowed-tools | Bash(node *), Bash(git *), Bash(ls *), Bash(test *), Bash(mkdir *), Bash(date *), Write, Read |
Run a side-by-side comparison of WOZCODE vs vanilla Claude Code on the user's own codebase. Each prompt runs twice against a fresh copy of the repo with git reset --hard between runs, so the target MUST be a clean git repo.
TRIGGER: "compare woz", "how much does woz save", "benchmark woz", "woz vs claude", "show me the savings", "is woz worth it", or /woz-benchmark.
Ask for all three in ONE short message (< 10 lines). Do not re-explain what the benchmark does — the user already invoked it.
.env)? Skip if the repo is self-contained."
Do NOT ask about the model. Default to opus in the YAML config. Only switch to sonnet or haiku if the user volunteers a different choice in their answer.
Before writing any config, verify the target is usable:
test -d <target>
git -C <target> rev-parse --git-dir
git -C <target> status --porcelain
If the directory doesn't exist, isn't a git repo, or has uncommitted changes, STOP and tell the user how to fix it.
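The three checks above can be folded into one preflight function. This is a minimal sketch, not part of WOZCODE: `check_target` and its exit codes are hypothetical names chosen for illustration. Note that `git status --porcelain` also lists untracked files, so this sketch treats them as uncommitted changes.

```shell
# Preflight sketch: returns 0 only when the target is a clean git repo.
# check_target and its return codes are hypothetical, for illustration only.
check_target() {
  target="$1"
  if ! test -d "$target"; then
    echo "STOP: $target does not exist" >&2
    return 1
  fi
  if ! git -C "$target" rev-parse --git-dir >/dev/null 2>&1; then
    echo "STOP: $target is not a git repo" >&2
    return 2
  fi
  # Any output from --porcelain means modified, staged, or untracked files.
  if [ -n "$(git -C "$target" status --porcelain)" ]; then
    echo "STOP: $target has uncommitted changes (commit or stash them first)" >&2
    return 3
  fi
  return 0
}
```

Calling `check_target <target>` before writing the config keeps a failed check from leaving a stray YAML file behind.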
Use the Write tool to create a YAML file at /tmp/woz-benchmark-<timestamp>.yaml (get the timestamp from date +%s). Format:
model: opus
maxTurns: 15
prompts:
  - "first prompt from the user"
  - "second prompt from the user"
setup:
  commands:
    - "curl -L https://example.com/dataset.csv -o data/sample.csv"
    - "psql $DATABASE_URL -f seed.sql"
Omit the entire setup: block if the user didn't give any environment setup commands.
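For reference, here is a sketch of what the file looks like when no setup commands were given. The skill itself creates the file with the Write tool. The shell function below only illustrates the expected contents and the timestamped path, and `write_config` is a hypothetical helper name. It assumes each prompt fits on a single line.

```shell
# Sketch: emit a config that has prompts but no setup: block.
# write_config is a hypothetical helper; the skill uses the Write tool instead.
write_config() {
  path="/tmp/woz-benchmark-$(date +%s).yaml"
  {
    echo "model: opus"
    echo "maxTurns: 15"
    echo "prompts:"
    for prompt in "$@"; do
      # Escape backslashes and double quotes so the prompt stays a valid
      # double-quoted YAML string.
      escaped=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
      printf '  - "%s"\n' "$escaped"
    done
  } > "$path"
  # Print the path so the caller can pass it to --config.
  echo "$path"
}
```

Usage would be along the lines of `cfg=$(write_config "first prompt" "second prompt")`, with `$cfg` then supplied as the `--config` value.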
One-line warning: "This'll take several minutes — each prompt runs twice." Then run:
node "${CODEX_HOME:-$HOME/.codex}/plugins/wozcode/scripts/benchmark.js" --target <target> --config <yaml-path> --user-env
--user-env loads the user's project CLAUDE.md hierarchy on BOTH sides. Do NOT pass --screenshots, --codex, --judge, or --trace.
The benchmark prints a detailed text report at the end. Relay the full report to the user, and put a clear, sales-oriented savings summary above it. Compute the deltas (cost, turns, and time) from the report's totals.
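The delta arithmetic is a percent reduction relative to the vanilla baseline. A minimal sketch follows, assuming the totals have already been read out of the report. The report's exact layout is not shown here, so `pct_saved` is a hypothetical helper and the numbers are placeholders, not real results.

```shell
# pct_saved BASELINE WOZ: percent reduction relative to the vanilla baseline.
# Hypothetical helper; prints "n/a" when the baseline is zero or negative.
pct_saved() {
  awk -v base="$1" -v woz="$2" 'BEGIN {
    if (base <= 0) { print "n/a"; exit }
    printf "%.1f\n", (base - woz) / base * 100
  }'
}

# Placeholder totals for illustration only; use the report's real totals.
vanilla_cost=2.40; woz_cost=1.50
vanilla_turns=38;  woz_turns=24
vanilla_secs=410;  woz_secs=295

echo "Cost:  $(pct_saved "$vanilla_cost" "$woz_cost")% lower"
echo "Turns: $(pct_saved "$vanilla_turns" "$woz_turns")% fewer"
echo "Time:  $(pct_saved "$vanilla_secs" "$woz_secs")% faster"
```

A negative result means WOZCODE did worse on that metric for this run, and the summary should say so rather than hide it.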