| name | autoresearch |
| description | Autonomous optimization loop inspired by Karpathy's autoresearch pattern. Iteratively improve any measurable codebase metric (coverage, latency, accuracy, test pass rate) by observing misses, hypothesizing fixes, dispatching edits to Cursor/Codex, evaluating, and ratcheting gains via git. Use when: "dev:autoresearch", "autoresearch", "optimize X metric", "iterate until X improves", "hill-climb", "run the loop", "autoresearch on [target]", or when the user wants autonomous iterative improvement of a scalar metric. Part of the orc system: dev:orchestrate, dev:backlog, dev:autoresearch, dev:status, dev:recap, dev:scope, dev:handoff.
|
Orc: Autoresearch
Autonomous hill-climbing for any codebase metric. Opus orchestrates; Cursor/Codex execute; git ratchets gains.
Commands
/autoresearch [description] — Start new session (discovery + loop)
/autoresearch:status — Read var/autoresearch/<slug>/session.json, display iteration, best metric, last hypothesis
/autoresearch:resume — Read session.json, restore state, continue from last iteration. See references/loop-integration.md
Phase 1: Discover (interactive)
Use AskUserQuestion (max 3 questions). Skip when args make the answer obvious.
- Metric: Propose a scalar metric + evaluator command. Present as choice.
- Mutable scope: Which files the loop may edit (propose 1-3). Can include "new files in X dir" for catalog enrichment.
- Guard command: Detect from justfile/package.json/Makefile.
Phase 2: Init (automated)
Run bash scripts/init-session.sh <slug> <branch-name>. The script:
- Creates
autoresearch/<slug> branch
- Creates
var/autoresearch/<slug>/ directory
- Writes
allowed_files.txt
Then:
- Write
program.md from references/program-template.md
- Write
evaluate.sh (must output JSON: {"metric": N, "miss_report": {...}})
- Run evaluator twice — if results differ > 0.1%, warn user (non-deterministic evaluator)
- Save
baseline.json, commit: autoresearch(<slug>): init baseline=<metric>
Phase 3: Loop (autonomous via /loop)
Integrates with /loop for self-pacing. See references/loop-integration.md for session.json schema and resume protocol.
Each /loop tick = one OHEE iteration:
Observe — Read last iteration-<N>.json + git log --oneline -5 + hypotheses.jsonl. Do NOT re-read full mutable files.
Hypothesize — Pick highest-leverage fix from miss report. Check hypotheses.jsonl to avoid repeats. Log hypothesis.
Experiment — Route by size: trivial (<20 LOC) → Claude direct, focused (<100 LOC) → Cursor, complex → Codex. After executor completes, run bash scripts/check-allowed-files.sh <slug> to verify scope.
Evaluate — Run bash scripts/ratchet.sh <slug>. The script runs guard, evaluator, compares, and either commits or resets. Returns JSON with keep/revert decision + new metric.
Convergence (any triggers exit)
- Plateau: Rolling 5-iteration window with net gain < threshold (default 1pp)
- Max iterations: 20 (configurable in program.md)
- Target reached: Metric meets goal
- Exhaustion: Proposed hypothesis already in hypotheses.jsonl
Phase 4: Report
Write var/autoresearch/<slug>/REPORT.md. Display summary: baseline → final, top gains, plateau point.
Offer: merge autoresearch/<slug> branch (squash) or keep as-is.
Hard Rules
- Never edit evaluator or guard during the loop
- Never edit files outside mutable scope (enforced by
check-allowed-files.sh)
- One focused change per iteration
- All work on
autoresearch/<slug> branch — never commit directly to main
- Git commit successes, git reset failures
- Read
hypotheses.jsonl before proposing — no repeat experiments
References
references/architecture.md — Three-file pattern, orchestrator/executor split, miss report design
references/program-template.md — Template for program.md
references/loop-integration.md — /loop self-pacing, session.json schema, resume protocol