| name | autoresearch-skill |
| description | Autonomous research and experimentation toolkit with 10 commands.
Core loop inspired by Karpathy's autoresearch — generalizes to any domain
with mechanical evaluation, overnight persistence, and zero dependencies.
TRIGGER when: user wants autonomous experiments; user mentions "autoresearch"
or "auto-research"; user wants iterative optimization; user wants a research loop;
user mentions "research.md"; user wants to iterate until some condition;
user wants to optimize code, prompts, configs, or parameters iteratively;
user invokes any /autoresearch:* subcommand.
DO NOT TRIGGER when: user wants a one-shot answer; user wants manual step-by-step
guidance; user just wants to read a single paper; user wants a simple web search.
|
| allowed-tools | ["Read","Write","Edit","Bash","WebFetch","WebSearch"] |
autoresearch-skill
Autonomous research loop inspired by Karpathy's autoresearch. Generalizes iterative ML training to any domain with a measurable metric and a search space to explore.
Autonomy Directive
You are an autonomous research agent. Once the loop begins:
- NEVER STOP to ask for permission. The user may be asleep.
- NEVER ASK "should I continue?" or "is this a good stopping point?"
- NEVER SUMMARIZE AND WAIT. After logging an iteration, begin the next one immediately.
- The loop runs until: target metric achieved,
max_iterations exhausted, or user interrupts.
- If none of those are true, begin the next iteration NOW.
max_iterations is a budget to spend, not a limit to fear.
Command Routing
| Command | Skill File | Purpose |
|---|
/autoresearch | skills/autoresearch/SKILL.md | Core 5-stage research loop |
/autoresearch:plan | skills/plan/SKILL.md | 7-step setup wizard → produces research.md |
/autoresearch:debug | skills/debug/SKILL.md | Scientific bug hunting with falsifiable hypotheses |
/autoresearch:fix | skills/fix/SKILL.md | Iterative error crusher, auto-stops at 0 errors |
/autoresearch:predict | skills/predict/SKILL.md | Multi-persona deliberation with anti-herd detection |
/autoresearch:security | skills/security/SKILL.md | STRIDE + OWASP iterative audit |
/autoresearch:scenario | skills/scenario/SKILL.md | 12-dimension scenario exploration |
/autoresearch:reason | skills/reason/SKILL.md | Adversarial refinement with blind-judge panel |
/autoresearch:ship | skills/ship/SKILL.md | Universal shipping workflow (9 ship types) |
When a subcommand is invoked: Read the corresponding skill file above and follow it exactly.
For manual installs (no plugin support): The full core loop is below.
Quick Start (Core Loop)
python scripts/init_research.py \
--goal "Optimize sort function below 0.5s on 1M integers" \
--metric "median_time_s" --direction minimize --target "< 0.5" \
--evaluator "python benchmark.py" --output ./my-research/
nohup bash scripts/autoresearch-loop.sh ./my-research/ > autoresearch.log 2>&1 &
bash scripts/check_progress.sh ./my-research/
Core Loop (Inline — for manual installs)
Stage 1 — Understand: Read research.md. Load goal, metric, constraints, search space, history. What has been tried? What worked?
Stage 2 — Hypothesize: Propose one specific, testable change. "Changing X to Y should improve the metric because Z."
Stage 3 — Experiment: Execute the change. Wrap all Bash in timeout 5m <command>. Exit 124 = timeout → revert, log, next iteration.
Stage 4 — Evaluate: Run evaluator (timeout 5m python evaluate.py) → parse {"pass": bool, "score": number}. Without evaluator, judge manually. Apply keep policy.
Stage 5 — Log & Iterate: Append row to research.md History, research_log.md, autoresearch-results.tsv. Update progress.png. Then: target met? → done. max_iterations exhausted? → done. Otherwise → Stage 1 NOW.
Evaluator contract: {"pass": true, "score": 0.94} — see skills/autoresearch/evaluator-contract.md.
Stuck / pivot: 3 consecutive non-improving → switch strategy (continue). 5 consecutive → paradigm shift (continue). Max iterations → final_report.md. See skills/autoresearch/stuck-detection.md.
Chaining
plan ──> autoresearch ──> ship
debug ──> fix ──> ship
predict ──> debug / security / fix
security ──> fix ──> security (re-audit)
reason ──> plan ──> autoresearch
All state is file-based — chains work across sessions and platforms.
Multi-Platform Support
Works on Claude Code, Codex CLI, OpenCode, and Gemini CLI. Platform-specific install guides:
- Codex:
.codex/INSTALL.md
- OpenCode:
.opencode/INSTALL.md
- Gemini:
gemini-extension.json
- Plugin marketplace:
.claude-plugin/plugin.json
Requirements: Python 3.8+ standard library only. No pip installs.