name	autoresearch
description	Autonomous codebase improvement loop inspired by Karpathy's autoresearch. USE WHEN user wants to iteratively improve a codebase, run autonomous code improvement, or apply the autoresearch pattern. Individual commands use /autoresearch directly.

Autoresearch

An autonomous loop that improves a codebase through repeated improve, evaluate, and decide cycles, keeping only the changes that raise a measured score.

Quick Start

/autoresearch                              # Interactive discovery mode
/autoresearch src/ --profile quality       # Quality-focused on src/
/autoresearch --profile coverage           # Maximize test coverage
/autoresearch --profile performance        # Optimize performance
/autoresearch --resume                     # Resume a previous run
/autoresearch --dry-run                    # Preview what would be evaluated

How It Works

Autoresearch runs a tight loop inspired by Karpathy's autoresearch pattern:

┌─ DISCOVER ──────────────────────────────┐
│ Analyze codebase → propose constraints  │
│ → interview user via AskUserQuestion    │
│ → lock evaluation commands              │
└─────────────────────────────────────────┘
         ↓
┌─ BASELINE ──────────────────────────────┐
│ Create git branch → run all evaluators  │
│ → capture baseline scores               │
└─────────────────────────────────────────┘
         ↓
┌─ LOOP (until convergence) ──────────────┐
│ Improve → Evaluate → Decide → Track     │
│                                         │
│ Keep if score improves (git commit)     │
│ Revert if score regresses (git reset)   │
│ Stop on diminishing returns             │
└─────────────────────────────────────────┘
         ↓
┌─ REPORT ────────────────────────────────┐
│ Full LLM evaluation → learning report   │
│ → improvement table → convergence data  │
└─────────────────────────────────────────┘
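The LOOP stage above can be sketched roughly as follows. This is a minimal illustration, not the actual `src/loop.ts`: the names `runLoop`, `LoopOptions`, `minGain`, and `patience` are hypothetical, the stopping rule is one assumed reading of "diminishing returns", and a tied score is assumed to be reverted. Real evaluator runs and git operations are replaced by a callback and comments.

```typescript
// Hypothetical sketch of the keep/revert loop; not the actual src/loop.ts.
type Decision = "keep" | "revert";

interface IterationRecord {
  iteration: number;
  score: number;
  decision: Decision;
}

interface LoopOptions {
  maxIterations: number; // mirrors --max-iterations
  minGain: number;       // assumed: smallest gain that still counts as progress
  patience: number;      // assumed: low-gain iterations tolerated before stopping
}

function runLoop(
  baseline: number,
  improveAndEvaluate: (iteration: number) => number,
  opts: LoopOptions,
): { best: number; history: IterationRecord[] } {
  let best = baseline;
  let lowGainStreak = 0;
  const history: IterationRecord[] = [];

  for (let i = 1; i <= opts.maxIterations; i++) {
    const score = improveAndEvaluate(i);
    const gain = score - best;
    // Assumption: a tie counts as "no improvement" and is reverted.
    const decision: Decision = gain > 0 ? "keep" : "revert";
    if (decision === "keep") best = score; // the real tool would git commit here
    history.push({ iteration: i, score, decision });

    // Stop once several consecutive iterations fail to gain at least minGain.
    lowGainStreak = gain >= opts.minGain ? 0 : lowGainStreak + 1;
    if (lowGainStreak >= opts.patience) break;
  }
  return { best, history };
}

// Usage with canned scores standing in for real evaluator runs.
const scores = [0.6, 0.55, 0.7, 0.7, 0.7, 0.7];
const result = runLoop(0.5, (i) => scores[i - 1] ?? 0, {
  maxIterations: 20,
  minGain: 0.01,
  patience: 2,
});
console.log(result.best, result.history.map((h) => h.decision).join(","));
```

Tracking the best score separately from the latest score means a reverted iteration never lowers the bar that the next candidate has to clear.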

Arguments

Argument	Description	Default
`[scope]`	File or directory path(s) to improve	auto-discover
`--profile <name>`	Preset: `quality`, `performance`, `coverage`	interactive
`--max-iterations <n>`	Override max iterations	20
`--time-box <seconds>`	Override per-iteration time box	120
`--resume`	Resume from `.autoresearch/state.json`	—
`--dry-run`	Discovery only, no loop	—
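As an illustration of how the arguments and defaults in the table might map onto an options object, here is a small sketch. `CliOptions`, `parseArgs`, and `parsePositiveInt` are hypothetical names, and the error messages are invented for the example; only the flags and default values come from the table.

```typescript
// Hypothetical argument parser; only the flags and defaults come from the table.
type Profile = "quality" | "performance" | "coverage";

interface CliOptions {
  scope: string[];        // empty means auto-discover
  profile?: Profile;      // undefined means interactive discovery
  maxIterations: number;
  timeBoxSeconds: number;
  resume: boolean;
  dryRun: boolean;
}

const PROFILE_NAMES: readonly string[] = ["quality", "performance", "coverage"];

function parsePositiveInt(raw: string | undefined, flag: string): number {
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${flag} expects a positive integer`);
  }
  return n;
}

function parseArgs(argv: string[]): CliOptions {
  const opts: CliOptions = {
    scope: [],
    maxIterations: 20,
    timeBoxSeconds: 120,
    resume: false,
    dryRun: false,
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i] ?? "";
    switch (arg) {
      case "--profile": {
        const value = argv[++i] ?? "";
        if (!PROFILE_NAMES.includes(value)) throw new Error(`unknown profile: ${value}`);
        opts.profile = value as Profile;
        break;
      }
      case "--max-iterations":
        opts.maxIterations = parsePositiveInt(argv[++i], arg);
        break;
      case "--time-box":
        opts.timeBoxSeconds = parsePositiveInt(argv[++i], arg);
        break;
      case "--resume":
        opts.resume = true;
        break;
      case "--dry-run":
        opts.dryRun = true;
        break;
      default:
        if (arg.startsWith("--")) throw new Error(`unknown flag: ${arg}`);
        opts.scope.push(arg); // anything else is treated as a scope path
    }
  }
  return opts;
}

const parsed = parseArgs(["src/", "--profile", "quality", "--max-iterations", "5"]);
console.log(parsed);
```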

Preset Profiles

Profile	Focus	Evaluators	Time Box
`quality`	Code quality, type safety, naming	lint 25%, types 20%, tests 25%, LLM 30%	120s
`performance`	Bundle size, algorithms, hot paths	lint 15%, tests 20%, benchmark 35%, LLM 30%	180s
`coverage`	Test coverage, edge cases	coverage 35%, tests 25%, lint 10%, LLM 30%	150s
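To make the evaluator percentages concrete, the sketch below encodes the three presets as weight maps and computes a plain weighted sum. It assumes every axis score is normalized to the range 0 to 1; `PROFILE_WEIGHTS` and `weightedScore` are hypothetical names, and the sample scores are made up for illustration.

```typescript
// Hypothetical encoding of the preset weights; axis scores are assumed to be in [0, 1].
type Weights = Record<string, number>;

const PROFILE_WEIGHTS = {
  quality: { lint: 0.25, types: 0.2, tests: 0.25, llm: 0.3 },
  performance: { lint: 0.15, tests: 0.2, benchmark: 0.35, llm: 0.3 },
  coverage: { coverage: 0.35, tests: 0.25, lint: 0.1, llm: 0.3 },
};

function weightedScore(weights: Weights, scores: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const [axis, weight] of Object.entries(weights)) {
    const score = scores[axis];
    if (score === undefined) throw new Error(`missing score for axis: ${axis}`);
    total += weight * score;
    weightSum += weight;
  }
  // Guard against a preset whose percentages do not add up to 100%.
  if (Math.abs(weightSum - 1) > 1e-9) throw new Error("weights must sum to 1");
  return total;
}

// Illustrative scores only: 0.25*0.8 + 0.2*1.0 + 0.25*0.9 + 0.3*0.7
const qualityScore = weightedScore(PROFILE_WEIGHTS.quality, {
  lint: 0.8,
  types: 1.0,
  tests: 0.9,
  llm: 0.7,
});
console.log(qualityScore.toFixed(3));
```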

Evaluation Axes

Static Analysis — Lint warnings, type errors, complexity scores
Test Suite — Pass rate, coverage percentage
LLM Rubric — Readability, architecture, maintainability, idiomaticness (full or lite probe)
Custom Commands — User-defined evaluation scripts

Each axis maps to ISO/IEC 25010 quality characteristics, with a documented rationale for its weight and a pre-computed orthogonality analysis.

Production-Ready Features

Pre-flight permissions — All Bash, Write, and git permissions requested upfront. Loop runs uninterrupted.
Phase-adaptive scoring — Arithmetic mean early (broad improvement), harmonic mean late (enforce balance)
Adaptive LLM scheduling — Full eval when volatile, lite 1-dimension probe when stable. 60-75% token savings.
Fallback evaluators — If a permission is denied, the axis auto-substitutes with an LLM-based fallback.
Token economics — Per-phase breakdown, cost estimation, tokens-per-improvement-point efficiency ratio
Confidence intervals — LLM scores reported with 95% CI from rubric dimension variance
Trajectory prediction — Diminishing returns curve fit, predicted quality ceiling, optimal stop point
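The phase-adaptive scoring idea can be illustrated with a short sketch. It is unweighted for brevity and assumes the switch from arithmetic to harmonic mean happens at a fixed fraction of the iteration budget; `compositeScore` and `switchFraction` are hypothetical, and the actual rule in `src/scoring.ts` may differ.

```typescript
// Hypothetical sketch of phase-adaptive scoring; the switch rule is an assumption.
function arithmeticMean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function harmonicMean(xs: number[]): number {
  // A zero (or negative) axis score drives the harmonic mean to zero.
  if (xs.some((x) => x <= 0)) return 0;
  return xs.length / xs.reduce((sum, x) => sum + 1 / x, 0);
}

function compositeScore(
  axisScores: number[],
  iteration: number,
  maxIterations: number,
  switchFraction = 0.5, // assumed: switch to the harmonic mean halfway through
): number {
  if (axisScores.length === 0) throw new Error("need at least one axis score");
  const latePhase = iteration / maxIterations >= switchFraction;
  return latePhase ? harmonicMean(axisScores) : arithmeticMean(axisScores);
}

// Two strong axes and one weak one, scored early and late in a 20-iteration run.
const unbalanced = [0.9, 0.9, 0.3];
const earlyScore = compositeScore(unbalanced, 2, 20);
const lateScore = compositeScore(unbalanced, 15, 20);
console.log(earlyScore.toFixed(2), lateScore.toFixed(2));
```

For the same axis scores, the harmonic mean is noticeably lower than the arithmetic mean (about 0.54 versus 0.70 here), so in the late phase a single weak axis holds the composite down until it is fixed.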

Safety Guarantees

Git branch isolation (never touches main)
Command sandboxing (SHA-256 hash verification)
Scope enforcement (writes only within scope)
Circuit breaker (stops on >10% regression)
Non-destructive git (never force-push or delete)
Permission scope minimization (least-privilege manifest)
No mid-loop permission escalation
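Two of the guarantees above lend themselves to a short sketch: hash-based command locking and the regression circuit breaker. The code uses Node's built-in `node:crypto` module; `lockCommands`, `isLockedCommand`, and `shouldTrip` are hypothetical names, and measuring the 10% regression relative to the best score so far is an assumption, since the text does not say what the regression is measured against.

```typescript
import { createHash } from "node:crypto";

// Hash a command string exactly as written; any edit changes the digest.
function hashCommand(command: string): string {
  return createHash("sha256").update(command, "utf8").digest("hex");
}

// Hypothetical: record the digests of the commands approved during discovery.
function lockCommands(commands: string[]): ReadonlySet<string> {
  return new Set(commands.map(hashCommand));
}

// Only commands whose digest matches a locked one may run inside the loop.
function isLockedCommand(command: string, locked: ReadonlySet<string>): boolean {
  return locked.has(hashCommand(command));
}

// Assumption: regression is measured relative to the best score seen so far.
function shouldTrip(bestScore: number, currentScore: number, threshold = 0.1): boolean {
  if (bestScore <= 0) return false;
  return (bestScore - currentScore) / bestScore > threshold;
}

const locked = lockCommands(["npm test", "npx tsc --noEmit"]);
console.log(isLockedCommand("npm test", locked));               // approved command
console.log(isLockedCommand("npm test && rm -rf .git", locked)); // tampered command
console.log(shouldTrip(0.8, 0.71), shouldTrip(0.8, 0.75));
```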

Output

.autoresearch/state.json — Loop state for resume (includes token breakdown, volatility, eval decisions)
.autoresearch/report.md — Full report with token dashboard, confidence intervals, trajectory analysis, learning summary
Git branch autoresearch/<timestamp>-<scope> with per-iteration commits
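The actual schema of `.autoresearch/state.json` is not given here, so the following is only a guess at what a resumable state might contain, based on the fields mentioned above. Every field name in `LoopState` is hypothetical, as is the sample branch name; the sketch shows a serialize and validate round trip without touching the file system.

```typescript
// Hypothetical state shape; the real state.json schema is not documented here.
type EvalMode = "full" | "lite";

interface LoopState {
  branch: string;                        // e.g. autoresearch/<timestamp>-<scope>
  iteration: number;
  bestScore: number;
  tokensByPhase: Record<string, number>; // token breakdown per phase
  volatility: number;
  evalDecisions: EvalMode[];             // full vs. lite LLM eval per iteration
}

function serializeState(state: LoopState): string {
  return JSON.stringify(state, null, 2);
}

function parseState(json: string): LoopState {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("state must be a JSON object");
  }
  const s = raw as Partial<LoopState>;
  if (typeof s.branch !== "string" || !s.branch.startsWith("autoresearch/")) {
    throw new Error("state.branch must name an autoresearch/ branch");
  }
  if (typeof s.iteration !== "number" || !Number.isInteger(s.iteration) || s.iteration < 0) {
    throw new Error("state.iteration must be a non-negative integer");
  }
  if (typeof s.bestScore !== "number" || typeof s.volatility !== "number") {
    throw new Error("state.bestScore and state.volatility must be numbers");
  }
  if (typeof s.tokensByPhase !== "object" || s.tokensByPhase === null) {
    throw new Error("state.tokensByPhase must be an object");
  }
  if (!Array.isArray(s.evalDecisions) || !s.evalDecisions.every((d) => d === "full" || d === "lite")) {
    throw new Error("state.evalDecisions must contain only 'full' or 'lite'");
  }
  return {
    branch: s.branch,
    iteration: s.iteration,
    bestScore: s.bestScore,
    tokensByPhase: s.tokensByPhase,
    volatility: s.volatility,
    evalDecisions: s.evalDecisions,
  };
}

const saved = serializeState({
  branch: "autoresearch/20250101-src", // made-up branch name for illustration
  iteration: 7,
  bestScore: 0.82,
  tokensByPhase: { discover: 1200, baseline: 800, loop: 15000 },
  volatility: 0.04,
  evalDecisions: ["full", "lite", "lite"],
});
const restored = parseState(saved);
console.log(restored.iteration, restored.bestScore);
```

Validating the file before resuming keeps a truncated or hand-edited state from silently restarting the loop on the wrong branch.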

Reference Implementation

The TypeScript modules in src/ provide structured reference implementations:

Module	Purpose
`src/types.ts`	Type definitions and defaults
`src/loop.ts`	Core loop state machine
`src/discovery.ts`	Codebase introspection + constraint pipeline
`src/report.ts`	Summary report generation
`src/permissions.ts`	Permission manifest + pre-flight verification
`src/scoring.ts`	Phase-adaptive composite scoring (arithmetic/harmonic/geometric)
`src/analytics.ts`	Token dashboard, confidence intervals, trajectory prediction
`src/scheduling.ts`	Adaptive LLM eval scheduling + volatility detection
`src/evaluators/`	Static, test, LLM, custom, and fallback evaluators
