autoresearch
Autonomous improvement loop: scan codebase metrics, scaffold experiment files, run agent-driven iterations until metric improves
Scan codebase quality metrics, propose improvement loops, and run autonomous agent iterations. Inspired by karpathy/autoresearch, adapted from ML research to code quality.
Concept: The agent proposes a code change, runs the measurement, keeps the change if the metric improved, reverts it with git (`git checkout -- .`) if not, and repeats until manually stopped.
Time: Scan ~30s | Per iteration: depends on scope | Loop: runs indefinitely until you stop it
Default (no arguments): measure the current state, detect existing loops, and propose next actions.
Compute the following metrics and display a prioritized proposal table.
Step 1: Measure codebase metrics
Adapt the grep patterns to your project's conventions. These are TypeScript defaults; adjust them for your stack.
```bash
# M1: Files with function declarations (prefer arrow functions)
M1=$(grep -r "export function " src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
# M2: Files with interface declarations (prefer type aliases)
M2=$(grep -r "export interface " src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
# M3: Files with ESLint disables
M3=$(grep -r "eslint-disable" src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
# M4: Files with type casts to any
M4=$(grep -r " as any" src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
# M5: Files with TODO comments
M5=$(grep -r "// TODO" src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
```
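Because of `-l`, each Step 1 metric counts files that contain the pattern, while measure.sh (see --scaffold) counts matching lines, so a loop's metric can be larger than its "Current" value in the scan. A minimal sketch of both counts as shell functions, assuming literal patterns (hence `-F`, fixed-string matching); `count_files` and `count_hits` are hypothetical names, not part of the command:

```bash
#!/usr/bin/env bash
# Hypothetical helpers; PATTERN is matched as a fixed string (-F), not a regex.

# count_files PATTERN [DIR]: .ts/.tsx files under DIR (default src/) that contain
# PATTERN, i.e. what the Step 1 scan reports because of grep -l.
count_files() {
  { grep -rlF --include='*.ts' --include='*.tsx' -- "$1" "${2:-src/}" 2>/dev/null || true; } |
    wc -l | tr -d ' '
}

# count_hits PATTERN [DIR]: matching lines, i.e. what a grep-count measure.sh reports.
count_hits() {
  { grep -rF --include='*.ts' --include='*.tsx' -- "$1" "${2:-src/}" 2>/dev/null || true; } |
    wc -l | tr -d ' '
}

# Usage: count_files ' as any'; count_hits ' as any'
```

The `|| true` keeps a zero-match `grep` (exit status 1) from failing callers that run with `set -o pipefail`.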
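Step 2: Detect existing loops
```bash
for dir in scripts/autoresearch/loop-*/; do
  [ -d "$dir" ] || continue   # the glob matched nothing
  LOOP_NAME=$(basename "$dir")
  # A loop with results.tsv has run at least once
  if [[ -f "$dir/results.tsv" ]]; then
    ITERS=$(wc -l < "$dir/results.tsv" | tr -d ' ')
    # "Best" depends on direction.txt: smallest value for lower, largest for higher
    SORT_FLAGS=-n
    [[ "$(cat "$dir/direction.txt" 2>/dev/null)" == higher ]] && SORT_FLAGS=-nr
    BEST=$(sort -t$'\t' -k2,2 "$SORT_FLAGS" "$dir/results.tsv" | head -1 | cut -f2)
    echo "ACTIVE:$LOOP_NAME:iterations=$ITERS:best=$BEST"
  else
    echo "SCAFFOLDED:$LOOP_NAME"
  fi
done
```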
Step 3: Display
Autoresearch Scan: {date}
Codebase metrics:
| # | Loop | Metric | Current | Target | Priority | Risk |
|---|------|--------|---------|--------|----------|------|
| A | loop-remove-as-any | `as any` casts | {M4} | 0 | P1 | LOW |
| B | loop-eslint-disable | eslint-disable | {M3} | 0 | P2 | MED |
| C | loop-export-fn | export function | {M1} | 0 | P1 | LOW |
| D | loop-interface-type | export interface | {M2} | 0 | P1 | LOW |
| E | loop-todo-comments | TODO comments | {M5} | 0 | P3 | LOW |
Existing loops: {detected loops or "none yet"}
Recommended next step (P1, LOW risk):
/autoresearch --scaffold loop-remove-as-any
Then write program.md, create a worktree, and run the loop.
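The table template fills its placeholders out of order (row A uses {M4}, row B uses {M3}). A hypothetical `print_scan_table` helper that renders it from the Step 1 counts, with targets, priorities, and risk levels copied from the template above:

```bash
#!/usr/bin/env bash
# print_scan_table M1 M2 M3 M4 M5: hypothetical helper that prints the Step 3
# proposal table; the row order and the {M*} mapping follow the template.
print_scan_table() {
  local m1=$1 m2=$2 m3=$3 m4=$4 m5=$5
  echo "| # | Loop | Metric | Current | Target | Priority | Risk |"
  echo "|---|------|--------|---------|--------|----------|------|"
  printf '| A | loop-remove-as-any | `as any` casts | %s | 0 | P1 | LOW |\n' "$m4"
  printf '| B | loop-eslint-disable | eslint-disable | %s | 0 | P2 | MED |\n' "$m3"
  printf '| C | loop-export-fn | export function | %s | 0 | P1 | LOW |\n' "$m1"
  printf '| D | loop-interface-type | export interface | %s | 0 | P1 | LOW |\n' "$m2"
  printf '| E | loop-todo-comments | TODO comments | %s | 0 | P3 | LOW |\n' "$m5"
}

# Usage (after Step 1): print_scan_table "$M1" "$M2" "$M3" "$M4" "$M5"
```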
--scaffold <loop-name>
Generate the three mechanical files for a loop. This does not generate program.md; write that yourself to encode project-specific constraints.
Create the following files under scripts/autoresearch/{loop-name}/:
measure.sh: the evaluation harness (single metric, returns an integer):
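```bash
#!/usr/bin/env bash
# measure.sh: {loop-name}
# Returns an integer. Direction: lower = better (unless the loop targets coverage/score).
set -euo pipefail
# Replace PATTERN with the loop's pattern. grep exits 1 when nothing matches, which would
# abort the script under pipefail once the metric reaches 0, hence "|| true".
{ grep -r "PATTERN" src/ --include="*.ts" --include="*.tsx" 2>/dev/null || true; } | wc -l | tr -d ' '
```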
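Since both the loop and --status read measure.sh output, it can help to check that a harness prints exactly one integer and exits 0, including when the count is 0: under `set -euo pipefail`, a `grep` that matches nothing exits with status 1. A sketch of such a check; `check_measure` is a hypothetical helper, not part of the command:

```bash
#!/usr/bin/env bash
# check_measure SCRIPT: hypothetical sanity check for an evaluation harness.
# Prints the metric on success; returns 1 if SCRIPT fails or prints a non-integer.
check_measure() {
  local out
  if ! out=$(bash "$1" 2>/dev/null); then
    echo "FAIL: $1 exited non-zero" >&2
    return 1
  fi
  # Exactly one non-negative integer (command substitution strips the trailing newline)
  if [[ ! "$out" =~ ^[0-9]+$ ]]; then
    echo "FAIL: $1 printed '$out', expected a single integer" >&2
    return 1
  fi
  echo "$out"
}

# Usage: check_measure scripts/autoresearch/loop-remove-as-any/measure.sh
```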
direction.txt: improvement direction:
lower
(Use higher for metrics like test coverage or quality score.)
files.txt: scope the agent should operate on:
src/
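The three files are simple enough to generate from a loop name and a pattern. A minimal sketch, assuming a grep-count metric over `.ts`/`.tsx` files like the measure.sh template above; `scaffold_loop` and its argument order are hypothetical, and the generated measure.sh adds `|| true` so a zero count does not fail under `pipefail`:

```bash
#!/usr/bin/env bash
# scaffold_loop NAME PATTERN [DIRECTION] [SCOPE] [ROOT]: hypothetical helper that
# writes measure.sh, direction.txt and files.txt for a grep-count loop.
scaffold_loop() {
  local name=$1 pattern=$2 direction=${3:-lower} scope=${4:-src/}
  local dir="${5:-scripts/autoresearch}/$name"
  local qpattern qscope
  # %q escapes the values so they survive being embedded in the generated script
  printf -v qpattern '%q' "$pattern"
  printf -v qscope '%q' "$scope"
  mkdir -p "$dir"
  cat > "$dir/measure.sh" <<EOF
#!/usr/bin/env bash
# measure.sh: $name
# Returns an integer. Direction: $direction.
set -euo pipefail
# '|| true' keeps a zero-match grep (exit 1) from failing the script under pipefail
{ grep -r --include="*.ts" --include="*.tsx" -- $qpattern $qscope 2>/dev/null || true; } | wc -l | tr -d ' '
EOF
  chmod +x "$dir/measure.sh"
  printf '%s\n' "$direction" > "$dir/direction.txt"
  printf '%s\n' "$scope" > "$dir/files.txt"
}

# Usage: scaffold_loop loop-remove-as-any ' as any'
```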
After creating the files, display:
Loop scaffolded: scripts/autoresearch/{loop-name}/
measure.sh : {pattern} in {scope} -> {N} occurrences today
direction : lower (fewer = better)
files.txt : src/
Current metric: {N} (target: 0)
Next steps:
1. Write program.md -- agent behavior, constraints, what it can/cannot touch
Reference: scripts/autoresearch/loop-remove-as-any/program.md
2. Create a worktree: /worktree feature/autoresearch-{loop-name}
3. cd into the worktree
4. bash scripts/autoresearch/runner.sh {loop-name} 0 15
--run <loop-name>
Execute the autonomous loop. The agent runs indefinitely; stop it manually when satisfied.
Verify prerequisites:
```bash
[ -f "scripts/autoresearch/{loop-name}/measure.sh" ] || { echo "ERROR: measure.sh missing. Run --scaffold first."; exit 1; }
[ -f "scripts/autoresearch/{loop-name}/program.md" ] || { echo "ERROR: program.md missing. Write it first; it encodes your constraints."; exit 1; }
```
Run the loop:
Read scripts/autoresearch/{loop-name}/program.md fully before starting. Then enter the following cycle and repeat it until stopped:
LOOP ITERATION #{N}
1. Current metric: bash scripts/autoresearch/{loop-name}/measure.sh
2. Read program.md constraints
3. Propose ONE targeted change to files in files.txt
4. Apply the change
5. Re-measure: bash scripts/autoresearch/{loop-name}/measure.sh
6. Evaluate:
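- improved (direction=lower AND new < previous, or direction=higher AND new > previous) -> KEEP (git commit -am "autoresearch: {description}"; avoid the interactive git add -p, which would stall an unattended loop)
- otherwise -> REVERT (git checkout -- .)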
7. Log to results.tsv: {timestamp}\t{metric}\t{status}\t{description}, where {status} is KEPT or REVERTED (the labels --status counts)
8. Continue to iteration #{N+1}
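Steps 6 and 7 can be sketched as two small shell functions. `evaluate` treats only a strict improvement in the direction from direction.txt as KEPT (`higher` is the mirror image of `lower`), and `log_result` appends the tab-separated row from step 7. Both names, the UTC ISO-8601 timestamp, and the KEPT/REVERTED labels (chosen to match the per-iteration display and the `grep -c "KEPT"` in --status) are assumptions:

```bash
#!/usr/bin/env bash
# evaluate BEFORE AFTER DIRECTION: hypothetical helper for step 6.
# Prints KEPT if AFTER strictly improves on BEFORE in DIRECTION (lower|higher), else REVERTED.
evaluate() {
  local before=$1 after=$2 direction=$3
  case "$direction" in
    higher) if (( after > before )); then echo KEPT; return; fi ;;
    *)      if (( after < before )); then echo KEPT; return; fi ;;
  esac
  echo REVERTED
}

# log_result FILE METRIC STATUS DESCRIPTION: append one tab-separated row (step 7).
log_result() {
  printf '%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" "$4" >> "$1"
}

# Usage: status=$(evaluate "$before" "$after" "$(cat direction.txt)")
#        log_result results.tsv "$after" "$status" "replace as any in foo.ts"
```

Git operations are deliberately left out so the decision logic can be tested on its own; the agent still commits or reverts as described in step 6.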
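Stopping criteria: the conditions listed under "Stop when" in program.md (for example, the metric reaches 0 or no mechanical replacements remain).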
Display each iteration:
[iter #{N}] metric: {before} -> {after} | {KEPT/REVERTED} | {change description}
--status
Show the status of all loops in the project.
```bash
for dir in scripts/autoresearch/loop-*/; do
  [ -d "$dir" ] || continue
  NAME=$(basename "$dir")
  # Fall back to "?" if measure.sh is missing or exits non-zero
  CURRENT=$(bash "$dir/measure.sh" 2>/dev/null) || CURRENT="?"
  if [ -f "$dir/results.tsv" ]; then
    ITERS=$(wc -l < "$dir/results.tsv" | tr -d ' ')
    # grep -c prints 0 and exits 1 on no match; "|| true" avoids a second "0"
    KEPT=$(grep -c "KEPT" "$dir/results.tsv" || true)
    STATUS=ACTIVE
  else
    ITERS=0 KEPT=0 STATUS=SCAFFOLDED
  fi
  echo "$NAME | current: $CURRENT | iters: $ITERS | kept: $KEPT | $STATUS"
done
```
Display:
Autoresearch Status
| Loop | Current | Iterations | Kept | Status |
|------|---------|------------|------|--------|
| loop-remove-as-any | {N} | {N} | {N} | ACTIVE |
| loop-export-fn | {N} | 0 | 0 | SCAFFOLDED |
program.md: The Most Important File
program.md is the agent's behavior contract. Write it yourself; never auto-generate it. It must encode what the agent can and cannot touch in your specific codebase.
Minimal structure:
```markdown
# Program: {loop-name}
## Objective
Reduce `{metric}` in `src/` to 0. One mechanical change per iteration.
## Measurement
bash scripts/autoresearch/{loop-name}/measure.sh
Lower = better. Target: 0.
## What you CAN do
- Replace `export function X(` with `export const X = (`
- Keep the function signature identical
## What you CANNOT do
- Modify test files
- Change function signatures
- Touch files outside src/
- Make multiple changes per iteration
## Stop when
- Metric = 0
- No more mechanical replacements exist
```
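Because --run refuses to start without program.md, a quick pre-flight check that the file contains every section of the minimal structure can catch an incomplete contract early. A sketch, assuming the exact `##` headings of the template; `check_program` is hypothetical:

```bash
#!/usr/bin/env bash
# check_program FILE: hypothetical pre-flight lint for program.md. Prints each
# required heading that FILE lacks and returns 1 if any are missing.
check_program() {
  local file=$1 missing=0 section
  for section in "## Objective" "## Measurement" "## What you CAN do" \
                 "## What you CANNOT do" "## Stop when"; do
    # -x: the heading must be a whole line; -F: match it literally
    if ! grep -qxF -- "$section" "$file"; then
      echo "missing: $section"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_program scripts/autoresearch/loop-remove-as-any/program.md
```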
This command implements the autoresearch loop pattern from karpathy/autoresearch:
| ML Research (karpathy) | Code Quality (this command) |
|---|---|
| Modify `train.py` | Modify `src/` files |
| Measure `val_bpb` | Measure a grep count |
| 5-minute GPU budget | One atomic change per iteration |
| Keep if `val_bpb` improves | Keep if the count decreases |
| `git reset` if not | `git checkout -- .` if not |
| `program.md` = agent skill | `program.md` = agent skill |
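Key insight: a fixed, objective metric plus git as the rollback mechanism makes autonomous iteration safe. The agent never needs per-change human approval because every change that fails to improve the metric is reverted automatically. The metric is the only gate, though: a change that improves the count but is otherwise undesirable would be kept, which is why program.md must spell out what the agent cannot do.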
Scan and propose loops:
/autoresearch
Scaffold files for a specific loop:
/autoresearch --scaffold loop-remove-as-any
Run the autonomous loop (after writing program.md):
/autoresearch --run loop-remove-as-any
Check status of all loops:
/autoresearch --status
$ARGUMENTS