Autonomous time-budget experiment loop. Modify a training script, train for a fixed wall-clock budget, evaluate, record, repeat. Inspired by karpathy/autoresearch. Use for overnight architecture search, systematic hyperparameter sweeps, or any iterative model improvement workflow.
Run autonomous time-budget experiment loops. Each iteration modifies train.py,
trains for a fixed wall-clock budget, evaluates, records in results.tsv, and repeats.
Setup
Ensure results.tsv exists with a baseline (exp000) before iterating
Create EXPERIMENT.md with your goal, baseline, hypothesis, and constraints
Run: /mlx:autoexperiment path/to/train.py
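The first setup step can be sketched as a small helper that writes results.tsv with an exp000 baseline row only when the file does not exist yet. The column names (exp_id, change, val_loss, status), the function name ensure_results, and marking the baseline as KEEP are all assumptions made for illustration; the source does not specify the file layout.

```python
import csv
from pathlib import Path

# Hypothetical column layout; adapt it to whatever your results.tsv uses.
COLUMNS = ["exp_id", "change", "val_loss", "status"]


def ensure_results(path, baseline_val_loss):
    """Create the results file with an exp000 baseline row if it is missing.

    Returns True if the file was created, False if it already existed.
    """
    path = Path(path)
    if path.exists():
        return False
    with path.open("w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)
        # Assumption: the baseline is recorded as a KEEP row.
        writer.writerow(["exp000", "baseline", f"{baseline_val_loss:.4f}", "KEEP"])
    return True
```

Calling ensure_results("results.tsv", baseline_val_loss) before the first iteration leaves an existing history untouched.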
Protocol
Before each iteration
Read EXPERIMENT.md for the current hypothesis
Read results.tsv for experiment history
Identify one change to make (ONE variable only)
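Reading the history might look like the sketch below, which picks the best kept run as the reference point for the next single-variable change. It assumes tab-separated columns named status and val_loss and that a lower metric is better; none of these names come from the source.

```python
import csv


def best_kept(tsv_lines, metric="val_loss"):
    """Return the KEEP row with the lowest metric, or None if there is none.

    tsv_lines is any iterable of lines, such as an open file object.
    """
    rows = csv.DictReader(tsv_lines, delimiter="\t")
    # Only KEEP rows are compared, so empty metrics on CRASH rows are never parsed.
    kept = [r for r in rows if r["status"] == "KEEP"]
    return min(kept, key=lambda r: float(r[metric]), default=None)
```

Typical use would be: with open("results.tsv") as f: best = best_kept(f).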
Iteration loop
Edit train.py with the single change
Run under the time budget: timeout $BUDGET uv run train.py
Capture exit code and metrics
Record in results.tsv: KEEP / DISCARD / CRASH
If the same error causes a CRASH 3× in a row → stop and report a diagnosis
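The record-and-stop logic of the loop can be sketched as two small functions. This is a minimal illustration under stated assumptions: a nonzero exit code or a missing metric counts as CRASH, a lower metric than the best so far counts as KEEP, and the history is a list of (status, error_text) pairs. The names classify and should_stop are hypothetical.

```python
def classify(exit_code, metric, best_metric):
    """Return KEEP, DISCARD, or CRASH for one run (assumes lower metric is better)."""
    if exit_code != 0 or metric is None:
        return "CRASH"
    return "KEEP" if metric < best_metric else "DISCARD"


def should_stop(history, limit=3):
    """True when the last `limit` runs all crashed with the same error text.

    history is a list of (status, error_text) tuples, oldest first.
    """
    tail = history[-limit:]
    if len(tail) < limit:
        return False
    all_crashed = all(status == "CRASH" for status, _ in tail)
    same_error = len({error for _, error in tail}) == 1
    return all_crashed and same_error
```

One caveat with this sketch: GNU timeout exits with status 124 when it kills the command, so if train.py relies on the external timeout to stop, that code would need to be handled separately instead of being treated as a crash.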
After each iteration
Update EXPERIMENT.md "Next to try" section
Summarize: what changed, what happened, what's next
Templates
See references/EXPERIMENT.md.template for the hypothesis file format.
See scripts/time_budget_train.py for a complete training script template with all patterns.
Key patterns
TIME_BUDGET: wall-clock seconds, not epochs. About 12 experiments/hour at 300 s each (3600 / 300)
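A training loop bounded by wall-clock time rather than epochs could be sketched as follows. This is not the contents of scripts/time_budget_train.py; step_fn and the injectable clock parameter are hypothetical names used so the idea can be shown in isolation.

```python
import time


def train_with_budget(step_fn, budget_s, clock=time.monotonic):
    """Call step_fn() repeatedly until budget_s seconds of wall-clock time elapse.

    Returns the number of completed steps. The budget is checked between
    steps, so the run can overshoot by up to one step's duration.
    """
    start = clock()
    steps = 0
    while clock() - start < budget_s:
        step_fn()
        steps += 1
    return steps


def experiments_per_hour(budget_s):
    """Upper bound on runs per hour, ignoring startup and evaluation time."""
    return 3600 // budget_s
```

Using time.monotonic instead of time.time keeps the budget unaffected by system clock adjustments during a long overnight run.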