autoresearch
// Autonomous ML research loop — modify train.py, run 5-min GPU experiments, track val_bpb, iterate overnight. Invoke when the user wants to run self-directed architecture search on the autoresearch repo.
| name | autoresearch |
| description | Autonomous ML research loop — modify train.py, run 5-min GPU experiments, track val_bpb, iterate overnight. Invoke when the user wants to run self-directed architecture search on the autoresearch repo. |
| metadata | {"clawphd":{"emoji":"🔬","os":["linux"],"requires":{"bins":["uv","git"]}}} |
Autonomous ML experiment loop. You modify train.py, run timed GPU training, log val_bpb, and iterate forever — keeping improvements, reverting regressions.
The autoresearch/ directory lives at <workspace>/../autoresearch/ relative to the ClawPhD workspace, or wherever the user specifies. Always confirm the path before starting.
Work with the user to:
1. Confirm the repo path, e.g. /home/ubuntu/research/ClawPhD/autoresearch. Verify it exists with exec.
2. Agree on a run tag (e.g. mar12). The branch autoresearch/<tag> must not already exist.
3. Create the branch: git checkout -b autoresearch/<tag> from current master.
4. Read the in-scope files: README.md, prepare.py (fixed; do not modify), and train.py (agent-editable).
5. Verify the data: check ~/.cache/autoresearch/ for data shards and tokenizer. If missing, tell the user to run uv run prepare.py first.
6. Initialize results.tsv with only the tab-separated header row: commit val_bpb memory_gb status description
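The mechanical parts of this checklist can be scripted. The following is a minimal bash sketch and is not part of the autoresearch repo. branch_is_free, init_results_tsv, and setup_run are hypothetical helper names. The data check only tests that ~/.cache/autoresearch/ exists, not what it contains, and reading the in-scope files is left to the agent.

```bash
# Hypothetical setup helpers (not shipped with autoresearch).

# Succeeds only if the branch autoresearch/<tag> does not exist yet.
branch_is_free() {
  local tag="$1"
  ! git show-ref --verify --quiet "refs/heads/autoresearch/${tag}"
}

# Create (or overwrite) results.tsv containing only the tab-separated header.
init_results_tsv() {
  local file="${1:-results.tsv}"
  printf 'commit\tval_bpb\tmemory_gb\tstatus\tdescription\n' > "$file"
}

# Run the checks in order, stopping at the first failure.
setup_run() {
  local dir="$1" tag="$2"
  [ -d "$dir" ] || { echo "repo not found: $dir" >&2; return 1; }
  cd "$dir" || return 1
  branch_is_free "$tag" || { echo "branch autoresearch/$tag already exists" >&2; return 1; }
  # Assumption: an existing cache directory means shards and tokenizer are present.
  [ -d "$HOME/.cache/autoresearch" ] || {
    echo "data missing: ask the user to run 'uv run prepare.py' first" >&2
    return 1
  }
  git checkout -b "autoresearch/${tag}" master || return 1
  init_results_tsv results.tsv
}
```

For example, setup_run /home/ubuntu/research/ClawPhD/autoresearch mar12 stops with a message at the first failed check. It changes directory in the calling shell, so later commands run from inside the repo.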
Since each training run takes ~5 minutes, use background execution to avoid blocking:
```bash
# Step 1: launch in the background (returns immediately)
cd /home/ubuntu/research/ClawPhD/autoresearch
uv run train.py > run.log 2>&1 &
echo "Training started, PID: $!"

# Step 2: poll every ~60s until the summary block appears
tail -80 run.log

# Step 3: extract results once complete
grep "^val_bpb:\|^peak_vram_mb:\|^training_seconds:" run.log
```
The summary block looks like:
```
---
val_bpb: 0.997900
training_seconds: 300.1
total_seconds: 325.9
peak_vram_mb: 45060.2
```
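A small awk helper can pull a single value out of this block. This sketch assumes each summary line has the form key: value, with whitespace after the colon as shown above. read_metric is a hypothetical name. An empty result means the key is missing, which is the crash signal described next.

```bash
# Print the value of one "key: value" line from a run log, or nothing if absent.
read_metric() {
  local log="$1" key="$2"
  # $1 is the "key:" field, $2 the value; stop at the first match.
  awk -v k="$key" '$1 == k":" { print $2; exit }' "$log"
}
```

Usage: val_bpb=$(read_metric run.log val_bpb) and peak_vram_mb=$(read_metric run.log peak_vram_mb). An empty $val_bpb means there is no summary yet, or the run crashed.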
If grep "^val_bpb:" run.log returns empty after 10+ minutes, the run crashed — check tail -50 run.log for the stack trace.
Timeout rule: If training_seconds exceeds 360 after the summary appears, treat the run as anomalous but still log it. If the process is still running after 10 minutes of wall-clock time, kill it (kill <PID>), log it as a crash, and revert.
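The timeout rule can be folded into the polling step. The sketch below illustrates the idea and is not the skill's own tooling. wait_for_summary is a hypothetical helper. It treats the PID printed in step 1 (the uv run process) as the one to kill, and it assumes that terminating that process also stops training. It also stops waiting early if the process exits without writing a summary.

```bash
# Poll LOG every INTERVAL seconds for the summary line.
# Returns 0 once "^val_bpb:" appears; returns 1 (crash) if PID exits without
# a summary, or after TIMEOUT seconds, in which case PID is killed first.
wait_for_summary() {
  local log="$1" pid="$2" timeout="${3:-600}" interval="${4:-60}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    grep -q '^val_bpb:' "$log" 2>/dev/null && return 0
    if ! kill -0 "$pid" 2>/dev/null; then
      # Process is gone: check once more in case the summary landed just before exit.
      grep -q '^val_bpb:' "$log" 2>/dev/null && return 0
      echo "process $pid exited without a summary: crash" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  grep -q '^val_bpb:' "$log" 2>/dev/null && return 0
  echo "no summary after ${timeout}s: killing $pid" >&2
  kill "$pid" 2>/dev/null
  return 1
}
```

Typical use: uv run train.py > run.log 2>&1 & pid=$!; then wait_for_summary run.log "$pid" 600 60 || echo crash.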
You CAN:
- Modify train.py freely: architecture, optimizer, hyperparameters, batch size, model size, anything.

You CANNOT:
- Modify prepare.py (fixed evaluation harness, data loading, constants).
- Modify the evaluation function evaluate_bpb.

Goal: lowest val_bpb. The time budget is fixed at 5 minutes, so parameter count and compute efficiency both matter.
Simplicity criterion: All else equal, simpler code wins. A 0.001 improvement with 20 hacky lines is not worth it. Equal performance with less code is always a win.
Tab-separated (NOT comma-separated). Five columns:
commit val_bpb memory_gb status description
- commit: 7-char git hash
- val_bpb: e.g. 0.997900; use 0.000000 for crashes
- memory_gb: peak_vram_mb / 1024, rounded to 1 decimal; use 0.0 for crashes
- status: keep, discard, or crash
- description: brief plain-text description of the change

Do not commit results.tsv (leave it untracked).
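Here is one way to write a row that follows these rules, as a bash sketch. append_result is a hypothetical helper. The commit hash is truncated to 7 characters with printf, and memory_gb is computed with awk. Any tab in the description is replaced with a space so the row keeps exactly five columns.

```bash
# Append one tab-separated row: FILE COMMIT VAL_BPB PEAK_VRAM_MB STATUS DESCRIPTION
# For status "crash", val_bpb and memory_gb are forced to 0.000000 and 0.0.
append_result() {
  local file="$1" commit="$2" val_bpb="$3" peak_mb="$4" status="$5" desc="$6"
  local mem_gb
  if [ "$status" = "crash" ]; then
    val_bpb="0.000000"
    mem_gb="0.0"
  else
    # MB -> GB, rounded to one decimal
    mem_gb=$(awk -v mb="$peak_mb" 'BEGIN { printf "%.1f", mb / 1024 }')
  fi
  # A tab inside the description would add a sixth column; replace it.
  desc=$(printf '%s' "$desc" | tr '\t' ' ')
  # %.7s keeps only the first 7 characters of the hash
  printf '%.7s\t%s\t%s\t%s\t%s\n' "$commit" "$val_bpb" "$mem_gb" "$status" "$desc" >> "$file"
}
```

Usage: append_result results.tsv "$(git rev-parse --short=7 HEAD)" "$val_bpb" "$peak_vram_mb" keep "baseline". For the example summary (45060.2 MB), memory_gb comes out as 44.0.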
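LOOP FOREVER:

1. Edit train.py with a new idea.
2. Commit it: git commit -am "autoresearch: <brief description>" (-a stages the modified train.py; a plain git commit -m would commit nothing).
3. Launch training in the background and poll run.log until the summary block appears.
4. Extract val_bpb and peak_vram_mb.
5. Append a row to results.tsv.
6. If val_bpb improved (lower): keep the commit, advance the branch.
7. If val_bpb is equal or worse, or the run crashed: git reset --hard HEAD~1 (revert to before this experiment).

NEVER STOP: Once the loop begins, do not pause to ask the user whether to continue. The user may be asleep. Run indefinitely until manually interrupted.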
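The keep/discard steps of the loop compare the new val_bpb against the best value so far. Below is a minimal bash sketch. It assumes the caller tracks the best value in a shell variable, and decide and best are hypothetical names. Shell [ ] cannot compare decimals, so the comparison runs in awk. Ties count as discard, and an empty best (the first, baseline run) counts as keep.

```bash
# Print keep, discard, or crash for a new val_bpb relative to BEST.
decide() {
  local best="$1" new="$2"
  if [ -z "$new" ]; then
    echo crash            # no val_bpb line was found in run.log
  elif [ -z "$best" ]; then
    echo keep             # first run: the baseline is always kept
  elif awk -v b="$best" -v n="$new" 'BEGIN { exit !((n + 0) < (b + 0)) }'; then
    echo keep             # strictly lower val_bpb
  else
    echo discard          # equal or worse
  fi
}
```

Usage: status=$(decide "$best" "$val_bpb"). Then set best=$val_bpb on keep, or run git reset --hard HEAD~1 otherwise. The simplicity criterion can still override this mechanical rule: a change that matches the best score with less code may be worth keeping.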
When stuck: Read papers referenced in train.py comments, re-read prepare.py for new angles, try combining previous near-misses, attempt more radical architectural changes (different depth/width ratios, different attention patterns, different optimizers).
At the end of each experiment (or every N experiments), use the message tool to send a summary to the user's configured channel:
```
Experiment #N: <description>
val_bpb: 0.XXXXXX → 0.YYYYYY (<+/-> delta)
Status: keep / discard
```
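The digest text can be assembled in bash before it is passed to the message tool; the tool call itself is not shown. format_digest is a hypothetical helper. The delta is printed with an explicit sign and six decimals to match val_bpb.

```bash
# Build the three-line digest: N DESCRIPTION OLD_BPB NEW_BPB STATUS
format_digest() {
  local n="$1" desc="$2" old="$3" new="$4" status="$5"
  local delta
  # Signed difference, e.g. -0.002900 for an improvement
  delta=$(awk -v o="$old" -v n="$new" 'BEGIN { printf "%+.6f", n - o }')
  printf 'Experiment #%s: %s\nval_bpb: %s → %s (%s)\nStatus: %s\n' \
    "$n" "$desc" "$old" "$new" "$delta" "$status"
}
```

For example, format_digest 7 "wider MLP" 0.997900 0.995000 keep produces a second line reading val_bpb: 0.997900 → 0.995000 (-0.002900).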
This way the user wakes up to a digest of overnight progress in Telegram/Slack/etc.
To run the experiment loop fully autonomously (without a human kicking it off), add a task to HEARTBEAT.md in the workspace:
```markdown
## autoresearch
Every 10 minutes: check if an autoresearch branch is active and the training loop is running. If not, resume the loop from where it left off.
```
The HeartbeatService will trigger this periodically and the agent will keep the loop alive.