Run any Skill in Manus with one click

$pwd:

autoresearch

Name: Autoresearch
Author: ZhihaoAIRobotic

// Autonomous ML research loop — modify train.py, run 5-min GPU experiments, track val_bpb, iterate overnight. Invoke when the user wants to run self-directed architecture search on the autoresearch repo.

Run Skill in Manus

$ git log --oneline --stat

stars:153

forks:11

updated:March 12, 2026 at 07:46

SKILL.md

readonly

name	autoresearch
description	Autonomous ML research loop — modify train.py, run 5-min GPU experiments, track val_bpb, iterate overnight. Invoke when the user wants to run self-directed architecture search on the autoresearch repo.
metadata	{"clawphd":{"emoji":"🔬","os":["linux"],"requires":{"bins":["uv","git"]}}}

autoresearch

Autonomous ML experiment loop. You modify train.py, run timed GPU training, log val_bpb, and iterate forever — keeping improvements, reverting regressions.

The autoresearch/ directory lives at <workspace>/../autoresearch/ relative to the ClawPhD workspace, or wherever the user specifies. Always confirm the path before starting.

Setup

Work with the user to:

Confirm the repo path: default is /home/ubuntu/research/ClawPhD/autoresearch. Verify it exists with exec.
Agree on a run tag: propose a tag based on today's date (e.g. mar12). The branch autoresearch/<tag> must not already exist.
Create the branch: git checkout -b autoresearch/<tag> from current master.
Read the in-scope files: Read README.md, prepare.py (fixed — do not modify), and train.py (agent-editable).
Verify data: Check ~/.cache/autoresearch/ for data shards and tokenizer. If missing, tell the user to run uv run prepare.py first.
Initialize results.tsv: Create it with just the header row:
```
commit	val_bpb	memory_gb	status	description
```
Confirm and go: Confirm setup, then start the loop immediately.

Running experiments (exec + background pattern)

Since each training run takes ~5 minutes, use background execution to avoid blocking:

# Step 1 — Launch in background (returns immediately)
cd /home/ubuntu/research/ClawPhD/autoresearch
uv run train.py > run.log 2>&1 &
echo "Training started, PID: $!"

# Step 2 — Poll every ~60s until summary appears
tail -80 run.log

# Step 3 — Extract results once complete
grep "^val_bpb:\|^peak_vram_mb:\|^training_seconds:" run.log

The summary block looks like:

---
val_bpb:          0.997900
training_seconds: 300.1
total_seconds:    325.9
peak_vram_mb:     45060.2

If grep "^val_bpb:" run.log returns empty after 10+ minutes, the run crashed — check tail -50 run.log for the stack trace.

Timeout rule: If training_seconds exceeds 360 after the summary appears, treat it as anomalous but still log it. If the process is still running after 10 minutes wall-clock, kill it: kill <PID>, log as crash, revert.

Experiment rules

You CAN:

Modify train.py freely: architecture, optimizer, hyperparameters, batch size, model size, anything.

You CANNOT:

Modify prepare.py (fixed evaluation harness, data loading, constants).
Install new packages or add dependencies.
Change the evaluation function evaluate_bpb.

Goal: lowest val_bpb. Time budget is fixed at 5 minutes, so parameter count and compute efficiency both matter.

Simplicity criterion: All else equal, simpler code wins. A 0.001 improvement with 20 hacky lines is not worth it. Equal performance with less code is always a win.

Logging to results.tsv

Tab-separated (NOT comma-separated). Five columns:

commit	val_bpb	memory_gb	status	description

commit: 7-char git hash
val_bpb: e.g. 0.997900; use 0.000000 for crashes
memory_gb: peak_vram_mb / 1024, rounded to 1 decimal; use 0.0 for crashes
status: keep, discard, or crash
description: brief plain-text description of the change

Do not commit results.tsv (leave it untracked).

The experiment loop

LOOP FOREVER:

Check git state: current branch and last commit.
Edit train.py with a new idea.
git commit -m "autoresearch: <brief description>"
Launch training in background (see above).
Poll run.log until the summary block appears.
Extract val_bpb and peak_vram_mb.
Log to results.tsv.
If val_bpb improved (lower): keep the commit, advance the branch.
If equal or worse: git reset --hard HEAD~1 (revert to before this experiment).

NEVER STOP: Once the loop begins, do not pause to ask the user whether to continue. The user may be asleep. Run indefinitely until manually interrupted.

When stuck: Read papers referenced in train.py comments, re-read prepare.py for new angles, try combining previous near-misses, attempt more radical architectural changes (different depth/width ratios, different attention patterns, different optimizers).

Reporting results via ClawPhD channels

At the end of each experiment (or every N experiments), use the message tool to send a summary to the user's configured channel:

Experiment #N: <description>
val_bpb: 0.XXXXXX → 0.YYYYYY (<+/-> delta)
Status: keep / discard

This way the user wakes up to a digest of overnight progress in Telegram/Slack/etc.

Autonomous overnight use with HeartbeatService

To run the experiment loop fully autonomously (without a human kicking it off), add a task to HEARTBEAT.md in the workspace:

## autoresearch
Every 10 minutes: check if an autoresearch branch is active and the training loop is running. If not, resume the loop from where it left off.

The HeartbeatService will trigger this periodically and the agent will keep the loop alive.

related-skills.json

same repository

arxivterminal.md

from "ZhihaoAIRobotic/ClawPhD"

CLI tool (arxivterminal) for fetching, searching, and managing arXiv papers locally. Use when working with arXiv papers using the arxivterminal command - fetching new papers by category, searching the local database, viewing papers from specific dates, or managing the local paper database.

2026-03-24153

paper-scout.md

from "ZhihaoAIRobotic/ClawPhD"

Fetch arXiv papers by date range and topics, rank them for research value, and produce introduction digests. Use when the user wants a literature sweep, daily or weekly paper triage, or a written overview of the best papers in a niche — without relying on the arxivterminal local database.

2026-03-24153

research-lit.md

from "ZhihaoAIRobotic/ClawPhD"

Search and analyze research papers, find related work, summarize key ideas. Use when user says "find papers", "related work", "literature review", "what does this paper say", or needs to understand academic papers.

2026-03-24153

ai-review.md

from "ZhihaoAIRobotic/ClawPhD"

AI paper reviewer. Use when the user says 'review my paper', 'help me review this paper', '审稿', 'give me feedback on my paper', 'check my manuscript', 'evaluate this paper for NeurIPS/ICLR/EuroSys'. Accepts PDF files and produces structured narrative reviews with venue-specific dimensional scores and Accept/Reject recommendation.

2026-03-23153

pdf2md.md

from "ZhihaoAIRobotic/ClawPhD"

Convert a local paper PDF to structured Markdown and export all figures as PNG + SVG + drawio. Attempts editable figure reconstruction via the built-in autofigure pipeline (SAM3 → RMBG-2.0 → VLM → SVG), falling back to a layered-SVG wrapper when API keys are unavailable. Use when the user wants to parse a paper PDF, extract its text as Markdown, or get editable/exportable figure assets.

2026-03-15153

autofigure.md

from "ZhihaoAIRobotic/ClawPhD"

Convert raster figure images into editable DrawIO files using SAM3 segmentation, RMBG-2.0 background removal, and multimodal LLM drawio generation.

2026-03-14153

package.json

"author": "ZhihaoAIRobotic"

"repository": "ZhihaoAIRobotic/ClawPhD"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Computer and Information Research ScientistsComputer and Mathematical Occupations15-1221L4

name	autoresearch
description	Autonomous ML research loop — modify train.py, run 5-min GPU experiments, track val_bpb, iterate overnight. Invoke when the user wants to run self-directed architecture search on the autoresearch repo.
metadata	{"clawphd":{"emoji":"🔬","os":["linux"],"requires":{"bins":["uv","git"]}}}

autoresearch

Autonomous ML experiment loop. You modify train.py, run timed GPU training, log val_bpb, and iterate forever — keeping improvements, reverting regressions.

The autoresearch/ directory lives at <workspace>/../autoresearch/ relative to the ClawPhD workspace, or wherever the user specifies. Always confirm the path before starting.

Setup

Work with the user to:

Confirm the repo path: default is /home/ubuntu/research/ClawPhD/autoresearch. Verify it exists with exec.
Agree on a run tag: propose a tag based on today's date (e.g. mar12). The branch autoresearch/<tag> must not already exist.
Create the branch: git checkout -b autoresearch/<tag> from current master.
Read the in-scope files: Read README.md, prepare.py (fixed — do not modify), and train.py (agent-editable).
Verify data: Check ~/.cache/autoresearch/ for data shards and tokenizer. If missing, tell the user to run uv run prepare.py first.
Initialize results.tsv: Create it with just the header row:
```
commit	val_bpb	memory_gb	status	description
```
Confirm and go: Confirm setup, then start the loop immediately.

Running experiments (exec + background pattern)

Since each training run takes ~5 minutes, use background execution to avoid blocking:

# Step 1 — Launch in background (returns immediately)
cd /home/ubuntu/research/ClawPhD/autoresearch
uv run train.py > run.log 2>&1 &
echo "Training started, PID: $!"

# Step 2 — Poll every ~60s until summary appears
tail -80 run.log

# Step 3 — Extract results once complete
grep "^val_bpb:\|^peak_vram_mb:\|^training_seconds:" run.log

The summary block looks like:

---
val_bpb:          0.997900
training_seconds: 300.1
total_seconds:    325.9
peak_vram_mb:     45060.2

If grep "^val_bpb:" run.log returns empty after 10+ minutes, the run crashed — check tail -50 run.log for the stack trace.

Experiment rules

You CAN:

Modify train.py freely: architecture, optimizer, hyperparameters, batch size, model size, anything.

You CANNOT:

Modify prepare.py (fixed evaluation harness, data loading, constants).
Install new packages or add dependencies.
Change the evaluation function evaluate_bpb.

Goal: lowest val_bpb. Time budget is fixed at 5 minutes, so parameter count and compute efficiency both matter.

Simplicity criterion: All else equal, simpler code wins. A 0.001 improvement with 20 hacky lines is not worth it. Equal performance with less code is always a win.

Logging to results.tsv

Tab-separated (NOT comma-separated). Five columns:

commit	val_bpb	memory_gb	status	description

commit: 7-char git hash
val_bpb: e.g. 0.997900; use 0.000000 for crashes
memory_gb: peak_vram_mb / 1024, rounded to 1 decimal; use 0.0 for crashes
status: keep, discard, or crash
description: brief plain-text description of the change

Do not commit results.tsv (leave it untracked).

The experiment loop

LOOP FOREVER:

Check git state: current branch and last commit.
Edit train.py with a new idea.
git commit -m "autoresearch: <brief description>"
Launch training in background (see above).
Poll run.log until the summary block appears.
Extract val_bpb and peak_vram_mb.
Log to results.tsv.
If val_bpb improved (lower): keep the commit, advance the branch.
If equal or worse: git reset --hard HEAD~1 (revert to before this experiment).

NEVER STOP: Once the loop begins, do not pause to ask the user whether to continue. The user may be asleep. Run indefinitely until manually interrupted.

Reporting results via ClawPhD channels

At the end of each experiment (or every N experiments), use the message tool to send a summary to the user's configured channel:

Experiment #N: <description>
val_bpb: 0.XXXXXX → 0.YYYYYY (<+/-> delta)
Status: keep / discard

This way the user wakes up to a digest of overnight progress in Telegram/Slack/etc.

Autonomous overnight use with HeartbeatService

To run the experiment loop fully autonomously (without a human kicking it off), add a task to HEARTBEAT.md in the workspace:

## autoresearch
Every 10 minutes: check if an autoresearch branch is active and the training loop is running. If not, resume the loop from where it left off.

The HeartbeatService will trigger this periodically and the agent will keep the loop alive.

autoresearch

autoresearch

Setup

Running experiments (exec + background pattern)

Experiment rules

Logging to results.tsv

The experiment loop

Reporting results via ClawPhD channels

Autonomous overnight use with HeartbeatService

More from this repository

More from this repository

autoresearch

Setup

Running experiments (exec + background pattern)

Experiment rules

Logging to results.tsv

The experiment loop

Reporting results via ClawPhD channels

Autonomous overnight use with HeartbeatService