auto-experiment
Launch an autonomous THINK→EXECUTE→REFLECT experiment loop on a GPU project
| name | auto-experiment |
| description | Launch an autonomous THINK→EXECUTE→REFLECT experiment loop on a GPU project |
Launch an autonomous experiment agent that runs your deep learning experiments 24/7.
This skill starts a THINK → EXECUTE → REFLECT loop that:
- Reads PROJECT_BRIEF.md to understand the research goal
- Reads MEMORY_LOG.md
- Launches training with nohup (tracks PID)
- Monitors the run (kill -0 PID + tail log + nvidia-smi)

Claude Code: /auto-experiment
Claude Code: /auto-experiment --project /path/to/my_project --gpu 0
Claude Code: /auto-experiment --project . --max-cycles 5
Codex: $auto-experiment
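The launch and monitoring steps of the loop (nohup with a tracked PID, kill -0 PID, tailing the log, nvidia-smi) can be sketched in Python as below. This is a minimal illustration assuming a POSIX host, not the skill's actual implementation; the function names `launch_detached`, `is_alive`, `tail_lines`, and `gpu_utilization` are hypothetical.

```python
import os
import shutil
import subprocess
from collections import deque


def launch_detached(cmd, log_path):
    """Start cmd in its own session (similar in spirit to nohup).

    Returns the Popen handle; proc.pid is the PID a controller would record.
    """
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX only: detach from the terminal session
        )
    # The parent's file handle is closed here; the child keeps its own copy.
    return proc


def is_alive(pid):
    """Python equivalent of `kill -0 PID`: signal 0 only checks existence."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def tail_lines(path, n=50):
    """Return the last n lines of a log file, like `tail -n`."""
    with open(path, "r", errors="replace") as f:
        return list(deque(f, maxlen=n))


def gpu_utilization():
    """Return raw per-GPU utilization strings, or None without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

None of these checks involves an LLM call, which is why polling a long training run this way can be free of API cost.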
The project directory must contain:
PROJECT_BRIEF.md (required)
A frozen reference describing your research goal. Example:
# Goal
Train a ViT-B/16 on ImageNet to reach 78%+ top-1 accuracy.
# Codebase
- Training: train.py
- Config: configs/vit_base.yaml
- Data: /data/imagenet/
# Constraints
- GPU 0-3 available (use DDP)
- Max 90 epochs per run
- Report val accuracy after each run
# Current Best
- ResNet-50 baseline: 76.1%
config.yaml (optional)
Override default agent settings:
agent:
  provider: "anthropic"       # or "openai" / "claude_cli" / "codex_cli"
  model: "claude-sonnet-4-6"
  base_url: ""                # optional compatible endpoint override
  api_key_env: ""             # optional custom key env var
  auth_token_env: ""          # optional custom bearer token env var
  max_cycles: -1              # -1 = unlimited
  max_steps_per_cycle: 3      # max sub-agent dispatches per cycle
  cooldown_interval: 300      # 5 min smart polling
memory:
  brief_max_chars: 3000
  log_max_chars: 2000
monitor:
  poll_interval: 900          # check every 15 min during training
  zero_llm: true
experiment:
  mandatory_dry_run: true
If the user wants a compatible API endpoint instead of the official Anthropic
or OpenAI API, keep the same provider values and set base_url plus a custom
api_key_env. Do not invent provider names like qwen or glm.
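To illustrate how overrides and the provider rule might interact, here is a minimal sketch that merges a user override onto a settings dictionary and rejects unknown provider names. It assumes the keys nest under `agent` as in the sample above and treats the sample values as defaults. `merge_config`, `validate_provider`, the endpoint URL, and the `MY_COMPAT_API_KEY` variable name are all hypothetical.

```python
import copy

# Values copied from the sample config above; treating them as the
# defaults is an assumption made for this sketch.
DEFAULTS = {
    "agent": {
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "base_url": "",
        "api_key_env": "",
        "max_cycles": -1,
    },
    "memory": {"brief_max_chars": 3000, "log_max_chars": 2000},
}

# The four provider values named in the sample config comment.
ALLOWED_PROVIDERS = {"anthropic", "openai", "claude_cli", "codex_cli"}


def merge_config(defaults, overrides):
    """Recursively overlay overrides on defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def validate_provider(config):
    """Raise ValueError for provider names outside the allowed set."""
    provider = config["agent"]["provider"]
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(
            f"unknown provider {provider!r}; keep one of "
            f"{sorted(ALLOWED_PROVIDERS)} and set base_url instead"
        )
    return provider


# A compatible endpoint keeps provider="openai" and only changes
# base_url and api_key_env (both values here are placeholders).
compat = merge_config(DEFAULTS, {
    "agent": {
        "provider": "openai",
        "base_url": "https://example.invalid/v1",
        "api_key_env": "MY_COMPAT_API_KEY",
    }
})
validate_provider(compat)
```

A config with `provider: "qwen"` would fail validation under this sketch, matching the rule that compatible endpoints reuse an existing provider value.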
Optional remote execution over SSH:
execution:
  mode: "ssh"
  ssh_host: "user@server"
  remote_workspace: "/home/user/my_project/workspace"
  remote_python: "python3"
In SSH mode, the controller state stays local (PROJECT_BRIEF.md,
workspace/MEMORY_LOG.md, workspace/HUMAN_DIRECTIVE.md, state.json),
while code edits, shell commands, training, log tailing, PID checks, and GPU
queries run on the configured remote host.
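One way to picture the local/remote split is a small helper that turns a shell command into an argument list, either for a local shell or for `ssh` against the configured host. This is a sketch under assumptions: `build_command` is a hypothetical name, any mode other than "ssh" is treated as local, and the real skill may route commands differently.

```python
import shlex


def build_command(shell_cmd, execution):
    """Return an argv list that runs shell_cmd locally or over SSH.

    `execution` mirrors the `execution:` block of config.yaml. Treating
    every mode other than "ssh" as local is an assumption of this sketch.
    """
    if execution.get("mode") == "ssh":
        workspace = shlex.quote(execution["remote_workspace"])
        # ssh hands this string to the remote user's shell.
        remote = f"cd {workspace} && {shell_cmd}"
        return ["ssh", execution["ssh_host"], remote]
    return ["sh", "-c", shell_cmd]


ssh_cfg = {
    "mode": "ssh",
    "ssh_host": "user@server",
    "remote_workspace": "/home/user/my_project/workspace",
}

print(build_command("nvidia-smi", ssh_cfg))
# ['ssh', 'user@server', 'cd /home/user/my_project/workspace && nvidia-smi']
```

The resulting list can be passed to `subprocess.run`, so the same monitoring commands (PID checks, log tailing, GPU queries) work in both modes.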
Context files:
- PROJECT_BRIEF.md (frozen, max 3000 chars)
- MEMORY_LOG.md (rolling, auto-compacted)
- HUMAN_DIRECTIVE.md (highest priority, auto-archived after reading)

Tools: run_shell, launch_experiment, write_file, read_file, list_files

Launch and monitoring:
- nohup, capture PID
- nvidia-smi: GPU utilization
- tail -50 logfile: latest training output

# Drop a directive file: the agent reads it next cycle with highest priority
echo "Try learning rate 1e-5 with cosine schedule" > workspace/HUMAN_DIRECTIVE.md
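The read-then-archive behaviour of the directive file could look roughly like the following. It is a sketch only: `consume_directive` and the `directive_archive` folder name are hypothetical, and the skill's real archive location and naming are not documented here.

```python
from datetime import datetime
from pathlib import Path


def consume_directive(workspace):
    """Return the pending directive text and archive the file, else None."""
    workspace = Path(workspace)
    directive = workspace / "HUMAN_DIRECTIVE.md"
    if not directive.is_file():
        return None
    text = directive.read_text(encoding="utf-8").strip()

    # Hypothetical archive folder: moving the file out of the way means
    # the same directive is not applied again on the next cycle.
    archive_dir = workspace / "directive_archive"
    archive_dir.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    directive.rename(archive_dir / f"{stamp}_HUMAN_DIRECTIVE.md")
    return text or None
```

Calling `consume_directive("workspace")` at the start of each cycle would return the directive once and `None` on later cycles until a new file is dropped.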
Memory is two-tier and stays at a constant size (~5K chars / ~1500 tokens), no matter how long the agent runs:
| Tier | File | Content | Cap |
|---|---|---|---|
| 1 | PROJECT_BRIEF.md | Frozen project reference | 3,000 chars |
| 2 | MEMORY_LOG.md | Key Results + Recent Decisions | 2,000 chars |
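As an illustration of how a rolling log can be held under its 2,000-character cap, the sketch below drops the oldest entries from whichever section currently holds the most. This drop-oldest policy is an assumption chosen for the example and is not necessarily the skill's own compaction rule; `parse_sections`, `render`, and `compact_log` are hypothetical names.

```python
def parse_sections(text):
    """Split a memory log into its title line and {section: [entries]}."""
    title, sections, current = "", {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# "):
            title = line
        elif line.strip() and current is not None:
            sections[current].append(line)
    return title, sections


def render(title, sections):
    """Rebuild the log text from a title and ordered sections."""
    parts = [title] if title else []
    for name, entries in sections.items():
        parts.append(f"## {name}")
        parts.extend(entries)
    return "\n".join(parts) + "\n"


def compact_log(text, max_chars=2000):
    """Drop oldest entries (assumed policy) until the log fits max_chars."""
    title, sections = parse_sections(text)
    while len(render(title, sections)) > max_chars:
        longest = max(sections.values(), key=len, default=[])
        if not longest:
            break  # only headings remain; nothing more to drop
        longest.pop(0)  # entries are chronological, so index 0 is the oldest
    return render(title, sections)
```

Because the cap is enforced on every write, the prompt built from the brief plus the log stays bounded regardless of how many cycles have run.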
Auto-compaction rules:
| Phase | Duration | LLM Cost |
|---|---|---|
| THINK | 5-10 min | ~$0.05 |
| EXECUTE (training) | hours/days | $0.00 |
| REFLECT | 5-10 min | ~$0.03 |
| 24h cycle total | | ~$0.08 |
After a few cycles, your workspace/MEMORY_LOG.md will look like:
# Memory Log
## Key Results
[04-07 14:30] Exp001: ResNet-50 baseline, lr=0.1, acc=76.1%
[04-07 22:15] Exp002: ViT-B/16, lr=1e-3, acc=74.8% (underperforming, lr too high)
[04-08 06:00] Exp003: ViT-B/16, lr=3e-4 + cosine, acc=77.9% (new best!)
[04-08 14:45] Exp004: ViT-B/16, lr=3e-4 + cosine + mixup, acc=78.3% (target reached!)
## Recent Decisions
[04-07 14:30] Start with ResNet-50 baseline to establish reference
[04-07 22:15] ViT lr=1e-3 too high, try 3e-4 next
[04-08 06:00] Cosine schedule helped significantly, try adding regularization
[04-08 14:45] Target reached! Generate final report.
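Because the Key Results entries follow a regular pattern, they are easy to read back programmatically. The sketch below pulls the experiment ID and accuracy out of each entry and finds the best run. It assumes the `[timestamp] ExpNNN: ... acc=XX.X%` shape shown in the example above, and `parse_results` and `best_result` are hypothetical helpers.

```python
import re

# Matches lines such as:
# [04-08 06:00] Exp003: ViT-B/16, lr=3e-4 + cosine, acc=77.9% (new best!)
RESULT_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+(?P<exp>Exp\d+):.*?acc=(?P<acc>\d+(?:\.\d+)?)%"
)


def parse_results(log_text):
    """Return one dict per Key Results entry; other lines are ignored."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results.append({
                "stamp": match["stamp"],
                "exp": match["exp"],
                "acc": float(match["acc"]),
            })
    return results


def best_result(log_text):
    """Return the entry with the highest accuracy, or None if there is none."""
    results = parse_results(log_text)
    return max(results, key=lambda r: r["acc"]) if results else None


sample = """# Memory Log
## Key Results
[04-07 14:30] Exp001: ResNet-50 baseline, lr=0.1, acc=76.1%
[04-07 22:15] Exp002: ViT-B/16, lr=1e-3, acc=74.8% (underperforming, lr too high)
[04-08 06:00] Exp003: ViT-B/16, lr=3e-4 + cosine, acc=77.9% (new best!)
[04-08 14:45] Exp004: ViT-B/16, lr=3e-4 + cosine + mixup, acc=78.3% (target reached!)
## Recent Decisions
[04-07 14:30] Start with ResNet-50 baseline to establish reference
"""

best = best_result(sample)
print(best["exp"], best["acc"])  # Exp004 78.3
```

Lines under Recent Decisions carry no `ExpNNN:` prefix, so the pattern skips them.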