autoresearch

Autonomous AI research loop - let the agent run ML experiments overnight. Inspired by Karpathy's autoresearch. Use when: autonomous research, ml experiments, overnight training, self-improving models, auto-optimization.

Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill autoresearch

Copy and paste this command into Claude Code to install the skill

Source

hiyenwong/ai_collection

Stars: 1

Forks: 0

Updated: June 4, 2026 at 02:00

name	autoresearch
description	Autonomous AI research loop - let the agent run ML experiments overnight. Inspired by Karpathy's autoresearch. Use when: autonomous research, ml experiments, overnight training, self-improving models, auto-optimization.

AutoResearch 🔬

Let the agent run autonomous ML experiments while you sleep.

Description

AutoResearch enables the agent to autonomously iterate on machine learning experiments. It modifies code, runs training, evaluates results, and keeps improvements - looping indefinitely until manually stopped.

Inspired by karpathy/autoresearch.

Activation Keywords

autoresearch
autonomous research
overnight experiments
ml experiments loop
auto optimization
自主研究 (autonomous research)
自动实验 (automated experiments)

Prerequisites

A working ML training setup (single GPU recommended)
uv package manager: curl -LsSf https://astral.sh/uv/install.sh | sh
Clone the autoresearch repo or have your own training code

Quick Start

User: "Start autoresearch on my training code"
Agent: Reads this skill, sets up experiment loop, runs indefinitely

Experiment Loop

Phase 1: Setup

1. Agree on run tag: Create a tag based on the date (e.g., apr5)
2. Create branch: git checkout -b autoresearch/<tag>
3. Read in-scope files:
   - Training code (e.g., train.py)
   - Data prep (e.g., prepare.py) - READ ONLY
   - README.md for context
4. Verify data exists: Check that the training data is prepared
5. Initialize results log: Create results.tsv with the header row
6. Confirm setup with user
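
The run tag and results log from the setup steps can be sketched in Python. This is a minimal sketch, not part of the skill itself: the function names `make_run_tag` and `init_results_log` are hypothetical, the column names are taken from the Results Log Format section, and the month abbreviation assumes an English (default) locale.

```python
import csv
from datetime import date
from pathlib import Path

# Column names taken from the results.tsv format described in this skill.
RESULTS_HEADER = ["commit", "val_bpb", "memory_gb", "status", "description"]


def make_run_tag(today: date) -> str:
    """Build a short tag such as 'apr5' (month abbreviation + day).

    Assumption: the process runs under an English/C locale, so %b yields 'Apr'.
    """
    return f"{today.strftime('%b').lower()}{today.day}"


def init_results_log(path: Path) -> bool:
    """Create the TSV log with only a header row.

    Returns False (and leaves the file alone) if the log already exists,
    so an interrupted session does not lose earlier rows.
    """
    if path.exists():
        return False
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(RESULTS_HEADER)
    return True
```

For example, `make_run_tag(date.today())` gives the suffix for `git checkout -b autoresearch/<tag>`, and `init_results_log(Path("results.tsv"))` writes the header once.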

Git Hygiene (CRITICAL)

When the user says "commit each progress" or the session involves paper/research output:

Commit and push after every significant finding — new result, new experiment file, paper update
Use descriptive commit messages that summarize the finding (e.g., "add: expander beats ring by 54% in graph signal experiment")
Never batch multiple findings into one commit — each finding gets its own commit so the progression is traceable
Push before starting the next experiment: git add -A && git commit -m "..." && git push

Phase 2: First Baseline Run

Always run the initial training to establish baseline metrics:

uv run train.py > run.log 2>&1
grep "^val_loss:\|^val_bpb:\|^peak_vram_mb:" run.log

Record baseline in results.tsv.
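
If the agent prefers to read the metrics programmatically rather than through grep, the same lines can be parsed in Python. This is a sketch under one assumption: train.py prints each metric on its own line as `name: value`, which is what the grep pattern above suggests. The name `parse_metrics` is hypothetical.

```python
import re

# Metric names come from the grep pattern in this skill. The "name: value"
# line layout is an assumption about what train.py prints.
METRIC_LINE = re.compile(
    r"^(val_loss|val_bpb|peak_vram_mb):[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def parse_metrics(log_text: str) -> dict[str, float]:
    """Extract metrics from run.log text; the last occurrence of each name wins.

    An empty result means no metric line was found, which usually
    indicates that the run crashed before evaluation.
    """
    metrics: dict[str, float] = {}
    for name, raw_value in METRIC_LINE.findall(log_text):
        try:
            metrics[name] = float(raw_value)
        except ValueError:
            continue  # skip lines whose value is not a number
    return metrics
```

A call such as `parse_metrics(open("run.log").read())` returns a dictionary whose `val_bpb` entry is the baseline to record.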

Phase 3: Autonomous Loop

LOOP FOREVER (until manually interrupted):

1. ANALYZE current state
   - Read results.tsv to see what's been tried
   - Identify patterns: what worked, what didn't
   - Consider next experiment

2. MODIFY code
   - Edit train.py with experimental idea
   - Keep changes focused and reviewable
   
3. COMMIT
   git add -A && git commit -m "experiment: <description>"

4. RUN experiment
   uv run train.py > run.log 2>&1
   
5. EVALUATE results
   grep "^val_bpb:\|^peak_vram_mb:" run.log
   
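6. LOG to results.tsv
   (keep results.tsv and run.log out of git, e.g. via .git/info/exclude;
   otherwise git add -A commits them and git reset --hard erases the new row)
   - commit hash (7 chars)
   - metric value
   - memory usage
   - status: keep/discard/crash
   - description
   
7. DECIDE
   - Improved (lower val_bpb)? → KEEP, advance branch
   - Worse or equal? → DISCARD, git reset --hard HEAD~1
     (exception: equal val_bpb with simpler code is a KEEP, see Complexity Criterion)
   - Crashed? → LOG crash, fix or skip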

8. REPEAT
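
The DECIDE step above reduces to a small comparison. The sketch below is illustrative only: the function name `decide` is hypothetical, and treating a NaN metric as a crash is an assumption (a diverged run often reports NaN), not something the skill specifies.

```python
import math
from typing import Optional


def decide(best_val_bpb: float, new_val_bpb: Optional[float]) -> str:
    """Return the status to log for one experiment: keep, discard, or crash.

    new_val_bpb is None when run.log contained no val_bpb line.
    """
    # Assumption: a missing or NaN metric is handled like a crash.
    if new_val_bpb is None or math.isnan(new_val_bpb):
        return "crash"
    # Strictly lower is an improvement; ties are discarded.
    if new_val_bpb < best_val_bpb:
        return "keep"
    return "discard"
```

With the numbers from the results example, `decide(0.9979, 0.9932)` gives `"keep"` and `decide(0.9979, 1.005)` gives `"discard"`. The complexity criterion is a judgment call and is deliberately left out of this function.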

Results Log Format

results.tsv (tab-separated):

commit	val_bpb	memory_gb	status	description
a1b2c3d	0.997900	44.0	keep	baseline
b2c3d4e	0.993200	44.2	keep	increase LR to 0.04
c3d4e5f	1.005000	44.0	discard	switch to GELU activation
d4e5f6a	0.000000	0.0	crash	double model width (OOM)
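
A row in this format can be produced from the raw metrics as follows. This is a sketch with hypothetical helper names (`format_row`, `append_result`). It assumes `memory_gb` is `peak_vram_mb` divided by 1024, which the skill does not state explicitly.

```python
import csv


def format_row(commit: str, val_bpb: float, peak_vram_mb: float,
               status: str, description: str) -> list[str]:
    """Format one experiment as the five results.tsv columns."""
    memory_gb = peak_vram_mb / 1024  # assumption: 1 GB = 1024 MB
    # Collapse tabs/newlines so the description cannot break the TSV layout.
    clean_description = " ".join(description.split())
    return [commit[:7], f"{val_bpb:.6f}", f"{memory_gb:.1f}", status, clean_description]


def append_result(path: str, row: list[str]) -> None:
    """Append one formatted row to the tab-separated log."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerow(row)
```

For instance, `format_row("a1b2c3d4e5", 0.9979, 45056.0, "keep", "baseline")` yields the columns of the baseline row shown above.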

Experiment Ideas

Architecture Changes

Increase/decrease model depth
Change attention patterns (windowed, local, etc.)
Modify MLP activation functions
Add/remove normalization layers
Experiment with embedding sizes

Optimizer Tuning

Adjust learning rate
Try different optimizers (Adam, Muon, etc.)
Modify weight decay
Experiment with gradient clipping

Training Loop Modifications

Change batch size
Modify sequence length
Add regularization techniques
Implement learning rate schedules

Safety Rules

Rule	Detail
Fixed time budget	Each run = 5 minutes (configurable)
Single file to modify	Only edit train.py (or specified file)
No new dependencies	Use only existing packages
Read-only data prep	Never modify prepare.py
Timeout protection	Kill runs exceeding 2x time budget
Git branch isolation	All work on dedicated branch

Complexity Criterion

All else being equal, simpler is better:

Small improvement + ugly code → NOT worth it
Small improvement + deleted code → DEFINITELY keep
No improvement + simpler code → Keep (simplification win)

Weigh complexity cost against improvement magnitude.

Key Metrics

Metric	Goal	Notes
val_bpb	Lower is better	Validation bits per byte
val_loss	Lower is better	Alternative metric
peak_vram_mb	Monitor	Keep memory use from blowing up
MFU	Higher = better efficiency	Model FLOPs Utilization
tokens/sec	Higher = faster	Training throughput

Notifications

When the user returns, report:

Summary of experiments run
Best result achieved
Notable discoveries
Recommendations for next steps
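
The numeric part of that report can be derived from results.tsv. This is a minimal sketch with a hypothetical function name (`summarize_results`); it assumes the column names from the Results Log Format section and only considers rows with status `keep` when looking for the best result, so crash rows logged as 0.000000 are ignored.

```python
import csv


def summarize_results(path: str) -> dict:
    """Count rows by status and find the kept row with the lowest val_bpb."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    kept = [row for row in rows if row["status"] == "keep"]
    # Lower val_bpb is better; default=None covers a log with no kept rows.
    best = min(kept, key=lambda row: float(row["val_bpb"]), default=None)
    return {
        "total": len(rows),
        "kept": len(kept),
        "discarded": sum(row["status"] == "discard" for row in rows),
        "crashed": sum(row["status"] == "crash" for row in rows),
        "best_commit": best["commit"] if best else None,
        "best_val_bpb": float(best["val_bpb"]) if best else None,
    }
```

Notable discoveries and recommendations still need the agent's own judgment; the function only supplies the counts and the best row.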

Error Handling

Crashes

Easy fix (typo, missing import) → Fix and re-run
Fundamental issue → Log crash, skip idea

OOM (Out of Memory)

Reduce batch size
Reduce model size
Log as crash, try alternative

Timeout

Kill process after 2x budget
Log with status crash, then revert (git reset --hard HEAD~1)

Example Session

User: "Run autoresearch on nanogpt overnight"

Agent:
1. Sets up branch autoresearch/apr5
2. Runs baseline: val_bpb = 1.023
3. Tries LR=0.02: val_bpb = 1.015 ✓ KEEP
4. Tries depth=16: val_bpb = 1.008 ✓ KEEP
5. Tries GELU: val_bpb = 1.010 ✗ DISCARD
6. Tries window attention: val_bpb = 1.002 ✓ KEEP
... (runs 100+ experiments overnight)

User returns to:
- 127 experiments completed
- Best val_bpb: 0.987
- Key insight: window attention + LR=0.015 works best

Advanced Usage

Multiple Agents

Run parallel experiments on different GPUs:

Agent 1: branch autoresearch/apr5-gpu0
Agent 2: branch autoresearch/apr5-gpu1

Custom Time Budget

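Set TIME_BUDGET in prepare.py or via environment variable. prepare.py is read-only for the agent (see Safety Rules), so the user should change it before the run starts: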

TIME_BUDGET=300  # 5 minutes in seconds
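
On the Python side, an environment override could be read as shown below. This is a sketch only: whether train.py or prepare.py actually consults a `TIME_BUDGET` environment variable is an assumption, and `read_time_budget` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIME_BUDGET_S = 300  # 5 minutes, the default stated in this skill


def read_time_budget(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the per-run time budget in seconds.

    Falls back to the default when TIME_BUDGET is unset, not an integer,
    or not positive.
    """
    env = os.environ if env is None else env
    try:
        value = int(env.get("TIME_BUDGET", ""))
    except ValueError:
        return DEFAULT_TIME_BUDGET_S
    return value if value > 0 else DEFAULT_TIME_BUDGET_S
```

Passing a mapping explicitly, as in `read_time_budget({"TIME_BUDGET": "600"})`, makes the helper easy to check without touching the real environment.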

Research Domain Adaptation

Adapt the skill for:

NLP experiments
Computer vision
Reinforcement learning
Any iterative optimization task

Claude Code Integration

For complex autonomous experiments, use Claude Code in tmux instead of scripting the loop yourself. This gives you reasoning, code generation, and result analysis in one agent.

Pattern

1. Create CLAUDE.md with experiment context (goal, current best, iteration plan)
2. Launch Claude Code in tmux: claude --dangerously-skip-permissions
3. Handle startup dialogs (Enter for trust, Down+Enter for permissions)
4. Give the task: specific experiments to run, in priority order
5. Monitor with: tmux capture-pane -t <session> -p -S -30
6. Claude Code will: read code → write sweep scripts → run → analyze → report
7. After it reports, check results, then send next iteration task

When to Use Claude Code vs Scripted Loop

Approach	When	Pros	Cons
Claude Code + tmux	Complex multi-step experiments needing reasoning	Has agency, can fix own bugs	Slower, costs tokens
Scripted loop (bash/Python)	Simple sweep over known parameters	Fast, cheap, reproducible	No bug-fixing, no reasoning

Pitfalls

Claude Code sessions persist in tmux — clean up with tmux kill-session -t <name>
Send multi-line tasks with Enter at the end (the ❯ prompt shows Claude is ready)
For long-running experiments, use notify_on_complete=true and check periodically

Experiment Design Pitfalls

Shared Baselines

When comparing topologies/architectures, always use identical targets (e.g., same data, same seeds, same loss function). Generating fresh targets per topology introduces confounds (different graphs produce different smoothness patterns).

Avoiding the "All Same" Result

If all topologies produce identical results, check:

Task is hard enough (model shouldn't trivially memorize)
Topology actually constrains computation (within-layer mixing is often too weak)
Enough propagation steps (diameter + 2 minimum)
Signal structure matches graph structure (nearby nodes should have correlated targets)
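
The "diameter + 2" guideline can be checked numerically before launching an experiment. The sketch below computes the diameter of an unweighted graph by breadth-first search and uses a ring as the example topology. The function names are hypothetical, and the adjacency-dictionary representation is an assumption about how the topology is stored.

```python
from collections import deque


def ring_graph(n: int) -> dict[int, list[int]]:
    """Ring topology: node i is connected to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def eccentricity(adjacency: dict[int, list[int]], source: int) -> int:
    """Largest hop distance from source to any other node (BFS)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    if len(distance) != len(adjacency):
        raise ValueError("graph is not connected")
    return max(distance.values())


def diameter(adjacency: dict[int, list[int]]) -> int:
    """Longest shortest path between any pair of nodes."""
    return max(eccentricity(adjacency, node) for node in adjacency)


def min_propagation_steps(adjacency: dict[int, list[int]]) -> int:
    """Minimum propagation steps per the 'diameter + 2' guideline."""
    return diameter(adjacency) + 2


print(min_propagation_steps(ring_graph(64)))  # 34: ring diameter is 32
```

A ring of 64 nodes needs at least 34 steps under this rule, which is one reason sparse topologies can look artificially weak when the step count is fixed too low.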

Goldilocks Zone

For communication topology experiments: too LITTLE communication (ring) → information can't spread; too MUCH communication (dense) → signal dilution; just RIGHT (expander) → optimal propagation. Start with N=64, d=4 to find the sweet spot, then scale.

Statistical Significance (CRITICAL for papers)

Stable train metrics ≠ meaningful test differences. Always run ≥3 seeds and check if test-metric rankings are consistent across seeds. If train-loss ordering is stable but test-metric ordering flips, the experiment is underpowered — report honestly as "inconclusive at current scale."

Minimum for top-tier venues (NeurIPS/ICLR/ICML):

≥5 seeds for paired statistical tests (paired t-test or Wilcoxon signed-rank)
Report mean ± std and p-values, not just best single run
If flat baseline beats all structured variants, the structural overhead may exceed its benefit at current scale

Do NOT claim a method "wins" based on train loss alone — train loss can be consistent while test metrics are noise.
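
A paired comparison across seeds can be summarized with the standard library alone. This is a sketch with a hypothetical function name (`paired_summary`). It assumes a lower-is-better test metric and that both lists hold results for the same seeds in the same order. It reports the paired t statistic but not a p-value, which needs a t-distribution CDF (for example `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
import statistics
from typing import Sequence


def paired_summary(scores_a: Sequence[float], scores_b: Sequence[float]) -> dict:
    """Summarize per-seed differences (A minus B) between two methods."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need matching per-seed scores from at least 2 seeds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    if std_diff == 0:
        # Identical difference on every seed: t is undefined (0) or unbounded.
        t_stat = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    else:
        t_stat = mean_diff / (std_diff / math.sqrt(len(diffs)))
    return {
        "mean_diff": mean_diff,
        "std_diff": std_diff,
        "t_stat": t_stat,
        # Assumption: lower is better, so a negative difference favours A.
        "seeds_where_a_better": sum(d < 0 for d in diffs),
        "n_seeds": len(diffs),
    }
```

If `seeds_where_a_better` is close to half of `n_seeds`, the ranking flips across seeds and the comparison should be reported as inconclusive.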

See references/moe-topology-experiment-lessons.md for a concrete case study (H-MoE-Topo at N=64 vs N=256).

Related Skills

arxiv-search: Find relevant papers for ideas
skill-extractor: Capture patterns from successful experiments
meta-cognitive-reflection: Reflect on research strategy

Resources

Remember: NEVER STOP until manually interrupted. The human expects you to continue working indefinitely.