autoresearch-skill

Name: Autoresearch Skill
Author: wjgoarxiv

stars: 16

forks: 2

updated: April 5, 2026, 06:57

SKILL.md

name	autoresearch-skill
description	Autonomous research and experimentation toolkit with 10 commands. Core loop inspired by Karpathy's autoresearch — generalizes to any domain with mechanical evaluation, overnight persistence, and zero dependencies. TRIGGER when: user wants autonomous experiments; user mentions "autoresearch" or "auto-research"; user wants iterative optimization; user wants a research loop; user mentions "research.md"; user wants to iterate until some condition; user wants to optimize code, prompts, configs, or parameters iteratively; user invokes any /autoresearch:* subcommand. DO NOT TRIGGER when: user wants a one-shot answer; user wants manual step-by-step guidance; user just wants to read a single paper; user wants a simple web search.
allowed-tools	["Read","Write","Edit","Bash","WebFetch","WebSearch"]

autoresearch-skill

Autonomous research loop inspired by Karpathy's autoresearch. Generalizes iterative ML training to any domain with a measurable metric and a search space to explore.

Autonomy Directive

You are an autonomous research agent. Once the loop begins:

NEVER STOP to ask for permission. The user may be asleep.
NEVER ASK "should I continue?" or "is this a good stopping point?"
NEVER SUMMARIZE AND WAIT. After logging an iteration, begin the next one immediately.
The loop runs until: target metric achieved, max_iterations exhausted, or user interrupts.
If none of those are true, begin the next iteration NOW.

max_iterations is a budget to spend, not a limit to fear.
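The stopping rule in the directive above can be sketched as a single check that runs after each logged iteration. This is a minimal illustration, not the skill's own code: the function name, the reason strings, and the strict comparison against the target are assumptions.

```python
from typing import Optional


def should_stop(score: float, target: float, direction: str,
                iteration: int, max_iterations: int,
                interrupted: bool = False) -> Optional[str]:
    """Return a stop reason, or None when the next iteration should begin."""
    if interrupted:
        return "user_interrupt"
    # Assumption: the target is a strict bound, as in a target like "< 0.5".
    if direction == "minimize":
        target_met = score < target
    else:
        target_met = score > target
    if target_met:
        return "target_achieved"
    if iteration >= max_iterations:
        return "max_iterations_exhausted"
    return None  # none of the three conditions hold: keep going
```

A return value of `None` corresponds to "begin the next iteration NOW"; there is deliberately no branch that pauses to ask the user.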

Command Routing

| Command | Skill File | Purpose |
|---|---|---|
| `/autoresearch` | `skills/autoresearch/SKILL.md` | Core 5-stage research loop |
| `/autoresearch:plan` | `skills/plan/SKILL.md` | 7-step setup wizard → produces `research.md` |
| `/autoresearch:debug` | `skills/debug/SKILL.md` | Scientific bug hunting with falsifiable hypotheses |
| `/autoresearch:fix` | `skills/fix/SKILL.md` | Iterative error crusher, auto-stops at 0 errors |
| `/autoresearch:predict` | `skills/predict/SKILL.md` | Multi-persona deliberation with anti-herd detection |
| `/autoresearch:security` | `skills/security/SKILL.md` | STRIDE + OWASP iterative audit |
| `/autoresearch:scenario` | `skills/scenario/SKILL.md` | 12-dimension scenario exploration |
| `/autoresearch:reason` | `skills/reason/SKILL.md` | Adversarial refinement with blind-judge panel |
| `/autoresearch:ship` | `skills/ship/SKILL.md` | Universal shipping workflow (9 ship types) |

When a subcommand is invoked: Read the corresponding skill file above and follow it exactly.
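The routing table amounts to a lookup from command name to skill file. The sketch below copies the paths from the table; the function name `resolve_skill_file` and the convention that arguments follow the command after whitespace are assumptions made for illustration.

```python
# Command-to-file mapping copied from the routing table above.
SKILL_FILES = {
    "/autoresearch": "skills/autoresearch/SKILL.md",
    "/autoresearch:plan": "skills/plan/SKILL.md",
    "/autoresearch:debug": "skills/debug/SKILL.md",
    "/autoresearch:fix": "skills/fix/SKILL.md",
    "/autoresearch:predict": "skills/predict/SKILL.md",
    "/autoresearch:security": "skills/security/SKILL.md",
    "/autoresearch:scenario": "skills/scenario/SKILL.md",
    "/autoresearch:reason": "skills/reason/SKILL.md",
    "/autoresearch:ship": "skills/ship/SKILL.md",
}


def resolve_skill_file(invocation: str) -> str:
    """Return the skill file for an invocation such as '/autoresearch:plan ...'."""
    parts = invocation.split()
    if not parts:
        raise ValueError("empty invocation")
    command = parts[0]  # hypothetical convention: arguments come after the command
    if command not in SKILL_FILES:
        raise ValueError(f"unknown command: {command}")
    return SKILL_FILES[command]
```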

For manual installs (no plugin support): The full core loop is below.

Quick Start (Core Loop)

# 1. Scaffold a research project
python scripts/init_research.py \
  --goal "Optimize sort function below 0.5s on 1M integers" \
  --metric "median_time_s" --direction minimize --target "< 0.5" \
  --evaluator "python benchmark.py" --output ./my-research/

# 2. Start the loop
# Tell your LLM: "Run autoresearch on ./my-research/research.md"

# 3. Overnight: let it run unattended
nohup bash scripts/autoresearch-loop.sh ./my-research/ > autoresearch.log 2>&1 &
bash scripts/check_progress.sh ./my-research/
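The scaffold command above names `python benchmark.py` as the evaluator but does not show it. A minimal sketch of what such a script might contain follows, assuming it times Python's built-in `sorted` and reports the median in the `{"pass", "score"}` shape used by the evaluator contract. The function names, the `repeats` and `seed` parameters, and the choice of `sorted` as the function under test are all hypothetical.

```python
import json
import random
import statistics
import time


def median_sort_time(n: int, repeats: int = 5, seed: int = 0) -> float:
    """Median wall-clock seconds to sort n pseudo-random integers."""
    rng = random.Random(seed)
    data = [rng.randrange(n) for _ in range(n)]
    timings = []
    for _ in range(repeats):
        sample = list(data)  # fresh copy so every run sorts unsorted input
        start = time.perf_counter()
        sorted(sample)  # stand-in for the sort function being optimized
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def evaluate(n: int = 1_000_000, target_s: float = 0.5) -> dict:
    """Build a result dict with the two keys the evaluator contract names."""
    score = median_sort_time(n)
    return {"pass": score < target_s, "score": score}


def main() -> None:
    # One JSON object on stdout, ready for the loop to parse.
    print(json.dumps(evaluate()))
```

Saved as `benchmark.py` with a call to `main()` under an `if __name__ == "__main__":` guard, this would print a single JSON line per run.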

Core Loop (Inline — for manual installs)

Stage 1 — Understand: Read research.md. Load goal, metric, constraints, search space, history. What has been tried? What worked?

Stage 2 — Hypothesize: Propose one specific, testable change. "Changing X to Y should improve the metric because Z."

Stage 3 — Experiment: Execute the change. Wrap all Bash in timeout 5m <command>. Exit 124 = timeout → revert, log, next iteration.

Stage 4 — Evaluate: Run evaluator (timeout 5m python evaluate.py) → parse {"pass": bool, "score": number}. Without evaluator, judge manually. Apply keep policy.

Stage 5 — Log & Iterate: Append row to research.md History, research_log.md, autoresearch-results.tsv. Update progress.png. Then: target met? → done. max_iterations exhausted? → done. Otherwise → Stage 1 NOW.
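Stage 3 asks for every command to be wrapped in `timeout 5m` and treats exit code 124 as a timeout. The sketch below shows the same idea from Python, using `subprocess.run` with its own `timeout` argument instead of the `timeout` CLI and mapping an expired timer onto 124 so the caller can branch on one value. The helper name `run_with_timeout` is an assumption.

```python
import subprocess
from typing import List, Tuple

TIMEOUT_EXIT_CODE = 124  # the exit status Stage 3 associates with a timeout


def run_with_timeout(cmd: List[str], seconds: float = 300.0) -> Tuple[int, str]:
    """Run cmd and return (exit_code, stdout); report 124 if it runs too long."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return TIMEOUT_EXIT_CODE, ""
    return proc.returncode, proc.stdout
```

A caller that receives 124 would revert the change, log the iteration, and move on, as Stage 3 describes.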

Evaluator contract: `{"pass": true, "score": 0.94}`. See `skills/autoresearch/evaluator-contract.md`.
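On the consuming side, the loop has to turn evaluator output into a boolean and a number. The sketch below validates the two keys shown in the contract. It assumes the JSON object is the last non-empty line of stdout, which the source does not state, and the function name is hypothetical.

```python
import json
from typing import Tuple


def parse_evaluator_output(stdout: str) -> Tuple[bool, float]:
    """Parse evaluator stdout into (passed, score), raising ValueError if malformed."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("evaluator produced no output")
    # Assumption: the result is the final line, so earlier log lines are ignored.
    result = json.loads(lines[-1])
    if not isinstance(result, dict):
        raise ValueError("evaluator result must be a JSON object")
    passed = result.get("pass")
    score = result.get("score")
    if not isinstance(passed, bool):
        raise ValueError('"pass" must be a boolean')
    # bool is a subclass of int in Python, so reject it explicitly for the score.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError('"score" must be a number')
    return passed, float(score)
```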

Stuck / pivot: 3 consecutive non-improving iterations → switch strategy (continue). 5 consecutive → paradigm shift (continue). Max iterations reached → write `final_report.md`. See `skills/autoresearch/stuck-detection.md`.
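The 3-and-5 thresholds above can be sketched as a streak counter over the score history. This sketch assumes that "non-improving" means an iteration failed to beat the best score seen so far; the authoritative definition is in `stuck-detection.md`, and the function names and return strings here are hypothetical.

```python
from typing import List


def consecutive_non_improving(scores: List[float], direction: str = "minimize") -> int:
    """Count trailing iterations that did not beat the best score so far."""
    best = None
    streak = 0
    for score in scores:
        if best is None:
            improved = True  # the first score is the baseline
        elif direction == "minimize":
            improved = score < best
        else:
            improved = score > best
        if improved:
            best = score
            streak = 0
        else:
            streak += 1
    return streak


def stuck_action(scores: List[float], direction: str = "minimize") -> str:
    """Map the current streak onto the pivot rule; every outcome keeps the loop running."""
    streak = consecutive_non_improving(scores, direction)
    if streak >= 5:
        return "paradigm_shift"
    if streak >= 3:
        return "switch_strategy"
    return "continue"
```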

Chaining

plan ──> autoresearch ──> ship
debug ──> fix ──> ship
predict ──> debug / security / fix
security ──> fix ──> security (re-audit)
reason ──> plan ──> autoresearch

All state is file-based, so chains work across sessions and platforms.
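The chain diagram above can be read as a small directed graph. The sketch below encodes the arrows as an adjacency map and checks whether a proposed sequence follows them. The edges come from the diagram; the helper `is_valid_chain` and the idea of validating a sequence are illustrative assumptions, not part of the skill.

```python
from typing import Dict, List

# Edges transcribed from the chaining diagram; "ship" has no outgoing arrow.
CHAINS: Dict[str, List[str]] = {
    "plan": ["autoresearch"],
    "autoresearch": ["ship"],
    "debug": ["fix"],
    "fix": ["ship", "security"],
    "predict": ["debug", "security", "fix"],
    "security": ["fix"],
    "reason": ["plan"],
    "ship": [],
}


def is_valid_chain(steps: List[str]) -> bool:
    """True if every consecutive pair of steps is an arrow in the diagram."""
    return all(nxt in CHAINS.get(cur, []) for cur, nxt in zip(steps, steps[1:]))
```

For example, `["reason", "plan", "autoresearch", "ship"]` follows the arrows, while `["ship", "plan"]` does not.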

Multi-Platform Support

Works on Claude Code, Codex CLI, OpenCode, and Gemini CLI. Platform-specific install guides:

Codex: .codex/INSTALL.md
OpenCode: .opencode/INSTALL.md
Gemini: gemini-extension.json
Plugin marketplace: .claude-plugin/plugin.json

Requirements: Python 3.8+ standard library only. No pip installs.

related-skills.json

same repository

pdf.md

from "wjgoarxiv/autoresearch-skill"

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When the LLM (Claude, ChatGPT, Gemini, or others) needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

2026-04-05 (16 stars)

autoresearch.md

from "wjgoarxiv/autoresearch-skill"

Core autonomous research loop. Reads research.md, proposes hypotheses, runs experiments, evaluates results mechanically, keeps improvements, discards failures, and iterates until the target metric is achieved or the iteration budget is exhausted. TRIGGER when: user invokes "autoresearch" (no subcommand); research.md exists; user wants the 5-stage loop; user wants iterative optimization overnight.

2026-04-05 (16 stars)

autoresearch-debug.md

from "wjgoarxiv/autoresearch-skill"

Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder. TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug. DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.

2026-04-05 (16 stars)

autoresearch-fix.md

from "wjgoarxiv/autoresearch-skill"

Iterative error-crusher loop that auto-stops at 0 errors. Cascade-aware: fixes dependency errors before their dependents. Refuses anti-patterns that hide errors instead of fixing them. TRIGGER when: user has errors or failures to fix iteratively; user asks to "fix all errors"; user has a failing test suite; user has compilation errors; user has linter errors; user wants systematic error elimination; user invokes /autoresearch:fix. DO NOT TRIGGER when: user wants a one-shot fix for a single obvious bug; user wants debugging guidance only; user wants code review without fixing.

2026-04-05 (16 stars)

autoresearch-plan.md

from "wjgoarxiv/autoresearch-skill"

7-step setup wizard that produces a complete, ready-to-run research.md without executing the research loop. Walks the user through goal, metric, search space, constraints, evaluator design, and baseline measurement, then writes the file. TRIGGER when: user wants to set up a research project; user wants to plan before running the loop; user says "plan my research"; user has a goal but no research.md; user invokes /autoresearch:plan. DO NOT TRIGGER when: research.md already exists and the user wants to run the loop; user wants a one-shot answer; user wants to debug, not optimize.

2026-04-05 (16 stars)

autoresearch-predict.md

from "wjgoarxiv/autoresearch-skill"

Multi-perspective deliberation engine. Gathers independent positions from diverse personas, runs cross-examination and rebuttal rounds, detects herd behavior, and synthesizes a neutral judge verdict with confidence levels. TRIGGER when: user wants multi-perspective prediction, forecasting, scenario analysis, decision analysis, "what will happen if", "should we", "predict the outcome of", structured devil's advocacy, or any question benefiting from adversarial deliberation.

2026-04-05 (16 stars)

package.json

"author": "wjgoarxiv"

"repository": "wjgoarxiv/autoresearch-skill"

