
autoresearch

Autonomous iterative research with mechanical quality gates — multi-round loops, per-round verification, agent doesn't self-decide completion (gate does). **Use this proactively for ≥3 research questions or multi-source verification** — even if the user just says 'research X'. For quick lookups (1–2 questions, single source), use web-research instead. Triggers: 'autoresearch', '自动研究', '深度调研', 'deep research', '多轮调研', 'comprehensive research'. Works standalone or under Mercury dispatch.


Install command

npx skills add https://github.com/392fyc/Mercury --skill autoresearch

Copy and paste this command into Claude Code to install the skill

Source

392fyc/Mercury

Updated May 9, 2026 at 03:32

SKILL.md


Autoresearch Protocol

Purpose

Autonomous iterative research for comprehensive investigations. Inspired by Karpathy's autoresearch philosophy: the agent does NOT decide when research is complete -- only the mechanical quality gate does. This protocol is slightly more relaxed than the original NEVER STOP directive: the loop terminates when all gate metrics pass (or when the MAX_ROUNDS cap is reached), but the agent may never self-declare completion or skip the gate.

When This Applies

researchScope === "deep" (Mercury dispatch)
Research questions >= 3
Cross-verification across >= 3 independent sources required
Architectural decision analysis (comparing alternatives)
User invokes /autoresearch or says "自动研究" / "深度调研"

For lighter research (1-2 questions, single-source verification), use the web-research skill instead.

Iron Rules

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE -- confidence is not evidence.
Every factual claim requires a source URL or an explicit UNVERIFIED tag -- no exceptions.
Agent self-reports are not evidence -- use independent verification (subagent or checklist).
"Should work" / "probably" / "I believe" are banned -- use "verified at [URL]" or "UNVERIFIED".
Never paste raw WebSearch/WebFetch output into the report or conversation -- extract claim + URL + 1-sentence evidence only. Raw search dumps bloat context and have caused session stops (Issue #215). Use the Search Worker Protocol (below) to keep raw results out of the autoresearch agent's own context.

Rationalization Prevention

Excuse	Reality
"Should be done now"	Run the quality gate
"I'm confident in these findings"	Confidence != Evidence
"The search results confirmed it"	Show the URL and cited text
"I covered the main points"	Check question_answer_rate >= 0.9
"Further research would be diminishing returns"	Only the quality gate decides that

Invocation & Bootstrap

Argument Parsing

/autoresearch <topic>

Optional directives (append to topic or set in dispatch prompt):

MAX_ROUNDS: N -- hard cap on iterations (default: 10)
QUESTIONS: Q1; Q2; Q3 -- explicit research questions (otherwise auto-generated)

Environment Detection (do this FIRST)

Check if Mercury_KB/04-research/ exists in the workspace:
- YES -> Mercury mode
  - Report: Mercury_KB/04-research/RESEARCH-{TOPIC}-{ID}.md
  - State: Mercury_KB/04-research/.research-state/
  - If a TaskBundle is in the dispatch prompt, read task metadata from it
  - RESULTS_FILE: results-{ISSUE_NUM}.jsonl (issue number from TaskBundle)
- NO -> Standalone mode
  - Report: .research/reports/RESEARCH-{TOPIC}-{DATE}.md
  - State: .research/state/
  - RESULTS_FILE: results.jsonl (no issue number in standalone mode)
  - Create the directories with the Bash tool: mkdir -p .research/reports .research/state (Claude Code's Bash tool runs a bash shell on all platforms, including Windows)
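The detection step can be scripted. Below is a minimal Python sketch, assuming it runs from the workspace root. `detect_paths` and its `run_id`/`issue_num` parameters are hypothetical names. The path patterns are the ones listed above. In standalone mode the function also creates the directories, mirroring the `mkdir -p` call.

```python
from datetime import date
from pathlib import Path


def detect_paths(root, topic, run_id=None, issue_num=None):
    """Resolve report/state/results locations for Mercury or standalone mode.

    Hypothetical helper: the path patterns follow the protocol text, but the
    function name and parameters are illustrative only.
    """
    root = Path(root)
    kb = root / "Mercury_KB" / "04-research"
    if kb.is_dir():
        # Mercury mode: KB paths, results file keyed by the TaskBundle issue number.
        return {
            "mode": "mercury",
            "report": kb / f"RESEARCH-{topic}-{run_id}.md",
            "state_dir": kb / ".research-state",
            "results_file": f"results-{issue_num}.jsonl",
        }
    # Standalone mode: equivalent of `mkdir -p .research/reports .research/state`.
    reports = root / ".research" / "reports"
    state = root / ".research" / "state"
    reports.mkdir(parents=True, exist_ok=True)
    state.mkdir(parents=True, exist_ok=True)
    return {
        "mode": "standalone",
        "report": reports / f"RESEARCH-{topic}-{date.today().isoformat()}.md",
        "state_dir": state,
        "results_file": "results.jsonl",
    }
```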

Research Manifest

On Round 1, create research-manifest.json in the state directory:

{
  "topic": "Your research topic",
  "questions": ["Q1: ...", "Q2: ...", "Q3: ..."],
  "max_rounds": 10,
  "started_at": "2026-04-05T12:00:00Z",
  "mode": "standalone"
}

If no QUESTIONS directive was provided, decompose the topic into 3-7 focused research sub-questions before starting.

Create the initial report file with the topic as H1 and questions as an H2 checklist.
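Here is a sketch of the Round 1 bootstrap in Python. The manifest keys match the example above, and `bootstrap` is a hypothetical helper. The instruction says "questions as an H2 checklist". This sketch renders that as a `## Research Questions` heading followed by `- [ ]` items. That is one reading of the instruction, not a confirmed format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def bootstrap(state_dir, report_path, topic, questions, max_rounds=10, mode="standalone"):
    """Write research-manifest.json and the initial report skeleton (Round 1 only)."""
    manifest = {
        "topic": topic,
        "questions": [f"Q{i}: {q}" for i, q in enumerate(questions, start=1)],
        "max_rounds": max_rounds,
        # UTC timestamp in the same shape as the manifest example.
        "started_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "mode": mode,
    }
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / "research-manifest.json").write_text(
        json.dumps(manifest, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    # Topic as H1, then one unchecked checklist item per question under an H2.
    lines = [f"# {topic}", "", "## Research Questions", ""]
    lines += [f"- [ ] {q}" for q in manifest["questions"]]
    Path(report_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```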

Research Loop

You are in a loop. DO NOT declare completion. DO NOT summarize prematurely. Only the mechanical quality gate (Step 5) can end this loop. You may NOT judge "good enough" -- the gate decides.

Round N:
  1. RESTORE  -- Read research-manifest.json + {RESULTS_FILE} + report
                (for reports > 200 lines, read only the section for the current question)
  2. PLAN     -- Pick 1-3 unanswered or weakest questions for this round
  3. SEARCH   -- Dispatch one worker sub-agent per selected question via
                Agent() (see "Search Worker Protocol" below). Worker does
                WebSearch + WebFetch (minimum 3 searches, different angles)
                and returns a compressed summary under 500 tokens. If Agent()
                is unavailable (nested subagent / Codex mode), call WebSearch
                directly but extract only claim + URL + 1-sentence evidence;
                never leave raw search result text in conversation context.
  4. WRITE    -- Update report with findings, cite every claim with [URL]
                or mark UNVERIFIED. Document contradictions between sources.
  5. GATE     -- Run mechanical quality gate (see below)
  6. LOG      -- Append round JSON to {RESULTS_FILE}
  7. BRANCH   -- ALL gate metrics PASSED -> go to VERIFICATION
                ANY metric FAILED and N < max_rounds -> go to Round N+1
                ANY metric FAILED and N = max_rounds -> go to VERIFICATION with gaps flagged
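The BRANCH step is plain control flow. Below is a minimal sketch, assuming the gate reports a mapping from metric name to a pass/fail boolean. The function name and return labels are illustrative only.

```python
def next_step(round_n, max_rounds, gate_metrics_passed):
    """Return the BRANCH decision for round N.

    gate_metrics_passed maps metric name -> bool, as produced by the gate.
    """
    # An empty mapping never counts as a pass.
    if gate_metrics_passed and all(gate_metrics_passed.values()):
        return "VERIFICATION"
    if round_n >= max_rounds:
        # Round cap reached without a passing gate: verify anyway, flag the gaps.
        return "VERIFICATION_WITH_GAPS"
    return f"ROUND_{round_n + 1}"
```

The pass case is checked first. As a result, a gate that passes on the final round goes to ordinary verification rather than the gaps-flagged path.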

Return Contract

When autoresearch terminates and returns control to the calling agent (Mercury main session, standalone user, or orchestrator), the completion return message is governed by a content-kind constraint, not a literal field list. The constraint is: no raw findings, no raw search snippets, no full report body; only summary metadata and pointers.

Concretely, the return MUST include at minimum:

Report file path — a repo-relative path (or Mercury_KB/04-research/...-prefixed path under Mercury orchestration) to the final research report. Do NOT return absolute filesystem paths — they leak host info and are not portable across environments. Example: Mercury_KB/04-research/RESEARCH-topic-001.md or .research/reports/RESEARCH-topic-2026-05-09.md.
Verdict + per-metric scores — the gate metrics from the final round (question_answer_rate, citation_density, unverified_rate) and the verification verdict (PASS / PARTIAL / FAIL / mechanical_only).
Gap list — any research questions that remain unanswered or UNVERIFIED after all rounds, or an explicit "None" if none remain.
Optional one-line topic recap — a single sentence restating what was researched (omit if the calling context already contains the topic).

Explicitly forbidden in the return message:

Raw findings text or raw search result snippets
Raw round-by-round log entries or per-round source lists
Full report body or large excerpts from the report
Full verification checklist content (only the final verdict score, not the checklist rows)

Permitted summary metadata (conforming to this contract): The Rounds: N line, the per-metric breakdown table, and the orchestrator JSON receipt (Mercury Integration mode) are all allowed — they are summary metadata, not raw findings. The two canonical implementations of this contract are:

Final Output template (in the Termination & Output section below): the human-readable summary format used in standalone and interactive invocations.
Mercury Integration JSON receipt: the machine-readable format used when running under Mercury orchestrator dispatch.

Both satisfy the content-kind constraint. Use whichever matches the invocation context; in Mercury mode, emit both (the human-readable summary for the session transcript and the JSON receipt for the orchestrator record_receipt flow).

All raw research artifacts live in .research/reports/ (standalone mode) or Mercury_KB/04-research/ (Mercury mode) and are addressable by file path. The calling agent reads the file directly if it needs detail — the return message is a pointer + verdict, not a data dump.

Cross-link: the Search Worker Protocol below enforces the same discipline inside each research round (keeping intermediate search I/O out of the autoresearch agent's context). This section extends that discipline to the final completion return to the calling agent.

Search Worker Protocol

Why: WebSearch/WebFetch return 1-3K tokens per call. A typical research round runs 9-15 searches, injecting 15-45K tokens of raw HTML/snippet text into the autoresearch agent's own context window. Over 4+ rounds this causes context pressure and has triggered session stops (Issue #215, #101 Gap 4).

Fix: isolate search I/O inside a worker sub-agent whose only job is to search and return a compressed summary. Raw search output lives and dies inside the worker's isolated context; only the summary flows back to autoresearch.

Dispatch pattern (run once per question per round when Agent() is available):

Agent(
  description: "autoresearch worker Q{n} round {r}",
  subagent_type: "general-purpose",
  prompt: |
    You are a search worker for autoresearch.
    Round: {r}
    Question: {full question text}
    Prior findings (if any): {one-line recap, max 100 tokens}

    Your ONLY job: perform 3-5 WebSearch/WebFetch calls using varied query
    angles and return a compressed summary.

    MANDATORY output format -- under 500 tokens total, nothing else:

    ## Findings for Q{n}
    - Claim 1: <one sentence> [source URL] | UNVERIFIED: <reason if no source>
    - Claim 2: <one sentence> [source URL] | UNVERIFIED: <reason if no source>
    - Claim 3: <one sentence> [source URL] | UNVERIFIED: <reason if no source>
    - Contradiction (if any): <one sentence> [URL A] vs [URL B]
    - Unanswered aspect (if any): <one sentence>

    HARD RULES:
    - DO NOT paste raw search result snippets, titles, or metadata beyond the URL
    - DO NOT narrate your search process ("I searched for...", "I found...")
    - Each claim MUST end with EITHER `[source URL]` OR `UNVERIFIED: <reason>` — never fabricate a URL to satisfy the format. Missing evidence should be explicitly marked UNVERIFIED, not papered over.
    - Count your output tokens; if over 500, cut the weakest claims before returning.
    - If fewer than 3 substantive findings exist, return what you have and mark the remaining slots UNVERIFIED with the reason.
)

The autoresearch agent ingests only the <500 token summary per worker call. Over 4 rounds × 3 questions × 500 tokens = 6K tokens of search state, versus 60-180K tokens under the old inline pattern.
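Only the worker summary reaches the autoresearch agent, so it is worth checking that summary before ingesting it. Below is a hypothetical Python validator for the mandatory output format. It flags `- Claim` lines that end in neither a bracketed URL nor an `UNVERIFIED: <reason>` tag. It approximates tokens by counting whitespace-separated words, which is a crude stand-in for a real tokenizer. It checks format only and cannot tell whether a URL actually supports its claim.

```python
import re

# A claim line must end with a bracketed http(s) URL or carry an UNVERIFIED reason.
URL_END = re.compile(r"\[https?://[^\]\s]+\]\s*$")
UNVERIFIED = re.compile(r"UNVERIFIED:\s*\S")


def check_worker_summary(text, token_budget=500):
    """Return a list of format violations for a worker summary (empty list = OK)."""
    problems = []
    # Crude estimate: real tokenizers usually count more tokens than words,
    # so this is effectively a lower bound on the true token count.
    if len(text.split()) > token_budget:
        problems.append("over token budget")
    for line in text.splitlines():
        if line.startswith("- Claim"):
            if not (URL_END.search(line) or UNVERIFIED.search(line)):
                problems.append(f"uncited claim: {line}")
    return problems
```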

Explore guardrail

When the autoresearch worker or orchestrator invokes the Explore subagent (e.g. for codebase traversal or file discovery), the Explore dispatch prompt MUST include these constraints:

Token cap: cap return at ~5K tokens (caller-stated soft cap).
Path-only preference: when matches exceed 20 files, return file paths only (one per line) — do not include file contents, snippets, or surrounding context beyond the path.
No raw file contents: never paste raw file contents into the return. Use file:line citations with at most a 1-line context excerpt per citation.
Overflow behavior (mandatory fallback): if any of the above thresholds are exceeded, the return MUST switch to path-only mode AND emit a single explicit fallback line at the top: [guardrail-fallback: <reason>; matches=<N>; tokens≈<T>; raw output suppressed — caller may re-dispatch with narrower scope]. Do not silently truncate or arbitrarily summarize — the caller must know fallback was triggered so they can re-dispatch with tighter scope.

These constraints preserve the main session's context budget. Violation risks are the same as raw-search injection: context pressure and session stops (Issue #215, #101 Gap 4).
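The overflow rule is mechanical enough to sketch. The following Python assumes the caller has three hypothetical inputs: the match list, the formatted return body, and a token estimate. It shows the switch to path-only mode with the explicit fallback line, copied from the format above.

```python
def apply_explore_guardrail(paths, body, est_tokens, max_files=20, token_cap=5000):
    """Return the Explore result unchanged, or a path-only listing headed by
    the mandatory fallback line when either threshold is exceeded."""
    reasons = []
    if len(paths) > max_files:
        reasons.append(f"matches>{max_files}")
    if est_tokens > token_cap:
        reasons.append(f"tokens>{token_cap}")
    if not reasons:
        return body
    # Never truncate silently: the header tells the caller fallback fired.
    header = (
        f"[guardrail-fallback: {', '.join(reasons)}; matches={len(paths)}; "
        f"tokens≈{est_tokens}; raw output suppressed — caller may re-dispatch "
        "with narrower scope]"
    )
    return "\n".join([header, *paths])
```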

Fallback (nested subagent mode, Codex mode, or Agent() not available): the primary context protection is much weaker in this mode because the nested agent cannot offload search I/O to a worker. Apply this reduced-budget protocol:

Cap the search budget to 1-2 WebSearch calls per question (not the normal 3-5). Breadth is sacrificed intentionally to protect the main context window from bloat.
Extract claim + URL + 1-sentence evidence immediately and write the extracted entries to the report before the current turn ends. Do this inline, not in a subsequent turn.
Allow UNVERIFIED gaps explicitly: if 1-2 searches do not cover the question, mark the remaining slots as UNVERIFIED: only N search calls permitted in fallback mode (or other specific reason) rather than extending the search budget to chase full coverage. Coverage gaps are preferable to context pressure that stops the session entirely.
Do NOT reference the raw search results again in later turns. After extraction, treat the raw output as write-once, read-once ephemeral data — no quoting, no summarizing, no re-citing. Only the extracted claim + URL entries survive into subsequent turns.
Fallback mode terminates via max_rounds, not via gate-pass. The quality gate's unverified_rate <= 0.1 threshold is intentionally hard to satisfy in fallback mode, and that is by design. The expected termination path in fallback mode is: run all max_rounds iterations, hit the "max rounds reached → VERIFICATION with gaps flagged" branch, and emit a report with explicit UNVERIFIED items. This means:
- Set MAX_ROUNDS lower in fallback mode (2-3 instead of 5-10) to avoid burning turns on a gate that will never pass
- Do NOT retry indefinitely hoping to reduce unverified_rate below 0.1 — the budget cap makes that impossible by construction
- The final output summary must flag "mode: fallback" so downstream consumers know the report has intentionally reduced citation density
- Worker mode is the only path where gate-pass termination is likely. If the caller needs a gate-passing report, they MUST provide a top-level Agent()-capable context.

Quality Gate -- Mechanical Counting

After updating the report, evaluate by counting (not self-assessment):

Step-by-step counting procedure

Read research-manifest.json -> count total_questions
Read the report file. For each question, check:
- Has >= 2 sentences of substantive answer (not just "mentioned")
- Has at least 1 source URL in that section
- If both -> count as answered
Count all declarative factual statements -> total_claims
Count claims with [URL] or inline source reference -> cited_claims
Count literal UNVERIFIED markers -> unverified_count

Compute and check

Metric	Formula	Threshold
`question_answer_rate`	answered / total_questions	>= 0.9
`citation_density`	cited_claims / total_claims	>= 0.75
`unverified_rate`	unverified_count / total_claims	<= 0.1
`iteration_depth`	current round number	>= 4

ALL FOUR must pass. If any fails, the gate FAILS. Continue to next round.
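Once the counts exist, the gate itself is arithmetic. The following Python sketch assumes the four counts and the round number come from the counting procedure above. It treats a report with zero claims as failing `citation_density`, which is an assumption the protocol does not state.

```python
THRESHOLDS = {
    "question_answer_rate": (">=", 0.9),
    "citation_density": (">=", 0.75),
    "unverified_rate": ("<=", 0.1),
    "iteration_depth": (">=", 4),
}


def run_gate(answered, total_questions, cited_claims, unverified_count, total_claims, round_n):
    """Compute the four gate metrics from mechanical counts and check each one."""
    metrics = {
        "question_answer_rate": answered / total_questions,
        # With zero claims, density is 0.0 (fails) and unverified_rate is 0.0.
        "citation_density": cited_claims / total_claims if total_claims else 0.0,
        "unverified_rate": unverified_count / total_claims if total_claims else 0.0,
        "iteration_depth": round_n,
    }
    passed = {}
    for name, (op, limit) in THRESHOLDS.items():
        value = metrics[name]
        passed[name] = value >= limit if op == ">=" else value <= limit
    # The gate passes only if ALL FOUR metrics pass.
    return metrics, passed, all(passed.values())
```

For example, suppose 9 of 10 questions are answered, 15 of 20 claims are cited, and there is 1 UNVERIFIED marker in round 4. That gives 0.9 / 0.75 / 0.05 / 4, which passes. The same counts in round 3 fail on `iteration_depth` alone.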

Recommended metrics (informational, not blocking)

Metric	Target
`source_diversity`	>= 4 unique domains cited
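`source_diversity` can be counted from the report text. Below is a hypothetical Python helper that collects hostnames from `http(s)` URLs and folds a leading `www.`. It does not consult a public-suffix list, so `docs.python.org` and `python.org` count as different domains.

```python
import re
from urllib.parse import urlparse

# Stop a URL at whitespace or at a closing bracket/parenthesis.
URL = re.compile(r"https?://[^\s\])>]+")


def unique_domains(report_text):
    """Collect unique hostnames from every http(s) URL in the report."""
    hosts = set()
    for url in URL.findall(report_text):
        host = (urlparse(url).hostname or "").lower()
        # Treat "www.example.com" and "example.com" as the same domain.
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.add(host)
    return hosts
```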

Results JSONL

Each round, append one JSON object to {RESULTS_FILE} (determined during environment detection), serialized on a single line as JSONL requires. It is shown pretty-printed below for readability:

{
  "round": 1,
  "timestamp": "2026-04-05T12:30:00Z",
  "questions_targeted": ["Q1", "Q3"],
  "sources_found": 5,
  "sources_verified": 4,
  "question_answer_rate": 0.6,
  "citation_density": 0.75,
  "unverified_rate": 0.1,
  "iteration_depth": 1,
  "gate_passed": false,
  "notes": "Q2 and Q5 need deeper investigation"
}

On the final round, add: termination_reason, verification_verdict, verification_score.
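In Python, appending a round record is a thin wrapper around `json.dumps`. This sketch uses `append_round` as a hypothetical name. It serializes without indentation so each record stays on exactly one line, since embedded newlines in string values are escaped as `\n`. If the caller omits `timestamp`, the function fills it in the same UTC format as the example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_round(results_file, record):
    """Append one round record as a single JSON line."""
    record = dict(record)
    record.setdefault(
        "timestamp", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    )
    # No indent: json.dumps emits no raw newlines, so one record = one line.
    line = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    with Path(results_file).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```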

Context Recovery

If the session is new or resumed mid-research:

Read research-manifest.json for topic and questions
Read {RESULTS_FILE} -- find the last round metrics
Identify the lowest-scoring dimensions
Focus the current round on those gaps

This removes any dependency on the conversation context window for continuity.
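Below is a sketch of the recovery steps in Python, using `recover` as a hypothetical name. It assumes `{RESULTS_FILE}` sits in the state directory next to `research-manifest.json`; the protocol lists both as state files but does not pin the location. The sketch interprets "lowest-scoring dimensions" as the ratio metrics that miss their gate threshold, ranked by how far they miss.

```python
import json
from pathlib import Path

# Ratio thresholds from the Quality Gate table: (direction, limit).
# iteration_depth is left out: it only grows by running more rounds.
THRESHOLDS = {
    "question_answer_rate": (">=", 0.9),
    "citation_density": (">=", 0.75),
    "unverified_rate": ("<=", 0.1),
}


def recover(state_dir, results_file):
    """Reload the manifest, find the next round, and rank failing metrics."""
    state_dir = Path(state_dir)
    manifest = json.loads((state_dir / "research-manifest.json").read_text(encoding="utf-8"))
    rows = [
        json.loads(line)
        for line in (state_dir / results_file).read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    if not rows:
        return manifest, 1, []  # nothing logged yet: start at round 1
    last = rows[-1]
    gaps = []
    for name, (op, limit) in THRESHOLDS.items():
        value = last.get(name, 0.0)
        shortfall = limit - value if op == ">=" else value - limit
        if shortfall > 0:
            gaps.append((shortfall, name))
    gaps.sort(reverse=True)  # biggest miss first
    return manifest, last["round"] + 1, [name for _, name in gaps]
```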

Verification

When the gate passes (or max rounds reached), run verification:

Step A: Mechanical Checklist (MANDATORY -- always runs)

Re-read the final report. For each research question, confirm:

Question has a dedicated section in the report
Section contains >= 2 unique source URLs from different domains
No UNVERIFIED claims remain without justification for why verification was impossible
Contradictions between sources are documented (not suppressed)

Write the checklist results to verification-{TOPIC}.md in the state directory.
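Three of the four checks are mechanical. Whether contradictions were documented still requires reading the text. The Python sketch below rests on two assumptions. First, each question's section is an H2 heading that starts with its id (for example `## Q1: ...`). Second, a justified UNVERIFIED is written as `UNVERIFIED: <reason>`. All function names are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

URL = re.compile(r"https?://[^\s\])>]+")
# "UNVERIFIED" not followed by ":" and a reason counts as unjustified.
BARE_UNVERIFIED = re.compile(r"UNVERIFIED(?!:\s*\S)")


def split_sections(report_text):
    """Map each H2 heading to the body text beneath it."""
    sections, current = {}, None
    for line in report_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {k: "\n".join(v) for k, v in sections.items()}


def checklist(report_text, question_ids):
    """Return {question_id: {check_name: bool}} for the mechanical checks."""
    sections = split_sections(report_text)
    results = {}
    for qid in question_ids:
        body = next((b for h, b in sections.items() if h.startswith(qid + ":")), None)
        domains = {urlparse(u).hostname for u in URL.findall(body or "")}
        results[qid] = {
            "dedicated_section": body is not None,
            "two_domains": len(domains - {None}) >= 2,
            "unverified_justified": body is not None and not BARE_UNVERIFIED.search(body),
        }
    return results


def write_checklist(state_dir, topic, results):
    """Write verification-{TOPIC}.md with one checkbox per (question, check)."""
    lines = [f"# Verification: {topic}", ""]
    for qid, checks in results.items():
        for name, ok in checks.items():
            lines.append(f"- [{'x' if ok else ' '}] {qid} {name}")
    path = Path(state_dir) / f"verification-{topic}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```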

Step B: Adversarial Review (OPTIONAL -- attempted if Agent() is available)

IF you are the top-level agent (not running inside another subagent):

Spawn a verification subagent:

Agent(
  description: "Verify autoresearch report quality",
  prompt: [see below]
)

Verification prompt:
  You are a Research Quality Verification Agent. You are READ-ONLY.
  Read the report at [report path].
  Read research-manifest.json for the original questions.
  Read the results JSONL ({RESULTS_FILE}: results.jsonl standalone, results-{ISSUE_NUM}.jsonl Mercury) for iteration history.

  Evaluate on a 1-5 scale:
  1. Question Coverage -- Are all research questions substantively answered?
  2. Citation Density -- Do factual claims cite sources?
  3. Actionability -- Can the findings be acted upon?
  4. Risk Honesty -- Are limitations and uncertainties clearly stated?

  Weights: coverage=0.3, citation=0.25, actionability=0.25, risk_honesty=0.2
  Pass threshold: weighted average >= 4.0

  Return: VERDICT (PASS/PARTIAL/FAIL) + per-dimension scores + gaps list.
  Do NOT modify any files.

IF Agent() is NOT available (subagent context, Codex, or fork mode): Skip Step B. Log "verification_mode": "mechanical_only" in {RESULTS_FILE}. Mechanical verification from Step A is sufficient for standalone operation.
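The verification agent's arithmetic can be reproduced from its per-dimension scores. The Python sketch below uses the weights and the 4.0 pass threshold from the prompt. Because the weights sum to 1.0, the weighted sum is also the weighted average. The prompt does not define where PARTIAL ends and FAIL begins, so the sketch returns only the score and a pass flag. A small tolerance guards against floating-point error at exactly 4.0.

```python
WEIGHTS = {"coverage": 0.3, "citation": 0.25, "actionability": 0.25, "risk_honesty": 0.2}
PASS_THRESHOLD = 4.0


def weighted_score(scores):
    """Return (weighted average, passed) for 1-5 scores keyed like WEIGHTS."""
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be on the 1-5 scale")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # Tolerance so that, e.g., all-4 scores are not rejected by rounding error.
    return round(total, 4), total >= PASS_THRESHOLD - 1e-9
```

For example, the following scores give 0.3*5 + 0.25*3 + 0.25*4 + 0.2*3 = 3.85, which is below the threshold:
- coverage 5
- citation 3
- actionability 4
- risk honesty 3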

Termination & Output

Condition	Action
Gate passed + verification PASS	Complete -- print summary
Gate passed + verification PARTIAL	Address gaps, re-verify
Gate passed + verification FAIL	Continue research rounds
Max rounds reached	Flag incomplete items + print summary
Human interruption	Save state + print current progress
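The table leaves two cases open:
- a `mechanical_only` verdict
- a PARTIAL or FAIL verdict on the last allowed round

The Python sketch below resolves both with stated assumptions. A mechanical-only verdict counts as complete, per the Step B fallback note. The round cap takes precedence over re-verifying or continuing. The action labels are hypothetical.

```python
def termination_action(gate_passed, verdict=None, max_rounds_reached=False, interrupted=False):
    """Map the termination table's conditions to an action label."""
    if interrupted:
        return "save_state_and_report_progress"
    # Assumption: "mechanical_only" (Step B skipped) is treated like PASS,
    # since Step A alone is described as sufficient for standalone operation.
    if gate_passed and verdict in ("PASS", "mechanical_only"):
        return "complete"
    if max_rounds_reached:
        # Assumption: the round cap wins over "re-verify" or "continue".
        return "flag_gaps_and_report"
    if gate_passed and verdict == "PARTIAL":
        return "address_gaps_and_reverify"
    # Gate failed, or gate passed but verification FAIL: keep researching.
    return "continue_rounds"
```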

Final Output

When research terminates, print a summary to the conversation:

## Autoresearch Complete

- **Topic**: ...
- **Rounds**: N
- **Gate Metrics**: question_answer_rate=X, citation_density=X, unverified_rate=X
- **Verification**: PASS/PARTIAL/FAIL (or mechanical_only)
- **Report**: [file path]
- **Gaps**: [list any remaining gaps, or "None"]

State Externalization

All research state lives in files, not in conversation memory:

File	Purpose
`research-manifest.json`	Topic, questions, config
`{RESULTS_FILE}`	Per-round metrics log (`results.jsonl` standalone, `results-{ISSUE_NUM}.jsonl` Mercury)
`RESEARCH-{TOPIC}-*.md`	The research report
`verification-{TOPIC}.md`	Verification checklist results

This means:

A new session can pick up where a previous one left off
Context window exhaustion does not lose progress
Multiple agents can read the same state

Mercury Integration

When running under Mercury orchestrator (auto-detected via Mercury_KB/04-research/ existence):

Report and state files use Mercury KB paths instead of .research/
TaskBundle fields (researchScope, readScope, definitionOfDone) are read from the dispatch prompt
Results JSONL uses issue number: results-{ISSUE_NUM}.jsonl
Receipt JSON format follows Mercury SoT workflow
On completion, output a JSON receipt for the orchestrator record_receipt flow

The skill auto-detects this. No manual configuration needed.

Mercury JSON receipt schema

When running under Mercury orchestrator, emit a final JSON receipt of this shape (this is one canonical implementation of the Return Contract):

{
  "topic": "<research topic>",
  "rounds": "<int>",
  "report_path": "<repo-relative path under Mercury_KB/04-research/>",
  "verdict": "PASS|PARTIAL|FAIL|mechanical_only",
  "metrics": {
    "question_answer_rate": "<float>",
    "citation_density": "<float>",
    "unverified_rate": "<float>",
    "iteration_depth": "<int>"
  },
  "gaps": ["<gap 1>", "..."],
  "termination_reason": "gate_passed|max_rounds|interrupted"
}

The report_path field MUST be a repo-relative path (no absolute filesystem paths — see Return Contract for rationale). The receipt content is summary metadata only — no raw findings, no raw search snippets, no full report body.
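Below is a sketch of building and checking the receipt in Python. It makes these choices:
- It reads the quoted `"<int>"` and `"<float>"` placeholders as numeric types.
- It rejects absolute POSIX and Windows paths.
- It restricts `report_path` to the `Mercury_KB/04-research/` prefix named in the schema.

`build_receipt` is a hypothetical name, not part of Mercury's API.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath

VERDICTS = {"PASS", "PARTIAL", "FAIL", "mechanical_only"}
REASONS = {"gate_passed", "max_rounds", "interrupted"}
METRIC_KEYS = ("question_answer_rate", "citation_density", "unverified_rate", "iteration_depth")


def build_receipt(topic, rounds, report_path, verdict, metrics, gaps, termination_reason):
    """Assemble the receipt JSON, rejecting values the Return Contract forbids."""
    # Reject absolute paths in POSIX (/home/...) or Windows (C:\...) form.
    if PurePosixPath(report_path).is_absolute() or PureWindowsPath(report_path).is_absolute():
        raise ValueError("report_path must be repo-relative")
    if not report_path.startswith("Mercury_KB/04-research/"):
        raise ValueError("report_path must live under Mercury_KB/04-research/")
    if verdict not in VERDICTS or termination_reason not in REASONS:
        raise ValueError("unknown verdict or termination_reason")
    receipt = {
        "topic": topic,
        "rounds": int(rounds),
        "report_path": report_path,
        "verdict": verdict,
        # Summary metrics only: no findings, snippets, or report body.
        "metrics": {k: metrics[k] for k in METRIC_KEYS},
        "gaps": list(gaps),
        "termination_reason": termination_reason,
    }
    return json.dumps(receipt, ensure_ascii=False)
```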

name	autoresearch
description	Autonomous iterative research with mechanical quality gates — multi-round loops, per-round verification, agent doesn't self-decide completion (gate does). Use this proactively for ≥3 research questions or multi-source verification — even if the user just says 'research X'. For quick lookups (1–2 questions, single source), use web-research instead. Triggers: 'autoresearch', '自动研究', '深度调研', 'deep research', '多轮调研', 'comprehensive research'. Works standalone or under Mercury dispatch.
user-invocable	true
allowed-tools	WebSearch, WebFetch, Read, Write, Grep, Glob, Agent, Bash
