
autoresearch


Updated April 11, 2026 at 01:21

Iterative optimization loop. Evaluates an artifact (prompt, code, config) against test cases with binary assertions, analyzes failures, generates targeted variants, and promotes winners. Use when optimizing any artifact for higher eval pass rates.


Source: sighup/autoresearch (creator: sighup)



SKILL.md

name	autoresearch
description	Iterative optimization loop. Evaluates an artifact (prompt, code, config) against test cases with binary assertions, analyzes failures, generates targeted variants, and promotes winners. Use when optimizing any artifact for higher eval pass rates.
argument-hint	[path/to/artifact | find | clean] [optional goal or constraints]
disable-model-invocation	true
allowed-tools	Bash(autoresearch-runner) Bash(cp ) Bash(rm .autoresearch/) Bash(mv .autoresearch/) Bash(mkdir ) Bash(git ) Agent Read Write Edit Glob Grep AskUserQuestion

AutoResearch: Iterative Optimization

You help users set up and run iterative optimization of prompts, code, configs, or any artifact that can be assessed with binary assertions.

User arguments: $ARGUMENTS

All working state lives under .autoresearch/ in the user's current working directory.
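Every mode below serves one number: how many binary checks the artifact passes. The actual scoring and promotion rules live in agents/optimization-loop.md, which is not reproduced here. The following is therefore only a minimal sketch under two assumptions: each (test case, assertion) pair counts as one check, and a candidate is promoted only when it strictly beats the current artifact. The two assertion functions are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical contract: an assertion receives the produced output text and returns True or False.
Assertion = Callable[[str], bool]

def pass_rate(outputs: Dict[str, str], assertions: List[Assertion]) -> float:
    """Fraction of (test case, assertion) checks that pass across all outputs."""
    checks = [check(text) for text in outputs.values() for check in assertions]
    return sum(checks) / len(checks) if checks else 0.0

def should_promote(candidate_rate: float, current_rate: float) -> bool:
    # Assumed rule: a variant replaces the current artifact only if it strictly wins.
    return candidate_rate > current_rate

def has_heading(text: str) -> bool:
    return text.lstrip().startswith("#")

def is_short(text: str) -> bool:
    return len(text) < 200

if __name__ == "__main__":
    current = {"t1": "# Summary\nok", "t2": "no heading"}
    candidate = {"t1": "# Summary\nok", "t2": "# Fixed\nok"}
    checks = [has_heading, is_short]
    print(pass_rate(current, checks))    # 0.75
    print(pass_rate(candidate, checks))  # 1.0
    print(should_promote(pass_rate(candidate, checks), pass_rate(current, checks)))  # True
```

A stricter variant would count a test case as passing only when all of its assertions pass. That rewards fully correct outputs but hides partial progress between iterations.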

Cleanup mode

If the user's argument is "clean" or "cleanup" (e.g. /autoresearch clean), skip everything else. Instead:

Check if .autoresearch/ exists. If not, tell the user there's nothing to clean up.
Show the user what's in .autoresearch/ — number of result files, candidates, history entries, and the current config.
Ask what they want to do:
- Remove everything — delete the entire .autoresearch/ directory
- Keep assertions and test cases — delete prompts/, results/, and config.json but preserve assertions.py and test_cases.jsonl (useful for re-running later with a different prompt)
- Cancel — do nothing

After cleanup, confirm what was removed.
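The cleanup choices can be sketched with the standard library. This is a hypothetical helper set, not part of the skill. It counts entries for the summary and implements the two destructive options exactly as listed. The "keep assertions and test cases" option removes only prompts/, results/, and config.json, so anything else (for example a custom-runner history/ directory) is left in place.

```python
import shutil
import tempfile
from pathlib import Path

def summarize(root: Path) -> dict:
    """Counts shown to the user before asking which cleanup option they want."""
    def count(sub: str) -> int:
        d = root / sub
        return sum(1 for _ in d.iterdir()) if d.is_dir() else 0
    return {
        "results": count("results"),
        "candidates": count("prompts/candidates"),
        # Prompt mode keeps history under prompts/history; custom runner mode uses history/.
        "history": count("prompts/history") + count("history"),
        "has_config": (root / "config.json").is_file(),
    }

def remove_everything(root: Path) -> None:
    """Option 1: delete the entire .autoresearch/ directory."""
    shutil.rmtree(root)

def keep_assertions_and_tests(root: Path) -> list:
    """Option 2: delete prompts/, results/ and config.json; leave other files alone."""
    removed = []
    for name in ("prompts", "results", "config.json"):
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name + "/")
        elif target.exists():
            target.unlink()
            removed.append(name)
    return removed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / ".autoresearch"
        (root / "results").mkdir(parents=True)
        (root / "results" / "run1.json").write_text("{}")
        (root / "config.json").write_text("{}")
        (root / "assertions.py").write_text("")
        print(summarize(root))
        print(keep_assertions_and_tests(root))  # ['results/', 'config.json']
        print(sorted(p.name for p in root.iterdir()))  # ['assertions.py']
```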

Discovery mode

If the user's argument is "find", "scout", or "discover" (e.g. /autoresearch find), skip Phase 1 and help them find candidates in their repo.

Read ${CLAUDE_PLUGIN_ROOT}/skills/autoresearch/references/candidates.md for the discovery heuristics.
Scan the repo using those heuristics:
- Use Glob for file pattern matches (prompts, configs, CI files)
- Use Grep for inline prompt detection and slowness signals
- Check for existing eval infrastructure
Rank candidates by setup effort (easy first) and signal strength. Cap at 5-8 results.
Present findings in this format:

I found [N] candidates in this repo. Ranked by how easy they'd be to start:

1. [Name] — [path]
   Type: [prompt / performance / quality / LLM integration]
   Signal: [one-line reason]
   Setup: [easy (prompt mode) / medium (custom runner)]
   Start with: /autoresearch [path]

2. [Name] ...

Which one do you want to pursue? I can either start the setup now, or you can run /autoresearch <path> later.

If the user picks one, proceed to Phase 1 with that artifact. Otherwise, end — they can come back when ready.
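The ranking rule (easy setup first, then stronger signal, capped) can be sketched as a two-key sort. The actual heuristics are in references/candidates.md and are not shown here. The Candidate fields and the numeric 0-10 signal score are hypothetical stand-ins for whatever the scan produces. The default cap of 8 is the upper end of the 5-8 range above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    name: str
    path: str
    kind: str    # "prompt", "performance", "quality", or "LLM integration"
    setup: str   # "easy" (prompt mode) or "medium" (custom runner)
    signal: int  # hypothetical 0-10 strength score assigned during the scan

EFFORT = {"easy": 0, "medium": 1}

def rank(candidates: List[Candidate], cap: int = 8) -> List[Candidate]:
    """Easy setups first, then stronger signals; keep at most `cap` results."""
    ordered = sorted(candidates, key=lambda c: (EFFORT[c.setup], -c.signal))
    return ordered[:cap]

if __name__ == "__main__":
    found = [
        Candidate("Summarizer prompt", "src/prompts/summarizer.txt", "prompt", "easy", 6),
        Candidate("Slow test suite", "tests/", "performance", "medium", 9),
        Candidate("Classifier prompt", "src/prompts/classify.txt", "prompt", "easy", 8),
    ]
    for i, c in enumerate(rank(found), 1):
        print(f"{i}. {c.name} ({c.path}), setup: {c.setup}")
```

Sorting by effort before signal means a strong but expensive candidate, such as a slow test suite that needs a custom runner, still appears below any easy prompt-mode candidate.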

Resume mode

If .autoresearch/config.json already exists and the user didn't pass "clean", the setup is already done. Skip to Phase 2 to launch the optimization loop.
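Taken together, the three modes above and Phase 1 form a small dispatch on the first argument. Here is a minimal sketch; choose_mode() is a hypothetical name. One ordering is an assumption: this sketch checks the "find" keywords before looking for config.json, but the text does not say which wins when "find" is passed and a config already exists.

```python
from pathlib import Path

def choose_mode(arguments: str, cwd: Path = Path(".")) -> str:
    """Map the skill's arguments to cleanup, discovery, resume, or setup."""
    tokens = arguments.split()
    first = tokens[0].lower() if tokens else ""
    if first in {"clean", "cleanup"}:
        return "cleanup"
    # Assumed precedence: discovery keywords are honored even when a config exists.
    if first in {"find", "scout", "discover"}:
        return "discovery"
    if (cwd / ".autoresearch" / "config.json").is_file():
        return "resume"   # skip straight to Phase 2
    return "setup"        # Phase 1: guided setup
```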

Phase 1: Guided Setup

Walk the user through setup interactively. Check what exists, ask about what's missing, and help them build what they need.

Step 1: Identify the artifact

If the user passed a file path as the first argument (e.g. /autoresearch src/prompts/summarizer.txt), use that file.
Otherwise, check if .autoresearch/config.json has an "artifact" or "prompt" field and use that.
If neither, ask the user:

What do you want to optimize? This can be a prompt file, code, config, or any artifact. Give me a file path, or describe what it does and I'll help you find it.

Read the artifact once identified. You need to understand what it does to help with assertions and test cases.
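The lookup order in Step 1 (argument, then config field, then ask) can be sketched as follows. resolve_artifact() is a hypothetical helper. It assumes the first argument counts as a path only if it names an existing file, because the arguments may also carry a free-text goal.

```python
import json
from pathlib import Path
from typing import Optional

def resolve_artifact(arguments: str, config_path: Path) -> Optional[str]:
    tokens = arguments.split()
    # Assumption: treat the first argument as a path only if that file exists.
    if tokens and Path(tokens[0]).is_file():
        return tokens[0]
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # "artifact" is checked before the older "prompt" field.
        for key in ("artifact", "prompt"):
            if config.get(key):
                return config[key]
    return None  # nothing found: ask the user what to optimize
```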

Step 1b: Determine if a custom runner is needed

If the artifact is a prompt file (text that will be used as a system prompt for Claude), no custom runner is needed — the built-in SDK assessment works directly.

If the artifact is anything else (code, config, scripts), you need a custom runner — a shell command that takes the artifact, runs it or applies it, and produces output for assertions to grade. Ask the user:

This looks like [code/config/etc.], not a prompt. To assess it, I need a command that runs or applies it and produces measurable output.

For example, if you're optimizing test performance, the runner might be bash run_tests.sh which runs the test suite and reports timing.

What command should I use to assess each variant? It will receive these environment variables:
- AUTORESEARCH_ARTIFACT: path to the artifact
- AUTORESEARCH_TEST_ID: which test case is being run
- AUTORESEARCH_TEST_INPUT: the test case input text

Its stdout becomes the text that assertions check.

If the user needs help writing the runner script, help them create one. Read ${CLAUDE_PLUGIN_ROOT}/skills/autoresearch/references/custom_runner_example.sh for the expected contract.
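From the caller's side, the runner contract can be sketched as below. This is not the bundled autoresearch-runner; run_variant() and its 300-second timeout are assumptions. The sketch sets the three documented environment variables, runs the command through the shell, and returns stdout. The contract only specifies stdout, so this sketch does not decide how a non-zero exit code should be scored.

```python
import os
import shlex
import subprocess
import sys

def run_variant(runner: str, artifact: str, test_id: str, test_input: str,
                timeout: float = 300.0) -> str:
    """Run one test case through a custom runner and return its stdout."""
    env = {
        **os.environ,
        "AUTORESEARCH_ARTIFACT": artifact,
        "AUTORESEARCH_TEST_ID": test_id,
        "AUTORESEARCH_TEST_INPUT": test_input,
    }
    result = subprocess.run(runner, shell=True, env=env, capture_output=True,
                            text=True, timeout=timeout)
    # result.returncode is available if a failing exit should also count as a failure.
    return result.stdout

if __name__ == "__main__":
    # A stand-in runner that just echoes the test id it was given.
    cmd = shlex.quote(sys.executable) + ' -c "import os; print(os.environ[\'AUTORESEARCH_TEST_ID\'])"'
    print(run_variant(cmd, "artifact.txt", "case-1", "hello").strip())  # case-1
```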

Step 2: Identify or create assertions

Check .autoresearch/config.json for an "assertions" path, then fall back to .autoresearch/assertions.py.

If no assertions file exists, guide the user through creating one. Read the artifact and ask:

I've read your [prompt/code/config]. To optimize it, I need to know what "good output" looks like. Here's what I noticed:

[list 3-5 properties you observed, e.g. "It should produce markdown with specific sections", "Tests should all pass", "Execution time should be under a threshold"]

Which of these matter most? Are there other properties you want to enforce? I'll turn these into assertions — binary pass/fail checks that each test case must satisfy.

Based on the user's response, generate .autoresearch/assertions.py. Read ${CLAUDE_PLUGIN_ROOT}/skills/autoresearch/references/assertions_format.py for the expected format.

Good assertions are:

Structural — Check that sections, formats, or patterns exist (not subjective quality)
Binary — Unambiguous pass or fail
Independent — Each tests one thing
Named clearly — assert_has_error_handling not assert_check_3
Pure stdlib — Only use re, json, string, etc. No external dependencies.
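The qualities above might look like this in an assertions.py. The real expected format is defined in references/assertions_format.py, which is not shown here. This sketch assumes each check is a top-level assert_* function that takes the output text and returns a bool. The three checks and collect_assertions() are hypothetical examples.

```python
import re

# Assumed contract: each assert_* function receives the output text and returns True or False.

def assert_has_summary_section(output: str) -> bool:
    """Structural: a '## Summary' heading is present."""
    return re.search(r"^## Summary\b", output, re.MULTILINE) is not None

def assert_no_placeholder_text(output: str) -> bool:
    """Binary: fails if TODO or TBD slipped into the output."""
    return re.search(r"\b(TODO|TBD)\b", output) is None

def assert_under_300_words(output: str) -> bool:
    """Independent: checks length only, nothing about content."""
    return len(output.split()) < 300

def collect_assertions(namespace: dict) -> dict:
    """Gather every callable whose name starts with 'assert_'."""
    return {name: fn for name, fn in namespace.items()
            if name.startswith("assert_") and callable(fn)}
```

Each check uses only re and tests a single property. A failure therefore points to one specific problem, which gives the loop something concrete to target when it generates variants.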

Step 3: Identify or create test cases

Check .autoresearch/config.json for a "test_cases" path, then fall back to .autoresearch/test_cases.jsonl.

If no test cases file exists, guide the user through creating them. Ask:

What kinds of inputs will this [prompt/code/config] handle? Describe the categories or give me a few examples. I'll generate a test suite.

Tips for good test cases:

Cover each category the artifact should handle

Mix simple and complex inputs

Include edge cases where the artifact might struggle

Use realistic inputs — the closer to real usage, the better

Based on the user's response, generate .autoresearch/test_cases.jsonl. Each line should be {"id": "...", "input": "...", "category": "..."}.

Aim for 10-20 test cases across 3-5 categories. Fewer test cases means faster cycles; more means higher confidence.
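Writing and checking the JSONL file can be sketched as below. These are hypothetical helpers. The required keys come from the line format above. Rejecting duplicate ids is an added assumption, since results are keyed per test case.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = {"id", "input", "category"}

def write_test_cases(cases: List[Dict], path: Path) -> None:
    """One JSON object per line, as test_cases.jsonl expects."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(case) + "\n")

def load_test_cases(path: Path) -> List[Dict]:
    """Parse the file, skipping blank lines and rejecting malformed or duplicate cases."""
    cases, seen = [], set()
    for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
        if not line.strip():
            continue
        case = json.loads(line)
        missing = REQUIRED_KEYS - set(case)
        if missing:
            raise ValueError(f"line {lineno}: missing {sorted(missing)}")
        if case["id"] in seen:
            raise ValueError(f"line {lineno}: duplicate id {case['id']!r}")
        seen.add(case["id"])
        cases.append(case)
    return cases

def category_counts(cases: List[Dict]) -> Counter:
    """How many cases each category has, for checking balance across 3-5 categories."""
    return Counter(c["category"] for c in cases)
```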

Step 4: Confirm and initialize

Summarize the setup for the user:

Here's what I've set up:

Artifact: [path] — [brief description of what it does]

Runner: [built-in SDK / custom command]

Assertions: [count] checks — [list names]

Test cases: [count] cases across [count] categories — [list categories]

Ready to start the optimization loop?

Wait for confirmation before proceeding.

Then initialize .autoresearch/:

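Prompt mode (no custom runner): create the working directories and copy the source prompt, then write the config to .autoresearch/config.json: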

mkdir -p .autoresearch/prompts/candidates .autoresearch/prompts/history .autoresearch/results
cp <source-prompt> .autoresearch/prompts/current.txt

{
  "artifact": "<path to source prompt>",
  "assertions": "<path to assertions file>",
  "test_cases": "<path to test cases file>",
  "model": "sonnet"
}

Custom runner mode: create the working directories, then write the config to .autoresearch/config.json:

mkdir -p .autoresearch/history .autoresearch/results

{
  "artifact": "<path to artifact being optimized>",
  "runner": "<shell command to run assessment>",
  "assertions": "<path to assertions file>",
  "test_cases": "<path to test cases file>"
}

The model field is optional (defaults to "sonnet", only used in prompt mode). The runner field triggers custom runner mode.
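The two initialization paths can be expressed as one helper. This is a sketch, not the skill's own code: initialize() is a hypothetical name. It creates the directories listed for each mode and writes config.json with the fields shown for that mode: model only in prompt mode, runner only in custom runner mode.

```python
import json
import shutil
from pathlib import Path
from typing import Optional

def initialize(root: Path, artifact: str, assertions: str, test_cases: str,
               runner: Optional[str] = None, model: str = "sonnet") -> dict:
    """Set up .autoresearch/ for prompt mode (runner is None) or custom runner mode."""
    if runner is None:
        for sub in ("prompts/candidates", "prompts/history", "results"):
            (root / sub).mkdir(parents=True, exist_ok=True)
        # Prompt mode works on a copy of the source prompt.
        shutil.copyfile(artifact, root / "prompts" / "current.txt")
        config = {"artifact": artifact, "assertions": assertions,
                  "test_cases": test_cases, "model": model}
    else:
        for sub in ("history", "results"):
            (root / sub).mkdir(parents=True, exist_ok=True)
        # The presence of "runner" is what selects custom runner mode.
        config = {"artifact": artifact, "runner": runner,
                  "assertions": assertions, "test_cases": test_cases}
    (root / "config.json").write_text(json.dumps(config, indent=2) + "\n")
    return config
```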

Phase 2: Launch Optimization Loop

After setup is complete (or when resuming an existing config), launch the optimization loop in a subagent:

Read the loop instructions from ${CLAUDE_PLUGIN_ROOT}/agents/optimization-loop.md (skip the YAML frontmatter)
Use the Agent tool with:
- prompt: The loop instructions you just read, followed by any user constraints from $ARGUMENTS (e.g. "target 95% pass rate", "focus on auth category")
- run_in_background: false
- Do NOT set subagent_type — use the default general-purpose agent

Running in the foreground keeps the user in the loop — they can see progress, approve permissions, and the loop agent can spawn its own parallel subagents for candidate evaluation.
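Building the subagent prompt (loop instructions without their YAML frontmatter, followed by any user constraints) can be sketched as below. It assumes the frontmatter is delimited by "---" lines at the very top of optimization-loop.md. The "User constraints:" label is a hypothetical choice, not something the loop instructions require.

```python
def strip_frontmatter(text: str) -> str:
    """Drop a leading '---' ... '---' block; return the text unchanged if there is none."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[i + 1:]).lstrip("\n")
    return text

def build_loop_prompt(instructions_md: str, constraints: str) -> str:
    """Loop instructions first, then any user constraints taken from the arguments."""
    body = strip_frontmatter(instructions_md).rstrip()
    if constraints.strip():
        body += "\n\nUser constraints: " + constraints.strip()
    return body
```

An unterminated frontmatter block is left untouched rather than partially stripped. This way a malformed file never loses its instructions silently.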
