
autoresearch-plan

Name: Autoresearch Plan
Author: wjgoarxiv

// 7-step setup wizard that produces a complete, ready-to-run research.md without executing the research loop. Walks the user through goal, metric, search space, constraints, evaluator design, and baseline measurement, then writes the file. TRIGGER when: user wants to set up a research project; user wants to plan before running the loop; user says "plan my research"; user has a goal but no research.md; user invokes /autoresearch:plan. DO NOT TRIGGER when: research.md already exists and the user wants to run the loop; user wants a one-shot answer; user wants to debug, not optimize.


stars: 16

forks: 2

updated: April 5, 2026 at 06:56

SKILL.md

readonly

related-skills.json

Same repository

autoresearch-skill.md

from "wjgoarxiv/autoresearch-skill"

Autonomous research and experimentation toolkit with 10 commands. Core loop inspired by Karpathy's autoresearch — generalizes to any domain with mechanical evaluation, overnight persistence, and zero dependencies. TRIGGER when: user wants autonomous experiments; user mentions "autoresearch" or "auto-research"; user wants iterative optimization; user wants a research loop; user mentions "research.md"; user wants to iterate until some condition; user wants to optimize code, prompts, configs, or parameters iteratively; user invokes any /autoresearch:* subcommand. DO NOT TRIGGER when: user wants a one-shot answer; user wants manual step-by-step guidance; user just wants to read a single paper; user wants a simple web search.

Updated 2026-04-05 · 16 stars

pdf.md

from "wjgoarxiv/autoresearch-skill"

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When the LLM (Claude, ChatGPT, Gemini, or others) needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

Updated 2026-04-05 · 16 stars

autoresearch.md

from "wjgoarxiv/autoresearch-skill"

Core autonomous research loop. Reads research.md, proposes hypotheses, runs experiments, evaluates results mechanically, keeps improvements, discards failures, and iterates until the target metric is achieved or the iteration budget is exhausted. TRIGGER when: user invokes "autoresearch" (no subcommand); research.md exists; user wants the 5-stage loop; user wants iterative optimization overnight.

Updated 2026-04-05 · 16 stars

autoresearch-debug.md

from "wjgoarxiv/autoresearch-skill"

Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder. TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug. DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.

Updated 2026-04-05 · 16 stars

autoresearch-fix.md

from "wjgoarxiv/autoresearch-skill"

Iterative error-crusher loop that auto-stops at 0 errors. Cascade-aware: fixes dependency errors before their dependents. Refuses anti-patterns that hide errors instead of fixing them. TRIGGER when: user has errors or failures to fix iteratively; user asks to "fix all errors"; user has a failing test suite; user has compilation errors; user has linter errors; user wants systematic error elimination; user invokes /autoresearch:fix. DO NOT TRIGGER when: user wants a one-shot fix for a single obvious bug; user wants debugging guidance only; user wants code review without fixing.

Updated 2026-04-05 · 16 stars

autoresearch-predict.md

from "wjgoarxiv/autoresearch-skill"

Multi-perspective deliberation engine. Gathers independent positions from diverse personas, runs cross-examination and rebuttal rounds, detects herd behavior, and synthesizes a neutral judge verdict with confidence levels. TRIGGER when: user wants multi-perspective prediction, forecasting, scenario analysis, decision analysis, "what will happen if", "should we", "predict the outcome of", structured devil's advocacy, or any question benefiting from adversarial deliberation.

Updated 2026-04-05 · 16 stars

package.json

"author": "wjgoarxiv"

"repository": "wjgoarxiv/autoresearch-skill"


name	autoresearch:plan
description	7-step setup wizard that produces a complete, ready-to-run research.md without executing the research loop. Walks the user through goal, metric, search space, constraints, evaluator design, and baseline measurement, then writes the file. TRIGGER when: user wants to set up a research project; user wants to plan before running the loop; user says "plan my research"; user has a goal but no research.md; user invokes /autoresearch:plan. DO NOT TRIGGER when: research.md already exists and the user wants to run the loop; user wants a one-shot answer; user wants to debug, not optimize.
allowed-tools	["Read","Write","Edit","Bash"]

autoresearch:plan — Research Setup Wizard

A 7-step interview that produces a complete research.md (and optionally evaluate.py) before a single experiment runs. The wizard is conversational — ask each step, wait for the answer, then proceed. Do not batch all questions at once.

Wizard Protocol

One step at a time. Present the step title and question(s). Wait for the user's response. Summarize what you recorded ("Got it — I'll set metric: accuracy, direction: maximize"). Then proceed to the next step.

Do not skip steps. Each step produces a concrete artifact that feeds Step 7. If the user's answer is vague, probe once for specificity, then record your best interpretation and note it as an assumption.

Step 1 — Goal Clarification

Probe for specificity. Vague goals produce useless research loops.

Ask:

"What are you trying to improve or discover?"
"What does success look like in concrete terms — not 'better', but what number or outcome?"
"Is there anything this work must NOT break?"

Probe rules:

If the answer contains words like "better", "faster", "improve" without a reference point → ask "compared to what baseline?"
If no domain is mentioned → ask "what system/file/model/prompt are we working on?"
If multiple goals are stated → ask "if you could only achieve one of these, which one?"

Record: goal_statement (1-2 sentences, specific and measurable)

Step 2 — Metric Definition

What to measure, how to measure it, and what direction counts as progress.

Ask:

"What is the single number that determines if this experiment succeeded or failed?"
"Are you maximizing or minimizing it?"
"What value would make you stop and say 'we're done'? That's the target."
"Is this metric noisy? (e.g., varies between runs due to randomness or timing)"

Guide the user if stuck:

Performance → latency (ms), throughput (req/s), memory (MB) — direction: minimize
Quality → accuracy (%), F1, LLM-judge score (1-10) — direction: maximize
Cost → tokens, dollars, lines of code — direction: minimize

Record: metric_name, direction (maximize/minimize), target_value, noise_runs (1 if deterministic, 3-5 if noisy)
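The values recorded in this step can be held in a small structure that also answers "are we done?". The sketch below is illustrative only: the class name `MetricSpec`, the use of the median to combine noisy runs, and the strict comparison against the target (chosen to mirror the `<` and `>` used in the evaluator templates) are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class MetricSpec:
    """Hypothetical container for the Step 2 'Record' values."""
    metric_name: str
    direction: str        # "maximize" or "minimize"
    target_value: float
    noise_runs: int = 1   # 1 if deterministic, 3-5 if noisy

    def __post_init__(self) -> None:
        if self.direction not in ("maximize", "minimize"):
            raise ValueError(f"unknown direction: {self.direction!r}")
        if self.noise_runs < 1:
            raise ValueError("noise_runs must be at least 1")

    def aggregate(self, scores: list[float]) -> float:
        """Collapse repeated runs into one number (median is an assumption)."""
        if len(scores) != self.noise_runs:
            raise ValueError(f"expected {self.noise_runs} scores, got {len(scores)}")
        return median(scores)

    def target_met(self, score: float) -> bool:
        """True once the score is strictly past the target in the chosen direction."""
        if self.direction == "maximize":
            return score > self.target_value
        return score < self.target_value
```

For example, a noisy latency metric with three runs would be built as `MetricSpec("latency_ms", "minimize", 200.0, noise_runs=3)` and fed three measurements per experiment.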

Step 3 — Search Space Mapping

Enumerate what can change and, critically, what must not.

Ask:

"What files, parameters, configs, or components can the agent modify?"
"What must never change? (test sets, APIs, data formats, production files)"
"Are there any values with hard limits? (e.g., latency must never exceed 2s even if the metric improves)"

Probe rules:

If the allowed scope is very broad → ask "can you narrow it? Broad search spaces waste iterations."
If no forbidden list is given → explicitly confirm: "So the agent has free rein except for what you just listed — is that right?"

Record: allowed_changes (bullet list), forbidden_changes (bullet list), guard (optional hard constraint)
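One way to make the allowed and forbidden lists enforceable is to treat them as glob patterns and check every modified path against them. This is a sketch under that assumption; the skill does not say the lists are globs, and `check_scope` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def check_scope(changed: list[str], allowed: list[str], forbidden: list[str]) -> list[str]:
    """Return the changed paths that fall outside the Step 3 search space.

    A path is a violation if it matches any forbidden pattern, or if it
    matches no allowed pattern. Forbidden patterns always win.
    """
    violations = []
    for path in changed:
        if any(fnmatchcase(path, pattern) for pattern in forbidden):
            violations.append(path)
        elif not any(fnmatchcase(path, pattern) for pattern in allowed):
            violations.append(path)
    return violations
```

Calling `check_scope(["src/model.py", "tests/data.csv"], allowed=["src/*.py"], forbidden=["tests/*"])` would flag only `tests/data.csv`. Note that `*` in `fnmatchcase` also matches path separators, so `src/*.py` covers nested directories as well.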

Step 4 — Constraint Elicitation

Scope the loop before it starts.

Ask:

"How many iterations should the agent run? (default: 20 — more = more thorough, takes longer)"
"Do you want the agent to pause for your review at any point, or run fully unattended?"
"Any resource limits? (time per experiment, memory, API rate limits, cost caps)"

If the user wants overnight/unattended:

Set pause_every: never
Suggest: nohup bash scripts/autoresearch-loop.sh ./research-dir/ > autoresearch.log 2>&1 &
Remind: "You can monitor progress anytime with bash scripts/check_progress.sh"

If the user wants periodic reviews:

Ask: "Every how many iterations?" → set pause_every: N

Record: max_iterations, pause_every, time_budget_per_experiment (default: 5 minutes), any resource constraints
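The interaction between `max_iterations` and `pause_every` can be sketched as below. The function names are hypothetical, and treating iterations as 1-based with a pause after every N-th one is an assumption about how the loop counts.

```python
from typing import Union


def should_pause(iteration: int, pause_every: Union[int, str]) -> bool:
    """Decide whether to stop for review after a 1-based iteration.

    pause_every is either the string "never" or a positive integer N.
    """
    if pause_every == "never":
        return False
    if not isinstance(pause_every, int) or pause_every < 1:
        raise ValueError(f"pause_every must be 'never' or a positive int, got {pause_every!r}")
    return iteration % pause_every == 0


def review_points(max_iterations: int, pause_every: Union[int, str]) -> list[int]:
    """List the iterations at which the agent would pause for review."""
    return [i for i in range(1, max_iterations + 1) if should_pause(i, pause_every)]
```

With the default of 20 iterations and a review every 5, the agent would pause four times; with `"never"` it runs unattended to the end.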

Step 5 — Evaluator Design

Can measurement be automated? This determines loop speed and quality.

Ask: "Can the success metric be measured by running a script? For example, python evaluate.py that outputs a number."

If YES — help write the evaluator:

Guide the user to produce a script that prints:

{"pass": true, "score": 0.94}

Ask clarifying questions:

"Where is the test data / benchmark?"
"What command runs the current implementation?"
"What command extracts the metric from the output?"

Offer to write an evaluate.py starter template based on their answers. Use the appropriate example from below as a starting point, adapt it to their domain, and write it to the research directory.

Template — timing/benchmark (minimize):

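#!/usr/bin/env python3
# Times the target script three times and reports the median wall-clock seconds.
import json, statistics, subprocess, sys, time

TARGET_SCRIPT = "TARGET_SCRIPT.py"  # placeholder: script under test
TARGET_VALUE = 1.0                  # placeholder: replace with the Step 2 target (seconds)

times = []
for _ in range(3):  # repeat to smooth out timing noise
    t0 = time.perf_counter()
    # Discard the target's stdout so the JSON line below is the only output.
    subprocess.run([sys.executable, TARGET_SCRIPT], check=True, stdout=subprocess.DEVNULL)
    times.append(time.perf_counter() - t0)
median = statistics.median(times)
print(json.dumps({"pass": median < TARGET_VALUE, "score": median}))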

Template — accuracy/quality (maximize):

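#!/usr/bin/env python3
# Runs test_suite.py and reads the number that follows the last "score:" in its output.
import json, subprocess, sys

TARGET_VALUE = 0.9  # placeholder: replace with the Step 2 target

result = subprocess.run([sys.executable, "test_suite.py"], capture_output=True, text=True)
score = float(result.stdout.split("score:")[-1].split()[0])  # tolerate text after the number
print(json.dumps({"pass": score > TARGET_VALUE, "score": score}))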

Keep policy: Ask: "Keep only if strictly better than best so far (score_improvement), or keep anything that passes the threshold (pass_only)?"
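The two keep policies differ only in what they compare against. A minimal sketch, assuming the evaluator result is the `{"pass": ..., "score": ...}` dictionary shown earlier; the function name `should_keep` and the reading of `min_delta` as the smallest improvement that counts are assumptions:

```python
from typing import Optional


def should_keep(result: dict, best_score: Optional[float], keep_policy: str,
                direction: str, min_delta: float = 0.0) -> bool:
    """Decide whether an experiment's change is kept.

    pass_only: keep anything the evaluator marks as passing.
    score_improvement: keep only if strictly better than the best so far
    by more than min_delta (assumed meaning of the "Min delta" field).
    """
    if keep_policy == "pass_only":
        return bool(result["pass"])
    if keep_policy != "score_improvement":
        raise ValueError(f"unknown keep_policy: {keep_policy!r}")
    if best_score is None:  # nothing to compare against yet
        return True
    gain = result["score"] - best_score
    if direction == "minimize":
        gain = -gain  # a lower score is an improvement
    return gain > min_delta
```

Under `score_improvement`, a result that merely ties the best score is discarded, which keeps the history free of no-op changes.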

If NO — record manual evaluation:

Note in research.md: Evaluator: _(none — agent judges manually)_

Explain: "The agent will evaluate each experiment using its own judgment. This is slower and less reliable than a script — consider adding a script later."

Record: evaluator_command (or none), keep_policy

Step 6 — Baseline Measurement (Dry-Run Verify Gate)

Run the evaluator NOW to establish iteration 0. This is mandatory before writing research.md.

If an evaluator was designed in Step 5:

# Run the evaluator dry run
python evaluate.py

Expected output: a JSON line like {"pass": true, "score": 0.73}

If the evaluator runs successfully:

Record the score as the baseline (iteration 0)
Say: "Baseline confirmed: [metric_name] = [score]. This is your iteration 0."
Proceed to Step 7.

If the evaluator fails (non-zero exit, invalid JSON, crash):

Do NOT proceed to Step 7.
Diagnose the error: read the stderr, identify the cause.
Fix evaluate.py and re-run.
Repeat until the evaluator runs cleanly.
Only then proceed to Step 7.

If no evaluator (manual evaluation):

Ask the user to measure the current state manually: "Before we start, what is the current value of [metric_name]?"
Record their answer as the baseline.
Proceed to Step 7.

Record: baseline_score (iteration 0 value)
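The three failure modes named in this gate (non-zero exit, invalid JSON, crash) can be checked mechanically. The sketch below is one possible shape, not the skill's own implementation: the helper names, reading the last non-empty stdout line as the result, and the 300-second timeout (taken from the 5-minute default budget) are all assumptions.

```python
import json
import subprocess
import sys
from typing import Optional


def parse_evaluator_output(stdout: str) -> dict:
    """Parse the last non-empty stdout line as the evaluator's JSON result."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("evaluator printed nothing")
    result = json.loads(lines[-1])  # invalid JSON raises a ValueError subclass
    if not isinstance(result, dict):
        raise ValueError("evaluator output must be a JSON object")
    if not isinstance(result.get("pass"), bool):
        raise ValueError("'pass' must be true or false")
    score = result.get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("'score' must be a number")
    return result


def run_baseline(command: Optional[list[str]] = None, timeout_s: float = 300.0) -> float:
    """Run the evaluator once and return the baseline score (iteration 0)."""
    if command is None:
        command = [sys.executable, "evaluate.py"]
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        raise RuntimeError(f"evaluator exited with {proc.returncode}: {proc.stderr.strip()}")
    return float(parse_evaluator_output(proc.stdout)["score"])
```

Any exception from `run_baseline` means the gate has not passed, so the wizard would stay in Step 6 and fix evaluate.py before continuing.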

Step 7 — research.md Generation

Write the fully populated research.md using all recorded values.

If scripts/init_research.py is available:

python scripts/init_research.py \
  --goal "GOAL_STATEMENT" \
  --metric "METRIC_NAME" \
  --direction "DIRECTION" \
  --target "TARGET_VALUE" \
  --evaluator "EVALUATOR_COMMAND" \
  --output ./research-dir/

If the script is not available, write research.md directly using this structure:

# Research: GOAL_TITLE

## Goal
GOAL_STATEMENT

## Success Metric
- **Metric:** METRIC_NAME
- **Target:** TARGET_VALUE
- **Direction:** DIRECTION

## Constraints
- **Max iterations:** MAX_ITERATIONS
- **Time budget per experiment:** TIME_BUDGET_PER_EXPERIMENT (default: 5 minutes)
- **Pause for review every:** PAUSE_EVERY
- **Evaluator:** EVALUATOR_COMMAND
- **Keep policy:** KEEP_POLICY
- **Guard:** GUARD (if any)
- **Noise runs:** NOISE_RUNS
- **Min delta:** 0

## Current Approach
BASELINE_DESCRIPTION

## Search Space
- **Allowed changes:** ALLOWED_CHANGES
- **Forbidden changes:** FORBIDDEN_CHANGES

## Context & References
REFERENCES (if any)

---

## History
| # | Change | Metric | Result | Timestamp |
|---|--------|--------|--------|-----------|
| 0 | Baseline | BASELINE_SCORE | -- | TODAY |
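Rows in the History table follow a fixed five-column layout, so they can be produced by a small formatter. This is a sketch only: `history_row` is a hypothetical helper, and writing the timestamp as an ISO date is an assumption (the structure above only says TODAY).

```python
from datetime import date
from typing import Optional


def history_row(iteration: int, change: str, score: float,
                result: str = "--", when: Optional[date] = None) -> str:
    """Format one History table row: # | Change | Metric | Result | Timestamp."""
    when = when or date.today()
    safe_change = change.replace("|", "\\|")  # keep stray pipes from breaking the table
    return f"| {iteration} | {safe_change} | {score} | {result} | {when.isoformat()} |"
```

The baseline from Step 6 would then be written as `history_row(0, "Baseline", baseline_score)`.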

After writing:

Confirm the file was written: "research.md is ready at [path]."
If evaluate.py was written, confirm that too.
Print the next command:

To start the loop, tell your agent:
  "Run autoresearch on ./research-dir/research.md"

Or for overnight unattended:
  nohup bash scripts/autoresearch-loop.sh ./research-dir/ > autoresearch.log 2>&1 &
  bash scripts/check_progress.sh ./research-dir/

Chain suggestion: "When the loop completes, run /autoresearch:ship to publish the results."

Output Checklist

Before declaring the wizard complete, verify:

research.md written with all sections populated (no TBD or TODO placeholders)
Baseline score recorded in the History table (iteration 0)
Evaluator dry-run passed (or manual baseline confirmed)
evaluate.py written if automated evaluation was chosen
Next-step command printed for the user
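The parts of this checklist that concern the file itself can be verified with a short script. The sketch below assumes the research.md structure given in Step 7; `checklist_problems` is a hypothetical name, and the placeholder list covers only the upper-case names that appear in that structure.

```python
import re

# Section headings and placeholder names taken from the Step 7 structure.
REQUIRED_SECTIONS = ("## Goal", "## Success Metric", "## Constraints",
                     "## Current Approach", "## Search Space", "## History")
PLACEHOLDERS = ("TBD", "TODO", "GOAL_TITLE", "GOAL_STATEMENT", "METRIC_NAME",
                "TARGET_VALUE", "DIRECTION", "MAX_ITERATIONS", "PAUSE_EVERY",
                "EVALUATOR_COMMAND", "KEEP_POLICY", "NOISE_RUNS",
                "BASELINE_DESCRIPTION", "ALLOWED_CHANGES", "FORBIDDEN_CHANGES",
                "BASELINE_SCORE")


def checklist_problems(text: str) -> list[str]:
    """Return a list of problems found in the contents of research.md."""
    problems = []
    for heading in REQUIRED_SECTIONS:
        if heading not in text:
            problems.append(f"missing section: {heading}")
    for token in PLACEHOLDERS:
        if re.search(rf"\b{token}\b", text):  # case-sensitive whole-word match
            problems.append(f"unfilled placeholder: {token}")
    if not re.search(r"^\|\s*0\s*\|", text, flags=re.MULTILINE):
        problems.append("no iteration 0 row in History")
    return problems
```

An empty list means the file-level checks pass; the evaluator dry run and the printed next-step command still need to be confirmed separately.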
