一键在 Manus 中运行任何 Skill

$pwd:

autoresearch

Name: Autoresearch
Author: ruizrica

// Autonomous Goal-directed Iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat. Invoke with /skill:autoresearch or when user says "work autonomously", "iterate until done", "keep improving", or "run overnight".

在 Manus 中运行

$ git log --oneline --stat

stars:195

forks:33

updated:2026年3月21日 05:07

文件资源管理器

4 个文件

SKILL.md

readonly

related-skills.json

同仓库

qa-device-management.md

from "ruizrica/agent-pi"

Boot and control iOS Simulators and Android Emulators for QA testing using agent-device CLI. Manage sessions, capture screenshots, launch apps, and control devices across platforms. Invoke when user says "bring up simulators", "boot the device", "take a screenshot", "launch the app", "compare iOS and Android", "test both platforms", "visual parity check", or any task requiring device/simulator lifecycle management.

2026-03-25195

qa-scroll.md

from "ruizrica/agent-pi"

QA test skill for verifying scroll-based media feeds — video autoplay, scroll navigation, mute/unmute, and playback progression. Uses the CDP + agent-device dual-driver architecture. Works with any React Native app using expo-video or similar video player libraries. Invoke when user says "test scroll feed", "test video playback", "QA the feed", "verify video autoplay", "run scroll tests", "test media player", or any task requiring scroll-based media feed verification.

2026-03-25195

qa-setup.md

from "ruizrica/agent-pi"

Verify and install all dependencies for the QA Automation skill package. Checks for agent-device, agent-browser, Node.js, iOS/Android tools, and the 'ws' WebSocket library. Installs missing tools automatically. Invoke when user says "check qa setup", "install qa tools", "verify qa dependencies", "set up qa automation", "prepare for testing", or any task requiring QA tool verification.

2026-03-25195

qa-state-persistence.md

from "ruizrica/agent-pi"

QA test skill for verifying UI state persistence across navigation. Tests that state changes (likes, bookmarks, cart items, form inputs) survive scrolling away and returning. Uses CDP + agent-device dual-driver architecture. Works with any React Native app that uses list-based feeds or scrollable content. Invoke when user says "test state persistence", "test like state", "verify bookmark persists", "QA the state", "test data survives scroll", "verify UI state", or any task requiring state persistence verification across navigation.

2026-03-25195

qa-test-flows.md

from "ruizrica/agent-pi"

Automated UI test flow framework using CDP + agent-device for native apps and agent-browser for web apps. Zero-framework approach — bash scripts orchestrate simulators with no Detox, Maestro, or Appium dependencies. Provides test lifecycle, screenshot capture, assertion helpers, and JSON report generation. Invoke when user says "run tests", "test the app", "smoke test", "regression test", "verify flows", "run e2e tests", or any task requiring automated UI testing.

2026-03-25195

qa-web.md

from "ruizrica/agent-pi"

QA test skill for web applications using agent-browser. Tests forms, navigation, responsive layouts, state persistence via cookies/localStorage, and visual regression. The web counterpart to agent-device — together they cover native + web testing. Invoke when user says "test the web app", "test localhost", "QA the website", "test the form", "verify responsive layout", "run web tests", "test browser", or any task requiring automated web application testing.

2026-03-25195

package.json

"author": "ruizrica"

"repository": "ruizrica/agent-pi"

打开 GitHub 仓库查看创作者相关仓库

$ install --global

$ download --local

在 Manus 中运行

$ useful --forSOC

数据科学家计算机与数学类职业15-2051L4

name	autoresearch
description	Autonomous Goal-directed Iteration. Apply Karpathy's autoresearch principles to ANY task. Loops autonomously — modify, verify, keep/discard, repeat. Invoke with /skill:autoresearch or when user says "work autonomously", "iterate until done", "keep improving", or "run overnight".
allowed-tools	Bash(git:) Bash(npm:) Bash(npx:*) Read Write Edit ask_user show_plan show_research subagent_create_batch dispatch_agent commander_task commander_mailbox show_report

Autoresearch — Autonomous Goal-directed Iteration

Inspired by Karpathy's autoresearch. Applies constraint-driven autonomous iteration to ANY work — not just ML research.

Core idea: You are an autonomous agent. Modify -> Verify -> Keep/Discard -> Repeat.

When to Activate

User invokes /skill:autoresearch or /autoresearch
User says "work autonomously", "iterate until done", "keep improving", "run overnight"
Any task requiring repeated iteration cycles with measurable outcomes

Phase 1: Understand (Do This First — Before ANY Work)

Before touching any files, deeply understand the goal. Do NOT rush into iteration.

Read relevant files — Scan the codebase to build context around the user's goal. Understand what exists, what patterns are in use, and what's realistic.
Identify ambiguities — Based on the goal and codebase context, what's unclear?
- Is the success metric obvious or ambiguous?
- Is the scope (which files to modify) clear?
- Are there constraints the user hasn't mentioned?
- Are there multiple valid interpretations?

Ask clarifying questions — If ANY ambiguity exists, use ask_user to ask targeted questions:

ask_user {
  question: "I have a few questions before I build the research plan:",
  mode: "questions",
  options: [
    { label: "1. What metric should define success? (e.g. test coverage %, build time ms, bundle size KB)" },
    { label: "2. Which files/directories are in scope for modification?" },
    { label: "3. Are there any approaches to avoid or constraints I should know about?" },
    { label: "4. What does 'done' look like — a specific target, or iterate until interrupted?" }
  ]
}

Tailor questions to the specific goal. Don't ask about what's already clear. Ask about genuine ambiguities.

Skip if crystal clear — If the goal is unambiguous (clear metric, scope, exit criteria), skip questions and proceed to Phase 2. State briefly why no questions are needed.
Synthesize understanding — Form a concrete statement: Goal, Metric (what + direction + verify command), Scope (in/out), Constraints, Exit criteria.
Save research session — Create .context/research-sessions/<session-id>.json with the initial session data: goal, metric, scope, clarifying Q&A, status "understanding". Store the session_id for updates throughout the lifecycle.

Phase 2: Plan (Present Before Executing)

Now write and present a research plan for user approval. Do NOT start iterating without approval.

Establish baseline — Run the verification command to get a starting metric value.

Write the research plan — Create .context/autoresearch-plan.md:

# Autoresearch Plan: <goal summary>

## Goal
<Concrete goal statement>

## Metric
- **Measuring:** <what>
- **Direction:** <higher/lower is better>
- **Verify command:** `<command>`
- **Baseline:** <current value>
- **Target:** <target value or "continuous improvement">

## Scope
- **In scope:** <files/directories to modify>
- **Read only:** <files for context only>
- **Out of scope:** <excluded areas>

## Strategy
Ordered approaches, most to least promising:
1. <First approach — why promising>
2. <Second approach — what it explores>
3. <Third approach — alternative angle>
4. <Fourth approach — radical idea>
5. <Fifth approach — simplification play>

## Iteration Plan
- **Mode:** <bounded (N) / unbounded>
- **Estimated time per iteration:** <seconds/minutes>
- **When stuck:** Re-read plan, combine near-misses, try opposites

## Exit Criteria
- <When to stop>

Present for approval:
```
show_plan { file_path: ".context/autoresearch-plan.md", title: "Autoresearch Plan: <goal>" }
```
- Approved → proceed to Phase 3
- Declined → revise based on feedback and re-present
Update session — Set status to "planning", save plan content and baseline metric.

Phase 3: Setup & Begin

With understanding confirmed and plan approved, set up tracking and start.

Create results log — Create autoresearch-results.tsv (see references/results-logging.md)
Record baseline — Log the baseline metric from Phase 2 as iteration #0
Commander tracking — If available, create task group and broadcast start (see Commander Integration below)
Update session — Set status to "researching"
Begin the loop — Start iterating immediately. No further confirmation needed.

The Loop

Read references/autonomous-loop-protocol.md for full protocol details.

LOOP (FOREVER or N times):
  1. Review: Read current state + git history + results log
  2. Ideate: Pick next change based on goal, past results, what hasn't been tried
  3. Modify: Make ONE focused change to in-scope files
  4. Commit: Git commit the change (before verification)
  5. Verify: Run the mechanical metric (tests, build, benchmark, etc.)
  6. Decide:
     - IMPROVED -> Keep commit, log "keep", advance
     - SAME/WORSE -> Git revert, log "discard"
     - CRASHED -> Try to fix (max 3 attempts), else log "crash" and move on
  7. Log: Record result in results log
  8. Repeat: Go to step 1.
     - If unbounded: NEVER STOP. NEVER ASK "should I continue?"
     - If bounded (N): Stop after N iterations, print final summary

Critical Rules

Loop until done — Unbounded: loop until interrupted. Bounded: loop N times then summarize.
Read before write — Always understand full context before modifying
One change per iteration — Atomic changes. If it breaks, you know exactly why
Mechanical verification only — No subjective "looks good". Use metrics
Automatic rollback — Failed changes revert instantly. No debates
Simplicity wins — Equal results + less code = KEEP. Tiny improvement + ugly complexity = DISCARD
Git is memory — Every kept change committed. Agent reads history to learn patterns
When stuck, think harder — Re-read files, re-read goal AND .context/autoresearch-plan.md for planned strategy, try next untried approach from the plan, combine near-misses, try radical changes. Don't ask for help unless truly blocked by missing access/permissions

Principles Reference

See references/core-principles.md for the 7 generalizable principles from autoresearch.

Commander Integration (Task Tracking & Visibility)

When Commander is available, autoresearch MUST track every iteration as a Commander task. This gives the dashboard full visibility into autonomous work — just like the tasks extension does for manual workflows.

Setup Phase (Phase 3) — Create Task Group

After establishing the baseline and getting plan approval, create a Commander task group for this research session:

commander_task {
  operation: "group:create",
  group_name: "Autoresearch: <goal summary>",
  initiative_summary: "<full goal description with metric and scope>",
  total_waves: 1,
  working_directory: "<cwd>",
  tasks: []
}

Store the returned group_id — all iteration tasks will be added to this group.

Send an initial mailbox status broadcast:

commander_mailbox {
  operation: "send",
  from_agent: "autoresearch",
  to_agent: "commander",
  body: "Autoresearch started: <goal>. Baseline metric: <value>. Scope: <files>. Plan approved.",
  message_type: "status"
}

Per-Iteration — Create → Claim → Complete

Before modifying (step 3 of each loop iteration), create and claim a Commander task:

commander_task { operation: "create", description: "Iteration #N: <planned change>", working_directory: "<cwd>", group_id: <group_id> }
commander_task { operation: "claim", task_id: <task_id>, agent_name: "autoresearch" }

After logging results (step 7 of each loop iteration), complete the task with the outcome:

commander_task { operation: "complete", task_id: <task_id>, result: "<status>: <description>. Metric: <old> → <new> (delta: <delta>)" }

Also add a comment to the task with detailed results:

commander_task { operation: "comment:add", task_id: <task_id>, body: "Status: <keep|discard|crash>\nMetric: <value> (delta: <delta>)\nCommit: <hash or '-'>\nDescription: <what was tried>", agent_name: "autoresearch" }

Note: Use complete for ALL outcomes (keep, discard, crash). Discards and crashes are expected in autoresearch — they're not failures. Reserve fail only for unrecoverable errors that halt the entire loop.

Status Broadcasts — Every ~5 Iterations

Every 5 iterations, send a mailbox status update AND add a comment to the group:

commander_mailbox {
  operation: "send",
  from_agent: "autoresearch",
  to_agent: "commander",
  body: "Autoresearch progress — Iteration #N: metric at <value> (baseline: <baseline>). Keeps: X | Discards: Y | Crashes: Z",
  message_type: "status"
}

Research Complete — Report & Implementation Handoff (MANDATORY)

When the loop ends (bounded mode reaching N, or goal achieved):

Final mailbox broadcast with full summary:

commander_mailbox {
  operation: "send",
  from_agent: "autoresearch",
  to_agent: "commander",
  body: "Autoresearch complete (N iterations). Baseline: <X> → Final: <Y> (delta: <Z>). Keeps: A | Discards: B | Crashes: C. Best iteration: #M — <description>",
  message_type: "result"
}

Compile findings & next steps — Extract prioritized, actionable implementation items from the research. Update the session file with findings, next steps array, and final metric.
Research report — Present via show_report framed as a handoff:

show_report {
  title: "Research Complete — Ready for Implementation: <goal>",
  summary: "## Research Results\n\n...\n\n## Prioritized Next Steps\n\n1. <action item>\n2. ...\n\n## Recommended Implementation Approach\n\n<how to implement>"
}

Ask about implementation — Use ask_user to offer three choices:
- Implement now → spawn a team of builder agents via subagent_create_batch
- Save & pause → set session to "paused", resume later via /research
- Done → mark session "complete"
Implementation (if chosen) — Update session to "implementing", create Commander task group, dispatch builders, track completion. When done, present final comprehensive report covering research results AND implementation work. Set session to "complete".
Preserve the plan — Leave .context/autoresearch-plan.md intact. Leave the session file for browsing via /research.

Graceful Degradation

All Commander calls are optional. If Commander is unavailable:

Skip commander_task and commander_mailbox calls silently
The local autoresearch-results.tsv log remains the primary record
The show_report call still works (it only needs git, not Commander)
Never let a Commander error interrupt the autonomous loop

Adapting to Different Domains

Domain	Metric	Scope	Verify Command
Backend code	Tests pass + coverage %	`src/*/.ts`	`npm test`
Frontend UI	Lighthouse score	`src/components/**`	`npx lighthouse`
ML training	val_bpb / loss	`train.py`	`uv run train.py`
Blog/content	Word count + readability	`content/*.md`	Custom script
Performance	Benchmark time (ms)	Target files	`npm run bench`
Refactoring	Tests pass + LOC reduced	Target module	`npm test && wc -l`

Adapt the loop to your domain. The PRINCIPLES are universal; the METRICS are domain-specific.

Session Persistence

Every autoresearch session is saved to .context/research-sessions/<session-id>.json. This enables:

Resume later — pick up where you left off via /research command
Browse history — see all past research sessions in the research browser
Track lifecycle — from understanding through implementation completion

Update the session file at every major transition: understand → plan → research → implement → complete. On every "keep" iteration or every ~5 iterations, append iteration data to the session. This creates a complete record of the research lifecycle that can be browsed and resumed.

autoresearch

同仓库更多 Skills

Autoresearch — Autonomous Goal-directed Iteration

When to Activate

Phase 1: Understand (Do This First — Before ANY Work)

Phase 2: Plan (Present Before Executing)

Phase 3: Setup & Begin

The Loop

Critical Rules

Principles Reference

Commander Integration (Task Tracking & Visibility)

Setup Phase (Phase 3) — Create Task Group

Per-Iteration — Create → Claim → Complete

Status Broadcasts — Every ~5 Iterations

Research Complete — Report & Implementation Handoff (MANDATORY)

Graceful Degradation

Adapting to Different Domains

Session Persistence

Autoresearch — Autonomous Goal-directed Iteration

When to Activate

Phase 1: Understand (Do This First — Before ANY Work)

Phase 2: Plan (Present Before Executing)

Phase 3: Setup & Begin

The Loop

Critical Rules

Principles Reference

Commander Integration (Task Tracking & Visibility)

Setup Phase (Phase 3) — Create Task Group

Per-Iteration — Create → Claim → Complete

Status Broadcasts — Every ~5 Iterations

Research Complete — Report & Implementation Handoff (MANDATORY)

Graceful Degradation

Adapting to Different Domains

Session Persistence

同仓库更多 Skills