| name | start |
| description | Use when starting work on any task, when the user mentions metaswarm, or when the user wants to begin tracked development work |
| auto_activate | true |
| triggers | ["work on issue","start issue","start task","use metaswarm","@metaswarm","agent-ready label"] |
This skill coordinates a swarm of specialized AI agents to autonomously handle GitHub Issues from creation to merged PR.
# User triggers via any of:
@beads start #123
bd start 123
/beads-start 123
bd ready # Show tasks ready to work
bd list # Show all tasks
bd stats # Show project statistics
bd doctor # Check system health
| Agent | Role | Spawned When |
|---|---|---|
| Issue Orchestrator | Main coordinator per Issue | Issue receives agent-ready label |
| Researcher Agent | Codebase exploration | Orchestrator creates research task |
| Architect Agent | Implementation planning | Research complete |
| Product Manager Agent | Use case & user benefit review | Design review gate (parallel) |
| Designer Agent | UX/API design review | Design review gate (parallel) |
| Security Design Agent | Security threat modeling | Design review gate (parallel) |
| CTO Agent | TDD readiness & plan review | Design review gate (parallel) |
| Coder Agent | TDD implementation | Design review gate approved |
| Code Review Agent | Internal code review | Implementation complete |
| Security Auditor | Security review (code) | Implementation complete |
| Release Engineer Agent | Safe delivery from merge through production | QA approves PR, PR reaches merge readiness |
| PR Shepherd | PR lifecycle management | PR created |
See the ./agents/ directory for detailed agent definitions.
For complex features created via brainstorming, an automatic Design Review Gate ensures quality before implementation:
Design Document Created
│
▼
┌─────────────────────────────────────────────────┐
│ DESIGN REVIEW GATE │
│ │
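│ Spawns in PARALLEL: │
│ • Product Manager Agent (use cases, user benefit) │
│ • Architect Agent (technical architecture) │
│ • Designer Agent (UX/API design) │
│ • Security Design Agent (threat modeling) │
│ • UX Reviewer (user flows, integration WUs) │
│ • CTO Agent (TDD readiness) │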
│ │
│ ALL must approve to proceed │
└─────────────────────────────────────────────────┘
│
├── Any NEEDS_REVISION? → Iterate on design (max 3x)
│
ALL APPROVED
│
▼
Create BEADS Epic → Begin Implementation
The gate is automatically triggered when:
- superpowers:brainstorming completes and commits a design doc
- /review-design <path-to-design.md> is invoked

| Agent | Focus Areas |
|---|---|
| Product Manager | Use case clarity, user benefits, scope, success metrics |
| Architect | Service architecture, dependencies, patterns, integration |
| Designer | API design, UX flows, developer experience, consistency |
| Security Design | Threat modeling, auth/authz, data protection, OWASP Top 10 |
| UX Reviewer | User flows, text wireframes, integration WUs, empty/error states |
| CTO | TDD readiness, codebase alignment, completeness, risks |
See the design-review-gate skill for full details.
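The approval rule described above (every reviewer must approve, with at most three design iterations) can be sketched as follows. This is a minimal illustration, not the skill's implementation: the `Review` shape, the `runReviewers` callback, and the `ESCALATE` outcome after the third failed iteration are assumptions. Only the `APPROVED` / `NEEDS_REVISION` verdicts and the limit of three come from the gate description.

```typescript
// Verdict names come from the gate diagram; all other names are hypothetical.
type Verdict = "APPROVED" | "NEEDS_REVISION";

interface Review {
  reviewer: string;
  verdict: Verdict;
}

type GateOutcome =
  | { status: "APPROVED"; iterations: number }
  | { status: "ESCALATE"; iterations: number; blocking: string[] };

const MAX_ITERATIONS = 3;

// runReviewers stands in for spawning the reviewers in parallel. It receives
// the 1-based iteration number and returns one review per reviewer.
function runDesignReviewGate(
  runReviewers: (iteration: number) => Review[],
): GateOutcome {
  let blocking: string[] = [];
  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    const reviews = runReviewers(iteration);
    blocking = reviews
      .filter((r) => r.verdict === "NEEDS_REVISION")
      .map((r) => r.reviewer);
    if (blocking.length === 0) {
      return { status: "APPROVED", iterations: iteration };
    }
    // Otherwise the design is revised and every reviewer runs again.
  }
  // Assumption: what happens after the third failed iteration is not stated
  // in this document, so the sketch simply reports who is still blocking.
  return { status: "ESCALATE", iterations: MAX_ITERATIONS, blocking };
}
```

A single `NEEDS_REVISION` blocks the whole gate, which is why the sketch re-runs every reviewer on each iteration rather than only the ones that objected.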
When multiple Claude Code sessions are active on the same repository (e.g., parallel worktrees), metaswarm automatically enters Team Mode. In Team Mode, agents behave as persistent teammates with context retention across sessions and direct inter-agent messaging for coordination. Mode detection is automatic based on the presence of concurrent sessions.
For the full Team Mode protocol — including message routing, context sharing, and conflict resolution — see ./guides/agent-coordination.md.
After the Architect creates an implementation plan and before it reaches the Design Review Gate, the plan passes through the Plan Review Gate. Three adversarial reviewers validate the plan independently:
| Reviewer | Focus |
|---|---|
| Feasibility | Technical viability, dependency risks, resource constraints |
| Completeness | Missing work units, untested edge cases, gaps in Definition of Done |
| Scope & Alignment | Plan stays within issue scope, aligns with codebase conventions |
All 3 must APPROVE before the plan proceeds. See the plan-review-gate skill for the full skill definition.
After design review approval, implementation follows the 4-phase execution loop per work unit. This replaces the previous linear "implement then review" flow with rigorous independent validation and adversarial review.
Trust nothing. Verify everything. Review adversarially.
Before submitting to the Design Review Gate, the orchestrator runs a pre-flight checklist covering architecture, dependency graph, API contracts, security, UI/UX, and external dependencies. This catches structural issues (missing service layer, wrong dependency graph, oversized WUs) before spending agent cycles on review.
For each work unit (a discrete, spec-driven change with DoD items):
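1. IMPLEMENT: The Coder Agent implements the work unit against its spec.
2. VALIDATE: The orchestrator runs the quality gates itself (tsc, eslint, vitest, and the coverage thresholds in .coverage-thresholds.json). Never trust subagent self-reports. Quality gates are blocking state transitions, not advisory.
3. ADVERSARIAL REVIEW: A fresh reviewer checks the work unit against its spec and DoD items using adversarial-review-rubric.md. When external tools are configured, cross-model review ensures the writer is always reviewed by a different AI model (see External Tools section below).
4. COMMIT: The work unit is committed only after a PASS verdict, with SERVICE-INVENTORY.md and the Project Context Document updated.

On FAIL: fix → re-validate → spawn fresh reviewer (max 3 retries → escalate to human). There is NO path from FAIL to COMMIT without passing through the retry loop.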
For simple tasks, the standard linear flow (implement → code review → PR) works fine.
Work Unit Decomposition: Break the implementation plan into discrete work units, each with its own spec and Definition of Done (DoD) items, then build the dependency graph between them.
Independent Validation: The orchestrator runs tsc, eslint, and vitest directly — it does NOT ask the coding subagent "did the tests pass?" and accept the answer.
Adversarial Review: Fundamentally different from collaborative code review. The reviewer is an independent auditor checking spec compliance, not a helpful colleague suggesting improvements. Binary PASS/FAIL verdict. Evidence required (file:line references).
Fresh Reviewer Rule: On re-review after FAIL, a NEW reviewer instance is spawned with no memory of the previous review. This prevents anchoring bias.
Human Checkpoints: Planned pauses at critical boundaries (schema changes, security code, first use of new patterns). The orchestrator waits for explicit human approval before continuing.
Final Comprehensive Review: After all work units pass, a cross-unit review catches integration issues that per-unit reviews miss.
See the orchestrated-execution skill for the complete pattern, including work unit structure, parallel execution, recovery protocol, and anti-patterns.
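The control flow of the loop for a single work unit can be sketched as below. This is an illustration under stated assumptions rather than the orchestrator's actual code: the `WorkUnitSteps` callbacks are hypothetical stand-ins for spawning the coder, running the checks, and spawning a reviewer, and "max 3 retries" is read here as one initial attempt plus up to three retries, with a failed validation also consuming an attempt.

```typescript
// Hypothetical callbacks; in practice each would spawn an agent or run tools.
interface WorkUnitSteps {
  implement: (attempt: number) => void; // coder subagent does the work
  validate: () => boolean; // orchestrator runs the checks itself
  review: (reviewerId: number) => "PASS" | "FAIL"; // a new reviewer per call
}

interface WorkUnitResult {
  outcome: "COMMIT" | "ESCALATE_TO_HUMAN";
  attempts: number;
}

const MAX_RETRIES = 3;

function runWorkUnit(steps: WorkUnitSteps): WorkUnitResult {
  // One initial attempt plus up to MAX_RETRIES retries (an assumption).
  const maxAttempts = MAX_RETRIES + 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    steps.implement(attempt);
    // A failed validation goes straight back to implementation.
    if (!steps.validate()) continue;
    // Every review gets a new reviewer id, mirroring the Fresh Reviewer Rule.
    if (steps.review(attempt) === "PASS") {
      return { outcome: "COMMIT", attempts: attempt };
    }
  }
  // COMMIT is reachable only through a PASS; otherwise a human takes over.
  return { outcome: "ESCALATE_TO_HUMAN", attempts: maxAttempts };
}
```

The only `return` that yields `COMMIT` sits behind both a passing validation and a `PASS` verdict, which is the property the text states as "no path from FAIL to COMMIT".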
When external AI CLI tools are configured (.metaswarm/external-tools.yaml), the orchestrator can delegate implementation and review tasks to OpenAI Codex CLI and Google Gemini CLI. This enables cost savings through cheaper models and cross-model adversarial review that eliminates single-model blind spots.
External tools slot directly into the existing 4-phase execution loop:
The orchestrator adapts based on tool availability:
| Available Tools | Escalation Chain | Max Attempts |
|---|---|---|
| Both Codex + Gemini | A(2) → B(2) → Claude(1) → user | 5 |
| One tool only | Tool(2) → Claude(1) → user | 3 |
| No tools | Claude → user (existing behavior) | unchanged |
Each escalated model receives the previous model's branch as a reference. See the external-tools skill for the full skill definition.
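The table above can be read as a small rule for building the chain from whichever tools are configured. The sketch below is one possible encoding, not the external-tools skill's code: `EscalationStep`, `buildEscalationChain`, and `maxAttempts` are hypothetical names, the tool names are placeholders, and returning `null` is simply this sketch's way of saying "no external tools, keep the existing Claude behavior".

```typescript
interface EscalationStep {
  model: string;
  attempts: number;
}

// availableTools lists the configured external tools in priority order,
// e.g. ["codex", "gemini"]. Which tool plays "A" and which "B" is assumed
// to follow that order.
function buildEscalationChain(
  availableTools: string[],
): EscalationStep[] | null {
  // No tools: the existing Claude -> user flow applies unchanged.
  if (availableTools.length === 0) return null;
  const externalSteps = availableTools
    .slice(0, 2)
    .map((model) => ({ model, attempts: 2 }));
  // Claude gets a single attempt before the task is handed to the user.
  return [...externalSteps, { model: "claude", attempts: 1 }];
}

function maxAttempts(chain: EscalationStep[]): number {
  return chain.reduce((total, step) => total + step.attempts, 0);
}
```

With both tools the chain sums to 5 attempts and with one tool to 3, matching the "Max Attempts" column.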
/external-tools-health
Checks installation, authentication, and reachability of all configured adapters.
The visual-review skill enables agents to take screenshots of web pages, presentations, and UIs using Playwright for visual inspection. This bridges the gap where agents cannot see rendered output.
The skill is triggered when tasks involve visual output (web UIs, Reveal.js presentations, landing pages, email templates). It captures screenshots at configurable viewport sizes, and agents analyze them for layout, typography, colors, spacing, and content issues.
npx playwright install chromium
For remote/headless environments, the skill serves screenshots via an HTTP file server so users can view them in their local browser.
See the visual-review skill for the complete workflow.
GitHub Issue #123 (agent-ready label)
│
▼
┌─────────────────────────────────────┐
│ Issue Orchestrator │
│ Creates BEADS epic, delegates work │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Research Phase │
│ Researcher Agent explores codebase │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Planning Phase │
│ Architect Agent creates plan │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Plan Review Gate │
│ 3 adversarial reviewers: │
│ Feasibility, Completeness, │
│ Scope & Alignment │
│ ALL 3 must approve │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ External Dependency Detection │
│ Scans spec for API keys/creds │
│ Prompts user to configure them │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Plan Validation (Pre-Flight) │
│ Architecture, deps, API contracts │
│ Security, UI/UX, external deps │
└─────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ DESIGN REVIEW GATE (PARALLEL) │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌───────┐ │
│ │ PM │ │ Architect│ │ Designer │ │ Security │ │UX Revw.│ │ CTO │ │
│ │(users) │ │ (tech) │ │ (UX/API) │ │ (threats)│ │(flows) │ │ (TDD) │ │
│ └─────────┘ └──────────┘ └──────────┘ └──────────┘ └────────┘ └───────┘ │
│ │
│ ALL SIX must approve (max 3 iterations) │
└──────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Work Unit Decomposition │
│ Break plan into work units w/ DoD │
│ Build dependency graph │
└─────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────┐
│ ORCHESTRATED EXECUTION LOOP (per work unit) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌──────┐ │
│ │IMPLEMENT │───→│ VALIDATE │───→│ ADVERSARIAL │──→│COMMIT│ │
│ │(Coder) │ │(Orchest.)│ │ REVIEW │ │ │ │
│ └──────────┘ └──────────┘ └──────┬───────┘ └──────┘ │
│ ▲ │ FAIL │
│ └─────────────────────────────────┘ │
│ │
│ Trust nothing. Verify everything. Review adversarially. │
└───────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Final Comprehensive Review │
│ Cross-unit integration check │
│ Full test suite + type check │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ PR Creation (Auto-Shepherd) │
│ bin/create-pr-with-shepherd.sh │
│ → Auto-invokes pr-shepherd skill │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ PR Shepherd (Automatic) │
│ Monitors CI, handles reviews, │
│ resolves threads automatically │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Human Approval & Merge │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Release Engineer │
│ Pre-merge verify → merge → CI → │
│ deploy → post-deploy QA → release │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Close Epic & Extract Learnings │
└─────────────────────────────────────┘
GTG is the final merge gate. It consolidates CI status, comment classification, and thread resolution into a single deterministic check. Agents should use it as the primary readiness signal:
# Check if PR is ready to merge
gtg <PR_NUMBER> --format json \
--exclude-checks "Merge Ready (gtg)" \
--exclude-checks "CodeRabbit" \
--exclude-checks "Cursor Bugbot" \
--exclude-checks "claude"
Statuses: READY (merge), ACTION_REQUIRED (fix comments), UNRESOLVED_THREADS (resolve threads), CI_FAILING (fix CI). The action_items array tells agents exactly what to fix.
GTG reports, agents act: GTG does not resolve threads or fix code. After addressing feedback, agents must resolve threads themselves via the GraphQL mutation documented in handle-pr-comments.md (Section 3: Resolving Review Threads). GTG will report READY on the next check once threads are resolved.
If the CI check is stale: gh workflow run gtg.yml -f pr_number=<PR_NUMBER>
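An agent consuming the JSON output might map each status to its next step along these lines. This is a sketch under assumptions: the text names the four statuses and an `action_items` array, but the `status` field name and the string element type are guesses, and `parseGtgReport` / `nextStep` are hypothetical helpers rather than part of GTG.

```typescript
type GtgStatus =
  | "READY"
  | "ACTION_REQUIRED"
  | "UNRESOLVED_THREADS"
  | "CI_FAILING";

interface GtgReport {
  status: GtgStatus; // assumed field name
  action_items: string[]; // named in the text; element type assumed
}

// Status -> next step, exactly as listed in the paragraph above.
const NEXT_STEP: Record<GtgStatus, string> = {
  READY: "merge",
  ACTION_REQUIRED: "fix comments",
  UNRESOLVED_THREADS: "resolve threads",
  CI_FAILING: "fix CI",
};

function parseGtgReport(json: string): GtgReport {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("GTG output is not a JSON object");
  }
  const { status, action_items } = data as {
    status?: unknown;
    action_items?: unknown;
  };
  if (
    typeof status !== "string" ||
    !Object.prototype.hasOwnProperty.call(NEXT_STEP, status)
  ) {
    throw new Error(`Unknown GTG status: ${String(status)}`);
  }
  const items = Array.isArray(action_items) ? action_items.map(String) : [];
  return { status: status as GtgStatus, action_items: items };
}

function nextStep(report: GtgReport): string {
  return NEXT_STEP[report.status];
}
```

Rejecting unknown statuses keeps the check deterministic: an agent never proceeds to merge on output it does not recognize.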
When a PR is created via bin/create-pr-with-shepherd.sh, the script outputs instructions to start monitoring:
/pr-shepherd <pr-number>
For manually-created PRs, invoke /pr-shepherd <pr-number> to start the monitoring cycle.
# Create epic for GitHub Issue
bd create "Feature: User Auth" --type epic --issue 123
# Create task under epic
bd create "Research auth patterns" --type task --parent bd-abc123
# Add dependency
bd dep add <blocked-task> <blocking-task>
# Update status
bd update <task-id> --status <status> # one of: open, in_progress, blocked, closed
# Close with reason
bd close <task-id> --reason "Completed successfully"
# Show ready (unblocked) tasks
bd ready --json
# List all tasks under epic
bd list --parent <epic-id>
# Show blocked tasks
bd blocked
# Show task details
bd show <task-id> --json
# Waiting for human input
bd label add <task-id> waiting:human
# Waiting for CI
bd label add <task-id> waiting:ci
# Agent failed, needs intervention
bd label add <task-id> agent:failed
# Review iteration tracking
bd label add <task-id> review:iteration-1
# Check sync status
bd sync --status
# Pull updates from main
bd sync --from-main
# Export to JSONL
bd export
# Check Issue has agent-ready label
gh issue view 123 --json labels | jq '.labels[].name' | grep agent-ready
# Get Issue details
ISSUE=$(gh issue view 123 --json title,body,number)
# Create epic linked to Issue
bd create "$(echo "$ISSUE" | jq -r .title)" --type epic --issue 123 --json
gh issue comment 123 --body "🤖 Agent claiming this issue. BEADS epic created."
Use the Task tool to spawn the Issue Orchestrator agent:
Task({
subagent_type: "general-purpose",
description: "Issue Orchestrator for #123",
prompt: `You are the ISSUE ORCHESTRATOR agent.
Read the agent definition at:
./agents/issue-orchestrator.md
Your task:
- Epic ID: <epic-id>
- GitHub Issue: #123
- Begin the orchestration workflow
Follow the workflow phases exactly as specified.`,
});
# Mark task as waiting
bd update <task-id> --status blocked
bd label add <task-id> waiting:human
# Post to GitHub Issue
gh issue comment <number> --body "$(cat <<'EOF'
## 🤖 Agent Request: <type>
**Task**: <task-id>
**Question**: <clear question>
### Options
1. **Option A**: <description>
2. **Option B**: <description>
### Agent Recommendation
<recommendation>
---
Reply: `@beads approve <task-id>` or `@beads respond <task-id> <option>`
EOF
)"
# Approve a blocked task
@beads approve bd-abc123
# Respond with choice
@beads respond bd-abc123 "Use option A"
# Request changes
@beads request-changes bd-abc123 "Need more error handling"
# Defer to later
@beads defer bd-abc123 "Discuss in Monday standup"
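On the receiving side, an agent polling issue comments needs to recognize these replies. The parser below is a minimal sketch of that step, assuming each command sits alone on a single-line comment. `parseBeadsCommand` and the `BeadsCommand` shape are hypothetical names; only the four verbs and the `@beads <verb> <task-id> [text]` layout come from the examples above.

```typescript
type BeadsAction = "approve" | "respond" | "request-changes" | "defer";

interface BeadsCommand {
  action: BeadsAction;
  taskId: string;
  message?: string;
}

const ACTIONS: readonly BeadsAction[] = [
  "approve",
  "respond",
  "request-changes",
  "defer",
];

// Returns null for comments that are not a recognized @beads command.
function parseBeadsCommand(comment: string): BeadsCommand | null {
  const match = /^@beads\s+(\S+)\s+(\S+)(?:\s+(.+))?$/.exec(comment.trim());
  if (!match) return null;
  const verb = match[1] ?? "";
  const taskId = match[2] ?? "";
  const action = ACTIONS.find((a) => a === verb);
  if (!action || taskId === "") return null;
  const command: BeadsCommand = { action, taskId };
  const rest = match[3];
  if (rest) {
    // Drop one pair of surrounding double quotes, if present.
    command.message = rest.replace(/^"(.*)"$/, "$1");
  }
  return command;
}
```

For example, `parseBeadsCommand('@beads respond bd-abc123 "Use option A"')` yields the action `respond`, the task id `bd-abc123`, and the message `Use option A`.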
// Spawn Researcher first
const researchResult = await Task({
subagent_type: "general-purpose",
description: "Research for issue #123",
prompt: researcherPrompt,
});
// Then spawn Architect with research output
const planResult = await Task({
subagent_type: "general-purpose",
description: "Planning for issue #123",
prompt: `${architectPrompt}\n\n${researchResult}`,
});
// Spawn Code Review and Security Audit in parallel
const [reviewResult, securityResult] = await Promise.all([
Task({
subagent_type: "general-purpose",
description: "Code review for #123",
prompt: codeReviewPrompt,
}),
Task({
subagent_type: "general-purpose",
description: "Security audit for #123",
prompt: securityAuditPrompt,
}),
]);
ALL agents MUST prime their context before starting ANY work. This prevents bad assumptions and ensures alignment with established patterns.
# General prime (loads critical rules + gotchas)
bd prime
# Prime for specific files you'll modify
bd prime --files "src/lib/services/*.ts" "src/api/routes/*.ts"
# Prime for specific topic
bd prime --keywords "authentication" "jwt"
# Prime for work type
bd prime --work-type planning # Before planning
bd prime --work-type implementation # Before coding
bd prime --work-type review # Before reviewing
bd prime --work-type research # Before exploring
# Combined (most thorough)
bd prime --files "<files>" --keywords "<topic>" --work-type <type>
The prime command outputs the relevant facts grouped by category.
Run self-reflection to extract learnings:
# Fetch recent PR comments (metaswarm-specific GitHub integration)
GITHUB_TOKEN=$(gh auth token) npx tsx scripts/beads-fetch-pr-comments.ts --days 7
# Use self-reflect skill to evaluate and add learnings
/self-reflect
# Compact closed issues (semantic summarization via beads plugin)
bd compact
Or spawn Knowledge Curator agent:
Task({
subagent_type: "general-purpose",
description: "Extract learnings from epic",
prompt: `Review completed epic <epic-id> and extract learnings.
FIRST: Run \`bd prime --work-type review\` to load context.
Then analyze:
- What patterns were used?
- What gotchas were discovered?
- What should future agents know?
Use the knowledge capture service to store learnings.`,
});
For large epics, orchestrators can spawn sub-orchestrators. The pattern:
Epic (Issue Orchestrator)
├── Sub-Epic A (Sub-Orchestrator)
│ ├── Task A1
│ └── Task A2
├── Sub-Epic B (Sub-Orchestrator)
│ ├── Task B1
│ └── Task B2
└── Integration Task (blocked by A + B)
When to decompose: If an epic has more than 5-7 tasks or spans multiple domains (e.g., frontend + backend + schema), split into sub-epics.
# Create sub-epics under parent epic
bd create "Sub-Epic: API layer" --type epic --parent <parent-epic-id>
bd create "Sub-Epic: UI components" --type epic --parent <parent-epic-id>
# Each sub-epic gets its own orchestrator
# The parent orchestrator coordinates completion
Each sub-epic's orchestrator follows the same workflow (research → plan → review → implement → PR) independently. The parent orchestrator monitors progress and coordinates the final integration.
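The "when to decompose" rule can be expressed as a small check. The sketch below is illustrative only: `PlannedTask`, `shouldDecompose`, and `groupByDomain` are hypothetical names, and because the guidance gives a range ("more than 5-7 tasks"), the upper bound of 7 is used as an assumed default.

```typescript
interface PlannedTask {
  id: string;
  domain: string; // e.g. "frontend", "backend", "schema"
}

// taskLimit defaults to 7, the upper end of the "5-7 tasks" guidance.
function shouldDecompose(tasks: PlannedTask[], taskLimit = 7): boolean {
  const domains = new Set(tasks.map((t) => t.domain));
  return tasks.length > taskLimit || domains.size > 1;
}

// Groups tasks by domain; each group is a candidate sub-epic.
function groupByDomain(tasks: PlannedTask[]): Map<string, PlannedTask[]> {
  const groups = new Map<string, PlannedTask[]>();
  for (const task of tasks) {
    const group = groups.get(task.domain) ?? [];
    group.push(task);
    groups.set(task.domain, group);
  }
  return groups;
}
```

Grouping by domain is only one way to pick sub-epic boundaries. The integration task that depends on all sub-epics would still be created on the parent epic.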
Before closing an epic, verify ALL:
- Coverage meets the thresholds in .coverage-thresholds.json
- SERVICE-INVENTORY.md updated with all new services/factories/modules

# Check task status
bd show <task-id> --json
# Check for orphaned agent
# If agent failed, reset and retry
bd update <task-id> --status open
bd label remove <task-id> agent:failed
# Run doctor to detect circular dependencies
bd doctor
# If found, restructure dependencies
bd dep remove <task1> <task2>
# Check sync status
bd sync --status
# Force export
bd export
# Pull from main
bd sync --from-main
skills/start/ # This skill (main orchestration)
├── SKILL.md # This file
├── agents/ # Agent definitions
│ ├── issue-orchestrator.md # Main coordinator (runs 4-phase loop)
│ ├── researcher-agent.md # Codebase exploration
│ ├── architect-agent.md # Implementation planning
│ ├── product-manager-agent.md # Use case & user benefit review
│ ├── designer-agent.md # UX/API design review
│ ├── security-design-agent.md # Security threat modeling
│ ├── cto-agent.md # TDD readiness review
│ ├── coder-agent.md # TDD implementation
│ ├── code-review-agent.md # Internal code review (collaborative + adversarial modes)
│ ├── security-auditor-agent.md # Security review (implementation)
│ ├── release-engineer-agent.md # Merge → deploy → verify → release
│ └── pr-shepherd-agent.md # PR lifecycle management
├── guides/ # Development guides
│ ├── agent-coordination.md # Team Mode, inter-agent messaging
│ ├── git-workflow.md # Branch naming, commit conventions
│ ├── testing-patterns.md # TDD workflow, mock strategies
│ ├── coding-standards.md # Language idioms, naming conventions
│ ├── worktree-development.md # Parallel development with worktrees
│ └── build-validation.md # Pre-push checks, CI pipeline
├── rubrics/ # Review rubrics
│ ├── plan-review-rubric.md # Used by CTO Agent
│ ├── code-review-rubric.md # Used by Code Review Agent (collaborative mode)
│ ├── adversarial-review-rubric.md # Used by Code Review Agent (adversarial mode)
│ ├── security-review-rubric.md # Used by Security Auditor Agent
│ └── release-engineering-rubric.md # Used by Release Engineer Agent
└── references/ # Reference docs for other tools
├── codex-tools.md # OpenAI Codex CLI reference
├── cursor-tools.md # Cursor tools reference
└── opencode-tools.md # OpenCode tools reference
skills/orchestrated-execution/ # 4-phase execution loop pattern
└── SKILL.md
skills/design-review-gate/ # Design review gate orchestrator
└── SKILL.md
skills/brainstorming-extension/ # Hooks brainstorming to review gate
└── SKILL.md
skills/plan-review-gate/ # 3 adversarial reviewers validate plans
└── SKILL.md
skills/external-tools/ # External AI tool delegation
├── SKILL.md
├── adapters/
│ ├── _common.sh # Shared adapter helpers (14 functions)
│ ├── codex.sh # OpenAI Codex CLI adapter
│ └── gemini.sh # Google Gemini CLI adapter
└── rubrics/
└── external-tool-review-rubric.md # Used by cross-model adversarial review
skills/visual-review/ # Playwright-based visual review
└── SKILL.md
commands/ # Slash commands (invoked as /metaswarm:command-name)
├── start-task.md # /metaswarm:start-task
├── prime.md # /metaswarm:prime
├── review-design.md # /metaswarm:review-design
├── self-reflect.md # /metaswarm:self-reflect
├── pr-shepherd.md # /metaswarm:pr-shepherd
├── handle-pr-comments.md # /metaswarm:handle-pr-comments
├── create-issue.md # /metaswarm:create-issue
└── metaswarm-setup.md # /metaswarm:metaswarm-setup
templates/ # Project scaffolding templates
├── CLAUDE.md # Full CLAUDE.md template for new projects
├── CLAUDE-append.md # Metaswarm section to append to existing CLAUDE.md
├── UI-FLOWS.md # User flow and wireframe documentation template
├── gitignore # Standard Node.js/TypeScript ignores
├── SERVICE-INVENTORY.md # Service/factory/module tracking template
└── ci.yml # CI pipeline template
.beads/ # Runtime state (in user's project)
├── beads.db # SQLite database
├── issues.jsonl # Issue/task data
└── knowledge/ # Curated learnings
├── codebase-facts.jsonl
├── patterns.jsonl
└── anti-patterns.jsonl