| Field | Value |
|---|---|
| name | agent-swarm |
| plugin | agent-loops |
| description | (Industry standard: Parallel Agent) Parallel multi-agent execution pattern. Use when work can be partitioned into independent sub-tasks that N agents can execute concurrently across worktrees. Includes routing (sequential vs parallel), merge verification, and correction loops. |
| allowed-tools | Bash, Read, Write |
Dependencies
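This skill requires Python 3.8+ and the standard library only; no external packages are needed.
The usual pip-tools lockfile workflow still applies (pip-compile comes from the pip-tools package). With an empty dependency list, these commands install nothing:
```bash
pip-compile ./requirements.in
pip install -r ./requirements.txt
```
See ../../requirements.txt for the dependency lockfile (currently empty; standard library only).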
Agent Swarm
Parallel or pipelined execution across multiple agents and worktrees. The orchestrator partitions work, dispatches to agents, and verifies/merges the results.
When to Use
- Large features that can be split into independent work packages
- Bulk operations (tests, docs, migrations, RLM distillation) that benefit from parallelism
- Multi-concern work where specialists handle different aspects simultaneously
Process Flow
1. Plan & Partition -- Break the work into independent tasks with clearly defined boundaries.
2. Route -- Decide the execution mode:
   - Sequential Pipeline -- Tasks depend on each other (A -> B -> C).
   - Parallel Swarm -- Tasks are independent (A | B | C).
3. Determine CLI and Model (ask once, during bootstrap) -- Before dispatching the swarm workers, ask the user:
   - "Which LLM CLI engine would you like to run the swarm workers through?" (Options: agy, claude, copilot, gemini, llama.)
   - "Which specific model should be used?" (Options and defaults depend on the engine, e.g. Gemini 3.5 Flash (Low) or gemini-3.5-flash for agy.)

   Then construct the swarm_run.py invocation with --engine and --model matching those choices, and append < /dev/null so no worker halts waiting for TTY input (a background process that reads from the terminal is stopped by SIGTTIN).
4. Dispatch -- Create one worktree per task and assign each to an agent:
   - CLI agent (Claude, Gemini, Copilot, Antigravity) using the selected engine and model
   - Deterministic script
   - Human
5. Execute -- Each agent works in isolation. There is no cross-worktree communication.
6. Verify & Merge (Trust But Verify & TDD) -- The orchestrator checks each worktree's output against the acceptance criteria. No blind trust is allowed.
   - TDD Enforcement: Run unit and integration tests first to ensure no regressions were introduced.
   - Delta Inspection: Inspect the modified files directly for stubs, stale code, or placeholders.
   - Fail -> Reject the output, generate a correction packet, and re-dispatch.
   - Pass -> Merge into the main branch.
7. Seal -- Bundle all merged artifacts.
8. Retrospective -- Did the partition strategy work? Was parallelism effective?
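The routing and verify/re-dispatch steps above can be sketched in plain Python. This is a minimal illustration, not code from swarm_run.py: the `Task` shape, the `run_task` and `verify` callables, and the `max_attempts` limit are hypothetical. Any declared dependency routes the batch through a sequential pipeline in dependency order; otherwise every task runs concurrently, and output that fails verification is re-dispatched with a correction note.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    name: str
    depends_on: List[str] = field(default_factory=list)


def route(tasks: List[Task]) -> str:
    # Any declared dependency forces the sequential pipeline (A -> B -> C).
    return "sequential" if any(t.depends_on for t in tasks) else "parallel"


def pipeline_order(tasks: List[Task]) -> List[Task]:
    # Repeatedly take the tasks whose dependencies are all done (a topological sort).
    done: List[str] = []
    remaining = list(tasks)
    while remaining:
        ready = [t for t in remaining if all(d in done for d in t.depends_on)]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        for t in ready:
            done.append(t.name)
            remaining.remove(t)
    by_name = {t.name: t for t in tasks}
    return [by_name[n] for n in done]


def run_with_verification(
    task: Task,
    run_task: Callable[[Task, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    # verify() returns None on pass, or a correction note to send back on failure.
    correction: Optional[str] = None
    for _ in range(max_attempts):
        output = run_task(task, correction)
        correction = verify(output)
        if correction is None:
            return output  # Pass -> safe to merge
    raise RuntimeError(f"{task.name}: still failing verification after {max_attempts} attempts")


def orchestrate(
    tasks: List[Task],
    run_task: Callable[[Task, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    workers: int = 2,
) -> Dict[str, str]:
    if route(tasks) == "sequential":
        # Run one at a time, in dependency order.
        return {t.name: run_with_verification(t, run_task, verify) for t in pipeline_order(tasks)}
    # Independent tasks: dispatch all of them concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {t.name: pool.submit(run_with_verification, t, run_task, verify) for t in tasks}
        return {name: fut.result() for name, fut in futures.items()}
```

Capping re-dispatch at a fixed number of attempts keeps a persistently failing worktree from looping forever; a real orchestrator would escalate it to a human instead.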
Worker Selection
Each worktree can be assigned to a different worker type based on task complexity:
| Worker | Cost | Best For |
|---|---|---|
| High-reasoning CLI (Opus, Ultra, GPT-5.3) | High | Complex logic, architecture |
| Fast CLI (Haiku, Flash 2.0) | Low | Tests, docs, routine tasks |
| Low-cost CLI (gpt-5-mini, gemini-3.5-flash) | Low | Standard low-cost reasoning tier |
| Free local CLI (llama, gemma-4-12b) | $0 | Self-hosted local inference, zero-cost batch jobs |
| Deterministic Script | None | Formatting, linting, data transforms |
| Human | N/A | Judgment calls, creative decisions |
Cost Optimization Strategy: For bulk summarization or distillation jobs, use --engine llama (local Gemma 4) if you have local Metal or CUDA acceleration set up; it is the only truly zero-cost path. Cloud CLIs such as --engine copilot (gpt-5-mini) or --engine agy (gemini-3.5-flash) are low-cost but still paid (they consume AI Credits or are billed per token). Use --workers 2 for cloud CLIs (rate-limit safe) and --workers 1 for local llama to avoid running out of memory and swapping on 16 GB Macs.
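The CLI/model selection step in Process Flow and the worker guidance above can be combined into a small helper that assembles the swarm_run.py command line. This is a hypothetical sketch: `build_swarm_command` and the `DEFAULT_WORKERS` table are illustrative, derived only from this skill's concurrency advice (2 for cloud CLIs, 1 for local llama, 5 for Gemini). It uses only flags mentioned in this skill (--engine, --model, --job, --files-from, --resume, --workers) and appends `< /dev/null` so workers never wait on the terminal.

```python
from __future__ import annotations

import shlex
from typing import Optional

# Hypothetical defaults taken from this skill's guidance, not read from swarm_run.py.
DEFAULT_WORKERS = {"copilot": 2, "agy": 2, "claude": 2, "gemini": 5, "llama": 1}


def build_swarm_command(
    engine: str,
    model: str,
    job: str,
    files_from: Optional[str] = None,
    workers: Optional[int] = None,
    resume: bool = True,
    script: str = "./scripts/swarm_run.py",
) -> str:
    if engine not in DEFAULT_WORKERS:
        raise ValueError(f"unknown engine: {engine}")
    argv = ["python", script, "--engine", engine, "--model", model, "--job", job]
    if files_from:
        argv += ["--files-from", files_from]
    if resume:
        argv.append("--resume")
    argv += ["--workers", str(workers if workers is not None else DEFAULT_WORKERS[engine])]
    # shlex.join quotes each argument; the stdin redirect is appended unquoted so the
    # shell applies it and no worker blocks on TTY input (SIGTTIN).
    return shlex.join(argv) + " < /dev/null"


print(build_swarm_command("agy", "gemini-3.5-flash", "./resources/jobs/my_job.job.md", files_from="checklist.md"))
```

Because the returned string ends in a shell redirect, run it through a shell (for example `subprocess.run(cmd, shell=True)`); when launching from an argv list instead, pass `stdin=subprocess.DEVNULL` to get the same effect.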
Implementation: ./../scripts/swarm_run.py
The ./../scripts/swarm_run.py script is the universal engine for executing this pattern. It is driven by Job Files (.md with YAML frontmatter).
Key Features
- Resume Support -- Automatically saves state to .swarm_state_<job>.json. Use --resume to skip items that were already processed.
- Intelligent Retry -- Exponential backoff on rate limits.
- Verification Skip -- Use check_cmd in the job file to short-circuit work when a file has already been processed (e.g. it already exists in the cache).
- Dry Run -- Test file discovery and template substitution without incurring cost.
- Engine Flag -- --engine [claude|gemini|copilot|agy|llama] switches CLI backends at runtime.
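The resume and retry features can be illustrated with a short sketch. The state-file shape used here (a JSON object with a `completed` list and a `failed` map) matches the fields referenced in Checkpoint Reconciliation, but the helper names, `max_retries`, `base_delay`, and the `RateLimitError` exception are hypothetical; swarm_run.py's actual implementation may differ.

```python
from __future__ import annotations

import json
import random
import time
from pathlib import Path
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical: raised by a worker call when the backend rate-limits."""


def pending_files(files: Iterable[str], state_path: Path) -> List[str]:
    # With --resume, skip anything already recorded as completed in the state file.
    if not state_path.exists():
        return list(files)
    completed = set(json.loads(state_path.read_text()).get("completed", []))
    return [f for f in files if f not in completed]


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the rate-limit error
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The injectable `sleep` parameter exists only to make the backoff schedule testable without real waiting.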
Usage
```bash
# Copilot: load the Copilot token first (see Known Engine Quirks) and keep concurrency at 2
source ~/.zshrc
python ./scripts/swarm_run.py \
  --engine copilot \
  --job ./resources/jobs/my_job.job.md \
  --files-from checklist.md \
  --resume --workers 2

# Gemini: tolerates higher concurrency
python ./scripts/swarm_run.py \
  --engine gemini \
  --job ./resources/jobs/my_job.job.md \
  --files-from checklist.md \
  --resume --workers 5

# General form (bracketed flags are optional)
python ./scripts/swarm_run.py \
  --job ./resources/jobs/my_job.job.md \
  [--dir some/dir] [--resume] [--dry-run]
```
Job File Schema
```markdown
---
model: haiku
workers: 2
timeout: 120
ext: [".md"]
post_cmd: "python ./scripts/my_post_cmd.py --file {file} --summary {output}"
check_cmd: "python ./scripts/check_cache.py --file {file}"
vars:
  profile: project
---
Prompt for the agent goes here.
```
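To make the job-file format concrete, the sketch below separates the YAML frontmatter from the prompt body and fills the `{file}` and `{output}` placeholders in a command template, shell-quoting each value (the Constraints section notes that swarm_run.py shell-quotes `{file}`). It is illustrative only: `split_job_file` and `render_cmd` are hypothetical names, the frontmatter is returned as raw text because the standard library has no YAML parser, and quoting `{output}` as well is an assumption.

```python
from __future__ import annotations

import shlex
from typing import Tuple


def split_job_file(text: str) -> Tuple[str, str]:
    """Return (frontmatter_yaml, prompt) for a file that starts with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("job file must start with '---'")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            prompt = "\n".join(lines[i + 1:]).strip()
            return frontmatter, prompt
    raise ValueError("unterminated frontmatter block")


def render_cmd(template: str, **values: str) -> str:
    # Quote every substituted value so names like "it's.md" cannot break the shell command.
    quoted = {key: shlex.quote(value) for key, value in values.items()}
    return template.format(**quoted)
```

Since `render_cmd` relies on `str.format`, any literal braces in a template would need to be doubled (`{{` and `}}`).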
IMPORTANT for the Copilot engine: the copilot CLI ignores stdin when -p is used, so ./scripts/swarm_run.py automatically prepends the instruction to the file content instead.
Prompts for this engine should therefore tell the agent not to use tool calls or filesystem access and to rely only on the content provided via stdin.
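The prepend-to-stdin workaround can be sketched with `subprocess`: the prompt and the file content are concatenated into a single stdin payload instead of passing the prompt as a flag. This is a hypothetical illustration, not swarm_run.py's code; the blank-line separator and the `build_payload` and `run_with_stdin_prompt` names are assumptions, and the actual Copilot command line is left to the caller rather than guessed here.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import List


def build_payload(prompt: str, content: str) -> str:
    # Prompt first, then a blank line, then the file content the agent must rely on.
    return f"{prompt.strip()}\n\n{content}"


def run_with_stdin_prompt(cmd: List[str], prompt: str, file_path: Path, timeout: int = 120) -> str:
    payload = build_payload(prompt, file_path.read_text(encoding="utf-8"))
    result = subprocess.run(
        cmd,
        input=payload,        # everything the agent sees arrives on stdin
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```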
Known Engine Quirks
Copilot CLI
- No -p flag -- Copilot ignores stdin when -p is present, so ./scripts/swarm_run.py prepends the prompt to the file content instead.
- Auth token scope -- Run source ~/.zshrc to load your token. gh auth token returns a token without Copilot permissions, which causes auth failures under concurrency.
- Rate limits -- Use --workers 2 at most. Higher concurrency trips GitHub's anti-abuse systems, which surface as authentication errors.
- Concurrent writes -- If post_cmd writes to a shared JSON output (e.g. a cache), make sure the writer script takes an exclusive fcntl.flock lock around each read-modify-write so concurrent workers do not overwrite each other's entries. See inject_summary.py.
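For the concurrent-writes quirk, a writer along these lines serializes read-modify-write cycles across worker processes. It is a sketch, not the contents of inject_summary.py: `update_json_cache` and the separate `.lock` file are assumptions. The lock lives in its own file because `os.replace` swaps in a new inode, so a lock held on the cache file itself would not cover the replacement. `fcntl` is POSIX-only.

```python
from __future__ import annotations

import fcntl
import json
import os
import tempfile
from pathlib import Path


def update_json_cache(cache_path: Path, key: str, value: object) -> None:
    lock_path = cache_path.with_name(cache_path.name + ".lock")
    with open(lock_path, "a") as lock_file:
        # Exclusive lock: other writers block here until it is released.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            data = json.loads(cache_path.read_text()) if cache_path.exists() else {}
            data[key] = value
            # Write to a temp file in the same directory, then atomically swap it in,
            # so readers never observe a half-written cache.
            fd, tmp = tempfile.mkstemp(dir=cache_path.parent, suffix=".tmp")
            with os.fdopen(fd, "w") as tmp_file:
                json.dump(data, tmp_file, indent=2)
            os.replace(tmp, cache_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```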
Gemini CLI
- Accepts the -p "prompt" flag normally
- Supports higher concurrency (5-10 workers)
- Model auto-upgrade: haiku -> gemini-3-pro-preview
Checkpoint Reconciliation
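If a batch run is interrupted partway through and the output store (e.g. the cache JSON) is partially corrupted, reconcile the checkpoint before resuming: keep only the completed entries that actually exist in the output store, and reset the failed map.
```python
# st is the loaded state dict; actual_output_keys holds the keys present in the output store
st['completed'] = [f for f in st['completed'] if f in actual_output_keys]
st['failed'] = {}
```
Then write the state back to the state file and rerun with --resume.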
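A fuller version of the reconciliation step might look like the following. It assumes, hypothetically, that the state file is JSON with `completed` and `failed` keys and that the output store is a JSON object keyed by file path; `reconcile_checkpoint` is an illustrative name, and both paths should be adjusted to your job.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Tuple


def reconcile_checkpoint(state_path: Path, output_path: Path) -> Tuple[int, int]:
    """Drop 'completed' entries missing from the output store and clear 'failed'.

    Returns (kept, dropped) so the caller can report how much work will be redone.
    """
    st = json.loads(state_path.read_text())
    # Keys of the output store (e.g. the cache JSON) are the items that truly finished.
    actual_output_keys = set(json.loads(output_path.read_text())) if output_path.exists() else set()
    before = st.get("completed", [])
    st["completed"] = [f for f in before if f in actual_output_keys]
    st["failed"] = {}
    state_path.write_text(json.dumps(st, indent=2))
    return len(st["completed"]), len(before) - len(st["completed"])
```

After it reports the kept and dropped counts, rerun swarm_run.py with --resume so only the dropped items are processed again.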
Constraints
- Each worker execution must be independent
- Post-commands must be idempotent if using resume
- Orchestrator owns the overall job state
- {file} in post_cmd is shell-quoted automatically, so filenames containing apostrophes are safe.
- Asynchronous Benchmark Metric Capture: Orchestrators MUST capture total_tokens and duration_ms from worker agents and log them to a centralized timing.json log as each subtask completes, rather than waiting for the entire swarm batch to finish.
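The metric-capture constraint can be met by logging each result as soon as its future completes. In this sketch, `run_worker` is a hypothetical callable that returns a dict containing `total_tokens` (however the engine reports it), `record_metric` rewrites timing.json after every subtask, and the record layout (a JSON list of objects) is an assumption rather than a format swarm_run.py is known to use.

```python
from __future__ import annotations

import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, List

_log_lock = threading.Lock()


def record_metric(log_path: Path, entry: Dict[str, object]) -> None:
    # Append one record immediately; the lock makes this safe to call from several threads.
    with _log_lock:
        records: List[Dict[str, object]] = json.loads(log_path.read_text()) if log_path.exists() else []
        records.append(entry)
        log_path.write_text(json.dumps(records, indent=2))


def run_swarm(
    items: Iterable[str],
    run_worker: Callable[[str], Dict[str, int]],
    log_path: Path,
    workers: int = 2,
) -> None:
    def timed(item: str) -> Dict[str, object]:
        start = time.monotonic()
        result = run_worker(item)
        duration_ms = int((time.monotonic() - start) * 1000)
        return {"item": item, "total_tokens": result.get("total_tokens", 0), "duration_ms": duration_ms}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(timed, item) for item in items]
        # as_completed yields each future as soon as it finishes, so metrics are
        # logged per subtask instead of after the whole batch.
        for future in as_completed(futures):
            record_metric(log_path, future.result())
```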
Diagram
See: ./assets/resources/agent_swarm.mmd