swarm-local-e2e
// Guide for running local E2E tests with API server, Docker lead/worker containers, task creation, log verification, UI dashboard, and cleanup
| name | swarm-local-e2e |
| description | Guide for running local E2E tests with API server, Docker lead/worker containers, task creation, log verification, UI dashboard, and cleanup |
Run full end-to-end tests of the agent swarm locally with a real API server and Docker containers.
This skill should be invoked in two modes:
User-requested QA: The user asks you to run E2E tests, verify a feature, or QA a specific flow. Follow the steps below targeting what they asked for.
Automated change verification: After implementing changes that touch the API, runner, polling, task lifecycle, session logs, Docker entrypoint, or worker/lead behavior, use this skill proactively to verify the changes work end-to-end. Determine what's testable based on the diff.
You do not need to run every step — pick the subset relevant to the changes being tested.
- Docker running (open -a OrbStack if needed)
- .env with API_KEY and PORT configured
- .env.docker-lead with lead config (AGENT_ID, CLAUDE_CODE_OAUTH_TOKEN, MCP_BASE_URL)
- .env.docker with worker config (AGENT_ID, CLAUDE_CODE_OAUTH_TOKEN or OPENROUTER_API_KEY, MCP_BASE_URL)

Check .env for the configured port. Do not assume 3013:
grep ^PORT= .env
Use this value as $PORT throughout. In worktrees, each worktree may have a different port. Always verify and use the value from .env.
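When the port is needed inside a script rather than a shell, the same lookup can be done in code. This is a minimal sketch, assuming .env uses plain KEY=value lines; parseEnvPort is a hypothetical helper, not something the repo provides.

```javascript
// Hypothetical helper: extract PORT from the text of a .env file.
// Assumes simple KEY=value lines; surrounding quotes are stripped.
function parseEnvPort(envText) {
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith('#')) continue; // skip commented-out entries
    const match = line.match(/^PORT=(.*)$/);
    if (!match) continue;
    const value = match[1].trim().replace(/^["']|["']$/g, '');
    const port = Number(value);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`Invalid PORT value: ${value}`);
    }
    return port;
  }
  throw new Error('PORT is not set in .env');
}
```

A caller would pass the file contents, for example the result of reading .env from the worktree root. Throwing on a missing or invalid value avoids silently falling back to 3013.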
Also verify the Docker env files match:
grep MCP_BASE_URL .env.docker-lead .env.docker
# Both should point to http://host.docker.internal:$PORT
If they don't match, update them before starting containers.
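The match check can also be expressed as a small function. The sketch below assumes each Docker env file holds a single MCP_BASE_URL=... line; checkMcpBaseUrl is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: confirm MCP_BASE_URL in a Docker env file targets the API port.
function checkMcpBaseUrl(envText, port) {
  const match = envText.match(/^MCP_BASE_URL=(.*)$/m);
  if (!match) return { ok: false, reason: 'MCP_BASE_URL is missing' };
  const actual = match[1].trim().replace(/\/+$/, ''); // ignore a trailing slash
  const expected = `http://host.docker.internal:${port}`;
  return actual === expected
    ? { ok: true, reason: '' }
    : { ok: false, reason: `expected ${expected}, found ${actual}` };
}
```

Running it once per file (.env.docker-lead and .env.docker) with the port read from .env gives a clear reason string when a file still points at another worktree's port.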
# Kill any existing API process on your port
lsof -ti :$PORT | xargs kill 2>/dev/null
# Clean DB for fresh state
rm -f agent-swarm-db.sqlite agent-swarm-db.sqlite-wal agent-swarm-db.sqlite-shm
# Start API server
bun run start:http &
# Wait ~3s for startup, confirm "MCP HTTP server running on http://localhost:$PORT/mcp"
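A fixed ~3s wait can be flaky on a loaded machine. One alternative is to retry a readiness probe until it succeeds. The following is a sketch under assumptions: waitUntilReady is a hypothetical helper, and the probe is whatever check you choose (for example, a request to /api/agents that returns a successful response).

```javascript
// Hypothetical helper: retry an async probe until it succeeds or attempts run out.
// Resolves with the number of attempts it took.
async function waitUntilReady(probe, { attempts = 10, delayMs = 500 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await probe()) return i;
    } catch {
      // The probe threw (e.g. connection refused while the server boots); retry.
    }
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Not ready after ${attempts} attempts`);
}
```

Passing the probe in as a function keeps the retry logic independent of how the request is made, which matters here because direct curl calls are restricted by hooks.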
bun run docker:build:worker
This builds agent-swarm-worker:latest from the current code. Rebuild after every code change.
Use a unique container name to avoid conflicts with other worktrees (e.g. include branch name or feature):
docker run --rm -d \
--name e2e-lead-$(git branch --show-current | tr '/' '-') \
--env-file .env.docker-lead \
-e AGENT_ROLE=lead \
-e MAX_CONCURRENT_TASKS=1 \
-p 3201:3000 \
agent-swarm-worker:latest
Wait ~15s, then verify:
docker logs e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[lead] Polling for triggers (0/1 active)..."
If port 3201 is taken by another worktree, pick a different host port (e.g. -p 3211:3000).
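The branch-based naming used above can be mirrored in script code. This sketch assumes the branch name is passed in as a string; containerName is a hypothetical helper. It goes slightly beyond tr '/' '-' by replacing every character outside the set Docker accepts in container names.

```javascript
// Hypothetical helper: build a container name from a role and a git branch.
// Replaces '/' and any other character outside [a-zA-Z0-9_.-] with '-'.
function containerName(role, branch) {
  const suffix = branch.trim().replace(/[^a-zA-Z0-9_.-]/g, '-');
  if (!suffix) {
    // `git branch --show-current` prints nothing on a detached HEAD.
    throw new Error('Branch name is empty (detached HEAD?)');
  }
  return `e2e-${role}-${suffix}`;
}
```

Failing on an empty branch avoids creating a container named e2e-lead-, which would collide across worktrees that are all on a detached HEAD.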
docker run --rm -d \
--name e2e-worker-$(git branch --show-current | tr '/' '-') \
--env-file .env.docker \
-e MAX_CONCURRENT_TASKS=1 \
-p 3203:3000 \
agent-swarm-worker:latest
Wait ~15s, then verify:
docker logs e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1 | tail -5
# Should see: "[worker] Polling for triggers (0/1 active)..."
Use context-mode execute (not curl directly due to hook restrictions):
const headers = { 'Authorization': 'Bearer $API_KEY', 'Content-Type': 'application/json' };
const agents = await (await fetch('http://localhost:$PORT/api/agents', { headers })).json();
for (const a of agents.agents) {
console.log(`${a.name} | isLead: ${a.isLead} | status: ${a.status} | id: ${a.id}`);
}
Should show both lead and worker registered as idle. Save the agent IDs for task creation.
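The registration check can be reduced to a pure function over the agents array. This is a sketch, assuming the response shape used above (id, isLead, status) and that an idle agent reports the literal status string 'idle'; findReadyAgents is a hypothetical name.

```javascript
// Hypothetical helper: summarize the agents array from /api/agents.
// Assumes idle agents report status === 'idle'.
function findReadyAgents(agents) {
  const idle = agents.filter((a) => a.status === 'idle');
  const lead = idle.find((a) => a.isLead);
  const worker = idle.find((a) => !a.isLead);
  return {
    ready: Boolean(lead && worker),
    leadId: lead ? lead.id : null,
    workerId: worker ? worker.id : null,
  };
}
```

Calling findReadyAgents(agents.agents) yields the two IDs needed for task creation, and ready stays false until both containers have registered.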
const t = await (await fetch('http://localhost:$PORT/api/tasks', {
method: 'POST', headers,
body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.', agentId: LEAD_ID })
})).json();
console.log('Task:', t.id, '| status:', t.status);
Important: Use agentId (not assignedTo) to assign tasks. Wrong param silently creates an unassigned task.
const t = await (await fetch('http://localhost:$PORT/api/tasks', {
method: 'POST', headers,
body: JSON.stringify({ task: 'Say hello. Call store-progress with status completed.' })
})).json();
console.log('Pool task:', t.id, '| status:', t.status);
Workers auto-claim unassigned tasks at poll time. Leads do not auto-claim pool tasks.
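Because a wrong parameter name silently produces an unassigned task, it can help to build the request body through a function that rejects the mistake. This is a sketch; buildTaskBody is a hypothetical helper, and it only knows about the task and agentId fields shown above.

```javascript
// Hypothetical helper: build a POST /api/tasks body and reject the assignedTo mistake.
function buildTaskBody({ task, agentId, ...rest }) {
  if ('assignedTo' in rest) {
    throw new Error('Use agentId, not assignedTo');
  }
  if (typeof task !== 'string' || task.trim() === '') {
    throw new Error('task must be a non-empty string');
  }
  // Omitting agentId yields a pool task, which workers claim at poll time.
  return JSON.stringify(agentId ? { task, agentId } : { task });
}
```

The returned string can be used directly as the body in the fetch calls above, for both assigned and pool tasks.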
# Watch lead logs (use your container name)
docker logs -f --tail 20 e2e-lead-$(git branch --show-current | tr '/' '-') 2>&1
# Watch worker logs
docker logs -f --tail 20 e2e-worker-$(git branch --show-current | tr '/' '-') 2>&1
Poll task status:
const t = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>', { headers })).json();
console.log(t.status); // pending → in_progress → completed/failed
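Repeating the status request by hand gets tedious. A polling loop with a timeout is one way to wait for a terminal state. The sketch below assumes the status values listed above; waitForTask is a hypothetical helper, and getStatus is any async function you supply that returns the current status string.

```javascript
const TERMINAL = new Set(['completed', 'failed']);

// Hypothetical helper: call getStatus() until the task reaches a terminal status.
// Resolves with the list of distinct status transitions that were observed.
async function waitForTask(getStatus, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  const seen = [];
  while (true) {
    const status = await getStatus();
    if (seen[seen.length - 1] !== status) seen.push(status); // record changes only
    if (TERMINAL.has(status)) return seen;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out; last status: ${status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In practice getStatus would wrap the GET /api/tasks/<task-id> request shown above and return t.status. A timeout that reports the last status makes a task stuck in pending easy to tell apart from one stuck in in_progress.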
const logs = await (await fetch('http://localhost:$PORT/api/tasks/<task-id>/session-logs', { headers })).json();
console.log('Log count:', logs.logs.length);
// Should be > 0 for completed tasks
For log isolation verification (multiple sequential tasks from same agent):
const [l1, l2] = await Promise.all([
fetch('http://localhost:$PORT/api/tasks/<task1>/session-logs', { headers }).then(r => r.json()),
fetch('http://localhost:$PORT/api/tasks/<task2>/session-logs', { headers }).then(r => r.json()),
]);
const s1 = [...new Set(l1.logs.map(l => l.sessionId))];
const s2 = [...new Set(l2.logs.map(l => l.sessionId))];
console.log('Unique sessionIds:', s1[0] !== s2[0]); // Should be true
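Comparing only the first sessionId of each task can miss a leak when a task's logs contain more than one session. A stricter variant checks that the two sets do not overlap at all. This is a sketch; sharedSessionIds is a hypothetical helper operating on the logs arrays fetched above.

```javascript
// Hypothetical helper: list sessionIds that appear in both tasks' logs.
// An empty result means the logs are isolated.
function sharedSessionIds(logsA, logsB) {
  const idsA = new Set(logsA.map((l) => l.sessionId));
  const idsB = new Set(logsB.map((l) => l.sessionId));
  return [...idsA].filter((id) => idsB.has(id));
}
```

With the variables from the snippet above, sharedSessionIds(l1.logs, l2.logs).length should be 0 for sequential tasks from the same agent.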
Start the dashboard to visually verify tasks, logs, and agent status:
cd ui && pnpm run dev &
# Defaults to port from APP_URL in .env (check with: grep APP_URL ../.env)
If the UI port is taken by another worktree, start on an alternate:
cd ui && pnpm run dev --port 5276
The UI connects to the API via VITE_API_URL (set in ui/.env; defaults to http://localhost:$PORT).
Use agent-browser or qa-use to automate UI checks:
# Quick visual gut-check with agent-browser
agent-browser --url http://localhost:5175 snapshot
# Or use qa-use to verify specific flows
qa-use explore http://localhost:5175
Things to verify in the UI:
# Stop containers (use your branch-specific names)
docker stop e2e-lead-$(git branch --show-current | tr '/' '-') e2e-worker-$(git branch --show-current | tr '/' '-') 2>/dev/null
# Stop API server
lsof -ti :$PORT | xargs kill 2>/dev/null
# Stop UI dev server (if started)
lsof -ti :5175 | xargs kill 2>/dev/null
ERROR: Cannot connect to the Docker daemon
Fix: open -a OrbStack and wait ~5s.
docker: Error response from daemon: Conflict. The container name "..." is already in use
Another worktree has a container with the same name. Either stop it (docker stop <name>) or use a different name suffix.
- Use agentId (not assignedTo). The wrong param silently creates an unassigned task.
- The task may already be in_progress (e.g. from a manual poll call that consumed the trigger).
- docker restart <container-name>
- docker logs <container> 2>&1 | grep "capacity"
- GET /api/poll (not POST)
- X-Agent-ID header with a valid agent UUID
lsof -i :3013 # Check what's using the port
If another worktree is running, set a different PORT in .env and update MCP_BASE_URL in .env.docker* to http://host.docker.internal:<new-port>.
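Updating MCP_BASE_URL by hand in two files is easy to get wrong. The rewrite can be sketched as a pure text transformation, assuming one MCP_BASE_URL=... line per file; setMcpPort is a hypothetical helper.

```javascript
// Hypothetical helper: point MCP_BASE_URL in a Docker env file at a new port.
function setMcpPort(envText, newPort) {
  const line = `MCP_BASE_URL=http://host.docker.internal:${newPort}`;
  if (/^MCP_BASE_URL=.*$/m.test(envText)) {
    return envText.replace(/^MCP_BASE_URL=.*$/m, line);
  }
  // Append when the key is missing, keeping a single trailing newline.
  return `${envText.replace(/\n*$/, '')}\n${line}\n`;
}
```

Apply it to the contents of .env.docker-lead and .env.docker, write the results back, and then restart the containers so they pick up the new URL.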
- Confirm the task reached a terminal state (completed or failed, not just in_progress)
- Check that claudeSessionId is set on the task: GET /api/tasks/<id> should show it
- Inspect the session_logs table directly
Direct API cancellation (POST /api/tasks/<id>/cancel) updates the DB but doesn't kill the Claude process inside Docker. Use docker restart <container> to force-stop.
Use simple tasks like "Say hello" for E2E tests. Complex tasks waste time and API credits.
The dashboard auto-polls every 5 seconds. If data looks stale, hard-refresh (Cmd+Shift+R) or check VITE_API_URL points to the correct API port.