| Field | Value |
|---|---|
| name | test-ledger-runner |
| description | Runs test suites, tracks results in logs/test-ledger.md, and verifies that tests promised in plans exist, without editing product code or tests. Use for the tester subagent or when reporting pytest/jest/npm test outcomes. |
Test ledger and execution (tester)
Rules
- Do not edit source, tests, or config to “make green.” Investigate, reproduce, capture logs, and report to orchestrator.
- Use `python-venv-dependencies` (or the project's JS/Java flow) before running suites.
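
Both rules can be checked mechanically around a run. The sketch below is a minimal illustration that assumes a git checkout and a pytest-based Python project. `in_virtualenv`, `changed_paths`, and `run_suite_read_only` are hypothetical helper names. They are not part of `python-venv-dependencies` or any project tooling.

```python
import subprocess
import sys
from typing import Set


def in_virtualenv() -> bool:
    """True when this interpreter runs inside a venv (sys.prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def changed_paths(porcelain: str) -> Set[str]:
    """Parse `git status --porcelain` output ("XY path") into the set of touched paths."""
    paths = set()
    for line in porcelain.splitlines():
        if len(line) > 3:
            # Renames are reported as "old -> new"; keep the new path.
            paths.add(line[3:].split(" -> ")[-1])
    return paths


def _git_status() -> Set[str]:
    out = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    ).stdout
    return changed_paths(out)


def run_suite_read_only(args=("-q",)) -> int:
    """Run pytest and report (never revert) paths that became dirty during the run."""
    if not in_virtualenv():
        raise RuntimeError("activate the project venv before running suites")
    before = _git_status()
    result = subprocess.run([sys.executable, "-m", "pytest", *args])
    newly_dirty = _git_status() - before
    if newly_dirty:
        print("working tree changed during the run:", sorted(newly_dirty))
    return result.returncode
```

Call `run_suite_read_only()` from the repository root. The check only flags paths that were clean before the run. Any warning it prints is something to report to the orchestrator, not something to fix.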
Playwright install scope and execution
- Playwright Test setup is repo-scoped: run `npm init playwright@latest` in each repository that needs Playwright tests.
- Browser binaries are usually cached per machine, but dependency pinning and config are repository-local.
- Prefer the repository's own package manager (npm, pnpm, or yarn, whichever matches its lockfile) so installs stay consistent with that lockfile.
- Baseline command sequence for Playwright suites:
  - `npx playwright test` for the full suite
  - `npx playwright test --project=chromium` for a focused repro
  - `npx playwright test --debug` for interactive failure investigation
  - `npx playwright show-report` and `npx playwright show-trace <trace.zip>` for evidence capture
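
To keep the baseline sequence on the repository's own package manager, the commands can be derived from whichever lockfile is present. This is a minimal sketch. The lockfile-to-runner mapping (`pnpm exec`, `yarn`, `npx`) is an assumption, and so are the helper names `runner_prefix`, `build_commands`, and `playwright_commands`.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Assumed mapping from lockfile to the prefix that runs a locally installed CLI.
# Checked in order; npx is the fallback when no known lockfile is present.
LOCKFILE_RUNNERS = [
    ("pnpm-lock.yaml", ["pnpm", "exec"]),
    ("yarn.lock", ["yarn"]),
    ("package-lock.json", ["npx"]),
]


def runner_prefix(filenames: Iterable[str]) -> List[str]:
    """Pick the CLI prefix from the file names found at the repository root."""
    present = set(filenames)
    for lockfile, prefix in LOCKFILE_RUNNERS:
        if lockfile in present:
            return list(prefix)
    return ["npx"]


def build_commands(prefix: List[str], trace: Optional[str] = None) -> Dict[str, List[str]]:
    """Baseline sequence: full run, focused repro, debug, and evidence capture."""
    base = prefix + ["playwright"]
    commands = {
        "full": base + ["test"],
        "focused": base + ["test", "--project=chromium"],
        "debug": base + ["test", "--debug"],
        "report": base + ["show-report"],
    }
    if trace:
        commands["trace"] = base + ["show-trace", trace]
    return commands


def playwright_commands(repo: Path, trace: Optional[str] = None) -> Dict[str, List[str]]:
    return build_commands(runner_prefix(p.name for p in repo.iterdir()), trace)
```

`--project=chromium` only works if the repository's Playwright config defines a project with that name. The default `npm init playwright@latest` scaffold defines one.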
Stitch-assisted UX pipeline tagging
- If a failure is tied to Stitch-assisted UX generation or to downstream exported flows, tag its Notes with `stitch` plus one short subtype label.
- Preferred labels: `stitch-setup`, `stitch-skill-missing`, `stitch-mcp`, `stitch-export`, `stitch-flow-regression`.
- Include the source context in Notes when available (for example, the skill used, the DESIGN.md revision, or the generated flow identifier).
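
A small helper can keep the tag and its context in a consistent shape inside the Notes cell. This is a sketch under stated assumptions. The `label (key=value, ...)` layout and the `stitch_note` name are hypothetical, not an established format.

```python
from typing import Optional

# Preferred subtype labels: stitch-setup, stitch-skill-missing, stitch-mcp,
# stitch-export, stitch-flow-regression. Other short "stitch-*" labels are accepted.


def stitch_note(label: str, skill: Optional[str] = None,
                design_rev: Optional[str] = None, flow_id: Optional[str] = None) -> str:
    """Build a Notes fragment such as 'stitch-export (skill=..., DESIGN.md@..., flow=...)'."""
    if not label.startswith("stitch-") or " " in label:
        raise ValueError(f"expected one short 'stitch-*' label, got {label!r}")
    context = []
    if skill:
        context.append(f"skill={skill}")
    if design_rev:
        context.append(f"DESIGN.md@{design_rev}")
    if flow_id:
        context.append(f"flow={flow_id}")
    note = f"{label} ({', '.join(context)})" if context else label
    # A bare '|' would split the Markdown table cell, so escape it.
    return note.replace("|", "\\|")
```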
Ledger file: logs/test-ledger.md
Append or update rows, keeping the newest entries at the top:
| Suite / command | Date | Tests related (files) | Last result | Notes |
|---|---|---|---|---|
Group rows by suite/command and keep only the 5 most recent runs per suite.
Include flaky markers, skipped tests, environment prerequisites, and links to CI runs where applicable.
When a failure involves Playwright, also record the project-local setup status and the relevant runtime versions.
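
The append-and-trim rule can be sketched as follows. The sketch assumes `logs/test-ledger.md` contains only the table above, and `update_ledger` and `append_run` are hypothetical names. Rows are grouped so that the suite with the newest run comes first, and each suite keeps its 5 most recent rows.

```python
from pathlib import Path
from typing import Dict, List

HEADER = "| Suite / command | Date | Tests related (files) | Last result | Notes |"
SEPARATOR = "|---|---|---|---|---|"
MAX_RUNS_PER_SUITE = 5


def _cell(text: str) -> str:
    # Escape '|' so a value cannot split its Markdown cell.
    return text.replace("|", "\\|").strip()


def _is_separator(line: str) -> bool:
    return set(line.strip()) <= set("|-: ")


def update_ledger(lines: List[str], suite: str, date: str, tests: str,
                  result: str, notes: str) -> List[str]:
    """Put the new run on top, group rows by suite, and keep at most 5 rows per suite."""
    rows = [ln for ln in lines
            if ln.startswith("|") and ln != HEADER and not _is_separator(ln)]
    new_row = "| " + " | ".join(_cell(c) for c in (suite, date, tests, result, notes)) + " |"
    groups: Dict[str, List[str]] = {}  # insertion order: suite with the newest run first
    for row in [new_row] + rows:
        key = row.split(" | ")[0].lstrip("| ").strip()
        bucket = groups.setdefault(key, [])
        if len(bucket) < MAX_RUNS_PER_SUITE:
            bucket.append(row)  # existing rows are assumed newest-first already
    return [HEADER, SEPARATOR] + [row for bucket in groups.values() for row in bucket]


def append_run(path: Path, suite: str, date: str, tests: str, result: str, notes: str = "") -> None:
    """Rewrite the ledger file with the new run applied (drops any non-table text)."""
    lines = path.read_text().splitlines() if path.exists() else []
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(update_ledger(lines, suite, date, tests, result, notes)) + "\n")
```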
Plan alignment
- If a plan promises tests, verify they exist and match the described behavior; note gaps without implementing them.
Long / heavy suites and cloud
- If a suite is too heavy for the current session (duration, resource limits), add a `Cloud: REQUEST:` entry to your HANDOFF with the objective, branch/commit, commands, acceptance criteria, and a timeout hint. The orchestrator decides whether to act and may fill in a Cloud Task Packet in `plans/orchestration-state.md`.
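
Rendering the request from one structure keeps every required field present. This is a sketch only. The `CloudRequest` class and the line layout it prints are assumptions, not the format defined by `team-orchestration-delegation` or `plans/orchestration-state.md`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CloudRequest:
    """Fields the HANDOFF request should carry (the rendered layout is an assumption)."""
    objective: str
    branch: str
    commit: str
    commands: List[str]
    acceptance: str
    timeout_hint: str

    def render(self) -> str:
        lines = [
            "Cloud: REQUEST:",
            f"- objective: {self.objective}",
            f"- branch/commit: {self.branch}@{self.commit}",
            "- commands:",
        ]
        lines += [f"  - {cmd}" for cmd in self.commands]
        lines += [
            f"- acceptance: {self.acceptance}",
            f"- timeout hint: {self.timeout_hint}",
        ]
        return "\n".join(lines)
```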
Batch run at slice end (medium / high risk)
- When the `## Risk tier` in `plans/orchestration-state.md` is medium or high and the orchestrator (or run packet) marks the end of a coder slice (several related tasks landed in one batch), run the full automated suite once after the last coder in that slice, update `logs/test-ledger.md`, and report pass/fail, even if each coder already ran pytest. This is the independent verification gate (see `.claude/agents/orchestrator.md` § Verification gates).
- If the slice touched user-visible UX (menus, shortcuts, dialogs, navigator, subwindow chrome, themes, fullscreen), add a Suggested manual smoke subsection (3–8 bullets of concrete clicks and keystrokes). The orchestrator may instead dispatch `ux` for the checklist; if you are the only verification specialist, include the smoke bullets in your HANDOFF either way.
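
The trigger for the slice-end run reduces to two conditions. This is a minimal sketch. It assumes the risk tier appears as the first non-empty line under `## Risk tier` in `plans/orchestration-state.md`, for example `Medium` or `high (schema change)`. `risk_tier` and `needs_batch_run` are hypothetical names, and the slice-end signal itself still comes from the orchestrator or run packet.

```python
from typing import Optional


def risk_tier(state_md: str) -> Optional[str]:
    """Return the first word of the first non-empty line under '## Risk tier', lowercased."""
    lines = state_md.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower() == "## risk tier":
            for follow in lines[i + 1:]:
                if follow.startswith("#"):
                    break  # reached the next heading without finding a value
                value = follow.strip().strip("-* ")
                if value:
                    return value.split()[0].lower()
    return None


def needs_batch_run(state_md: str, slice_ended: bool) -> bool:
    """Full-suite gate: medium/high risk tier and the orchestrator marked the slice end."""
    return slice_ended and risk_tier(state_md) in {"medium", "high"}
```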
Reporting
- Summarize: pass/fail counts, first failure with file:line, suspected category (logic, env, data), and recommended assignee (usually coder).
- When UX was in scope, include Suggested manual smoke bullets or state “none (non-UX slice)”.
- End with the structured HANDOFF → orchestrator block (see the `team-orchestration-delegation` skill).