| name | test-specialist |
| description | Test orchestrator for PwnDoc. Analyzes a feature or the full project to determine test gaps across backend (Jest), frontend unit (Vitest), and frontend E2E (Playwright), then delegates to /test-backend, /test-frontend-unit, and /test-frontend-e2e skills.
|
| argument-hint | [feature-name or 'gaps'] |
Test Specialist - Orchestrator
You are the PwnDoc test orchestrator. Your job is to analyze what tests are needed and coordinate the three test sub-skills.
Arguments
$ARGUMENTS - Either:
- A feature name (e.g., "audit", "client", "vulnerability") to test that feature across all layers
gaps or all to analyze and fill all test coverage gaps
How to Operate
Step 1: Analyze Scope
If feature name provided:
Identify all files related to that feature across all layers:
- Backend:
backend/src/routes/{feature}.js, backend/src/models/{feature}.js
- Frontend service:
frontend/src/services/{feature}.js
- Frontend pages:
frontend/src/pages/{feature}*/
- Frontend components: related components
- Existing tests in all three layers
If "gaps" or "all":
Scan all three test directories and compare against source files to build a complete gap list, then proceed to Step 1.5.
Step 1.5: Prioritize Scope (gaps/all mode only)
This step only applies when the argument is "gaps" or "all". Skip this step for feature-specific arguments.
After building the gap list, you MUST ask the user before proceeding:
-
Summarize gaps — Print a concise summary showing the number of missing tests per category:
Test gaps found:
- Backend: X missing test files
- Frontend Unit — Components: X missing (list names)
- Frontend Unit — Pages: X missing (list names)
- Frontend E2E: X missing specs (list names)
(Omit categories with no gaps, e.g. skip "Services" if all are covered.)
-
Ask the user to pick focus areas — Use AskUserQuestion with multiSelect: true to let the user choose which categories to work on. Options should be the categories that have gaps (e.g., "Frontend Unit — Components", "Frontend Unit — Pages", "Frontend E2E").
-
Batch limit — Cap the session to a maximum of 5 test files total. If the user's selected categories contain more than 5 missing items, either:
- Ask the user which specific items to prioritize, OR
- Default to the top 5 items from the Priority Order section below
Tell the user how many items you'll work on (e.g., "I'll create 5 test files this session based on priority order.").
-
Proceed — Only generate tests for the selected/prioritized items. Skip everything else.
Step 2: Determine What Needs Tests
For each layer, check what exists:
Backend (tests in backend/tests/):
- Already ~90% covered. Only delegate to
/test-backend if the feature has NO existing test file or if you're adding tests for a new feature.
- Check: Does
backend/tests/{feature}.test.js exist?
Frontend Unit (tests in frontend/tests/unit/):
- Check for missing service tests in
frontend/tests/unit/services/
- Check for missing component tests in
frontend/tests/unit/components/
- Check for missing page tests in
frontend/tests/unit/pages/
- Check for missing store tests in
frontend/tests/unit/stores/
Frontend E2E (tests in frontend/tests/e2e/):
- Check for missing
*.spec.js files for major user flows
- E2E tests cover critical user journeys, not every single page
Step 3: Delegate to Sub-Skills
In gaps/all mode: Only delegate for the items selected/prioritized in Step 1.5. Do NOT generate tests for unselected categories or items beyond the batch limit.
For each layer that needs tests, invoke the appropriate sub-skill with --no-run so they generate tests without running them (you will run tests in Step 4):
/test-backend {feature} --no-run — for backend API tests
/test-frontend-unit {type}:{name} --no-run — for frontend unit tests
service:audit --no-run, component:breadcrumb --no-run, page:profile --no-run, store:index --no-run
/test-frontend-e2e {feature} --no-run — for E2E specs
Step 4: Run and Verify (Fix Loop)
After ALL sub-skills have finished writing code, run the test cycle below.
CRITICAL RULES:
- ONE test run per cycle. Run
./pwndoc-cli test ONCE, capture the full output, then analyze the stored results. Do NOT run the command multiple times to "investigate" — all information you need is in the single run's output.
- NEVER run
pwndoc-cli in parallel. Only one instance of ./pwndoc-cli may run at a time across all agents and subagents. No two test commands may overlap.
- Fix agents must NEVER run tests. When you delegate fixes to subagents/parallel agents, they must ONLY edit code. They must NOT run
./pwndoc-cli test or any test command. Only YOU (the main test-specialist orchestrator) run tests, and only AFTER all fix agents have completed.
- Always use
./pwndoc-cli test commands. Never use npm run test or npx directly.
- Use
--coverage only for backend and frontend unit suites (not E2E).
Fix loop procedure:
-
Run tests once — Execute the relevant test suites and store the full output:
./pwndoc-cli test --backend if backend tests were created/modified
./pwndoc-cli test --frontend-unit if frontend unit tests were created/modified
./pwndoc-cli test --frontend-e2e if E2E tests were created/modified
-
Analyze stored results — Read through the captured output to identify all failures. Do NOT re-run tests to get more information.
-
Fix all failures — Delegate fixes to the relevant sub-skills using --fix mode with --no-run. Group failures by layer, then invoke in parallel:
/test-frontend-unit --fix {test-file} "{error}" --no-run for frontend unit test failures
/test-frontend-e2e --fix {spec-file} "{error}" --no-run for E2E spec failures
/test-backend --fix {test-file} "{error}" --no-run for backend test failures
Sub-skills must NOT run tests (ensured by --no-run). Wait for ALL to complete before proceeding.
-
Re-run tests once — After ALL fixes are written, run the test command again (once) to verify. Repeat from step 2 if there are still failures.
-
Maximum 2 iterations. If tests still fail after 2 fix cycles, report the remaining failures to the user rather than continuing to loop.
Coverage Thresholds
When running coverage tests or filling gaps, aim to reach these thresholds:
Backend (./pwndoc-cli test --backend --coverage):
- Statements:
75%
- Lines:
75%
- Functions:
70%
- Branches:
60% (advisory)
Frontend Unit (./pwndoc-cli test --frontend-unit --coverage):
- Statements:
65%
- Lines:
65%
- Functions:
60%
- Branches:
50% (advisory)
E2E (./pwndoc-cli test --frontend-e2e):
- No coverage percentages
- Focus on critical user flows (auth, audit CRUD, report generation)
- Validate stable pass/fail behavior
Automatic Threshold Enforcement
When adding features or filling gaps:
-
Run coverage: Execute ./pwndoc-cli test --backend --coverage or ./pwndoc-cli test --frontend-unit --coverage.
-
Parse results: Extract statements, lines, functions, and branches percentages from the coverage output.
-
Compare to thresholds: Check each metric against the targets above.
-
If below threshold:
- Identify uncovered paths from the coverage report.
- Prioritize: critical business logic (auth, permissions, CRUD), shared services/stores, error handling and validation, high-reuse components.
- Add targeted tests to close gaps.
- Re-run coverage once.
-
Completion rule:
- Task is complete when thresholds are met, or measurably improved (
>= 5% gain in any below-threshold metric).
- If thresholds are still not met after one gap-filling round, report current coverage and remaining gaps to the user instead of looping indefinitely.