| Field | Value |
| --- | --- |
| name | task-verification-workflow |
| description | Standardized workflow for verifying task completion, including success criteria validation, testing checklists, and fallback verification methods. Use before marking tasks complete, when progress tracking shows discrepancies, or when browser testing fails. Ensures all success criteria are verified before task completion. |
Task Verification Workflow
Overview
Prevent premature task completion by following a standardized verification workflow. Don't mark tasks complete without verifying all success criteria.
Pre-Completion Checklist
Before marking a task complete, verify:
Mandatory Skill Loading:
Implementation Verification:
Mandatory Skill Loading Validation
CRITICAL: Before marking a task complete, verify that ALL mandatory skills were loaded during implementation. A missing mandatory skill indicates incomplete verification.
Mandatory Skills Checklist
For Web Applications:
For Phaser Games:
For All Tasks:
Skill Loading Verification in Workflow
Before marking complete, verify:
- Mandatory skills were loaded: Check that every required skill was actually loaded and used
- Skills were used correctly: Verify each skill was applied as intended
- No missing skills: Ensure no mandatory skills were skipped
Example Verification:
```markdown
## Mandatory Skill Verification
**Task Type**: Phaser Game
**Skills Used During Implementation**:
- [x] phaser-game-testing: ✅ Used for test seam patterns
- [x] agent-browser: ✅ Used for browser automation
- [x] screenshot-handling: ✅ Used for visual verification
- [x] completion-marker-optimization: ✅ Used for completion marker
**Status**: All mandatory skills loaded and used correctly.
```
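The checklist comparison comes down to a set difference between required and loaded skills. The sketch below is minimal. `missingSkills` is hypothetical, and the mapping is an assumption: it copies only the unconditional MANDATORY entries from the Tools Required lists in this document, so adjust it to your project's real requirements.

```typescript
// Hypothetical task categories; extend as needed.
type TaskType = "web" | "phaser" | "asset";

// Assumed mapping: only the unconditional MANDATORY tools from this document's lists.
const REQUIRED_SKILLS: Record<TaskType, readonly string[]> = {
  web: ["agent-browser"],
  phaser: ["phaser-game-testing", "agent-browser", "screenshot-handling"],
  asset: ["asset-integration-workflow", "agent-browser"],
};

// Returns the mandatory skills that were never loaded during implementation.
function missingSkills(task: TaskType, loaded: Iterable<string>): string[] {
  const used = new Set(loaded);
  return REQUIRED_SKILLS[task].filter((skill) => !used.has(skill));
}

console.log(missingSkills("phaser", ["phaser-game-testing", "screenshot-handling"])); // [ 'agent-browser' ]
```

A non-empty result is exactly the "Skill Loading Gap" case described next.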
If Mandatory Skills Missing
If mandatory skills were not loaded:
- Document the gap: Note which skills were missing
- Assess impact: Evaluate if verification was incomplete
- Load skills now: Load missing skills and re-verify if needed
- Document limitation: Note skill loading gap in progress.txt
Example Documentation:
```markdown
## Skill Loading Gap
**Missing Skills**: agent-browser was not loaded during implementation
**Impact**: Browser testing was not performed, only TypeScript verification
**Action**:
- Loaded agent-browser skill
- Performed browser testing
- Verified functionality works correctly
**Status**: Gap addressed, verification complete.
```
Success Criteria Validation
Critical: Don't mark a task complete while progress shows 0/X criteria met.
- Verify each criterion explicitly
  - Don't assume criteria are met
  - Test each one individually
  - Document verification method
- Use fallback verification if primary method fails
  - Browser testing fails → Use console logs or code review
  - Test seam unavailable → Use alternative verification
  - See references/fallback-strategies.md for alternatives
- Document verification in progress.txt
  - Note how each criterion was verified
  - Include any issues encountered
  - Note any fallback methods used
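One way to keep the progress.txt notes consistent is to record each criterion as structured data and render it. The sketch below makes several assumptions: `CriterionCheck`, `progressNote` and the output layout are all hypothetical, and nothing here is a format that progress.txt requires.

```typescript
// One record per success criterion; field names are illustrative, not a required schema.
interface CriterionCheck {
  criterion: string;
  verified: boolean;
  method: string;    // how it was verified, e.g. "agent-browser form flow"
  fallback?: string; // alternative used when the primary method failed
  issues?: string;   // problems encountered along the way
}

// Renders the records as a plain-text block to append to progress.txt.
function progressNote(checks: CriterionCheck[]): string {
  const verified = checks.filter((c) => c.verified).length;
  const lines = [`Success criteria: ${verified}/${checks.length} verified`];
  for (const c of checks) {
    let line = `- [${c.verified ? "x" : " "}] ${c.criterion} (method: ${c.method}`;
    if (c.fallback) line += `; fallback: ${c.fallback}`;
    line += ")";
    if (c.issues) line += `; issues: ${c.issues}`;
    lines.push(line);
  }
  return lines.join("\n");
}

console.log(
  progressNote([
    { criterion: "Login form submits", verified: true, method: "agent-browser form flow" },
    {
      criterion: "Error message on bad password",
      verified: true,
      method: "browser test",
      fallback: "console logs",
      issues: "error element had no stable selector",
    },
  ]),
);
```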
Criteria alignment
Before outputting the completion marker, compare the current repo state to the orchestrator's stated success criteria (e.g. the "Next" line or checklist). Ensure each criterion is satisfied by the actual paths and content produced, or explicitly document why it is not. Do not mark the task complete while the orchestrator shows 0/N unless the criteria are met or you have documented a waiver and its reason.
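The alignment rule works as a gate: every criterion must be either met or waived with a written reason. The sketch below makes several assumptions. `blockingCriteria` and the `CriterionStatus` shape are hypothetical. Treating an empty checklist as blocking is also an assumption, on the reasoning that with nothing to compare against, alignment cannot be shown.

```typescript
// A criterion as read from the orchestrator's checklist or "Next" line (shape is illustrative).
interface CriterionStatus {
  id: string;
  met: boolean;          // satisfied by actual paths/content in the repo
  waiverReason?: string; // documented reason if intentionally not met
}

// Returns the criteria that still block the completion marker; [] means OK to complete.
function blockingCriteria(criteria: CriterionStatus[]): string[] {
  if (criteria.length === 0) return ["<no success criteria found>"];
  return criteria
    .filter((c) => !c.met && !(c.waiverReason && c.waiverReason.trim().length > 0))
    .map((c) => c.id);
}

const status: CriterionStatus[] = [
  { id: "profile route returns 200 for authed user", met: true },
  { id: "settings page renders", met: false },
  { id: "CSV export", met: false, waiverReason: "deferred to a follow-up task (noted in progress.txt)" },
];
console.log(blockingCriteria(status)); // [ 'settings page renders' ]
```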
Verification by task type
- Backend API tasks: Require at least one authenticated request to each new or changed endpoint (e.g. register → login → GET/PATCH to the new route). Confirm the status code and the expected response shape. If no auth harness exists, document in progress.txt how to verify manually. Do not mark complete based only on "server started" or an unauthenticated 401.
- Frontend / web tasks: Require at least one browser flow that matches the task: load agent-browser and screenshot-handling, run the app, and perform the flow (open the relevant screen, submit the form, see the expected UI). Do not output the completion marker until this is done, or until you have documented why it is impossible (e.g. backend down) and which fallback you used. Do not substitute API-only verification for UI verification when the task is a UI task.
- When authenticated E2E is impossible (backend/auth down): Verify the UI with mock data if appropriate, document the limitation in progress.txt, and make sure the code calls the real API with no mocks left behind. Document and revert any temporary auth bypass or mocks.
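Below is a minimal sketch of the register → login → authenticated request check. It assumes JSON endpoints at `/api/auth/register` and `/api/auth/login` that return a bearer `token`. Those routes, the token field and `verifyAuthenticatedRoute` itself are hypothetical, so swap in your API's real ones. `fetchFn` is injected so it can be `globalThis.fetch` (Node 18+) or a stub.

```typescript
// Minimal fetch-compatible signature: globalThis.fetch (Node 18+) or a stub both fit.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Hypothetical routes; replace with your API's real ones.
const REGISTER = "/api/auth/register";
const LOGIN = "/api/auth/login";

// Returns a list of problems; an empty list means the authenticated check passed.
async function verifyAuthenticatedRoute(
  fetchFn: FetchLike,
  baseUrl: string,
  route: string,
  expectKeys: string[],
): Promise<string[]> {
  // Fresh credentials per run so "user already exists" can't mask a failure.
  const creds = { email: `verify+${Date.now()}@example.com`, password: "Verify-Passw0rd!" };
  const post = (body: unknown) => ({
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  const reg = await fetchFn(baseUrl + REGISTER, post(creds));
  if (reg.status >= 400) return [`register returned ${reg.status}`];

  const login = await fetchFn(baseUrl + LOGIN, post(creds));
  const loginBody = (await login.json()) as { token?: string } | null;
  const token = loginBody?.token;
  if (!token) return [`login returned ${login.status} without a token`];

  // The request that actually matters: the new route, with credentials.
  const res = await fetchFn(baseUrl + route, { headers: { Authorization: `Bearer ${token}` } });
  if (res.status !== 200) return [`${route} returned ${res.status}`];
  const body = await res.json();
  if (typeof body !== "object" || body === null) return [`${route} did not return a JSON object`];
  const fields: object = body;
  return expectKeys.filter((k) => !(k in fields)).map((k) => `${route} response is missing "${k}"`);
}
```

An unauthenticated 401 never passes this check, which matches the rule above.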
⚠️ CRITICAL: Runtime Testing Requirements ⚠️
NEVER mark a task complete without demonstrating working functionality through runtime testing.
Protocol Violation: Marking tasks complete based on code review alone violates the "never mark complete without demonstrating working functionality" requirement.
Observed Issue: 25% of analyzed tasks were marked complete without runtime testing, leading to incomplete deliverables.
Mandatory Runtime Testing Enforcement
Before marking ANY task complete, you MUST:
- Perform runtime functional test (MANDATORY - never skip)
  - Code review alone is NOT sufficient
  - TypeScript compilation passing is NOT sufficient
  - Runtime testing is REQUIRED
- Verify functionality works as expected
  - Test the actual behavior, not just code structure
  - Verify user-facing functionality works
  - Test critical paths and interactions
- Document runtime testing performed
  - Note what was tested
  - Document test results
  - Include screenshots or evidence if applicable
If runtime testing cannot be performed:
- Document why testing was skipped
- Use alternative verification methods (see fallback strategies)
- Note limitation in progress.txt
- Do NOT mark complete unless alternative verification confirms functionality
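These enforcement rules boil down to a predicate over the evidence collected. In the sketch below, the evidence kinds and `mayMarkComplete` are illustrative and not a required format. Code review and type-checking are recorded but never count as sufficient on their own.

```typescript
// Kinds of evidence an agent might record (illustrative names).
type Evidence =
  | { kind: "code-review" }
  | { kind: "typecheck"; passed: boolean }
  | { kind: "runtime"; description: string; passed: boolean }
  | { kind: "fallback"; method: string; confirmsFunctionality: boolean; whyRuntimeSkipped: string };

// Completion needs a passing runtime test, or a documented fallback
// (non-empty reason) that actually confirmed the functionality.
function mayMarkComplete(evidence: Evidence[]): boolean {
  return evidence.some(
    (e) =>
      (e.kind === "runtime" && e.passed) ||
      (e.kind === "fallback" && e.confirmsFunctionality && e.whyRuntimeSkipped.trim().length > 0),
  );
}
```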
Testing Requirements by Task Type
Web Applications (MANDATORY Requirements)
CRITICAL: Web applications MUST include browser testing with runtime verification.
Mandatory Verification Checklist:
Tools Required:
- agent-browser (MANDATORY for web apps)
- screenshot-handling (if visual verification needed)
Example Workflow:
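```bash
# Start the dev server (leave it running in a separate terminal or in the background)
npm run dev
# Open the app (5173 is Vite's default dev port; adjust if yours differs)
agent-browser open http://localhost:5173
# List interactive elements with their refs (@e1, @e2, ...)
agent-browser snapshot -i
# Interact using refs from the snapshot
agent-browser click @e1
agent-browser fill @e2 "test"
# Read back the rendered result (the .result selector is app-specific)
agent-browser eval "document.querySelector('.result').textContent"
# Check the browser console for errors
agent-browser console
```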
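`npm run dev` keeps running and takes a moment to start, so a scripted run should wait for the port to answer before calling `agent-browser open`. Here is a minimal polling sketch. `waitForServer` is hypothetical, the fetch function is injected (pass `globalThis.fetch` on Node 18+), and the timeout and interval defaults are arbitrary.

```typescript
// Polls a URL until it answers OK, so browser steps don't run before the dev server is up.
async function waitForServer(
  url: string,
  fetchFn: (url: string) => Promise<{ ok: boolean }>,
  timeoutMs = 30_000,
  intervalMs = 500,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      if ((await fetchFn(url)).ok) return;
    } catch {
      // Connection refused while the server is still starting; retry.
    }
    if (Date.now() >= deadline) throw new Error(`${url} not reachable after ${timeoutMs} ms`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```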
Phaser Games (MANDATORY Requirements)
CRITICAL: Phaser games MUST include game runtime testing with test seam verification.
Mandatory Verification Checklist:
Tools Required:
- phaser-game-testing (MANDATORY for Phaser games)
- agent-browser (MANDATORY - Phaser games are web apps)
- screenshot-handling (MANDATORY for visual tasks)
Example Workflow:
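```bash
# Start the dev server (keep it running)
npm run dev
# Open the game with the test seam enabled and a fixed seed
agent-browser open "http://localhost:5173?test=1&seed=42"
# Confirm the seam is ready before sending commands (expect true)
agent-browser eval "window.__TEST__?.ready || false"
# Drive the game through project-specific seam commands
agent-browser eval "window.__TEST__.commands.clickStartGame()"
agent-browser eval "window.__TEST__.commands.setTimer(5)"
agent-browser eval "window.__TEST__.commands.collectAnyCoin()"
# Inspect the resulting state
agent-browser eval "window.__TEST__.gameState()"
# Capture visual evidence
mkdir -p screenshots/
agent-browser screenshot screenshots/game-test.png
```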
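The game itself has to install the test seam that these commands query. Below is a minimal sketch of one way to do it, assuming the `?test=1&seed=42` query convention used above. `installTestSeam`, the `game` object shape, and the choice of mulberry32 as the seeded RNG are all hypothetical, not part of Phaser's API. Call it once the scene is ready (for example, at the end of a scene's `create()`), so `ready` only becomes true when commands can actually run.

```typescript
// Hypothetical shape of the window.__TEST__ seam queried by the commands above.
interface TestSeam {
  ready: boolean;
  gameState: () => Record<string, unknown>;
  commands: Record<string, (...args: any[]) => unknown>;
}

// mulberry32: a small deterministic PRNG, so the same seed replays the same run.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Installs the seam only when the URL opts in with ?test=1 and returns a seeded RNG
// for the game to use instead of Math.random(). Returns undefined outside test mode.
function installTestSeam(
  target: { __TEST__?: TestSeam },
  search: string,
  game: { state: Record<string, unknown>; commands: TestSeam["commands"] },
): (() => number) | undefined {
  const params = new URLSearchParams(search);
  if (params.get("test") !== "1") return undefined;
  const seed = Number(params.get("seed") ?? Date.now());
  target.__TEST__ = {
    ready: true, // install only after the scene has finished creating its objects
    gameState: () => ({ ...game.state }), // shallow copy so callers can't replace top-level state
    commands: game.commands,
  };
  return mulberry32(seed);
}
```

In the browser this would be called as `installTestSeam(window as any, window.location.search, game)`.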
Asset Generation Tasks (MANDATORY Requirements)
CRITICAL: Asset generation tasks MUST include integration verification and runtime testing.
Mandatory Verification Checklist:
Tools Required:
- asset-integration-workflow (MANDATORY for asset tasks)
- agent-browser (MANDATORY for runtime verification)
- screenshot-handling (if visual verification needed)
Example Workflow:
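```bash
# Confirm the asset file exists
ls -la public/assets/sprites/character.png
# Confirm the code references it
grep -r "character.png" src/
# Verify it loads at runtime
npm run dev
agent-browser open http://localhost:5173
agent-browser eval "window.__TEST__?.gameState()"
# Capture visual evidence
mkdir -p screenshots/
agent-browser screenshot screenshots/asset-verification.png
```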
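The `ls` and `grep -r` steps can also be scripted, which helps when several assets land at once. This is a minimal Node sketch using a hypothetical helper, `checkAssetIntegration`. Like the `grep`, it only proves that the file name appears somewhere in the source, not that the asset loads, so the runtime step is still required.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Equivalent of the `ls` + `grep -r` steps: the asset must exist on disk
// and its file name must appear in at least one file under srcDir.
function checkAssetIntegration(assetPath: string, srcDir: string): string[] {
  const problems: string[] = [];
  if (!fs.existsSync(assetPath)) problems.push(`missing file: ${assetPath}`);
  if (!fs.existsSync(srcDir)) return [...problems, `source directory not found: ${srcDir}`];
  const name = path.basename(assetPath);
  // Recursive search, stopping at the first file that mentions the asset name.
  const referenced = (dir: string): boolean =>
    fs.readdirSync(dir, { withFileTypes: true }).some((entry) => {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) return referenced(full);
      return entry.isFile() && fs.readFileSync(full, "utf8").includes(name);
    });
  if (!referenced(srcDir)) problems.push(`${name} is not referenced under ${srcDir}`);
  return problems;
}
```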
Backend Tasks
Mandatory Verification Checklist:
Tools Required:
- TypeScript compiler (MANDATORY)
- API testing tool (curl, Postman, or test client)
Error Handling Tasks
Mandatory Verification Checklist:
Tools Required:
- agent-browser (for frontend error handling)
- Browser console inspection
Edge Cases
Always Test:
- Empty state handling
- Boundary conditions
- Error conditions
- Invalid input handling
- Network failure scenarios (if applicable)
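A table-driven test keeps each edge-case category explicit. The sketch below uses a hypothetical `parseQuantity` (a whole number from 1 to 99) purely as the function under test. Network failures don't apply to it, so that category has no row.

```typescript
// Hypothetical function under test: parses a quantity between 1 and 99.
function parseQuantity(input: string): number | Error {
  const trimmed = input.trim();
  if (trimmed === "") return new Error("empty");
  if (!/^\d+$/.test(trimmed)) return new Error("not a whole number");
  const n = Number(trimmed);
  if (n < 1 || n > 99) return new Error("out of range");
  return n;
}

// One row per edge-case category from the list above.
const cases: Array<{ name: string; input: string; expectError: boolean }> = [
  { name: "empty input", input: "", expectError: true },
  { name: "whitespace only", input: "   ", expectError: true },
  { name: "lower boundary", input: "1", expectError: false },
  { name: "upper boundary", input: "99", expectError: false },
  { name: "below range", input: "0", expectError: true },
  { name: "above range", input: "100", expectError: true },
  { name: "invalid: decimal", input: "3.5", expectError: true },
  { name: "invalid: text", input: "abc", expectError: true },
];

const failures = cases.filter((c) => (parseQuantity(c.input) instanceof Error) !== c.expectError);
console.log(failures.length === 0 ? "all edge cases pass" : failures.map((c) => c.name));
```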
When Testing Fails
Don't abandon testing on first failure:
- Try alternative verification methods
  - Test seam → Console logs → Screenshot → Code review
  - See references/fallback-strategies.md
- Document why testing was skipped (if necessary)
  - Note in progress.txt
  - Explain fallback method used
- Don't mark complete without verification
  - Code review and passing TypeScript compilation are the minimum evidence to record, but they are not sufficient on their own
  - Mark complete only when an alternative method confirms the functionality works
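The fallback chain can be written as an ordered list of attempts. It stops at the first method that confirms the behavior and records every step so the path can be documented in progress.txt. In this sketch, `verifyWithFallbacks`, `confirmedAtRuntime` and the method names are hypothetical. Each `run` would wrap whatever check applies, such as a test-seam `eval`, a console-log scan, or a screenshot review.

```typescript
type Method = { name: string; run: () => Promise<boolean> };

interface Attempt {
  method: string;
  outcome: "confirmed" | "not confirmed" | "failed";
  detail?: string;
}

// Tries each method in order and stops at the first that confirms the behavior.
// Every attempt, including errors, is recorded for the progress.txt note.
async function verifyWithFallbacks(methods: Method[]): Promise<Attempt[]> {
  const attempts: Attempt[] = [];
  for (const m of methods) {
    try {
      const ok = await m.run();
      attempts.push({ method: m.name, outcome: ok ? "confirmed" : "not confirmed" });
      if (ok) break;
    } catch (err) {
      attempts.push({ method: m.name, outcome: "failed", detail: String(err) });
    }
  }
  return attempts;
}

// Per the runtime-testing rules, a confirmation from code review alone is not enough to mark complete.
function confirmedAtRuntime(attempts: Attempt[]): boolean {
  return attempts.some((a) => a.outcome === "confirmed" && a.method !== "code review");
}
```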
Task Reverification Workflow
When a task appears to be already complete, follow this standardized reverification workflow.
Observed Issue: Tasks may appear complete from code review but need functional verification to ensure they actually work.
Reverification Checklist
Before marking a task as "already complete", verify:
- Read the progress.txt entry for the task
  - Check if task was previously completed
  - Review completion date and verification details
  - Note any limitations or issues documented
- Verify code still exists and matches description
  - Code exists in expected location
  - Code matches task description
  - No regressions or breaking changes
- Run functional tests (MANDATORY - never skip)
  - Runtime functional test required
  - Browser testing for web apps (MANDATORY)
  - Game runtime testing for Phaser games (MANDATORY)
  - Integration verification for assets (MANDATORY)
- Verify TypeScript compilation
  - Run `npx tsc --noEmit`
  - Fix any compilation errors
  - Ensure no type errors
- Check for regressions
  - Verify existing functionality still works
  - Test related features
  - Check for breaking changes
- Update task status appropriately
  - Document reverification results
  - Update progress.txt with findings
  - Mark as verified or note issues found
Decision Framework: Verify vs Re-Implement
Verify Existing Implementation When:
- ✅ Code exists and matches task description
- ✅ Recent completion (within last few days)
- ✅ Code structure looks correct
- ✅ No obvious issues in code review
Re-Implement When:
- ❌ Code is missing or incomplete
- ❌ Code doesn't match task description
- ❌ Old completion (weeks/months ago)
- ❌ Code has obvious issues or bugs
- ❌ Functionality doesn't work in runtime testing
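One way to encode the decision is sketched below. `decide`, the findings shape and the 7-day cutoff are assumptions, since "a few days" versus "weeks/months" is not defined precisely. The framework lists an old completion under Re-Implement, but Scenario 4 below says to verify and fix. This sketch follows Scenario 4: an old completion triggers a wider regression sweep rather than an automatic rewrite.

```typescript
// Findings gathered during reverification (field names are illustrative).
interface ReverifyFindings {
  codeExists: boolean;
  matchesDescription: boolean;
  obviousIssues: boolean;
  runtimePassed: boolean; // runtime testing is never skipped, so this is always known
  daysSinceCompletion: number;
}

type Decision = "verify existing" | "fix or re-implement" | "re-implement";

// Assumed cutoff between a "recent" and an "old" completion.
const RECENT_DAYS = 7;

function decide(f: ReverifyFindings): { decision: Decision; regressionSweep: boolean } {
  const regressionSweep = f.daysSinceCompletion > RECENT_DAYS;
  if (!f.codeExists || !f.matchesDescription) return { decision: "re-implement", regressionSweep };
  if (f.obviousIssues || !f.runtimePassed) return { decision: "fix or re-implement", regressionSweep };
  return { decision: "verify existing", regressionSweep };
}
```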
Functional Testing Requirements for Reverification
CRITICAL: Always test runtime functionality, even for "already complete" tasks.
Never skip functional verification: code review alone is insufficient.
Reverification Testing Workflow:
```markdown
## Task Reverification: [Task Name]
**Status**: Appears complete from code review
**Reverification Steps**:
1. ✅ Code exists: [File location]
2. ✅ Code matches description: [Verification details]
3. ✅ TypeScript compilation: [Result]
4. ✅ Runtime functional test: [Test details]
   - Browser testing: [Results]
   - Test seam verification: [Results]
   - Visual validation: [Screenshots]
5. ✅ No regressions: [Verification details]
**Decision**: Verify existing / Re-implement
**Action**: [What was done]
```
Example Reverification Workflow
Scenario: Task appears complete, need to verify
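```bash
# 1. Read the task's progress entry
grep -A 10 "TASK-ID" tasks/progress.txt
# 2. Confirm the code still exists
grep -r "functionName" src/
# 3. Read the implementation (read_file is the agent's file tool; cat works from a shell)
cat src/scenes/GameScene.ts
# 4. Type-check
npx tsc --noEmit
# 5. Run the game with the test seam enabled
npm run dev
agent-browser open "http://localhost:5173?test=1&seed=42"
# 6. Exercise the feature and related features (command names are placeholders)
agent-browser eval "window.__TEST__.commands.testFunctionality()"
agent-browser eval "window.__TEST__.gameState()"
agent-browser eval "window.__TEST__.commands.testExistingFeatures()"
```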
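For scripted reverification, the `grep -A 10` lookup is easy to mirror. The small sketch below uses a hypothetical `findProgressEntry` that returns each line mentioning the task ID plus up to 10 following lines. Unlike grep, it does not insert `--` separators between non-adjacent matches. Like grep, it matches substrings, so "TASK-1" also matches "TASK-10".

```typescript
// Mirrors `grep -A <after> "<taskId>" progress.txt`: matching lines plus trailing context.
function findProgressEntry(text: string, taskId: string, after = 10): string[] {
  const out: string[] = [];
  let until = -1; // last line index still inside a match's context window
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.includes(taskId)) until = Math.max(until, i + after);
    if (i <= until) out.push(line);
  });
  return out;
}
```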
Common Reverification Scenarios
Scenario 1: Code Exists, Needs Verification
- Code found in expected location
- Matches task description
- Action: Run functional tests, verify works correctly
- Decision: Verify existing implementation
Scenario 2: Code Exists, But Doesn't Work
- Code found but runtime testing fails
- Functionality doesn't work as expected
- Action: Fix issues or re-implement
- Decision: Re-implement or fix existing code
Scenario 3: Code Missing
- No code found for task
- Task marked complete but no implementation
- Action: Implement functionality
- Decision: Re-implement
Scenario 4: Code Exists, But Outdated
- Code exists but old completion date
- May have regressions or breaking changes
- Action: Verify functionality, fix if needed
- Decision: Verify and fix if needed
Verification Checklist Template
See references/verification-checklist.md for detailed checklist template.
Resources
- references/verification-checklist.md - Detailed checklist template
- references/fallback-strategies.md - Alternative verification methods