gastrobrain-qa-manager
Systematize test execution, analyze failures, guide debugging, and maintain test suite health through structured checkpoint-driven processes
| Field | Value |
|---|---|
| name | gastrobrain-qa-manager |
| description | Systematize test execution, analyze failures, guide debugging, and maintain test suite health through structured checkpoint-driven processes |
| version | 1.0.0 |
| author | Gastrobrain Development Team |
| tags | ["testing","quality-assurance","debugging","coverage","test-health"] |
This skill systematizes test execution, failure analysis, debugging, and test suite health monitoring. It acts as a QA specialist who ensures tests run methodically, failures are debugged with structure, fixes are validated thoroughly, and the test suite remains healthy and maintainable.
Core philosophy: Testing is a discipline, not an afterthought. Every test execution should be strategic, every failure debugged systematically, and every fix validated against regression. Quality metrics should improve steadily over time.
Trigger phrases:
Test Execution:
Debugging:
Health Monitoring:
Use cases:
What this skill provides:
A Quality Assurance perspective that:
Gastrobrain-specific context:
Choose the appropriate level based on context:
| Level | Scope | Time | When to Use |
|---|---|---|---|
| 1: Quick | Specific tests + analyze | ~30s | During development, quick feedback |
| 2: Component | All tests in a directory | 2-5 min | After changing a component |
| 3: Full Suite | All unit + widget tests | 5-10 min | Pre-commit validation |
| 4: Integration | Everything + E2E | 10-15 min | Pre-merge validation |
```
What changed?
├── Single file → Level 1 (related tests only)
├── Multiple files in one component → Level 2 (component tests)
├── Multiple components → Level 3 (full suite)
└── Ready to merge → Level 4 (everything)
```
```bash
# Level 1: Quick Validation
flutter analyze
flutter test test/core/models/meal_test.dart

# Level 2: Component Testing
flutter test test/core/models/
flutter test test/core/services/
flutter test test/widgets/

# Level 3: Full Suite
flutter test

# Level 4: Integration
flutter test
flutter test integration_test/
```
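The "What changed?" decision tree can be approximated mechanically. The bash sketch below reads changed paths (for example from git diff --name-only) and suggests Level 1, 2, or 3. It assumes that a "component" is the directory containing a changed Dart file under lib/, which is an assumption of this sketch rather than a rule stated by the skill; choose_test_level is a hypothetical helper name. Level 4 is a merge-time decision and is deliberately not inferred.

```bash
# Requires bash. Reads changed file paths (one per line) on stdin.
choose_test_level() {
  local files file_count component_count
  # Only Dart sources under lib/ count as production changes.
  files=$(grep -E '^lib/.+\.dart$' || true)
  if [ -z "$files" ]; then
    echo 0  # nothing under lib/ changed; pick a level manually
    return 0
  fi
  file_count=$(printf '%s\n' "$files" | wc -l)
  # Assumption: a "component" is the directory that contains the file.
  component_count=$(printf '%s\n' "$files" | sed 's|/[^/]*$||' | sort -u | wc -l)
  if (( file_count == 1 )); then
    echo 1  # single file: related tests only
  elif (( component_count == 1 )); then
    echo 2  # several files, one component: component tests
  else
    echo 3  # several components: full suite
  fi
}
```

Usage: git diff --name-only HEAD | choose_test_level. A result of 0 means no Dart source under lib/ changed (for example, only tests were edited), so the level has to be chosen by hand.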
Use when: Running tests at any level to validate changes.
Trigger: "Run tests" / "Verify changes"
```
|
Checkpoint 1: Test Selection (WAIT)
|
Checkpoint 2: Pre-Execution Checks (WAIT)
|
Checkpoint 3: Test Execution (WAIT)
|
Checkpoint 4: Results Analysis (WAIT)
|
Tests Complete → Debug if failures / Proceed if passing
```
Purpose: Choose appropriate test level and scope.
Actions:
Output format:
Test Execution Plan
CHECKPOINT 1: Test Selection
───────────────────────────────────────
Context: [What changed and why tests are needed]
Changes Detected:
- [file 1] ([type of change])
- [file 2] ([type of change])
Affected Components:
- [component 1]
- [component 2]
Recommended Test Level: [1/2/3/4] - [Level Name]
Tests to Run:
- [test category 1]: [specific test files or directories]
- [test category 2]: [specific test files or directories]
Estimated Duration: [X minutes]
Proceed with this test plan? (y/n/adjust)
Wait for user response before proceeding.
Purpose: Ensure the test environment is ready.
Actions:
- Run flutter analyze for static analysis
Output format:
───────────────────────────────────────
CHECKPOINT 2: Pre-Execution Checks
Running pre-flight checks...
Static Analysis:
- flutter analyze: [X issues / No issues] [✓/✗]
Compilation:
- Test files compile: [Yes/No] [✓/✗]
Test Infrastructure:
- MockDatabaseHelper available: [Yes/No] [✓/✗]
- Test helpers available: [Yes/No] [✓/✗]
- Test fixtures accessible: [Yes/No] [✓/✗]
Environment:
- Flutter version: [version]
- Platform: [platform]
Pre-flight Status: [✓ READY / ✗ BLOCKED]
[If BLOCKED: Show specific issues and remediation steps]
Ready to execute? (y/n/fix issues)
Wait for user response before proceeding.
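The test-infrastructure part of Checkpoint 2 can be scripted. This sketch checks for the helper files and fixtures directory from the recommended test/ layout (test/helpers/mock_database_helper.dart, test/helpers/test_setup.dart, test/fixtures/). The function names check_test_infrastructure and preflight are hypothetical, and treating any non-zero exit from flutter analyze as BLOCKED is an assumption of this sketch.

```bash
# Requires bash. Paths follow the recommended test/ layout; adjust as needed.
check_test_infrastructure() {
  local root="${1:-.}" status=0 path
  for path in test/helpers/mock_database_helper.dart \
              test/helpers/test_setup.dart \
              test/fixtures; do
    if [ -e "$root/$path" ]; then
      echo "✓ $path"
    else
      echo "✗ $path (missing)"
      status=1
    fi
  done
  return "$status"
}

preflight() {
  local root="${1:-.}"
  # Assumption: any non-zero exit from flutter analyze blocks execution.
  if ! (cd "$root" && flutter analyze); then
    echo "Pre-flight Status: ✗ BLOCKED (static analysis)"
    return 1
  fi
  if ! check_test_infrastructure "$root"; then
    echo "Pre-flight Status: ✗ BLOCKED (test infrastructure)"
    return 1
  fi
  echo "Pre-flight Status: ✓ READY"
}
```

Keeping the file checks in their own function lets them run (and be tested) without a Flutter toolchain; preflight simply chains them after static analysis.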
Purpose: Run the tests and collect results.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 3: Test Execution
Executing: flutter test [args]
Results:
Passed: [X] tests
Failed: [Y] tests
Skipped: [Z] tests
Duration: [Xm Ys]
[If all passed:]
All tests passed! ✓
No failures to analyze.
Proceed to commit/merge? (y/n)
[If failures exist:]
Failed Tests:
1. [test_file.dart]:[line]
Test: "[test name]"
Error: [error message]
2. [test_file.dart]:[line]
Test: "[test name]"
Error: [error message]
[... list all failures ...]
Analyze failures? (y/n/re-run)
Wait for user response before proceeding.
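For Checkpoint 3, the Passed/Failed/Skipped counts can be pulled from the runner's output. This sketch assumes the package:test progress format printed by flutter test's compact and expanded reporters, where a line starts with the elapsed time followed by +passed, ~skipped, and -failed counters (for example 00:12 +42 ~1 -2: Some tests failed.). Verify that format against your Flutter version before relying on it; summarize_results is a hypothetical helper name.

```bash
# Requires bash. Reads flutter test output on stdin.
# Assumed progress line: "00:12 +42 ~1 -2: Some tests failed."
# +N = passed, ~N = skipped, -N = failed; a missing counter means 0.
summarize_results() {
  local last passed skipped failed
  # The last progress line carries the final totals.
  last=$(grep -E '^[0-9]+:[0-9]+ \+[0-9]+' | tail -n 1)
  passed=$(sed -nE 's/^[0-9:]+ \+([0-9]+).*/\1/p' <<<"$last")
  skipped=$(sed -nE 's/^[0-9:]+ \+[0-9]+ ~([0-9]+).*/\1/p' <<<"$last")
  failed=$(sed -nE 's/^[0-9:]+ \+[0-9]+( ~[0-9]+)? -([0-9]+).*/\2/p' <<<"$last")
  echo "Passed: ${passed:-0} tests"
  echo "Failed: ${failed:-0} tests"
  echo "Skipped: ${skipped:-0} tests"
}
```

Usage: flutter test 2>&1 | summarize_results. The individual failure messages still have to be read from the full output.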
Purpose: Categorize failures and determine next steps.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 4: Results Analysis
Failure Analysis:
CRITICAL (blocks merge):
- [test]: [reason - e.g., null safety violation, data corruption]
IMPORTANT (should fix):
- [test]: [reason - e.g., expectation mismatch, logic error]
INVESTIGATE (may be flaky):
- [test]: [reason - e.g., timeout, intermittent failure]
Pattern Detection:
- [Pattern found, e.g., "All 3 failures in meal_service - likely same root cause"]
- [Pattern found, e.g., "Timeout in dialog test - possible flaky test"]
Recommended Action Plan:
1. [First action - fix critical failure]
2. [Second action - investigate pattern]
3. [Third action - check flaky test]
Proceed with debugging? (y/n/re-run specific tests)
Wait for user response before proceeding.
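Checkpoint 4's triage can be seeded from the error messages listed in the Common Failure Patterns catalog. Which bucket each pattern maps to is an assumption drawn from the examples in the CP4 template (null safety as CRITICAL, expectation mismatches as IMPORTANT, timeouts as INVESTIGATE); classify_failure is a hypothetical helper, and its output is a starting point for analysis, not a verdict.

```bash
# Requires bash. Maps one error message to a CP4 bucket (assumed mapping).
classify_failure() {
  local msg="$1"
  case "$msg" in
    *"Null check operator used on a null value"*|*"type 'Null' is not a subtype of"*)
      echo "CRITICAL: null safety violation" ;;
    *"timed out"*)
      # Covers "Test timed out after X seconds" and "pumpAndSettle timed out".
      echo "INVESTIGATE: timeout, possibly flaky" ;;
    *"Expected:"*|*"No widget found"*|*"zero widgets"*)
      echo "IMPORTANT: expectation or finder mismatch" ;;
    *"called after dispose()"*|*"deactivated widget's ancestor"*)
      echo "IMPORTANT: widget lifecycle error" ;;
    *)
      echo "INVESTIGATE: unrecognized pattern" ;;
  esac
}
```

Grouping several failures under the same label is also a cheap first pass at the Pattern Detection step: many failures with one label in one service often share a root cause.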
Use when: A test failure needs investigation and fixing.
Trigger: "Debug failing test" / Failure from test execution
```
|
Checkpoint 1: Failure Understanding (WAIT)
|
Checkpoint 2: Root Cause Hypothesis (WAIT)
|
Checkpoint 3: Investigation (WAIT)
|
Checkpoint 4: Fix Implementation (WAIT)
|
Checkpoint 5: Fix Validation (WAIT)
|
Test Fixed → Return to test execution
```
Purpose: Fully understand what the test expects and what went wrong.
Actions:
Output format:
Debugging Test Failure
CHECKPOINT 1: Failure Understanding
───────────────────────────────────────
Test File: [path/to/test.dart]
Test Name: "[test description]"
Line: [line number]
Test Purpose:
[What behavior is this test validating?]
Expected Behavior:
[What should happen according to the test]
Actual Behavior:
[What actually happened]
Error Message:
[Complete error message]
Stack Trace (key frames):
[Relevant stack trace lines]
Failure Type: [Null safety / Timeout / Expectation mismatch / Widget not found / Other]
Understanding clear? (y/n/need more context)
Wait for user response before proceeding.
Purpose: Form a hypothesis about why the test is failing.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 2: Root Cause Hypothesis
Analyzing failure...
Recent Changes (potential causes):
- [file]: [change description] [relevance: High/Medium/Low]
- [file]: [change description] [relevance: High/Medium/Low]
Possible Causes:
1. [Code Issue] - Implementation is incorrect
Evidence: [what suggests this]
Likelihood: [High/Medium/Low]
2. [Test Issue] - Test expectation is wrong
Evidence: [what suggests this]
Likelihood: [High/Medium/Low]
3. [Setup Issue] - Test environment/mock problem
Evidence: [what suggests this]
Likelihood: [High/Medium/Low]
4. [Timing Issue] - Race condition or async problem
Evidence: [what suggests this]
Likelihood: [High/Medium/Low]
Primary Hypothesis: [#X - description]
Reasoning: [Why this is most likely]
Agree with hypothesis? (y/n/suggest alternative)
Wait for user response before proceeding.
Purpose: Verify the hypothesis with evidence.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 3: Investigation
Investigation Plan:
Step 1: [What to check]
─────────────────────
Action: [Read file / Run command / Check state]
Expected if hypothesis correct: [what we expect to find]
[Execute step]
Finding: [what was found]
Verdict: [Confirms / Rejects / Inconclusive]
Step 2: [Next check]
─────────────────────
Action: [Read file / Run command / Check state]
Expected if hypothesis correct: [what we expect to find]
[Execute step]
Finding: [what was found]
Verdict: [Confirms / Rejects / Inconclusive]
[... continue investigation steps ...]
Hypothesis Status: [Confirmed / Rejected / Needs revision]
Root Cause: [Confirmed root cause description]
OR
Revised Hypothesis: [New hypothesis if original rejected]
Proceed to fix? (y/n/investigate more)
Wait for user response before proceeding.
Purpose: Apply the fix for the confirmed root cause.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 4: Fix Implementation
Root Cause: [confirmed cause]
Fix Strategy: [Fix code / Fix test / Fix setup]
Fix Type:
- [ ] Code fix (implementation was wrong)
- [ ] Test fix (test expectation was wrong)
- [ ] Setup fix (mock/fixture was wrong)
- [ ] Both (code and test needed changes)
Files Modified:
- [file 1]: [change description]
- [file 2]: [change description]
Changes Made:
[file 1]:
[Show the specific changes - before/after or diff summary]
[file 2]:
[Show the specific changes]
Fix applied. Validate the fix? (y/n/revise)
Wait for user response before proceeding.
Purpose: Verify the fix resolves the issue without regression.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 5: Fix Validation
Validation Plan:
1. Re-run failing test (must pass)
2. Run related tests (no regression)
3. Run component tests (broader check)
[4. Run full suite (if critical fix)]
Executing validation...
Phase 1 - Original Test:
[test name]: [PASS/FAIL]
Phase 2 - Related Tests:
[X/X] tests pass
Phase 3 - Component Tests:
[X/X] tests pass
[Phase 4 - Full Suite:]
[X/X] tests pass
Validation Result: [PASSED / FAILED]
[If PASSED:]
Fix validated! ✓
Summary:
- Root cause: [brief description]
- Fix applied: [what was changed]
- Regression check: Clean
Debugging complete! ✓
[If FAILED:]
Fix incomplete - [what still fails]
Options:
1. Revise the fix (go back to CP4)
2. Investigate further (go back to CP3)
3. Abandon and try different approach
Next action? (revise/investigate/new approach)
Wait for user response before proceeding.
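The Checkpoint 5 validation plan is an ordered sequence of progressively broader runs. This bash sketch (validate_fix is a hypothetical name) takes label/command pairs, runs them in order, and stops at the first failure, on the reasoning that a broader phase adds little information once a narrower one has already failed.

```bash
# Requires bash. Arguments are pairs: <phase label> <shell command>.
validate_fix() {
  local phase=1 label cmd
  while (( $# >= 2 )); do
    label="$1"
    cmd="$2"
    shift 2
    # Each phase runs in its own bash process; output is discarded here,
    # so re-run a failing phase directly to see its errors.
    if bash -c "$cmd" >/dev/null 2>&1; then
      echo "Phase $phase - $label: PASS"
    else
      echo "Phase $phase - $label: FAIL"
      echo "Validation Result: FAILED"
      return 1
    fi
    phase=$(( phase + 1 ))
  done
  echo "Validation Result: PASSED"
}
```

For example: validate_fix "Original Test" "flutter test test/core/services/meal_service_test.dart" "Component Tests" "flutter test test/core/services/". A non-zero return maps to the FAILED branch of the checkpoint.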
Use when: Monitoring test suite quality and identifying improvements.
Trigger: "Check test suite health" / "Test metrics"
```
|
Checkpoint 1: Metrics Collection (WAIT)
|
Checkpoint 2: Health Assessment (WAIT)
|
Checkpoint 3: Improvement Plan (WAIT)
|
Health Report Complete
```
Purpose: Gather quantitative data about the test suite.
Actions:
Output format:
Test Suite Health Check
CHECKPOINT 1: Metrics Collection
───────────────────────────────────────
Running metrics collection...
Test Counts:
Total tests: [X]
Unit tests: [X] ([%])
Widget tests: [X] ([%])
Integration tests: [X] ([%])
Pass Rate:
Passed: [X] / [Total]
Failed: [X]
Skipped: [X]
Pass rate: [X%]
Execution Time:
Total duration: [Xm Ys]
Avg per test: [Xms]
Slowest test: [name] ([Xs])
Coverage:
Overall: [X%]
[Break down by component if available]
Metrics collected. Proceed to assessment? (y/n)
Wait for user response before proceeding.
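The test counts in this checkpoint can be approximated without running the suite by counting test declarations. The heuristic is an assumption of this sketch: testWidgets( calls under test/ count as widget tests, other test( calls under test/ as unit tests, and either call under integration_test/ as integration tests. Tests declared inside loops or helper functions are miscounted, so treat the numbers as estimates; count_tests is a hypothetical name.

```bash
# Requires bash. Heuristic counts from test declarations (see assumptions).
count_tests() {
  local root="${1:-.}" unit=0 widget=0 integration=0 total
  if [ -d "$root/test" ]; then
    # testWidgets( under test/ is counted as a widget test.
    widget=$(( $(grep -rhoE '(^|[^A-Za-z_])testWidgets\(' --include='*_test.dart' "$root/test" | wc -l) ))
    # Plain test( (which never matches testWidgets() is counted as a unit test.
    unit=$(( $(grep -rhoE '(^|[^A-Za-z_])test\(' --include='*_test.dart' "$root/test" | wc -l) ))
  fi
  if [ -d "$root/integration_test" ]; then
    # Any declaration under integration_test/ is counted as integration.
    integration=$(( $(grep -rhoE '(^|[^A-Za-z_])(test|testWidgets)\(' --include='*_test.dart' "$root/integration_test" | wc -l) ))
  fi
  total=$(( unit + widget + integration ))
  echo "Total tests: $total"
  echo "Unit tests: $unit ($(( total ? 100 * unit / total : 0 ))%)"
  echo "Widget tests: $widget ($(( total ? 100 * widget / total : 0 ))%)"
  echo "Integration tests: $integration ($(( total ? 100 * integration / total : 0 ))%)"
}
```

The integer percentages feed directly into the Unit : Widget : Integration ratio used in the health assessment; pass rate and timing still come from an actual run.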
Purpose: Evaluate test suite quality from the metrics.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 2: Health Assessment
Health Indicators:
HEALTHY:
- [metric]: [value] [why healthy]
- [metric]: [value] [why healthy]
WATCH:
- [metric]: [value] [why concerning]
- [metric]: [value] [why concerning]
UNHEALTHY:
- [metric]: [value] [why problematic]
- [metric]: [value] [why problematic]
Test Distribution:
Unit : Widget : Integration = [X : Y : Z]
Assessment: [Balanced / Widget-light / Integration-heavy / etc.]
Ideal ratio: ~70:25:5
Coverage Gaps:
Components below 85%:
- [component]: [X%] - [what's missing]
- [component]: [X%] - [what's missing]
Flaky Test Candidates:
[Tests that have timed out or failed intermittently]
Overall Health: [GOOD / FAIR / NEEDS ATTENTION]
Assessment complete. Create improvement plan? (y/n)
Wait for user response before proceeding.
Purpose: Create actionable plan to improve test suite health.
Actions:
Output format:
───────────────────────────────────────
CHECKPOINT 3: Improvement Plan
Based on health assessment:
PRIORITY 1 - Fix Now:
1. [ ] [Task] - Impact: [description] - Est: [X hours]
2. [ ] [Task] - Impact: [description] - Est: [X hours]
PRIORITY 2 - Fix Soon:
3. [ ] [Task] - Impact: [description] - Est: [X hours]
4. [ ] [Task] - Impact: [description] - Est: [X hours]
PRIORITY 3 - Backlog:
5. [ ] [Task] - Impact: [description] - Est: [X hours]
6. [ ] [Task] - Impact: [description] - Est: [X hours]
Recommended Targets:
- Pass rate: Maintain >99%
- Coverage: Reach [X%] (current: [Y%])
- Flaky tests: Reduce to 0
- Execution time: Keep under [X minutes]
Create issues for these improvements? (y/n/select items)
Wait for user response before proceeding.
The QA Manager recognizes common failure patterns and provides targeted guidance.
Pattern: "Null check operator used on a null value"
"type 'Null' is not a subtype of type 'X'"
Common Causes:
1. Missing null check in production code
2. Mock not configured to return expected value
3. Test setup missing required field
Investigation Priority:
1. Check the variable that's null in stack trace
2. Trace where it should have been set
3. Check if mock returns null by default
Typical Fix: Add null check or fix mock setup
Pattern: "Test timed out after X seconds"
"pumpAndSettle timed out"
Common Causes:
1. Animation or timer preventing pumpAndSettle from completing
2. Async operation never completing
3. Infinite loop in widget rebuild
4. Missing pump() calls
Investigation Priority:
1. Check for animations (use pump(duration) instead of pumpAndSettle)
2. Verify all Futures complete
3. Check for setState loops
4. Run test in isolation
Typical Fix: Replace pumpAndSettle with explicit pump(duration)
Pattern: "Expected: X Actual: Y"
"Expected [X] items, found [Y]"
Common Causes:
1. Production code changed, test not updated
2. Test expectation was always wrong
3. Order-dependent assertion on unordered data
4. Locale/format difference
Investigation Priority:
1. Check git log for recent changes to tested code
2. Verify the expected value is correct
3. Check if data ordering matters
4. Check locale settings in test
Typical Fix: Update expectation or fix production code
Pattern: "No widget found with key [X]"
"Finder found zero widgets"
Common Causes:
1. Widget key changed in production code
2. Widget conditionally hidden (guard clause)
3. Widget not yet rendered (needs pump)
4. Looking in wrong widget subtree
Investigation Priority:
1. Verify key/text exists in current code
2. Check conditional rendering logic
3. Add pump() before finder
4. Check widget tree with debugDumpApp
Typical Fix: Update finder or fix rendering condition
Pattern: "setState() called after dispose()"
"Looking up a deactivated widget's ancestor"
Common Causes:
1. Async callback fires after widget disposed
2. Timer or stream not cancelled in dispose
3. Test navigates away before async completes
Investigation Priority:
1. Check dispose() method for cleanup
2. Check async callbacks for mounted check
3. Verify test awaits all async operations
Typical Fix: Add mounted check or cancel timers in dispose
See frameworks/failure_patterns.md for the complete catalog.
When a test fails intermittently:
| Cause | Symptoms | Fix |
|---|---|---|
| Timing dependency | Random timeouts | Use explicit pump durations |
| Shared state | Fails when run with others, passes alone | Isolate test setup |
| Platform dependency | Fails on CI but passes locally | Mock platform-specific code |
| Order dependency | Fails depending on test order | Fix test isolation |
| Network dependency | Fails when network slow | Mock all network calls |
1. Identify: Run test 10x to confirm flakiness
2. Isolate: Run test alone vs in suite
3. Diagnose: Match symptoms to known causes
4. Fix: Apply targeted fix
5. Verify: Run test 10x again to confirm stability
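Steps 1 and 5 of the protocol ("run test 10x") can be scripted with a small loop. check_flakiness is a hypothetical helper: it runs any command N times and reports STABLE, FLAKY, or CONSISTENT FAILURE, the last meaning the test is broken rather than flaky and belongs in the debugging workflow instead.

```bash
# Requires bash. Usage: check_flakiness <runs> <command> [args...]
check_flakiness() {
  local runs="$1" passes=0 fails=0 i
  shift
  for (( i = 1; i <= runs; i++ )); do
    if "$@" >/dev/null 2>&1; then
      passes=$(( passes + 1 ))
    else
      fails=$(( fails + 1 ))
    fi
  done
  echo "Passed $passes/$runs, failed $fails/$runs"
  if (( fails == 0 )); then
    echo "Verdict: STABLE"
  elif (( passes == 0 )); then
    echo "Verdict: CONSISTENT FAILURE (not flaky; debug normally)"
  else
    echo "Verdict: FLAKY"
  fi
}
```

For step 2, compare check_flakiness 10 flutter test test/widgets/recipe_card_test.dart with check_flakiness 10 flutter test test/widgets/: failures that appear only in the directory run point to shared state or order dependency.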
| Coverage Level | Assessment | Action |
|---|---|---|
| >90% | Excellent | Maintain |
| 85-90% | Good (target) | Continue improving |
| 75-85% | Acceptable | Prioritize gaps |
| <75% | Needs work | Create improvement plan |
Priority for coverage:
```bash
# Generate coverage report
flutter test --coverage

# View coverage (if lcov available)
genhtml coverage/lcov.info -o coverage/html
```
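To turn coverage/lcov.info into the overall figure and the per-file gaps used in the health assessment, a short awk pass over the LCOV tracefile is enough. It relies on the standard LCOV records (SF for the source file, LF for lines found, LH for lines hit, end_of_record). coverage_report is a hypothetical name, and the default 85% threshold is taken from the "Good (target)" row of the coverage table.

```bash
# Requires bash and awk. Usage: coverage_report [lcov file] [threshold %]
coverage_report() {
  local lcov="${1:-coverage/lcov.info}" threshold="${2:-85}"
  awk -F: -v threshold="$threshold" '
    /^SF:/ { file = substr($0, 4); found = 0; hit = 0 }  # new source file
    /^LF:/ { found = $2 }                                # lines found
    /^LH:/ { hit = $2 }                                  # lines hit
    /^end_of_record/ {
      total_found += found; total_hit += hit
      if (found > 0 && 100 * hit / found < threshold)
        printf "Below %s%%: %s (%.1f%%)\n", threshold, file, 100 * hit / found
    }
    END {
      if (total_found > 0)
        printf "Overall: %.1f%%\n", 100 * total_hit / total_found
      else
        print "Overall: n/a (no line data)"
    }
  ' "$lcov"
}
```

Line coverage is computed from the totals across files rather than by averaging per-file percentages, so large files weigh more, which matches how genhtml reports the overall figure.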
Always cover:
OK to skip:
When fixing a test or production code, assess regression risk:
| Change Type | Risk Level | Regression Scope |
|---|---|---|
| Test-only fix | Low | Same test file |
| Model change | Medium | All tests using that model |
| Service change | Medium-High | Service tests + widget tests using it |
| Database change | High | All data-dependent tests |
| UI structure change | Medium | Widget tests for that screen |
```
Fix Applied
|
├── Re-run failing test (must pass)
|
├── Risk: Low → Run same test file
├── Risk: Medium → Run component tests
├── Risk: High → Run full suite
|
└── Verify: 0 new failures introduced
```
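The decision tree can be encoded so the broader run always matches the assessed risk. In this sketch (regression_command is hypothetical) the command is printed rather than executed so it can be confirmed at a checkpoint. Medium-High from the risk table is conservatively rounded up to a full-suite run; that is an assumption, since the tree itself only names Low, Medium, and High.

```bash
# Requires bash. Usage: regression_command <risk> <test file> <component dir>
regression_command() {
  local risk="$1" test_file="$2" component_dir="$3"
  case "$risk" in
    low)              echo "flutter test $test_file" ;;      # same test file
    medium)           echo "flutter test $component_dir" ;;  # component tests
    medium-high|high) echo "flutter test" ;;                 # full suite
    *) echo "unknown risk level: $risk" >&2; return 1 ;;
  esac
}
```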
Before marking a fix as complete:
See frameworks/regression_prevention.md for the detailed framework.
Tests should mirror the lib/ directory structure:
```
test/
├── core/
│ ├── models/ (model unit tests)
│ │ ├── meal_test.dart
│ │ ├── recipe_test.dart
│ │ └── meal_type_test.dart
│ ├── services/ (service unit tests)
│ │ ├── meal_service_test.dart
│ │ └── recommendation_service_test.dart
│ └── database/ (database tests)
├── widgets/ (widget tests)
│ ├── meal_type_dropdown_test.dart
│ └── recipe_card_test.dart
├── screens/ (screen tests)
├── edge_cases/ (edge case tests)
│ ├── empty_states/
│ ├── boundary_conditions/
│ ├── error_scenarios/
│ ├── interaction_patterns/
│ └── data_integrity/
├── regression/ (regression tests)
├── helpers/ (test utilities)
│ ├── test_setup.dart
│ ├── mock_database_helper.dart
│ └── dialog_test_helpers.dart
└── fixtures/ (test data)
```
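Because tests mirror lib/, a missing mirror file is a quick signal of possibly untested code. This sketch (find_untested_files is hypothetical) assumes the one-to-one convention that lib/path/name.dart maps to test/path/name_test.dart, as in the models and widgets entries above. Screen, edge-case, and regression tests live elsewhere, and some files are fine to leave untested, so treat the output as a review list rather than a failure.

```bash
# Requires bash. Lists lib/ files without a mirrored *_test.dart file.
find_untested_files() {
  local root="${1:-.}" src rel expected
  while IFS= read -r src; do
    rel="${src#"$root"/lib/}"                  # e.g. core/models/meal.dart
    expected="$root/test/${rel%.dart}_test.dart"
    [ -f "$expected" ] || echo "lib/$rel"
  done < <(find "$root/lib" -name '*.dart' -type f | sort)
}
```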
| Size | Assessment | Action |
|---|---|---|
| <100 lines | Fine | Normal |
| 100-300 lines | Good | Typical for comprehensive tests |
| 300-500 lines | Watch | Consider splitting by feature |
| >500 lines | Split | Break into focused test files |
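The size thresholds in the table can be checked across the whole suite. This sketch (classify_test_sizes is hypothetical) counts physical lines with wc -l and reports only files in the Watch band (more than 300, up to 500 lines) and the Split band (more than 500 lines); smaller files need no action.

```bash
# Requires bash. Usage: classify_test_sizes [test directory]
classify_test_sizes() {
  local root="${1:-test}" file lines verdict
  while IFS= read -r file; do
    lines=$(( $(wc -l < "$file") ))
    if (( lines > 500 )); then
      verdict="Split"
    elif (( lines > 300 )); then
      verdict="Watch"
    else
      continue  # Fine or Good: nothing to report
    fi
    echo "$verdict: $file ($lines lines)"
  done < <(find "$root" -name '*_test.dart' -type f | sort)
}
```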
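Naming conventions:
- Test files: {component}_test.dart
- Test groups: group('[ComponentName]', () { ... })
- Test names: describe the behavior, e.g. 'displays all meal types'
- Widget keys: {screen}_{element}_field or {widget}_{element}_button

See examples/ directory for full walkthroughs: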
example_1_systematic_execution.md - Running tests after a feature implementation
example_2_debugging_failure.md - Debugging a null safety test failure
example_3_health_monitoring.md - Test suite health check
Before marking test execution/debugging complete:
v1.0.0 (2026-02-07)