| name | test-review |
| description | Evaluates test suites for coverage gaps, TDD/BDD compliance, and anti-patterns. Use when auditing test quality or before a major release. |
| alwaysApply | false |
| category | testing |
| tags | ["testing","tdd","bdd","coverage","quality","fixtures"] |
| tools | [] |
| usage_patterns | ["test-audit","coverage-analysis","quality-improvement","gap-remediation"] |
| complexity | intermediate |
| model_hint | standard |
| estimated_tokens | 200 |
| progressive_loading | true |
| dependencies | ["pensive:shared","imbue:proof-of-work","imbue:review-core","imbue:structured-output"] |
| modules | ["modules/framework-detection.md","modules/coverage-analysis.md","modules/scenario-quality.md","modules/remediation-planning.md","modules/content-assertion-quality.md"] |
Table of Contents
Test Review Workflow
Evaluate and improve test suites with TDD/BDD rigor.
Quick Start
/test-review
Verification: Run pytest -v to verify tests pass.
When To Use
- Reviewing test suite quality
- Analyzing coverage gaps
- Before major releases
- After test failures
- Planning test improvements
When NOT To Use
- Writing new tests - use parseltongue:python-testing
- Updating existing tests - use sanctum:test-updates
Required TodoWrite Items
test-review:languages-detected
test-review:coverage-inventoried
test-review:scenario-quality
test-review:invariant-preservation
test-review:gap-remediation
test-review:evidence-logged
test-review:findings-verified
Progressive Loading
Load modules as needed based on review depth:
- Basic review: Core workflow (this file)
- Framework detection: Load
modules/framework-detection.md
- Coverage analysis: Load
modules/coverage-analysis.md
- Quality assessment: Load
modules/scenario-quality.md
- Remediation planning: Load
modules/remediation-planning.md
Workflow
Step 1: Detect Languages (test-review:languages-detected)
Identify testing frameworks and version constraints.
→ See: modules/framework-detection.md
Quick check:
find . -maxdepth 2 -name "Cargo.toml" -o -name "pyproject.toml" -o -name "package.json" -o -name "go.mod"
Verification: Run the command with --help flag to verify availability.
Step 2: Inventory Coverage (test-review:coverage-inventoried)
Run coverage tools and identify gaps.
→ See: modules/coverage-analysis.md
Quick check:
git diff --name-only | rg 'tests|spec|feature'
Verification: Run pytest -v to verify tests pass.
Step 3: Assess Scenario Quality (test-review:scenario-quality)
Evaluate test quality using BDD patterns and assertion checks.
→ See: modules/scenario-quality.md
Focus on:
- Given/When/Then clarity
- Assertion specificity
- Anti-patterns (dead waits, mocking internals, repeated boilerplate)
Step 4: Plan Remediation (test-review:gap-remediation)
Create concrete improvement plan with owners and dates.
→ See: modules/remediation-planning.md
Step 5: Log Evidence (test-review:evidence-logged)
Record executed commands, outputs, and recommendations.
→ See: imbue:proof-of-work
Test Quality Checklist (Condensed)
Invariant-Encoding Tests
Tests encode design invariants as well as verifying behavior.
A test that asserts "module A never imports from module B"
encodes a layer boundary. A test that
asserts "this function is pure" encodes a concurrency
model. These tests are load-bearing in ways that
coverage metrics cannot capture.
During review, check:
-
Were invariant-encoding tests removed or weakened?
A test that enforced an architectural boundary,
data structure constraint, or API contract should
not be deleted without naming the invariant being
abandoned and escalating to human judgment.
-
Were test expectations changed to match a broken
implementation? If an assertion value changed, ask:
did the requirement change, or did the agent change
the test to make its code pass? The latter is the
single most dangerous form of test tampering.
-
Are new invariants encoded as tests? When a design
decision is made (choice of data structure, module
boundary, error strategy), there should be at least
one test whose failure would signal that the
invariant was violated.
Red flag patterns:
| Pattern | Risk |
|---|
@pytest.mark.skip added to a passing test | Invariant being silently dropped |
| Assertion changed from specific to broad | Constraint being relaxed |
| Test renamed to describe new behavior | Old invariant erased from history |
| Test deleted "because it tested old code" | Invariant removed without replacement |
When invariant erosion is detected:
Do NOT approve. Flag as a BLOCKING quality issue and
present the three options to the human:
- Preserve: Revert the test change, fix the
implementation to satisfy the invariant
- Layer: Keep the invariant test, add the new
behavior alongside it (accepting inelegance)
- Revise: The invariant is genuinely wrong; remove
the old test AND write a new test encoding the
replacement invariant
This is a judgment call that models get wrong far too
often. Default to option 1 (preserve) when no human is
available.
Output Format
## Summary
[Brief assessment]
## Framework Detection
- Languages: [list] | Frameworks: [list] | Versions: [constraints]
## Coverage Analysis
- Overall: X% | Critical: X% | Gaps: [list]
## Quality Issues
[Q1] [Issue] - Location - Anchor: `verbatim source text at file:line` - Fix
## Remediation Plan
1. [Action] - Owner - Date
## Recommendation
Approve / Approve with actions / Block
Verification: Run the command with --help flag to verify availability.
Integration Notes
- Use
imbue:proof-of-work for reproducible evidence capture
- Reference
imbue:diff-analysis for risk assessment
- Format output using
imbue:structured-output patterns
Verify Findings Are Grounded (test-review:findings-verified)
Every finding must cite a real location and a verbatim anchor. Write
findings to .review/findings.json and confirm each citation resolves:
python plugins/imbue/scripts/citation_verifier.py \
--findings .review/findings.json --repo-root .
Drop or label UNVERIFIED any finding the verifier fails (exit 1); only
verified findings enter the report. See Skill(imbue:review-core) Step 5
and Skill(imbue:structured-output) for the schema.
Exit Criteria
- Frameworks detected and documented
- Coverage analyzed and gaps identified
- Scenario quality assessed
- Remediation plan created with owners and dates
- Evidence logged with citations
- Every reported finding carries a
Location + verbatim Anchor confirmed
by citation_verifier.py (exit 0), or unverified findings were dropped
or labeled UNVERIFIED
Troubleshooting
Common Issues
Tests not discovered
Ensure test files match pattern test_*.py or *_test.py. Run pytest --collect-only to verify.
Import errors
Check that the module being tested is in PYTHONPATH or install with pip install -e .
Async tests failing
Install pytest-asyncio and decorate test functions with @pytest.mark.asyncio