| name | test-consolidate |
| description | Consolidate verbose test suites by replacing repetitive unit tests with property-based tests, parameterized tests (rstest), or fuzz tests. Less code to maintain, same or better coverage. |
Test Consolidator
Analyze test modules for opportunities to replace many similar unit tests with
a single, more powerful testing construct — property-based tests, parameterized
tests, or fuzz tests. The goal is comprehensive coverage with minimal code to
maintain, review, and keep updated.
When to Use
- A test module has 5+ unit tests for the same function with different inputs
- Tests follow a pattern: call function with input X, assert output Y
- You notice copy-paste test bodies differing only in values
- Before a PR when test files are growing faster than source files
- After refactoring: old tests may cluster around the same behavior
- During periodic test hygiene alongside /test-dedup
Philosophy
Tests should scale with behaviors, not with inputs.
If a function has 20 valid inputs worth testing, the answer is NOT 20 test
functions. The answer is one construct that covers all 20 — and ideally the
infinite space between them.
The hierarchy of consolidation (prefer higher):
- Property-based test (proptest): When you can state an invariant that
  holds for ALL valid inputs. One proptest! replaces unbounded unit tests.
  Maximum coverage, minimum code.
- Parameterized test (rstest): When you have a finite, important set of
  (input, expected) pairs and no general invariant. One #[rstest] with a
  #[case] list replaces N identical test bodies. Always prefer rstest
  over table-driven tests: rstest gives you named sub-cases in test
  output, better failure diagnostics, and no manual loop boilerplate.
- Fuzz test: When exploring adversarial or untrusted input spaces. One
  fuzz target can subsume hundreds of hand-crafted "weird input" tests.
- Individual unit tests: The last resort. Only when each test truly
  exercises unique setup, distinct error paths, or documents a specific bug.
Never consolidate for the sake of shorter code. The goal is less code to
maintain — meaning fewer places to update when the function signature changes,
fewer tests to rename when behavior evolves, and fewer copy-paste errors.
Analysis Process
Step 1: Identify Test Clusters
A test cluster is a group of tests that:
- Test the same function or method
- Have the same structure (setup → call → assert)
- Differ primarily in input values and expected outputs
- May also differ in minor setup variations
Scan the module and group tests into clusters. A single test can belong to
multiple clusters if it tests more than one function.
Cluster: parse_duration()
- test_parse_seconds → parse_duration("5s") == Duration::from_secs(5)
- test_parse_minutes → parse_duration("3m") == Duration::from_secs(180)
- test_parse_hours → parse_duration("2h") == Duration::from_secs(7200)
- test_parse_zero → parse_duration("0s") == Duration::ZERO
- test_parse_large → parse_duration("999h") == Duration::from_secs(3596400)
- test_parse_invalid_unit → parse_duration("5x") == Err(...)
- test_parse_empty → parse_duration("") == Err(...)
- test_parse_negative → parse_duration("-1s") == Err(...)
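For orientation, here is a minimal sketch of a function that would satisfy every expectation in the cluster above. The source only names `parse_duration` and shows `Err(...)`, so the `ParseDurationError` type, its variants, and the accepted grammar (ASCII digits followed by `s`, `m`, or `h`) are assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical error type; the cluster listing only shows `Err(...)`.
#[derive(Debug, PartialEq, Eq)]
enum ParseDurationError {
    Empty,
    InvalidNumber,
    InvalidUnit,
}

/// Parses strings such as "5s", "3m", or "2h" (assumed grammar).
fn parse_duration(input: &str) -> Result<Duration, ParseDurationError> {
    // The final character is the unit; an empty string has none.
    let unit = input.chars().last().ok_or(ParseDurationError::Empty)?;
    let digits = &input[..input.len() - unit.len_utf8()];

    // Accept ASCII digits only, so a sign such as "-1s" is rejected.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(ParseDurationError::InvalidNumber);
    }
    let value: u64 = digits
        .parse()
        .map_err(|_| ParseDurationError::InvalidNumber)?;

    let seconds_per_unit = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3600,
        _ => return Err(ParseDurationError::InvalidUnit),
    };

    // Guard against overflow instead of panicking on huge values.
    value
        .checked_mul(seconds_per_unit)
        .map(Duration::from_secs)
        .ok_or(ParseDurationError::InvalidNumber)
}
```

All eight tests call this one function and assert on its return value, which is what makes them a single cluster. The five success cases and the three error cases are natural sub-groups when it comes to choosing a strategy.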
Step 2: Classify Each Cluster
For each cluster, determine the best consolidation strategy:
Can you state a universal property?
→ Property-based test. Examples:
- "parse then format roundtrips for all valid inputs"
- "output length is always ≤ input length"
- "sorted output is a permutation of input"
- "encoding never produces invalid UTF-8"
Is it a finite set of (input, expected) with no general property?
→ Parameterized test (rstest). Examples:
- Known error codes mapping to messages
- Specific file extensions mapping to MIME types
- Configuration keys mapping to defaults
Are the tests exploring adversarial/edge inputs?
→ Fuzz test. Examples:
- "doesn't panic on any byte sequence"
- "doesn't allocate more than 10x input size"
Does each test have genuinely unique setup or assertions?
→ Keep as individual tests. Don't force consolidation.
Step 3: Evaluate Consolidation Candidates
For each cluster, answer:
| Question | If Yes | If No |
|---|---|---|
| Can I state one invariant covering all cases? | proptest | Next question |
| Are all test bodies structurally identical? | rstest | Partial consolidation |
| Do >3 tests share the same assertion pattern? | At minimum rstest | Probably keep individual |
| Would adding a new case require a new function? | Consolidate (adding cases should be trivial) | Fine as-is |
| Do tests differ only in values, not in logic? | Strong consolidation candidate | Keep separate |
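One possible reading of the Step 2 questions and the table above is the decision procedure sketched here. The `Strategy` and `Cluster` types, the field names, and the choice to apply the "more than 3 tests" threshold only to the rstest branch are all assumptions; real clusters will need judgment that a boolean cannot capture.

```rust
/// Consolidation strategies named in this skill.
#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    PropertyTest,
    Parameterized,
    Fuzz,
    KeepIndividual,
}

/// Hypothetical summary of one test cluster.
struct Cluster {
    test_count: usize,
    has_universal_invariant: bool,
    is_finite_input_expected_table: bool,
    explores_adversarial_input: bool,
}

/// Asks the Step 2 questions in order and returns the first match.
fn recommend(cluster: &Cluster) -> Strategy {
    if cluster.has_universal_invariant {
        return Strategy::PropertyTest;
    }
    // Assumed threshold: a case table pays off once more than 3 tests share it.
    if cluster.is_finite_input_expected_table && cluster.test_count > 3 {
        return Strategy::Parameterized;
    }
    if cluster.explores_adversarial_input {
        return Strategy::Fuzz;
    }
    Strategy::KeepIndividual
}
```

The ordering mirrors the hierarchy of consolidation: a cluster that qualifies for a property test never falls through to a weaker construct.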
Step 4: Choose the Right Tool
Property-Based (proptest) — Best for invariants
Use when: The assertion is about a property that holds regardless of input.
// Before: six example tests pinned to individual inputs.
#[test] fn encode_ascii() { assert_eq!(encode("hello"), "aGVsbG8="); }
#[test] fn encode_empty() { assert_eq!(encode(""), ""); }
#[test] fn encode_unicode() { assert_eq!(encode("café"), "Y2Fmw6k="); }
#[test] fn decode_ascii() { assert_eq!(decode("aGVsbG8=").unwrap(), "hello"); }
#[test] fn decode_empty() { assert_eq!(decode("").unwrap(), ""); }
#[test] fn decode_unicode() { assert_eq!(decode("Y2Fmw6k=").unwrap(), "café"); }

// After: one roundtrip property over arbitrary strings of non-control characters.
use proptest::prelude::*;

proptest! {
    #[test]
    fn roundtrip(input in "\\PC*") {
        let encoded = encode(&input);
        // decode returns a Result, so unwrap it before comparing.
        let decoded = decode(&encoded).unwrap();
        prop_assert_eq!(input, decoded);
    }
}
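To make the roundtrip example concrete, here is a dependency-free sketch of what `encode` and `decode` might look like. The source does not define them; the expected strings in the unit tests suggest padded standard Base64 over UTF-8 bytes, and that is the assumption here. `DecodeError` and its variants are hypothetical, and the decoder is deliberately minimal (it does not reject every malformed input).

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
enum DecodeError {
    InvalidLength,
    InvalidSymbol,
    InvalidUtf8,
}

/// Encodes the UTF-8 bytes of `input` as padded Base64.
fn encode(input: &str) -> String {
    let mut out = String::new();
    for chunk in input.as_bytes().chunks(3) {
        // Pack up to three bytes into the top 24 bits of `n`.
        let mut n = 0u32;
        for (i, &byte) in chunk.iter().enumerate() {
            n |= (byte as u32) << (16 - 8 * i);
        }
        // A chunk of k bytes yields k + 1 symbols, then '=' up to four.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 0x3F) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Decodes padded Base64 back into a `String`.
fn decode(input: &str) -> Result<String, DecodeError> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return Err(DecodeError::InvalidLength);
    }
    let mut out = Vec::with_capacity(bytes.len() / 4 * 3);
    for chunk in bytes.chunks(4) {
        // Trailing '=' marks how many of the three output bytes are absent.
        let pad = chunk.iter().rev().take_while(|&&b| b == b'=').count();
        if pad > 2 {
            return Err(DecodeError::InvalidSymbol);
        }
        let mut n = 0u32;
        for (i, &symbol) in chunk[..4 - pad].iter().enumerate() {
            let value = ALPHABET
                .iter()
                .position(|&a| a == symbol)
                .ok_or(DecodeError::InvalidSymbol)?;
            n |= (value as u32) << (18 - 6 * i);
        }
        for i in 0..3 - pad {
            out.push((n >> (16 - 8 * i)) as u8);
        }
    }
    String::from_utf8(out).map_err(|_| DecodeError::InvalidUtf8)
}
```

With functions of this shape, the six example tests check six points, while the roundtrip property checks that `decode(&encode(s))` returns `s` for every generated string, including lengths that exercise each padding case.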
Parameterized (rstest) — Best for case tables
Use when: You have specific (input, expected) pairs with no general property.
// Before: seven near-identical tests.
#[test] fn status_200() { assert_eq!(status_text(200), "OK"); }
#[test] fn status_201() { assert_eq!(status_text(201), "Created"); }
#[test] fn status_400() { assert_eq!(status_text(400), "Bad Request"); }
#[test] fn status_404() { assert_eq!(status_text(404), "Not Found"); }
#[test] fn status_500() { assert_eq!(status_text(500), "Internal Server Error"); }
#[test] fn status_unknown() { assert_eq!(status_text(999), "Unknown"); }
#[test] fn status_zero() { assert_eq!(status_text(0), "Unknown"); }

// After: one parameterized test; each #[case] is reported as its own sub-test.
use rstest::rstest;

#[rstest]
#[case(200, "OK")]
#[case(201, "Created")]
#[case(400, "Bad Request")]
#[case(404, "Not Found")]
#[case(500, "Internal Server Error")]
#[case(999, "Unknown")]
#[case(0, "Unknown")]
fn status_text_mapping(#[case] code: u16, #[case] expected: &str) {
    assert_eq!(status_text(code), expected);
}
Fuzz Test — Best for adversarial input exploration
Use when: Tests are exploring "weird inputs" to find crashes.
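// Before: hand-picked "weird input" tests.
#[test] fn parse_null_bytes() { let _ = parse(b"\x00\x00"); }
#[test] fn parse_huge_input() { let _ = parse(&vec![0xFF; 10_000]); }
#[test] fn parse_truncated() { let _ = parse(b"\x01\x02"); }

// After: one cargo-fuzz target, kept in its own file under fuzz/fuzz_targets/.
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Only the absence of panics is checked; the result is discarded.
    let _ = parse(data);
});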
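For a fuzz target like this to pass, `parse` has to return an error instead of panicking on every possible byte slice. The sketch below shows one way to get there with checked slicing. The frame layout (one length byte followed by that many payload bytes) and the `ParseError` type are invented for illustration and are not taken from the source.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    Empty,
    Truncated { expected: usize, found: usize },
}

/// Parses a hypothetical frame: one length byte followed by that many
/// payload bytes. Extra trailing bytes are ignored.
fn parse(data: &[u8]) -> Result<&[u8], ParseError> {
    // split_first returns None on empty input instead of panicking.
    let (&len, rest) = data.split_first().ok_or(ParseError::Empty)?;
    let len = usize::from(len);

    // get(..len) returns None when the payload is shorter than declared,
    // where direct indexing with rest[..len] would panic.
    rest.get(..len).ok_or(ParseError::Truncated {
        expected: len,
        found: rest.len(),
    })
}
```

Each of the three hand-written inputs above exercises only one path through this function. A fuzzer explores the declared-length and actual-length combinations automatically, which is why a single target can stand in for the whole family of "weird input" tests.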
Step 5: Check for Blockers
Before recommending consolidation, verify:
Step 6: Produce Consolidation Plan
For each cluster, specify:
- Current state: Number of tests, lines of test code
- Recommended strategy: Which tool and why
- What gets removed: Which specific tests are replaced
- What gets kept: Which tests survive and why
- New test code: The actual replacement test(s)
- Net effect: Lines removed vs added, maintenance burden change
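The "Net effect" figures are simple arithmetic, sketched here so the numbers in a report stay consistent. The helper names `reduction_percent` and `change_cell` are hypothetical, rounding to the nearest whole percent is an assumption, and both helpers assume `after` is not larger than `before`.

```rust
/// Percentage of `before` that was removed, rounded to the nearest integer.
/// Returns 0 when `before` is 0.
fn reduction_percent(before: u64, after: u64) -> u64 {
    if before == 0 {
        return 0;
    }
    let removed = before.saturating_sub(after);
    // Adding half the divisor rounds to nearest instead of truncating.
    (removed * 100 + before / 2) / before
}

/// Formats one "Change" cell of a summary table.
fn change_cell(before: u64, after: u64) -> String {
    format!(
        "-{} ({}%)",
        before.saturating_sub(after),
        reduction_percent(before, after)
    )
}
```

For the sample figures in the Output Format summary, 34 tests reduced to 12 and 280 lines reduced to 95, these helpers reproduce the "Change" column shown there.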
Project-Specific Conventions
Test locations in this codebase
- Inline tests: #[cfg(test)] mod tests { ... } at bottom of source file
- Separate test files: src/stdx/*_tests.rs, src/engine/tests.rs
- Property tests: often in the same file, gated with #[cfg(all(test, feature = "stdx-proptest"))]
- Kani proofs: #[cfg(kani)] mod kani_proofs { ... } or in *_tests.rs
- Simulation tests: tests/simulation/ directory
Feature gates
- Property tests: stdx-proptest feature
- Kani proofs: kani feature
- Simulation harnesses: sim-harness, scheduler-sim
Dependencies to add if recommending new tools
- rstest: Add rstest = "0.24" to [dev-dependencies] in Cargo.toml (already present in eval-harness)
- proptest: Already available behind stdx-proptest feature
- cargo-fuzz: External tool, no Cargo.toml change needed
Output Format
## Test Consolidation Report: [module/file]
### Cluster Analysis
#### Cluster 1: `function_name()` — N tests, M lines
**Tests in cluster:**
| # | Test | Input | Assertion |
|---|------|-------|-----------|
| 1 | `test_foo_basic` | "hello" | returns "HELLO" |
| 2 | `test_foo_empty` | "" | returns "" |
| 3 | `test_foo_unicode` | "café" | returns "CAFÉ" |
| ... | ... | ... | ... |
**Pattern detected:** All tests call `foo(input)` and assert exact output.
Inputs vary, assertion structure is identical.
**Recommended strategy:** Property-based test
**Rationale:** The invariant `foo(x).to_lowercase() == x.to_lowercase()` holds
for all inputs. A single proptest replaces 6 of the 8 unit tests.
**Consolidation:**
- REPLACE the 6 remaining input/output tests with proptest `prop_foo_case_invariant`
- KEEP `test_foo_empty` as anchor (documents empty-input behavior)
- KEEP `test_foo_regression_gh_42` (regression test, bug reference)
**Proposed code:**
```rust
proptest! {
    #[test]
    fn prop_foo_case_invariant(input in "\\PC{0,100}") {
        prop_assert_eq!(foo(&input).to_lowercase(), input.to_lowercase());
    }
}
```
#### Cluster 2: `bar()` — N tests, M lines
...
### Summary
| Metric | Before | After | Change |
|---|---|---|---|
| Total tests | 34 | 12 | -22 (65%) |
| Test lines | 280 | 95 | -185 (66%) |
| Behaviors covered | 8 | 8 | No change |
| Input space covered | ~34 points | Continuous | Vastly improved |
### Dependency Changes
### Migration Order
1. Add proptest for Cluster 1 (highest value: 8 tests → 1)
2. Add rstest for Cluster 3 (6 tests → 1 parameterized)
3. Add rstest for Cluster 5 (4 tests → 1 parameterized)
4. Run full test suite to verify no coverage loss
5. Delete subsumed unit tests
## Decision Heuristics
### When to prefer proptest over rstest
- You can state a property about the output without knowing the exact value
- The input space is continuous or very large
- Roundtrip properties exist (encode/decode, serialize/deserialize)
- Order/sorting/containment invariants exist
### When to prefer rstest over proptest
- Each case has a specific expected output that must be exact
- The set of important cases is finite and known
- The mapping is arbitrary (no mathematical relationship)
- You want each case to appear as a named sub-test in output
**Always prefer rstest over table-driven loops.** rstest gives you named
sub-cases in test output, better failure diagnostics with the exact failing
case visible in the test name, and no manual loop/assertion boilerplate.
There is no scenario where a hand-rolled `for` loop over a table is
preferable to `#[rstest]` with `#[case]` attributes.
### When to prefer fuzz over all else
- The function processes untrusted/external input
- The goal is "never panic" rather than "correct output"
- You've been writing tests that try to "trick" the parser
### When NOT to consolidate
- Each test has genuinely different setup logic (not just different values)
- Tests verify different error paths with different error types
- Tests are regression tests for specific bugs (keep the history)
- The "consolidated" version would be harder to understand than the originals
- There are only 2-3 tests — consolidation overhead isn't worth it
## Judgment Calls
- **Threshold**: Don't consolidate clusters of fewer than 4 tests unless the
consolidation is obviously cleaner (e.g., perfect roundtrip property).
- **Mixed clusters**: If 6 of 8 tests consolidate but 2 are genuinely unique,
consolidate the 6 and keep the 2. Don't force everything into one construct.
- **Error tests**: Tests for error cases often belong in a separate rstest or
table, not mixed into the happy-path property test. Group by success/failure.
- **Naming**: Consolidated tests should have clear names describing the
invariant or case set, not generic names like `test_all_cases`.
## Related Skills
- `/test-dedup` — Remove tests subsumed by higher-level tests (complementary)
- `/test-strategy` — Decide what kind of test to write for new code