mutation-testing
// Use when running mutation testing, killing mutants, verifying test quality, checking mutation score, or analyzing survivors after the test baseline is green
| name | mutation-testing |
| description | Use when running mutation testing, killing mutants, verifying test quality, checking mutation score, or analyzing survivors after the test baseline is green |
Add a third validation layer to the Outside-In TDD workflow. Acceptance tests verify WHAT (observable behavior), Domain tests verify HOW (business rules), and mutation testing verifies that the tests actually catch bugs.
Mutation testing introduces deliberate bugs (mutants) into source code, then runs the test suite. If tests fail, the mutant is killed ✓. If tests pass despite the bug, the mutant survives ✗ (test gap found).
Source code → introduce mutation → run tests
├── tests FAIL → mutant killed ✓
└── tests PASS → mutant survived ✗
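The killed/survived distinction can be illustrated without any tooling. The following is a minimal hand-rolled sketch, not a description of how Stryker.NET works internally: `is_eligible`, `is_eligible_mutant`, `weak_suite`, `strong_suite`, and `verdict` are hypothetical names introduced for illustration, and the mutant is written by hand rather than generated.

```shell
# Original rule (hypothetical): eligible at 18 or older.
is_eligible() { [ "$1" -ge 18 ]; }

# Hand-written mutant: ">=" replaced by ">".
is_eligible_mutant() { [ "$1" -gt 18 ]; }

# Weak suite: only checks values far from the boundary.
# $1 is the name of the implementation under test.
weak_suite() { "$1" 30 && ! "$1" 10; }

# Stronger suite: also checks the boundary value 18.
strong_suite() { weak_suite "$1" && "$1" 18; }

# A mutant survives when the suite still passes against it.
# $1 = suite name, $2 = implementation name
verdict() {
  if "$1" "$2"; then echo "survived"; else echo "killed"; fi
}

verdict weak_suite is_eligible_mutant
verdict strong_suite is_eligible_mutant
```

The first call reports `survived` and the second `killed`: only the suite that exercises the boundary value detects the mutant.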
A project with 100% code coverage can still have a 60% mutation score — meaning 40% of introduced bugs go undetected.
Run mutation testing only after the relevant test baseline is green.
Never run it on a red baseline: mutation testing assumes the tests already pass against the unmutated code.
For .NET projects, Stryker.NET is the established mutation testing framework and supports C# well. The commands below need no config file; all options are passed on the command line.
Install (only if not already available):
# Check first: if dotnet-stryker is listed, skip installation entirely. Do NOT manipulate PATH.
dotnet tool list -g | grep dotnet-stryker
# Only run if the above command prints nothing (tool not installed)
dotnet tool install -g dotnet-stryker
Run on changed code only (default workflow — use after every story):
# Mutate only files changed since main — fast, targeted
dotnet stryker \
--project src/YourProject.Domain/YourProject.Domain.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--mutate "**/*.cs" --mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--since:main \
--break-at 100 \
-r json
`--since:main`: only mutants within the git diff vs `main` are tested. Unchanged code produces no result, which keeps the run fast.
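To reason about which files the `--mutate` filters in the command above select, the include and exclude patterns can be approximated with shell `case` patterns. This is only a rough sketch: `would_mutate` is a hypothetical helper, and shell pattern matching is assumed to be close enough to Stryker.NET's glob handling for these three patterns, which may not hold for more complex globs.

```shell
# Approximates: --mutate "**/*.cs" --mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs"
# $1 = file path relative to the project root
would_mutate() {
  case "$1" in
    # Exclusions are checked first, mirroring the "!" patterns.
    *Marker.cs) echo "excluded" ;;
    DependencyInjection.cs|*/DependencyInjection.cs) echo "excluded" ;;
    # Everything else that is a C# file is in scope.
    *.cs) echo "mutated" ;;
    *) echo "ignored" ;;
  esac
}

would_mutate "Policies/EligibilityPolicy.cs"
would_mutate "AssemblyMarker.cs"
would_mutate "Setup/DependencyInjection.cs"
```

The file names passed in are placeholders. The point is only that exclusions take precedence over the broad `**/*.cs` include.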
Run full business logic (use before merge):
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--mutate "src/YourProject.Core/**/*.cs" \
--mutate "src/YourProject.Application/**/*.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
--threshold-high 100 --threshold-low 100 \
-r json -r cleartext
Cumulative baseline — full picture after incremental runs:
# --with-baseline combines --since with a persistent baseline report
# Use this in CI to keep a full history while only re-testing changed code
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--with-baseline:main \
--break-at 100 \
-r json
`--with-baseline` = `--since` + saving/loading a baseline report. It gives a complete score even when only changed files were re-tested.
Build a custom tool only when:
Architecture (3 modules):
Example mutation operators: + → -, true → false, >= → >
For full custom tool reference, see Uncle Bob's empire-2025 mutation testing.
| Category | Examples |
|---|---|
| Arithmetic | + ↔ -, * ↔ /, ++ ↔ -- |
| Comparison | > ↔ >=, < ↔ <=, == ↔ != |
| Boolean | true ↔ false, && ↔ ||, !x ↔ x |
| Conditional | negate conditions, swap if/else branches |
| Constant | 0 ↔ 1, "" ↔ "mutant", null ↔ new() |
| Return value | return true → return false |
| Void method | remove method call entirely |
| LINQ | .Any() ↔ .All(), .First() ↔ .Last() |
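A single comparison mutation from the table above can be mimicked with a text substitution. This is a deliberately naive sketch: `mutate_comparison` is a hypothetical helper, and real tools such as Stryker.NET mutate the parsed syntax tree rather than raw text, so they avoid touching strings and comments.

```shell
# Reads source text on stdin and writes a mutant on stdout.
# Replaces the first ">=" on each line with ">".
mutate_comparison() {
  sed 's/>=/>/'
}

original='if (age >= 18) return true;'
mutant=$(printf '%s\n' "$original" | mutate_comparison)

printf 'original: %s\n' "$original"
printf 'mutant:   %s\n' "$mutant"
```

Running the test suite against the mutated line, and checking whether any test fails, is the step that decides between killed and survived.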
Universal prerequisite (applies to every step, every scenario): before any mutation activity (first run, CI setup, killing survivors, analyzing reports), the test suite for the affected scope must be green. If tests are failing, fix them first. Mutation results on a red baseline are meaningless: a test that already fails on the original code fails on every mutant too, so it cannot show whether a mutant was detected.
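The green-baseline rule can be enforced mechanically with a small wrapper. The sketch below is one possible shape, not part of Stryker.NET: `run_mutation_if_green` is a hypothetical function that takes the baseline command and the mutation command as strings, and the commented invocation assumes `dotnet test` is how the baseline is run in your project.

```shell
# $1 = command that runs the test baseline
# $2 = command that runs mutation testing
run_mutation_if_green() {
  baseline_cmd=$1
  mutation_cmd=$2
  if sh -c "$baseline_cmd"; then
    # Baseline is green: mutation results will be meaningful.
    sh -c "$mutation_cmd"
  else
    echo "baseline is red: fix failing tests before running mutation testing" >&2
    return 1
  fi
}

# Example invocation (commands are assumptions about your setup):
# run_mutation_if_green "dotnet test" "dotnet stryker --since:main --break-at 100 -r json"
```

Passing the commands as arguments keeps the guard independent of project layout. The mutation command never starts when the baseline command exits non-zero.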
Before running mutation testing, confirm:
Target critical business logic first:
Exclude from mutation:
Progressive scoping:
| Phase | Scope | Goal |
|---|---|---|
| Week 1-2 | One critical rule module | Baseline + learning |
| Week 3-4 | All core rule modules | Establish quality gate |
| Ongoing | Core + critical orchestration handlers | Full confidence |
During development (fast, on changed code only):
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--since:main \
--break-at 100 \
-r json
Before merge (full business logic scope):
dotnet stryker \
--project src/YourProject.Core/YourProject.Core.csproj \
-tp tests/YourProject.UnitTests/YourProject.UnitTests.csproj \
--mutate "src/YourProject.Core/**/*.cs" \
--mutate "src/YourProject.Application/**/*.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
-r json -r cleartext
Metrics:
`--since` note: unchanged files produce no result, which is expected. Survivors and kills only apply to the diff scope.
Expected duration: --since run: ~1-3 min. Full run: ~5-15 min (depends on test suite speed).
Query survivors directly from the JSON report — do not read the full file:
jq '[.files | to_entries[] | {file: .key, survivors: [.value.mutants[] | select(.status == "Survived") | {mutator: .mutatorName, line: .location.start.line, replacement: .replacement}]}] | map(select(.survivors | length > 0))' \
StrykerOutput/$(ls -t StrykerOutput | head -1)/reports/mutation-report.json
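Where `jq` is not available, a rough survivor count can still be taken with `grep`. This is a coarse sketch under one assumption: that the report stores each mutant's status as a `"status": "Survived"` pair, which is what the `jq` query above selects on. `count_survivors` is a hypothetical helper; it counts occurrences only and gives no per-file breakdown.

```shell
# $1 = path to mutation-report.json
# Prints the number of mutants whose status is "Survived".
count_survivors() {
  # -o emits one line per match, so wc -l counts matches even in minified JSON.
  grep -o '"status"[[:space:]]*:[[:space:]]*"Survived"' "$1" | wc -l | tr -d ' '
}

# Example invocation (path layout taken from the jq command above):
# count_survivors "StrykerOutput/$(ls -t StrykerOutput | head -1)/reports/mutation-report.json"
```

A count of `0` means no survivors were recorded in that report. It does not say anything about files outside the mutated scope.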
For each surviving mutant:
Mutation examples: >= → >, removed if branch
Equivalent mutant examples:
x = x + 0 changed to x = x + 1 (dead code)
After classifying survivors, always include a targeted re-run command scoped to the files that contain real gaps. This confirms kills after you write new tests and gives reviewers a runnable artifact:
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--mutate "**/<FileWithRealGap>.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
-r cleartext
For each real survivor (not equivalent):
Example:
Survivor: if (age >= 18) mutated to if (age > 18) → survived
// New test to kill the boundary mutant
[Fact]
public void WhenDriverIsExactly18_ShouldBeEligible()
{
var policy = new EligibilityPolicy();
var driver = new DriverInfo(Age: 18, LicenseYears: 1);
var vehicle = new VehicleInfo(Type: "sedan", Age: 1);
var result = policy.Evaluate(driver, vehicle);
Assert.True(result.IsEligible); // Fails if mutant uses `age > 18`
}
Present summary with before/after metrics:
Mutation Testing Report — Core Business Layer
═══════════════════════════════════════
Scope: YourProject.Core.Policies
Score: 68% → 82% (after killing survivors)
Killed: 82 / 100
Survived: 32 → 18
New tests added: 8
- Boundary tests for age/experience thresholds: 4
- Edge cases for vehicle type combinations: 3
- Null/empty validation: 1
Remaining survivors (equivalent mutants — documented):
- EligibilityPolicy.cs:L42 — removed log statement (no observable effect)
- DriverAge.cs:L15 — defensive null check (guaranteed non-null by type)
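The score lines in a summary like this can be derived from the killed and survived counts. The sketch below uses a simplified definition, killed divided by killed plus survived, which is an assumption: real reports may contain further statuses (for example timeouts) that this ignores. `mutation_score` is a hypothetical helper.

```shell
# $1 = killed mutants, $2 = survived mutants
# Prints the score as a whole-number percentage (integer division).
mutation_score() {
  killed=$1
  survived=$2
  total=$(( killed + survived ))
  if [ "$total" -eq 0 ]; then
    echo "n/a"   # nothing was mutated, so no score can be computed
    return 0
  fi
  echo "$(( 100 * killed / total ))%"
}

# With 100 mutants in total: 68 killed before, 82 killed after.
mutation_score 68 32
mutation_score 82 18
```

With 100 mutants the two calls print `68%` and `82%`, matching the before/after score in the report.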
Document legitimate survivors in code comments or architecture decision records.
Set thresholds based on team policy and risk profile. Common practice is to start with a progressive threshold and tighten it over time.
| Score | Assessment | Action |
|---|---|---|
| High threshold met | Healthy signal | Keep survivor review discipline |
| Near threshold | Potential gaps | Add targeted tests for risky survivors |
| Below threshold | Quality risk | Block merge or require mitigation plan |
Equivalent mutants are the only legitimate exception — document them explicitly.
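The assessment table can be turned into a small decision helper. This sketch assumes that "near threshold" means a score between a low and a high threshold, similar to the `--threshold-low` and `--threshold-high` options; `assess_score` and the sample numbers are hypothetical.

```shell
# $1 = score, $2 = high threshold, $3 = low threshold (all whole percentages)
assess_score() {
  if [ "$1" -ge "$2" ]; then
    echo "healthy: keep survivor review discipline"
  elif [ "$1" -ge "$3" ]; then
    echo "near threshold: add targeted tests for risky survivors"
  else
    echo "below threshold: block merge or require mitigation plan"
  fi
}

assess_score 92 90 80
assess_score 85 90 80
assess_score 70 90 80
```

The three calls walk through the three rows of the table in order. The thresholds themselves remain a team policy decision.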
| Phase | Threshold | Enforcement |
|---|---|---|
| Week 1-2 | Baseline only | Measure, learn mutation categories |
| Week 3-4 | Team-defined threshold (e.g., 80%) | Block PR if below |
| Month 2 | Tightened threshold (e.g., 90%) | Ramp up |
| Steady state | Risk-based target per module | Block merge when policy is not met |
CI/CD integration:
# In CI pipeline: fail the build if the score is below the team threshold
dotnet stryker --break-at [team-threshold]
When the CI gate fails, it means survivors remain. Do not raise the threshold to pass — investigate each survivor first. Classify them as real gap (write a test) or equivalent mutant (document). Only equivalent mutants are an acceptable reason to adjust the threshold.
Mutation testing is the third validation layer:
1. Gherkin scenarios (WHAT) → Acceptance tests
2. Business rules (HOW) → Domain tests
3. Test effectiveness (REAL?) → Mutation testing
Workflow integration:
No. Fix failing tests first. Mutation assumes a green baseline.
Aggressive targets can be appropriate for critical logic, but thresholds are a policy decision. Equivalent mutants remain the only valid exception to survivor cleanup.
Never mutate repositories, adapters, and pure plumbing. Focus on business logic first.
Too slow. Run on feature completion or weekly. CI runs only on PR.
Rationalization. Most survivors are real gaps. Investigate each one.
Mutation score is a signal, not the goal. Focus on killing mutants that represent real behavioral gaps.
| Mistake | Fix |
|---|---|
| Running mutation on failing tests | Green baseline required — fix tests first |
| Mutating test files | Configure Stryker to mutate source only |
| Treating all survivors as equivalent | Only equivalent mutants are exempt — document them, kill the rest |
| Mutation testing without fast tests | Optimize test speed — slow tests = slow mutations |
| Not scoping mutations progressively | Start small (one policy), expand gradually |
| Accepting < 100% on business logic | 100% is the target — find the gap and test it |
Install / update Stryker.NET:
dotnet tool install -g dotnet-stryker
dotnet tool update -g dotnet-stryker
On changed code only (fast — during development):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--since:main \
--break-at 100 \
-r json
Full business logic scope (before merge):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--mutate "src/<YourProject>.Domain/**/*.cs" \
--mutate "src/<YourProject>.Application/**/*.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
--threshold-high 100 --threshold-low 100 \
-r json -r cleartext
Cumulative baseline in CI (full picture + incremental speed):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--with-baseline:main \
--break-at 100 \
-r json
Scope to a specific file or feature (debug a survivor):
dotnet stryker \
--project <YourProject.Domain.csproj> \
-tp <path/to/YourProject.UnitTests/YourProject.UnitTests.csproj> \
--mutate "**/<TargetFile>.cs" \
--mutate "!**/*Marker.cs" --mutate "!**/DependencyInjection.cs" \
--break-at 100 \
-r cleartext
Inspect JSON report:
jq '.' StrykerOutput/**/reports/mutation-report.json | head -n 120
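When several runs have accumulated, it helps to resolve the newest report path explicitly before inspecting it. The sketch below assumes the `StrykerOutput/<run>/reports/mutation-report.json` layout used by the commands in this document; `latest_report` is a hypothetical helper that picks the most recently modified run directory.

```shell
# $1 = output root (defaults to StrykerOutput)
# Prints the path of the newest mutation-report.json, or fails if none exists.
latest_report() {
  root=${1:-StrykerOutput}
  # ls -t sorts by modification time, newest first.
  newest=$(ls -t "$root" 2>/dev/null | head -n 1)
  report="$root/$newest/reports/mutation-report.json"
  if [ -z "$newest" ] || [ ! -f "$report" ]; then
    echo "no report found under $root" >&2
    return 1
  fi
  echo "$report"
}

# Example invocation:
# jq '.' "$(latest_report)" | head -n 120
```

Failing loudly when no report exists avoids piping an empty path into `jq`.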
Key CLI flags reference:
| Flag | Short | Purpose |
|---|---|---|
| `--project <name.csproj>` | `-p` | Source project to mutate (filename only) |
| `--test-project <path>` | `-tp` | Test project(s), repeatable |
| `--mutate <glob>` | `-m` | Include/exclude files (prefix `!` to exclude), repeatable |
| `--since:<committish>` | | Only test mutants in git diff vs committish |
| `--with-baseline:<committish>` | | Like `--since` + persist baseline for full cumulative report |
| `--break-at <0-100>` | `-b` | Exit code 1 if score < value |
| `--threshold-high <0-100>` | | Score ≥ this → green |
| `--threshold-low <0-100>` | | Score < high but ≥ this → warning |
| `--reporter <name>` | `-r` | `json`, `cleartext`, `dots`, `markdown`, `html`, repeatable |
| `--concurrency <n>` | `-c` | Parallel worker count |
| `--verbosity <level>` | `-V` | `error`, `warning`, `info`, `debug`, `trace` |
REQUIRED BACKGROUND: superpowers-whetstone:outside-in-tdd — defines the two test streams (Application + Domain)
REQUIRED BACKGROUND: superpowers-whetstone:red-synthesize-green — TDD cycle that produces tests to mutate
WORKFLOW:
Run mutation testing after story completion, before PR/merge. Use it as a quality gate, not as a coverage metric.