| name | sim-run |
| description | Run deterministic simulation tests with progressive difficulty levels (sunny/stormy/radioactive) inspired by TigerBeetle VOPR — orchestrates seed management, workload selection, and invariant verification |
| user-invocable | true |
Run Deterministic Simulation Tests
Execute simulation tests with progressive difficulty levels, inspired by
TigerBeetle's VOPR three-level system. Manages seeds, selects workloads,
injects faults, and reports invariant violations.
Evidence Base
| Source | Principle |
|---|
| TigerBeetle VOPR | 3 levels: clean → faults → catastrophic; 3.3 sec sim = 39 min real-world |
| FoundationDB (SIGMOD 2021) | Tens of thousands of sims nightly, seeded determinism |
| sled (simulation.html) | Discrete-event simulation with priority queue |
| Eaton (practical DST guide) | Seeds break on code changes — use continuous exploration |
| CockroachDB (Jepsen lessons) | Workload design determines simulation effectiveness |
| Yuan et al. (OSDI 2014) | Error handling paths are highest-value targets |
When to Use
- Level 1 (Sunny Day): Before committing coordination changes (quick sanity check)
- Level 2 (Stormy): Nightly CI or before merging coordination PRs
- Level 3 (Radioactive): Before releases or after significant coordination changes
- When a coordination bug is suspected and you want to explore the state space
- When verifying a fix for a simulation-found bug
When NOT to Use
- Code that doesn't touch coordination, gossip, or shard lifecycle
- Detection engine changes (use
/run-fuzz instead)
- When simulation harness isn't implemented yet (use
/sim-scaffold first)
Prerequisites
Before running simulations, verify:
SimContext exists in gossip-contracts/src/sim/ (P1 prerequisite)
- At least one simulation harness exists (generated by
/sim-scaffold)
- The
test-support feature gate is configured
proptest is available as a dev-dependency
If prerequisites are missing, report what needs to be created and suggest
running /sim-scaffold first.
Procedure
Step 1: Select Level
| Level | Name | Faults | Seeds | Context |
|---|
| 1 | Sunny Day | None — perfect conditions | 10 | Per-PR, pre-commit, quick sanity |
| 2 | Stormy | Network partitions, lease expiry, message loss/reorder, clock skew | 100 | Nightly CI, pre-merge |
| 3 | Radioactive | Storage corruption, cascading failures, Byzantine faults, all Level 2 combined | 1000+ | Pre-release, deep exploration |
If the user doesn't specify a level:
- Default to Level 1 for quick checks
- Recommend Level 2 if the change touches lease/fence/split logic
- Recommend Level 3 if the change is a significant architectural modification
Step 2: Select Workload Templates
Each workload template exercises a specific coordination scenario. Select all
that are relevant to the changed code, or run all for comprehensive coverage.
| Template | Exercises | Priority |
|---|
| Full scan lifecycle | Acquire → checkpoint x N → complete | ALWAYS |
| Lease expiry during checkpoint | Acquire → delay → checkpoint with stale lease | HIGH — etcd Jepsen 3.4.3 pattern |
| Split during active scan | Acquire → checkpoint → split → verify children | HIGH — shard coverage invariant |
| Concurrent acquisition | 2+ workers acquire same shard | HIGH — fence monotonicity |
| Ambiguous failure | Network drop after mutation succeeds | MED — Jepsen Redis-Raft pattern |
| Park and unpark | Active → Parked → Active (reacquire) | MED — state machine completeness |
| Mass restart | All workers restart simultaneously | MED — Serf/memberlist edge case |
| Asymmetric partition | A→B works, B→A drops | LOW — Lifeguard false positives |
Step 3: Run Simulation
Execute tests with the selected level and seeds:
cargo test --features test-support -- --test-threads=1 sim::level1
SIM_SEEDS=100 SIM_LEVEL=2 cargo test --features test-support -- --test-threads=1 sim::level2
SIM_SEEDS=1000 SIM_LEVEL=3 cargo test --features test-support -- --test-threads=1 sim::level3
SIM_SEED=12345 cargo test --features test-support -- --test-threads=1 sim::specific_test
Key flags:
--test-threads=1: Required for deterministic execution (no thread interleaving)
SIM_SEEDS: Number of random seeds to explore
SIM_LEVEL: Fault injection intensity
SIM_SEED: Specific seed for reproduction
Step 4: Analyze Results
On success (all seeds pass):
SIMULATION REPORT — Level {N}
═════════════════════════════
Result: PASS
Seeds tested: {count}
Wall time: {duration}
Sim time: {total simulated time}
Time compression: {sim_time / wall_time}x
Workload coverage:
[x] Full scan lifecycle — {N} occurrences
[x] Lease expiry during checkpoint — {N} occurrences
[x] Split during active scan — {N} occurrences
[x] Concurrent acquisition — {N} occurrences
[ ] Asymmetric partition — 0 occurrences (not exercised)
Invariants verified:
- Mutual exclusion (lease) — checked {N} times
- Fence monotonicity — checked {N} times
- Shard coverage (no gaps) — checked {N} times
- Terminal irreversibility — checked {N} times
On failure:
SIMULATION REPORT — Level {N}
═════════════════════════════
Result: FAIL
Seeds tested: {count} ({failures} failures)
Unique bugs: {count}
Invariant violations:
┌─────────────────────────────┬───────┬─────────────────────┐
│ Invariant │ Count │ Seeds │
├─────────────────────────────┼───────┼─────────────────────┤
│ Mutual exclusion (lease) │ 3 │ 42, 1337, 99999 │
│ Fence monotonicity │ 1 │ 42 │
└─────────────────────────────┴───────┴─────────────────────┘
Reproduction commands:
SIM_SEED=42 cargo test --features test-support -- sim::level2::lease_expiry_during_checkpoint
SIM_SEED=1337 cargo test --features test-support -- sim::level2::concurrent_acquisition
Failure trace (seed=42):
t=0 Worker-1: acquire_shard(shard-A) → Ok(epoch=1)
t=5 Worker-2: acquire_shard(shard-A) → Ok(epoch=2)
t=7 FAULT: pause Worker-2 for 100 ticks
t=10 Worker-1: checkpoint(shard-A, epoch=1) → Ok ← VIOLATION: stale epoch accepted
t=107 Worker-2: resume
...
Step 5: Handle Failures
When a simulation failure is found:
- Minimize: Use proptest shrinking to find the minimal reproduction case
- Classify: Is this a real bug or a simulation artifact?
- Root cause: Trace the failure to specific code
- Fix: Apply the fix
- Verify: Re-run with the failing seed AND fresh seeds
- Do NOT save the specific seed as a regression test — seeds break on
code changes (Eaton, Antithesis). Instead, ensure the invariant check
catches the class of bug under continuous exploration.
Step 6: Coverage Assessment
After the run, assess workload coverage:
- Well-covered: Workload generated sufficient occurrences across seeds
- Under-covered: Workload generated fewer than 5 occurrences across all seeds
→ May need to tune workload generation probabilities
- Not exercised: Workload never triggered
→ May need explicit workload scenario or higher seed count
If critical workloads are under-covered, recommend:
- Increasing seed count
- Adding targeted workload scenarios
- Adjusting fault injection probabilities
Fault Injection Reference
Level 2 (Stormy) Faults
| Fault | Description | Implementation |
|---|
| Network partition | Messages between node subsets are dropped | NetworkFaults.partitions |
| Message loss | Random message drops (10-30%) | NetworkFaults.drop_rate |
| Message reorder | Messages arrive out of order | Random delay in event queue |
| Lease expiry | Force lease timeout mid-operation | Advance clock past lease deadline |
| Process pause | Freeze a worker for N ticks | Skip node in simulation loop |
| Clock skew | Nodes disagree on current time | Per-node clock offset |
Level 3 (Radioactive) Faults
All Level 2 faults plus:
| Fault | Description | Implementation |
|---|
| Storage corruption | Backend returns garbage data | Mock backend returns corrupted state |
| Cascading failure | Failure of one node triggers failures in others | Fault propagation rules |
| Byzantine | Node sends conflicting messages to different peers | Duplicate + modify messages |
| Split brain | Two partitions each believe they're the majority | Symmetric partition |
| Repeated crash-restart | Node crashes and restarts multiple times | Reset node state periodically |
Invariant Catalog
These are the core invariants that every simulation run must verify. They map
to the phase 2 spec invariant catalog.
| ID | Invariant | Check |
|---|
| S1 | Mutual exclusion — at most one active lease per shard | Count active leases per shard ≤ 1 |
| S2 | Fence monotonicity — epochs never decrease | Track max epoch per shard, assert non-decreasing |
| S3 | Terminal irreversibility — Done/Failed shards never transition | Assert no transitions from terminal states |
| S4 | Shard coverage — split children cover parent range exactly | Verify range algebra on split |
| S5 | Idempotency — duplicate OpId returns same result | Replay operations, assert identical results |
| S6 | Progress — under fair scheduling, work eventually completes | Assert completion within bounded time |
| L1 | Lease exclusivity — stale-epoch checkpoints are rejected | Attempt checkpoint with old epoch, assert rejection |
| L2 | Zombie rejection — restarted worker cannot use old lease | Kill and restart worker, assert old operations fail |
Seed Management Philosophy
Do NOT save specific seeds as regression tests.
Seeds break whenever code changes (Eaton survey, Antithesis blog). Instead:
- Run many fresh seeds continuously (exploration, not regression)
- When a bug is found, fix it and verify the fix with fresh seeds
- Ensure the invariant assertion catches the class of bug, not just the
specific sequence that triggered it
- The invariant check IS the regression test — not the seed
This is counterintuitive but well-established: FoundationDB, TigerBeetle,
and Antithesis all use continuous exploration over saved seeds.
Related Skills
/sim-review — Verify code is DST-compatible before running simulations
/sim-scaffold — Generate simulation harnesses
/jepsen-test — Complement DST with real-network cluster tests
/test-strategy — Choose between simulation and other test approaches