Exécutez n'importe quel Skill dans Manus
en un clic

Exécutez n'importe quel Skill dans Manus en un clic

sim-run

Étoiles2

Forks0

Mis à jour19 février 2026 à 16:50

Run deterministic simulation tests with progressive difficulty levels (sunny/stormy/radioactive) inspired by TigerBeetle VOPR — orchestrates seed management, workload selection, and invariant verification

Installation

Installer avec Codex ou Claude Copiez ce prompt, collez-le dans Codex, Claude ou un autre assistant, puis laissez-le vérifier la page du skill et l'installer pour vous.

Exécuter dans Manus

Source

ahrav

ahrav/scratch-scanner-rs

Ouvrir le dépôt GitHub Voir les dépôts du créateur

Téléchargement

Exécuter dans Manus

Métiers associésSOC

Basé sur la classification professionnelle SOC

Analystes en assurance qualité des logiciels et testeursProfessions informatiques et mathématiques·SOC 15-1253

SKILL.md

readonly

Plus depuis ce dépôt

même dépôt

doc-rigor

ahrav/scratch-scanner-rs

Write-then-verify documentation pipeline. Use when a user asks to improve comments or docs, explain algorithms or design choices, write or upgrade docstrings, or raise documentation quality for a codebase (especially Rust crates). Writes docs, then automatically verifies every claim against code reality using a fresh agent to eliminate confirmation bias.

2026-02-282

execute-review-findings

ahrav/scratch-scanner-rs

Use when you have code review findings, PR comments, or review reports that need to be systematically addressed — especially when there are multiple findings across different files and severities

2026-02-242

create-task

ahrav/scratch-scanner-rs

Use when creating any beads task — auto-researches the codebase, links related tasks, and produces a rich self-contained description from a structured template. Accepts minimal intent and outputs a complete task ready for agent implementation.

2026-02-242

execute-review-findings

ahrav/scratch-scanner-rs

Use when you have code review findings, PR comments, or review reports that need to be systematically addressed — especially when there are multiple findings across different files and severities

2026-02-242

plan-forge

ahrav/scratch-scanner-rs

Use when a task needs an implementation plan that is iteratively created and stress-tested through review-and-revise cycles before implementation begins — catches blind spots, incorrect codebase assumptions, unnecessary complexity, and performance pitfalls while changes are still cheap

2026-02-242

plan-review

ahrav/scratch-scanner-rs

Use when a markdown plan file exists and needs validation before implementation — catches design flaws, logic holes, footguns, unnecessary complexity, and performance concerns while changes are still cheap

2026-02-232

name	sim-run
description	Run deterministic simulation tests with progressive difficulty levels (sunny/stormy/radioactive) inspired by TigerBeetle VOPR — orchestrates seed management, workload selection, and invariant verification
user-invocable	true

Run Deterministic Simulation Tests

Execute simulation tests with progressive difficulty levels, inspired by TigerBeetle's VOPR three-level system. Manages seeds, selects workloads, injects faults, and reports invariant violations.

Evidence Base

Source	Principle
TigerBeetle VOPR	3 levels: clean → faults → catastrophic; 3.3 sec sim = 39 min real-world
FoundationDB (SIGMOD 2021)	Tens of thousands of sims nightly, seeded determinism
sled (simulation.html)	Discrete-event simulation with priority queue
Eaton (practical DST guide)	Seeds break on code changes — use continuous exploration
CockroachDB (Jepsen lessons)	Workload design determines simulation effectiveness
Yuan et al. (OSDI 2014)	Error handling paths are highest-value targets

When to Use

Level 1 (Sunny Day): Before committing coordination changes (quick sanity check)
Level 2 (Stormy): Nightly CI or before merging coordination PRs
Level 3 (Radioactive): Before releases or after significant coordination changes
When a coordination bug is suspected and you want to explore the state space
When verifying a fix for a simulation-found bug

When NOT to Use

Code that doesn't touch coordination, gossip, or shard lifecycle
Detection engine changes (use /run-fuzz instead)
When simulation harness isn't implemented yet (use /sim-scaffold first)

Prerequisites

Before running simulations, verify:

SimContext exists in gossip-contracts/src/sim/ (P1 prerequisite)
At least one simulation harness exists (generated by /sim-scaffold)
The test-support feature gate is configured
proptest is available as a dev-dependency

If prerequisites are missing, report what needs to be created and suggest running /sim-scaffold first.

Procedure

Step 1: Select Level

Level	Name	Faults	Seeds	Context
1	Sunny Day	None — perfect conditions	10	Per-PR, pre-commit, quick sanity
2	Stormy	Network partitions, lease expiry, message loss/reorder, clock skew	100	Nightly CI, pre-merge
3	Radioactive	Storage corruption, cascading failures, Byzantine faults, all Level 2 combined	1000+	Pre-release, deep exploration

If the user doesn't specify a level:

Default to Level 1 for quick checks
Recommend Level 2 if the change touches lease/fence/split logic
Recommend Level 3 if the change is a significant architectural modification

Step 2: Select Workload Templates

Each workload template exercises a specific coordination scenario. Select all that are relevant to the changed code, or run all for comprehensive coverage.

Template	Exercises	Priority
Full scan lifecycle	Acquire → checkpoint x N → complete	ALWAYS
Lease expiry during checkpoint	Acquire → delay → checkpoint with stale lease	HIGH — etcd Jepsen 3.4.3 pattern
Split during active scan	Acquire → checkpoint → split → verify children	HIGH — shard coverage invariant
Concurrent acquisition	2+ workers acquire same shard	HIGH — fence monotonicity
Ambiguous failure	Network drop after mutation succeeds	MED — Jepsen Redis-Raft pattern
Park and unpark	Active → Parked → Active (reacquire)	MED — state machine completeness
Mass restart	All workers restart simultaneously	MED — Serf/memberlist edge case
Asymmetric partition	A→B works, B→A drops	LOW — Lifeguard false positives

Step 3: Run Simulation

Execute tests with the selected level and seeds:

# Level 1 (Sunny Day) — quick sanity check
cargo test --features test-support -- --test-threads=1 sim::level1

# Level 2 (Stormy) — fault injection
SIM_SEEDS=100 SIM_LEVEL=2 cargo test --features test-support -- --test-threads=1 sim::level2

# Level 3 (Radioactive) — maximum chaos
SIM_SEEDS=1000 SIM_LEVEL=3 cargo test --features test-support -- --test-threads=1 sim::level3

# Reproduce a specific failure
SIM_SEED=12345 cargo test --features test-support -- --test-threads=1 sim::specific_test

Key flags:

--test-threads=1: Required for deterministic execution (no thread interleaving)
SIM_SEEDS: Number of random seeds to explore
SIM_LEVEL: Fault injection intensity
SIM_SEED: Specific seed for reproduction

Step 4: Analyze Results

On success (all seeds pass):

SIMULATION REPORT — Level {N}
═════════════════════════════
Result:           PASS
Seeds tested:     {count}
Wall time:        {duration}
Sim time:         {total simulated time}
Time compression: {sim_time / wall_time}x

Workload coverage:
  [x] Full scan lifecycle          — {N} occurrences
  [x] Lease expiry during checkpoint — {N} occurrences
  [x] Split during active scan     — {N} occurrences
  [x] Concurrent acquisition       — {N} occurrences
  [ ] Asymmetric partition          — 0 occurrences (not exercised)

Invariants verified:
  - Mutual exclusion (lease)       — checked {N} times
  - Fence monotonicity             — checked {N} times
  - Shard coverage (no gaps)       — checked {N} times
  - Terminal irreversibility       — checked {N} times

On failure:

SIMULATION REPORT — Level {N}
═════════════════════════════
Result:           FAIL
Seeds tested:     {count} ({failures} failures)
Unique bugs:      {count}

Invariant violations:
  ┌─────────────────────────────┬───────┬─────────────────────┐
  │ Invariant                   │ Count │ Seeds               │
  ├─────────────────────────────┼───────┼─────────────────────┤
  │ Mutual exclusion (lease)    │ 3     │ 42, 1337, 99999     │
  │ Fence monotonicity          │ 1     │ 42                  │
  └─────────────────────────────┴───────┴─────────────────────┘

Reproduction commands:
  SIM_SEED=42 cargo test --features test-support -- sim::level2::lease_expiry_during_checkpoint
  SIM_SEED=1337 cargo test --features test-support -- sim::level2::concurrent_acquisition

Failure trace (seed=42):
  t=0    Worker-1: acquire_shard(shard-A) → Ok(epoch=1)
  t=5    Worker-2: acquire_shard(shard-A) → Ok(epoch=2)
  t=7    FAULT: pause Worker-2 for 100 ticks
  t=10   Worker-1: checkpoint(shard-A, epoch=1) → Ok  ← VIOLATION: stale epoch accepted
  t=107  Worker-2: resume
  ...

Step 5: Handle Failures

When a simulation failure is found:

Minimize: Use proptest shrinking to find the minimal reproduction case
Classify: Is this a real bug or a simulation artifact?
Root cause: Trace the failure to specific code
Fix: Apply the fix
Verify: Re-run with the failing seed AND fresh seeds
Do NOT save the specific seed as a regression test — seeds break on code changes (Eaton, Antithesis). Instead, ensure the invariant check catches the class of bug under continuous exploration.

Step 6: Coverage Assessment

After the run, assess workload coverage:

Well-covered: Workload generated sufficient occurrences across seeds
Under-covered: Workload generated fewer than 5 occurrences across all seeds → May need to tune workload generation probabilities
Not exercised: Workload never triggered → May need explicit workload scenario or higher seed count

If critical workloads are under-covered, recommend:

Increasing seed count
Adding targeted workload scenarios
Adjusting fault injection probabilities

Fault Injection Reference

Level 2 (Stormy) Faults

Fault	Description	Implementation
Network partition	Messages between node subsets are dropped	`NetworkFaults.partitions`
Message loss	Random message drops (10-30%)	`NetworkFaults.drop_rate`
Message reorder	Messages arrive out of order	Random delay in event queue
Lease expiry	Force lease timeout mid-operation	Advance clock past lease deadline
Process pause	Freeze a worker for N ticks	Skip node in simulation loop
Clock skew	Nodes disagree on current time	Per-node clock offset

Level 3 (Radioactive) Faults

All Level 2 faults plus:

Fault	Description	Implementation
Storage corruption	Backend returns garbage data	Mock backend returns corrupted state
Cascading failure	Failure of one node triggers failures in others	Fault propagation rules
Byzantine	Node sends conflicting messages to different peers	Duplicate + modify messages
Split brain	Two partitions each believe they're the majority	Symmetric partition
Repeated crash-restart	Node crashes and restarts multiple times	Reset node state periodically

Invariant Catalog

These are the core invariants that every simulation run must verify. They map to the phase 2 spec invariant catalog.

ID	Invariant	Check
S1	Mutual exclusion — at most one active lease per shard	Count active leases per shard ≤ 1
S2	Fence monotonicity — epochs never decrease	Track max epoch per shard, assert non-decreasing
S3	Terminal irreversibility — Done/Failed shards never transition	Assert no transitions from terminal states
S4	Shard coverage — split children cover parent range exactly	Verify range algebra on split
S5	Idempotency — duplicate OpId returns same result	Replay operations, assert identical results
S6	Progress — under fair scheduling, work eventually completes	Assert completion within bounded time
L1	Lease exclusivity — stale-epoch checkpoints are rejected	Attempt checkpoint with old epoch, assert rejection
L2	Zombie rejection — restarted worker cannot use old lease	Kill and restart worker, assert old operations fail

Seed Management Philosophy

Do NOT save specific seeds as regression tests.

Seeds break whenever code changes (Eaton survey, Antithesis blog). Instead:

Run many fresh seeds continuously (exploration, not regression)
When a bug is found, fix it and verify the fix with fresh seeds
Ensure the invariant assertion catches the class of bug, not just the specific sequence that triggered it
The invariant check IS the regression test — not the seed

This is counterintuitive but well-established: FoundationDB, TigerBeetle, and Antithesis all use continuous exploration over saved seeds.

Related Skills

/sim-review — Verify code is DST-compatible before running simulations
/sim-scaffold — Generate simulation harnesses
/jepsen-test — Complement DST with real-network cluster tests
/test-strategy — Choose between simulation and other test approaches