validate
// End-to-end sanity check — etna workload check passes, base builds, every variant is detected, every framework drives its crate
| name | validate |
| description | End-to-end sanity check — etna workload check passes, base builds, every variant is detected, every framework drives its crate |
Confirm the workload is coherent and runnable. Manifest/doc consistency is delegated to etna workload check; this stage's unique responsibility is the execution-level checks that require actually building and running things.
- `etna.toml` exists at project root.
- `src/bin/etna.rs` compiles.
- `BUGS.md` and `TASKS.md` exist (regenerated by the document stage).
- `patches/<variant>.patch` (in the workdir); there are no per-variant git branches.

Run group A first. Only proceed to group B if A passes: a broken manifest will prevent B from even compiling. A failure in B never hides a failure in A.
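The ordering rule above can be sketched as a small gate. This is a minimal sketch, not the stage's actual driver: `run_group_a`, `run_group_b`, and `validate` are hypothetical names, and the body of `run_group_b` is a placeholder for the execution-level checks.

```shell
# Hypothetical wrapper around the delegated check; must exit 0.
run_group_a() {
  etna workload check "$1"
}

# Placeholder for the execution-level checks (not a real implementation).
run_group_b() {
  echo "group B would run here for $1"
}

# Run A, and start B only when A passed.
validate() {
  local project="$1"
  if ! run_group_a "$project"; then
    echo "[FAIL] A. etna workload check"
    return 1   # never start B on a broken manifest
  fi
  echo "[PASS] A. etna workload check"
  run_group_b "$project"
}
```

Calling `validate <project>` returns non-zero as soon as group A fails, so a group B failure can never be reported in place of a group A failure.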
etna workload check

Run `etna workload check <project>`. It must exit 0. The CLI groups findings by logical check name (`variant_names`, `variant_set_matches_marauders`, `witnesses_and_properties`, `patch_files_exist`, `commits_resolve`, `docs_idempotent`, `variant_branches_descend`, etc.); emit one `check_passed` / `check_failed` progress event summarising the group.
What this covers (do not duplicate in group B):
- Variant names match the expected pattern (`^[a-z][a-z0-9_]*_[0-9a-f]{7,40}_[0-9]+$`).
- Every variant reported by `marauders list --path <project>` appears in some `[[tasks]].mutations`, and vice-versa.
- `witnesses[].test_fn` exists as `fn <name>` under `<project>/src/` or `<project>/tests/`.
- `[[tasks.tasks]].property` (PascalCase) maps to a `pub fn property_<snake>` in source.
- `[tasks.injection].patch` file exists and `git apply --check` succeeds against `base_commit`.
- `source.commits[*]` and `base_commit` resolve (`git cat-file -e <sha>`).
- `BUGS.md` and `TASKS.md` regenerate with zero diff against what's checked in.
- (… `git apply --check` against base, marauders variants via `marauders list`.)

If `etna workload check` fails, the appropriate earlier stage must re-run: atomize for manifest-shape / witness-fn / property-fn issues, document for doc drift (usually just regenerate), runner for `src/bin/etna.rs` issues. Do not re-implement these checks in group B.
Run these only after group A passes.
cargo -C <project> build --release --bin etna
cargo -C <project> test
cargo -C <project> run --release --bin etna -- etna All
All three exit 0.
etna workload check cannot make this assertion because it doesn't run tests. Per-variant detection has to verify the property in both debug and release builds, because etna experiment run builds release while cargo test defaults to debug — and that gap is exactly where fake mutations slip through (e.g. integer-overflow bugs that panic in debug but silently wrap in release).
For each variant:
1. `cargo test witness_<name>_case_`: all witness tests pass.
2. For each Property mapped from this workload's variants, run `cargo run --release --bin etna -- etna <Property>` and confirm the JSON line has `"status":"passed"` or `"status":"discarded"`, never `"failed"`.
3. Marauders variants: `marauders convert --path <file> --to functional`, then `M_<variant>=active cargo test witness_<name>_case_`. Then `cargo build --release --bin etna` with the variant active and run `./target/release/etna etna <Property>` for the property covering this variant.
4. Patch variants: `git apply patches/<variant>.patch` against the working tree, then `cargo test witness_<name>_case_`, then `cargo build --release --bin etna && ./target/release/etna etna <Property>`. Revert via `git apply -R patches/<variant>.patch`.
5. The `etna <Property>` invocation returns `"status":"failed"` under the mutation.

Both checks matter:

- Integer-overflow bugs (e.g. `chrono` / `from_num_days_from_ce_overflow`) only panic in debug because debug enables `overflow-checks` by default. In release the silent wrap can leave the property indistinguishable from the fix, the experiment finds nothing, and the variant is fake.
- Either `Cargo.toml` needs `[profile.release] overflow-checks = true` (see chrono / bytes / hashbrown / rstar for the pattern), or the property needs tightening so it observes a non-panic symptom (out-of-domain inputs should Discard, not Pass, so the bug-trigger inputs aren't free-passed).

If either step's expectation isn't met, the variant is not detected. That's a validation failure: atomize must be redone for this variant or the property/profile fixed; there is no way to mask it.
After marauders testing, run `marauders convert --path <file> --to comment` to restore the source.
(Placeholder-text symptoms: a frozen witness input of 0 / empty string can yield a false PASS under mutation because the witness never touches the mutated path. If a variant "passes" under mutation but obviously shouldn't, check the witness inputs for trivial defaults first.)
(Shared-property symptoms: when the same Property is reused across multiple mutations — i.e. [[tasks.tasks]].property repeats across [[tasks]] blocks in etna.toml — each mutation has its own witness input. The runner's check_<property>() in src/bin/etna.rs (or etna_runner/src/bin/etna.rs) must invoke the property once per witness, returning Err on the first violation. A check that hard-codes one witness only covers one mutation; the others appear to "pass" under their patched code in step 5 even though the bug is real. When validating each variant, confirm check_<property>() enumerates every distinct [[tasks.tasks]].witnesses[] for that property. Concrete miss observed in toml_edit ParseDoesNotPanic × 4 mutations.)
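The patch-variant path through the steps above can be sketched as a single shell function. This is a minimal sketch, not the stage's actual implementation: `json_status` and `detect_patch_variant` are hypothetical names, it assumes the runner's JSON line is the last line on stdout, and it assumes the witness test is expected to fail while the mutation is applied.

```shell
# Hypothetical helper: pull the "status" value out of one JSON line.
json_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sketch of detection for one patch variant. Arguments are illustrative:
#   $1 = variant name, $2 = witness name, $3 = property name
detect_patch_variant() {
  local variant="$1" witness="$2" property="$3" line status rc=0
  git apply "patches/${variant}.patch" || return 2

  # Debug build: the witness is assumed to fail under the mutation.
  if cargo test "witness_${witness}_case_"; then
    echo "witness still passes under ${variant}"
    rc=1
  fi

  # Release build: the property must report "failed" under the mutation.
  cargo build --release --bin etna || rc=2
  line=$(./target/release/etna etna "$property" | tail -n 1)
  status=$(json_status "$line")
  if [ "$status" != "failed" ]; then
    echo "release status is '${status}', expected 'failed'"
    rc=1
  fi

  git apply -R "patches/${variant}.patch"   # always revert the mutation
  return "$rc"
}
```

Checking the debug and release results separately keeps the two failure modes distinguishable: a witness that never touches the mutated path shows up in the first check, and a silent release-mode wrap shows up in the second.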
cargo run --release --bin etna -- etna All
cargo run --release --bin etna -- proptest All
cargo run --release --bin etna -- quickcheck All
cargo run --release --bin etna -- crabcheck All
cargo run --release --bin etna -- hegel All
All exit 0 on base HEAD — including FAIL paths. Every invocation must print exactly one JSON line to stdout containing status, tests, time, and either counterexample or error. A PASS:/FAIL: text line instead of JSON means the runner hasn't been updated to the etna log_process_output contract (etna2/src/driver.rs:1400) — etna will skip it during line-by-line JSON parsing and only persist the process-exit-status abort record.
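The output contract above can be checked mechanically. The following is a rough sketch under stated assumptions: `check_json_line` is a hypothetical helper, and it only tests for the presence of the key names rather than parsing the JSON.

```shell
# Hypothetical helper: succeed only if the captured stdout ($1) is exactly
# one non-empty line carrying status, tests, time, and either
# counterexample or error.
check_json_line() {
  local out="$1" n key
  n=$(printf '%s\n' "$out" | grep -c . || true)   # count non-empty lines
  if [ "$n" -ne 1 ]; then
    echo "expected exactly 1 line, got $n"
    return 1
  fi
  for key in status tests time; do
    if ! printf '%s' "$out" | grep -q "\"$key\""; then
      echo "missing key: $key"
      return 1
    fi
  done
  if ! printf '%s' "$out" | grep -Eq '"(counterexample|error)"'; then
    echo "missing counterexample/error"
    return 1
  fi
}

# Example use:
#   out=$(cargo run --release --bin etna -- proptest All)
#   check_json_line "$out" || echo "runner does not follow the JSON contract"
```

A legacy `PASS:` / `FAIL:` text line fails the key checks, which is the symptom described above.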
Smoke test at least one variant per framework by activating its mutation (marauders: marauders set --variant <name>; patch: git apply patches/<variant>.patch) and confirming the JSON line has "status":"failed" with a counterexample. After the smoke test, revert (marauders unset --variant <name> or git apply -R patches/<variant>.patch). tests should be plausibly > 1 — a hard-coded tests=1 across every framework is a strong signal of a stub delegating to the witness path.
Run etna experiment run --name <exp> --tests <stem> end-to-end and inspect the resulting store.jsonl: every record for ordered-float-style runs should have status ∈ {passed, failed}, not aborted. aborted in the store means the adapter exited non-zero or crashed — re-read the abort error field and fix the adapter, do not paper over it by re-running.
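The store inspection above can be reduced to a one-line filter. This is only a sketch: `aborted_records` is a hypothetical helper, the store path is passed in because its location is not specified here, and it assumes each record is a single JSON line with a `"status"` field.

```shell
# Hypothetical helper: print every record in a store.jsonl ($1) whose
# status is "aborted". Exits 0 if any were found, 1 if the store is clean.
aborted_records() {
  grep -E '"status"[[:space:]]*:[[:space:]]*"aborted"' "$1"
}

# Example use (the store path is illustrative):
#   if aborted_records path/to/store.jsonl; then
#     echo "aborted records found: fix the adapter, do not re-run" >&2
#   fi
```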
Cross-check for framework-gotcha symptoms (see runner skill's "Framework gotchas" table). The following patterns in `store.jsonl` mean a runner-level fix is needed, not a re-run:

- `"tool":"quickcheck"` with `"error":"timed out"` at ~60s elapsed → missing `.max_time(...)` on the `QuickCheck` chain.
- `"tool":"hegel"` with `"error":"…Health check failure: data_too_large…"` → missing `.suppress_health_check(HealthCheck::all())`.
- `counterexample` containing `"assertion failed: …"` or `"Property test failed: …"` → missing `catch_unwind` wrapper inside the proptest / hegel property closure.
- `"tool":"crabcheck"` with `"tests":20000` exactly on long runs → crabcheck capped at its default; switch to `quickcheck_with_config`.

A framework listed in the public usage help must actually drive its own crate. Extract each `run_<tool>_property` function body from `src/bin/etna.rs` and verify the body references the expected crate identifier:
| Adapter | Required identifier in body |
|---|---|
| `run_proptest_property` | `proptest::` |
| `run_quickcheck_property` | `QuickCheck` (type) or `quickcheck::` |
| `run_crabcheck_property` | `crabcheck::` or `cc::quickcheck` |
| `run_hegel_property` | `hegel::` or `Hegel::` |
A body that only calls run_etna_property or check_<name>() is a stub. Silent stubs compile, pass the TODO/FIXME grep, and produce always-tests=1 numbers that silently misrepresent cross-framework comparison. Reject them. etna workload check does not read src/bin/etna.rs, so this check is uniquely etna-ify's job.
A reasonable implementation:
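# Each adapter body must reference its framework crate (see the table above).
for pair in 'proptest=proptest::' 'quickcheck=QuickCheck|quickcheck::' \
            'crabcheck=crabcheck::|cc::quickcheck' 'hegel=hegel::|Hegel::'; do
  tool=${pair%%=*}; pattern=${pair#*=}
  body=$(awk "/fn run_${tool}_property/,/^}/" src/bin/etna.rs)
  printf '%s\n' "$body" | grep -Eq "$pattern" \
    || { echo "stub: run_${tool}_property does not reference its framework crate" >&2; exit 1; }
done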
Emit to <project>/progress.jsonl per the contract in prompts/run.md. Validate fires one event per check plus a terminal event — the driver uses all_checks_passed as the "this run really did succeed" signal:
| When | Event line |
|---|---|
| Starting validate stage | {"stage":"validate","event":"start"} |
| Check k passed | {"stage":"validate","event":"check_passed","check":"<A. or Bn. short name>"} |
| Check k failed | {"stage":"validate","event":"check_failed","check":"<A. or Bn. short name>","error":"<msg>"} |
| All checks passed (workload is valid) | {"stage":"validate","event":"all_checks_passed"} |
| Validate stage concluded with failures | {"stage":"validate","event":"done","failed_checks":N} |
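The events in the table above could be emitted with a few small helpers. This is a minimal sketch, not the contract's reference implementation: `emit`, `check_passed`, `check_failed`, and `finish` are hypothetical names, the `PROGRESS_FILE` default is illustrative, and it assumes exactly one terminal event is written (either `all_checks_passed` or `done`).

```shell
# Hypothetical helpers that append validate-stage events to progress.jsonl.
PROGRESS_FILE="${PROGRESS_FILE:-progress.jsonl}"

emit() { printf '%s\n' "$1" >> "$PROGRESS_FILE"; }

stage_start() { emit '{"stage":"validate","event":"start"}'; }

check_passed() {
  emit "{\"stage\":\"validate\",\"event\":\"check_passed\",\"check\":\"$1\"}"
}

check_failed() {
  # Escape backslashes and double quotes so a single-line message stays
  # valid JSON (multi-line messages are not handled in this sketch).
  local msg
  msg=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  emit "{\"stage\":\"validate\",\"event\":\"check_failed\",\"check\":\"$1\",\"error\":\"$msg\"}"
}

# $1 = number of failed checks
finish() {
  if [ "$1" -eq 0 ]; then
    emit '{"stage":"validate","event":"all_checks_passed"}'
  else
    emit "{\"stage\":\"validate\",\"event\":\"done\",\"failed_checks\":$1}"
  fi
}
```

Routing the terminal event through one function keeps `all_checks_passed` from being written when any check has failed.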
The check field values are:

- `"A. etna workload check"` (single line covering the whole delegated group)
- `"B1. base builds and tests"`
- `"B2. per-variant detection"`
- `"B3. etna binary end-to-end"`
- `"B4. adapter-body reality check"`

`all_checks_passed` is the signal the overnight driver uses to override `rc=124` into ok. Do not emit it unless A and every B check passed.
Print a one-line summary per check:
[PASS] A. etna workload check
[PASS] B1. base builds and tests
[PASS] B2. per-variant detection (N variants)
[FAIL] B3. etna binary end-to-end — hegel emits non-JSON line
[PASS] B4. adapter-body reality check
And an overall [PASS] or [FAIL] footer. No JSON output required.
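One way to produce the summary and footer is sketched below. `print_summary` and its `STATUS|name` argument format are hypothetical conventions chosen for the example, not something the stage prescribes.

```shell
# Hypothetical summary printer. Each argument is "<PASS|FAIL>|<check text>".
# Prints one line per check, then an overall footer; returns the number of
# failed checks (0 means success).
print_summary() {
  local entry status failed=0
  for entry in "$@"; do
    status=${entry%%|*}
    printf '[%s] %s\n' "$status" "${entry#*|}"
    if [ "$status" != "PASS" ]; then
      failed=$((failed + 1))
    fi
  done
  if [ "$failed" -eq 0 ]; then echo "[PASS]"; else echo "[FAIL]"; fi
  return "$failed"
}

# Example use:
#   print_summary 'PASS|A. etna workload check' \
#                 'FAIL|B3. etna binary end-to-end (hegel emits non-JSON line)'
```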