validate

Name: Validate
Author: alpaylan

// End-to-end sanity check — etna workload check passes, base builds, every variant is detected, every framework drives its crate

updated:May 6, 2026 at 02:34

"repository": "alpaylan/etna-ify"

name	validate
description	End-to-end sanity check — etna workload check passes, base builds, every variant is detected, every framework drives its crate

Stage: Validate

Objective

Confirm the workload is coherent and runnable. Manifest/doc consistency is delegated to etna workload check; this stage's unique responsibility is the execution-level checks that require actually building and running things.

Preconditions

etna.toml exists at project root.
src/bin/etna.rs compiles.
BUGS.md and TASKS.md exist (regenerated by the document stage).
Project HEAD is the base commit. Per-variant state lives in marauders markers (in source) or patches/<variant>.patch (in the workdir) — there are no per-variant git branches.

Ordering

Run group A first. Only proceed to group B if A passes — a broken manifest will prevent B from even compiling. A failure in B never hides a failure in A.

Group A — delegated to `etna workload check`

Run etna workload check <project>. Must exit 0. The CLI groups findings by logical check name (variant_names, variant_set_matches_marauders, witnesses_and_properties, patch_files_exist, commits_resolve, docs_idempotent, variant_branches_descend, etc.); emit one check_passed / check_failed progress event summarising the group.

What this covers (do not duplicate in group B):

Variant-name regex (^[a-z][a-z0-9_]*_[0-9a-f]{7,40}_[0-9]+$).
Every variant in marauders list --path <project> appears in some [[tasks]].mutations, and vice-versa.
Every witnesses[].test_fn exists as fn <name> under <project>/src/ or <project>/tests/.
Every [[tasks.tasks]].property (PascalCase) maps to a pub fn property_<snake> in source.
Every [tasks.injection].patch file exists and git apply --check succeeds against base_commit.
source.commits[*] and base_commit resolve (git cat-file -e <sha>).
BUGS.md and TASKS.md regenerate with zero diff against what's checked in.
(No per-variant branches to ancestry-check; patch-kind variants are validated via git apply --check against base, marauders variants via marauders list.)
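The variant-name rule in the list above can be spot-checked locally before invoking the CLI. This is a minimal sketch, assuming POSIX sh and grep -E; the helper name is_valid_variant_name and the sample names are hypothetical and are not part of etna.

```shell
# Hypothetical helper: returns 0 when the name matches the variant-name
# regex quoted in the list above, non-zero otherwise.
is_valid_variant_name() {
    printf '%s\n' "$1" |
        grep -Eq '^[a-z][a-z0-9_]*_[0-9a-f]{7,40}_[0-9]+$'
}

# Sample names are made up for illustration: the second starts with an
# uppercase letter and the third has a hash segment shorter than 7 digits.
for name in parse_fix_abc1234_1 Parse_fix_abc1234_1 parse_fix_abc12_1; do
    if is_valid_variant_name "$name"; then
        echo "ok      $name"
    else
        echo "invalid $name"
    fi
done
```

This only mirrors one of the delegated checks for quick feedback; etna workload check remains the authority for group A.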

If etna workload check fails, the appropriate earlier stage must re-run: atomize for manifest-shape / witness-fn / property-fn issues, document for doc drift (usually just regenerate), runner for src/bin/etna.rs issues. Do not re-implement these checks in group B.

Group B — execution-level checks

Run these only after group A passes.

B1. Base builds and tests

cargo -C <project> build --release --bin etna
cargo -C <project> test
cargo -C <project> run --release --bin etna -- etna All

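All three must exit 0. Note that cargo's -C <path> flag is accepted only by nightly cargo together with -Z unstable-options; on a stable toolchain, cd into <project> or pass --manifest-path <project>/Cargo.toml instead.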
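The three B1 commands can be wrapped so that the first failure short-circuits the rest. This is a sketch under assumptions: the function name check_base_builds and the CARGO override variable are hypothetical, and the wrapper changes directory with cd rather than cargo -C so that it also runs on a stable toolchain.

```shell
# Hypothetical wrapper for check B1. CARGO is an assumed override hook
# (handy for dry runs); it defaults to the real cargo binary.
check_base_builds() {
    project=$1
    cargo_bin=${CARGO:-cargo}
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$project" || exit 1
        "$cargo_bin" build --release --bin etna &&
        "$cargo_bin" test &&
        "$cargo_bin" run --release --bin etna -- etna All
    )
}

# Usage:
#   check_base_builds path/to/workload && echo "[PASS] B1. base builds and tests"
```

The function's exit status is that of the first failing command, so it can feed a check_passed or check_failed decision directly.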

B2. Per-variant detection (the core assertion)

etna workload check cannot make this assertion because it doesn't run tests. Per-variant detection has to verify the property in both debug and release builds, because etna experiment run builds release while cargo test defaults to debug — and that gap is exactly where fake mutations slip through (e.g. integer-overflow bugs that panic in debug but silently wrap in release).

For each variant:

1. Base HEAD, debug: run cargo test witness_<name>_case_. All witness tests pass.
2. Base HEAD, release (witness symmetry under the experiment's actual build profile): for every Property mapped from this workload's variants, run cargo run --release --bin etna -- etna <Property> and confirm the JSON line has "status":"passed" or "status":"discarded", never "failed".
3. Activate mutation:
   - Marauders: marauders convert --path <file> --to functional, then M_<variant>=active cargo test witness_<name>_case_. Then cargo build --release --bin etna with the variant active and run ./target/release/etna etna <Property> for the property covering this variant.
   - Patch: git apply patches/<variant>.patch against the working tree, then cargo test witness_<name>_case_, then cargo build --release --bin etna && ./target/release/etna etna <Property>. Revert via git apply -R patches/<variant>.patch.
4. Expected, debug: at least one witness test fails under the mutation.
5. Expected, release: the etna <Property> invocation returns "status":"failed" under the mutation.
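For a patch-kind variant, the activate-and-expect steps above could be scripted roughly as follows. This is a sketch, not the skill's own implementation: json_status and check_patch_variant are hypothetical names, the status extraction assumes the runner prints compact JSON ("status":"<value>" with no spaces), and a cargo test failure caused by a compile error is not distinguished from a failing witness.

```shell
# Extract the value of "status" from one compact JSON line on stdin.
json_status() {
    sed -n 's/.*"status":"\([a-z_]*\)".*/\1/p' | head -n 1
}

# Hypothetical per-variant check for a patch-kind variant.
# $1 = variant name, $2 = witness test filter, $3 = Property name.
# Run from the project root on base HEAD.
check_patch_variant() {
    variant=$1; witness=$2; property=$3
    git apply "patches/$variant.patch" || return 1
    rc=0
    # Debug: at least one witness test must fail under the mutation.
    if cargo test "$witness"; then
        echo "not detected in debug: $variant" >&2
        rc=1
    fi
    # Release: the runner must report "failed" under the mutation.
    status=
    if cargo build --release --bin etna; then
        status=$(./target/release/etna etna "$property" | json_status)
    fi
    if [ "$status" != failed ]; then
        echo "not detected in release: $variant (status=$status)" >&2
        rc=1
    fi
    # Always restore the base tree before returning.
    git apply -R "patches/$variant.patch" || rc=1
    return "$rc"
}

# Usage (placeholders as in the steps above):
#   check_patch_variant <variant> witness_<name>_case_ <Property>
```

Reverting inside the function, even after a failed expectation, keeps the working tree at base for the next variant.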

Both checks matter:

Step 4 alone catches mutations whose bug is observable in debug. But overflow-related bugs (e.g. chrono / from_num_days_from_ce_overflow) only panic in debug because debug enables overflow-checks by default. In release the silent wrap can leave the property indistinguishable from the fix, the experiment finds nothing, and the variant is fake.
Step 5 catches that release-mode divergence. When step 4 surfaces a failure but step 5 doesn't, the workload's Cargo.toml needs [profile.release] overflow-checks = true (see chrono / bytes / hashbrown / rstar for the pattern), or the property needs tightening so it observes a non-panic symptom (out-of-domain inputs should Discard, not Pass, so the bug-trigger inputs aren't free-passed).
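When the debug check fails but the release check does not, a quick first look is whether the workload's Cargo.toml already enables overflow checks for release builds. The helper below is a line-based approximation rather than a TOML parser; its name is hypothetical, and it does not recognise dotted-key spellings such as profile.release.overflow-checks.

```shell
# Hypothetical helper: succeeds when the given Cargo.toml sets
# overflow-checks = true inside its [profile.release] table.
release_overflow_checks_enabled() {
    awk '
        # Any table header switches the "inside [profile.release]" flag.
        /^[ \t]*\[/ { in_release = ($0 ~ /^[ \t]*\[profile\.release\]/) }
        in_release && /^[ \t]*overflow-checks[ \t]*=[ \t]*true/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Usage:
#   release_overflow_checks_enabled Cargo.toml ||
#       echo "release profile does not enable overflow-checks" >&2
```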

If either step's expectation isn't met, the variant is not detected. That is a validation failure: either redo atomize for this variant or fix the property or profile. Do not mask it.

After marauders testing, run marauders convert --path <file> --to comment to restore the source.

(Placeholder-text symptoms: a frozen witness input of 0 / empty string can yield a false PASS under mutation because the witness never touches the mutated path. If a variant "passes" under mutation but obviously shouldn't, check the witness inputs for trivial defaults first.)

(Shared-property symptoms: when the same Property is reused across multiple mutations — i.e. [[tasks.tasks]].property repeats across [[tasks]] blocks in etna.toml — each mutation has its own witness input. The runner's check_<property>() in src/bin/etna.rs (or etna_runner/src/bin/etna.rs) must invoke the property once per witness, returning Err on the first violation. A check that hard-codes one witness only covers one mutation; the others appear to "pass" under their patched code in step 5 even though the bug is real. When validating each variant, confirm check_<property>() enumerates every distinct [[tasks.tasks]].witnesses[] for that property. Concrete miss observed in toml_edit ParseDoesNotPanic × 4 mutations.)

B3. etna binary end-to-end

cargo run --release --bin etna -- etna All
cargo run --release --bin etna -- proptest All
cargo run --release --bin etna -- quickcheck All
cargo run --release --bin etna -- crabcheck All
cargo run --release --bin etna -- hegel All

All exit 0 on base HEAD, and the exit code must stay 0 even when a property FAILs: the outcome is reported in the JSON, not in the exit status. Every invocation must print exactly one JSON line to stdout containing status, tests, time, and either counterexample or error. A PASS:/FAIL: text line instead of JSON means the runner hasn't been updated to the etna log_process_output contract (etna2/src/driver.rs:1400); etna will skip it during line-by-line JSON parsing and only persist the process-exit-status abort record.
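The one-JSON-line contract can be checked mechanically on each invocation's stdout. The sketch below uses substring matching and assumes compact JSON keys ("key": with no space before the colon); validate_runner_output is a hypothetical name, and a real JSON parser would be stricter.

```shell
# Hypothetical validator for the runner's stdout (read from stdin).
# Succeeds only if there is exactly one non-empty line and it carries
# status, tests, time, and either counterexample or error.
validate_runner_output() {
    out=$(cat)
    lines=$(printf '%s\n' "$out" | grep -c . || true)
    [ "$lines" -eq 1 ] || { echo "expected 1 line, got $lines" >&2; return 1; }
    for key in status tests time; do
        printf '%s\n' "$out" | grep -q "\"$key\":" ||
            { echo "missing key: $key" >&2; return 1; }
    done
    printf '%s\n' "$out" | grep -Eq '"(counterexample|error)":' ||
        { echo "missing counterexample/error" >&2; return 1; }
}

# Usage:
#   cargo run --release --bin etna -- proptest All | validate_runner_output
```

A legacy PASS:/FAIL: text line is rejected at the first missing key, which matches the failure mode described above.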

Smoke test at least one variant per framework by activating its mutation (marauders: marauders set --variant <name>; patch: git apply patches/<variant>.patch) and confirming the JSON line has "status":"failed" with a counterexample. After the smoke test, revert (marauders unset --variant <name> or git apply -R patches/<variant>.patch). The tests field should plausibly be greater than 1; a hard-coded tests=1 across every framework is a strong signal of a stub delegating to the witness path.

Run etna experiment run --name <exp> --tests <stem> end-to-end and inspect the resulting store.jsonl: every record for ordered-float-style runs should have status ∈ {passed, failed}, not aborted. An aborted record in the store means the adapter exited non-zero or crashed. Re-read the abort record's error field and fix the adapter; do not paper over it by re-running.
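Counting aborted records is a quick way to make that inspection repeatable. A minimal sketch, assuming one compact JSON record per line; count_aborted is a hypothetical helper and the path to store.jsonl is whatever the experiment produced.

```shell
# Hypothetical helper: print how many records in a store.jsonl file have
# status "aborted". grep -c exits non-zero on a count of 0, so the
# trailing "|| true" keeps the helper usable under "set -e".
count_aborted() {
    grep -c '"status":"aborted"' "$1" || true
}

# Usage:
#   [ "$(count_aborted path/to/store.jsonl)" -eq 0 ] ||
#       echo "aborted records present: fix the adapter, do not re-run" >&2
```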

Cross-check for framework-gotcha symptoms (see runner skill's "Framework gotchas" table). The following patterns in store.jsonl mean a runner-level fix is needed, not a re-run:

"tool":"quickcheck" with "error":"timed out" at ~60s elapsed → missing .max_time(...) on the QuickCheck chain.
"tool":"hegel" with "error":"…Health check failure: data_too_large…" → missing .suppress_health_check(HealthCheck::all()).
counterexample containing "assertion failed: …" or "Property test failed: …" → missing catch_unwind wrapper inside the proptest / hegel property closure.
"tool":"crabcheck" with "tests":20000 exactly on long runs → crabcheck capped at its default; switch to quickcheck_with_config.

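The symptom list above lends itself to a small scanner over store.jsonl. This is a sketch only: scan_gotchas is a hypothetical name, the patterns assume compact JSON with one record per line, and the catch_unwind check merely requires the phrase and a counterexample key on the same line without confirming which field holds the text. The ~60s elapsed condition for quickcheck is not checked.

```shell
# Hypothetical scanner for runner-level symptoms in a store.jsonl file.
# Prints one hint per symptom found; always returns 0.
scan_gotchas() {
    store=$1
    grep '"tool":"quickcheck"' "$store" | grep -q '"error":"timed out' &&
        echo 'quickcheck: add .max_time(...) to the QuickCheck chain'
    grep '"tool":"hegel"' "$store" |
        grep -q 'Health check failure: data_too_large' &&
        echo 'hegel: add .suppress_health_check(HealthCheck::all())'
    grep -E 'assertion failed: |Property test failed: ' "$store" |
        grep -q '"counterexample":' &&
        echo 'proptest/hegel: wrap the property closure in catch_unwind'
    grep '"tool":"crabcheck"' "$store" | grep -q '"tests":20000[,}]' &&
        echo 'crabcheck: switch to quickcheck_with_config'
    return 0
}

# Usage:
#   scan_gotchas path/to/store.jsonl
```

Any hint printed here points at a runner-level fix, in line with the guidance that a re-run will not help.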
B4. Adapter-body reality check

A framework listed in the public usage help must actually drive its own crate. Extract each run_<tool>_property function body from src/bin/etna.rs and verify the body references the expected crate identifier:

| Adapter | Required identifier in body |
| --- | --- |
| `run_proptest_property` | `proptest::` |
| `run_quickcheck_property` | `QuickCheck` (type) or `quickcheck::` |
| `run_crabcheck_property` | `crabcheck::` or `cc::quickcheck` |
| `run_hegel_property` | `hegel::` or `Hegel::` |

A body that only calls run_etna_property or check_<name>() is a stub. Silent stubs compile, pass the TODO/FIXME grep, and produce always-tests=1 numbers that silently misrepresent cross-framework comparison. Reject them. etna workload check does not read src/bin/etna.rs, so this check is uniquely etna-ify's job.

A reasonable implementation:

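# Sketch of the check; run from the project root. Patterns mirror the table above.
for tool in proptest quickcheck crabcheck hegel; do
    case "$tool" in
        proptest)   want='proptest::' ;;
        quickcheck) want='QuickCheck|quickcheck::' ;;
        crabcheck)  want='crabcheck::|cc::quickcheck' ;;
        hegel)      want='hegel::|Hegel::' ;;
    esac
    awk "/fn run_${tool}_property/,/^}/" src/bin/etna.rs | grep -Eq "$want" || { echo "stub: run_${tool}_property does not reference its framework crate" >&2; exit 1; }
done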

Progress events

Emit to <project>/progress.jsonl per the contract in prompts/run.md. Validate fires one event per check plus a terminal event — the driver uses all_checks_passed as the "this run really did succeed" signal:

| When | Event line |
| --- | --- |
| Starting validate stage | `{"stage":"validate","event":"start"}` |
| Check `k` passed | `{"stage":"validate","event":"check_passed","check":"<A. or Bn. short name>"}` |
| Check `k` failed | `{"stage":"validate","event":"check_failed","check":"<A. or Bn. short name>","error":"<msg>"}` |
| All checks passed (workload is valid) | `{"stage":"validate","event":"all_checks_passed"}` |
| Validate stage concluded with failures | `{"stage":"validate","event":"done","failed_checks":N}` |

The check field values are:

"A. etna workload check" (single line covering the whole delegated group)
"B1. base builds and tests"
"B2. per-variant detection"
"B3. etna binary end-to-end"
"B4. adapter-body reality check"

all_checks_passed is the signal the overnight driver uses to override rc=124 into ok. Do not emit it unless A and every B check passed.
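Appending these event lines by hand is error-prone once error messages contain quotes, so a small emitter helps. The sketch below is illustrative: json_escape, emit_event, and emit_done are hypothetical names, and the escaping covers backslashes and double quotes only, so messages containing newlines or control characters would need a real JSON encoder.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical emitter for validate-stage progress events.
# $1 = progress file, $2 = event name,
# $3 = check label (optional), $4 = error message (optional).
emit_event() {
    file=$1; event=$2; check=${3:-}; error=${4:-}
    line="{\"stage\":\"validate\",\"event\":\"$event\""
    [ -n "$check" ] && line="$line,\"check\":\"$(json_escape "$check")\""
    [ -n "$error" ] && line="$line,\"error\":\"$(json_escape "$error")\""
    printf '%s}\n' "$line" >> "$file"
}

# Terminal event for a run that ended with failures.
# $1 = progress file, $2 = number of failed checks.
emit_done() {
    printf '{"stage":"validate","event":"done","failed_checks":%d}\n' "$2" >> "$1"
}

# Usage:
#   emit_event "$project/progress.jsonl" start
#   emit_event "$project/progress.jsonl" check_passed "A. etna workload check"
#   emit_event "$project/progress.jsonl" all_checks_passed
```

Calling emit_event with all_checks_passed should stay behind a guard that confirms A and every B check passed, since the driver treats that event as the success signal.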

Output

Print a one-line summary per check:

[PASS] A. etna workload check
[PASS] B1. base builds and tests
[PASS] B2. per-variant detection (N variants)
[FAIL] B3. etna binary end-to-end — hegel emits non-JSON line
[PASS] B4. adapter-body reality check

And an overall [PASS] or [FAIL] footer. No JSON output required.
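One way to produce this summary is to collect a result per check and format everything at the end. A minimal sketch, assuming results arrive on stdin as tab-separated "PASS" or "FAIL" plus label; the function name print_summary and that input format are hypothetical.

```shell
# Hypothetical summary printer. Reads "<PASS|FAIL><TAB><label>" lines on
# stdin, prints one "[PASS] label" or "[FAIL] label" line for each, then
# an overall footer. Returns non-zero if any check failed.
print_summary() {
    awk -F '\t' '
        { printf "[%s] %s\n", $1, $2; if ($1 != "PASS") failed = 1 }
        END { print (failed ? "[FAIL]" : "[PASS]"); exit failed ? 1 : 0 }
    '
}

# Usage:
#   printf 'PASS\tA. etna workload check\nPASS\tB1. base builds and tests\n' |
#       print_summary
```

Returning a non-zero status on any failure lets the caller decide between the all_checks_passed and done events without re-parsing the printed text.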
