adjustment-proof
Use when proving, rejecting, or deleting backend adjustment registry entries with real LLM evidence and independent disabled-adjustment worktrees.
| name | adjustment-proof |
| description | Use when proving, rejecting, or deleting backend adjustment registry entries with real LLM evidence and independent disabled-adjustment worktrees. |
Use this skill before accepting any claim that a backend adjustment is proven necessary or safe to delete. A subagent report, PR prose, or one successful organic run is not proof. Proof requires an isolated red/green comparator for each adjustment.
This skill extends:
- `.claude/skills/zfc-adjuster/SKILL.md`
- `.claude/skills/evidence-standards.md`
- `.claude/skills/zfc-leveling-roadmap/SKILL.md` for level-up/rewards changes

Allowed verdicts:

- `proven`: Current-head green evidence passes, and a separate worktree with exactly one adjustment disabled fails for the matching behavior.
- `delete_candidate`: Current-head green evidence passes, and a separate worktree with exactly one adjustment disabled also passes. This is not enough to delete by itself; it means the adjustment lacks positive necessity proof.
- `insufficient`: Required artifacts are missing, stale, non-real, not isolated, or do not demonstrate the claimed behavior.

Do not use `delete` from a single negative run. Deletion requires a separate design decision after reviewing broader coverage, risk, and whether the code is now redundant.
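The verdict rules above reduce to a small decision table. The following is a minimal sketch only: the function name, its four boolean inputs, and the choice to return `insufficient` when the green run fails are illustrative assumptions, not the logic of `scripts/adjustment_proof_matrix.py`.

```python
def classify_verdict(
    artifacts_ok: bool,
    green_passed: bool,
    red_passed: bool,
    red_failure_matches: bool = False,
) -> str:
    """Map one green/red comparison onto the allowed verdicts (hypothetical helper)."""
    # Missing, stale, non-real, or non-isolated evidence is never upgraded.
    if not artifacts_ok:
        return "insufficient"
    # Assumption: without a passing current-head run there is no baseline.
    if not green_passed:
        return "insufficient"
    # Both runs pass: no positive necessity proof, but not a deletion decision.
    if red_passed:
        return "delete_candidate"
    # Red failed: it only counts if the failure is the behavior the adjustment guards.
    return "proven" if red_failure_matches else "insufficient"
```

Note that no branch returns `delete`; deletion stays a separate design decision, as stated above.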
For each adjustment ID:
- `scripts/adjustment_proof_matrix.py validate`.

One disabled worktree per adjustment is mandatory. A bundle named for one adjustment cannot prove or delete another adjustment.
Each green and red evidence directory must contain:
- `run.json`
- `metadata.json`
- `http_request_responses.jsonl`
- `llm_request_responses.jsonl`

For LLM-interacting level-up/rewards paths, unit tests are supporting evidence only. They cannot establish `proven` or `delete_candidate`.
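A quick pre-check of an evidence directory can catch missing artifacts before running the validator. This is a minimal sketch that tests only for file presence; `missing_artifacts` and `REQUIRED_ARTIFACTS` are hypothetical names, and the real validator may apply further checks, such as staleness or real-LLM provenance.

```python
from pathlib import Path

# File names taken from the required-artifact list above.
REQUIRED_ARTIFACTS = (
    "run.json",
    "metadata.json",
    "http_request_responses.jsonl",
    "llm_request_responses.jsonl",
)


def missing_artifacts(evidence_dir):
    """Return the required file names that are absent from evidence_dir."""
    root = Path(evidence_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]
```

Run it once for the green directory and once for the red directory; any non-empty result means the proof can only be `insufficient`.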
Use the script to list registered adjustments:
./scripts/adjustment_proof_matrix.py list
Create a deterministic proof plan for one adjustment:
./scripts/adjustment_proof_matrix.py plan \
  --adjustment-id level_up_atomicity.suppress_unpaired_rewards_box \
  --test-command 'MCP_TEST_MODE=real MOCK_SERVICES_MODE=false ./venv/bin/python testing_mcp/core/test_level_up_organic.py --level-up-scenario single-organic'
Follow the generated commands to create independent worktrees. In the red worktree, disable only the named adjustment. Prefer the narrowest possible edit: remove or bypass the exact correction/suppression branch that implements the registry entry. Do not change prompts, test harnesses, unrelated adjusters, or scenario inputs in the red worktree.
After both runs finish, collect and validate:
./scripts/adjustment_proof_matrix.py collect \
  --adjustment-id level_up_atomicity.suppress_unpaired_rewards_box \
  --green-worktree /tmp/your-project.com/adjustment-proof/<sha>/<slug>/green \
  --red-worktree /tmp/your-project.com/adjustment-proof/<sha>/<slug>/red-disabled \
  --green-evidence /tmp/your-project.com/<branch>/<green-run>/iteration_001 \
  --red-evidence /tmp/your-project.com/<branch>/<red-run>/iteration_001 \
  --test-command 'MCP_TEST_MODE=real MOCK_SERVICES_MODE=false ./venv/bin/python testing_mcp/core/test_level_up_organic.py --level-up-scenario single-organic'

./scripts/adjustment_proof_matrix.py validate \
  /tmp/your-project.com/adjustment-proof/<sha>/<slug>/proof_manifest.json
The validator emits one of the allowed verdicts. Use that verdict in the PR body or registry update; do not upgrade it manually.
Fail the proof as insufficient when:
- `run.json` or `metadata.json`
- `proven`

If red and green both pass, record `delete_candidate` and recommend either deleting in a separate cleanup PR with broader evidence or keeping the registry entry as "runtime evidence missing" until that cleanup decision is made.
Use precise wording:
- `proven`: "Disabling `<adjustment_id>` alone caused `<failure>` in red while the same command passed on green at `<sha>`."
- `delete_candidate`: "Disabling `<adjustment_id>` alone did not reproduce a failure under this proof command. This does not prove deletion is safe across all paths."
- `insufficient`: "Evidence did not isolate `<adjustment_id>` or lacked required real-LLM artifacts."

Never write "all blockers resolved" or "all adjustments proven" unless every registered adjustment has its own validated manifest.
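The "every registered adjustment has its own validated manifest" rule can be checked mechanically before writing a summary claim. The sketch below assumes a hypothetical mapping from adjustment ID to the verdict in that adjustment's own validated manifest; the function names and data shape are illustrative and are not part of the script's interface.

```python
def unproven_adjustments(registered_ids, validated_verdicts):
    """Return registered IDs whose own validated manifest is missing or not 'proven'.

    validated_verdicts is a hypothetical dict: adjustment ID -> verdict from
    that adjustment's own validated manifest. IDs with no manifest are absent.
    """
    return sorted(
        adj_id
        for adj_id in registered_ids
        if validated_verdicts.get(adj_id) != "proven"
    )


def may_claim_all_proven(registered_ids, validated_verdicts):
    """True only when no registered adjustment is left unproven."""
    return not unproven_adjustments(registered_ids, validated_verdicts)
```

If `unproven_adjustments` returns anything, name those IDs in the PR body instead of making a blanket claim.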