adjustment-proof
Use when proving, rejecting, or deleting backend adjustment registry entries with real LLM evidence and independent disabled-adjustment worktrees.
| Field | Value |
|---|---|
| name | adjustment-proof |
| description | Use when proving, rejecting, or deleting backend adjustment registry entries with real LLM evidence and independent disabled-adjustment worktrees. |
Use this skill before accepting any claim that a backend adjustment is proven necessary or safe to delete. A subagent report, PR prose, or one successful organic run is not proof. Proof requires an isolated red/green comparator for each adjustment.
This skill extends:

- `.claude/skills/zfc-adjuster/SKILL.md`
- `.claude/skills/evidence-standards.md`
- `.claude/skills/zfc-leveling-roadmap/SKILL.md` for level-up/rewards changes

Allowed verdicts:

- `proven`: Current-head green evidence passes, and a separate worktree with exactly one adjustment disabled fails for the matching behavior.
- `delete_candidate`: Current-head green evidence passes, and a separate worktree with exactly one adjustment disabled also passes. This is not enough to delete by itself; it means the adjustment lacks positive necessity proof.
- `insufficient`: Required artifacts are missing, stale, non-real, or not isolated, or they do not demonstrate the claimed behavior.

Never delete an adjustment based on a single negative run. Deletion requires a separate design decision after reviewing broader coverage, risk, and whether the code is now redundant.
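The verdict rules above can be read as a small decision table. The sketch below is a hypothetical restatement of those rules, not the code inside `scripts/adjustment_proof_matrix.py`; the validator's own verdict stays authoritative. `decide_verdict`, `evidence_ok`, and `red_failure_matches` are illustrative names. A failing green run maps to `insufficient` here because neither positive verdict applies without a passing green.

```python
from enum import Enum


class Verdict(str, Enum):
    """The three verdicts this skill allows."""

    PROVEN = "proven"
    DELETE_CANDIDATE = "delete_candidate"
    INSUFFICIENT = "insufficient"


def decide_verdict(
    evidence_ok: bool,
    green_passed: bool,
    red_passed: bool,
    red_failure_matches: bool = False,
) -> Verdict:
    """Map one green/red comparison to a verdict (hypothetical helper).

    evidence_ok: both runs are real-LLM, at current head, isolated, and carry
    every required artifact.
    red_failure_matches: the red failure is the behavior the adjustment targets.
    """
    # Missing, stale, mocked, or non-isolated evidence proves nothing.
    if not evidence_ok:
        return Verdict.INSUFFICIENT
    # Both positive verdicts require green to pass at the current head.
    if not green_passed:
        return Verdict.INSUFFICIENT
    if red_passed:
        # No failure reproduced: necessity is unproven, but this is not
        # permission to delete on its own.
        return Verdict.DELETE_CANDIDATE
    # Red failed; only a failure in the matching behavior counts as proof.
    return Verdict.PROVEN if red_failure_matches else Verdict.INSUFFICIENT
```

Note that `delete_candidate` is the only outcome when both runs pass; nothing in the table yields a delete decision.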
For each adjustment ID:

- `scripts/adjustment_proof_matrix.py validate`.

One disabled worktree per adjustment is mandatory. A bundle named for one adjustment cannot prove or delete another adjustment.
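The one-worktree-per-adjustment rule can be checked mechanically before any verdict is trusted. This is a minimal sketch under stated assumptions: `isolation_problems` is a hypothetical helper, and `red_disabled_ids` is a hypothetical list of the adjustment IDs the red worktree disables; the sketch does not read it from the registry or the proof manifest.

```python
from __future__ import annotations

from pathlib import Path


def isolation_problems(
    claimed_id: str,
    bundle_id: str,
    red_disabled_ids: list[str],
    green_worktree: str,
    red_worktree: str,
) -> list[str]:
    """Return reasons a bundle cannot back a claim about claimed_id (hypothetical)."""
    problems = []
    # A bundle named for one adjustment never proves or deletes another.
    if bundle_id != claimed_id:
        problems.append(f"bundle is for {bundle_id!r}, not {claimed_id!r}")
    # The red worktree must disable exactly this one adjustment.
    if red_disabled_ids != [claimed_id]:
        problems.append(f"red disables {red_disabled_ids!r}, expected [{claimed_id!r}]")
    # Green and red must be separate checkouts.
    if Path(green_worktree).resolve() == Path(red_worktree).resolve():
        problems.append("green and red point at the same worktree")
    return problems
```

An empty list means only that the bundle is scoped correctly; it says nothing about whether the runs passed or failed.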
Each green and red evidence directory must contain:

- `run.json`
- `metadata.json`
- `http_request_responses.jsonl`
- `llm_request_responses.jsonl`

For LLM-interacting level-up/rewards paths, unit tests are supporting evidence only. They cannot establish `proven` or `delete_candidate`.
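A quick pre-check for the required artifacts can catch incomplete bundles before running the validator. This is a hypothetical sketch, not part of `scripts/adjustment_proof_matrix.py`. It also treats an empty file as missing, which is an assumption of this sketch (an empty `llm_request_responses.jsonl`, for example, shows no real LLM traffic).

```python
from __future__ import annotations

from pathlib import Path

# Files every green and red evidence directory must contain.
REQUIRED_ARTIFACTS = (
    "run.json",
    "metadata.json",
    "http_request_responses.jsonl",
    "llm_request_responses.jsonl",
)


def missing_artifacts(evidence_dir: str | Path) -> list[str]:
    """List required artifacts that are absent or empty in evidence_dir.

    Treating an empty file as missing is an assumption of this sketch.
    """
    root = Path(evidence_dir)
    missing = []
    for name in REQUIRED_ARTIFACTS:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Run it on both the green and the red `iteration_001` directories; any non-empty result already means the proof is `insufficient`.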
Use the script to list registered adjustments:
```bash
./scripts/adjustment_proof_matrix.py list
```
Create a deterministic proof plan for one adjustment:
```bash
./scripts/adjustment_proof_matrix.py plan \
  --adjustment-id level_up_atomicity.suppress_unpaired_rewards_box \
  --test-command 'MCP_TEST_MODE=real MOCK_SERVICES_MODE=false ./venv/bin/python testing_mcp/core/test_level_up_organic.py --level-up-scenario single-organic'
```
Follow the generated commands to create independent worktrees. In the red worktree, disable only the named adjustment. Prefer the narrowest possible edit: remove or bypass the exact correction/suppression branch that implements the registry entry. Do not change prompts, test harnesses, unrelated adjusters, or scenario inputs in the red worktree.
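The narrow-edit rule for the red worktree can be audited from the list of changed files, for example the output of `git -C <red-worktree> diff --name-only HEAD` when the red edits are uncommitted on the same commit as green. This is a hypothetical sketch: `unexpected_red_changes` and `adjustment_files` are illustrative names, and the glob patterns in `FORBIDDEN_RED_PATTERNS` are assumptions about where prompts, harnesses, and scenario inputs live, not this repository's actual layout.

```python
from __future__ import annotations

from fnmatch import fnmatch

# Hypothetical patterns for prompts, test harnesses, and scenario inputs that
# the red worktree must never touch; replace with your repository's layout.
FORBIDDEN_RED_PATTERNS = ("testing_mcp/*", "*prompt*", "*scenario*")


def unexpected_red_changes(
    changed_paths: list[str],
    adjustment_files: set[str],
    forbidden: tuple[str, ...] = FORBIDDEN_RED_PATTERNS,
) -> list[str]:
    """Return changed paths that break the narrow-edit rule for the red worktree.

    changed_paths: files that differ between red and green.
    adjustment_files: files that implement the one adjustment being disabled.
    """
    flagged = []
    for path in changed_paths:
        outside_adjustment = path not in adjustment_files
        # fnmatch's "*" also matches "/", so "testing_mcp/*" covers subdirectories.
        touches_forbidden = any(fnmatch(path, pattern) for pattern in forbidden)
        if outside_adjustment or touches_forbidden:
            flagged.append(path)
    return flagged
```

Checking forbidden patterns even for listed adjustment files guards against an adjustment whose "implementation" lives in a prompt or harness file.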
After both runs finish, collect and validate:
```bash
./scripts/adjustment_proof_matrix.py collect \
  --adjustment-id level_up_atomicity.suppress_unpaired_rewards_box \
  --green-worktree /tmp/your-project.com/adjustment-proof/<sha>/<slug>/green \
  --red-worktree /tmp/your-project.com/adjustment-proof/<sha>/<slug>/red-disabled \
  --green-evidence /tmp/your-project.com/<branch>/<green-run>/iteration_001 \
  --red-evidence /tmp/your-project.com/<branch>/<red-run>/iteration_001 \
  --test-command 'MCP_TEST_MODE=real MOCK_SERVICES_MODE=false ./venv/bin/python testing_mcp/core/test_level_up_organic.py --level-up-scenario single-organic'

./scripts/adjustment_proof_matrix.py validate \
  /tmp/your-project.com/adjustment-proof/<sha>/<slug>/proof_manifest.json
```
The validator emits one of the allowed verdicts. Use that verdict in the PR body or registry update; do not upgrade it manually.
Fail the proof as `insufficient` when:

- `run.json` or `metadata.json`
- `proven`

If red and green both pass, record `delete_candidate` and recommend one of two paths: delete the adjustment in a separate cleanup PR backed by broader evidence, or keep the registry entry marked as missing runtime evidence until that cleanup decision is made.
Use precise wording:

- `proven`: "Disabling `<adjustment_id>` alone caused `<failure>` in red while the same command passed on green at `<sha>`."
- `delete_candidate`: "Disabling `<adjustment_id>` alone did not reproduce a failure under this proof command. This does not prove deletion is safe across all paths."
- `insufficient`: "Evidence did not isolate `<adjustment_id>` or lacked required real-LLM artifacts."

Never write "all blockers resolved" or "all adjustments proven" unless every registered adjustment has its own validated manifest.
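The wording rules and the aggregate-claim rule can be kept honest with a small helper. This is a hypothetical sketch: `WORDING`, `summary_line`, and `may_claim_all_proven` are illustrative names, and `validated` is assumed to map each adjustment ID to the verdict read from that ID's own validated manifest. The sketch covers only the "all adjustments proven" claim, which it treats as requiring every registered ID to be `proven`.

```python
from __future__ import annotations

# The wording templates above, keyed by verdict.
WORDING = {
    "proven": (
        "Disabling {adjustment_id} alone caused {failure} in red while "
        "the same command passed on green at {sha}."
    ),
    "delete_candidate": (
        "Disabling {adjustment_id} alone did not reproduce a failure under this "
        "proof command. This does not prove deletion is safe across all paths."
    ),
    "insufficient": (
        "Evidence did not isolate {adjustment_id} or lacked required "
        "real-LLM artifacts."
    ),
}


def summary_line(verdict: str, **fields: str) -> str:
    """Render the required sentence for one validated verdict."""
    return WORDING[verdict].format(**fields)


def may_claim_all_proven(registered_ids: list[str], validated: dict[str, str]) -> bool:
    """True only if every registered ID has its own validated manifest saying proven.

    validated (hypothetical): adjustment_id -> verdict read from that ID's manifest.
    """
    return bool(registered_ids) and all(
        validated.get(adjustment_id) == "proven" for adjustment_id in registered_ids
    )
```

An empty registry returns `False` deliberately, so a missing registry listing cannot be mistaken for "everything proven".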