| name | skill-verify |
| description | Evidence before claims — run verification commands before declaring work complete, fixed, or passing |
Host: Codex CLI — This skill was designed for Claude Code and adapted for Codex.
Cross-reference commands use installed skill names in Codex rather than /octo:* slash commands.
Use the active Codex shell and subagent tools. Do not claim a provider, model, or host subagent is available until the current session exposes it.
For host tool equivalents, see skills/blocks/codex-host-adapter.md.
Verification Gate
The Iron Law
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
If you haven't run the verification command in this turn, you cannot claim it passes.
The Gate
Before claiming any success or expressing satisfaction:
- IDENTIFY — What command proves this claim?
- RUN — Execute the full command (fresh, not cached)
- READ — Full output, check exit code, count failures
- VERIFY — Does output actually confirm the claim?
- ONLY THEN — State the claim WITH evidence
Skip any step = the claim is unverified.
What Counts as Evidence
| Claim | Requires | NOT Sufficient |
|---|
| Tests pass | Test command output showing 0 failures | Previous run, "should pass" |
| Build succeeds | Build command exit 0 | Linter passing |
| Bug fixed | Reproduce original symptom: now passes | "Code changed, should work" |
| Regression test works | Red (fail without fix) → Green (pass with fix) | Test passes once |
| Subagent completed task | git diff shows expected changes | Subagent says "done" |
| Requirements met | Line-by-line checklist against spec | Tests passing |
| Provider dispatch worked | Output contains expected content | No error ≠ success |
Red Flags — STOP and Verify
If you catch yourself thinking any of these, STOP:
| Thought | What to do instead |
|---|
| "Should work now" | Run the verification |
| "I'm confident" | Confidence ≠ evidence |
| "Just this once" | No exceptions |
| "The linter passed" | Linter ≠ tests ≠ build |
| "The agent said it worked" | Verify independently |
| "It's a small change" | Small changes cause big bugs |
Multi-Provider Context
In Claude Octopus workflows, verification is especially critical because:
- Provider outputs can be hallucinated — Codex/Gemini/Copilot may claim success without evidence
- Consensus ≠ correctness — three models agreeing doesn't mean they're right
- Synthesis files may be stale — check timestamps, don't assume freshness
- orchestrate.sh exit code 0 ≠ quality — the script ran, but did it produce good output?
After any multi-provider workflow:
ls -la ~/.claude-octopus/results/*-synthesis-*.md | tail -1
wc -l ~/.claude-octopus/results/*-synthesis-*.md | tail -1
When to Apply
ALWAYS before:
- Committing code
- Creating PRs
- Marking tasks complete
- Moving to next workflow phase
- Reporting results to user
- Claiming a bug is fixed
In orchestrate.sh workflows:
- After
probe (discover) — verify synthesis file exists
- After
grasp (define) — verify consensus score meets threshold
- After
tangle (develop) — verify tests pass, not just that code was written
- After
ink (deliver) — verify review actually ran, not just that it was dispatched
Examples
Correct: Evidence-Based Claim
$ npm test
✓ user.create() saves to database (45ms)
✓ user.create() validates email (12ms)
Tests: 2 passed, 2 total
All 2 tests pass. ← Claim backed by output.
Incorrect: Claim Without Evidence
I've implemented the feature. It should work now. The tests should pass.
← No test was run. "Should" is not evidence.
Correct: Regression Test Red-Green
1. Write test → run → FAIL (expected, proves test detects the bug)
2. Implement fix → run → PASS (proves fix works)
3. Revert fix → run → FAIL (proves test isn't false-positive)
4. Restore fix → run → PASS (final confirmation)
Integration with Other Skills
This skill is referenced by:
flow-develop.md — verification gate after implementation
flow-deliver.md — verification gate before delivery
skill-code-review.md — verify review findings before reporting
skill-tdd.md — red-green cycle requires evidence at each step
skill-factory.md — autonomous pipeline must verify at every phase