| name | adversarial-iteration |
| description | Use when iterating on a fix after an adversarial code review (Codex, human reviewer) surfaces design-level findings — not typos, lint, or formatter work. Frames the review/fix loop as implement -> [review -> REFLECT -> reproducer -> fix -> verify -> handoff] where each round runs in a fresh agent to prevent context accumulation. Provides a four-item cheap-checks list (timeline draw, upstream contract read, full lifecycle trace, variant enumeration) that catches recurring failure classes. No git commits during the loop — all changes commit together after zero-finding termination. Trigger when a review pass surfaces 2+ findings on a single change, when sibling bugs of an "already-fixed" issue keep appearing across rounds, or when the user asks "why did I miss this?". Do NOT use for single-line fixes where the cause is obvious from the diff, or for initial implementation work (use CLAUDE.md rules).
|
Adversarial Iteration
Work with adversarial review feedback instead of patching symptoms.
The loop
implement -> [spawn round agent -> review -> REFLECT -> reproducer -> fix -> verify -> return summary] --+
^ |
+-------- next round (fresh agent, clean context, re-reads files from disk) <--------------+
terminates when round agent reports ZERO_FINDINGS.
Step 1 (implement) is manual. Steps 2+ run inside /loop autonomously —
no AskUserQuestion, ExitPlanMode, or other user gates. When multiple
approaches exist, tag one "Recommended", pick it, continue. The user can
interrupt at any point; you never pause to ask.
If agents are unavailable, run the next review inline and re-read all
changed files before reviewing.
REFLECT
The unique step. Skipping it produces fixes that pass the named test but seed
the next finding in the same class.
For each finding, answer:
- What did I do wrong? One sentence, no euphemisms.
- Why did I miss it? Name the assumption encoded without verifying.
- Which
CLAUDE.md rule did I violate? Be specific.
- What cheap check would have caught it? (see next section)
Then across all findings: what's the unifying shape? Often a single cognitive
failure caused them all. Name it.
Common shapes:
- Solved the named problem, not the invariant.
- Yes-manned own design — round-1 yes-mans the user; round-2 yes-mans yourself.
- Generic abstraction without per-instantiation audit.
- Layer-ownership leak via assumed invariant.
- Tunnel vision on per-unit fix, missed system-level cooperation.
If reflection produces a new shape, add it to this list.
Cheap checks (do all four before writing fix code)
-
Timeline draw — For concurrent state changes, draw the interleaved
timeline. Find the check-act gap. Two cooperating booleans is almost always
wrong; use Once, compare_exchange, or a single owner.
-
Upstream contract read — For any value flowing in, read the producer's
contract (grep, 5 lines). Don't assume from the type name. Ask: does this
transformation belong in this layer?
-
Full lifecycle trace — For any fix depending on component cooperation,
trace init through teardown. List every component and what happens to
in-flight work. Especially: what stops first?
-
Variant enumeration — For any generic covering N types, list each
type's destructor/serializer/contract. Verify the generic satisfies every
one. If they differ non-trivially, prefer N type-specific wrappers or
parametrize over a closure.
Reproducer test
Write the test before the fix. Verify it FAILS on unfixed code — if it
passes, it's a tautology. The test must touch the system the bug lives in,
not just a helper in isolation. Do not commit yet; all changes commit together
after the loop terminates.
Plan-mode discipline
Draft a plan (in the response, not via ExitPlanMode) referencing:
- The REFLECT root-cause shape this fix addresses.
- The reproducer test (file + function) and its assertion.
- Which cheap check identified the design constraint.
- The chosen primitive and why it's the smallest satisfying change.
- Adjacent contracts that also need tests.
Verify
- Reproducer flips red -> green (necessary, not sufficient).
- Adjacent-contract tests from REFLECT also pass.
- If the next review surfaces a sibling of the same finding, REFLECT wasn't
deep enough — go back to REFLECT, don't just patch.
Round handoff (MANDATORY — blocks next round)
Do NOT commit during the loop. All changes accumulate as working-tree edits
and are committed once after the loop terminates with zero findings.
After verify passes, do these IN ORDER before spawning the next round.
Skipping any one is a protocol violation.
1. Audit comments
grep -rn 'round\|TODO.*round\|HACK\|FIXME\|debugging\|TEMP' <changed files>.
Remove round-N narrative, debugging scaffolding, and stale TODOs from the
working tree before the next agent reads those files.
2. Build round summary
Construct this structured summary for the orchestrator to carry forward:
### Round N complete
- Findings: <list of what the review surfaced>
- Root-cause shape: <the REFLECT unifying shape>
- Fixes: <what changed and why, 2-3 sentences max>
- Files changed: <paths>
- Tests: pass/fail
- Remaining risk: <anything REFLECT flagged but did not address>
3. Spawn fresh agent for next round
Read references/round-agent-prompt.md
for the agent prompt template. Populate it with:
- The round number
- All accumulated prior round summaries
- The list of changed files
Spawn a general-purpose agent (not codex:rescue — the round agent needs
Read/Write/Edit/Bash for the full REFLECT->fix cycle). The fresh context
window is the compaction mechanism — no /compact needed.
The agent runs its full round (review -> REFLECT -> reproducer -> fix ->
verify) and returns a structured round summary. The orchestrator evaluates
whether the summary contains ZERO_FINDINGS.
Termination
The loop ends when all three hold:
- Latest round agent reports
ZERO_FINDINGS after re-reading all changed files.
- Every finding has a reproducer test, verified green.
- Every adjacent-contract test from each REFLECT passes.
No other exit. Sibling findings recurring -> previous REFLECT was too shallow,
go back to REFLECT. Fix getting expensive -> plan the architectural change as
this iteration's step, same loop. Never propose "stop here, accept risk."
Anti-patterns
- Reflection-as-apology. "Should have been more careful" is not a root
cause. "Never read
ExecStdout::next" is.
- Test polarity flip. If the test was green on broken code, it's
confirmation bias, not a reproducer.
- One reproducer, no class coverage. If reflection identified 6 types and
only one has a test, the next review finds the other 5.
- Treating review as checklist. The review surfaces symptoms; REFLECT finds
the design choice that produced them all.
- Skipping agent spawn. Running the next review in the same context instead
of spawning a fresh agent. Accumulated context degrades review quality and
wastes tokens. The pull to "just run the review here" is strong — resist it.
A new agent re-reads files from disk, not from stale cached content.
- Bloated round summaries. Including full diffs or debug traces in the
round summary defeats the purpose. The summary is compressed signal, not a
transcript. Five lines per round, not fifty.
This project's codebase
- Adversarial review:
/codex:adversarial-review
- Loop runner:
/loop (dynamic-pacing, omit interval)
- Commit pattern: all changes commit together after zero-finding termination;
conversation history is the iteration audit trail.