name

adversarial-iteration

description

Use when iterating on a fix after an adversarial code review (Codex, human reviewer) surfaces design-level findings — not typos, lint, or formatter work. Frames the review/fix loop as implement -> [review -> REFLECT -> reproducer -> fix -> verify -> handoff] where each round runs in a fresh agent to prevent context accumulation. Provides a four-item cheap-checks list (timeline draw, upstream contract read, full lifecycle trace, variant enumeration) that catches recurring failure classes. No git commits during the loop — all changes commit together after zero-finding termination. Trigger when a review pass surfaces 2+ findings on a single change, when sibling bugs of an "already-fixed" issue keep appearing across rounds, or when the user asks "why did I miss this?". Do NOT use for single-line fixes where the cause is obvious from the diff, or for initial implementation work (use CLAUDE.md rules).

Adversarial Iteration

Work with adversarial review feedback instead of patching symptoms.

The loop

implement -> [spawn round agent -> review -> REFLECT -> reproducer -> fix -> verify -> return summary] --+
               ^                                                                                          |
               +-------- next round (fresh agent, clean context, re-reads files from disk) <--------------+

  terminates when round agent reports ZERO_FINDINGS.

Step 1 (implement) is manual. Steps 2+ run inside /loop autonomously — no AskUserQuestion, ExitPlanMode, or other user gates. When multiple approaches exist, tag one "Recommended", pick it, continue. The user can interrupt at any point; you never pause to ask.

If agents are unavailable, run the next review inline and re-read all changed files before reviewing.

REFLECT

The unique step. Skipping it produces fixes that pass the named test but seed the next finding in the same class.

For each finding, answer:

What did I do wrong? One sentence, no euphemisms.
Why did I miss it? Name the assumption encoded without verifying.
Which CLAUDE.md rule did I violate? Be specific.
What cheap check would have caught it? (see next section)

Then across all findings: what's the unifying shape? Often a single cognitive failure caused them all. Name it.

Common shapes:

Solved the named problem, not the invariant.
Yes-manned own design — round-1 yes-mans the user; round-2 yes-mans yourself.
Generic abstraction without per-instantiation audit.
Layer-ownership leak via assumed invariant.
Tunnel vision on per-unit fix, missed system-level cooperation.

If reflection produces a new shape, add it to this list.

Cheap checks (do all four before writing fix code)

Timeline draw — For concurrent state changes, draw the interleaved timeline. Find the check-act gap. Two cooperating booleans is almost always wrong; use Once, compare_exchange, or a single owner.
Upstream contract read — For any value flowing in, read the producer's contract (grep, 5 lines). Don't assume from the type name. Ask: does this transformation belong in this layer?
Full lifecycle trace — For any fix depending on component cooperation, trace init through teardown. List every component and what happens to in-flight work. Especially: what stops first?
Variant enumeration — For any generic covering N types, list each type's destructor/serializer/contract. Verify the generic satisfies every one. If they differ non-trivially, prefer N type-specific wrappers or parametrize over a closure.

Reproducer test

Write the test before the fix. Verify it FAILS on unfixed code — if it passes, it's a tautology. The test must touch the system the bug lives in, not just a helper in isolation. Do not commit yet; all changes commit together after the loop terminates.

Plan-mode discipline

Draft a plan (in the response, not via ExitPlanMode) referencing:

The REFLECT root-cause shape this fix addresses.
The reproducer test (file + function) and its assertion.
Which cheap check identified the design constraint.
The chosen primitive and why it's the smallest satisfying change.
Adjacent contracts that also need tests.

Verify

Reproducer flips red -> green (necessary, not sufficient).
Adjacent-contract tests from REFLECT also pass.
If the next review surfaces a sibling of the same finding, REFLECT wasn't deep enough — go back to REFLECT, don't just patch.

Round handoff (MANDATORY — blocks next round)

Do NOT commit during the loop. All changes accumulate as working-tree edits and are committed once after the loop terminates with zero findings.

After verify passes, do these IN ORDER before spawning the next round. Skipping any one is a protocol violation.

1. Audit comments

2. Build round summary

Construct this structured summary for the orchestrator to carry forward:

### Round N complete
- Findings: <list of what the review surfaced>
- Root-cause shape: <the REFLECT unifying shape>
- Fixes: <what changed and why, 2-3 sentences max>
- Files changed: <paths>
- Tests: pass/fail
- Remaining risk: <anything REFLECT flagged but did not address>

3. Spawn fresh agent for next round

Read references/round-agent-prompt.md for the agent prompt template. Populate it with:

The round number
All accumulated prior round summaries
The list of changed files

Spawn a general-purpose agent (not codex:rescue — the round agent needs Read/Write/Edit/Bash for the full REFLECT->fix cycle). The fresh context window is the compaction mechanism — no /compact needed.

The agent runs its full round (review -> REFLECT -> reproducer -> fix -> verify) and returns a structured round summary. The orchestrator evaluates whether the summary contains ZERO_FINDINGS.

Termination

The loop ends when all three hold:

Latest round agent reports ZERO_FINDINGS after re-reading all changed files.
Every finding has a reproducer test, verified green.
Every adjacent-contract test from each REFLECT passes.

No other exit. Sibling findings recurring -> previous REFLECT was too shallow, go back to REFLECT. Fix getting expensive -> plan the architectural change as this iteration's step, same loop. Never propose "stop here, accept risk."

Anti-patterns

Reflection-as-apology. "Should have been more careful" is not a root cause. "Never read ExecStdout::next" is.
Test polarity flip. If the test was green on broken code, it's confirmation bias, not a reproducer.
One reproducer, no class coverage. If reflection identified 6 types and only one has a test, the next review finds the other 5.
Treating review as checklist. The review surfaces symptoms; REFLECT finds the design choice that produced them all.
Skipping agent spawn. Running the next review in the same context instead of spawning a fresh agent. Accumulated context degrades review quality and wastes tokens. The pull to "just run the review here" is strong — resist it. A new agent re-reads files from disk, not from stale cached content.
Bloated round summaries. Including full diffs or debug traces in the round summary defeats the purpose. The summary is compressed signal, not a transcript. Five lines per round, not fifty.

This project's codebase

Adversarial review: /codex:adversarial-review
Loop runner: /loop (dynamic-pacing, omit interval)
Commit pattern: all changes commit together after zero-finding termination; conversation history is the iteration audit trail.