| name | wave-review |
| description | Self-reflective code review of recent pipeline work (BB-3, Segmenter, or any future module). Checks architectural coherence, spec adherence, determinism invariants, test quality, and identifies refactor opportunities. Run every 5 tasks or at phase boundaries. |
wave-review โ Self-reflective review of recent work
You are reviewing the last N commits on the current feature branch to catch drift before it compounds. This is NOT rubber-stamping โ find real issues or explicitly say "nothing to fix".
Module detection: same as /task-next โ infer from branch name or ask. This determines which design spec and non-negotiables to check against.
Step 1 โ Scope the review
Default: last 5 commits on the current feature branch. If the user passed a different N or a commit range, use that.
git log --oneline -10 โ see recent history.
git diff <base>..HEAD --stat where <base> is the commit N back.
- Read each commit's message and diff.
Step 2 โ Review dimensions (go through each, explicitly)
A. Spec adherence
- Does every file created/modified match the TASKS.md card for its commit?
- Are all ยงX.Y spec anchors actually honored?
- For each non-negotiable on each card: satisfied in code? Check the card's acceptance checklist against the diff.
B. Pipeline invariants (module-specific)
Invoke /doctor and attach its output to the review. Then check the active module's invariants:
If BB-3 (normalizer):
- Any new
Date.now() / performance.now() / Math.random() outside runtimeStats?
- Any new
JSON.stringify in identity/hash paths?
- Any new RCA concept names (PricingProcedure, DecisionTable, CML, ContextDefinition)?
- Any new
string[] where NodeRef[] is the contract?
- Any new wall-clock timeouts on parsers?
- Any new imports in
@revbrain/migration-ir-contract beyond zod?
If Segmenter:
- Any new
Date.now() / Math.random() / localeCompare in segment-affecting code? (All sorting must use strict </>)
- Any cross-package imports from
bb3-normalizer? (Thin-dependency rule โ all algos must be local)
- Any hardcoded thresholds instead of reading from
SegmenterOptions?
- Does the edge classification table still cover all
IREdgeType values? (C9 check)
- Any new edge types unclassified into strong/ordering/hazard?
- Is hashing still length-prefixed + streaming? Any
JSON.stringify in the ID path?
If SI Billing:
- Any floating-point math (
parseFloat, toFixed, division producing decimals) in fee calculation or milestone amount code? All math must be integer cents + integer bps.
- Any
JSON.stringify in terms snapshot hash paths? Must use canonicalJson().
- Any Stripe invoice creation without checking
paid_via !== 'carried_credit'?
- Is the state machine pure? Any DB reads/writes, Stripe calls, or email sends inside
agreement-state-machine.ts or milestone-state-machine.ts? Those belong in route handlers.
- Is acceptance + Stripe atomic? Any code path where agreement status changes but Stripe invoice creation is in a separate try/catch that doesn't rollback?
- Any
declared_project_value being set while status is active_assessment? Violates invariant.
- Are all UI strings in both
en/*.json and he/*.json? Any left-*/right-* CSS instead of start-*/end-*?
- Are overdue reminder timestamps checked before sending (dedupe)?
C. Architectural coherence
- Are new modules consistent with existing ones (same file layout, same return shape, same error handling)?
- Are tests in the right location (unit vs integration vs golden vs property)?
- Are there patterns emerging that should be extracted into a helper? Only if used 3+ times โ never prematurely.
- Conversely, are there helpers created for single-use cases? Inline them.
- Is error handling consistent? For BB-3: quarantine for data issues,
BB3InputError for shape, BB3InternalError for bugs. For Segmenter: DanglingEdgeError, UnclassifiedEdgeTypeError, SegmenterInvariantError for invariant violations.
D. Test quality
- Do tests assert behavior, not implementation details?
- Is there a property test where the spec suggested one (identity hash, canonicalJson, field-ref normalization)?
- Do any tests rely on wall-clock, random numbers, or
Map/Set insertion order? Those will flake.
- Is there a golden regression if the commit touches user-facing IR shapes?
- Are error paths covered, not just happy paths?
E. Spec drift
- Did the implementation have to deviate from the spec? If yes, is the deviation in the commit message AND noted for a spec amendment?
- Did implementation reveal an under-specified area? File it as a follow-up.
F. Code smells (standard)
- Dead code, unused exports
- Types that should be discriminated unions but are broad records
any / unknown where a concrete type was possible
- Magic numbers that should be constants
- Comments explaining what the code does (delete; code should be self-explanatory) โ keep comments only for why
- Premature abstraction (one-use helpers, config knobs for no caller)
Step 3 โ Produce the review
Output format:
## Wave review โ commits <SHA1>..<SHA2>
### [OK] Looks good
- <bullet>
- <bullet>
### [FIX] Must-fix issues
1. <file:line> โ <description>
Fix: <specific action>
2. ...
### [SUGGEST] Optional improvements
1. <description>
### [FOLLOWUP] Items to file
- <description>
### Verdict: GREEN | YELLOW | RED
- GREEN = nothing to fix, proceed
- YELLOW = suggestions only, user can defer
- RED = must-fix issues exist; do NOT proceed to next task until addressed
Step 4 โ Act on findings
- RED: stop and fix the issues, then re-run
/wave-review once to confirm GREEN or YELLOW.
- YELLOW: print suggestions, ask the user whether to address now or defer.
- GREEN: ask whether to proceed with
/task-next OR run /sync-branches if 5+ commits since last sync.
Rules
- Be specific. "Consider refactoring X" is useless. "file.ts:42 repeats a switch that should be a lookup table; here's the change" is useful.
- Do NOT nitpick style issues that prettier handles.
- Do NOT invent problems. If the code is fine, say so.
- Favor "delete this" over "add this". Subtraction compounds.
- If the review finds a pattern across multiple commits (not just one), call it out as a systemic issue, not N separate issues.