---
name: reassess-spec
description: Reassess a spec against the codebase and FOUNDATIONS.md. Validates assumptions, identifies issues/improvements/additions, asks clarifying questions, then writes the updated spec. Use when preparing a spec for ticket decomposition.
user-invocable: true
arguments: [{"name":"spec_path","description":"Path to the spec file (e.g., specs/106-zone-token-observer-integration.md)","required":true}]
---
# Reassess Spec
Validate a spec's proposed implementation against the actual codebase and FOUNDATIONS.md. Identify issues, improvements, and beneficial additions. Deliver an updated spec ready for ticket decomposition.
## Invocation

`/reassess-spec <spec-path>`

Arguments (required, positional):

- `<spec-path>` — path to the spec file (e.g., `specs/106-zone-token-observer-integration.md`)
If the argument is missing, ask the user to provide it before proceeding.
## Worktree Awareness
If working inside a worktree (e.g., .claude/worktrees/<name>/), all file paths in this skill — reads, writes, globs, greps — must be prefixed with the worktree root. The default working directory is the main repo root; paths without an explicit worktree prefix will silently operate on main, not the worktree. This applies to every path reference below, including paths passed to Explore agents.
## Process
Follow these steps in order. Do not skip any step.
### Step 1: Mandatory Reads
Read ALL of these files before any analysis:
- The spec file (from the argument) — read the entire file. For XL specs exceeding Read's token limit (~25,000 tokens, commonly >~900 lines), consume the full content via paginated reads (`offset`/`limit`) — do not substitute a summary or skip sections. Use `wc -l` first to size the file and plan the chunk boundaries.
- `docs/FOUNDATIONS.md` — architectural commandments; every spec must align with these principles
- Existing follow-on tickets: glob `tickets/<spec-id>*` and `archive/tickets/<spec-id>*`. If matches exist, read each. Tickets already decomposing this spec encode naming conventions, scope assumptions, and Status fields that the reassessment should harmonize with — drift between spec and ticket is itself a finding (see Step 2.12). When the user references a specific ticket inline with the invocation (e.g., "a ticket from that spec already exists at `tickets/X.md`"), reading it is mandatory regardless of glob results.
These three reads are independent and should be batched in parallel — the spec id is already known from the argument, so no read's content informs another's path or scope.
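The sizing-then-chunking flow for XL specs can be sketched in shell. The 800-line chunk width and the synthetic fixture file are illustrative assumptions, not project conventions — the point is to derive `offset`/`limit` pairs from `wc -l` before reading:

```shell
# Size a (stand-in) spec file, then plan paginated Read chunks.
specfile=$(mktemp)
seq 1 2150 | sed 's/^/spec line /' > "$specfile"   # stand-in for an XL spec
total=$(wc -l < "$specfile")
chunk=800                                          # assumed chunk width
offset=1
plan=""
while [ "$offset" -le "$total" ]; do
  plan="$plan offset=$offset,limit=$chunk;"
  offset=$((offset + chunk))
done
echo "total=$total chunks:$plan"
```

Three chunks cover the 2150-line fixture; the same arithmetic applies to whatever `wc -l` reports for the real spec.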
Parse the spec's metadata: Status, Priority, Dependencies, Goals, Non-Goals, FOUNDATIONS.md Alignment table (if present), and all implementation sections. Apply these rules:
- Standard metadata fields absent (Status, Priority, Complexity, Dependencies): record as an Improvement finding — downstream skills like spec-to-tickets depend on these fields.
- No ticket-namespace hint: record as a standard Improvement; propose a namespace derived from the spec id + a short slug (e.g., spec 143 bounded-memory → `143BOUNDMEM`). The proposed namespace must be passed verbatim as the NAMESPACE arg to `/spec-to-tickets` — do NOT include the `-*` glob suffix in the suggestion (the wildcard is for matching ticket files on disk, not for the argument).
- No Follow-On Tickets section: record as an Improvement; recommend ADDING the section with the namespace plus an anticipated decomposition outline (informational, finalized by `/spec-to-tickets`).
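One plausible derivation of the namespace hint can be sketched in shell. The truncation widths (five characters from the first slug word, three from each subsequent word) are reverse-engineered from the `143BOUNDMEM` example above and are an assumption, not a documented convention:

```shell
# Derive a ticket-namespace proposal from a spec id like "143-bounded-memory".
spec="143-bounded-memory"
num="${spec%%-*}"            # "143"
slug="${spec#*-}"            # "bounded-memory"
first=$(printf '%s' "$slug" | cut -d- -f1 | cut -c1-5)
case "$slug" in
  *-*) rest=$(printf '%s' "$slug" | cut -d- -f2- | tr '-' '\n' | cut -c1-3 | tr -d '\n') ;;
  *)   rest="" ;;                      # single-word slug: nothing further
esac
namespace=$(printf '%s%s%s' "$num" "$first" "$rest" | tr 'a-z' 'A-Z')
echo "$namespace"   # pass verbatim as the NAMESPACE arg, without the -* suffix
```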
### Step 2: Extract and Validate References
This step combines reference extraction and codebase validation into a single pass.
From the spec, extract every concrete codebase reference:
- File paths mentioned or implied (e.g., `src/cnl/compile-observers.ts`, `data/games/fitl/`)
- Type names (e.g., `GameDef`, `CompiledObserverProfile`, `ZoneId`)
- Function/method names (e.g., `derivePlayerObservation`, `lowerObservers`)
- Module names (e.g., `kernel`, `cnl`, `agents`)
- Config keys or YAML fields (e.g., `zones`, `surfaces`, `dataAssets`)
- Test file paths or test names referenced
- Other specs or tickets listed in Dependencies
- Discriminant values and union variants (e.g., AST tag numbers, enum values, union member names) — verify that claimed variants exist in the actual union type definition
- DSL-level identifiers (e.g., candidate feature names, action IDs, profile names) — these may exist only in YAML game data files (`data/`), compiled JSON fixtures (`test/fixtures/`), or DSL documentation (`docs/`), not in TypeScript source. Search data and fixture files in addition to source code
For each reference, validate against the actual codebase:
- File paths: Glob/Grep to confirm they exist at the stated location. If a file was moved, renamed, or deleted, record the discrepancy and the actual location (if found). When a referenced file doesn't exist and multiple substitutes are plausible (e.g., a campaign dir contains both `program.md` and `results.tsv`), classify the spec's actual claim about the file — data vs. narrative vs. config — and cite the substitute matching that intent. If the claim spans multiple intents (e.g., the spec cites the file for both narrative motivation AND raw experiment data), cite both substitute files explicitly rather than picking one.
- Types and interfaces: Grep for each type name. Confirm it exists, check its current shape (fields, members). If the spec assumes a field that does not exist or has a different name/type, record the discrepancy. If the spec references a type that doesn't exist but a structurally compatible type does (e.g., a base interface or union variant with the same fields), recommend adopting the existing type name in the spec. Do not propose new type aliases solely to match the spec's naming. Type-shape vs. type-existence: when the spec claims a TypeScript exhaustiveness, narrowing, or compile-time gating mechanism (e.g., "TS will catch a missing case", "discriminated union exhaustiveness", "`satisfies` clause enforces X"), verify that the actual type shape supports the claim — a discriminated union requires a literal-union or string-literal discriminant; a `satisfies` clause requires the asserted-against type to be narrower than the value's inferred type. A spec may correctly name a type while incorrectly assuming its shape (e.g., `kind: string` cited as if it were a discriminated `kind: 'a' | 'b' | 'c'`). Instruct Explore agents to flag open-string discriminant fields when the spec's surrounding prose assumes narrowing — agents reporting "the type exists with these 18 emitter-produced kinds" without noting the field's structural openness will let a broken exhaustiveness claim survive into the rewrite.
- Functions and methods: Grep for each function. Confirm signature, module location, and export status. Note any signature differences from what the spec assumes. After confirming the canonical name, grep the spec itself for that name AND for variant spellings (truncations, missing prefix words, common typos) — internal name inconsistencies in spec prose (e.g., `applyCardBoundary` appearing in one paragraph alongside the canonical `applyTurnFlowCardBoundary` used everywhere else) are a frequent low-cost finding the Explore agent will not surface, since the agent prompt's reference list typically carries only the canonical form.
- Dependencies (specs/tickets): If Dependencies is `None` and no Related specs are listed, skip this sub-step. Otherwise, for each dependency, verify whether it lives in `specs/`, `archive/specs/`, `tickets/`, or `archive/tickets/`. Record the correct path. If `archive/specs/` contains multiple files sharing the numeric id (reuse from renumbering or multi-spec arcs — e.g., both `16-fitl-map-scenario-and-state-model.md` and `16-template-completion-contract.md` coexisting), disambiguate by slug in the updated spec (e.g., Spec 17 [pending-move-admissibility] §3) — bare numeric references read ambiguously. If a dependency is listed as incomplete but has since been implemented, note this. For dependencies claimed as completed or in-progress, read the dependency spec's metadata (Status field) to verify the claimed status matches — a glob confirms the file exists, but only reading the Status field catches stale completion claims. Dangling-dependency check: for each listed dependency, grep the spec body for references (by spec number or name). If the dependency appears only in the metadata line with no body mention, flag as a probable spurious dependency — read the dependency's Overview/Goals to confirm whether a real relationship exists. A dependency with correct metadata but no narrative link is a dangling claim that will mislead downstream ticket decomposition. For archived dependencies, also check for associated active ticket series (`tickets/<NAMESPACE>-*`) and note them — they indicate whether the dependency's work is in progress or completed. For specs that list the assessed spec as a dependency, verify their assumptions about the assessed spec's deliverables are still valid. If the assessed spec's scope changes, flag impacted downstream specs. Related specs: For specs listed as "Related" (not dependencies), verify that the assessed spec's proposed changes don't invalidate assumptions in the related spec. Flag if a scope change would require updating the related spec.
Commit-hash citations: If a dependency cites a commit hash (e.g., "completion landed in commit abc1234"), verify via `git log --oneline <hash> -1` and `git show <hash> --stat | head -30` — confirm the commit exists, landed on the claimed date, and touched the claimed subsystem. Wrong or stale hashes silently propagate into the reassessment otherwise. Newly-proposed dependency citations: When the reassessment proposes adding new dependency citations to the spec's metadata (e.g., a predecessor archived spec surfaced during validation, a sibling ticket discovered via sub-step 7), glob-verify those paths during the audit phase before they propagate into the diff summary. Step 7's path verification will catch wrong paths, but a Step 2 audit-phase check costs one extra glob and avoids a corrective Edit round.
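The hash check above can be sketched in shell. In practice the commands run against the project repo with the spec's cited hash; the scratch repo here only exists to make the sketch self-contained, and the commit message is invented:

```shell
# Verify a cited commit hash exists and inspect what it touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=t@t -c user.name=t commit -q --allow-empty -m "completion landed"
hash=$(git rev-parse --short HEAD)
# Confirm the commit exists and see its subject/date:
git log --oneline "$hash" -1
# Inspect which files it touched:
git show "$hash" --stat | head -30
# A stale or mistyped hash fails loudly instead of propagating silently:
git log --oneline deadbee -1 2>/dev/null || echo "hash not found: deadbee"
```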
- YAML/config fields: Grep for field names in schema files, type definitions, and example YAML files. Confirm the spec's assumptions about available fields. For specs that reference DSL-level evaluation refs (e.g., `candidate.param.*`, `option.value`, `decision.name`), trace the runtime provider chain to verify which refs are available in the claimed evaluation scope — the ref namespace differs between move-scoped evaluation (candidate model) and completion-scoped evaluation (completion provider). A spec may use a valid ref name in the wrong scope. Test-command syntax: If the spec proposes new test commands (in Acceptance Criteria, Test Plan, or Commands sections), verify their syntax against the actual scripts block in the relevant `packages/<package>/package.json`. Engine tests run with `node --test`, which does not accept Jest-style `--testPathPattern` or positional filter arguments after `--`; CLAUDE.md documents this convention. A spec proposing `pnpm -F <pkg> test:unit -- some/path` will silently no-op the filter — propose `pnpm -F <pkg> build && node --test dist/test/<path>.test.js` (or the whole-suite invocation) instead.
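The invocation shape can be demonstrated runnably. The package and `dist/` paths in the real commands are placeholders; the sketch below writes a throwaway test file to show that `node --test` takes a file path directly, which a Jest-style filter after `--` would never reach:

```shell
# Demonstrate the node --test invocation shape with a throwaway test file.
workdir=$(mktemp -d)
cat > "$workdir/example.test.mjs" <<'EOF'
import test from 'node:test';
import assert from 'node:assert/strict';
test('the file path given to node --test actually runs', () => {
  assert.equal(1 + 1, 2);
});
EOF
# Wrong (filter silently ignored):  pnpm -F <pkg> test:unit -- some/path
# Right (file path handed to node --test):
node --test "$workdir/example.test.mjs"
```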
- Downstream consumers (blast radius): Contract/design specs (propose a contract, constraint, or audit scope rather than specific code modifications) are an exception — blast radius means the owner modules for each proposed audit area (~2-5 files per bullet), not all consumers; skip exhaustive import-site grep and move on to identifying the owner modules. For modification/refactor specs: for types or interfaces the spec proposes to modify, grep for all import sites and usage points in both source and test files. For re-exported symbols (e.g., barrel files or `index.ts` re-exports), trace through the re-export chain to find actual consumers, not just direct importers — a symbol with 0 direct imports may still have many indirect consumers. For functions called from multiple code paths (e.g., a shared evaluation core called by both Phase 1 and Phase 2), verify that the spec acknowledges all callers and the impact on each — shared code path modifications are a common source of unacknowledged blast radius. For non-exported types or functions (module-internal), note "internal — zero external blast radius" and skip consumer grep. Focus blast radius effort on exported symbols. Record the blast radius separately for source files (which need code changes) and test files (which need fixture/construction updates). Test migration scope is frequently underestimated in specs — flag it explicitly. Conversely, migration scope can be overestimated for data/profile YAML files — when a spec lists migration of `data/games/**` or similar data-asset directories, grep for the migrated field name across the listed paths. If zero matches, flag the migration scope as empty: the spec should either drop the deliverable or explicitly note "no migration required because no profile declares the field; runtime IR default applies." Source-code blast-radius parity is not a default for data files. For functions that currently throw errors, also grep for the error code/type across the codebase — errors propagate through call chains, so downstream catch sites may not directly import the modified function. Skip error-code grep when the spec only adds new return paths (e.g., new switch cases) without modifying or adding error-throwing behavior. For standalone scripts or CLI entry points (e.g., campaign `run-tournament.mjs` files), identify consumers by tracing the data flow (what parses their output, what invokes them) rather than grepping for import sites.
For discriminated unions whose variants the spec proposes to unify into a single shape, also grep for consumer sites that branch on discriminant properties or use `in` checks for variant-specific fields — these consumers may need updating when absent properties become undefined. Nested primitive composition vs. duplicated logic: When a spec proposes a "single source" invariant stated as a grep count (e.g., "grep returns exactly one file after migration"), distinguish nested primitive composition (internal callees of the new predicate's helpers — implementation detail) from external duplicated logic (two or more sites independently deciding the same question — the duplication the spec actually targets). The former is exempt; the latter is in scope. Classify each hit explicitly before recommending scope changes, and document exempt sites with their purpose (e.g., "enumeration surfacing, not legality"). Invariants stated as literal grep counts frequently understate the non-legality uses that must be acknowledged. Negative behavioral claims: When the agent reports a negative behavioral claim about a specific consumer that propagates into the updated spec (e.g., "caller X reads only fields Y, not Z" supporting an "unaffected" classification), spot-check by reading the consumer's call-site directly before propagating. Behavioral negatives carry the same single-agent-false-positive risk as the existence negatives the agent-conflict guidance below covers — the agent saw the call site in passing during a blast-radius sweep, but the claim becomes load-bearing once it enters the spec as an "unaffected" classification.
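The source-vs-test split can be sketched in shell. The symbol (`GameDef`) and the directory layout are invented stand-ins; in practice the greps point at the real package root and the defining module is excluded so only consumers are counted:

```shell
# Record source-file and test-file blast radius for a symbol separately.
root=$(mktemp -d)
mkdir -p "$root/src" "$root/test"
printf 'export interface GameDef { zones: string[] }\n' > "$root/src/types.ts"
printf 'import { GameDef } from "./types";\n'           > "$root/src/consumer.ts"
printf 'import { GameDef } from "../src/types";\n'      > "$root/test/consumer.test.ts"
# Exclude the defining module so only consumers are counted:
src_hits=$(grep -rl 'GameDef' "$root/src" | grep -cv 'types\.ts')
test_hits=$(grep -rl 'GameDef' "$root/test" | grep -c .)
echo "source consumers: $src_hits, test consumers: $test_hits"
```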
- Existing implementations: Short-circuit (check first): For each major proposed artifact, search the codebase by exact name. If the initial exact-name search returns zero matches, note "no existing implementation found" and skip the broader pattern search below — the proposal is genuinely novel. Refactoring/consolidation specs (splitting, merging, or reorganizing existing code): the relevant check is whether the refactoring has already been performed — e.g., the source file still exists in its pre-split form, the target module doesn't already exist, or the shared function hasn't already been extracted. Skip the broader pattern search below — it targets novel artifacts, not reorganization. Modification/optimization specs (changing an existing function's implementation, not adding or splitting): for each deliverable that proposes a code change to an existing function, read the function's current body and compare it to the spec's "Before" and "After" examples. If the current code already matches the "After" state, classify the deliverable as an Issue (already implemented) — this is distinct from Obsolescence, since one deliverable being done does not invalidate the spec's premise. This check is not covered by the short-circuit above because the function exists (it's not a new artifact). New artifact specs: For each major proposed artifact (new types, new files, new patterns), search the codebase for existing implementations with similar names or functionality. Check whether the proposal duplicates existing infrastructure. This catches specs whose premise has been overtaken by prior work. Search using both exact names from the spec AND broader patterns (e.g., if the spec proposes `CompiledTokenFilter`, also search for `tryCompile*`, `*compiler.ts`, `Compiled*` to find related infrastructure that the spec may not reference). Infrastructure-proposal specs (adding a new subsystem or declaration mechanism): also search for existing pipelines that auto-generate or synthesize the proposed artifacts — check compiler synthesis functions (e.g., `synthesize*`, `auto*`), auto-generation patterns, and compiled output fixtures (e.g., `*-game-def.json`, `dist/` files) for evidence that the proposed artifacts are already produced by a different mechanism. This catches specs that propose building plumbing which already exists under a different name or via auto-synthesis. Predecessor optimization specs: For specs proposing runtime/dispatch/eval infrastructure changes (bytecode VMs, compiled artifacts, dispatch optimizers, hash caches, AOT compilation), glob `archive/specs/*` for prior specs targeting the same subsystem (search by subsystem keyword, e.g., `*policy*`, `*eval*`, `*compile*`, `*hash*`, `*dispatch*`). If a predecessor exists and is not acknowledged in the assessed spec's Dependencies/Source, flag as Improvement (acknowledge predecessor) or Issue (reframe stale premise) — predecessor specs commonly attack the same hot path and reshape the framing of new optimizations. Contract/design specs (propose a contract, constraint, or audit process rather than code artifacts): skip this sub-step — there are no artifacts to deduplicate. Existence checks for the structures the audit will target are handled by sub-step 6's owner-module identification. Test files: Check whether tests referenced in the spec already exist. If tests exist that the spec doesn't acknowledge, classify as an Improvement — the spec should document existing test coverage to avoid duplicate work during ticket decomposition.
- Quantitative claims: If the spec states numeric metrics (file counts, function counts, workaround counts), verify them against codebase grep/glob results. Correct inaccurate numbers in the updated spec. Line counts are especially fragile — they change with every commit. Always verify line count claims explicitly (e.g., via file reads or `wc -l`), and instruct Explore agents to do the same. Theoretical projections (performance estimates, timing budgets, overhead percentages) cannot be verified via grep — note them as "theoretical, not codebase-verifiable" and move on. External measurement data (V8 profiling percentages, benchmark results, instrumentation output) is empirical but not codebase-derivable — validate the codebase patterns they reference (e.g., confirm the function exists and uses the claimed operation) but accept the measurement values as external evidence. When verifying construction site counts, instruct Explore agents to count sites producing a specific type (e.g., object literals assigned to or returned as `PolicyEvaluationCoreResult`), not all conditional spread occurrences in a file — the latter inflates counts by including unrelated patterns. When a numeric claim is technically defensible under one counting methodology but misleading under the methodology relevant to implementation (e.g., raw grep count vs. actionable count after excluding infrastructure, construction sites, and already-covered paths), recommend the spec document its counting methodology and distinguish the raw count from the actionable subset. Empirical metrics may live in campaign tooling: empirical metrics cited in specs (decision counts, score distributions, win rates, tied-decision proportions) may be computed by campaign tooling (`campaigns/<campaign>/*.mjs`) rather than emitted by engine source. If grep on engine source returns nothing for a metric the spec cites as evidence, extend the search to campaign-tooling directories before classifying the field as nonexistent.
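The raw-vs-actionable distinction can be made concrete. `CoreResult` and the fixture files below are invented for illustration — the point is that counting every mention of a symbol and counting only its construction sites are different methodologies, and the spec should say which one its numbers use:

```shell
# Raw symbol mentions vs. the actionable construction-site subset.
root=$(mktemp -d)
cat > "$root/a.ts" <<'EOF'
import { CoreResult } from "./types";          // mention only
function run(): CoreResult { return make(); }  // mention only
const r: CoreResult = { ok: true };            // construction site
EOF
cat > "$root/b.ts" <<'EOF'
const s: CoreResult = { ok: false };           // construction site
EOF
raw=$(grep -rh 'CoreResult' "$root" | grep -c .)
actionable=$(grep -rh ': CoreResult = {' "$root" | grep -c .)
echo "raw mentions=$raw, actionable construction sites=$actionable"
```

The raw count (4) overstates the actionable count (2) by including imports and return-type annotations — exactly the inflation the guidance above warns about.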
- Causal/behavioral claims: If the spec describes a root cause, failure mechanism, or execution flow, trace the actual code path to confirm. The most valuable discrepancies are often not wrong names but wrong explanations — a spec may correctly name a function but misunderstand its behavior or miss a branching path (e.g., an optimization that short-circuits the described flow, or a pipeline vs non-pipeline routing distinction the spec doesn't account for). Instruct Explore agents to trace call chains for any claimed failure mechanism.
- Campaign-scoped files: For specs modifying campaign-level files that exist in multiple campaign directories (e.g., `run-tournament.mjs`, `harness.sh`, `program.md`), verify whether the spec explicitly scopes which copies are affected. If not, flag as an Issue — the spec must state its campaign scope to avoid ambiguity during ticket decomposition.
- Parallel-draft collision check: For specs that modify, delete, or re-home specific files, grep `specs/` (the active draft directory, distinct from `archive/specs/`) for other DRAFT specs that reference those same files. If a collision exists, flag as an Issue — both drafts cannot land as written without reconciliation. Read the colliding spec's Status and scope briefly to determine whether (a) merge/absorb is viable, (b) the assessed spec should defer, or (c) scope boundaries should be clarified. This catches sideways collisions that normal blast-radius checks (import graphs, consumer sites) miss — standalone artifacts (e.g., test files, isolated fixtures) have zero import footprint but may still be the declared target of another active draft. Distinct from sub-step 4's "Related specs" handling, which covers collisions already declared via Related/Dependencies metadata; this check discovers undeclared ones.
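The collision grep can be sketched against a throwaway `specs/` layout. The spec names, the target path, and both file bodies are invented stand-ins; the real check greps the repo's actual draft directory and excludes the assessed spec itself:

```shell
# Find other DRAFT specs that reference a file the assessed spec targets.
specs=$(mktemp -d)
cat > "$specs/150-assessed.md" <<'EOF'
Status: DRAFT
Re-homes test/unit/zone-filter.test.ts into test/integration/.
EOF
cat > "$specs/151-other.md" <<'EOF'
Status: DRAFT
Rewrites assertions in test/unit/zone-filter.test.ts.
EOF
target='test/unit/zone-filter.test.ts'
collisions=$(grep -rl "$target" "$specs" | grep -v '150-assessed\.md')
echo "colliding drafts: $collisions"
```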
- Follow-on ticket consistency: For each existing follow-on ticket discovered in Step 1, compare its file paths, function names, type names, and proposed scope against the spec's. Drift between spec and ticket is an Issue — the reassessment should align them. Tickets are typically authored after the spec and against fresher codebase reads, so when they diverge prefer the ticket's naming (e.g., the ticket uses `lowerAgents` matching the codebase while the spec uses `compileAgents` — the spec is wrong, not the ticket). When the spec is ahead of the ticket (a fresh insight or scope change emerging from this reassessment that the ticket hasn't yet absorbed), flag the ticket as a candidate for re-scoping in the final summary so the user can decide whether to retire/rewrite it before running `/spec-to-tickets`. This check is bidirectional: spec→ticket drift surfaces here in Step 2; ticket→spec drift (e.g., ticket scope decisions that should propagate into the spec's body) also surfaces here. Distinct from sub-step 4 ("Dependencies — specs/tickets") which covers tickets the spec depends on; this check covers tickets that decompose the spec.
Use Explore agents to perform extraction and validation. Agent count decision table:
| Condition | Agents | Split axis |
|---|---|---|
| Behavioral claim requires running live code (diagnostic scripts, built artifacts, ad-hoc .mjs investigators) | 0 | Direct investigation via Bash/Read/Grep |
| <5 references | 1 | Single comprehensive agent |
| ≥5 references, simple or no behavioral claims | 1 | Comprehensive (structural + behavioral) |
| ≥5 references, complex behavioral claims | 2 | Agent 1: structural (files, types, signatures, blast radius) / Agent 2: behavioral (claim tracing, code path validation per Step 2.9) |
| Spec touches multiple layers (engine + campaign + skill) | 2 | Split by layer, not by validation type |
A behavioral claim is "complex" if tracing it requires reading 3+ functions across 2+ files (e.g., control flow through a pipeline, failure mechanisms spanning multiple modules, multi-layer dispatch). Single-file claims are structurally simple regardless of function count — even 4+ functions within one file don't warrant a second agent.
Direct investigation (0 agents): When validating a behavioral claim requires running a diagnostic script against built artifacts (e.g., probing a classifier's actual return value on a specific state, confirming a failure reproduces under specific seeds, verifying runtime type shapes against dist/ output), drive the investigation directly via Bash/Read/Grep and author an ad-hoc .mjs script. Agent delegation is ill-suited to live-code traces because agents cannot fluidly iterate against compiled output. Static reference verification (file paths, type existence, function signatures, blast radius) stays on the default agent path — only the live-code behavioral claim is direct. See also the "In-session diagnostic investigations" block under Step 5's deferred-decision handling for the script pattern and check-in decision.
Structural vs. behavioral complexity: "Complex" targets execution tracing depth, not artifact breadth. Reading many files to inspect static content (assertion shapes in test files, profile/enum declarations in data files, Status fields in spec metadata, marker comments, config keys) is structural inspection and counts as simple regardless of file count. A reassessment that touches 10+ files but only reads them for static content (does the type exist? what does this assertion check? is this spec archived?) fits the "simple or no behavioral claims" row. Upgrade to 2 agents only when the claims require following runtime control flow (e.g., "the error propagates through chain X", "pipeline branch Y short-circuits before reaching Z", "this function is called from both Phase 1 and Phase 2 dispatch paths").
Contract/design specs: Apply the behavioral-complexity criterion one tier lower — for specs that propose a contract, constraint, or audit scope rather than specific code modifications, a claim that would be "complex" for a modification spec is typically "simple or no behavioral claims" for a contract/design spec. The agent reports owner modules rather than tracing precise control flow, so 1 agent suffices even when the claim nominally crosses 3+ functions across 2+ files. Blast radius coverage remains mandatory regardless; do not scale below one agent.
Fresh-spec context scaling: When the spec's subject matter was substantially traced earlier in the same session — via spec authorship, in-session debugging of the same code paths, dependency completion, or other reconnaissance-equivalent work (e.g., Claude debugged a bug then drafted the spec, or completed a dependency spec moments before reassessing this one) — scale down by one agent. Prior conversation context substitutes for one agent's reconnaissance pass. Blast radius coverage remains mandatory regardless; do not scale below one agent.
Provide each agent with either the full spec content or a comprehensive structured extraction of all references in its scope. If summarizing, ensure you capture every reference from every section — including those embedded in prose, code blocks, tables, and footnotes. The goal is completeness, not format. This is read-only — agent-based exploration is safe and significantly faster.
Blast radius is mandatory: The agent prompt MUST explicitly request blast radius analysis — grep for all import sites and consumer files of any type or interface the spec proposes to modify. Instruct Explore agents to separate source-file consumers from test-file consumers in their blast radius analysis — test migration scope is frequently underestimated. This is the highest-value output from the Explore agent and must not be omitted.
Path verification is mandatory: Instruct Explore agents to glob-verify every file path they report — especially test file paths, which commonly live under `test/unit/<subsystem>/`, `test/integration/`, or sibling subdirectories that look interchangeable but are not. Agents must either cite a path they have confirmed via glob OR explicitly label it as unverified (e.g., "glob returned no match — likely at X based on naming convention"). Agent-reported paths propagate directly into the draft spec; a wrong path survives until Step 7 and forces a post-write correction. Catching it at the agent boundary is cheaper. Test-placement convention recommendation: When a proposed test-file's parent directory doesn't exist, instruct agents to identify the established convention for that test category by globbing existing tests of the same kind (e.g., `test/**/*replay-identity*.test.ts`, `test/**/*-golden.test.ts`, `test/**/*canary*.test.ts`) and recommend the conventional placement in the findings — not just "directory missing". The reassessment should propose the correct path so the spec rewrite lands on it directly.
Convention verification is mandatory: For convention/protocol claims (commit-body conventions, test-class markers, branding patterns, naming protocols), instruct agents to grep .claude/rules/, CLAUDE.md, AGENTS.md, and docs/ for canonical documentation in addition to source code and git log. Conventions may be documented in markdown rule files but never yet exercised in git history — a git log --grep returning empty does not mean the convention is unconfirmed; it may simply be the first proposed use.
When multiple Explore agents return conflicting findings on the same factual claim (e.g., one agent says a parameter exists, the other says it doesn't), treat the conflict as unresolved. Verify the claim directly by reading the authoritative source file before classifying the finding. Do not prefer one agent's result over another without independent verification — agent conflicts are the highest-signal indicator that a claim needs manual tracing.
For quantitative claims (file counts, call site counts), spot-check at least one agent-reported number against an independent grep before using it in findings or in numbers that propagate into the updated spec — agent grep strategies may over- or under-count (e.g., counting all files importing a barrel re-export rather than files specifically using the target symbol). Numbers that flow into D12 wave plans, D8 retirement notes, or ticket decomposition hints carry the same propagation risk as numbers in the findings report.
For enumeration-completeness claims from a single agent (e.g., "all import sites of X are Y, Z"), run an independent `grep -rn '<symbol>' <package-or-repo-root>/src/` after the agent returns to verify the consumer list is complete before propagating into the spec. A single-agent enumeration carries the same false-completeness risk class as single-agent negatives — the agent may have stopped at the expected sites without exhaustively traversing the codebase. Mandatory for any symbol the spec flags for deletion or rename, in both 1-agent and 2-agent dispatch modes.
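The independent re-enumeration can be sketched as follows. The symbol (`lowerAgents`) and the two-module layout are invented; in practice the grep runs over the real package's `src/` and the count is compared against the agent's reported consumer list:

```shell
# Independently re-enumerate consumers of a symbol slated for deletion.
root=$(mktemp -d)
mkdir -p "$root/src/a" "$root/src/b"
printf 'export function lowerAgents() {}\n'           > "$root/src/b/lower.ts"
printf 'import { lowerAgents } from "../b/lower";\n'  > "$root/src/a/use1.ts"
printf 'import { lowerAgents } from "./lower";\n'     > "$root/src/b/use2.ts"
# Show every site with line numbers, as the manual check would:
grep -rn 'lowerAgents' "$root/src"
# Count consumers, excluding the defining module itself:
consumers=$(grep -rl 'lowerAgents' "$root/src" | grep -cv 'lower\.ts')
echo "independent consumer count: $consumers"
```

If the agent reported only one consumer, the count of two here is the discrepancy this guardrail exists to catch.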
For negative existence claims from a single agent (file missing, export not found, function absent), always spot-check via direct Glob or Grep before classifying as a finding. Single-agent negatives are a common false-positive source because the agent's search strategy may have missed the artifact even when present — e.g., the file lives in a different module than the agent expected, or the symbol is re-exported through a barrel the agent didn't traverse. Multi-agent conflict resolution (above) doesn't cover this case because there's no second agent to disagree; the spot-check is the equivalent guardrail for single-agent mode.
After the Explore agent returns, always read the cited code region (roughly ±30 lines around any line number the spec references) directly. Agent summaries capture what was asked; cited-line regions reveal adjacent logic the spec may silently modify, replace, or drop. This is the highest-value manual verification step — blast-radius grep finds call sites, but only reading the cited region catches cases where the spec's proposed replacement prologue inadvertently orphans downstream logic that was executing under the same conditional. If the spec references multiple cited lines (e.g., legal-moves.ts:710, policy-agent.ts:134, random-agent.ts:30), read each region. The agent handles breadth (types, signatures, blast radius); you handle depth (code flow, pseudocode accuracy, insertion points, adjacent-logic preservation).
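The ±30-line cited-region read can be mechanized so the boundaries are never eyeballed. The file and cited line here are synthetic stand-ins for a real citation like `legal-moves.ts:710`:

```shell
# Extract roughly ±30 lines around a cited line number before trusting
# an agent summary of that region.
file=$(mktemp)
seq 1 100 | sed 's/^/line /' > "$file"   # 100-line stand-in source file
cited=50
start=$(( cited - 30 )); [ "$start" -lt 1 ] && start=1
end=$(( cited + 30 ))
region=$(sed -n "${start},${end}p" "$file")
lines_read=$(printf '%s\n' "$region" | wc -l)
echo "read $lines_read lines around line $cited"
rm -f "$file"
```

In practice the Read tool's offset/limit parameters play the role of `sed -n` here; the clamp to line 1 matters when the citation sits near the top of the file.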
After Explore agents return, verify spec/ticket dependencies separately if not covered by agent prompts — a quick glob for each dependency path (specs/<id>*, archive/specs/<id>*) is sufficient. This check is easy to overlook when drafting agent prompts focused on types and blast radius.
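A minimal sketch of the dependency glob, assuming hypothetical dependency ids `104`, `105`, and `999` — substitute the ids the spec's Dependencies field actually lists:

```shell
# For each dependency id, confirm a matching spec exists in either the
# live or the archive directory; anything unmatched is a finding.
tmp=$(mktemp -d)
mkdir -p "$tmp/specs" "$tmp/archive/specs"
touch "$tmp/specs/104-zone-tokens.md" "$tmp/archive/specs/105-observer.md"

missing=0
for dep in 104 105 999; do
  # Unmatched globs pass through as literals; ls errors are suppressed,
  # so the stdout line count is the number of real matches.
  found=$(ls "$tmp"/specs/"$dep"* "$tmp"/archive/specs/"$dep"* 2>/dev/null | wc -l)
  if [ "$found" -eq 0 ]; then
    echo "dependency $dep: no spec found"
    missing=$(( missing + 1 ))
  fi
done
echo "$missing unresolved dependency reference(s)"
rm -rf "$tmp"
```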
Do not present findings yet. Collect everything for Step 3.
Step 3: FOUNDATIONS.md Alignment Check
Review each section of the spec against docs/FOUNDATIONS.md:
- If the spec has a FOUNDATIONS.md Alignment table, verify each claimed alignment is accurate. Flag any principle the spec claims to satisfy but actually violates. Beyond the alignment table itself, also scan the spec body for explicit Foundation invocations by number (e.g., "per F#14", "satisfies F#19") used to justify a design choice — re-read the cited Foundation's text and verify its plain reading supports the spec's claim. Specs sometimes invoke a Foundation as authority while contradicting its substance; this failure mode is not caught by the "did the spec address F#N" checklist below because the spec did address it, just incorrectly.
- Identify any Foundation principle the spec does not address but should, given its scope. Pay particular attention to:
- Foundation 1 (Engine Agnosticism) — does the spec introduce game-specific logic in engine code?
- Foundation 2 (Evolution-First) — does the spec put rule-authoritative data outside YAML?
- Foundation 8 (Determinism) — does the spec introduce non-deterministic behavior?
- Foundation 11 (Immutability) — does the spec mutate state?
- Foundation 14 (No Backwards Compatibility) — does the spec leave compatibility shims or defer migration?
- Foundation 15 (Architectural Completeness) — does the spec patch a symptom instead of fixing root cause?
- Record each alignment issue with the specific Foundation number and what conflicts.
Step 4: Classify Findings
Organize all findings from Steps 2 and 3 into the following categories:
- Issues: Something in the spec is factually wrong, stale, or violates FOUNDATIONS.md. The spec cannot go to tickets without fixing this.
- Pre-change state described in present tense: A recurring variant worth calling out explicitly. Specs authored as post-incident or post-ticket analyses often describe the codebase state as it was at the time of the triggering event using present tense (e.g.,
X.test.ts currently asserts stopReason === 'terminal'), even after the triggering change has since landed and the described state is no longer current. Classify these as Issues — the reader will parse the claim as authoritative-now and may inherit an out-of-date mental model into ticket decomposition. The fix is typically tense correction plus an explicit "resolved by commit/spec X; this spec formalizes the lesson going forward" framing, not removal of the historical context — the motivation remains load-bearing, only the tense is wrong.
- Obsolescence: The spec's core proposal is already implemented, superseded, or invalidated by codebase evolution. The entire premise needs rethinking, not just refinement.
- Improvements: The spec is not wrong, but a refinement would make the implementation cleaner, safer, or more aligned with existing patterns.
- Additions: A feature or deliverable not in the spec that would be beneficial and aligns with the spec's stated goals. Apply YAGNI ruthlessly — only propose additions that are natural extensions of the spec's scope, not tangential features.
For each finding, record:
- What the spec says (or omits)
- What the codebase actually has (with file paths and line references)
- The recommended change to the spec
Cascading corrections: When a finding has ripple effects on other spec sections (e.g., adding a writer module means updating the writer count in the overview, adding acceptance criteria entries, updating the In Scope description), identify all affected sections during classification. When removing an entire section, trace all references to it: overview counts ("three layers" → "two"), problem statement references, acceptance criteria items, and dependency lists. Pay special attention to the Problem Statement — it often contains summary numbers (line counts, family counts, file counts) that must match the Evidence section after corrections. For specs with embedded code blocks, pseudocode, or ASCII diagrams that reference codebase names, trace name corrections through all such blocks — they are the most common source of stale references surviving a partial update. Path/line citations and named function references in prose commonly appear redundantly across multiple sections — Source/Evidence, Problem Statement, Brainstorm Context, User Constraints, motivation/why-this-matters prose, Risks, Out-of-Scope, and ASCII diagrams or table cells (especially FOUNDATIONS alignment rows). A citation like kernel/zobrist.ts:158, cited as evidence for the failure mode, is often re-cited in the Problem Statement when explaining the mechanism, and again in the brainstorm narrative when describing the gap. Treat the section list as illustrative, not exhaustive — the pre-draft sentinel grep in Step 6.1 enumerates the actual occurrences regardless of section.
When a finding corrects a citation token, type/function name, or substituted concept (e.g., a narrative reframing like "AST evaluator" → "closure-tree evaluator", or a deprecated identifier replaced by its successor), grep the entire spec for the old token/term before drafting and queue ALL occurrences as edits — including occurrences inside ASCII diagrams, pipeline boxes, table cells (especially FOUNDATIONS alignment rows), and edge-case bullets, not just prose. Don't rely on Step 7 verification to catch the secondary site (Step 7 will catch it but only after the user sees the corrective round-trip). The diff summary in Step 6 must cover all sections that will change, including cascading corrections — not just the primary finding location. For reassessments with >15 findings or complex ripple effects, tag each finding in the Step 5 report with its cascading scope: append "(cascading: also affects <section names>)" to give the user early visibility. For smaller reassessments, the diff summary in Step 6 provides sufficient visibility — cascading tags in Step 5 are optional.
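The whole-spec token grep can be sketched concretely. The spec content and the replaced term ("AST evaluator") are synthetic stand-ins — the point is that occurrences hide in diagrams and tables, not just prose:

```shell
# Sentinel grep: enumerate every occurrence of the term being replaced
# before drafting, so each one is queued as an edit.
spec=$(mktemp)
cat > "$spec" <<'EOF'
## Overview
The AST evaluator walks the tree.
## Problem Statement
Because the AST evaluator allocates per node, corrections cascade.
## Diagram
  [parser] -> [AST evaluator] -> [scorer]
EOF

# -n gives line numbers for the edit queue; -F treats the term as a
# fixed string, so tokens like 'kernel/zobrist.ts:158' need no escaping.
occurrences=$(grep -nF 'AST evaluator' "$spec" | wc -l)
echo "$occurrences occurrence(s) to queue as edits"
rm -f "$spec"
```

Note the third hit lives inside an ASCII diagram box — exactly the kind of secondary site a prose-focused pass misses.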
Handling Obsolescence: When a spec's core proposal is already implemented or superseded, present two options: (a) archive the spec as completed/superseded, (b) rewrite with a narrower scope targeting the remaining gap. Wait for user direction before proceeding to Step 6. The Obsolescence fork is a scope decision (archive vs. scoped rewrite) and is always blocking, even in auto/non-plan mode — the "inline recommendation for design preferences with clear defaults" clause (Step 5) does not apply here, because this is a scope decision, not a design-detail preference. Present both options with a recommendation, then explicitly wait for user direction. If the user chooses rewrite, the Step 6 output may be a substantially new spec rather than a refinement of the existing one. Partial obsolescence: If the spec's infrastructure is already implemented but its configuration/usage goal remains unachieved (e.g., the engine supports metrics but the game data doesn't declare them), classify the infrastructure mismatch as Issues (wrong approach) rather than Obsolescence (invalid premise). Reserve Obsolescence for cases where the entire stated goal — not just the implementation approach — is already satisfied.
Scope-collapsing reframing: When findings collectively reveal that the spec's proposed implementation approach is substantially wrong (e.g., engine changes proposed when only data/configuration changes are needed), recommend reframing the spec's Proposed Changes section entirely rather than patching individual claims. Flag the Complexity metadata field for re-evaluation — approach simplification often reduces complexity. Note the reframing in the diff summary so the user sees the full scope of the change.
Scope-expanding follow-up: If the reassessment surfaces a related but independent concern that would expand this spec's scope (e.g., a downstream symptom visible in the same reproduction but driven by a different root cause), do not add it here. Record it as an Improvement that recommends a follow-up spec, preserving the insight without violating YAGNI. The current spec stays focused; the new concern becomes its own reassessable unit.
Step 5: Present Findings
Present all findings to the user in a structured report:
## Reassessment: <spec-name>
### Codebase Status
[Include when the spec extends, modifies, or depends on existing infrastructure. Omit only when the spec proposes entirely novel artifacts with no existing counterpart.]
<Brief summary (3-7 typical, up to 9 acceptable when each anchor genuinely contextualizes ≥2 findings) of what already exists in the codebase that is relevant to the spec's proposal. Helps contextualize all subsequent findings. Focus on implementation status (not yet started / partially implemented / infrastructure exists but not wired up) and any surprising pre-existing artifacts (e.g., test files that already exist, partial prototypes). Include only facts that anchor multiple findings or contextualize the gap-vs-status framing — facts unique to a single finding belong in that finding, not here.>
### Issues (must fix)
[If none: "No issues found."]
1. **<title>** — <what the spec says> vs. <what the codebase has>. Recommendation: <change>.
### Obsolescence (premise invalidated)
[If none: omit this section entirely.]
1. **<title>** — <what the spec proposes> is already implemented / superseded by <what exists>. Options: (a) archive the spec, (b) rewrite with a new scope targeting <remaining gap>.
### Improvements (should fix)
[If none: "No improvements found."]
1. **<title>** — <current spec text> could be improved because <reason>. Recommendation: <change>.
### Additions (consider adding)
[If none: "No additions proposed."]
1. **<title>** — <what's missing> would be beneficial because <reason>. Recommendation: <new section or deliverable>.
### FOUNDATIONS.md Alignment
- <Foundation N>: <aligned | issue description>
### Questions
[If none: "No questions."]
1. <question>
Question discipline: Ask at most 3 questions in this initial report. If you have more than 3, prioritize the ones that block further reassessment and defer the rest to a follow-up round after the user responds. Zero questions is correct when all findings are factual (wrong path, wrong name, wrong count, missing acknowledgment of existing coverage) and no design preference is in play — close the report with "Waiting for approval" rather than a synthetic confirmation question. Defaulting to at least one question purely for form adds ceremonial friction, especially under auto mode. Design preferences with clear defaults in non-plan/auto-mode flows: a design preference with a clearly-superior default may be presented inline as a recommendation ("Recommendation: X") rather than a blocking question — the user's next message can approve, override, or ask for more detail. Reserve AskUserQuestion for cases where the default is genuinely uncertain or multiple options each have real tradeoffs. The 'zero questions' license therefore applies both when findings are factual and when the only design preferences in play have clear defaults; synthetic confirmation questions add friction in either case. Single-judgment-call exception: When the only open item is a single design preference whose alternatives have real consequences (not stylistic — e.g., choosing a lowering source between two viable IRs, picking a refactoring boundary between two valid extraction points, or selecting between two implementations whose tradeoffs the user might want to consider), asking one explicit question is acceptable even with a clear default. Explicit confirmation prevents misalignment when the recommendation is non-obvious to the user; the user's delegation ("proceed with your recommended choice") or override resolves it cleanly and triggers Exception A1 in Step 6. The "no synthetic questions" rule still applies — don't manufacture confirmation questions for findings the user has already approved or for trivially-defaulted preferences.
(Plan mode retains its stricter AskUserQuestion-for-design-preferences rule under scenario 2/3 below — the non-blocking inline recommendation pattern is specific to non-plan/auto-mode flows where the user's next message is a natural approval gate.)
Wait for user response. Do not proceed to Step 6 until the user has approved, rejected, or modified each finding and answered all questions. (Exception: in plan mode scenario 1 — all factual, no questions — ExitPlanMode serves as the approval gate; proceed directly to the plan file write without an explicit wait.)
Non-plan-mode scenario 1 equivalent: When all findings are factual-only (same criteria as plan mode scenario 1: wrong paths, wrong names, wrong counts, missing acknowledgment of existing coverage — no design preferences) and the resulting diff is mechanical (each finding maps to a deterministic text change), present the findings report AND the Step 6.2 diff summary in the same response. The user's single approval then covers both the Step 5 findings gate and the Step 6.3 final-approval gate — proceed directly to Step 6.4 (write) on approval. This mirrors plan mode scenario 1 structurally and eliminates a round-trip when the content warrants it. If any finding is a design preference, or if the diff requires judgment beyond mechanical substitution, fall back to the standard two-gate flow. Cross-reference: Step 6 "Non-plan mode" recognizes this bypass and skips 6.1-6.3 when taken.
Plan mode: Present the full findings report as text first (the structured report from Steps 2-4), then handle questions based on the scenario:
- All factual, no questions: All findings are unambiguous factual corrections (wrong names, wrong paths, wrong counts, or missing documentation of verified codebase facts). Present findings inline → immediately write the plan file with the diff summary as "Approved Changes (Diff Summary)" → call ExitPlanMode. The plan file write happens here in Step 5 (not in Step 6) — this is the only place it is written for scenario 1. The ExitPlanMode approval gate subsumes the Step 5 wait. Factual vs. design preference heuristic: A finding is factual if only one correct answer exists (wrong file path, wrong type name, wrong count). A finding is also factual when the spec states a general rule (e.g., "intern all branded domain identifiers") but omits a concrete instance that clearly falls under that rule — the omission has one correct answer (include it). A finding is a design preference if multiple valid presentations exist (how much detail to include, which format, which ordering). Improvements that propose a different level of detail or documentation granularity are design preferences, even when one option is clearly better — present the default as "(Recommended)" in
AskUserQuestion but let the user override. Design preferences belong in scenario 2 or 3, not scenario 1. Tiebreaker for blended cases: When a finding is simultaneously "general rule + omitted concrete instance" and "more documentation granularity", prefer factual classification if the concrete instances are already knowable from the codebase at audit time (a deterministic glob/grep surfaces them). The instances are data, not design — the only judgment is whether to include them, and a general rule already in the spec settles that.
- Mostly factual, 1-2 blocking questions: Present findings inline → use
AskUserQuestion for only the blocking questions. In the inline ### Questions section, reference the AskUserQuestion call by number (e.g., "See questions below for selection") rather than duplicating the full question text.
- Multiple questions (up to 3): Present findings inline → use a single
AskUserQuestion call containing all questions (the tool supports up to 4 per invocation). This gives the user full context before being asked to decide.
In all cases, inline text questions won't block in plan mode — AskUserQuestion is required for blocking interaction.
If the user's answers raise new questions or invalidate previous findings, present a follow-up round (same format, same question limit). Repeat until all findings are resolved.
If the user defers one or more decisions back to you (e.g., "you decide", "reassess based on FOUNDATIONS"), first validate any additional context the user provided with their deferral (e.g., facts about how other games use the system, claims about existing architecture, OR authoritative source pointers the user directs you to read — rules files, card definitions, reference implementations) against the codebase using read-only tools — the FOUNDATIONS recommendation is only as sound as the facts it's based on. Then analyze the question against docs/FOUNDATIONS.md principles, present your recommendation with the specific Foundation justification, and treat it as approved unless the user objects. If the recommendation does not expand the spec's scope (e.g., it only adds a clarifying note or corrects a factual claim), treat it as approved immediately without blast radius re-assessment. If the recommendation expands the spec's scope (e.g., adding a prerequisite refactor justified by Foundation 15), re-assess the blast radius for the expanded scope before treating the recommendation as approved — add any newly-affected files to the findings. Scale the analysis depth to the question type:
- Simple factual questions: A one-line Foundation reference suffices.
- Design questions with a clear FOUNDATIONS answer: Evaluate each option against the relevant Foundations. Identify the decisive Foundation(s) — the one(s) that make one option clearly superior. Lead the recommendation with that Foundation. Provide a focused paragraph — state the recommendation, cite the Foundation(s), and briefly explain the implementation consequence. Do not enumerate alternatives at length. If reaching the recommendation requires briefly explaining why alternatives are inferior, a short comparison is acceptable — the prohibition targets open-ended design discussions where no Foundation is decisive, not cases where the comparison itself demonstrates the Foundation's applicability.
- Architectural questions with multiple viable alternatives: Provide a brief comparison of alternatives against FOUNDATIONS.md principles before presenting the recommendation.
When multiple questions are deferred simultaneously, present the FOUNDATIONS analysis for each in a single reply using clear subheadings (one per deferred question) — this preserves the user's context and avoids serializing decisions that aren't dependent. If one deferred decision's resolution affects another (e.g., the reporter-mechanism choice in one question moots a syntax question in another), present the dependency explicitly and resolve in order.
Optional structural template for scope-expanding recommendations: When a FOUNDATIONS-derived recommendation expands the spec's scope (per the blast-radius re-assessment rule above), conclude the analysis with a brief "Blast radius of the scope expansion" subsection listing new files touched, functions deleted/relocated, new test coverage required, and any observable behavioral shifts (e.g., diagnostic surface changes). This keeps expansion impact visible in one place rather than inlined across prose and gives the user a concrete basis for approving or objecting before the draft is written.
If a deferred question is rendered moot by another approved decision (e.g., the section containing the threshold is removed), note the mooted state and move on — no FOUNDATIONS analysis needed.
Step 5 → Step 6 collapse when deferred decisions auto-approve: If resolving a deferred decision is the last open item in Step 5 AND the FOUNDATIONS-derived recommendation does not expand scope (already treated as approved per the non-scope-expansion rule above), present the Step 6.2 diff summary in the same reply as the FOUNDATIONS analysis — do not split into two round-trips. The user's next message then covers both the deferred-decision approval and the Step 6.3 final approval in one. This is a common flow when a user defers the last open question with phrasing like "you decide" or "determine what's best"; a separate round-trip to re-present the diff summary adds latency without giving the user anything new to evaluate. If the recommendation does expand scope (blast radius re-assessment required), keep Steps 5 and 6 separate — the user needs a chance to reject the scope expansion before the diff summary bakes it in. Scope-expansion vs collapse are mutually exclusive paths: if you ran the "Blast radius of the scope expansion" structural template for a recommendation, you are in scope-expansion mode — do not also collapse Steps 5→6. Equivalently: the collapse is only for recommendations that change zero type surfaces, switch arms, test files, or proposed modules. Formalizing a placeholder already named in the spec's narrative (e.g., lifting a pseudocode term like ExplicitStochasticNode into a concrete type variant) still counts as scope expansion under this rule because the type system gains a new variant — present the blast radius, wait for approval, then draft. User phrasing like "no matter the blast radius" authorizes accepting a wider radius but does not waive the round-trip: the user still needs an opportunity to object before the diff summary locks the expansion in. 
Scope-collapsing reframings are NOT eligible for the collapse either, even though they technically do not expand scope: the structural impact of collapsing the spec's proposed approach (removing proposed modules, error classes, stop reasons, or substantial sections) warrants an explicit user-approval round-trip. Keep Steps 5 and 6 separate in this case — the user needs an opportunity to reject the collapse before the full rewrite is drafted. Option-fork approval in Step 4's Obsolescence handling (option (a) archive vs. option (b) rewrite) IS that rejection opportunity — once the user chooses rewrite, drafting may proceed and Step 6.3 covers only diff-summary approval, not a second chance to reject the collapse itself. Exception B does not apply until the diff summary has been presented inline.
If the user requests deeper analysis of a specific finding before deciding, perform the investigation using read-only tools (reading additional source files, tracing call chains, etc.) and present updated findings before re-asking for approval. This investigation round does not count toward the follow-up question limit — it is resolution of the original question, not a new question.
If the user answers a question with uncertainty plus a request to investigate (e.g., "I don't know — investigate", "not sure, check the codebase"), treat it as a combined investigation-then-approval flow: perform the read-only investigation, present the result inline as an updated finding, and continue to the next pending question or the diff summary without re-asking the original question. Only re-ask the original question if the investigation surfaces new alternatives the user must choose between; otherwise, the investigation result IS the answer.
New findings surfaced during investigation or draft preparation: If investigating a deferred question OR closely re-reading the spec while preparing the Step 6.1 draft surfaces a finding not in the original findings report (e.g., reading a user-pointed source reveals that a claim in the spec is wrong that wasn't caught in Step 2; or noticing a wrong test command in an Acceptance Criterion while drafting its replacement), treat it as a net-new Issue/Improvement/Addition, present it inline alongside the deferred-question answer or in the diff summary with brief acknowledgment, and re-evaluate whether it triggers scope-collapsing reframing per Step 4. Both deferred-decision investigations and draft preparation are common paths for net-new findings — investigations read sources outside the initial Explore agent scope (authoritative rules documents, specific data files), and draft preparation forces revisiting every section the diff summary will mention. Do not silently absorb such findings — present them as separately classified findings so the user can approve or reject them on their own terms.
In-session diagnostic investigations: When a deferred question requires answering a behavioral claim that static tracing cannot settle (e.g., "what does this classifier return on this state?", "does this failure reproduce on seed X?", "what's the actual shape of this runtime value?"), author an ad-hoc diagnostic script rather than speculating. The pattern:
- Where to place it: sibling to existing campaign diagnostics (e.g.,
campaigns/<campaign>/diagnose-*.mjs), imitating the style of the campaign's existing diagnose-*.mjs files. Import from packages/engine/dist/ to avoid a full-build loop when the engine is already compiled.
- Scope: minimal reproduction — instrument the exact claim being validated, not the surrounding subsystem. One script per question.
- Check-in decision: default is yes, if the script's finding reshapes the spec's Problem Statement or Source section — it becomes a reproducible I0 fixture cited from the spec. Otherwise delete after use. The final summary in Step 7 should record this decision so the user sees the outcome.
- Precedent:
campaigns/fitl-arvn-agent-evolution/diagnose-existing-classifier.mjs (reassess-spec session on 2026-04-19) — exemplar for the pattern: imports from dist/, minimal reproduction (probes classifyMoveDecisionSequenceAdmissionForLegalMove directly on captured state), checked in alongside the reframed spec it informed.
This pattern is specifically enabled by the 0-agent row in the Step 2 agent table: running live code is outside agent capability and is the direct-investigation case. The script's finding should propagate into the spec's Source and/or Problem Statement sections as a checked-in I0-equivalent fixture before the rewrite.
Step 6: Write the Updated Spec
Plan mode: All scenarios proceed to Step 6.4 after ExitPlanMode approval — the updated spec MUST be written before Step 7's verification can run. Concretely:
- Scenario 1: the plan file was already written in Step 5 and approved via ExitPlanMode. Skip Steps 6.1–6.3 (draft/present/wait — all subsumed by the plan file and its approval). Proceed directly to Step 6.4 (write the updated spec), then Step 7.
- Scenarios 2-3: Skip Steps 6.1-6.3. The plan file's "Approved Changes (Diff Summary)" section (written in Step 5) serves as the draft and presentation. ExitPlanMode approval covers this gate — proceed directly to Step 6.4 after plan approval, then Step 7.
Do not conflate "plan file written" with "spec file written" — the plan file is a durable artifact recording the intent; the spec file is the deliverable that the user actually consumes.
Non-plan mode: After all findings are resolved and the user has approved the changes. Scenario 1 bypass: if Step 5's non-plan-mode scenario 1 equivalent was taken (findings + diff summary presented together and user approved in one response), Steps 6.1-6.3 are already completed inline; proceed directly to Step 6.4. Otherwise, proceed through 6.1-6.4 sequentially:
- Draft the updated spec incorporating all approved changes. Preserve the spec's existing structure and voice. Do not rewrite sections that have no findings — change only what was agreed upon. Pre-draft sentinel grep: Before drafting, for every term being substituted (citation token, line number, type/function name, deprecated identifier, replaced concept), run
grep -nF '<old-term>' <spec-path> against the spec. Queue every occurrence as an edit regardless of section — including Brainstorm Context, Problem Statement, User Constraints, motivation/context prose, ASCII diagrams, pipeline boxes, table cells, FOUNDATIONS alignment rows, and edge-case bullets, not just the section the original finding pointed at. Step 7 verification will catch stragglers, but at the cost of a corrective round-trip the pre-draft grep eliminates. (This operationalizes the cascading-corrections rule from Step 4.)
- Present the diff summary to the user as a numbered list:
N. **<section name>**: <brief change description> (one line typical; sub-bullets permitted when a single section's change spans multiple coordinated artifacts, e.g., a new module plus its migration sites). Include metadata field changes (Status, Priority, Complexity, Dependencies) in the diff summary when they change as a consequence of the reassessment findings — these affect downstream ticket decomposition. Format self-check before drafting: explicitly count how many of the spec's sections you changed — the raw >60% threshold is easy to miss implicitly when you're mid-draft. Compute the ratio as (modified + added) / (post-edit total section count) — additive sections (entirely new sections appended without removing or modifying existing ones, e.g., adding a Follow-On Tickets section) count as 1 changed section each in the numerator and 1 in the denominator. Common triggers that cross the threshold (use these as operational cues rather than counting sections one by one): Obsolescence-driven rewrites, A-vs-B fork resolutions that delete one option entirely, scope-collapsing reframes, multi-finding reassessments where the section skeleton changes (sections renamed, removed, merged, or reordered). Surgical paragraph-level edits across many sections do NOT trigger the threshold even if the raw section count crosses 60% — when section headings, sub-headings, and structural ordering are preserved, the numbered diff list remains the right format because each entry maps to a localized change the user can audit. Operational test: count sections whose headings or sub-headings changed plus sections that were added or removed wholesale, divided by post-edit total sections. If that ratio is <60%, use the numbered diff list regardless of how many sections had prose edits. 
For full or near-full rewrites (>60% of sections materially changed), replace the numbered diff list with a prose structural summary (2-4 sentences noting what was reframed, removed, or added) followed by the full draft inline — a numbered diff format implies surgical changes, which is misleading when the spec is being restructured.
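The format self-check arithmetic from the step above can be sketched as follows. The counts are hypothetical — substitute the real tallies from the draft:

```shell
# (modified + added) / (post-edit total section count), as integer
# percent, then pick the diff-summary format the threshold dictates.
modified=4
added=1
post_edit_total=9   # original sections, minus removed, plus added

ratio=$(( (modified + added) * 100 / post_edit_total ))
echo "changed ${ratio}% of sections"
if [ "$ratio" -ge 60 ]; then
  echo "use the prose structural summary + full draft inline"
else
  echo "use the numbered diff list"
fi
```

Remember the operational test narrows the numerator: only sections whose headings changed or that were added/removed wholesale count — paragraph-level prose edits inside a preserved skeleton do not.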
- Wait for final approval before writing the file. Two exception paths bypass the wait:
- Exception A — pre-approval via delegation, bulk approval, or direct forward command: Three patterns trigger this bypass. (1) Explicit delegation of outstanding decisions in the user's most recent message (e.g., "you decide", "proceed with whatever you think is best", "reassess based on FOUNDATIONS", "choose whatever seems better"). When the open question came from Step 5's Single-judgment-call exception, the user picking the recommended option is the common delegation form — phrasings include "go with the recommendation", "recommended X", "(c) recommended", "1) recommended", "pick your recommendation". The delegation (in either form) itself serves as approval. (2) Bulk approval of all findings when the diff is mechanical (e.g., "Approved", "Approved.", "approve all findings", "apply all corrections", "accept everything", "approve all five corrections") — the user has individually approved each finding and the diff is a deterministic consequence, so a separate Step 6.3 round-trip adds no information. (3) Direct forward command after the findings approval gate (e.g., "Proceed", "Go", "Continue", "Move on") issued in response to a Step 5 findings report that closed with "Awaiting approval" or equivalent — the command approves the findings as written and applies regardless of whether the diff is mechanical or requires composition. Pattern (3) is distinct from (2) because the user has not signaled bulk-approval-with-mechanical-diff intent; they have issued a forward command that covers the next gate. It is also distinct from Exception B (which requires the diff summary to have already been presented). In all three cases, present the diff summary inline and proceed directly to Step 6.4 without re-asking. This parallels the plan-mode bypass where ExitPlanMode approval covers the diff summary gate. 
The delegation, bulk approval, or forward command must be explicit and cover all open questions — partial delegation (e.g., one question deferred, another still open) does not trigger the bypass. If the non-plan-mode equivalent of Step 5's scenario 1 was already used, Exception A is typically redundant — scenario 1 covers the same case structurally and arrives at 6.4 without a separate diff-summary round-trip.
- Exception B — pre-approval via direct write instruction: If the user's most recent message is a direct write instruction after the diff summary has been presented (e.g., "Write the updated spec", "Apply the changes", "Go ahead and write it", "Proceed", "Ship it"), treat it as the final approval. The direct instruction subsumes Step 6.3 — no further presentation or confirmation is needed. Distinguish from delegation: delegation (Exception A1) resolves open decisions before the diff summary; bulk approval (Exception A2) accepts all findings before the diff summary; direct write (Exception B) approves the already-presented diff summary.
- Write the updated spec to the same path as the original. Prefer targeted Edit calls over Write for surgical changes — Write overwrites the full file and risks silent loss of sections not covered by the diff summary.
  - Batch in one message by default. When ≥2 Edit calls have unique old_strings across the file, issue them as parallel tool calls in a single message — the Edit tool uses exact-string matching, so order is irrelevant when old_strings cannot collide. Stating intent ("applying edits in parallel") in conversation is not enough — the calls must actually be in the same assistant message.
  - Sequential top-to-bottom is required only when old_strings could collide (e.g., replace_all near a repeated phrase), when one edit's content becomes another's anchor, or when insertions are adjacent enough to shift each other's anchor text.
  - Use Write instead when cumulative changes exceed ~50 lines AND the edits are adjacent/overlapping (or restructure file-level layout — renamed top-level sections, wholesale region replacements), OR when changes cross the >60% section threshold (the full-rewrite case from 6.2). Non-adjacent surgical edits with unique old_strings should use parallel Edit regardless of cumulative volume — the exact-string semantics make ripple effects impossible.
  After edits, proceed directly to Step 7 verifications.
If the user requests changes to the draft, incorporate them and re-present before writing.
Plan mode note: If invoked during plan mode, Steps 1-5 proceed normally (read-only). Step 6 (writing the updated spec) is deferred until plan mode is exited. For scenarios 2-3: all AskUserQuestion rounds (including user-deferred FOUNDATIONS analysis) must be fully resolved before writing the plan file — the plan file should capture the final approved changes, not intermediate proposals. Record the approved changes in the system-provided plan file (including the diff summary), then call ExitPlanMode. Title the plan file section "Approved Changes (Diff Summary)" — this serves as the Step 6.2 presentation. The plan file version of the diff summary should be at least as detailed as the inline presentation — the plan file is the durable artifact. ExitPlanMode approval covers both the plan and the diff summary — a separate Step 6 approval is not needed. After ExitPlanMode approval, proceed directly to Step 6.4 (write the updated spec) — do not re-present the diff summary in conversation.
Step 7: Post-Write Verification and Final Summary
After writing the updated spec, run three verifications:
- Path references verified per direction: Cited paths fall into two classes — verify each accordingly. Existing references (source files, test files, dependencies cited as already-exists) must exist:
[ -f "$path" ] && echo OK || echo MISSING "$path". Proposed-new artifacts (test files, modules, types listed as deliverables in Overview/Acceptance Criteria/Follow-On Tickets) must NOT yet exist: [ ! -f "$path" ] && echo "OK absent" || echo "UNEXPECTED EXISTS". A deliverable path that already exists means the spec's premise was already implemented and Step 4 should have classified it as Obsolescence — fix the classification, do not just remove the path. Use Glob for a few paths, or a batch Bash loop when verifying 10+ paths total. This catches stale references introduced during the rewrite AND deliverables that have been silently superseded.
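The batch loop for the two path classes can be sketched as follows. The file paths here are hypothetical stand-ins; in practice, collect the two lists from the spec's citations and deliverable sections (the temp directory exists only to make the sketch self-contained):

```shell
# Self-contained sketch: a temp dir with one "existing" file stands in
# for the real repo state.
tmp=$(mktemp -d)
touch "$tmp/token.ts"                       # stands in for a cited source file

existing_refs="$tmp/token.ts $tmp/observer.ts"   # paths the spec cites as already-exists
new_deliverables="$tmp/token_observer.ts"        # paths the spec lists as to-be-created

# Class 1: existing references must be present.
for p in $existing_refs; do
  [ -f "$p" ] && echo "OK $p" || echo "MISSING $p"
done

# Class 2: proposed-new deliverables must NOT yet exist.
for p in $new_deliverables; do
  [ ! -f "$p" ] && echo "OK absent $p" || echo "UNEXPECTED EXISTS $p"
done
```

A `MISSING` line means a stale reference survived the rewrite; an `UNEXPECTED EXISTS` line means a deliverable was already implemented and should have been classified as Obsolescence in Step 4.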
- Removed references gone: Grep for each reference the reassessment removed, to confirm no stragglers survive in sections that weren't the primary edit target. This catches incomplete edits where a stale reference was removed in one section but persists in another. Intentional preservations are not stragglers: historical mentions in Source, Dependencies, or changelog-style sections that intentionally cite a removed artifact (e.g., "Spec 17 §4 deleted
deriveDeferredX") are expected — confirm by reading the surrounding prose before editing. Only true stragglers (references in prescriptive/implementation sections that should have been updated) require correction. Bash chaining caveat: a successful grep here returns exit 1 (no match = desired outcome), which breaks && command chains. When batching verifications in one bash command, use ; chaining or append an explicit echo "Exit: $? (1 = no stray match, good)" after the grep — && will silently terminate the chain before later verifications run. Example: grep <stale-pattern> <file> ; echo "Exit: $?".
- Section headings preserved: Grep for
^## top-level headings (and ^### sub-headings where relevant) to confirm no standard section was renamed or removed during editing. Downstream skills like /spec-to-tickets rely on stable headings — renames break ticket decomposition. The "Preserve downstream structure" guardrail is enforced here.
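A minimal sketch of the heading check, assuming the spec uses the standard section set (the heading names and file content here are illustrative — substitute the spec's actual sections):

```shell
# Stand-in spec file carrying the expected top-level headings.
spec=$(mktemp)
cat > "$spec" <<'EOF'
## Overview
## Acceptance Criteria
## Follow-On Tickets
EOF

# Confirm each expected heading survived the rewrite verbatim.
for h in 'Overview' 'Acceptance Criteria' 'Follow-On Tickets'; do
  grep -q "^## $h\$" "$spec" && echo "OK heading: $h" || echo "MISSING heading: $h"
done
```

Any `MISSING heading` line indicates a rename or removal that would break downstream decomposition and must be reverted before presenting the final summary.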
If any verification fails, correct the spec via Edit and re-verify the specific item before presenting the final summary. Do not present the "suggested next step" until all three verifications pass.
Then present:
- Number of issues fixed, improvements applied, and additions incorporated
- Any deferred items the user chose not to address now
- Suggested next step:
/spec-to-tickets <spec-path> <NAMESPACE> to decompose into tickets
Do NOT commit. Leave the file for user review.
Guardrails
- FOUNDATIONS alignment is mandatory: Every change to the spec must respect
docs/FOUNDATIONS.md. Never approve a spec change that violates a Foundation principle, even if the user requests it — flag the conflict instead.
- Codebase truth: All references in the updated spec must be validated against the actual codebase. Never propagate stale file paths, renamed types, or removed functions.
- One question at a time in follow-ups: After the initial report (which may have up to 3 questions), follow-up rounds ask one question at a time to avoid overwhelming the user.
- YAGNI ruthlessly: Additions must be natural extensions of the spec's scope. Do not propose features that "might be nice" but are not aligned with the spec's stated goals.
- No scope creep: The deliverable is the updated spec file. Do not write design docs, create tickets, or start implementation.
- No approach proposals: This is reassessment, not greenfield design. Do not propose 2-3 alternative architectures. The spec already has a design — validate and refine it. Exception: when a user-deferred decision is resolved via FOUNDATIONS.md analysis and the recommended fix requires expanding the spec's scope to address a root cause (Foundation 15), present the scope expansion as an Addition finding with explicit Foundation justification.
- Preserve spec voice: When editing, match the spec's existing writing style. Do not rewrite unchanged sections for stylistic preferences.
- Preserve downstream structure: When writing the updated spec, preserve all metadata fields (Status, Priority, Complexity, Dependencies, etc.) and section headings that downstream skills (e.g., spec-to-tickets) may depend on. Do not rename or remove standard sections. Filename preservation: also preserve the spec's filename even when the on-page title shifts as a result of scope-collapsing reframing or retitling — filenames are referenced by ticket namespaces (e.g.,
138AGESTUVIA-003), archived specs, commit messages, and external review links. If the on-page title no longer matches the filename's slug, note the divergence in the Step 7 final summary so the user can decide whether a follow-up rename is warranted.
- Worktree discipline: If working in a worktree, ALL file operations use the worktree root path.