sdd-verify
| name | sdd-verify |
| description | Use when verifying that an implementation matches a change's SDD artifacts. Triggers: "verify", "check implementation", "did I implement everything", "verify the change", "is implementation complete", "check conformance". |
Verify that the implementation satisfies the contracts stated in the change's specs. Produces a structured report across six dimensions (Completeness, Scope, Contract, Coverage, Coherence, Conformance) with three severity levels.
Specs are contracts — property statements about observable state (see references/sdd-spec-formats.md § 1).
Scenarios are evidence that samples those contracts, not the contracts themselves (§ 1.5).
sdd-verify samples: it checks that implementations honor the stated scenarios and traces through code for the broader contract claim.
It does not formally prove universal properties — strong claims supported by thin scenarios are a risk flag, not a failure.
`SPECS_ROOT` is resolved by the `sdd` router before this skill runs. Replace `.specs/` with your project's actual specs root in all paths below.
- `sdd-verify`.
- Before `sdd-sync` — confirm implementation matches the change before syncing specs.

- The change does not exist (`.specs/changes/<name>/` is missing) — the skill is change-scoped; use `sdd-derive` to retrofit specs from existing code instead.
- Use `sdd-apply` to drive work; verify after.

Before starting, check `.specs/changes/<name>/tasks.md`:

- If `tasks.md` is missing, warn: "No tasks.md for this change — completeness check will be skipped." Offer to proceed anyway.
- If `tasks.md` exists but no tasks are marked complete, warn: "No completed tasks yet — verify output will be limited." Offer to proceed anyway.

| Dimension | Question | What to check |
|---|---|---|
| Completeness | Is everything done? | All tasks checked off, all delta spec requirements implemented |
| Scope | Is everything implemented in scope? | Every meaningful code change traces to a delta spec requirement; no unspecified behavior introduced |
| Contract | Does the implementation satisfy the spec contract? | Implementation honors each requirement's scenarios and the broader contract claim stated in the text |
| Coverage | Do scenarios meaningfully sample the contract? | Scenarios span happy path, boundaries, and plausible failure modes — not trivially-passing cases |
| Coherence | Does it follow the design? | Implementation follows decisions in design.md |
| Conformance | Do schemas match the specs? | Generated schema diff confirms spec requirements are reflected in the schema |

| Level | Meaning | Action required |
|---|---|---|
| CRITICAL | Implementation contradicts a requirement or design decision | Must fix before proceeding |
| WARNING | Implementation partially meets a requirement or deviates from design | Should address |
| SUGGESTION | Minor improvement opportunity | Optional |

Severity is not the same thing as sync gating.
A finding can keep its original severity and still be non-blocking when the user explicitly authorizes that exception and it is recorded per references/sdd-change-formats.md.
Evidence classification (VERIFIED / TESTED / INSPECTED / WAIVED), the SHALL/INSPECTED → CRITICAL sufficiency rule, the waiver format and provenance check, and the citation format are defined in references/evidence-rules.md.
Read that file before Phase 4 — both the orchestrator and any subagents need it.
In summary:
- WAIVED: a `design.md` § Verification Waivers entry with checkable manual evidence.

Hard rule: SHALL at INSPECTED → CRITICAL unless waived. Static reading is never sufficient for a SHALL.
Verification overrides are separate from waivers:
Overrides never change severity; they only affect blocking status. Use them to preserve decision continuity, not to erase or reclassify findings.
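The split between severity and blocking status can be sketched as a small data model. This is an illustration only, not the skill's implementation: the `Finding` class, its field names, and the assumption that only CRITICAL findings block sync by default are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    WARNING = "WARNING"
    SUGGESTION = "SUGGESTION"


@dataclass(frozen=True)
class Finding:
    description: str
    severity: Severity
    # Hypothetical flag: a user-authorized override has been recorded.
    overridden: bool = False

    def blocks_sync(self) -> bool:
        # Assumption: only CRITICAL findings block by default.
        # An override lifts the block but never rewrites the severity.
        return self.severity is Severity.CRITICAL and not self.overridden


finding = Finding("suite has a known failing test", Severity.CRITICAL, overridden=True)
print(finding.severity.value, finding.blocks_sync())  # CRITICAL False
```

The overridden finding still reports CRITICAL; only its blocking status changes, which is the property the report's OVERRIDES section relies on.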
Phases 4–6 (Evidence, Contract, Coverage) SHALL be evaluated against the parallel-subagent gate before Phase 3 begins. Single-agent execution is permitted only when the gate routes you there — it is not the default.
The gate evaluation runs as a required step within Phase 2.
The full availability gate, granularity proposal, model resolution, dispatch protocol, and synthesis steps live in references/parallel-subagent-path.md.
Read all available artifacts (graceful degradation — proceed with what exists):
- `.specs/changes/<name>/tasks.md` — task completion status
- `.specs/changes/<name>/design.md` — design decisions and any `## Verification Waivers` / `## Verification Overrides` (if exists)
- `.specs/changes/<name>/specs/` — delta specs (if exist)
- `.specs/specs/` — baseline specs for full context

Run the project's full test suite before any other checks. Verification claims rest on observed behavior, so a green suite is the baseline evidence that the implementation actually works.
Discover the test command by checking, in order:
1. `CONTRIBUTING.md`, `README.md`, `AGENTS.md`, `CLAUDE.md`.
2. `Makefile` targets, `package.json` `scripts.test`, `pyproject.toml` `[tool.pytest.ini_options]` or `[tool.poetry.scripts]`, `tox.ini`, `noxfile.py`, `Cargo.toml`.
3. `.github/workflows/*test*.yml`, `.gitlab-ci.yml`.

Run the full suite with per-test-ID output — do not narrow to changed files or a single test path.
Subagents confirm citations by searching the log for individual test names, so the output must show each test's pass/fail status by name.
Common flags: pytest -v, jest --verbose, go test -v, nox -- -v.
For machine-readable output use pytest --junit-xml=<path> and point subagents at the XML.
If the runner cannot produce per-test-ID output, treat all requirements as INSPECTED and note this as a WARNING.
Capture the output to .specs/changes/<name>/.verify/test-output.log.
If any tests fail or error, flag each failure as CRITICAL in the report and stop the verify run by default — do not proceed to later phases. A failing suite invalidates Contract, Coverage, and Coherence conclusions, and there is no reliable way to localize "affected area" without a test→requirement map. Re-run after fixes.
If the user explicitly says they already know about the failing suite and wants to proceed anyway, continue only in a failing-suite override mode:
- Before `sdd-sync`, record the exception per `references/sdd-change-formats.md` and ensure `tasks.md` contains an unchecked remediation task.

If the project has no runnable test suite, note this as a WARNING and continue — every requirement will land at INSPECTED at best.
Do not modify, skip, or xfail tests to make the suite pass — that is itself a CRITICAL finding.
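When the runner can emit JUnit XML (as with `pytest --junit-xml=<path>`), per-test status can be read without scanning the text log. The following is a minimal sketch, assuming the common JUnit layout of `<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children. The function name and the `classname::name` ID format are illustrative choices, not something this skill prescribes.

```python
import xml.etree.ElementTree as ET


def parse_test_statuses(junit_xml: str) -> dict[str, str]:
    """Map each test ID to 'passed', 'failed', 'error', or 'skipped'."""
    statuses = {}
    root = ET.fromstring(junit_xml)
    for case in root.iter("testcase"):
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        status = "passed"
        for child in case:
            if child.tag in ("failure", "error", "skipped"):
                status = "failed" if child.tag == "failure" else child.tag
                break
        statuses[test_id] = status
    return statuses


# Hypothetical two-test report used only for illustration.
sample = """
<testsuites>
  <testsuite name="pytest" tests="2">
    <testcase classname="tests.test_orders" name="test_create"/>
    <testcase classname="tests.test_orders" name="test_dedupe">
      <failure message="assert 2 == 1"/>
    </testcase>
  </testsuite>
</testsuites>
"""

statuses = parse_test_statuses(sample)
# Any failure or error stops the verify run by default.
must_stop = any(s in ("failed", "error") for s in statuses.values())
print(statuses, must_stop)
```

A per-test map like this also lets a citation such as `tests/path::test_name` be confirmed by lookup rather than by trusting the test name.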
You SHALL run this gate after Phase 2 and before Phase 3. Single-agent execution is allowed only when the gate routes you there.
Checklist — do all four:
1. Read `references/parallel-subagent-path.md` § 1 (Availability Gate) and § 2 (Granularity).
2. Count delta spec requirements across all in-scope capabilities.
3. Apply the gate table in § 1 (first match wins) to determine the path.
4. Announce the outcome to the user using this template, then wait for confirmation if parallel:
Parallel gate: <N> requirements across <K> capabilities.
Path: <single-agent | parallel> — <reason from gate table row>.
<If parallel:> Proposing <P> subagents at <per-capability | per-requirement> granularity. Confirm or override.
If the gate sends you to single-agent, proceed to Phase 3 yourself.
If the gate sends you to parallel, wait for user confirmation, then resolve model and dispatch per references/parallel-subagent-path.md §§ 3–4.
Failing-suite override and the gate.
The gate table in references/parallel-subagent-path.md § 1 already partitions failing-suite states (rows 2 and 3): parallel dispatch is allowed under override only when the dispatch inputs explicitly carry the override state and the known overridden blockers; otherwise the gate routes to single-agent.
Apply that table — do not improvise a separate decision here.
Audit rule. If no gate-outcome announcement appears in the run, the gate was not evaluated and the next phase must not begin.
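Because the audit rule depends on the announcement appearing verbatim in the run, it can help to render the template mechanically. The sketch below only fills in the template text; it does not implement the gate table, which lives in `references/parallel-subagent-path.md`. The function name, parameter names, and the example reason string are hypothetical.

```python
from typing import Optional


def gate_announcement(
    requirements: int,
    capabilities: int,
    path: str,
    reason: str,
    subagents: Optional[int] = None,
    granularity: Optional[str] = None,
) -> str:
    """Render the gate-outcome announcement from the template."""
    if path not in ("single-agent", "parallel"):
        raise ValueError(f"unknown path: {path}")
    lines = [
        f"Parallel gate: {requirements} requirements across {capabilities} capabilities.",
        f"Path: {path} — {reason}.",
    ]
    if path == "parallel":
        if subagents is None or granularity not in ("per-capability", "per-requirement"):
            raise ValueError("parallel path needs a subagent count and a granularity")
        lines.append(
            f"Proposing {subagents} subagents at {granularity} granularity. Confirm or override."
        )
    return "\n".join(lines)


# The reason text is a placeholder, not a row from the real gate table.
print(gate_announcement(12, 3, "parallel", "example reason", 3, "per-capability"))
```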
If .specs/.sdd/schema-config.yaml exists, regenerate "after" snapshots now, before evidence classification, so the schema diff is available as VERIFIED evidence in Phase 4.
- Write the snapshots to `.specs/changes/<name>/schemas/after/`.

If no schema config exists, skip silently. Phase 7 is where the diff is interpreted against the spec; this phase only produces the artifact.
This is the first per-requirement phase — eligible for subagent dispatch (see references/parallel-subagent-path.md).
1. Check task completion against `tasks.md`.
2. Classify each requirement's evidence per `references/evidence-rules.md` §§ 1, 3, 5.
3. Apply the sufficiency rule (`evidence-rules.md` § 2): a SHALL at INSPECTED is CRITICAL unless it is waived in `design.md` (then WAIVED, subject to the provenance check in § 4).

If Phase 2 is in failing-suite override mode, passing tests from that run may still be cited for requirement-local evidence, but the report must state that the overall suite is failing and that downstream conclusions are advisory.
The output of this phase is a per-requirement evidence table that drives Phase 5 and the final report. On the parallel path, the orchestrator assembles this table from subagent findings during synthesis.
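The sufficiency rule that shapes this table is small enough to express as a lookup. The sketch below covers only what this document states: SHALL at INSPECTED is CRITICAL, and SHOULD at INSPECTED is flagged WARNING in the report template. The treatment of MAY is an assumption, and the function name is hypothetical; `references/evidence-rules.md` remains the authority.

```python
from typing import Optional


def evidence_finding(level: str, tier: str) -> Optional[str]:
    """Return the severity implied by a requirement's evidence tier, or None."""
    if tier in ("VERIFIED", "TESTED", "WAIVED"):
        # Executable or waived evidence: no tier-based finding.
        # (A waiver is still subject to the separate provenance check.)
        return None
    if tier != "INSPECTED":
        raise ValueError(f"unknown tier: {tier}")
    if level == "SHALL":
        return "CRITICAL"  # static reading is never sufficient for a SHALL
    if level == "SHOULD":
        return "WARNING"  # matches the SHOULD / INSPECTED row in the report template
    return None  # assumption: MAY at INSPECTED raises no finding


for level, tier in [("SHALL", "INSPECTED"), ("SHALL", "WAIVED"), ("SHOULD", "INSPECTED")]:
    print(level, tier, evidence_finding(level, tier))
```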
Cross-cutting search work — runs at the orchestrator regardless of single-agent vs. parallel path. Subagents do not enumerate write-sites; they consume the enumeration as a dispatch input.
For each ADDED or MODIFIED universal SHALL in scope, produce a list of contract-relevant write-sites — code locations that produce or modify the value the SHALL is over.
See references/sdd-change-formats.md § 3.1 for the definition and heuristic.
Procedure:
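1. Locate the write-sites with a code search, a code graph (e.g., `code-review-graph query_graph_tool`), or repeated reads.
2. Record the result as `requirement → [write-site path:line, …]`.

The enumeration is the orchestrator's responsibility because the search is cross-cutting: subagents consume the result as a dispatch input rather than repeating it.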
The enumeration feeds Phase 5 step 4 directly, and on the parallel path it is passed verbatim to subagents in the dispatch prompt under Implementation write-sites in scope.
If a SHALL is not universal (narrow shape, no input-space partitions), skip enumeration — single-site requirements don't need this check.
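The `requirement → [write-site path:line, …]` shape maps naturally onto a dictionary, and the later per-write-site check becomes a set difference. A minimal sketch under stated assumptions: the requirement name and file paths are invented, and how a write-site is judged "tested" is left to the reviewer, since this document does not define a mechanical test-to-site mapping.

```python
def uncovered_write_sites(
    enumeration: dict[str, list[str]],
    tested_sites: set[str],
) -> list[str]:
    """List write-sites that no cited test exercises, one finding line each."""
    findings = []
    for requirement, sites in enumeration.items():
        for site in sites:
            if site not in tested_sites:
                findings.append(f"{site} for SHALL {requirement} has no test coverage.")
    return findings


# Hypothetical enumeration for one universal SHALL with two write-sites.
enumeration = {
    "Order Total Is Consistent": [
        "src/orders/create.py:42",
        "src/orders/retry.py:17",
    ],
}
tested = {"src/orders/create.py:42"}

for line in uncovered_write_sites(enumeration, tested):
    print(line)
```

Here the retry branch writes the same value but has no test behind it, which is exactly the case a passing test on the main path cannot vouch for.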
This is the inverse of Phase 4 (Completeness): for each meaningful code change, check whether a delta spec requirement covers it. It runs at the orchestrator regardless of single-agent vs. parallel path — subagents do not run this phase.
Get the code diff for this change:
- Run `git diff <base-branch>...HEAD` (or `git diff HEAD~1..HEAD` if a single commit), capturing meaningful changes only.
- If no diff is available, enumerate the changed files from `tasks.md` references or the user.

Identify meaningful changes — changes that alter observable behavior.
Classify each meaningful change into one of three categories: in-scope, related, or unspecified.
Flag related and unspecified changes by severity.
In-scope changes require no finding — omit them from the report.
Record the scope map — which changes are covered and by which requirement — so Phase 9 can include it in the report. A change that is fully covered need not be mentioned; only related and unspecified changes appear in the report.
If no git diff is available and file enumeration is also impossible, note this as a WARNING and skip the scope check.
sdd-sync replaces each MODIFIED requirement wholesale, so any baseline scenario or sub-clause the delta block omits is silently deleted at sync.
This check is the gate that catches that loss before sync; it runs at the orchestrator regardless of single-agent vs. parallel path.
For each MODIFIED requirement in the delta specs:
1. Locate the baseline requirement in `.specs/specs/<capability>/spec.md` (same capability directory, same requirement name).
2. If the delta block omits a baseline scenario or sub-clause (and the removal is not documented in `design.md`), flag WARNING: `MODIFIED <name> delta drops baseline scenario <scenario> — sdd-sync will delete it from the baseline.` Restore the full post-change requirement (`references/sdd-spec-formats.md` § 4) or document the removal in `design.md`.

A delta requirement with no matching baseline requirement is ADDED or RENAMED, not MODIFIED — out of scope for this check.
For a deterministic backstop on the scenario part of this check, run scripts/check_modified_completeness.py <specs-root> --change <name> — scoped to the change under verification, matching the sdd-sync guard so an unrelated active change cannot fail this one.
Omit --change to scan every active change, e.g. as a repo-wide pre-commit/CI gate that runs without an agent.
It compares scenario names only and exits non-zero when the MODIFIED delta drops a baseline scenario; sub-clause and body-text loss (step 2) still needs a human read.
Intentional drops are marked with a <!-- modified-removes: ScenarioName --> comment in the delta block.
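The scenario-name comparison can be sketched in a few lines. This is not the source of `scripts/check_modified_completeness.py`; it only illustrates the idea. It assumes scenarios are introduced by markdown headings of the form `#### Scenario: Name`, which is a guess at the spec format, and the requirement and scenario names are invented.

```python
import re

# Assumed heading shape for scenarios; adjust to the real spec format.
SCENARIO = re.compile(r"^#{1,6}[ \t]*Scenario:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
REMOVES = re.compile(r"<!--\s*modified-removes:\s*(.+?)\s*-->")


def dropped_scenarios(baseline_block: str, delta_block: str) -> list[str]:
    """Baseline scenario names missing from the delta and not marked as intentional drops."""
    baseline = SCENARIO.findall(baseline_block)
    kept = set(SCENARIO.findall(delta_block))
    intentional = set(REMOVES.findall(delta_block))
    return [name for name in baseline if name not in kept and name not in intentional]


baseline = """### Requirement: Order Dedup
#### Scenario: Novel order
#### Scenario: Equivalent order
#### Scenario: Retry after timeout
"""

delta = """### Requirement: Order Dedup
<!-- modified-removes: Retry after timeout -->
#### Scenario: Novel order
"""

print(dropped_scenarios(baseline, delta))  # ['Equivalent order']
```

As with the real script, a check like this sees scenario names only; loss of sub-clauses or body text still needs a human read.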
Skip requirements flagged as missing in Phase 4. For each requirement at TESTED or VERIFIED tier, confirm the cited evidence actually demonstrates the contract claim — do not just trust the test name. INSPECTED-tier requirements are not contract-checked here; their tier already determined the finding in Phase 4.
If Phase 2 is in failing-suite override mode, Contract findings are still useful for triage, but they do not clear the change for release or sync.
For each implemented requirement at TESTED or VERIFIED:
- Identify the contract claim the requirement states (`references/sdd-spec-formats.md` § 1).
- For each enumerated write-site that no cited test exercises, flag: "write-site `<path:line>` for SHALL `<name>` has no test coverage." A passing test on one path is not evidence for a deduplication shortcut, retry branch, or composition step that writes the same value.

For each requirement, assess whether its scenarios meaningfully sample the contract claim. This is a coverage heuristic, not a formal check — it guards against requirements whose scenarios are too thin to catch a plausible failure.
If Phase 2 is in failing-suite override mode, label Coverage conclusions as provisional because the suite is already known red.
Coverage smells to flag as WARNING:

- Partition-incomplete sampling, per the partition heuristic in `references/sdd-spec-formats.md` § 1.6. When a positive signal fires (lifecycle states, identity/equivalence, multi-source composition, derived-pair invariant) but scenarios cover only some partitions of the input space, flag this as distinct from "single scenario." This is a qualitative concern: the input space partitions into named arms and only some are sampled. Example: a SHALL with identity/equivalence semantics whose scenarios cover only (novel input) and never (equivalent-to-existing input) is partition-incomplete, even if multiple scenarios are present.

Flag as SUGGESTION (not WARNING) if the requirement is genuinely narrow and a single scenario is adequate (e.g., "the response Content-Type SHALL be `application/json`").
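The partition-incomplete smell can be made concrete once the reviewer has named the arms of the input space. The sketch below assumes that naming has already been done by hand; the partition labels and scenario titles are invented, and nothing here replaces the qualitative judgment the heuristic calls for.

```python
def unsampled_partitions(partitions: list[str], scenarios: dict[str, str]) -> list[str]:
    """Return the named partitions that no scenario samples, in declaration order."""
    sampled = set(scenarios.values())
    return [p for p in partitions if p not in sampled]


# Hypothetical identity/equivalence requirement with two arms.
partitions = ["novel input", "equivalent-to-existing input"]

# Two scenarios exist, but both sample the same arm.
scenarios = {
    "Creates a record for new input": "novel input",
    "Creates a record for new input with optional fields": "novel input",
}

print(unsampled_partitions(partitions, scenarios))  # ['equivalent-to-existing input']
```

Two scenarios are present, yet one arm is never exercised, which is why scenario count alone is a poor proxy for coverage.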
If Phase 3 produced a schema diff:
- Does the diff match `schemas/expected.md` (if present)?
- Flag schema changes that no requirement covers ("`<field/endpoint>` — not covered by any requirement").
- Drift detection: if the project also keeps a committed schema (e.g., `openapi.yaml`), diff it against the newly generated snapshot. Flag genuine divergences as WARNING; tolerate documented additions (e.g., vendor extensions, computed fields) noted in `design.md`.

If `schemas/expected.md` exists but Phase 3 was skipped (no `schema-config.yaml`), note this in the report — the user expected schema tracking but it was not configured.
If design.md exists:
- For each decision in `design.md`, verify the implementation follows it via static reading.

If Phase 2 is in failing-suite override mode, Coherence findings remain worth reporting, but they do not outweigh the CRITICAL failing-suite status.
Coherence findings are static reads by design — design.md records HOW choices, not external contracts, so the evidence-tier rule does not apply here.
A decision either is or is not reflected in code; if the decision needs runtime confirmation, it should have been promoted to a spec requirement.
Before writing the Summary, classify blocking status:
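- A finding is non-blocking only when the user explicitly authorized the exception and it is recorded per `references/sdd-change-formats.md`.

For each entry in `design.md` § Verification Overrides, carry the audit record into the report: stage, reason, constraints, follow-up, approved by, and recorded.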
# Verification Report: {Change Name}
**Date:** {date} **Tasks:** {N}/{M} complete
## Evidence
| Requirement | Level (SHALL/SHOULD/MAY) | Tier (VERIFIED/TESTED/INSPECTED/WAIVED) | Citation |
| ----------- | ------------------------ | --------------------------------------- | ------------------------------------------ |
| {name} | SHALL | TESTED | `tests/path::test_name` |
| {name} | SHALL | WAIVED | design.md § Verification Waivers → {ref} |
| {name} | SHOULD | INSPECTED | (no executable evidence — flagged WARNING) |
## CRITICAL
- {Description of issue} — {file:line or area}
## WARNING
- {Description of issue}
## SUGGESTION
- {Description of improvement}
## WAIVED
- {Requirement} — manual evidence: {reference} (per design.md § Verification Waivers; provenance: {predates change | added during implementation | added after verify failure})
## OVERRIDES
- {blocking finding or gate outcome} — stage: {stage}; reason: {reason}; constraints: {constraints}; follow-up: {tasks.md task text}; approved by: {approved by}; recorded: {recorded}
## SCOPE
- {Change description} — {file:line} · {related | unspecified} (CRITICAL/WARNING/SUGGESTION)
## CONFORMANCE
- [x] {What schema confirmed} — {requirement name}
- [ ] {What schema did not confirm} — {requirement name} (CRITICAL/WARNING)
## Summary
{1-2 sentences on overall status and recommended next step}
Omit empty sections (e.g., omit CONFORMANCE if no schema config exists; omit WAIVED if no waivers apply).
If Phase 2 proceeded under a failing-suite override, the Summary must say so explicitly and recommend fixing or isolating the failing tests before treating the verify run as complete.
If any blocking findings are overridden for sync, the Summary must say that the run is not a clean pass and that sdd-sync is permitted only under the recorded overrides.
If no issues found:
All applicable dimensions verified:
- [x] Completeness
- [x] Scope
- [x] Contract
- [x] Coverage
- [x] Coherence (if design.md exists)
- [x] Conformance (if schema-config.yaml exists)
- [x] Evidence — every SHALL at TESTED or VERIFIED (or WAIVED with checkable manual evidence)
Ready for `sdd-sync`.
Only include checklist items for dimensions that actually applied — match what's in the report.
If the run has no remaining blockers but does include overrides, do not use the clean-pass template above.
State instead that verification found issues the user explicitly chose not to let block sync, and that remediation remains open in `tasks.md`.

| Missing artifact | Behavior |
|---|---|
| `tasks.md` | Skip completeness check, warn before proceeding |
| `design.md` | Skip coherence check and waiver lookups, note in report |
| Delta specs | Skip contract and coverage checks, note in report |
| `schema-config.yaml` | Skip Phase 3 snapshot and Phase 7 conformance check, note in report |
| `schemas/expected.md` | Skip expected-vs-actual diff within conformance; run drift detection only |
| No git diff available | Skip scope check (Phase 4 step), flag as WARNING, note in report |

"Warn before proceeding" means a conversational message to the user. "Note in report" means adding a note inside the verification report itself.
- Treating scenarios as the contract (`references/sdd-spec-formats.md` § 1.5 — scenarios are evidence, not definition).
- `evidence-rules.md`.
- Skipping the waiver provenance check (`evidence-rules.md` § 4) — waivers added in the same branch as the implementation, especially after a failed verify, need to be surfaced.
- Recording an override in `design.md` but failing to carry its context into the OVERRIDES section or Summary, which breaks the audit trail for later verify/sync work.
- `design.md` and `tasks.md`.
- `references/parallel-subagent-path.md` § 6.
- `sdd-sync` replaces MODIFIED requirements wholesale, so a delta block that drops a baseline scenario causes silent contract deletion at sync; verify is the gate that catches it first.

Reference files:

- `references/evidence-rules.md` — tiers, sufficiency rule, waivers, provenance check, citation format
- `references/parallel-subagent-path.md` — availability gate, granularity, dispatch, synthesis
- `references/verify-subagent.md` — canonical job description for subagents on the parallel path
- `references/sdd-spec-formats.md` — contract shapes (§ 1.1), scenarios as evidence (§ 1.5), partition heuristic (§ 1.6)
- `references/sdd-change-formats.md` — task format (§ 3) and contract-relevant write-site definition (§ 3.1)
- `references/sdd-schema.md` — schema evidence annotations (§ 1) and lifecycle policy (§ 4)
- `scripts/check_modified_completeness.py` — deterministic dropped-scenario check (scenario names only, not body sub-clauses); exits non-zero on dropped baseline scenarios, wireable as a pre-commit/CI gate