with one click
workflow-spec-driven-dev
// [Workflow] Use when activating spec-driven development to keep engineering specs and business docs synchronized.
// [Workflow] Use when activating spec-driven development to keep engineering specs and business docs synchronized.
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | workflow-spec-driven-dev |
| version | 2.1.0 |
| last_reviewed | "2026-04-21T00:00:00.000Z" |
| description | [Workflow] Use when activating spec-driven development to keep engineering specs and business docs synchronized. |
| disable-model-invocation | true |
[BLOCKING] Execute skill steps in declared order. NEVER skip, reorder, or merge steps without explicit user approval. [BLOCKING] Before each step or sub-skill call, update task tracking: set
in_progresswhen step starts, setcompletedwhen step ends. [BLOCKING] Every completed/skipped step MUST include brief evidence or explicit skip reason. [BLOCKING] If Task tools are unavailable, create and maintain an equivalent step-by-step plan tracker with the same status transitions.
Goal: Single entry point for spec-driven doc generation and maintenance. Two output layers, one workflow.
Output Layers:
| Layer | Path | Format | Audience |
|---|---|---|---|
| Engineering Spec | docs/specs/{app-bucket}/{system-name}/ | Tech-agnostic bundle (ERD + rules + API + events + journeys) | Engineers, AI agents, architects |
| Business Feature Docs | docs/business-features/{Module}/ | 17-section stakeholder docs | PO, BA, Dev, QA, UX |
Mode Routing:
| Mode | When to Use | Step Sequence |
|---|---|---|
init-full | Zero — no specs, no feature docs | scout → size-evaluation → plan → plan-review → plan-validate → spec-discovery(init) → feature-docs(init) → review-artifact → docs-update(final sync) → watzup → workflow-end |
update | Code changed, new requirement, new PBI | workflow-review-changes → spec-discovery(update) → feature-docs(update) → tdd-spec(update) → tdd-spec-review → tdd-spec direction=sync → review-changes → docs-update(final sync) → watzup → workflow-end |
audit | Quarterly health check, verify doc freshness | scout → spec-discovery(audit) → feature-docs(audit) → review-artifact → docs-update(final sync) → watzup → workflow-end |
Key Rules:
AskUserQuestion BEFORE any action — NEVER skip Step 0Skill tool for EACH step — NEVER batch-complete or mark done without invocationUse AskUserQuestion confirm mode before any action.
Auto-detection rules:
IF docs/specs/{app-bucket}/{system-name}/ NOT found AND docs/business-features/ has NO modules
→ Suggest: init-full
IF docs/specs/{app-bucket}/{system-name}/ NOT found AND docs/business-features/ has modules
→ Suggest: init-specs-only (run spec-discovery init only, feature-docs already has docs)
IF git diff has service/frontend changes
→ Suggest: update
IF explicit --audit flag OR user says "audit" / "check freshness" / "are docs stale"
→ Suggest: audit
System name confirmation:
Check for existing spec dirs:
ls docs/specs/ 2>/dev/null
{app-bucket} or a flat {system-name} depending on the host project. To find a specific {system-name}, probe ls docs/specs/{name}/.Present detected mode with reasoning. User confirms before proceeding.
Starting from zero: no docs/specs/{app-bucket}/{system-name}/, no docs/business-features/{Module}/ for target scope.
## Step A — Discovery (scout)
/scout
→ Holistic codebase map — module registry, entry points, integration boundaries
→ Produce: docs/specs/{app-bucket}/{system-name}/00-module-registry.md
→ Instruct scout: use spec-discovery Step 1 format (NOT task-file list)
→ TaskCreate: "scout — produce module registry for {system-name}"
## Step B — Size Evaluation and Divide-and-Conquer Planning (MANDATORY — runs BEFORE plan)
**[BLOCKING GATE]** Before writing any extraction plan, evaluate scope and recursively decompose until
each work unit is independently executable.
Evaluation algorithm (run inline after /scout completes):
ESTIMATE: module_count = count dirs in 00-module-registry.md phases_per_module = count applicable phases (A, B, C, D, E) total_tasks = module_count × phases_per_module
IF total_tasks > 50 (e.g., 13 modules × 5 phases = 65): → SPLIT into groups: max 10 modules per extraction run → Create one extraction plan per group → Run groups sequentially (one session each) → STOP — do NOT attempt full extraction in one session
IF 15 < total_tasks ≤ 50 (e.g., 4–10 modules): → Sub-agents mandatory (one per module) → Single plan covering all modules — proceed to /plan
IF total_tasks ≤ 15 (1–3 modules): → Single-session extraction — proceed to /plan
**Recursive decomposition rule:** If any extraction group still feels too large after splitting, split again.
Each sub-agent must handle ≤ 3 modules with ≤ 5 phases each = max 15 tasks per agent. This prevents
high-level shallow output from context dilution.
TaskCreate: "size-evaluation — count modules, estimate total_tasks, decide split strategy"
## Step C — Extraction Plan
/plan
→ Decompose into per-module × per-phase tasks (N×M minimum)
→ Phase A.ERD always included with Phase A (separate task per module)
→ ONE task per module × phase — 10 modules × 5 phases = 50 tasks minimum
→ ≤50 files per task — split large modules: "Business Rules: Orders (Part 1: Commands)", "(Part 2: Event Handlers)"
→ Dependency order: Phase A (domain model) before Phase B (business rules) per module
→ Priority: core domain modules first, infrastructure last
→ Verify TaskList count ≥ N×M before proceeding (BLOCKING gate)
→ If split strategy needed: produce one plan per extraction group
→ Produce: docs/specs/{app-bucket}/{system-name}/extraction-plan.md
→ TaskCreate: "plan — produce N×M extraction plan (N modules × M phases)"
## Step D — Plan Review
/plan-review
→ Validate: N×M task count confirmed, module coverage complete, no phase gaps
→ Verify: no module exceeds 3-module-per-agent limit if sub-agents used
→ Verify: each extraction group is independently executable (no cross-group deps)
→ REJECT plan if: any task is vague ("extract everything"), any agent handles >3 modules,
any phase missing for a module that has that concern
→ Verify: every task scoped to ≤50 files — split if larger
→ TaskCreate: "plan-review — validate N×M count, agent assignments, no phase gaps"
## Step E — Plan Validation
/plan-validate
→ User confirms: system-name, scope (full/scoped), module list, phases to extract, split strategy
→ TaskCreate: "plan-validate — user confirms scope and split strategy"
/spec-discovery [mode=init]
→ Full extraction → docs/specs/{app-bucket}/{system-name}/
→ Phases: A (domain model + ERD) → B (rules) → C (API) → D (events) → E (journeys)
→ Phase A.ERD: produces `01-domain-erd.md` — Mermaid erDiagram with ALL entities, relationships,
cardinalities, and aggregate boundaries; foundational artifact referenced by phases B–E
→ Per-task investigation: READ (grep → narrow → read) → TRACE (call chain, validators, triggers)
→ EXTRACT (this phase/module only) → WRITE ([Source: file:line] every claim)
→ VERIFY (mark [UNVERIFIED] without source) → COMPLETE
NEVER accumulate across tasks — write output after each task immediately
→ Phase F: bundle assembly + README completeness index + SPEC-CHANGELOG.md
→ Phase F.5: per-module README.md (17-section summaries)
→ Sub-agents for 4+ modules (BLOCKING: ONE message spawn)
→ Each sub-agent prompt MUST include: module name, task list, output path, tech-agnostic contract,
SYNC protocols (critical-thinking, evidence-based, incremental-persistence, cross-scope boundary)
/feature-docs [mode=init]
→ Per-module 17-section docs — runs OWN Phase A-E extraction from source code independently
→ Note: duplicates some source-reading from spec-discovery — ACCEPTED TRADE-OFF
(see plans/260420-0337-spec-driven-dev-unified/README.md Decision 3: cohesion > coupling —
feature-docs→spec-discovery one-directional; tight coupling would break on schema changes)
→ spec-discovery output in docs/specs/{app-bucket}/{system-name}/ readable as shortcut reference,
but feature-docs does NOT take it as formal input parameter
→ Output: docs/business-features/{Module}/detailed-features/README.{Feature}.md
→ Create CHANGELOG.md per module
→ Sub-agents: one per module for 4+ modules (scale gate applies in workflow context only)
/review-artifact
→ Quality check both layers:
- Engineering spec: source citations, tech-agnostic, completeness, ERD coverage
- Feature docs: 17 sections, YAML frontmatter, max 1500 lines, evidence in Sections 5+6+15
→ PASS criteria for engineering spec: zero [UNVERIFIED] without exclusion reason + zero tech-specific
terms (framework names, language constructs, class names) in any spec file
→ MANDATORY: compare entity count + rule count between docs/specs/{app-bucket}/{system-name}/ and docs/business-features/
for each module. Discrepancy >10% → flag as gap in watzup summary
→ Engineering spec authoritative for structure; feature-docs authoritative for UI/test sections
→ Gap found → fix task → re-investigate → fix → spawn fresh code-reviewer sub-agent (max 2 rounds)
→ NEVER inline re-review — main agent rationalizes its own output
/docs-update
→ Near-final synchronization sweep across project docs, business feature docs, engineering specs, and TDD/spec dashboards
→ MUST run after review-artifact fixes and before /watzup
→ Verify feature docs Section 15, docs/specs dashboards, and module indexes are synchronized
→ Report skipped sub-phases explicitly when no impacted docs exist
/watzup
→ Session summary:
- Spec bundle: N modules, X files, ~Y lines, completeness matrix (module × phase ✅/⚠️/❌)
- Feature docs: N modules, X feature docs
- Coverage gaps (PARTIAL modules — spec layer vs feature-docs layer)
- Open questions (confidence < 80%)
- Next recommended actions: /product-discovery (future backlog from spec), /greenfield-init
(re-implementation plan), /feature-docs (expand individual features)
/workflow-end
Code changed (new feature, bug fix, refactor, new PBI). Sync both layers incrementally.
git diff --name-only HEAD (default)/workflow-review-changes
→ Full code review cycle + docs-update (Phase 2 feature-docs + Phase 2.5 spec-discovery lightweight update)
→ docs-update inside workflow-review-changes does a DIFF-SCOPED lightweight pass:
Phase 2: /feature-docs update (impacted sections only)
Phase 2.5: /spec-discovery update (impacted modules × phases only, if bundle exists)
→ Produces: impact map (modules affected, phases impacted, doc sections to update)
→ NOTE: If docs/specs/ does NOT exist, Phase 2.5 is skipped — proceed to /spec-discovery below to initialize or skip entirely
/spec-discovery [mode=update]
→ INTENTIONAL DEEPER RE-EXTRACTION — not a duplicate of docs-update Phase 2.5
→ docs-update Phase 2.5 = diff-scoped lightweight sync (fast, surface-level)
→ This call = full phase re-extraction per impacted module (reads ALL source, not just diff)
→ Use when: significant structural changes, new aggregates/events, or docs-update Phase 2.5 flagged [UNVERIFIED] gaps
→ SKIP THIS STEP if: docs-update Phase 2.5 completed with zero gaps AND no [UNVERIFIED] items — mark as "Skipped: docs-update Phase 2.5 sufficient"
→ Input: impact map from workflow-review-changes
→ Re-extract only impacted modules × phases
→ Append to SPEC-CHANGELOG.md
→ Update per-module README.md summaries
→ Update last_extracted frontmatter on changed spec files
/feature-docs [mode=update]
→ INTENTIONAL DEEPER RE-EXTRACTION — not a duplicate of docs-update Phase 2
→ docs-update Phase 2 = impacted-sections-only lightweight sync
→ This call = full 17-section pass with 3-pass verification (evidence audit, domain model, cross-reference)
→ Use when: new features added, major section restructure, or docs-update Phase 2 reported gaps
→ SKIP THIS STEP if: docs-update Phase 2 completed with zero gaps — mark as "Skipped: docs-update Phase 2 sufficient"
→ Input: same impact map from workflow-review-changes
→ Run section impact mapping (Phase 1.5)
→ Update only impacted sections in each feature doc
→ Append to module CHANGELOG.md
/tdd-spec [mode=update]
→ **EXPLICIT TC STEP — required in update mode when new functionality added**
→ SKIP THIS STEP if: changes are purely cosmetic (CSS, comments, config only) — mark as "Skipped: no behavioral impact"
→ Input: impact map from workflow-review-changes + feature-docs update output
→ Mode detection: If new commands/queries/endpoints added → `implement-first`; if PBI-driven → `TDD-first`; if TC edits only → `update`
→ Write/update TCs in feature doc Section 15 for all impacted modules
→ TC ID collision prevention: grep existing IDs before assigning new ones
→ Do NOT overwrite Tested TCs — append new TCs and mark stale TCs for review
/tdd-spec-review
→ Review updated/planned TCs for invariant coverage, GIVEN/WHEN/THEN completeness, stale TC handling, and duplicate TC IDs
→ SKIP THIS STEP if /tdd-spec was skipped because there were no TC changes — mark as "Skipped: no TC changes"
→ BLOCK sync if review finds missing invariants, ambiguous expected behavior, or TC/code/spec drift
/tdd-spec [direction=sync] [direction=forward]
→ Forward sync: feature doc Section 15 → docs/specs/ dashboard
→ SKIP THIS STEP if: no TC changes in this update cycle
→ Orphan check: quarantine dashboard TCs with no feature-doc source
/review-changes
→ Holistic review of changes to both layers
→ Verify: spec changes match code changes (no over/under documentation)
→ Verify: feature doc sections updated match impact map
→ Verify: new TCs cover all new functionality (no coverage gaps)
/docs-update
→ Near-final synchronization sweep across project docs, business feature docs, engineering specs, and TDD/spec dashboards
→ MUST run after review-changes fixes and before /watzup
→ Verify feature docs Section 15, docs/specs dashboards, and module indexes are synchronized
→ Report skipped sub-phases explicitly when no impacted docs exist
/watzup
→ Summary: modules updated, spec files changed, feature doc sections changed, TCs added/updated
→ Any new open questions or UNVERIFIED items introduced
/workflow-end
Triggering update from new PBI or requirement (not code change):
After /workflow-review-changes:
→ User provides PBI text or requirement description
→ Map requirement → affected domain entities → affected modules
→ /dor-gate: required when a new/changed PBI is being made implementation-ready; PASS or WARN required before planned specs/TCs become implementation guidance
→ /pbi-mockup: required when the PBI changes user-facing UI or user journeys; skip with reason for backend-only/non-UI requirements
→ Treat as "speculative update": add new sections/rules to spec files marked [PLANNED — not yet implemented]
→ Add corresponding TCs in feature doc Section 15 marked Status: Planned
→ /tdd-spec-review: review planned TCs before syncing them into docs/specs dashboards
→ These become implementation guidance, not verified spec
Periodic health check (quarterly or before major release). Verify both layers current.
Before starting audit, estimate duration to set expectations:
audit_effort = module_count × avg_phases_per_module × 5min_per_phase
example: 8 modules × 4 phases = 160 min ≈ 2.5h (AI-assisted)
Budget multiplier: If last audit was >90 days ago → ×1.5 (more drift expected).
/scout
→ Quick codebase scan: current state of entities, commands, controllers
→ Note: scout in audit mode is lightweight — 30min max, no deep investigation
/spec-discovery [mode=audit]
→ Compare each spec file's last_extracted vs git log of source files
→ Produce: docs/specs/{app-bucket}/{system-name}/SPEC-AUDIT-{date}.md
→ Output: stale modules table with recommended re-extraction scope
/feature-docs [mode=audit]
→ Compare each feature doc's last_updated vs git log of source files
→ Produce: docs/business-features/{Module}/AUDIT-{date}.md per module
→ Output: stale sections table with recommended update scope
/review-artifact
→ Consolidated audit report across both layers
→ Produce: plans/reports/spec-audit-{date}-{system-name}.md
→ Include: total stale coverage %, estimated update effort, priority order
/docs-update
→ Near-final synchronization sweep across project docs, business feature docs, engineering specs, and TDD/spec dashboards
→ MUST run after review-artifact audit conclusions and before /watzup
→ Verify audit outputs, feature docs, specs indexes, and TDD/spec dashboards remain consistent
→ Report skipped sub-phases explicitly when no impacted docs exist
/watzup
→ Present action plan: which modules/features to update first
→ Recommend: run update mode scoped to stale modules
/workflow-end
| Step | Skip When |
|---|---|
| Phase A.ERD in init | Never — ERD mandatory with Phase A |
/feature-docs in init-full | User explicitly requests spec-only output |
/spec-discovery in update | docs/specs/{app-bucket}/{system-name}/ doesn't exist (run init-full instead) |
/dor-gate in update | Update source is code diff only, existing PBI is already DoR-ready, or no PBI readiness decision is being made |
/pbi-mockup in update | Backend-only/non-UI requirement, code diff only, or existing mockup already covers the change |
/tdd-spec-review in update | /tdd-spec skipped because there were no TC changes |
/docs-update near-final sync | Never skip entirely; sub-phases may be skipped only with explicit reason in the docs-update report |
/review-artifact audit pass | No gaps found in both layers AND no UNVERIFIED items |
| Phase C (API) | Internal library with no public endpoints |
| Phase D (Events) | No async messaging or background jobs |
| Phase E (Journeys) | Backend-only scope, no user-facing UI |
/scout + /plan in main context → module registry + N×M task list/review-artifactEach spec-discovery sub-agent receives:
docs/specs/{app-bucket}/{system-name}/{module-id}-{module-name}/Each feature-docs sub-agent receives:
docs/business-features/{Module}/docs-update called as workflow step (not standalone) automatically includes spec-discovery update via Phase 2.5:
docs-update (as workflow step):
Phase 1: Project docs (inline — unchanged)
Phase 2: /feature-docs update mode (existing)
Phase 2.5: /spec-discovery update mode (NEW — if docs/specs/{app-bucket}/{system-name}/ exists AND spec_discovery_update != false in project-config.json)
Phase 3: /tdd-spec (existing)
Phase 3.5: /tdd-spec-review (required when Phase 3 changes TCs)
Phase 4: /tdd-spec [direction=sync] (existing)
Both layers stay in sync on every feature/bugfix/refactor workflow.
/product-discovery| Goal | Use |
|---|---|
| Update one specific feature doc after small change | /feature-docs directly |
| Extract spec for one specific module | /spec-discovery directly |
| Understand one specific feature | /investigation |
| Write integration tests from existing spec | /write-integration-test |
[TASK-PLANNING] Before acting, analyze task scope and systematically break it into small todo tasks and sub-tasks using TaskCreate.
IMPORTANT MANDATORY Steps: /workflow-spec-driven-dev
IMPORTANT MANDATORY Steps: /workflow-spec-driven-dev
[BLOCKING] Invoke
Skilltool for EACH step — NEVER batch-complete, NEVER mark done without skill invocation. [BLOCKING] Confirm mode viaAskUserQuestionBEFORE any action — NEVER skip Step 0. [BLOCKING] Spawn sub-agents for 4+ modules in ONE message — NEVER sequential spawning. [BLOCKING — Context Compaction / Session Resume] At any session start or after context compaction: (1)TaskListFIRST — resume existing, NEVER create duplicates; (2) readdocs/specs/{app-bucket}/{system-name}/README.mdcompleteness table — skip already-extracted modules (✅); (3) NEVER re-run/scoutor/planin a resumed session. [BLOCKING] Readdocs/project-reference/spec-principles.mdbefore running any extraction/update/audit step — it is the shared spec quality baseline for both engineering spec and feature-doc layers.
Nested Task Expansion Contract — For workflow-step invocation, the
[Workflow] ...row is only a parent container; the child skill still creates visible phase tasks.
- Call
TaskListfirst. If a matching active parent workflow row exists, setnested=trueand recordparentTaskId; otherwise run standalone.- Create one task per declared phase before phase work. When nested, prefix subjects
[N.M] $skill-name — phase.- When nested, link the parent with
TaskUpdate(parentTaskId, addBlockedBy: [childIds]).- Orchestrators must pre-expand a child skill's phase list and link the workflow row before invoking that child skill or sub-agent.
- Mark exactly one child
in_progressbefore work andcompletedimmediately after evidence is written.- Complete the parent only after all child tasks are completed or explicitly cancelled with reason.
Blocked until:
TaskListdone, child phases created, parent linked when nested, first child markedin_progress.
Critical Thinking Mindset — Apply critical thinking, sequential thinking. Every claim needs traced proof, confidence >80% to act. Anti-hallucination: Never present guess as fact — cite sources for every claim, admit uncertainty freely, self-check output for errors, cross-reference independently, stay skeptical of own confidence — certainty without evidence root of all hallucination.
AI Mistake Prevention — Failure modes to avoid on every task:
Check downstream references before deleting. Deleting components causes documentation and code staleness cascades. Map all referencing files before removal. Verify AI-generated content against actual code. AI hallucinates APIs, class names, and method signatures. Always grep to confirm existence before documenting or referencing. Trace full dependency chain after edits. Changing a definition misses downstream variables and consumers derived from it. Always trace the full chain. Trace ALL code paths when verifying correctness. Confirming code exists is not confirming it executes. Always trace early exits, error branches, and conditional skips — not just happy path. When debugging, ask "whose responsibility?" before fixing. Trace whether bug is in caller (wrong data) or callee (wrong handling). Fix at responsible layer — never patch symptom site. Assume existing values are intentional — ask WHY before changing. Before changing any constant, limit, flag, or pattern: read comments, check git blame, examine surrounding code. Verify ALL affected outputs, not just the first. Changes touching multiple stacks require verifying EVERY output. One green check is not all green checks. Holistic-first debugging — resist nearest-attention trap. When investigating any failure, list EVERY precondition first (config, env vars, DB names, endpoints, DI registrations, data preconditions), then verify each against evidence before forming any code-layer hypothesis. Surgical changes — apply the diff test. Bug fix: every changed line must trace directly to the bug. Don't restyle or improve adjacent code. Enhancement task: implement improvements AND announce them explicitly. Surface ambiguity before coding — don't pick silently. If request has multiple interpretations, present each with effort estimate and ask. Never assume all-records, file-based, or more complex path.
Incremental Result Persistence — MANDATORY for all sub-agents or heavy inline steps processing >3 files.
- Before starting: Create report file
plans/reports/{skill}-{date}-{slug}.md- After each file/section reviewed: Append findings to report immediately — never hold in memory
- Return to main agent: Summary only (per SYNC:subagent-return-contract) with
Full report:path- Main agent: Reads report file only when resolving specific blockers
Why: Context cutoff mid-execution loses ALL in-memory findings. Each disk write survives compaction. Partial results are better than no results.
Report naming:
plans/reports/{skill-name}-{YYMMDD}-{HHmm}-{slug}.md
Sub-Agent Return Contract — Sub-agent MUST return ONLY this structure. Main agent reads only summary — NEVER requests full sub-agent output inline.
## Sub-Agent Result: [skill-name] Status: ✅ PASS | ⚠️ PARTIAL | ❌ FAIL Confidence: [0-100]% ### Findings (Critical/High only — max 10 bullets) - [severity] [file:line] [finding] ### Actions Taken - [file changed] [what changed] ### Blockers (if any) - [blocker description] Full report: plans/reports/[skill-name]-[date]-[slug].mdMain agent reads
Full reportONLY when: (a) resolving specific blocker, or (b) building fix plan. Sub-agent writes full report incrementally (per SYNC:incremental-persistence) — never held in memory.
[N.M] $skill-name — phase prefixes and one-in_progress discipline.IMPORTANT MUST ATTENTION follow declared step order for this skill; NEVER skip, reorder, or merge steps without explicit user approval
IMPORTANT MUST ATTENTION for every step/sub-skill call: set in_progress before execution, set completed after execution
IMPORTANT MUST ATTENTION every skipped step MUST include explicit reason; every completed step MUST include concise evidence
IMPORTANT MUST ATTENTION if Task tools unavailable, maintain an equivalent step-by-step plan tracker with synchronized statuses
AskUserQuestion BEFORE any action — NEVER skip Step 0Skill tool for EACH step — NEVER batch-complete or mark done without invocation01-domain-erd.md Mermaid erDiagram with ALL entities, relationships, cardinalities, and aggregate boundaries; foundational for phases B–E[Source: file:line] — mark [UNVERIFIED] not blank; all output tech-agnostic (no framework names, no language constructs)TaskList first, read completeness tracker, skip ✅ modules — NEVER re-run scout or plan[UNVERIFIED] without exclusion reason + zero tech terms; gap found → fresh code-reviewer sub-agent (max 2 rounds) — NEVER inline re-review[IMPORTANT] Analyze how big the task is and break it into many small todo tasks systematically before starting — this is very important.