| name | spec |
| description | Turn a feature idea or rough requirement into a complete, dependency-ordered spec set under ./specs — PRD, component designs, glossary, security/perf/test cross-cuts, key-decisions log, stakeholder roadmap, and engineer-facing implementation plan — cross-referenced with prior-art memos in ./docs/research and vendored code in ./vendors. Use whenever the user says "write the spec", "design this", "let's plan X", "produce a PRD / impl plan / roadmap", "restructure the specs", "review and re-organise the design", "think ultra hard and split this into phases", or describes a non-trivial system that needs a written design before code. Trigger even when the user does not say "spec" if they ask for phased delivery, milestone exit criteria, or a build-order graph. |
Spec
Turn requirements into a load-bearing spec set: numbered, cross-linked, dependency-ordered, with a stakeholder-facing roadmap and an engineer-facing implementation plan that pair 1:1. The spec set is the contract between intent and code; if it is wrong or incoherent, every downstream phase pays for it.
When this fires
- "write the spec / PRD / design / impl plan / roadmap for X"
- "restructure the specs so it can be built incrementally"
- "review the design and add the missing specs"
- "phase this delivery — what lands when?"
- The user describes a system non-trivial enough that ad-hoc coding will produce drift, missing invariants, or unbounded scope
- The research skill has produced memos and the next move is to commit decisions to a spec
Output shape
A directory of numbered Markdown files under ./specs/. The numbering is the build order — reading top-to-bottom matches the milestone progression in the roadmap. Update ./specs/index.md so a fresh reader can navigate.
Diagram expectation
Use fenced ASCII-style diagrams (```text) whenever they materially improve understanding of high-level architecture, data flow, processing flow, component/subsystem relationships, lifecycle/state transitions, or build/dependency order. Diagrams are part of the spec contract: keep labels precise, show directionality, and put the diagram near the decision it clarifies. Prefer terminal-safe text diagrams over Mermaid so the spec remains readable in terminals, code review, and plain Markdown renderers. Unicode box-drawing characters (┌─┐│└┘, ▼, ▲) are encouraged when they make the structure clearer. Do not add decorative diagrams that merely restate simple prose.
For non-trivial systems, the diagram must be structurally rich enough to review: use nested boxes, lanes, or grouped boundaries rather than a bare A -> B -> C chain. Show component ownership, trust/runtime boundaries, queues/channels, storage, external services, error paths, and fan-in/fan-out where they matter. A simple arrow chain is acceptable only for a short linear ordering with no meaningful boundary or alternative path.
For ordered protocols, login flows, request lifecycles, retries, shutdown, or multi-party handshakes, use a sequence-style ASCII diagram with vertical lifelines and numbered steps. Include the actor names as columns, preserve time from top to bottom, and label durable state changes, redirects, validation, token minting, persistence, and failure branches that affect correctness.
Example shape:
┌────────────────────────────────────┐
│ Public crate API / CLI │
│ │
│ ┌──────────────────────────────┐ │
│ │ Validation boundary │ │
│ │ - length / range caps │ │
│ │ - typed domain newtypes │ │
│ │ - serde deny_unknown_fields │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ┌──────────────▼───────────────┐ │
│ │ Domain command / envelope │ │
│ │ - invariants encoded │ │
│ │ - thiserror failures │ │
│ └──────────────┬───────────────┘ │
└─────────────────┼──────────────────┘
│ bounded mpsc
┌────────────────────────────▼───────────┐ ┌────────────────────┐
│ Actor / state owner │ │ Observability │
│ - owns mutable state │──▶│ - tracing spans │
│ - restart / shutdown policy │ │ - redacted fields │
└──────────────┬─────────────────────────┘ └────────────────────┘
│
┌──────────────▼──────────────┐
│ Storage / external system │
│ - timeout / retry budget │
│ - classified data boundary │
└─────────────────────────────┘
Sequence-flow shape:
CLI / Client Service Actor External Provider
│ │ │
│ 1. Build validated request │ │
│ and correlation id │ │
│ │ │
│ 2. Submit command ────────────▶│ │
│ │ 3. Persist pending state │
│ │ with TTL / idempotency key │
│ │ │
│ │ 4. Call provider ─────────────▶│
│ │ timeout / retry budget │
│ │ │
│ │ 5. Provider response ◀──────── │
│ │ │
│ │ 6. Validate + classify result │
│ │ update durable state │
│ │ │
│ 7. Return typed outcome ◀──────│ │
│ or domain error │ │
│ │ │
│ 8. Emit structured span │ │
│ with redacted fields │ │
Right-size the spec set to the problem
The number of files is a function of system complexity, not a template to fill. Do not generate every slot below just because the layout shows it. A small library might ship as 00-prd.md + 10-design.md + 90-roadmap.md and nothing else; a large multi-crate platform might need every slot plus a few more. Pick the smallest set that captures the load-bearing decisions for this system.
Heuristics for sizing:
- Tiny (1–3 files) — a single-purpose crate, one or two integration points, no novel invariants. Often: PRD + one design doc + a short roadmap. Skip the cross-cuts; their content fits in the design doc.
- Medium (5–10 files) — multiple components with non-trivial contracts between them, more than one integration point, real performance or security constraints. PRD + per-component designs + roadmap + impl-plan + glossary if any term is overloaded. Cross-cuts only when their content does not fit cleanly in the component designs.
- Large (15+ files) — a platform / SDK with many components, multiple consumers, long-lived contracts, freeze windows, public RFCs. The full canonical layout below earns its keep.
A good test before adding a file: what specific question does this file answer that nothing else answers? If you cannot name one, fold it into the nearest related spec.
Canonical layout (illustrative — pick what applies)
The structure below is the example layout the obs project ended up with after the system grew (an observability SDK with ~10 components, hard perf/security budgets, and a public RFC freeze). Treat it as a menu of named slots so files can grow in cleanly later — not a checklist. Omit any slot that does not earn its keep for the current scope; keep the numbering scheme so future additions slot in without renumbering.
specs/
├── index.md — table of every spec + reading order + build-order graph
├── 00-prd.md — product requirements (vision, users, goals, non-goals, success metrics)
├── 10-data-model.md — wire shapes, envelope / message / record types, naming conventions
├── 11-runtime-core.md — engine: lifecycle, traits, threading, panic policy
├── 12-<component>.md — additional foundation designs, in dependency order
├── 13-<component>.md
├── 20-<integration>.md — outward-facing integrations (transports, sinks, exporters)
├── 30-<bridge>.md — interop with neighbouring ecosystems
├── 40-<middleware>.md — middleware / framework adapters
├── 50-cli.md — CLI surface, if any
├── 60-dev-ergonomics.md — what using the SDK feels like; concrete examples; quickstart
├── 61-crates-and-features.md — workspace layout, dependency graph, feature flags
├── 70-security.md — threat model, classification, redaction, secrets
├── 71-performance-budgets.md — P50/P99 targets, bench harness, CI gates
├── 72-testing-strategy.md — test pyramid, fixtures, integration mocks
├── 80-glossary.md — disambiguate overloaded terms
├── 90-roadmap.md — STAKEHOLDER-facing: milestones M0…Mn, exit criteria, calendar shape
├── 91-impl-plan.md — ENGINEER-facing: dependency-ordered phases, effort estimates
├── 92-rfc-<v>.md — public-comment summary at freeze (when applicable)
├── 93-improvements-review.md — (example) deferred-findings backlog spec; impl skill appends to it
└── 99-key-decisions.md — D1…Dn, the *why* behind each load-bearing choice
Even on a large system, several of these are conditional: 50-cli.md only if there is a CLI; 30-… / 40-… only if there are real interop/middleware surfaces; 92-rfc-* only at a public freeze; the deferred-findings backlog (named 93-improvements-review.md in the obs project, but pick whatever name the project prefers) is created on demand by the impl skill, not preemptively.
Two rules about the roadmap / impl-plan split — they are different documents on purpose:
90-roadmap.md is organised by user-visible feature. M0 = "hello world emit", M1 = "schema-first authoring". Stakeholders read this to plan calendars.
91-impl-plan.md is organised by dependency order. Phase 1 = the spine that nothing else can be built without. Engineers read this to know what to write next, and why.
The two pair 1:1 against milestones but the order and grouping differ. Earlier drafts conflate them; do not.
Workflow
-
Capture intent — restate the user's requirement in 1–2 paragraphs: the problem and the vision. Mirror it back before generating files. If the requirement is fuzzy ("we want better observability"), force it concrete (users, top job-to-be-done, the one metric of success). A PRD with no measurable success looks fine and ages badly.
-
Read what exists — before adding files:
./specs/index.md (if present) — what's already designed, what naming is used.
./docs/research/ — every memo. Cite their decisions; do not re-litigate.
./vendors/ — for prior art the design should align with or deliberately diverge from. Reference vendors/<repo>/path:LINE directly in the spec.
AGENTS.md (project + user-global) — engineering norms the spec must encode into design decisions, not just respect in spirit. The spec is allowed to set tighter rules; it must not silently relax AGENTS.md. See the next subsection for the binding Rust checklist.
2a. Bind Rust engineering norms (anchor: AGENTS.md)
The spec commits the project to specific Rust patterns up front; the impl skill will then match those patterns line-for-line. Before drafting any component design, read project AGENTS.md and ~/.codex/AGENTS.md and encode their norms into the spec text — not as a footnote, not "TBD per coding standards", but as concrete shapes (error types, async surfaces, validation points, lint sets, doc requirements) that a reviewer can mechanically check.
Do not restate AGENTS.md in the spec; reference it ("Errors: per AGENTS.md § Error Handling — thiserror enum with #[source]"). If a component genuinely needs to deviate, the spec must say "deviates from AGENTS.md § X because …" so reviewers can challenge it. If AGENTS.md is silent on a question the spec must answer, the spec sets the rule and 99-key-decisions.md records why.
A component design (11-…, 12-…, …) is not done until each AGENTS.md section relevant to it (Error Handling, Async & Concurrency, Type Design & API, Safety & Security, Serialization, Testing, Logging & Observability, Performance, Documentation) is either pinned by reference or marked "N/A — ".
-
Run the research skill if prior art is missing — if the design hinges on an assumption that has not been validated (a crate works under release+LTO; an API actually composes; a perf budget is achievable), invoke the research skill first. Do not bake unvalidated assumptions into a spec.
-
Think ultra hard about phasing — before writing a single design doc, sketch:
- What is the smallest end-to-end slice a user can run? That is M0.
- What does each subsequent milestone unlock? Name it from the user's POV.
- What blocks what? Draw the build-order graph.
- Where architecture, data flow, processing flow, or subsystem relationships are load-bearing, draw a boxed ASCII diagram before writing the prose.
Two principles that separate good phasing from plausible phasing:
- Land contracts before consumers. If every sink consumes
&dyn EventSchema, the schema registry lands in the foundation, not alongside the first sink. Otherwise the contract is provisional and gets retrofitted.
- Pay design costs once, in the foundation. Multi-tenant observer resolution, security classification, error envelopes, identity / context propagation — adding any of these later is a refactor of every call site. Settle them in the spine even if M0 only uses the trivial case.
-
Write the PRD first (00-prd.md). Vision, users, goals (with measurable criteria), non-goals (explicit — non-goals prevent scope creep more than goals do), success metrics, naming conventions that will bind the rest of the spec set.
-
Write the data model (10-data-model.md). The wire shape every downstream component sees. Naming, types, invariants, envelope vs payload distinction. Lock this early — drift here cascades.
-
Write component designs in build order (11-…, 12-…, 20-…, etc.). Each spec ends with a "Cross-references" section pointing to the specs it depends on and the specs that depend on it. Use [NN-name.md § X](./NN-name.md#x) link form so jumps work in any markdown viewer.
-
Write the cross-cuts (60, 61, 70, 71, 72). These are read alongside the build-order specs, not in sequence. Make them small and concrete; do not let them become philosophy essays.
-
Write the glossary (80-glossary.md). Every overloaded term — span vs scope, sink vs layer, envelope vs event — gets a one-paragraph disambiguation. Cheap, prevents weeks of arguing.
-
Write the roadmap and impl-plan together (90, 91):
- Roadmap has milestones, exit criteria, and a calendar estimate. Calibrate honestly: if the spec's earlier estimate was off by 2×, say so and adjust.
- Impl-plan has Phase 0 (risk retirement), Phase 1 (spine), … Phase N (hardening). Each phase has a numbered task table with spec citations and effort estimates. Each phase has explicit exit criteria — a test or invariant that must hold before the next phase starts.
-
Write 99-key-decisions.md as the spec set stabilises. Each entry: D-id, decision, alternatives considered, the why, and a reverse pointer to the spec sections that depend on it. When a future reviewer asks "why this?" — point them here, not to a chat scrollback.
-
Update index.md — table of every spec with type + purpose, plus a reading-order list and the build-order graph. The index is the entry point; spend real effort on it.
PRD template
# PRD — <product name>
Status: <draft v1> · Owner: <team> · Last updated: <YYYY-MM-DD>
## 1. Problem
What is broken today, with concrete evidence (incidents, costs, missing capability). Avoid abstractions; name the failure mode users actually hit.
## 2. Vision
What "good" looks like, in one paragraph plus one concrete code / UX example. The example is load-bearing — it pins the ergonomic contract.
## 3. Goals
| # | Goal | Measure |
| -- | ---- | ------- |
| G1 | … | … |
Each goal must have a *measurable* success criterion. "Better DX" is not a goal; "≤ 60 s from cargo install to first event on stdout" is.
## 4. Non-goals
Explicit list. Non-goals prevent scope creep more than goals do.
## 5. Users
Primary, secondary, anti-personas. What each persona is doing when they reach for the product.
## 6. Success metrics
What we will measure post-launch to know we shipped the right thing.
## 7. Naming conventions (binding)
Public namespaces, prefixes, file layouts that the rest of the spec set must honour. Lock early; renames are expensive.
Component design template
# <NN>-<name>: <subsystem>
Status: <draft|stable> · Owner: <team> · Depends on: <list of NN specs>
## 1. Purpose
One paragraph: what this subsystem owns, what it does not own, why it exists separately.
## 2. Interface
The public types / traits / functions / wire shapes. Code-shaped where possible.
## 2a. Architecture / flow diagrams
Add concise boxed ASCII-style diagrams for any non-trivial component boundary, data flow, processing pipeline, state transition, or dependency relationship this subsystem owns. Prefer diagrams with enough structure to expose reviewable decisions: nested boxes for components and ownership boundaries, labeled arrows for message/data movement, and side branches for failures, retries, shutdown, or backpressure. Use sequence-style lifelines for ordered request/protocol flows where the exact step order is the contract.
## 3. Invariants
The properties that must hold at every observable point. Each invariant has a test or lint that pins it.
## 4. Behaviour
The non-trivial cases — error paths, edge cases, concurrent / async behaviour, drop order, panic policy, cancellation.
## 5. Cross-references
- ← Depends on: <links>
- → Consumed by: <links>
- ↔ Related research: <links to ./docs/research/...>
Roadmap template (90-roadmap.md)
(Templates below use 4-backtick outer fences so the inner triple-backtick blocks render correctly when copied; CommonMark allows fences to nest as long as the outer uses more backticks than any inner block.)
# Roadmap — Incremental Delivery
## 0. Principles
- **Always shippable.** Every milestone leaves the workspace green on the standard quality gates.
- **Type-safety / contract-safety first.** Each milestone may defer features but never relaxes guarantees.
- **Honest calibration.** Estimates are realistic; pad explicitly for review/on-call/meeting overhead.
## 1. Build-order graph
```text
┌──────────┐ ┌────────────────┐ ┌────────────────────┐
│ 00 PRD │───▶│ 10 Data Model │───▶│ 11 Runtime Core │
│ goals │ │ invariants │ │ lifecycle / traits │
└──────────┘ └───────┬────────┘ └─────────┬──────────┘
│ │
▼ ▼
┌────────────────┐ ┌────────────────────┐
│ 12 Foundation │───▶│ 20 Integration │
│ contracts │ │ transport / sink │
└───────┬────────┘ └─────────┬──────────┘
│ │
▼ ▼
┌────────────────┐ ┌────────────────────┐
│ 60/61 DX │ │ 70/71/72 Gates │
│ crates/features│ │ security/perf/test │
└────────────────┘ └────────────────────┘
```
## 2. Milestones
### M0 — <user-visible feature>
**Specs touched**: 00, 10, 11, 12.
**Exit criteria**: a fresh user can <do thing> in <time>; <invariant> holds; <test> passes.
### M1 — …
…
Impl-plan template (91-impl-plan.md)
# Implementation Plan — Dependency-Ordered Build
## 0. Readiness assessment
What is ready, what isn't, and what blocks Phase 1 today. Be honest; missing specs and unvalidated assumptions go here.
## 1. Why dependency order ≠ feature order
Two or three concrete examples where the dependency-correct order differs from the user-feature order, with the *why*. This justifies the rest of the document.
## 2. Estimated total effort
Calendar weeks for one developer, with assumptions. Note where parallelism collapses the schedule.
## 3. Phase 0 — risk retirement
| # | Deliverable | Lands in | Effort |
| -- | ----------- | -------- | ------ |
Each spike memo from `./docs/research/` listed; missing specs called out; no production code yet.
**Exit gate**: every spike memo committed; specs updated to reflect findings.
## 4. Phase 1 — foundation (weeks N–M)
The spine in strict dependency order. Each row blocks everything underneath.
| # | Task | Spec | Effort |
| --- | ---- | ---- | ------ |
| 1.1 | … | … | … |
**Exit criteria**: <test> passes; <bench> is within budget; <invariant> verified.
## 5+ Phase 2 … Phase N
Same shape per phase. Cross-reference roadmap milestones explicitly: "Phase 3 closes M2 and starts M3."
## N. What makes this order *correct*, not just plausible
Two or three principles that drove the ordering. State them so a reviewer can challenge the order on its own terms instead of arguing tasks line by line.
Key-decisions template (99-key-decisions.md)
# Key Decisions
Each decision is permanent; supersede with a new D-id rather than editing in place.
## D1 — <one-line decision>
- **Context**: where this applies
- **Alternatives considered**: A, B, C — with the trade-offs that ruled them out
- **Decision**: the chosen path, in one sentence
- **Why**: load-bearing reasoning
- **Pinned by**: <links to spec sections that depend on this>
- **Date**: <YYYY-MM-DD>
Quality bar
A spec set is done when:
- A new engineer can read
index.md → 00-prd.md → the build-order specs in order, and write code without asking "what did the author mean?"
- Every cross-reference resolves; every "Sink does X" has exactly one authoritative location and others link to it.
- Every load-bearing decision has an entry in
99-key-decisions.md.
- The roadmap exit criteria are observable (a test, a bench, a user-visible behaviour) — not "Phase X is done when we finish Phase X".
- Estimates have been calibrated against
./docs/research/ findings, not napkin guessed.
Anti-patterns
- PRD that conflates problem and solution — the problem section reads like an API spec. Force the problem to stand alone; if you can't write it without naming the solution, the requirement isn't understood yet.
- Roadmap = impl-plan with different headings. They are different documents. If 90 and 91 have the same row order, one of them is wrong.
- Specs that ban features ("we will never do X") without saying why. Future you will violate the ban for good reasons; preserve the why now.
- Glossary that defines obvious terms. Define only the overloaded ones — the words that two readers used differently in the same conversation.
99-key-decisions.md as a changelog. It is not. It records load-bearing design choices, with alternatives and rationale, for future reviewers.
Hand-off
When the spec set lands:
- Commit with a single message that names the milestone shape: "specs: PRD + foundation designs + roadmap (M0–M2)".
- Tell the user the entry points:
index.md, the PRD, and the impl-plan's Phase 0.
- If Phase 0 has open spikes, hand off to the research skill before any code is written.
- If Phase 0 is closed, the impl skill takes over with
91-impl-plan.md Phase 1.