| name | outcome-shaped-specs |
| description | Use when starting a new feature in a project that uses agentic-loop development. Every feature gets a spec file with Outcome, Acceptance tests (checkboxes), Decisions (with Why + How-to-apply), and Known Limitations sections. Agent refuses to start work on a feature whose spec doesn't have acceptance tests. Prevents hallucinated requirements and gives the loop a measurable definition of "done." |
| metadata | {"category":"agentic-discipline"} |
Outcome-shaped specs for agentic development
If you're running an agentic loop (Ralph, Claude Code, or similar), the single biggest correctness multiplier is requiring every feature to start from a spec the agent can verify against: not a TODO list, not a paragraph in a chat, but a structured document with checkboxes the agent can mechanically tick off.
The format below has shipped many features without rework. The agent is instructed (in CLAUDE.md or equivalent) to refuse to start a feature whose spec lacks the required sections.
When to use
Reach for this skill if you're:
- Setting up a new agentic-dev project and need a starting convention
- Writing the rule that decides when the agent has permission to start coding
- Tired of features shipping with the wrong scope because the request was ambiguous
- Looking for a spec template that BOTH the agent and the human can parse
The pattern — specs/NNN-feature-slug.md
# Spec NNN — <Feature Name>
## Outcome
<2-3 sentences describing what the user can do after this ships. Operator-facing. No "implementation will use X" — that goes in Decisions.>
## Acceptance tests
- [ ] <Concrete user-observable behavior 1>
- [ ] <Concrete user-observable behavior 2>
- [ ] <Performance / quality bar if applicable>
- [ ] <Test coverage requirement, e.g. "unit tests for the bucket math; e2e for the toggle flow">
## Out of scope
- <Things adjacent operators might assume; explicitly NOT in this feature>
## Decisions
### D1. <Decision name>
- **Option A** — <description>. <tradeoffs>.
- **Option B** — <description>. <tradeoffs>.
**Recommendation**: <which + one-line why>. <How decision shapes implementation>.
### D2. <Next decision>
...
## Known limitations
- <Thing v1 explicitly does NOT do, and why deferring is OK>
## Memory pointers
- <File paths the implementer will need; populated when the milestone activates>
Why each section earns its place
`## Outcome`: forces the author to articulate the user-facing benefit. If the agent or human can't write this in 2-3 sentences, the feature isn't scoped yet.
`## Acceptance tests` (checkboxes): the mechanical "done" signal. The agent ticks boxes as it implements; the human reviews by reading the boxes. Each test must be observable from outside: no "the code is refactored" or "the function returns X" (those are implementation; they go in code review).
`## Out of scope`: explicit anti-features. Prevents scope creep where the agent helpfully adds an adjacent capability the operator didn't ask for and now has to maintain.
`## Decisions`: each non-trivial design choice gets named, alternatives listed, recommendation made. The Why + How-to-apply format means future you (or a future agent) doesn't have to re-derive why we chose B over A.
`## Known limitations`: explicitly punts v2 work to a follow-up. Without this section, agents will often try to over-deliver and the spec becomes a moving target.
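Because the acceptance checkboxes are the mechanical "done" signal, progress can be read straight from the spec text. The following is a minimal sketch, not part of the original tooling: the function names `acceptance_progress` and `is_done` are hypothetical, and it assumes checkboxes use standard Markdown task-list syntax under a heading spelled exactly `## Acceptance tests`.

```python
import re

# Matches a Markdown task-list item: "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*-\s\[( |x|X)\]\s+(.*)$")


def acceptance_progress(spec_text: str) -> tuple[int, int]:
    """Return (ticked, total) for checkboxes under '## Acceptance tests'."""
    ticked = total = 0
    in_section = False
    for line in spec_text.splitlines():
        if line.startswith("## "):
            # Any level-2 heading either enters or leaves the section.
            in_section = line.strip().lower() == "## acceptance tests"
            continue
        if not in_section:
            continue
        match = CHECKBOX.match(line)
        if match:
            total += 1
            if match.group(1) in ("x", "X"):
                ticked += 1
    return ticked, total


def is_done(spec_text: str) -> bool:
    """A feature counts as done only if it has tests and all are ticked."""
    ticked, total = acceptance_progress(spec_text)
    return total > 0 and ticked == total
```

Checkboxes in other sections (for example a task list inside Decisions) are deliberately ignored, so only acceptance tests count toward "done."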
Numbering convention
Specs are numbered sequentially (001-foundation.md, 002-data-ingestion.md, …). The numbers preserve the historical order so you can read the project as a narrative. Don't renumber when reorganizing — that breaks references in git history.
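One way to keep numbering append-only is to derive the next filename from the highest existing number rather than from the first free slot. This is a sketch under assumptions: `next_spec_filename` is a hypothetical helper, and it assumes three-digit prefixes and lowercase hyphenated slugs as in the examples above.

```python
import re

# Three-digit prefix, hyphen, lowercase slug, ".md" (e.g. "002-data-ingestion.md").
SPEC_NAME = re.compile(r"^(\d{3})-[a-z0-9-]+\.md$")


def next_spec_filename(existing: list[str], slug: str) -> str:
    """Return the filename for a new spec, one past the highest number in use.

    Gaps are never back-filled, so existing numbers keep their meaning.
    """
    numbers = [int(m.group(1)) for name in existing if (m := SPEC_NAME.match(name))]
    next_number = max(numbers, default=-1) + 1
    return f"{next_number:03d}-{slug}.md"
```

In a real project you would likely feed it the names of the files in `specs/`; files that don't match the naming pattern are skipped.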
The hard rule that gates the agent
In your CLAUDE.md (or equivalent rules file the agent loads):
## Hard rules
1. **No work without a spec.** If an Issue is missing either its
`## Outcome` or its `## Acceptance tests` section, refuse to start.
Open a sub-issue requesting clarification instead.
This rule makes "the agent has permission to act" be a function of "the spec is well-formed." It transforms the agent from a chatbot that does what you ask into a teammate that pushes back when you ask wrong.
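The same gate can also be enforced mechanically, for example in a pre-flight script the loop runs before handing a feature to the agent. The sketch below is illustrative only: `missing_requirements` and `may_start` are hypothetical names, and the check for at least one checkbox is an added assumption that follows from the "checkboxes" requirement in the template.

```python
import re

REQUIRED_SECTIONS = ("## Outcome", "## Acceptance tests")

# Any Markdown task-list item with some text after the box.
CHECKBOX = re.compile(r"^\s*-\s\[[ xX]\]\s+\S", re.MULTILINE)


def missing_requirements(spec_text: str) -> list[str]:
    """List the reasons this spec is not well-formed enough to start work."""
    headings = {
        line.strip() for line in spec_text.splitlines() if line.startswith("## ")
    }
    problems = [
        f"missing section: {section}"
        for section in REQUIRED_SECTIONS
        if section not in headings
    ]
    # Simplification: looks for a checkbox anywhere in the file rather than
    # strictly inside the acceptance section.
    if "## Acceptance tests" in headings and not CHECKBOX.search(spec_text):
        problems.append("no acceptance-test checkboxes")
    return problems


def may_start(spec_text: str) -> bool:
    """Permission to act is a function of the spec being well-formed."""
    return not missing_requirements(spec_text)
```

Returning the list of problems, not just a boolean, gives the agent something concrete to put in the clarification sub-issue.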
Gotchas
- Acceptance tests that aren't observable. "The resampler aggregates correctly" is not testable; "the API returns 12 bars when given 24 1h bars over a 48h window with granularity=4h" is. Push the author until the tests are concrete.
- Decisions that say "we'll figure this out later." That's not a decision; it's "I haven't decided." Either commit (with a Why) or move it to a sub-issue.
- Bloat. Once specs grow past ~3 pages they stop being read. Cap at the essentials; punt details to memory/learnings notes.
- Skipping the spec for "small" features. The temptation to ship a one-liner without a spec is high. Resist for anything user-visible — even one-liners often have non-obvious behavior ("does this toggle persist? per-browser or per-account? what's the URL form?") that surfaces ONLY when you write the spec.
- Letting the spec drift after the fact. If implementation diverges from the spec, update the SPEC (not just the code) so the spec stays the source of truth.
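Two of the gotchas above, deferred decisions and bloat, lend themselves to a rough automated lint. This is a heuristic sketch, not a definitive check: `lint_spec`, the phrase list, and the 1500-word cap (an assumed stand-in for "~3 pages") are all illustrative assumptions. Vague acceptance tests still need a human or agent reviewer; this does not try to detect them.

```python
import re

# Phrases that usually signal "I haven't decided" rather than a decision.
DEFERRAL = re.compile(
    r"\b(tbd|todo|figure (this|it) out later|decide later)\b", re.IGNORECASE
)

# Assumption: ~3 pages at roughly 500 words per page.
MAX_WORDS = 1500


def lint_spec(spec_text: str) -> list[str]:
    """Return warnings for deferred decisions and oversized specs."""
    warnings = []
    in_decisions = False
    for number, line in enumerate(spec_text.splitlines(), start=1):
        if line.startswith("## "):
            # "### D1." subsections do not match, so they stay inside Decisions.
            in_decisions = line.strip().lower() == "## decisions"
            continue
        if in_decisions and DEFERRAL.search(line):
            warnings.append(f"line {number}: deferred decision: {line.strip()}")
    word_count = len(spec_text.split())
    if word_count > MAX_WORDS:
        warnings.append(f"spec is {word_count} words; cap is {MAX_WORDS}")
    return warnings
```

Deferral phrases are only flagged inside `## Decisions`; in `## Known limitations`, deferring is the point.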
Reference implementation
In the project where this discipline was hammered out, the layout was:
- A `specs/` top-level directory with numbered files
- A `specs/000-charter.md` at index 0 with the project-wide outcome and constraints
- Subsequent specs `001-…` through `010-…`, each ~1-3 pages following the template
- A `CLAUDE.md` hard rule at the project root referencing the template
The structural pieces worth borrowing are the sequential numbering and the hard rule that refers to `## Outcome` and `## Acceptance tests` by name, so the agent has unambiguous criteria for "this spec is well-formed enough to start."
Related skills
- `browser-verify-ui-changes`: the acceptance tests for any UI feature must include "verified via chrome-devtools MCP"
- `read-reference-source-first`: when the feature ports behavior from another tool, the Decisions section names which tool and links to the source