ralph-spec
Write Ralph specification documents: structured feature specs with clear requirements, acceptance criteria, and implementation guidance for autonomous task execution.
| Key | Value |
|---|---|
| name | ralph-spec |
| description | Write Ralph specification documents - structured feature specs with clear requirements, acceptance criteria, and implementation guidance for autonomous task execution |
| license | MIT |
| compatibility | opencode |
| metadata | {"category":"planning","system":"ralph"} |
Use this skill when creating or improving specification documents for Ralph, the autonomous task execution system.
Understanding the pipeline is essential to writing good specs:
SPEC (you write)
|
v
PLAN stage (LLM reads spec, does gap analysis against codebase)
|
v
TASKS (LLM generates: name, notes, accept, deps, priority)
|
v
BUILD stage (separate LLM implements one task per iteration)
|
v
VERIFY stage (checks task's acceptance criteria)
The spec is a requirements document. It defines WHAT must be true when done and HOW to verify it. The planner decides task boundaries, implementation order, and per-task acceptance criteria. The builder writes the code.
A spec's job is to give the planner enough structure and clarity that it naturally produces well-scoped tasks. You do NOT need to prescribe task boundaries in the spec — but you DO need to write requirements at a granularity that makes good decomposition obvious.
| Constraint | Limit | Why |
|---|---|---|
| Spec length | ≤ 300 lines | The planner must read the spec AND explore the codebase in one context window. Long specs crowd out codebase research. |
| Files touched | ≤ 10 files per spec | More than this means the spec covers multiple concerns. Split it. |
| New concepts | ≤ 3 per spec | Each new type, API, protocol, or abstraction adds cognitive load. |
| Requirements subsections | ≤ 8 H3 sections | Each subsection is a natural task boundary for the planner. More than 8 and the planner starts merging them. |
If your spec exceeds any of these, split it into multiple specs.
Place specs in: ralph/specs/<spec-name>.md
Use kebab-case for filenames (e.g., user-authentication.md, api-rate-limiting.md).
Every Ralph spec MUST have these sections in order:
# Feature Name
Short, descriptive name. This becomes the spec identifier.
## Overview
One paragraph explaining WHAT this feature does and WHY it exists.
Focus on the problem being solved, not implementation details.
## Requirements
### Subsection Name
Detailed requirements organized by topic. Use:
- Bullet points for lists of requirements
- Code blocks for interfaces, signatures, schemas
- Tables for structured data (field definitions, command references)
Granularity rule: Each H3 subsection under Requirements should describe ONE cohesive deliverable. The planner will naturally create one task (or a small cluster) per subsection. If a subsection requires changes to more than 3 files, it's too big — split it.
## Acceptance Criteria
- [ ] Criterion 1: Specific, testable requirement
- [ ] Criterion 2: Another testable requirement
- [ ] Criterion 3: Edge case handling
CRITICAL: The VERIFY stage checks these. Each criterion must be independently verifiable — completing it should not require other criteria to be done first (unless there's an explicit dependency chain).
Add these when relevant:
## Architecture
Use ASCII diagrams for flows. Keep brief — architecture communicates
relationships, not implementation.
## Dependencies
- Requires `auth-core.md` to be implemented first
- Assumes `libfoo >= 2.0` is available
## Non-Requirements
- This spec does NOT change X
- Y is out of scope (see `other-spec.md`)
## Error Handling
| Error Condition | Response |
|-----------------|----------|
| Invalid input | Return error code X |
| Resource not found | Log warning, continue |
For each H3 subsection in Requirements, ask: does it give the planner one cohesive deliverable with concrete contracts? A good example:
### System Config Struct
Define `valk_system_config_t` in `src/gc.h`:
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `gc_heap_size` | `u64` | `0` | Initial heap size (0 = auto) |
Provide `valk_system_config_default()` inline function returning defaults.
This gives the planner exactly what it needs: a type to create, where it goes, its fields and semantics. The planner generates a task like "Add valk_system_config_t to src/gc.h" with clear acceptance.
By contrast, an anti-pattern:
### System API Implementation
Here are the complete function bodies for src/gc.c:
```c
valk_system_t *valk_system_create(valk_system_config_t *config) {
valk_system_t *sys = calloc(1, sizeof(valk_system_t));
sys->heap = valk_gc_heap_create(config->gc_heap_size);
// ... 30 more lines ...
}
void valk_system_shutdown(valk_system_t *sys, u64 deadline_ms) {
// ... 25 more lines ...
}
void valk_system_destroy(valk_system_t *sys) {
// ... 15 more lines ...
}
// ... 5 more functions ...
```
This is ~100 lines of code that the planner sees as ONE blob. It creates one task: "Implement System API." That task is too large for one BUILD iteration. Instead, specify each function's signature and contract separately:
### System Lifecycle
| Function | Signature | Purpose |
|----------|-----------|---------|
| `valk_system_create` | `valk_system_t *valk_system_create(valk_system_config_t *config)` | Allocate, create heap, init handle table, register calling thread |
| `valk_system_destroy` | `void valk_system_destroy(valk_system_t *sys)` | Free heap, handle table, barrier, mark queues, the struct |
| `valk_system_shutdown` | `void valk_system_shutdown(valk_system_t *sys, u64 deadline_ms)` | Stop subsystems, wait for threads, wait+destroy subsystems |
`valk_system_create` must:
- Allocate with `calloc`
- Set `valk_sys` global pointer
- Call `valk_system_register_thread` for the calling thread
- Return the allocated system (never NULL on success)
`valk_system_shutdown` must:
- Set `shutting_down` flag atomically
- Call `stop()` on each subsystem under lock
- Spin-wait for `threads_registered` to reach 1 (with deadline)
- Call `wait()` then `destroy()` on each subsystem under lock
Now the planner sees three distinct functions with clear contracts and can decide whether to make them one task or three.
When a feature is too large for one spec, split it into multiple specs with explicit cross-references.
ralph/specs/
system-refactor-00-heap-rename.md # Mechanical rename prerequisite
system-refactor-01-types.md # New type definitions + shims
system-refactor-02-lifecycle.md # System create/destroy/shutdown
system-refactor-03-stw-protocol.md # Barrier-based STW replacement
system-refactor-04-aio-integration.md # AIO as subsystem
Each spec declares its dependencies explicitly:
## Dependencies
- Requires `system-refactor-01-types.md` to be completed (types must exist)
- Assumes heap types have been renamed per `system-refactor-00-heap-rename.md`
Ralph processes specs one at a time. Dependency ordering is the user's responsibility when invoking ralph construct.
Large renames, search-and-replace operations, and other mechanical changes deserve their own spec. A mechanical spec is short, table-driven, and verified entirely by `grep` and the existing test suite:
# Heap Type Rename
## Overview
Remove vestigial "2" suffixes from heap types and functions left over from
the malloc→page heap migration. Purely mechanical — no logic changes.
## Requirements
### Type Renames
| Old | New |
|-----|-----|
| `valk_gc_heap2_t` | `valk_gc_heap_t` |
| `valk_gc_page2_t` | `valk_gc_page_t` |
| `valk_gc_tlab2_t` | `valk_gc_tlab_t` |
### Function Renames
| Old | New |
|-----|-----|
| `valk_gc_heap2_create` | `valk_gc_heap_create` |
| `valk_gc_heap2_destroy` | `valk_gc_heap_destroy` |
### Files Affected
Source: `src/gc_heap.h`, `src/gc_heap.c`, `src/gc_mark.c`, `src/gc.c`
Tests: `test/test_memory.c`, `test/unit/test_gc.c`
## Acceptance Criteria
- [ ] `grep -rE 'heap2_|tlab2_|page2_' src/ test/` returns no matches
- [ ] `make build` succeeds
- [ ] `make test` passes (existing tests are the validation)
Each criterion should follow the pattern:
- [ ] [What] [Verification command] [Expected result]
Good examples:
- [ ] `valk_system_config_t` defined in `src/gc.h`: `grep -c 'valk_system_config_t' src/gc.h` returns >= 1
- [ ] System create allocates heap: `test -f test/test_system.c && make test_system && ./build/test_system --test create` passes
- [ ] No old symbols remain: `grep -rE 'valk_gc_coord[^_]|valk_runtime_' src/` returns no matches
- [ ] Event loop shares system heap (no per-loop heap): `grep -c 'valk_gc_heap_create' src/aio/aio_uv.c` returns 0
Bad examples:
- [ ] System works correctly <!-- Too vague, no command -->
- [ ] Performance is acceptable <!-- Not measurable -->
- [ ] All tests pass <!-- Which tests? Whole suite is untargeted -->
- [ ] Code compiles <!-- Obvious, not a meaningful criterion -->
Each acceptance criterion should be satisfiable without requiring unrelated criteria to be done first. If criterion B depends on criterion A, either state that dependency explicitly in B, or merge A and B into a single criterion.
The planner uses acceptance criteria to generate task dependencies. If criteria are tangled, the planner creates tangled tasks.
CRITICAL: Acceptance criteria that reference tests can create unfulfillable task dependencies.
Task A: "Create foo.c"
accept: "test_foo.c passes"
Task B: "Write test_foo.c"
deps: [Task A] # Can't write tests until code exists
Deadlock: Task A requires test_foo.c which doesn't exist yet.
Option 1: Import/existence acceptance for code tasks
- [ ] `foo.c` implements FooClass: `grep -c 'FooClass' src/foo.c` returns >= 1
- [ ] `test_foo.c` passes: `make test_foo && ./build/test_foo` passes
The planner creates two independent tasks. The test task naturally depends on the code task.
Option 2: Bundle code + test in one criterion
- [ ] `foo.c` implements FooClass AND `test_foo.c` covers it: `make test_foo && ./build/test_foo` passes
Single task that includes both.
Before finalizing a spec, verify it avoids these common mistakes:
| Mistake | Problem | Fix |
|---|---|---|
| 1200-line mega-spec | Planner can't fit gap analysis in context; creates coarse tasks | Split into 4-8 focused specs with cross-references |
| Full function bodies in spec | Builder copy-pastes without understanding; planner sees one blob | Signatures + behavioral contracts only |
| "Phase 1... Phase 2..." structure | Planner maps 1 phase = 1 task (too coarse) | Separate specs per phase, or remove phase labels and let planner determine order |
| "Handle errors gracefully" | Undefined behavior | Specify exact error response per condition |
| "Should be performant" | Not measurable | "Responds within 100ms for 99th percentile" |
| "Similar to X" | Requires inference | Spell out the behavior explicitly |
| "etc." or "and so on" | Incomplete list | List all items explicitly |
| Missing edge cases | Incomplete spec | Add criteria for: empty input, max limits, concurrency, partial failures |
| Acceptance criteria: "tests pass" | Which tests? Untargeted. | "pytest tests/unit/test_foo.py passes" |
| Requirements section with 15+ items | Planner creates one mega-task | Split into multiple H3 subsections, each ≤ 3 files |
A complete worked example:
# System Thread Registration
## Overview
Add thread registration and unregistration to valk_system_t so the GC
coordinator knows how many threads are active and can wake event loop
threads for STW pauses.
## Dependencies
- Requires `system-types.md` (valk_system_t must exist with threads[] array)
- Requires `system-lifecycle.md` (valk_system_create must exist)
## Requirements
### Thread Info Extension
Add `wake_fn` and `wake_ctx` fields to `valk_gc_thread_info_t` in `src/gc.h`,
after the existing `mark_queue` field:
| Field | Type | Purpose |
|-------|------|---------|
| `wake_fn` | `void (*)(void *wake_ctx)` | Called to wake thread for STW |
| `wake_ctx` | `void *` | Opaque context passed to wake_fn |
Event loop threads set `wake_fn` to a `uv_async_send` wrapper.
Main thread sets `wake_fn = NULL` (woken by its own safe point check).
### Register Function
`void valk_system_register_thread(valk_system_t *sys, void (*wake_fn)(void *), void *wake_ctx)`
Must:
- Call `valk_mem_init_malloc()` for the calling thread
- Set `valk_thread_ctx.heap = sys->heap`
- Set `valk_thread_ctx.system = sys`
- Atomically increment `sys->threads_registered` and use the old value as slot index
- Populate `sys->threads[slot]` with ctx, thread_id, active=true, wake_fn, wake_ctx
### Unregister Function
`void valk_system_unregister_thread(valk_system_t *sys)`
Must:
- Call `VALK_GC_SAFE_POINT()` to participate in any pending STW
- Set `sys->threads[id].active = false` and `wake_fn = NULL`
- Atomically decrement `sys->threads_registered`
- Set `valk_thread_ctx.gc_registered = false`
### Wake All Function
`void valk_system_wake_threads(valk_system_t *sys)`
Iterate `sys->threads[0..MAX]`. For each active thread with non-NULL
`wake_fn`, call `wake_fn(wake_ctx)`.
## Acceptance Criteria
- [ ] `valk_gc_thread_info_t` has `wake_fn` and `wake_ctx` fields: `grep -c 'wake_fn' src/gc.h` returns >= 1
- [ ] Register function exists: `grep -c 'valk_system_register_thread' src/gc.c` returns >= 1
- [ ] Unregister function exists: `grep -c 'valk_system_unregister_thread' src/gc.c` returns >= 1
- [ ] Wake function exists: `grep -c 'valk_system_wake_threads' src/gc.c` returns >= 1
- [ ] After create, calling thread is registered: test in `test/test_system.c` — `make test_system && ./build/test_system --test create` passes
- [ ] Two registrations get unique slots: test in `test/test_system.c` — `make test_system && ./build/test_system --test register_unique_slot` passes
- [ ] Wake function calls wake_fn for active threads: test in `test/test_system.c` — `make test_system && ./build/test_system --test wake_threads` passes
This spec is 70 lines. The planner will likely create 3-5 tasks from it (struct extension, register, unregister, wake, tests). Each requirement subsection maps cleanly to a deliverable.
Once the spec is written:
`ralph plan <spec>` generates tasks from the spec. `ralph construct <spec>` enters construct mode:
INVESTIGATE -> BUILD -> VERIFY
^ |
| [gaps found] |
+--------------------+
[failure: timeout/context]
|
v
DECOMPOSE -> (next iteration)
| Command | Description |
|---|---|
| `ralph plan <spec>` | Generate tasks from spec (gap analysis) |
| `ralph construct <spec>` | Enter construct mode for spec |
| `ralph query` | Get full current state as JSON |
| `ralph query stage` | Get current stage |
| `ralph task add '<json>'` | Add single task |
| `ralph task add '[...]'` | Batch add tasks |
| `ralph task done` | Mark current task as done |
| `ralph task accept <id>` | Accept a done task |
| `ralph task reject <id> "reason"` | Reject a done task |
Ralph uses tiered context management:
| Threshold | Action |
|---|---|
| 70% | Warning logged, execution continues |
| 85% | Compaction attempted |
| 95% | Kill current task, trigger DECOMPOSE |
Keep specs small to minimize context pressure. A 300-line spec plus codebase exploration can consume 40-60% of context before the planner even starts outputting tasks.
Ralph logs are stored in /tmp/ralph-logs/<repo>/<branch>/<spec>/.
Logs are auto-cleared on system restart.