| name | loom |
| description | Extract requirements from conversations, link to code, detect drift, decompose specs into atomic tasks, and run those tasks on a local small model. Use when making decisions, before modifying code, when a spec is ready for implementation, or to check staleness. |
Loom 🧵 - Requirements Traceability & Small-Model Task Execution Skill
Weaving requirements through code, and driving small-model code execution with them.
Loom captures decisions as versioned requirements, expands them into specifications, links code back to them with content hashes for drift detection, and (as of the Task + loom decompose + loom_exec layer) decomposes specs into atomic, executor-ready tasks that a local small model can complete against a full context bundle.
When to use
This skill is always active via AGENTS.md integration. Invoke at these moments:
| Moment | Action | Command |
|---|---|---|
| Onboarding a new target repo | Write .loom-config.json + health-check | loom init |
| Decision made in chat | Extract requirement (with rationale) | loom extract --rationale ... |
| Before modifying code (automatic) | Pre-edit briefing | hooks/loom_pretool.py (hook) |
| Manual drift check | Inspect a file | loom check <file> |
| After implementing | Link to reqs or specs | loom link <file> --req/--spec |
| Spec ready for implementation | Decompose into atomic tasks | loom decompose SPEC-xxx --apply |
| Task queue has ready work | Execute locally | loom_exec --next |
| Heartbeat | Surface staleness / drift | loom status --json |
| Cold requirements piling up | Surface stale + unlinked | loom stale --older-than 90 |
| Decision retired (not replaced) | Archive instead of supersede | loom archive REQ-xxx |
| Any time | Measure hook cost | loom cost |
| Effectiveness telemetry | Coverage + drift + activity rollup | loom metrics |
| CI gate | Single 0-100 health number | loom health-score --json |
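The CI gate row above could be wired into a pipeline roughly as follows. This is a minimal sketch, assuming `loom health-score --json` prints a JSON object with a top-level numeric `score` field (which the `jq .score` usage suggests). The threshold of 70 and the helper names `gate` and `run_gate` are hypothetical.

```python
import json
import subprocess

MIN_SCORE = 70  # hypothetical threshold; pick one that suits the project


def gate(raw_json: str, min_score: float = MIN_SCORE) -> int:
    """Return a process exit code: 0 if the score clears the bar, 1 otherwise."""
    score = json.loads(raw_json)["score"]  # assumes a top-level numeric "score"
    return 0 if score >= min_score else 1


def run_gate() -> int:
    """Invoke the CLI and apply the gate to its JSON output."""
    result = subprocess.run(
        ["loom", "health-score", "--json"],
        capture_output=True,
        text=True,
        check=True,  # a failing CLI call raises instead of passing silently
    )
    return gate(result.stdout)
```

In a CI step, a wrapper script would end with `sys.exit(run_gate())` so that a low score fails the job.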
Core commands
Traceability
loom extract [--rationale "why"] - Parses REQUIREMENT: domain | text from stdin. Emits versioned records; supersedes conflicting prior requirements on request.
loom check <file> - Drift check. Exits 2 if any linked req has been superseded.
loom context <file> - The briefing the hook injects: linked reqs, specs, drift. JSON-first.
loom link <file> [--req REQ-xxx | --spec SPEC-xxx] - Link code, with content-hash capture for later drift.
loom status, loom query "text", loom list, loom trace <target>, loom chain <req_id>, loom coverage - read-only views.
loom conflicts --text "..." - Detect conflicts. Now LLM-verified (embedding overlap surfaces candidates; an LLM confirms before reporting).
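The content-hash capture mentioned for `loom link` can be pictured with the sketch below. It is an illustration only: the source does not say which hash Loom uses, so SHA-256 is an assumption, and `hash_bytes`, `content_hash`, and `has_drifted` are hypothetical names.

```python
import hashlib
from pathlib import Path


def hash_bytes(data: bytes) -> str:
    """Hex digest of the content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def content_hash(path: Path) -> str:
    """Hash a file's current bytes, as would be captured at link time."""
    return hash_bytes(path.read_bytes())


def has_drifted(current: bytes, recorded_hash: str) -> bool:
    """True if the content no longer matches the hash recorded at link time."""
    return hash_bytes(current) != recorded_hash
```

Hashing raw bytes means any edit, including whitespace-only changes, counts as drift. A real implementation might normalize content first.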
Specifications & patterns
loom spec REQ-xxx -d <description> [-c <criterion>]... [-s <status>] - Detailed HOW for a requirement. -c is repeatable for each acceptance criterion.
loom pattern, loom patterns, loom pattern-apply - Shared design standards across requirements.
Tasks & execution
loom decompose SPEC-xxx [--model provider:name] [--apply] [--out file.yaml] - Proposes atomic tasks. Defaults to anthropic:claude-opus-4-7 if ANTHROPIC_API_KEY is set, else ollama:qwen2.5-coder:32b. Validates atomicity (≤2 files, ≤80 LoC, single grading criterion) and the dep graph before persisting.
loom task {add|list|show|claim|release|complete|reject|prompt} - Atomic-task lifecycle. loom task list --ready filters by dependency completion.
loom_exec [TASK-id | --next | --loop] - Drives Ollama end-to-end: claims, assembles context bundle, calls executor, applies code to scratch copy, runs grading test, promotes on pass. Default model from LOOM_EXECUTOR_MODEL, falling back to qwen3.5:latest.
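The atomicity limits, the dependency-graph check, and the `--ready` filter described above could look something like this. It is a sketch under stated assumptions: `ProposedTask` and its fields are hypothetical and are not Loom's actual `Task` dataclass. Only the limits (2 files, 80 LoC, one grading criterion) come from the text.

```python
from dataclasses import dataclass, field

MAX_FILES = 2  # limits quoted in the decompose description
MAX_LOC = 80


@dataclass
class ProposedTask:  # hypothetical shape, for illustration only
    task_id: str
    files: list
    estimated_loc: int
    grading_criteria: list
    depends_on: list = field(default_factory=list)


def atomicity_errors(task):
    """List every atomicity rule the task violates (empty list means atomic)."""
    errors = []
    if len(task.files) > MAX_FILES:
        errors.append(f"{task.task_id}: touches more than {MAX_FILES} files")
    if task.estimated_loc > MAX_LOC:
        errors.append(f"{task.task_id}: exceeds {MAX_LOC} LoC")
    if len(task.grading_criteria) != 1:
        errors.append(f"{task.task_id}: needs exactly one grading criterion")
    return errors


def dep_graph_errors(tasks):
    """Report unknown dependencies and dependency cycles."""
    by_id = {t.task_id: t for t in tasks}
    errors = [
        f"{t.task_id}: unknown dependency {d}"
        for t in tasks for d in t.depends_on if d not in by_id
    ]
    state = {}  # task_id -> 1 while on the DFS stack, 2 once finished

    def on_cycle(tid):
        if state.get(tid) == 2:
            return False
        if state.get(tid) == 1:
            return True  # reached a task that is still on the stack
        state[tid] = 1
        cyclic = any(on_cycle(d) for d in by_id[tid].depends_on if d in by_id)
        state[tid] = 2
        return cyclic

    for tid in by_id:
        if on_cycle(tid):
            errors.append(f"{tid}: dependency cycle")
    return errors


def ready_tasks(tasks, completed):
    """Tasks not yet completed whose dependencies are all completed."""
    return [
        t for t in tasks
        if t.task_id not in completed and all(d in completed for d in t.depends_on)
    ]
```

Running both checks before persisting means an invalid decomposition is rejected as a whole rather than leaving a partial task queue behind.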
Hygiene & metrics
loom stale [--older-than N] [--unlinked] [--json] - Rank requirements by last_referenced ascending. Never-touched ones rank coldest (sorted by creation timestamp).
loom archive REQ-xxx - Mark a requirement as archived. Distinct from supersede; recoverable via loom set-status REQ-xxx pending.
loom metrics [--since N] - Effectiveness rollup: requirements (active/archived/superseded), coverage (impl + test-spec %), drift events + ratio, conflicts caught, activity (extracted/linked over the window), staleness buckets.
loom health-score - Single 0-100 score over impl coverage + test coverage + freshness + non-drift. CI-friendly via --json | jq .score.
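The ordering described for `loom stale` can be expressed as a single sort key. The sketch below assumes a hypothetical `Req` record with `created_at` and `last_referenced` fields. Only the ordering rule itself comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Req:  # hypothetical record, not Loom's Requirement dataclass
    req_id: str
    created_at: datetime
    last_referenced: Optional[datetime] = None


def rank_stale(reqs):
    """Coldest first.

    Never-referenced requirements come first, ordered by creation time;
    the rest follow in ascending order of last_referenced.
    """
    return sorted(
        reqs,
        key=lambda r: (r.last_referenced is not None,
                       r.last_referenced or r.created_at),
    )
```

The boolean in the key places every never-referenced requirement ahead of any referenced one, because `False` sorts before `True`.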
Docs & measurement
loom sync - Regenerate REQUIREMENTS.md and TEST_SPEC.md from the store.
loom cost - Aggregate hooks/loom_pretool.py log: p50/p95/p99 latency, bytes injected, overhead (fires with nothing to inject).
loom doctor - Full health check (Ollama reachable, store integrity, orphan impls, drift, coverage).
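To make the `loom cost` rollup concrete, here is a rough sketch of aggregating a JSONL hook log into latency percentiles and an overhead ratio. The field names `latency_ms` and `bytes_injected` are assumptions about the log format, and the nearest-rank percentile method is one reasonable choice rather than the one Loom necessarily uses.

```python
import json
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_hook_log(lines):
    """Aggregate JSONL records (one hook fire per line) into a cost summary."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return {"fires": 0}
    latencies = sorted(r["latency_ms"] for r in records)  # assumed field name
    empty_fires = sum(1 for r in records if r["bytes_injected"] == 0)
    return {
        "fires": len(records),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "bytes_injected": sum(r["bytes_injected"] for r in records),
        # share of fires that injected nothing, i.e. pure overhead
        "overhead_ratio": empty_fires / len(records),
    }
```

Passing an open file handle for `.hook-log.jsonl` as `lines` would work, since file objects iterate line by line.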
Validated thesis
With enough detail in requirements, spec, and context, and small enough units of work, very small models can be effective.
Benchmarks in benchmarks/ollama_gaps*.py ran three tasks of increasing difficulty (write, extend, behavior-preserving refactor) against several local and cloud models. qwen3.5:latest (9.7B, local) matched Opus 4.7 on 3/3 trials across all three tasks. See experiments/gaps/FINDINGS.md.
What this means operationally
- Decomposition is the expensive step. Run it once per spec with a frontier model.
- Execution is the cheap step. A 9.7B local model produces deterministic, passing output at temperature=0 when handed the right context bundle.
- Architectural cost split on a 100-task project: frontier-only ≈ $30, hybrid (Opus-decompose + qwen3.5-execute + Opus-review) ≈ $0.60 to $1.
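The cost split above is easier to compare per task. The short calculation below uses only the approximate totals quoted in the list; the helper names are illustrative.

```python
# Approximate totals quoted in the text for a 100-task project.
TASKS = 100
FRONTIER_ONLY_USD = 30.00
HYBRID_USD_RANGE = (0.60, 1.00)


def per_task(total_usd, tasks=TASKS):
    """Average cost of one task."""
    return total_usd / tasks


def savings_factor(frontier_usd, hybrid_usd):
    """How many times cheaper the hybrid pipeline is."""
    return frontier_usd / hybrid_usd


low, high = HYBRID_USD_RANGE
print(f"frontier-only: ${per_task(FRONTIER_ONLY_USD):.3f} per task")
print(f"hybrid: ${per_task(low):.3f} to ${per_task(high):.3f} per task")
print(f"savings: {savings_factor(FRONTIER_ONLY_USD, high):.0f}x "
      f"to {savings_factor(FRONTIER_ONLY_USD, low):.0f}x")
```

On these figures a task costs about $0.30 with a frontier model alone and about a cent or less in the hybrid setup, a reduction of roughly 30 to 50 times.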
AGENTS.md integration
## Loom Integration
When a decision is made about how something should work:
→ `echo "REQUIREMENT: domain | text" | loom extract --rationale "why"`
When a spec is ready for implementation:
→ `loom decompose SPEC-xxx --apply` then `loom_exec --loop`
Before modifying code (automatic via PreToolUse hook):
→ Linked reqs, specs, and drift are injected as a system-reminder.
  Manual equivalent: `loom context <file> --json`.
After implementing a feature:
→ `loom link <file> --req REQ-xxx` (or `--spec SPEC-xxx`)
During heartbeats:
→ `loom status --json` to surface drift
→ `loom cost` to keep an eye on hook overhead
Requirement domains
- terminology: What things are called ("posts are called boats")
- behavior: How features work ("reset requires 3-second hold")
- ui: Visual/UX decisions ("mobile-friendly", "no markdown tables")
- data: Data model constraints ("timestamps in UTC")
- architecture: Technical decisions ("use PostgreSQL")
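The `REQUIREMENT: domain | text` line format and the five domains listed above suggest a parser along these lines. It is a sketch only: `parse_requirement` is a hypothetical helper, and rejecting unknown domains is an assumption rather than documented behavior of `loom extract`.

```python
from typing import NamedTuple, Optional

DOMAINS = {"terminology", "behavior", "ui", "data", "architecture"}
PREFIX = "REQUIREMENT:"


class ParsedRequirement(NamedTuple):
    domain: str
    text: str


def parse_requirement(line: str) -> Optional[ParsedRequirement]:
    """Parse one 'REQUIREMENT: domain | text' line.

    Returns None when the line is not a requirement line, has no text,
    or names a domain outside the documented list (assumed behavior).
    """
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    # Split on the first "|" only, so the text itself may contain pipes.
    domain, sep, text = line[len(PREFIX):].partition("|")
    domain, text = domain.strip().lower(), text.strip()
    if not sep or not text or domain not in DOMAINS:
        return None
    return ParsedRequirement(domain, text)
```

For example, `parse_requirement("REQUIREMENT: data | timestamps in UTC")` yields a record with domain `data` and text `timestamps in UTC`.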
Data storage
~/.openclaw/loom/<project>/
├── loom.db              # SQLite: 6 entity tables + _loom_meta (pins embedding_dim)
├── .loom-specs.json     # TestSpec JSON store
├── .loom-events.jsonl   # User-meaningful event log (feeds `loom metrics`, `health-score`)
├── .hook-log.jsonl      # PreToolUse hook activity (feeds `loom cost`)
├── .exec-log.jsonl      # loom_exec run log (per-task latency, tokens, pass/fail)
└── PRIVATE.md           # Private requirement IDs (excluded from public docs)
Example flow (three-layer + execution)
User: "The app should use half-hour increments for time selection"
Agent: loom extract --rationale "Matches appointment-booking domain"
  → REQ-042 {domain: data, value: "Time selection uses half-hour increments"}
Agent: loom spec REQ-042 -d "TimeSelector component: dropdown 00:00..23:30 step=30min; default round down; TZ local" \
    -c "Dropdown options every 30 minutes" \
    -c "Values round down to nearest 30min on arbitrary input"
  → SPEC-042a
Agent: loom decompose SPEC-042a --apply
  → 3 tasks persisted: dataclass, widget, wiring
Agent: loom_exec --loop --model qwen3.5:latest
  → T1 passes 4/4 tests in 4.6s
  → T2 passes 6/6 tests in 8.1s
  → T3 passes 3/3 tests in 5.2s
  → All code promoted to the working tree
Later, on Edit to time_selector.py:
PreToolUse hook → loom context file
  → "Linked to REQ-042, SPEC-042a; no drift"
User: "Actually, let's use 15-minute increments"
Agent: loom extract → supersedes REQ-042, creates REQ-043
Next heartbeat:
Agent: loom status --json
  → DRIFT: lib/widgets/time_selector.py linked to superseded REQ-042
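To give a feel for the size of unit an atomic task targets, the two acceptance criteria in the flow above can be satisfied by something this small. This is an illustrative sketch, not output produced by Loom or by any model. The function names are hypothetical, and the step is a parameter so the later switch to 15 minutes is a one-argument change.

```python
from datetime import time

STEP_MINUTES = 30  # REQ-042 in the example; REQ-043 would change this to 15


def dropdown_options(step=STEP_MINUTES):
    """All selectable times in a day, from 00:00 up to the last full step."""
    return [time(m // 60, m % 60) for m in range(0, 24 * 60, step)]


def round_down(value, step=STEP_MINUTES):
    """Round an arbitrary time down to the nearest step boundary."""
    total = value.hour * 60 + value.minute  # seconds are dropped
    floored = total - total % step
    return time(floored // 60, floored % 60)
```

A grading test for such a task would assert, for instance, that 09:47 rounds down to 09:30 and that the option list has 48 entries ending at 23:30.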
Files
src/loom/cli.py - Main argparse CLI (registered as loom console script)
src/loom/exec_cli.py - Small-model task executor (registered as loom_exec)
src/loom/store.py - SQLite-backed LoomStore + dataclasses (Requirement, Specification, Pattern, Implementation, Task)
src/loom/services.py - Shared logic between CLI and MCP server (decompose, apply_decomposition, task lifecycle, cost, metrics, health_score, conflict verification)
src/loom/docs.py - REQUIREMENTS.md / TEST_SPEC.md generation, traceability matrix
src/loom/testspec.py - JSON-backed TestSpec store
src/loom/embedding.py - Pluggable provider dispatch (ollama / openai / hash) + LRU cache
src/loom/conflict_verify.py - LLM-verified conflict pass
src/loom/runners.py - Pluggable test-runner registry (pytest / dart_test / vitest / …)
src/loom/templates.py - loom init --template scaffolding
src/loom/prompts/ - extract / link / decompose prompt templates (in-package; ship in the wheel)
src/loom/templates/ - starter templates per runtime
scripts/loom, scripts/loom_exec - thin shims for repo-clone use; pip install registers PATH equivalents
hooks/loom_pretool.py - PreToolUse hook that injects context on Edit/Write
mcp_server/server.py - MCP server exposing LoomStore as typed tools (Phase A + B shipped)
benchmarks/ - capability + retrieval microbenchmarks
experiments/ - bake-off harnesses and findings docs
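The "pluggable provider dispatch + LRU cache" noted for src/loom/embedding.py can be pictured as a registry of callables behind a cached entry point. The sketch below is an assumption about the general shape, not the module's actual code. It implements only a toy `hash` provider with a tiny dimension, and `PROVIDERS`, `hash_embed`, and `embed` are hypothetical names.

```python
import hashlib
from functools import lru_cache

EMBEDDING_DIM = 8  # tiny, hypothetical dimension for illustration


def hash_embed(text):
    """Deterministic pseudo-embedding derived from a SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return tuple(b / 255.0 for b in digest[:EMBEDDING_DIM])


# Registry of provider name -> embedding function. Network-backed
# providers (e.g. ollama, openai) would be registered the same way.
PROVIDERS = {"hash": hash_embed}


@lru_cache(maxsize=1024)
def embed(text, provider="hash"):
    """Dispatch to the named provider; repeated calls hit the cache."""
    try:
        fn = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown embedding provider: {provider}") from None
    return fn(text)
```

Returning tuples rather than lists keeps cached values immutable, so one caller cannot corrupt the entry another caller receives.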