| name | weave |
| description | Orchestrator-only convergence workflow: mine recent Claude/Codex JSONLs, weave cited findings into an action ledger, route every finding to a disposition, then red-team facts against raw logs. Triggers: weave, weave now, run weave, session weave, convergence weave. Use only when fleet is quiet; NOT for single-session mining or web research. |
/weave
At convergence, fan out deep cross-session mining, then prove each finding
turned into a real change — or kill it.
/weave is an orchestrator skill. The orchestrator wields it and relays the
mining sub-tasks to workers; workers do not self-invoke it. It operates
across sessions and repos at the moment the fleet goes quiet, catching the
cross-cutting issues no single worker ever sees — recurring frustrations, skill
gaps, anti-patterns, what-worked-vs-what-hurt — and routes each into a ledger
that makes waste visible.
The defining feature is not the fan-out. It is the action-ledger that
makes token-waste impossible to hide. Anyone can fan out N agents and produce
nice docs; the ledger is what forces every finding to a disposition and reports
conversion-to-change.
⚠️ This skill, and batch-session-miners before it, were lost once because
they lived as untracked WIP in gitignored docs.local/. A weave run's
harness, ledger, and retro are committed artifacts. Never leave them
untracked. That discipline is the whole reason this skill exists as a skill.
Triggers — arm vs fire
| Spoken | Effect |
|---|---|
| "weave" | Arms the skill: loads it, runs the convergence gate (scripts/convergence-gate.sh), and waits until all conditions pass. Never auto-fires. |
| "weave now" | Fires immediately, skipping the gate. Operator override — use only when the operator knows the fleet is quiet. |
1. Convergence gate (why it can't fire mid-flight)
A multi-way deep-mining fan-out next to the live Opus fleet = OOM + token
contention. The weave fires ONLY at convergence. All four must be true:
- 0 open PRs across golems + brainlayer + voicelayer.
- All worker panes idle.
- No in-flight Codex.
- Etan has SEEN + APPROVED the demo.
Plus a RAM gate (same lesson as content-demo's ram-gate.sh — quiesce first).
bash scripts/convergence-gate.sh
bash scripts/convergence-gate.sh --ack-demo
Condition #4 is not script-checkable — the gate stays BLOCKED until the
operator passes --ack-demo (or WEAVE_DEMO_ACK=1). Human/orchestrator in the
loop; never auto-fire. "weave now" is the only path that skips the gate.
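The four-condition logic above can be sketched in a few lines. This is only an illustration, not the contents of scripts/convergence-gate.sh: the `FleetState` fields and both function names are hypothetical, the RAM gate is left out, and how each condition is actually measured is not modeled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class FleetState:
    """Hypothetical snapshot of the four gate conditions (field names are illustrative)."""
    open_prs: int
    panes_idle: bool
    codex_in_flight: bool
    demo_acked: bool


def demo_ack(ack_flag: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Condition #4 is a human acknowledgement: --ack-demo or WEAVE_DEMO_ACK=1."""
    env = os.environ if env is None else env
    return ack_flag or env.get("WEAVE_DEMO_ACK") == "1"


def gate_blockers(state: FleetState) -> list[str]:
    """Return every unmet condition; an empty list means the gate passes."""
    blockers = []
    if state.open_prs != 0:
        blockers.append(f"{state.open_prs} open PR(s)")
    if not state.panes_idle:
        blockers.append("worker panes not idle")
    if state.codex_in_flight:
        blockers.append("Codex in flight")
    if not state.demo_acked:
        blockers.append("demo not acknowledged")
    return blockers
```

Returning the list of blockers rather than a bare boolean keeps the BLOCKED message specific about which condition is holding the weave back.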
2. The engine — session-mining fan-out
The weave mines the recent Claude + Codex session JSONLs (~/.claude/projects/** +
~/.codex/sessions/**). "Web" = weaving a web across sessions, NOT web-search.
For Claude sessions it leans on the /skill-creator mining engine (the
deterministic parser session-miner.py — Claude-only — + the session-miner
sub-agent). Codex sessions have a different log shape and are NOT parsed by
session-miner.py; this skill's own prepare-mine-context.py handles both
formats (digest-if-present + keyword-grep excerpts), so a miner reads a uniform
context file regardless of source. The whole thing is wrapped by this skill's
reproducible harness:
WD=~/Gits/orchestrator/docs.local/weave-$(date +%F)
python3 scripts/weave-run.py discover --hours 24 --workdir "$WD"
python3 scripts/weave-run.py prepare --workdir "$WD"
python3 scripts/weave-run.py batches --workdir "$WD" --size 5
python3 scripts/weave-run.py aggregate --workdir "$WD" --tokens <N>
⚠️ The durable artifact is NOT the scratch WD. The bulky digests/findings
are regenerable; what must be committed (so it can't be lost like the
original weave) is the conclusions: the conversion-to-change metrics + the
high-importance routed findings + the retro. Commit those into this skill's
tracked retros/<date>.md (and brain_store them). Never leave the
conclusions only in docs.local/.
- Centerpieces first. The orchestrator's own session JSONLs hold the night's
decisions/corrections/failures. discover tags them ★ and orders them first;
mine them deepest (references/topology.md).
- One miner = one session (not five shards of one). Parallelism = N sessions
at once, batches of ~5. Loop-until-dry, cap ~9 rounds — stop when a round
surfaces nothing new (don't burn the cap).
- Miners read the compact mine-context/<label>.md (digest + grep excerpts),
then grep the raw JSONL only to quote verbatim. They never read a whole MB-scale
JSONL into context.
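The batching and loop-until-dry rules in the bullets above can be sketched as follows. This is a hedged illustration, not the code in weave-run.py: `make_batches`, `loop_until_dry`, the `centerpiece` key, and the `mine_round` callback are all hypothetical names.

```python
from typing import Callable, Iterable


def make_batches(sessions: list[dict], size: int = 5) -> list[list[dict]]:
    """Put centerpieces first (stable sort keeps discovery order), then chunk."""
    ordered = sorted(sessions, key=lambda s: not s.get("centerpiece", False))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def loop_until_dry(mine_round: Callable[[int], Iterable[str]],
                   max_rounds: int = 9) -> set[str]:
    """Run rounds until one yields no new finding ids, or the cap is reached."""
    seen: set[str] = set()
    for round_no in range(1, max_rounds + 1):
        new_ids = set(mine_round(round_no)) - seen
        if not new_ids:
            break  # dry round: stop early instead of burning the cap
        seen |= new_ids
    return seen
```

Here `mine_round` stands in for whatever dispatches one round of miners and returns the finding ids it surfaced.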
3. The miner contract (per session)
Each miner emits a findings JSONL — one object per line — at
<WD>/findings/<label>.jsonl:
{"id":"<label>#N","title":"...","detail":"...","evidence":"verbatim — source ref [line N]","type":"correction|frustration|anti-pattern|skill-gap|skill-candidate|decision|residual-bug|what-worked","track":"cmuxLayer|BrainLayer|VoiceLayer|MCL|MCP-layer|skill-creator|dashboard|plans|collab|cross-cutting","disposition":"MERGED-PR|PR-FILED|PR-FIX|SKILL-NEW|SKILL-EDIT|DEEP-RESEARCH|FOLLOW-UP|REJECTED|PARKED|KEEP|DUPLICATE","importance":1-10,"recurring":true|false}
Disposition → conversion class (what weave-ledger.py enforces): converged =
MERGED-PR/PR-FILED/PR-FIX/SKILL-NEW/SKILL-EDIT; open =
DEEP-RESEARCH/FOLLOW-UP; dropped (reason REQUIRED) =
REJECTED/PARKED/DUPLICATE; confirmation (excluded from the refined
denominator) = KEEP. (PR-FILED = PR opened but not yet merged;
FOLLOW-UP-FILED is accepted as an alias of FOLLOW-UP.)
Rules (inherit /never-fabricate + the session-miner discipline): verbatim
evidence with a [line N] or digest §N cite; no brain_store (files only);
dedup; suppress loop/cron noise; an empty session → an empty findings file, never
an invented one.
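A minimal sketch of how one findings line could be checked against the contract and mapped to its conversion class. It is not the logic of weave-ledger.py: `check_finding` is a hypothetical helper, and the `reason` key is an assumption, since the schema above does not say where a drop reason is stored.

```python
import json

CONVERSION_CLASS = {
    "MERGED-PR": "converged", "PR-FILED": "converged", "PR-FIX": "converged",
    "SKILL-NEW": "converged", "SKILL-EDIT": "converged",
    "DEEP-RESEARCH": "open", "FOLLOW-UP": "open",
    "REJECTED": "dropped", "PARKED": "dropped", "DUPLICATE": "dropped",
    "KEEP": "confirmation",
}
ALIASES = {"FOLLOW-UP-FILED": "FOLLOW-UP"}
REQUIRED_FIELDS = ("id", "title", "detail", "evidence", "type", "track",
                   "disposition", "importance", "recurring")


def check_finding(line: str) -> tuple[str, list[str]]:
    """Return (conversion class, list of contract problems) for one JSONL line."""
    finding = json.loads(line)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in finding]
    raw = finding.get("disposition")
    disposition = ALIASES.get(raw, raw)
    conversion_class = CONVERSION_CLASS.get(disposition, "unknown")
    if conversion_class == "unknown":
        problems.append(f"unknown disposition: {raw}")
    # Assumption: the drop reason lives under a "reason" key.
    if conversion_class == "dropped" and not finding.get("reason"):
        problems.append("dropped without a reason")
    importance = finding.get("importance")
    if "importance" in finding and not (
        isinstance(importance, int) and 1 <= importance <= 10
    ):
        problems.append("importance must be an integer from 1 to 10")
    return conversion_class, problems
```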
Dispatch (the bridge between batches and aggregate): spawn ONE miner per
session per batch (≤5 concurrent), centerpieces first. Two equivalent mechanisms:
- Workflow (preferred for staged mode): a pipeline/parallel of agent()
calls, each given the miner prompt below + a findings JSON schema; the workflow
writes each result to <WD>/findings/<label>.jsonl.
- session-miner sub-agent / Task calls from a skillCreator session (the
sub-agent is skill-creator-scoped): dispatch N Agent(subagent_type="session-miner", …)
in one message.
Miner prompt skeleton:
Mine ONE session for the weave. Read your context file <WD>/mine-context/<label>.md
(digest + grep excerpts). For verbatim quotes, grep the raw JSONL <src> by line —
never read the whole file. Emit findings (the JSON schema above), one per line, to
<WD>/findings/<label>.jsonl. Files only — no brain_store. Empty session → empty file.
Return: WEAVE_MINE_DONE <label> <count>.
VERIFY ON DISK — do not trust the agent's word (the weave's own thesis, applied to itself).
A miner may report success in its return value yet never have written the file
(observed live: a forced structured-output return competed with the Write step,
so ~70% of a 102-agent run self-reported wrote_file:true with no file on
disk). The findings FILE is authoritative, not the agent's claim. After every
batch run weave-run.py status --workdir "$WD" (it checks the actual files), and
re-mine any session with no file — with the file-Write as the terminal
deliverable and NO competing return schema. Loop batches until dry. (If you mine
via a Workflow with schema:, the agent often stops at the schema call and skips
the Write — prefer "Write the file, then reply WEAVE_MINE_DONE <label> <count>"
with no schema, and verify on disk.)
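The on-disk check described above amounts to something like the sketch below. It is not the implementation of `weave-run.py status`; `missing_findings` is a hypothetical helper that assumes only the `<WD>/findings/<label>.jsonl` layout given earlier.

```python
from pathlib import Path


def missing_findings(workdir: Path, labels: list[str]) -> list[str]:
    """Labels whose findings file is absent on disk, regardless of what the agent reported.

    An empty file counts as present: an empty session legitimately yields an
    empty findings file.
    """
    findings_dir = workdir / "findings"
    return [label for label in labels
            if not (findings_dir / f"{label}.jsonl").is_file()]
```

Every label this returns is a candidate for re-mining, with the file Write as the terminal deliverable.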
4. The action-ledger + conversion-to-change
scripts/weave-ledger.py aggregates all findings into ACTION-LEDGER.md +
ledger.json and computes the metric that decides if the weave was worth it.
Two denominators are reported — both, always:
- conversion-to-change (spec §4, the headline) = converged ÷ TOTAL findings.
This is the anti-waste number from the original spec ("0% conversion is
token-waste, full stop"); it keeps KEEP confirmations in the denominator so
the ratio can't be flattered by reclassifying findings as "not actionable."
- conversion-to-change (refined) = converged ÷ ACTIONABLE, where actionable =
converged + open (DEEP-RESEARCH, FOLLOW-UP) + dropped (REJECTED, PARKED,
DUPLICATE); KEEP is excluded because you can't "convert" a validated
what-worked. The refined number is informative, but the strict ÷-total number
is the one that governs the SHIP/RETIRE decision (see EVAL.md).
(converged = MERGED-PR+PR-FILED+PR-FIX+SKILL-NEW+SKILL-EDIT.)
- token cost per acted-on finding = weave tokens ÷ converged (--tokens N).
- Routing is mandatory. --strict exits non-zero if any finding has an
unknown disposition or is DROPPED without a reason. Route EVERY finding.
A weave with 0% conversion is token-burn — the ledger surfaces that instead of
letting "we produced N nice docs" pass for progress.
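The two denominators and the token cost reduce to simple arithmetic. The sketch below is a hedged illustration of the formulas in this section, not the code in weave-ledger.py; `conversion_metrics` and its return keys are hypothetical names.

```python
from collections import Counter
from typing import Optional

CONVERGED = {"MERGED-PR", "PR-FILED", "PR-FIX", "SKILL-NEW", "SKILL-EDIT"}
OPEN = {"DEEP-RESEARCH", "FOLLOW-UP"}
DROPPED = {"REJECTED", "PARKED", "DUPLICATE"}


def conversion_metrics(dispositions: list[str], tokens: Optional[int] = None) -> dict:
    """Compute strict and refined conversion-to-change plus token cost."""
    counts = Counter(dispositions)
    converged = sum(counts[d] for d in CONVERGED)
    actionable = converged + sum(counts[d] for d in OPEN | DROPPED)
    total = len(dispositions)  # strict denominator: KEEP stays in
    return {
        "strict": converged / total if total else 0.0,
        "refined": converged / actionable if actionable else 0.0,
        "tokens_per_converged": tokens / converged if tokens and converged else None,
    }
```

For example, 2 converged findings out of 8 total, 4 of them KEEP, gives a strict conversion of 0.25 and a refined conversion of 0.5; at 100,000 weave tokens that is 50,000 tokens per acted-on finding.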
4b. Red-team fact-check — MANDATORY closing stage (anti-hallucination guard)
After synthesis, before ANY finding is trusted or acted on, a red-team
workflow verifies every load-bearing fact in the ledger + synthesis against
the raw JSONLs. This is non-negotiable — it is the anti-hallucination guard
for the L0 memory problem: a wrong fact that reaches the plan or brain_store
poisons every downstream decision. (Proven valuable: the 2026-05-29 weave's
red-team caught a wrong WhatsApp number for Etan plus several other wrong
facts that would otherwise have been acted on.)
Anchor on the highest-trust ground truth — what the OPERATOR said and did:
- "What Etan SAID" — every verbatim Etan quote / correction in the window.
Re-grep the cited JSONL line; confirm the quote is verbatim (not
paraphrased) and the number / name / path / PR# is exactly right.
- "What Etan FIXED" — his decisions/corrections this window. Confirm the
finding's claim about what was decided matches what the JSONL actually shows.
- Then sweep the high-importance (≥8) findings: every cited [line N] must
resolve to the quoted text; every attribution (who did what, which repo, which
PR) must hold.
Mechanism (a fan-out workflow): one verifier per batch of claims, each
re-greps the raw JSONL and returns {claim, verbatim_match, correct_attribution, corrected_value, verdict}. Any claim that fails is corrected in place or
dropped with a reason in the ledger before the plan/retro/brain_store are
trusted. Default to skeptical: if a fact can't be confirmed against the JSONL,
treat it as unverified and flag it. (Same discipline as /never-fabricate + the
session-miner GAP-REPORT.)
A weave's findings are only as trustworthy as this stage makes them. No weave
output is "done" until the red-team fact-check has run and its corrections are
folded back into the ledger.
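The verbatim-match half of a verifier can be sketched as below. This is an illustration under stated assumptions, not the red-team workflow itself: `quote_on_line` is a hypothetical helper, `[line N]` is assumed to be 1-indexed, and attribution checks still need a reader's judgment. Because raw JSONL stores text JSON-escaped, the sketch also tries the escaped forms of the quote.

```python
import json
from pathlib import Path


def quote_on_line(jsonl_path: Path, quote: str, line_no: int) -> bool:
    """True if line `line_no` (assumed 1-indexed) of the raw JSONL contains `quote` verbatim."""
    if not quote:
        return False  # an empty quote proves nothing
    # The raw line holds JSON-escaped text, so also try the escaped spellings.
    candidates = {
        quote,
        json.dumps(quote)[1:-1],
        json.dumps(quote, ensure_ascii=False)[1:-1],
    }
    with jsonl_path.open(encoding="utf-8") as handle:
        for current, raw in enumerate(handle, start=1):
            if current == line_no:
                return any(candidate in raw for candidate in candidates)
    return False  # cited line is past the end of the file
```

A `False` result should be read as "unverified, flag it", in line with the skeptical default above.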
5. What the weave EMITS (it's the front of a snowball, not a report)
/weave → next-gen large-plan → orchestrator runs parallel tracks → ship → re-weave.
The ledger is the compounding instrument across loops.
Deliverable 2 is the highest-priority emit (Etan emphatic 3×), even though
Deliverable 1 is the terminal artifact. If you can only land one, land the gate.
- Deliverable 1: the next-gen large-plan (terminal artifact). A /large-plan covering up
to 5 parallel tracks, each track populated by mined findings, not
invented. Big initial collab = modularize/componentize first, then 4–5
LEAD orchestrators (one per track). The spec's named tracks:
- cmuxLayer — fix deterministic pane placement (the recurring pain).
- BrainLayer — engine/package split; BrainBar = its own package.
- VoiceLayer.
- MCL (Meta-Comms Layer) — its own secure repo, cmux-adjacent, all AI
reviewers enforced; a deep-research candidate.
- MCP-layer.
Per the project-OS vision, each track's plan is owned by that domain LEAD;
the orchestrator owns its own large-plan; everything coordinates through
collab files (+ Google Drive + BrainLayer + MCL as a 4th channel), and
the conductor is a clickable drill-down (track → that domain's large-plan).
- Deliverable 2: the self-QA-before-handoff gate (HIGHEST priority). Formalize the rule
that closes the verify-gap: ship = build → FUNCTIONAL self-QA against the
fix-list → comparison artifact → THEN handoff. "Generated" ≠ "verified";
"merged" ≠ "converged into one verified build." Mechanical checks (PID running,
commit-matches) are NOT a functional pass. Concretely: gate merges on Codex
computer-use, i.e. actually click/screenshot/verify the UI (BrainBar / VoiceBar /
dashboard) before merge. (Same family as /never-fabricate + the /qa-video
method-attribution rule.)
- Use CODEX for the CU pass, not Claude. Codex is strong at computer-use
and has driven the BrainBar menu-bar app before; Claude CU is weak at it.
So route visual/functional QA of menu-bar apps to Codex CU.
- Known gotcha (not a hard block): a CU session can hit a "BrainBar (not
installed) / 0 apps" grant dialog for the LSUIElement menu-bar app; that's
a grant/focus state, not an impossibility. Fallbacks when it blocks direct
CU: screencapture CLI + coordinate clicks, or an in-app PNG-export
affordance (qa-video hotspot #13).
6. The snowball — retros make the next weave better
After each run, write retros/<date>.md: what we learned, what to improve next
time (better miner prompts, better disposition routing, what got missed), what
the red-team fact-check (§4b) caught and corrected (the wrong facts that would
otherwise have shipped), and a delta vs the prior weave. brain_store the
conclusions. The next weave starts
from the last retro — that's the compounding. (The reason this never snowballed
before: the weave was never built or committed. Fixed here, permanently.)
6b. Future dimension — model-change-tracking (weave-owned)
A planned weave dimension (don't block the first run on it): whenever a new model
drops, the weave should auto-surface skill-relevance actions. Research agents
track what changed between the new model and the prior one (thinking/capability
deltas, what it now does natively), and for each skill the weave proposes:
still needed? · model now smart enough → RETIRE · or make it LEANER? ·
what changed in the model's reasoning that affects it? This ties straight into
/skill-creator's capability-uplift vs encoded-preference classification:
capability-uplift skills obsolesce as models improve; encoded-preference skills
endure. Routing these proposals through the ledger turns each model bump into more
conversion-to-change. (Fold into a future iteration / retro — not the first build.)
Files
| Path | Role |
|---|---|
| scripts/convergence-gate.sh | The 4-condition gate + RAM check; arms "weave", bypassed by "weave now" |
| scripts/weave-run.py | Reproducible orchestrator: discover → prepare → batches → aggregate |
| scripts/prepare-mine-context.py | Compact per-session context (digest + grep excerpts) for one miner; Claude + Codex formats |
| scripts/weave-ledger.py | Action-ledger + conversion-to-change + routing-contract enforcement |
| references/topology.md | flat-N vs staged, batch size, centerpieces-first, the round structure |
| EVAL.md | Backtest baseline, flat-vs-staged eval, the conversion metric, smoke checks |
| evals/fixtures/findings-{clean,violations}/ | Committed smoke fixtures: clean → --strict exit 0; violations → exit 2 |
Wiring (Etan's "right places")
- Invoked by the orchestrator at sprint close (the /orc convergence step).
- The ledger output feeds the next sprint's backlog / the gen-N large-plan.
- The mining engine is /skill-creator (session-miner + session-miner.py);
/weave is the orchestrator wrapper that arms it, gates it on convergence, and
routes its findings through the action-ledger. (batch-session-miners was
folded into /skill-creator — single source of truth for mining; /weave is
the only batch-mining orchestrator skill.)
Integration with other skills
- /skill-creator: the mining engine (mine-session, session-miner sub-agent, the parser).
- /large-plan: the weave's top output is a large-plan; tracks come from findings.
- /never-fabricate: every finding cites verbatim evidence; no invented ledger rows.
- /pr-loop: converting a SKILL-EDIT/PR-FIX finding to MERGED-PR goes through it.
- /orc: convergence detection + dispatch of miners; surfaces the ledger to Etan.