-
Confirm the source of truth. There is exactly one .claude/workflows/<name>.js, it begins
with export const meta = {…} (a pure literal), and its body uses only the Workflow hooks
(agent/parallel/pipeline/phase/log/args/budget) — no import/export besides
meta, and top-level return/await are fine. You edit and prove the workflow on Claude;
pi inherits it. Never edit pi's copy of a prompt — there is no copy.
-
Set the credential ONCE in pi's own global config (per machine, not per repo). The model +
key live in pi's native ~/.pi/agent/models.json, which pi resolves for EVERY project — so no
product ever needs its own key, .env credential, or provider extension.
cp templates/models.json.example ~/.pi/agent/models.json
chmod 600 ~/.pi/agent/models.json
pi --list-models cp
The provider name MUST stay cp (that's what the driver passes as --provider). Any
OpenAI-compatible endpoint works (api: "openai-completions"). Skip this if it's already set up
on the machine. See reference/provider-and-headless.md.
-
Drop in the harness — verbatim. Copy templates/pi-runner/ into the repo (alongside
.claude/): extract.mjs, run.mjs, .env.example, .gitignore (and providers/coding-plan.ts
only if a provider needs a custom API impl / OAuth — models.json covers the OpenAI-compatible
case). You edit none of the engine files. run.mjs / extract.mjs are generic and stay
byte-identical across every repo (and this template) — a future fix is a one-file copy, never a
manual merge. The copy also brings viz-model.mjs + tui/ (the pi-tui cross-project monitor,
step 7). At adoption, register the folder as a namespace once — pi-tui add . — so it appears in
the global pi-tui console before its first run; every run thereafter self-registers too.
-
Configure the per-repo wiring .env. cp templates/pi-runner/.env.example pi-runner/.env,
then set the wiring only (no secret): PI_RUNNER_WORKFLOW (path to the .js, relative to
repo root) and, if your build runs in a subpackage, PI_RUNNER_CWD (where pi executes + where
node-reported relative artifact paths resolve). PI_RUNNER_ROOT defaults to pi-runner/'s parent.
Optionally PI_RUNNER_UNTIL (default --until during bring-up) and PI_CP_MODEL (pin a non-default
model id from models.json for this repo).
-
Sanity-check the DAG (free). node pi-runner/extract.mjs prints the realized stages — no
model invoked. Confirm node count + parallel lanes match the workflow you proved on Claude.
-
Dry-run (free), then live (background, --debug).
node pi-runner/run.mjs --run <id> --arg <k=v> --until <phase> --dry-run
node pi-runner/run.mjs --run <id> --arg <k=v> --until <phase> --debug
Pass the workflow's args with --arg k=v (repeatable) and --arg-file k=path (reads file
text, e.g. --arg-file brief=./brief.md). --until brings a long pipeline up one block at a
time so a bare run can't hit a later toolchain wall; its mirror --from <phase> (and the
--only <phase> shorthand) RESUMES from a node on its frozen upstream artifacts —
preflight-gated — so a one-node fix retests in one node, not a full replay.
-
Monitor as the console. Poll out/<id>/run-status.json (verified status — ok requires
artifacts on disk), or use the two generic monitors shipped in the kit:
node pi-runner/status.mjs --run <id>
node pi-runner/status.mjs --run <id> --every 5
node pi-runner/watch.mjs --run <id> --notify
watch.mjs is the wake-on-event sentinel for a backgrounded run — it stays silent (no console
spam) and exits with one summary line the moment the run finishes, a node errors, the driver goes
stale, or a node DEAD-stalls (past 10 min — NOT the noisy 45s transient cp pause). Both are
PID-free (driver-death is inferred from run-status staleness), so they work for any run with zero
wiring. Fleet = one background driver per instance, one watch.mjs each. See
reference/orchestration.md. You run every command; the user runs nothing.
Across all projects — pi-tui. status.mjs/watch.mjs watch ONE run; pi-tui is the standing
global console. Every run auto-registers its folder as a namespace in ~/.pi-runner/registry.json,
so a bare pi-tui (installed once: cd pi-runner/tui && npm install && npm link) lists every project
and its live runs with no flags — drill in to namespace → thread (run) → per-node detail (status ·
time · tokens · gantt · artifacts · live output). pi-tui add . registers a folder before its first
run; PI_RUNNER_NO_REGISTER=1 opts out. See pi-runner/tui/README.md.
-
Adopt the Output Contract (recommended — one paste). Paste templates/workflow-snippets/contract.js
into your workflow .js next to discipline(), and wrap each producing node's prompt with
contract({ artifacts:[…], owns:[…], readScope:[…] }). Now the driver verifies each node's REQUIRED
artifacts independent of the self-report — a clean exit missing one is blocked, not a false ok —
and (under --sandbox, step 12) the readScope becomes the node's OS-enforced read boundary.
Declare readScope on every producing node at the same time as artifacts/owns (it is part of
authoring a node, not a later bolt-on). This is already baked into the engine run.mjs; the snippet
is the only per-workflow edit. See reference/artifact-contract.md.
-
Harden for parallel fleets (opt-in — --worktree). For a multi-run fleet, add --worktree
(or PI_RUNNER_WORKTREE=1): each run executes in its OWN git worktree (branch pi/<id>), so
concurrent runs are PHYSICALLY isolated — a non-Claude model cannot see or clobber another run's files.
Pass the run's input via --arg/--brief (the worktree is a clean HEAD checkout). Merge-back
is a conflict-free union IF your project doesn't hand-edit a shared registration list — see the
auto-discovery enabler (templates/examples/auto-discover-registry.example.mjs) +
reference/worktree-isolation.md. Also engine-baked; --worktree is the only switch.
-
Arm the escalation gate (opt-in — PI_RUNNER_ESCALATE=1). A non-Claude model runs every node; on a
verified failure (artifact-contract breach / stuck-loop / timeout / degenerate — never self-
confidence) the driver consults a stronger, ideally cross-family model ONCE, fed the failure
evidence. Wiring is .env only: PI_RUNNER_ESCALATE_MODEL (+ optional PI_RUNNER_ESCALATE_PROVIDER),
PI_RUNNER_MAX_RETRIES. Pick a cross-family consult — a provider whose non-Claude default is already its
top tier has no headroom (DashScope cp: qwen3.7-max is the ceiling → escalate to minimax/MiniMax-M3).
DRIVER-NO-ESCALATE opts a pure gate out. Engine-baked; driver-side, no pi extension. See reference/escalation.md.
-
Tighten the loop with the node-contract extension (opt-in — PI_RUNNER_CONTRACT_EXT=1). Loads
extensions/node-contract.ts via -e: a typed submit_result tool (structured return — the model
calls it, so it can't botch the ```json fence; the driver reads result.details off the
tool_execution_end event, with the fenced-JSON parser as fallback) + an in-loop owned-paths
tool_call block (BLOCKS an out-of-lane write/edit before it lands, from the node's DRIVER-OWNS).
Per-node tool gating rides the same family: DRIVER-TOOLS / DRIVER-EXCLUDE-TOOLS markers →
--tools/--exclude-tools. Both spike-verified on qwen headless; see reference/artifact-contract.md.
Tool-gating doubles as a non-Claude-model BEHAVIOR LOCK, not only a write-safety rail. When prompt-craft
alone won't move a weak executor, cut its tools to FORCE the action shape: a non-Claude model fills a fresh
structured artifact far more reliably by whole-file write than by exact-match edit, so EXCLUDING
edit/read-chain tools until write is the only affordance is what finally made MiniMax WRITE a complete
blueprint.json instead of composing it in-head and returning it inline (two prompt-only redesigns had
failed first). Choose the gated set by the action you must FORCE, not only the writes you must forbid —
DRIVER-EXCLUDE-TOOLS is a structural lever (same family as the owned-paths block), and a structural
invariant belongs in the harness, not in more prose the model can ignore.
-
Lock the read-scope — standard per-node, OS-enforced under --sandbox (macOS). --worktree
stops a node writing outside its lane; it does NOT stop it reading a sibling's files (a non-Claude
model that can't find a component greps the whole tree + reads other units' source, bloating context
until it times out). The fix is two parts. (a) Author-time, always: declare a readScope on
EVERY producing node's contract({…}) — the same tier as artifacts/owns — so each node's prompt
carries a DRIVER-READ-SCOPE: marker naming its legitimate read surface (its own data/out dirs + the
shared skills/catalog it reads). Leaving a node un-scoped is the bug this prevents (in the reference
workflow, only the composer was scoped, so a non-Claude model read-thrashed an un-scoped node to a
timeout). (b) Fleet-time, opt-in: run with --sandbox (or PI_RUNNER_SANDBOX=1) so a scoped
node runs under macOS sandbox-exec (Seatbelt) and any read outside {toolchain ∪ declared scope}
returns EPERM — kernel-enforced and inherited by child grep/find/cat. Default OFF and
byte-identical when off (the markers are inert text); only a marked node is wrapped. Pair it with the
two behavioral watchdogs (PI_RUNNER_STALL_TIMEOUT silent-death kill, PI_RUNNER_TOOL_REPEAT_KILL
no-progress tool-thrash kill) that catch the degenerate classes the prompt can't. macOS only (a Linux
fleet would use bubblewrap — not wired). Engine-baked; sandbox/read-scope.sb is the profile. See
reference/read-scope-sandbox.md.
-
Seed the per-node output-criteria fixture (the judging standard). Creating a workflow's harness includes creating its acceptance-criteria fixture alongside the skill-system map — <repo>/.agents/skill-system-criteria.md, ONE entry per producing node (artifact → downstream purpose → acceptance criteria → red flags). The node set is exactly what extract.mjs already enumerates, so draft it with a per-node criteria-drafting workflow (one agent per node reads that node's skill + a real sample artifact + the brief, returns a structured {purpose, criteria, redFlags}) and write the returned entries to the fixture. This is the human-judged quality bar every future run is judged against — the complement to the mechanical Output Contract (existence/lane) and the sibling of the skill-system map (composition/diagnostics). It is a JUDGING fixture, NEVER injected into a node's prompt (that teaches-to-the-test and voids the clean-room signal that tells you whether the SKILL ITSELF produces good output). The hermes-skill-system loop then MAINTAINS it (sharpens a node's criteria whenever an edit changes what good output for that node means); edit it by hand too, whenever you decide a node should emit a different/richer shape.
A workflow ships with both an automated in-pipeline VERIFICATION surface (the verify nodes) AND the
human-judged criteria fixture (step 13). Production runs the verify nodes for stable, unattended output. But
during development/debugging — when you're babysitting the run — they're slow, and you (orchestrator + human)
judge better. Companion Mode makes the orchestrator the standing verifier:
Designing a workflow IS setting each node's input/output standards — this is the most meaningful place to fix
them, and where an edit must be reconciled against the rest. For every node: