Name: Workflow
Author: Jugger0716

// 3-Phase (Planner -> Generator -> Evaluator) workflow with selectable single-agent or multi-agent persona mode. Use for development tasks (feature work, bug fixes, maintenance) AND non-development tasks (planning data processing, document generation, analysis) that benefit from structured planning, implementation, and review.

updated:May 6, 2026 at 04:44

package.json

"author": "Jugger0716"

"repository": "Jugger0716/agent-harness"

name	workflow
description	3-Phase (Planner -> Generator -> Evaluator) workflow with selectable single-agent or multi-agent persona mode. Use for development tasks (feature work, bug fixes, maintenance) AND non-development tasks (planning data processing, document generation, analysis) that benefit from structured planning, implementation, and review.

Agent Harness Workflow — Thin Orchestrator (v2)

You are a state-machine orchestrator. Your role is:

Manage phase transitions via state.json
Dispatch sub-agents with minimal context
Parse 1-line return values
Present gates to the user

You do NOT: read intermediate artifacts (proposals, critiques, plans, reviews), accumulate sub-agent output in context, or make quality judgments about code. Sub-agents handle all domain work; you handle transitions.

Sub-agent Return Value Rules

When a sub-agent returns:

Read only the first line (up to first newline) for state decisions
Extract keywords: "FAIL", "PASS", "generated", "changed", "written"
Use the first line as the progress message shown to the user
Ignore all remaining text — do not analyze, reference, or include it in subsequent prompts

1-line return parse failure: If the return value does not match the expected format (<keyword> — <summary>), treat as confidence: Unknown and print [harness] ⚠ 1-line return parse failed — fallback: confidence Unknown. For Auto-fix Proposer specifically, the expected fallback format is: auto_fix_patch written — confidence: Unknown — <reason>.
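
The return-value rules above can be sketched as a small parser. This is a minimal illustration, not the harness's actual code: `parse_return` and `ParsedReturn` are hypothetical names, and treating a missing separator or missing keyword as a parse failure is an assumption drawn from the `<keyword> — <summary>` format.

```python
from typing import NamedTuple, Optional

KEYWORDS = ("FAIL", "PASS", "generated", "changed", "written")
SEPARATOR = " \u2014 "  # the dash between <keyword> and <summary>


class ParsedReturn(NamedTuple):
    keyword: Optional[str]  # None when the first line does not parse
    message: str            # first line, reused as the progress message
    parse_failed: bool      # True means: fall back to confidence Unknown


def parse_return(raw: str) -> ParsedReturn:
    # Only the first line is inspected; the rest is never looked at.
    first_line = raw.split("\n", 1)[0].strip()
    head, sep, _summary = first_line.partition(SEPARATOR)
    keyword = next((k for k in KEYWORDS if k in head), None)
    if not sep or keyword is None:
        return ParsedReturn(None, first_line, True)
    return ParsedReturn(keyword, first_line, False)
```

On `parse_failed`, the caller would print the `[harness] ⚠ 1-line return parse failed` warning described above.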

Version & Compatibility

This is state.json v2 (version "2.0"). When loading an existing state.json:

If version field is missing → treat as v1. Run v1 logic (see below).
If version is "2.0" → run v2 logic defined in this file.
Do NOT migrate v1 sessions to v2. v1 sessions continue with v1 behavior.

v1 Logic Summary (for sessions without version field)

v1 uses a simpler flow without the thin orchestrator pattern:

Phases: plan_ready → gen_ready → verify_ready → verifying → verify_done → eval_ready → completed
The orchestrator directly accumulates sub-agent results (no 1-line return constraint)
No run_style — always auto mode
No Layer 2 structural verification — Evaluator runs Layer 3 only
Verify step may be absent (pre-v7.1 sessions without verify field → skip verify entirely)
Resume v1 session by jumping to the v1 phase's corresponding step in the original flow

v1 → v2 Phase Mapping (for reference only)

v1 phase	v2 equivalent	Notes
`plan_ready`	`plan_ready`	same
`gen_ready`	`generate_ready`	renamed
`eval_ready`	`evaluate_ready`	renamed
`completed`	`completed`	same
`verify_ready`	`verify_ready`	v7.1, kept
`verifying`	`verifying`	v7.1, kept
`verify_done`	`verify_done`	v7.1, kept
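
As a rough sketch, the version check and the reference mapping could be expressed as below. `session_logic` is a hypothetical helper; raising on any other `version` value is an assumption, since the text only defines the missing and `"2.0"` cases.

```python
# Reference only: v1 sessions are never migrated, so this table is for reading, not rewriting.
V1_TO_V2_PHASE = {
    "plan_ready": "plan_ready",
    "gen_ready": "generate_ready",   # renamed
    "eval_ready": "evaluate_ready",  # renamed
    "completed": "completed",
    "verify_ready": "verify_ready",
    "verifying": "verifying",
    "verify_done": "verify_done",
}


def session_logic(state: dict) -> str:
    """Pick which logic a loaded state.json should run under."""
    if "version" not in state:
        return "v1"
    if state["version"] == "2.0":
        return "v2"
    # Assumption: other values are not described in the text, so fail loudly.
    raise ValueError(f"unsupported state.json version: {state['version']!r}")
```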

Zero-Setup Environment Detection

At startup, detect whether the current directory is inside a git repository:

git rev-parse --is-inside-work-tree 2>/dev/null

If succeeds → has_git = true
If fails → has_git = false
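
A minimal sketch of the same check from Python, assuming the orchestrator can spawn processes. `detect_has_git` is a hypothetical name; treating a missing `git` binary or a missing directory as "not a repository" is an assumption.

```python
import subprocess


def detect_has_git(cwd: str = ".") -> bool:
    """True when `git rev-parse --is-inside-work-tree` exits with status 0."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,  # mirrors the 2>/dev/null above
            check=False,
        )
    except (FileNotFoundError, NotADirectoryError):
        # git is not installed, or cwd does not exist.
        return False
    return result.returncode == 0
```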

User Language Detection

Detect the user's language from their most recent message. Store as user_lang in state.json.

All user-facing communication in user_lang: progress updates, questions, confirmations, errors, spec sections, QA narrative, commit messages (if has_git).

Stays in English: template instructions, state.json field names, file names, git branch names.

Re-detection: On every user message, check if language changed. If so, update user_lang.

Standard Status Format

Read .harness/state.json and print (in user_lang):

[harness]
  Task   : <task>
  Mode   : <single | standard | multi>
  Model  : <model_config preset name>
  Style  : <auto | phase | step>
  Phase  : <phase label>
  Round  : <round> / <max_rounds>
  Branch : <branch>          ← omit if has_git == false
  Scope  : <scope>

Phase labels:

plan_ready → "Plan — ready"
planning → "Plan — in progress"
plan_done → "Plan — complete"
generate_ready → "Generate — ready"
generating → "Generate — in progress"
generate_done → "Generate — complete"
verify_ready → "Verify — ready"
verifying → "Verify — running checks"
verify_done → "Verify — complete"
evaluate_ready → "Evaluate — ready"
evaluating → "Evaluate — in progress"
evaluate_done → "Evaluate — complete"
completed → "Completed"
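
The status block and phase labels above could be rendered roughly as follows. `format_status` is a hypothetical helper that reads the state.json fields named in this document; the column padding is chosen to match the layout shown.

```python
DASH = "\u2014"
PHASE_LABELS = {"completed": "Completed"}
for title, ready, active, done, active_text in [
    ("Plan", "plan_ready", "planning", "plan_done", "in progress"),
    ("Generate", "generate_ready", "generating", "generate_done", "in progress"),
    ("Verify", "verify_ready", "verifying", "verify_done", "running checks"),
    ("Evaluate", "evaluate_ready", "evaluating", "evaluate_done", "in progress"),
]:
    PHASE_LABELS[ready] = f"{title} {DASH} ready"
    PHASE_LABELS[active] = f"{title} {DASH} {active_text}"
    PHASE_LABELS[done] = f"{title} {DASH} complete"


def format_status(state: dict) -> str:
    rows = [
        ("Task", state["task"]),
        ("Mode", state["mode"]),
        ("Model", state["model_config"]["preset"]),
        ("Style", state["run_style"]),
        ("Phase", PHASE_LABELS[state["phase"]]),
        ("Round", f'{state["round"]} / {state["max_rounds"]}'),
    ]
    if state.get("has_git"):  # Branch is omitted when has_git is false
        rows.append(("Branch", state["branch"]))
    rows.append(("Scope", state["scope"]))
    return "\n".join(["[harness]"] + [f"  {key:<7}: {value}" for key, value in rows])
```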

Session Recovery

Before starting a new task, check if .harness/state.json exists:

Read state.json. Check skill field (if present):
- If skill field exists and is NOT "workflow" → warn user (in detected language): "A /{skill} skill session is active in this directory." Ask via AskUserQuestion: header "Session Conflict", question "A /{skill} session exists. Delete it and start /workflow?", options: "Delete and start" / "Delete .harness/ and proceed with /workflow", "Cancel" / "Keep existing session and halt". If "Cancel" → halt. If "Delete and start" → delete .harness/, proceed to Step 1.
- If skill field is "workflow" or missing → continue below.
Check version field:
- Missing → v1 session. Print status with [harness] Previous session detected (v1)., then run v1 recovery logic (pre-v8 behavior). Do NOT apply v2 state machine.
- "2.0" → v2 session. Continue below.
Print status in standard format, prefixed with [harness] Previous session detected.
Restore model_config from state.json. Apply to all subsequent sub-agent launches.
Restore conventions from state.json. If value starts with "file:", verify the referenced file exists. If file missing, set conventions → null (will trigger Step 1.5 on resume).
If has_git is not in state.json, re-detect and store.
Ask the user via AskUserQuestion (in user_lang):
- header: "Session"
- question: "[harness] Previous session detected. [standard status]. Resume, restart, or stop?"
- options:
  - "Resume" / "Continue from {phase}"
  - "Restart" / "Delete .harness/ and start fresh"
  - "Stop" / "Delete .harness/ and halt"
Actions:
- Resume: Before jumping to any step, run Safety Guard re-validation:
  - Read docs_path directly from state.json. Do NOT recompute from cli_flags.output_dir — cli_flags is for audit/record only.
  - Run validate_path(docs_path, kind=output_dir): slug validation + relative path + reserved name check.
  - If validation fails: print [harness] ⚠ Recovered docs_path failed validation: <path> and treat as Restart.
  Then jump to the state matching phase:
  - plan_ready → Step 1.5 (Convention Scan) if conventions is null (not yet executed), else Step 2 (Plan). Note: "skipped" means user already decided — go to Step 2. If conventions starts with "file:" but the file does not exist, treat as null and re-run Step 1.5.
  - planning / plan_done → Step 3 (Gate) if spec.md exists, else Step 2
  - generate_ready → Step 4 (Generate)
  - generating / generate_done → Step 5 (Verify)
  - verify_ready / verifying → Step 5 (Verify), reset retries to 0
  - verify_done:
    - if state.autofix == null → Step 5 (Verify), reset layer1_retries to 0 (existing behavior)
    - if autofix.applied == "proposed" → Step 5 "2nd HARD-GATE" direct re-entry (I3; do NOT reset retries)
    - if autofix.applied == "applied" → Step 5 re-verify from Layer 1 (retries from state.json, no reset)
    - if autofix.applied is "stopped" or "rejected" → Step 5 "1st HARD-GATE" (Auto-fix HIDE per I2; layer1_retries unchanged — I4 clamp applies to "stopped")
  - evaluate_ready → Step 6 (Evaluate)
  - evaluating / evaluate_done → Step 7 (Verdict)
  - completed → no active session, proceed to Step 1
- Restart: Delete .harness/ and proceed to Step 1
- Stop: Delete .harness/ and halt

If .harness/state.json does not exist, proceed to Step 1.
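
The Resume routing above is essentially a lookup with two conditional branches. The sketch below is illustrative only: `resume_target` is a hypothetical name, the returned step labels are free-form strings, and the caller is assumed to have already turned a dangling `"file:"` conventions value into `None`.

```python
SIMPLE_TARGETS = {
    "generate_ready": ("Step 4", ""),
    "generating": ("Step 5", ""),
    "generate_done": ("Step 5", ""),
    "verify_ready": ("Step 5", "reset retries to 0"),
    "verifying": ("Step 5", "reset retries to 0"),
    "evaluate_ready": ("Step 6", ""),
    "evaluating": ("Step 7", ""),
    "evaluate_done": ("Step 7", ""),
    "completed": ("Step 1", "no active session"),
}
AUTOFIX_TARGETS = {
    "proposed": ("Step 5: 2nd HARD-GATE", "do not reset retries"),
    "applied": ("Step 5: re-verify from Layer 1", "do not reset retries"),
    "stopped": ("Step 5: 1st HARD-GATE", "Auto-fix hidden"),
    "rejected": ("Step 5: 1st HARD-GATE", "Auto-fix hidden"),
}


def resume_target(state: dict, spec_exists: bool) -> tuple:
    """Return (step, note) for a v2 session being resumed."""
    phase = state["phase"]
    if phase == "plan_ready":
        # "skipped" counts as a decision already made, so only None re-runs Step 1.5.
        return ("Step 1.5", "") if state.get("conventions") is None else ("Step 2", "")
    if phase in ("planning", "plan_done"):
        return ("Step 3", "") if spec_exists else ("Step 2", "")
    if phase == "verify_done":
        autofix = state.get("autofix")
        if autofix is None:
            return ("Step 5", "reset layer1_retries to 0")
        return AUTOFIX_TARGETS[autofix["applied"]]
    return SIMPLE_TARGETS[phase]
```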

run_style (Execution Mode)

Three execution styles control how phases progress:

Style	Behavior	Session end points
`auto`	Automatic progression, user gates at plan_done and evaluate_done(FAIL) only	`completed`
`phase`	Stop at each `*_done` state, resume in next session	`plan_done`, `generate_done`, `verify_done`, `evaluate_done`
`step`	Execute only the specified step, then stop	Immediately after step

CLI Parsing

/workflow "task description"              → auto (default)
/workflow plan "task description"         → phase mode, plan step
/workflow generate                        → phase mode, generate step
/workflow verify                          → step mode, verify only
/workflow evaluate                        → step mode, evaluate only
/workflow --mode single "task"            → auto + single mode
/workflow --model-config balanced "task"  → auto + balanced preset
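
A minimal sketch of how these invocations might map to a run style, assuming the arguments arrive already shell-split (so a quoted task is one token). `parse_workflow_args` is a hypothetical name, and only the two flags shown above are handled.

```python
STEP_STYLE = {"plan": "phase", "generate": "phase", "verify": "step", "evaluate": "step"}
VALUE_FLAGS = {"--mode": "mode", "--model-config": "model_config"}


def parse_workflow_args(argv):
    result = {"run_style": "auto", "step": None, "mode": None,
              "model_config": None, "task": None}
    rest = list(argv)
    if rest and rest[0] in STEP_STYLE:
        result["step"] = rest.pop(0)
        result["run_style"] = STEP_STYLE[result["step"]]
    positional = []
    i = 0
    while i < len(rest):
        token = rest[i]
        if token in VALUE_FLAGS:
            if i + 1 >= len(rest):
                raise ValueError(f"{token} needs a value")
            result[VALUE_FLAGS[token]] = rest[i + 1]
            i += 2
        else:
            positional.append(token)
            i += 1
    if positional:
        result["task"] = " ".join(positional)
    return result
```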

When state.json exists and /workflow is called with no arguments, read phase and suggest the next step, e.g. "Plan complete. Run generate?"

Step Mode Prerequisites

Step	Required files	Required phase (minimum)	Missing action
`/workflow plan`	(none)	(new session OK)	Normal start
`/workflow generate`	spec.md	after `plan_done`	Error: "Run plan first"
`/workflow verify`	changes.md	after `generate_done`	Error: "Run generate first"
`/workflow evaluate`	spec.md + changes.md + verify_report.md	after `verify_done`	Error: "Run verify first"
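
The prerequisite table can be read as one check per step. In this sketch `check_prerequisites` is a hypothetical helper; it assumes the required files live under `docs_path` (as spec.md, changes.md and verify_report.md do elsewhere in this document) and that "after" means at or past the named phase in the linear order.

```python
from pathlib import Path

PHASE_ORDER = [
    "plan_ready", "planning", "plan_done",
    "generate_ready", "generating", "generate_done",
    "verify_ready", "verifying", "verify_done",
    "evaluate_ready", "evaluating", "evaluate_done", "completed",
]
# step -> (required files, minimum phase, error message)
PREREQUISITES = {
    "plan": ((), None, None),
    "generate": (("spec.md",), "plan_done", "Run plan first"),
    "verify": (("changes.md",), "generate_done", "Run generate first"),
    "evaluate": (("spec.md", "changes.md", "verify_report.md"), "verify_done", "Run verify first"),
}


def check_prerequisites(step, phase, docs_path):
    """Return None when the step may run, else the error message to show."""
    files, min_phase, error = PREREQUISITES[step]
    if min_phase is None:
        return None  # plan can start a new session
    reached = (phase in PHASE_ORDER
               and PHASE_ORDER.index(phase) >= PHASE_ORDER.index(min_phase))
    have_files = all((Path(docs_path) / name).is_file() for name in files)
    return None if reached and have_files else error
```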

State Machine

State Transition Diagram

plan_ready → planning → plan_done → [User Gate] → generate_ready
  → generating → generate_done → verify_ready → verifying → verify_done
  → evaluate_ready → evaluating → evaluate_done → [Verdict Gate]
  → completed

Retry loops:
  verify_done(FAIL) + retries<3 → generating → generate_done → verifying → ...
  evaluate_done(FAIL) + user Fix → generating → generate_done → verifying → ...

Transition Rules

*_ready → *ing: sub-agent dispatch (immediate)
*ing → *_done: sub-agent completion
*_done → next *_ready: auto mode = automatic / phase mode = next session
Phase mode can end session at: plan_done, generate_done, verify_done, evaluate_done
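
One way to picture the diagram and retry loops is a function that returns the next phase. This is a sketch under assumptions: `next_phase` is a hypothetical name, `None` means "no automatic transition is defined here, a gate decides", and the retry path always passes through `verify_ready` even though the diagram abbreviates it.

```python
LINEAR = [
    "plan_ready", "planning", "plan_done",
    "generate_ready", "generating", "generate_done",
    "verify_ready", "verifying", "verify_done",
    "evaluate_ready", "evaluating", "evaluate_done", "completed",
]
MAX_VERIFY_RETRIES = 3


def next_phase(phase, result=None, retries=0, user_fix=False):
    if phase == "verify_done" and result == "FAIL":
        # Retry loop: back to generating while retries remain.
        return "generating" if retries < MAX_VERIFY_RETRIES else None
    if phase == "evaluate_done" and result == "FAIL":
        # Only the user's "Fix" choice re-enters generation.
        return "generating" if user_fix else None
    if phase == "completed":
        return None
    # User gates (after plan_done, after evaluate_done) are handled by the caller.
    return LINEAR[LINEAR.index(phase) + 1]
```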

Auto-fix State Transition Table

`autofix.applied`	Meaning
`null` (idle)	Auto-fix not yet attempted
`"proposed"`	Proposer dispatched, awaiting 2nd HARD-GATE
`"applied"`	Patch applied, re-verification in progress
`"rejected"`	User rejected proposal
`"stopped"`	Patch applied but re-verification failed

Transitions:

From	Event	To
`null` (idle)	1st HARD-GATE "Auto-fix" selected	`proposed`
`proposed`	User "Apply patch" (2nd HARD-GATE)	`applied`
`proposed`	User "Reject" (2nd HARD-GATE)	`rejected`
`applied`	Re-verify PASS	(cleared — continues to Step 6)
`applied`	Re-verify FAIL	`stopped`

Invariants (I1–I4):

I1: verify.autofix_attempted == true ⟺ autofix != null ∧ autofix.applied ≠ "proposed"
I2: 1st HARD-GATE Auto-fix option is visible only when verify.autofix_attempted == false AND state.autofix == null
I3: On session resume, if autofix.applied == "proposed" → re-enter 2nd HARD-GATE directly (skip 1st GATE)
I4: autofix.applied == "stopped" ⟹ layer1_retries = min(layer1_retries, 3) (clamp — no further increment)
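
The transition table and invariants I2 and I4 can be sketched as plain data plus two checks. The event names (`autofix_selected`, `apply_patch`, and so on) and function names are hypothetical labels for the events listed above.

```python
# (current autofix.applied, event) -> next autofix.applied; None means idle/cleared.
AUTOFIX_TRANSITIONS = {
    (None, "autofix_selected"): "proposed",
    ("proposed", "apply_patch"): "applied",
    ("proposed", "reject"): "rejected",
    ("applied", "reverify_pass"): None,
    ("applied", "reverify_fail"): "stopped",
}


def advance(applied, event):
    key = (applied, event)
    if key not in AUTOFIX_TRANSITIONS:
        raise ValueError(f"no transition from {applied!r} on {event!r}")
    return AUTOFIX_TRANSITIONS[key]


def autofix_option_visible(state: dict) -> bool:
    """I2: show Auto-fix at the 1st HARD-GATE only if never attempted and idle."""
    return state["verify"]["autofix_attempted"] is False and state.get("autofix") is None


def clamped_retries(state: dict) -> int:
    """I4: once stopped, layer1_retries is capped at 3."""
    retries = state["verify"]["layer1_retries"]
    autofix = state.get("autofix") or {}
    return min(retries, 3) if autofix.get("applied") == "stopped" else retries
```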

Workflow Steps

Step 1: Setup

Detect user language from task description. Store as user_lang.
Parse CLI arguments:
- Bare task → run_style: "auto"
- plan|generate prefix → run_style: "phase" (multi-step progression)
- verify|evaluate prefix → run_style: "step" (single step only)
- --mode single|standard|multi → set mode
- --model-config <preset> → set model config
- --lint-cmd <cmd> → override lint_cmd
- --type-check-cmd <cmd> → override type_check_cmd
- --verifier-model <haiku|sonnet|opus> → override verifier model (default: haiku). Validation: if value is not one of haiku, sonnet, opus → halt with error: "Invalid --verifier-model value. Allowed: haiku, sonnet, opus."
- --output-dir <path> → override output base directory (default: docs/harness). Validation — apply validate_path(path, kind=output_dir) (see §Architecture Principles §Path Validator):
  - Step 0 (before normalization): Empty string → halt with error: "output-dir cannot be empty."
  - Step 1 Normalize: \ → / (always, OS-independent). UNC pattern (\\server\… or //server/…) → halt with error: "UNC paths are not allowed."
  - Step 2 Absolute path: matches ^/ or ^[A-Za-z]:/ → halt with error: "output-dir must be a relative path."
  - Step 3 Segment ..: path.split("/") — if any segment == ".." → halt with error: "output-dir must not contain '..'." (segment-exact check, not substring)
  - Step 4 Reserved first segment: path.split("/")[0] ∈ {memory, spec, planner, generator, evaluator, verify, harness, .harness} → halt with error: "output-dir value starts with a reserved directory name." (first segment only — trailing slash stripped first; full-path comparison is NOT performed)
  - Step 4.5 (NEW in 8.4) docs first-segment exception for /spec → /workflow slug-safe handoff: if path.split("/")[0] == "docs", the second segment MUST be harness (i.e. path starts with docs/harness/...). Otherwise halt with error: "output-dir under docs/ must be docs/harness/..." Rationale: the default output_base = "docs/harness" always writes under this tree, so the standard /spec handoff value docs/harness/<slug>/ is the only legitimate docs/... override; any other docs/<other>/ first-segment override is rejected to prevent accidental writes outside the harness namespace.
  - If valid: normalize with trailing slash stripped, store in cli_flags.output_dir.
Slugify the task: lowercase, transliterate non-ASCII to ASCII, remove non-word chars except hyphens, replace spaces with hyphens, truncate to 50 chars. Store as <slug>.
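
The `--output-dir` validation steps (Step 0 through Step 4.5) can be sketched as one function. `validate_output_dir` is a hypothetical name standing in for `validate_path(path, kind=output_dir)`; rejecting a bare `docs` with no second segment is an assumption that follows from "the second segment MUST be harness".

```python
import re

RESERVED = {"memory", "spec", "planner", "generator", "evaluator",
            "verify", "harness", ".harness"}


def validate_output_dir(path: str) -> str:
    if path == "":                                   # Step 0
        raise ValueError("output-dir cannot be empty.")
    path = path.replace("\\", "/")                   # Step 1: normalize
    if path.startswith("//"):
        raise ValueError("UNC paths are not allowed.")
    if path.startswith("/") or re.match(r"^[A-Za-z]:/", path):  # Step 2
        raise ValueError("output-dir must be a relative path.")
    path = path.rstrip("/")                          # trailing slash stripped first
    segments = path.split("/")
    if ".." in segments:                             # Step 3: segment-exact
        raise ValueError("output-dir must not contain '..'.")
    if segments[0] in RESERVED:                      # Step 4: first segment only
        raise ValueError("output-dir value starts with a reserved directory name.")
    if segments[0] == "docs" and segments[1:2] != ["harness"]:  # Step 4.5
        raise ValueError("output-dir under docs/ must be docs/harness/...")
    return path
```

Because Step 3 compares whole segments, a name such as `a..b/c` passes while `a/../c` is rejected.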

Auto-detect project language and commands. Scan the working directory:

File	Language	Test Command	Build Command
`build.gradle(.kts)`	java	`./gradlew test`	`./gradlew build`
`pom.xml`	java	`mvn test`	`mvn compile`
`pyproject.toml` / `setup.py`	python	`pytest`	(none)
`package.json`	typescript	`npm test`	`npm run build`
`*.csproj`	csharp	`dotnet test`	`dotnet build`
`go.mod`	go	`go test ./...`	`go build ./...`
`Cargo.toml`	rust	`cargo test`	`cargo build`

If none match, set language to "unknown", test/build commands to null.
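
A sketch of the marker-file lookup, operating on a list of file names so it stays independent of the filesystem. `detect_project` is a hypothetical helper; checking the rows in table order when several markers are present is an assumption, since the table does not state a precedence.

```python
from fnmatch import fnmatch

# (marker patterns, language, test command, build command), in table order.
LANGUAGE_MARKERS = [
    (("build.gradle", "build.gradle.kts"), "java", "./gradlew test", "./gradlew build"),
    (("pom.xml",), "java", "mvn test", "mvn compile"),
    (("pyproject.toml", "setup.py"), "python", "pytest", None),
    (("package.json",), "typescript", "npm test", "npm run build"),
    (("*.csproj",), "csharp", "dotnet test", "dotnet build"),
    (("go.mod",), "go", "go test ./...", "go build ./..."),
    (("Cargo.toml",), "rust", "cargo test", "cargo build"),
]


def detect_project(filenames):
    names = list(filenames)
    for patterns, lang, test_cmd, build_cmd in LANGUAGE_MARKERS:
        if any(fnmatch(name, pattern) for name in names for pattern in patterns):
            return {"lang": lang, "test_cmd": test_cmd, "build_cmd": build_cmd}
    return {"lang": "unknown", "test_cmd": None, "build_cmd": None}
```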

Auto-detect lint command (skip if --lint-cmd provided). Check in order, stop at first match:

#	Detection	Condition	lint_cmd
1	Read `package.json`	`scripts.lint` key exists	`npm run lint`
2	Glob	`.eslintrc*` / `eslint.config.*` exists	`npx eslint .`
3	Read `pyproject.toml`	`[tool.ruff]` section exists	`ruff check .`
4	Glob+Read	`.pylintrc` exists OR `pyproject.toml` has `[tool.pylint]`	`pylint {scope}`
5	Glob	`.golangci.yml` / `.golangci.yaml` exists	`golangci-lint run`
6	Glob	`Cargo.toml` exists	`cargo clippy`

If none match → null (SKIPPED during verify).
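
The first-match order for lint detection could look like the sketch below. `detect_lint_cmd` is a hypothetical helper that takes a mapping of root-level file names to their text; detecting TOML sections by substring is a simplification, not a real TOML parse.

```python
import json


def detect_lint_cmd(files: dict, scope: str = "."):
    """files maps file name -> text content; returns the lint command or None."""
    package_json = files.get("package.json")
    if package_json is not None:                       # rule 1
        try:
            parsed = json.loads(package_json)
        except json.JSONDecodeError:
            parsed = {}
        scripts = parsed.get("scripts", {}) if isinstance(parsed, dict) else {}
        if isinstance(scripts, dict) and "lint" in scripts:
            return "npm run lint"
    if any(n.startswith(".eslintrc") or n.startswith("eslint.config.") for n in files):
        return "npx eslint ."                          # rule 2
    pyproject = files.get("pyproject.toml", "")
    if "[tool.ruff]" in pyproject:                     # rule 3
        return "ruff check ."
    if ".pylintrc" in files or "[tool.pylint]" in pyproject:  # rule 4
        return f"pylint {scope}"
    if ".golangci.yml" in files or ".golangci.yaml" in files:  # rule 5
        return "golangci-lint run"
    if "Cargo.toml" in files:                          # rule 6
        return "cargo clippy"
    return None  # SKIPPED during verify
```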

Auto-detect type-check command (skip if --type-check-cmd provided). Check in order, stop at first match:

#	Detection	Condition	type_check_cmd
1	Glob	`tsconfig.json` exists	`npx tsc --noEmit`
2	Glob+Read	`mypy.ini` exists OR `pyproject.toml` has `[tool.mypy]`	`mypy .`
3	Glob+Read	`pyrightconfig.json` exists OR `pyproject.toml` has `[tool.pyright]`	`pyright`
4-6	—	`*.csproj` / `go.mod` / `Cargo.toml`	null (build includes type-check)

If none match → null (SKIPPED during verify).

Determine docs_path:

output_base = cli_flags.output_dir ?? "docs/harness"
docs_path = output_base + "/" + <slug> + "/"
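
The slug rule from earlier in this step and the docs_path computation can be sketched together. `slugify` and `build_docs_path` are hypothetical names; using NFKD normalization and dropping whatever does not reduce to ASCII is a crude stand-in for transliteration and is an assumption.

```python
import re
import unicodedata


def slugify(task: str, max_len: int = 50) -> str:
    text = task.lower()
    # Crude transliteration: decompose accents, then drop non-ASCII leftovers.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s-]", "", text)       # remove non-word chars except hyphens
    text = re.sub(r"\s+", "-", text.strip())   # spaces -> hyphens
    return text[:max_len]


def build_docs_path(slug: str, output_dir=None) -> str:
    output_base = output_dir if output_dir is not None else "docs/harness"
    return f"{output_base}/{slug}/"
```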

Create directories: .harness/, .harness/planner/, .harness/generator/, {docs_path}

Immediately after docs_path is determined, write partial state.json (crash recovery checkpoint):

{ "version": "2.0", "task": "<task>", "cli_flags": {...}, "user_lang": "<lang>",
  "has_git": <bool>, "created_at": "<ISO8601>", "docs_path": "<docs_path>", "slug": "<slug>" }

Remaining fields (mode, model_config, etc.) are null until Step 1.11 final write.

Create git branch (if has_git): git checkout -b harness/<slug>. Skip if has_git == false.
Mode selection: If --mode provided, use it. Otherwise, ask via AskUserQuestion (in user_lang):
- header: "Mode"
- question: "Select workflow mode:"
- options:
  - "standard (Recommended)" / "2 specialists analyze + synthesize. ~1.5x tokens. Balanced depth"
  - "single" / "1 agent. Fast, token-saving. Best for simple tasks"
  - "multi" / "3 specialists + cross-critique. ~2-2.5x tokens. Deepest analysis"
Model configuration: If --model-config provided, use it. Otherwise, ask via AskUserQuestion (in user_lang):
- header: "Model"
- question: "Select model configuration for sub-agents:"
- options:
  - "default" / "Inherit parent model, no changes"
  - "all-opus" / "All sub-agents use Opus (highest quality)"
  - "balanced (Recommended)" / "Sonnet executor + Opus advisor/evaluator (cost-efficient)"
  - "economy" / "Haiku executor + Sonnet advisor/evaluator (max savings)"
If "Other": parse executor:<model>,advisor:<model>,evaluator:<model>. Validate — only opus, sonnet, haiku. Max 3 retries, then default to balanced. Fill missing roles from balanced defaults.

Store as model_config: { "preset": "<name>", "executor": "<model|null>", "advisor": "<model|null>", "evaluator": "<model|null>", "verifier": "<resolved-verifier>" }. For default preset: { "preset": "default", "verifier": "<resolved-verifier>" }.
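
Parsing the "Other" input might look like this sketch. `parse_custom_models` is a hypothetical name; the balanced defaults are taken from the option description above (Sonnet executor, Opus advisor and evaluator), and the retry loop (max 3, then balanced) is left to the caller.

```python
ALLOWED_MODELS = {"opus", "sonnet", "haiku"}
BALANCED_DEFAULTS = {"executor": "sonnet", "advisor": "opus", "evaluator": "opus"}


def parse_custom_models(text: str) -> dict:
    """Parse 'executor:<model>,advisor:<model>,evaluator:<model>'."""
    roles = dict(BALANCED_DEFAULTS)  # missing roles fall back to balanced
    for part in text.split(","):
        role, sep, model = part.strip().partition(":")
        role, model = role.strip(), model.strip()
        if not sep or role not in BALANCED_DEFAULTS or model not in ALLOWED_MODELS:
            raise ValueError(f"invalid model spec: {part.strip()!r}")
        roles[role] = model
    return roles
```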

10.5. Verifier model determination: model_config.verifier = cli_flags.verifier_model ?? "haiku" (CLI flag takes priority; preset default is always haiku). Store resolved value in model_config.verifier.

**Backward compat (Session Recovery):** When resuming a v2.0 session that lacks `cli_flags`, `verify.autofix_attempted`, `autofix`, or `docs_path`, resolve missing fields to defaults in memory without writing back to state.json:
- `cli_flags` → `{ "verifier_model": null, "output_dir": null }`
- `model_config.verifier` (missing) → `"haiku"`
- `verify.autofix_attempted` (missing) → `false`
- `autofix` (missing) → `null`
- `docs_path` (missing) → `"docs/harness/<slug>/"` (reconstruct from `task` slug)
This follows the `state.get(field, default)` pattern, so resume is KeyError-free for all pre-v8.1 sessions.

**docs_path usage rule**: Always read `docs_path` directly from state.json. Do NOT recompute from `cli_flags.output_dir`. `cli_flags` is for audit/record purposes only. Safety Guard in Session Recovery also uses `docs_path` directly (not recomputed).
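
The in-memory default resolution described above could be sketched as follows. `with_defaults` is a hypothetical helper; it works on a deep copy so nothing is written back, and it takes the slug as an argument rather than recomputing it.

```python
import copy


def with_defaults(state: dict, slug: str) -> dict:
    """Return a copy of a v2.0 state with missing fields resolved; the input is untouched."""
    resolved = copy.deepcopy(state)
    resolved.setdefault("cli_flags", {"verifier_model": None, "output_dir": None})
    resolved.setdefault("model_config", {}).setdefault("verifier", "haiku")
    resolved.setdefault("verify", {}).setdefault("autofix_attempted", False)
    resolved.setdefault("autofix", None)
    # Only reconstructed when absent; a stored docs_path is always used as-is.
    resolved.setdefault("docs_path", f"docs/harness/{slug}/")
    return resolved
```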

11. Write .harness/state.json:

{
  "version": "2.0",
  "task": "<task>",
  "mode": "single|standard|multi",
  "run_style": "auto|phase|step",
  "model_config": {
    "preset": "<name>",
    "executor": "<model|null>",
    "advisor": "<model|null>",
    "evaluator": "<model|null>",
    "verifier": "<haiku|sonnet|opus>"
  },
  "cli_flags": {
    "verifier_model": null,
    "output_dir": null
  },
  "user_lang": "<lang>",
  "has_git": true,
  "repo_name": "<name>",
  "repo_path": "<path>",
  "phase": "plan_ready",
  "round": 1,
  "max_rounds": 3,
  "max_files": 20,
  "scope": "<scope or (no limit)>",
  "branch": "harness/<slug>",
  "lang": "<detected>",
  "build_cmd": "<cmd or null>",
  "test_cmd": "<cmd or null>",
  "lint_cmd": "<cmd or null>",
  "type_check_cmd": "<cmd or null>",
  "verify": {
    "layer1_result": null,
    "layer1_retries": 0,
    "layer2_result": null,
    "layer2_retries": 0,
    "todo_blocking": false,
    "autofix_attempted": false
  },
  "autofix": null,
  "docs_path": "<output_base>/<slug>/",
  "conventions": null,
  "created_at": "<ISO8601>",
  "updated_at": "<ISO8601>"
}

cli_flags.verifier_model and cli_flags.output_dir are null by default (no CLI override). verify.autofix_attempted starts false each new session (session-wide once-only limit — not reset on round increment). autofix starts null; transitions to { "last_patch_path": "...", "applied": "proposed"|"applied"|"rejected"|"stopped", "triggered_at": "<ISO8601>" } during H2 flow.

Print setup summary (in user_lang):

[harness] Task started!
  Directory : <path>
  Branch    : harness/<slug>     ← omit if has_git == false
  Mode      : <single | standard | multi>
  Model     : <preset>
  Verifier  : <model_config.verifier>    ← always shown
  Style     : <auto | phase | step>
  Language  : <lang>
  Test      : <test_cmd or "none">
  Build     : <build_cmd or "none">
  Lint      : <lint_cmd or "none">
  TypeCheck : <type_check_cmd or "none">
  Scope     : <scope>
  Output    : <docs_path>

If model_config.verifier is sonnet or opus, also print:

  ⚠ Verifier set to <model> — high cost for mechanical verification. haiku is usually sufficient.

Proceed to Step 1.5 (Convention Scan). If run_style == "step" and the CLI step is not plan, check prerequisites and jump to the requested step after Step 1.5 completes.

Step 1.5: Convention Scan

This step runs after Setup and before Plan, in all modes.

Persisted Spec Artifacts Check (NEW in 8.4):

Before running CLAUDE.md richness check, look for {docs_path}conventions.md (persisted by /spec Phase 3 in slug-matched directory). (m7) {docs_path} is read from state.json (set by Step 1 step 7 — see §state.json schema).

Evaluation order (NEW in 8.4 v2 hardening — explicit decision tree to remove M2 vs Resume idempotency ambiguity):

IF  state.conventions == "file:.harness/conventions.md"  THEN
    IF  .harness/conventions.md exists  THEN
        // (M2) Skip — live .harness/conventions.md is authoritative on /workflow resume.
        skip Persisted Spec Artifacts Check entirely.
        proceed to CLAUDE.md richness flow below.
    ELSE  IF  {docs_path}conventions.md exists  THEN
        // Resume idempotency — re-copy /spec snapshot.
        copy {docs_path}conventions.md → .harness/conventions.md.
        proceed to Step 2 (Plan) — skip rich/sparse/missing trichotomy.
    ELSE
        // Both files missing — reset state and fall through.
        state.conventions = null  (atomic single-write).
        proceed to CLAUDE.md richness flow below (treat as fresh execution).
    END
ELIF  state.conventions IN { null, "skipped" }  THEN
    // Fresh /workflow session OR explicitly skipped — fall through.
    IF  {docs_path}conventions.md exists  THEN
        copy {docs_path}conventions.md → .harness/conventions.md.
        set state.conventions = "file:.harness/conventions.md".
        proceed to Step 2 (Plan).
    ELSE
        proceed to CLAUDE.md richness flow below.
    END
END
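
The same decision tree, written as a pure function for readability. `persisted_spec_check` is a hypothetical name; the fallback for any other `state.conventions` value is an assumption, because the tree above defines no branch for it.

```python
LIVE = "file:.harness/conventions.md"


def persisted_spec_check(conventions, live_exists: bool, persisted_exists: bool) -> dict:
    """live = .harness/conventions.md, persisted = {docs_path}conventions.md."""
    if conventions == LIVE:
        if live_exists:
            # (M2) the live file is authoritative on resume; never overwrite it.
            return {"copy": False, "conventions": LIVE, "next": "richness"}
        if persisted_exists:
            # Resume idempotency: re-copy the /spec snapshot.
            return {"copy": True, "conventions": LIVE, "next": "plan"}
        # Both missing: reset (must be persisted atomically) and fall through.
        return {"copy": False, "conventions": None, "next": "richness"}
    if conventions is None or conventions == "skipped":
        if persisted_exists:
            return {"copy": True, "conventions": LIVE, "next": "plan"}
        return {"copy": False, "conventions": conventions, "next": "richness"}
    # Assumption: any other value falls through to the richness flow unchanged.
    return {"copy": False, "conventions": conventions, "next": "richness"}
```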

(M2) Skip condition for resume (covered by the first IF branch above): If state.conventions == "file:.harness/conventions.md" AND .harness/conventions.md already exists (e.g., a prior /workflow session scanned conventions itself, then the session was paused and resumed), skip this entire Persisted Spec Artifacts Check and proceed directly to the existing CLAUDE.md richness flow below — the live .harness/conventions.md is authoritative for resumed /workflow sessions and must NOT be overwritten by a possibly-stale /spec copy.

Otherwise (fresh /workflow session, or .harness/conventions.md missing — see decision tree above):

File {docs_path}conventions.md exists → copy content to .harness/conventions.md, set state.conventions → "file:.harness/conventions.md", skip the rich/sparse/missing trichotomy entirely. Proceed to Step 2 (Plan).
File {docs_path}conventions.md does NOT exist → proceed with the existing CLAUDE.md richness flow below.

Resume idempotency: if state.conventions == "file:.harness/conventions.md" but .harness/conventions.md is missing (e.g., .harness/ was deleted between sessions) AND the M2 skip condition above did NOT trigger, the persisted check re-copies from {docs_path}conventions.md if still present. If both .harness/conventions.md AND {docs_path}conventions.md are missing, reset state.conventions = null (this reset MUST be a state.json atomic single-write read-modify-write — same single-write contract /spec defines in its §Atomicity Contract; do NOT only update in-memory because the next Session Recovery resume will re-read state.json and find the stale "file:..." value, looping the same fallback) and fall through to the existing CLAUDE.md richness flow (treat as fresh execution — no convention context available).
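
One common way to get a single-write read-modify-write is to write a temporary file and rename it over the original. The sketch below assumes that approach; the actual contract is defined in /spec's §Atomicity Contract, which is not reproduced here, and `update_state` is a hypothetical helper.

```python
import json
import os
import tempfile


def update_state(path: str, **changes) -> dict:
    """Read state.json, apply changes, and replace the file in one rename."""
    with open(path, encoding="utf-8") as handle:
        state = json.load(handle)
    state.update(changes)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return state
```

Resetting the stale value would then be `update_state(".harness/state.json", conventions=None)`.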

CLAUDE.md Richness Check:

Check if CLAUDE.md exists in the repository root.
If it exists, count lines: wc -l CLAUDE.md (or read and count).
Richness determination:
- Exists AND ≥ 50 lines → rich → skip scan, read CLAUDE.md content as conventions.
- Exists AND < 50 lines → sparse → proceed to scan Q&A.
- Does not exist → missing → proceed to scan Q&A.

conventions field contract: Always stores one of three values:

null → Step 1.5 not yet executed (initial state)
"skipped" → user explicitly chose to skip convention scan
"file:<path>" → conventions available at the given path (e.g., "file:.harness/conventions.md")

Conventions injection rule (used by Step 2): When conventions starts with "file:", read the file at the path after the prefix. If the file does not exist, treat as null and re-run Step 1.5. When conventions is null or "skipped", pass {conventions} as empty string.
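
The injection rule can be sketched as a function that returns the text to substitute for `{conventions}` plus a flag telling the caller to re-run Step 1.5. `conventions_for_prompt` and its `root` parameter are hypothetical.

```python
from pathlib import Path


def conventions_for_prompt(conventions, root: str = "."):
    """Return (text, rerun_scan) for the three allowed field values."""
    if conventions is None or conventions == "skipped":
        return "", False
    if conventions.startswith("file:"):
        path = Path(root) / conventions[len("file:"):]
        if path.is_file():
            return path.read_text(encoding="utf-8"), False
        # Dangling reference: treat as null and re-run Step 1.5.
        return "", True
    raise ValueError(f"unexpected conventions value: {conventions!r}")
```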

If rich (CLAUDE.md ≥ 50 lines):

Copy CLAUDE.md content to .harness/conventions.md (so all convention sources use the same path pattern).
Store conventions → "file:.harness/conventions.md" in state.json.
Print: [harness] Conventions: CLAUDE.md detected (rich). Copied to .harness/conventions.md
Proceed to Step 2 (Plan).

If sparse or missing:

Ask via AskUserQuestion (in user_lang):

header: "Convention Scan"
question: "No rich CLAUDE.md found. Scan codebase to auto-detect project conventions (DB, API, file structure, test patterns)? This helps the Planner align with existing patterns."
options:
- "Scan" / "Run convention scanner sub-agent (~1 token overhead)"
- "Skip" / "Proceed without convention data"

If "Skip": Set conventions → "skipped" in state.json. Print: [harness] Conventions: skipped. Proceed to Step 2.

If "Scan":

Read template: {CLAUDE_PLUGIN_ROOT}/templates/planner/convention_scanner.md
Fill variables: {repo_path}, {lang}, {scope}, {output_path} = .harness/conventions.md.
Dispatch 1 sub-agent (convention scanner). Model: if preset ≠ "default", use model_config.advisor (or haiku for economy).
Parse return — first line should contain "conventions written".
Verify .harness/conventions.md exists.
- If file does NOT exist (sub-agent reported success but file missing): warn user (in user_lang): "Convention scan completed but output file not found." Ask via AskUserQuestion: header "Convention Scan Failed", question "Output file missing. Retry or skip?", options: "Retry" / "Re-run scanner", "Skip" / "Proceed without conventions". If "Retry" → re-dispatch sub-agent (max 2 retries). If "Skip" → set conventions → "skipped". Do NOT store a "file:" reference to a non-existent file.
Store conventions → "file:.harness/conventions.md" in state.json.
Print: [harness] Conventions: scanned and saved to .harness/conventions.md

Update state.json: updated_at → now. Proceed to Step 2 (Plan).

Step 2: Plan Phase

Update state.json: phase → "plan_ready", updated_at → now.

Print: [harness] Phase: Plan

Discovery Notes Injection (NEW in 8.4) — applies to all planner dispatches in single/standard/multi mode:

Before any planner sub-agent dispatch, prepare:

qa_discovery_notes = read content of {docs_path}qa_notes.md:
- File missing → empty string "" (silent — pre-8.4 session or fresh /workflow run with no preceding /spec).
- (s2) File exists but read fails (permission, encoding, IO error) → warn user (in user_lang): "Failed to read {docs_path}qa_notes.md: <error>. Discovery Notes will be empty for planner injection." Then fall back to empty string "" and proceed (do NOT abort the dispatch; empty Discovery Notes is harmless per the backward-compat note below).
critic_findings = read content of {docs_path}critic_findings.md using the same pattern (missing → empty silently; read failure → warn + empty fallback).

When dispatching ANY planner template (architect.md, senior_developer.md, qa_specialist.md, planner_single.md), pass {qa_discovery_notes} and {critic_findings} as keyword variables (always — even when empty, to avoid str.format() KeyError on the new placeholders).
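
A minimal sketch of the read-with-fallback pattern and of why both variables are always passed. `read_optional` and `PLANNER_TEMPLATE` are hypothetical; the template body is invented for illustration, apart from the section heading named in this document.

```python
from pathlib import Path


def read_optional(path: str, warn=print) -> str:
    """Missing file -> "" silently; unreadable file -> warn, then ""."""
    file_path = Path(path)
    if not file_path.exists():
        return ""
    try:
        return file_path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        warn(f"Failed to read {path}: {exc}. "
             "Discovery Notes will be empty for planner injection.")
        return ""


# Hypothetical template fragment with the two new placeholders.
PLANNER_TEMPLATE = (
    "Task: {task_description}\n"
    "## Discovery Notes from Spec Phase\n"
    "{qa_discovery_notes}\n"
    "{critic_findings}\n"
)


def render_planner_prompt(task_description: str, docs_path: str) -> str:
    # Both keys are passed even when empty; omitting one would raise KeyError.
    return PLANNER_TEMPLATE.format(
        task_description=task_description,
        qa_discovery_notes=read_optional(docs_path + "qa_notes.md"),
        critic_findings=read_optional(docs_path + "critic_findings.md"),
    )
```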

(m4) Scope of injection: this injection applies ONLY to the initial proposal dispatch sub-agents — single mode dispatch (Step 2 single mode step 3) + standard/multi mode 2a "Independent Proposals" dispatches. Synthesis (standard 2b, multi 2c) and Cross-Critique (multi 2b) sub-agents do NOT receive {qa_discovery_notes} / {critic_findings} directly — they read proposal files which already incorporate this context (each proposal sub-agent at 2a-time received the Discovery Notes and embedded the relevant insights into its proposal output). Do NOT double-inject downstream; the synthesis/cross-critique templates do not declare these placeholders.

Backward compat: pre-8.4 sessions where these files do not exist receive empty strings — templates render the ## Discovery Notes from Spec Phase section with empty sub-bodies, which is harmless.

(M10) Common dispatch variable set (canonical; referenced by all 3 mode-specific dispatch steps): every planner sub-agent receives the following template variables. The mode-specific dispatch steps below say "Common dispatch variable set plus mode-specific extra" instead of re-enumerating them, so a future variable addition only needs to update this canonical list:

{task_description} — from state.json
{repo_path} — from state.json (or cwd for greenfield)
{lang} — from state.json (project primary language detection)
{scope} — from state.json (file scope filter)
{user_lang} — from state.json
{qa_discovery_notes} — prepared above (always pass; empty string fallback)
{critic_findings} — prepared above (always pass; empty string fallback)
{conventions} — derived from state.conventions per §Conventions injection rule (always pass; empty string fallback)

Mode-specific dispatch steps add {output_path} (architect/senior_developer/qa_specialist) or {spec_path} (planner_single only).

Step 2 — single mode

Update phase → "planning".
Read template: {CLAUDE_PLUGIN_ROOT}/templates/planner/planner_single.md
Dispatch 1 sub-agent with prompt built from the Common dispatch variable set (see §Step 2 Discovery Notes Injection above) plus mode-specific extra {spec_path} = {docs_path}spec.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return → extract first line. Print: ✓ {first line}
Verify spec.md exists.
Update phase → "plan_done", updated_at → now.

Step 2 — standard mode

2a: Independent Proposals (Parallel)

Update phase → "planning".
Read templates: architect.md, senior_developer.md
Dispatch 2 sub-agents in parallel. Each gets the Common dispatch variable set (see §Step 2 Discovery Notes Injection above) plus mode-specific extra {output_path} = .harness/planner/proposal_<persona>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ 2 proposals generated
Verify both proposal files exist.

2b: Synthesis

Read template: synthesis_standard.md
Dispatch 1 sub-agent with: {task_description}, {user_lang}, {all_proposals} (read both proposals, concatenate with author labels), {spec_path}.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return. Print: ✓ {first line}
Verify spec.md exists.
Update phase → "plan_done", updated_at → now.

Step 2 — multi mode

2a: Independent Proposals (Parallel)

Update phase → "planning".
Read templates: architect.md, senior_developer.md, qa_specialist.md
Dispatch 3 sub-agents in parallel. Each gets the Common dispatch variable set (see §Step 2 Discovery Notes Injection above) plus mode-specific extra {output_path} = .harness/planner/proposal_<persona>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ 3 proposals generated
Verify all 3 proposal files exist.

2b: Cross-Critique (Parallel)

Read template: cross_critique.md
Read all 3 proposals.
For each persona, prepare critique prompt with the OTHER two proposals.
Dispatch 3 sub-agents in parallel. Each writes to .harness/planner/critique_<persona>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ 3 cross-critiques completed
Verify all 3 critique files exist.
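
The cross-critique pairing above can be sketched as follows. This is a minimal illustration, not the harness's actual code: the function name `critique_inputs` and the dictionary layout are hypothetical, and it assumes `<persona>` is substituted with the template base names.

```python
# Hypothetical helper: the persona names mirror the template files listed above.
PERSONAS = ["architect", "senior_developer", "qa_specialist"]


def critique_inputs(personas=PERSONAS, base=".harness/planner"):
    """Map each persona to the OTHER personas' proposals and its own critique path."""
    plan = {}
    for persona in personas:
        others = [p for p in personas if p != persona]
        plan[persona] = {
            # A persona never reviews its own proposal.
            "proposals": [f"{base}/proposal_{other}.md" for other in others],
            "output_path": f"{base}/critique_{persona}.md",
        }
    return plan
```

Each entry then feeds one of the three parallel critique sub-agents.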

2c: Synthesis

Read template: synthesis.md
Read all 6 files (3 proposals + 3 critiques).
Dispatch 1 sub-agent with: {task_description}, {user_lang}, {all_proposals}, {all_critiques}, {spec_path}.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return. Print: ✓ {first line}
Verify spec.md exists.
Update phase → "plan_done", updated_at → now.

After Plan Phase

Print: [harness] Plan complete.

If run_style == "phase" or (run_style == "step" and requested step was plan): Print spec.md path, inform user session can end. Halt.

If run_style == "auto": Continue to Step 3 (Gate).

Step 3: HARD GATE — Spec Confirmation

Read and show spec.md to the user. Ask via AskUserQuestion (in user_lang):

header: "Spec"
question: "Review the spec above. Implementation consumes significant tokens. Confirm to proceed."
options:
- "Proceed" / "Start implementation as specified"
- "Modify" / "Edit the spec, then re-confirm"
- "Stop" / "Halt the workflow"

If "Modify": update spec.md and re-present. If "Stop": halt. Only "Proceed" advances.

Update state.json: phase → "generate_ready", updated_at → now.

Step 4: Generate Phase

Print: [harness] Phase: Generate

Print: Dispatching generator sub-agent...

Step 4 — single mode

Update phase → "generating", updated_at → now.
Read template: generator_single.md
Prepare prompt: {spec_content} from spec.md, {qa_feedback} from qa_report.md if round > 1 else "(First round)", {round_num}, {scope}, {max_files}, {user_lang}, {changes_path} = {docs_path}changes.md.
- If retry (from verify/evaluate failure): add {verify_failure} = 1-line FAIL summary, {verify_report_path} = {docs_path}verify_report.md.
- Model: if preset ≠ "default", use model_config.executor.
Dispatch 1 sub-agent.
Parse return. Print: ✓ {first line}
Verify changes.md exists.
Update phase → "generate_done", updated_at → now.

Step 4 — standard mode

4a: Implementation Plan

Update phase → "generating", updated_at → now.
Read template: lead_developer.md
Dispatch 1 sub-agent (Lead Developer): {spec_content}, {qa_feedback}, {repo_path}, {lang}, {scope}, {max_files}, {user_lang}, {output_path} = .harness/generator/plan.md.
- Model: if preset ≠ "default", use model_config.executor.
Parse return. Print: ✓ Plan: {first line}
Verify .harness/generator/plan.md exists.

4b: Combined Advisory Review

Read template: combined_advisor.md
Read .harness/generator/plan.md.
Dispatch 1 sub-agent (Combined Advisor): {spec_content}, {plan_content}, {repo_path}, {lang}, {test_cmd}, {user_lang}, {output_path} = .harness/generator/review_combined.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return. Print: ✓ Review: {first line}
Verify .harness/generator/review_combined.md exists.

4c: Implementation

Read template: implementation_standard.md
Read plan + review from .harness/generator/.
Prepare prompt: {spec_content}, {plan_content}, {advisor_review}, {qa_feedback}, {repo_path}, {lang}, {scope}, {max_files}, {user_lang}, {round_num}, {changes_path} = {docs_path}changes.md.
- If retry: add {verify_failure}, {verify_report_path} = {docs_path}verify_report.md.
- Model: if preset ≠ "default", use model_config.executor.
Dispatch 1 sub-agent.
Parse return. Print: ✓ Code: {first line}
Verify {docs_path}changes.md exists.
Update phase → "generate_done", updated_at → now.

Step 4 — multi mode

4a: Implementation Plan

Same as standard 4a.

4b: Advisory Review (Parallel)

Read templates: code_quality_advisor.md, test_stability_advisor.md
Read .harness/generator/plan.md.
Dispatch 2 sub-agents in parallel. Each writes to .harness/generator/review_<advisor>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ Reviews: 2 advisors completed
Verify both review files exist.

4c: Implementation

Read template: implementation.md
Read plan + both reviews from .harness/generator/.
Prepare prompt: {spec_content}, {plan_content}, {code_quality_review}, {test_stability_review}, {qa_feedback}, {repo_path}, {lang}, {scope}, {max_files}, {user_lang}, {round_num}, {changes_path} = {docs_path}changes.md.
- If retry: add {verify_failure}, {verify_report_path} = {docs_path}verify_report.md.
- Model: if preset ≠ "default", use model_config.executor.
Dispatch 1 sub-agent.
Parse return. Print: ✓ Code: {first line}
Verify {docs_path}changes.md exists.
Update phase → "generate_done", updated_at → now.

After Generate Phase

Print: [harness] Generate complete.

If run_style == "phase" or (run_style == "step" and requested step was generate): Inform user, halt.

If run_style == "auto": Continue to Step 5 (Verify).

Step 5: Verify Phase (Layer 1 — Mechanical)

First entry only (from generate_done, not from retry loop): Update state.json: phase → "verify_ready", verify.layer1_result → null, verify.layer1_retries → 0, updated_at → now.

Retry re-entry (from Generator retry): Update state.json: phase → "verify_ready", verify.layer1_result → null, updated_at → now. Do NOT reset layer1_retries — it was already incremented at retry dispatch.

Print: [harness] Phase: Verify (Layer 1 — Mechanical)

Read template: {CLAUDE_PLUGIN_ROOT}/templates/verify/verify_layer1.md
Prepare prompt with:
- {build_cmd}: from state.json (or "SKIP" if null)
- {test_cmd}: from state.json (or "SKIP" if null)
- {lint_cmd}: from state.json (or "SKIP" if null)
- {type_check_cmd}: from state.json (or "SKIP" if null)
- {changes_md_path}: {docs_path}changes.md
- {verify_report_path}: {docs_path}verify_report.md
- {todo_blocking}: from state.json verify.todo_blocking
Update phase → "verifying", updated_at → now.
Dispatch Verify sub-agent with model: model_config.verifier (default: haiku; override via --verifier-model).
Parse return — first line:
- Contains "PASS" → verify.layer1_result → "PASS"
- Contains "FAIL" → verify.layer1_result → "FAIL"
Update phase → "verify_done", updated_at → now.
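
As a rough sketch of the first-line parsing used here, assuming the sub-agent return arrives as a plain string (the function names are illustrative, and returning `None` for an unrecognized line is an assumption, since this step does not say what happens in that case):

```python
def first_line(raw: str) -> str:
    """Keep only the text up to the first newline; everything after it is ignored."""
    return raw.split("\n", 1)[0].strip()


def layer1_result(raw: str):
    """Map a Verify sub-agent return to "PASS", "FAIL", or None when neither keyword appears."""
    line = first_line(raw)
    if "PASS" in line:
        return "PASS"
    if "FAIL" in line:
        return "FAIL"
    return None  # assumption: the caller decides how to handle an unparseable line
```

Because only the first line is inspected, keywords that appear later in the sub-agent's output never affect the state decision.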

If PASS:

Print:

[harness] Verify (Layer 1) complete.
  Result : PASS
  {first line from sub-agent}

Continue to Step 6.

If FAIL and retries < 3:

Increment verify.layer1_retries in state.json. Print:

[harness] Verify (Layer 1) FAIL — retrying Generator (attempt {layer1_retries}/3)
  {first line error summary}

Generator retry — regardless of mode, launch one implementation sub-agent only (no re-plan, no re-review):

single → generator_single.md
standard → implementation_standard.md (with existing plan.md + review_combined.md)
multi → implementation.md (with existing plan.md + reviews)

Add to prompt:

{verify_failure} = the 1-line FAIL summary
{verify_report_path} = {docs_path}verify_report.md
Model: if preset ≠ "default", use model_config.executor.

Update phase → "generating", updated_at → now (skip generate_ready — retry is automatic, no user gate). After retry sub-agent completes: phase → "generate_done", updated_at → now, then loop back to Step 5 (re-run verify).
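
The Layer 1 outcomes in this step (PASS, FAIL with retries left, FAIL at the cap) can be summarized in a small decision function. This is a sketch under the assumption that the counter is incremented before the retry is dispatched, as the retry message suggests; the action labels are hypothetical.

```python
MAX_LAYER1_RETRIES = 3  # cap stated in this step


def after_layer1(result: str, retries: int):
    """Return (next_action, updated_retries) for one Layer 1 verification outcome."""
    if result == "PASS":
        return "evaluate", retries
    if retries < MAX_LAYER1_RETRIES:
        # Increment first, then dispatch a single implementation sub-agent.
        return "retry_generator", retries + 1
    # Cap reached: hand the decision to the user gate.
    return "user_gate", retries
```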

If FAIL and retries >= 3:

Print:

[harness] Verify (Layer 1) FAIL — max retries reached (3/3)
  Latest error: {first line error summary}
  See: {docs_path}verify_report.md

Ask via AskUserQuestion (in user_lang):

header: "Verify"
question: "Mechanical verification failed after 3 attempts. [error summary]"
options:
- "Auto-fix proposal" / "Let AI (Opus) analyze the failure and propose a minimal diff (1 attempt only)" ← **HIDE this option if `verify.autofix_attempted == true OR state.autofix != null`** (see §State Machine — I2)
- "Continue to Evaluator" / "Skip remaining verify issues, proceed to QA"
- "Stop" / "Halt for manual intervention. Review verify_report.md"

If "Continue": proceed to Step 6. If "Stop": halt (keep phase as verify_done).

If "Auto-fix proposal":

verify.autofix_attempted is set to true only after the 2nd HARD-GATE decision (Apply/Reject/Stop), NOT at Proposer dispatch. This ensures session interruption between dispatch and the 2nd gate does not consume the once-only right (I1). On session resume with autofix.applied == "proposed": re-enter 2nd HARD-GATE directly using saved autofix.last_patch_path — skip 1st GATE (I3).

Update state.json: autofix → { "last_patch_path": ".harness/generator/auto_fix_patch.md", "applied": "proposed", "triggered_at": "<ISO8601>" }
Read template: {CLAUDE_PLUGIN_ROOT}/templates/generator/auto_fix_proposer.md
Fill variables (pass paths only — Proposer sub-agent reads files directly):
- {spec_path} = {docs_path}spec.md
- {changes_md_path} = {docs_path}changes.md
- {verify_report_path} = {docs_path}verify_report.md
- {failing_files_list} = Orchestrator reads verify_report.md directly to extract file paths (explicit exception to §Architecture Principles #1 — path extraction only, no content analysis). After extraction:
  - Apply validate_path(path, kind=file_reference) to each path.
  - Violations: drop path + print [harness] ⚠ Path validation failed: <path> — excluded from Proposer input
  - Cap: maximum 5 paths. Excess paths dropped silently.
  - If 0 valid paths remain: print [harness] ⚠ No valid file paths found — Proposer input will be empty
- {output_path} = .harness/generator/auto_fix_patch.md
Dispatch Auto-fix Proposer sub-agent with model: model_config.advisor ?? "opus".
- If model_config.preset == "default", use "opus" (explicit upgrade — 2nd GATE UI will warn cost).
Parse return 1-line. Extract confidence level. If return format is non-standard (cannot parse confidence), treat as confidence: Unknown and print [harness] ⚠ 1-line return parse failed — fallback: confidence Unknown.
Verify .harness/generator/auto_fix_patch.md exists.
Empty patch check: verify auto_fix_patch.md contains at least one ```diff code block AND at least one @@ hunk header.
- If absent: skip Apply, print [harness] ⚠ Patch file is empty or has no diff block — apply skipped, return to HARD-GATE (Auto-fix hidden).
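
One way the empty patch check might look, as a sketch only: the regular expressions are assumptions about what counts as a diff block and a hunk header, and `has_usable_diff` is a hypothetical name.

```python
import re

# Triple backtick built at runtime so this sketch can itself sit inside a fenced block.
FENCE = "`" * 3
DIFF_BLOCK = re.compile(re.escape(FENCE) + r"diff[^\n]*\n.*?" + re.escape(FENCE), re.DOTALL)
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def has_usable_diff(patch_text: str) -> bool:
    """True only if the text has a fenced diff block AND at least one @@ hunk header."""
    return bool(DIFF_BLOCK.search(patch_text)) and bool(HUNK_HEADER.search(patch_text))
```

A file that fails this check would skip Apply and return to the gate, as described above.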

Show confidence level + 1-line summary from patch file. Print before question: `[harness] ℹ Auto-fix model: {model_config.advisor ?? 'opus'}`

Ask via AskUserQuestion (in user_lang):

header: "Auto-fix"
question: "Proposed fix generated (confidence: {level}). [If confidence == Low: ⚠ Low confidence — review the diff carefully before applying.] Apply the patch?"
options:
- "Apply patch" / "Apply the proposed diff and re-run Layer 1 verification (retry counter unchanged)"
- "Reject" / "Discard proposal, return to previous gate (Auto-fix option hidden)"
- "Stop" / "Halt for manual intervention"

After 2nd HARD-GATE decision, set verify.autofix_attempted = true in state.json.

If "Apply patch":

Before applying: snapshot current state via git stash (if has_git == true) or copy changed files to .harness/autofix_pre_apply/ (if has_git == false).
Pre-apply path validation: parse all --- a/<path> and +++ b/<path> headers from auto_fix_patch.md (metadata only — 4 header lines per hunk; hunk body is not parsed). Apply validate_path(path, kind=diff_target) to each path.
- Print to user: [harness] Applying patch to: <path list>
- If any path fails validation: reject Apply, print [harness] ✗ Diff path validation failed: <path>, return to HARD-GATE (Auto-fix hidden).
Apply unified diff from .harness/generator/auto_fix_patch.md using Edit tool.
- If any hunk fails to apply: restore from snapshot, warn user "Apply failed — reverted to pre-apply state.", return to HARD-GATE (retries >= 3, Auto-fix hidden).
Update state.json: autofix.applied → "applied". Reset verify.layer1_result → null.
Re-run Layer 1 Verify (retry counter layer1_retries unchanged — do NOT increment):
- PASS → proceed to Step 6.
- FAIL → update state.json: autofix.applied → "stopped", layer1_retries = min(layer1_retries, 3) (clamp — see §State Machine I4). Return to FAIL retries >= 3 HARD-GATE (Auto-fix option hidden since verify.autofix_attempted == true).
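
The pre-apply header scan could be sketched like this. It reads only the `--- a/` and `+++ b/` lines, as the step requires; the regular expression and the name `diff_target_paths` are illustrative, and each returned path would still need to pass validate_path(path, kind=diff_target).

```python
import re

# Matches only file header lines; hunk bodies are never interpreted.
HEADER = re.compile(r"^(?:--- a/|\+\+\+ b/)(\S+)", re.MULTILINE)


def diff_target_paths(patch_text: str) -> list[str]:
    """Collect the unique paths named in unified-diff headers, in first-seen order."""
    seen: list[str] = []
    for path in HEADER.findall(patch_text):
        if path not in seen:
            seen.append(path)
    return seen
```

Note that a removed line whose content begins with `-- a/` would look like a header to this pattern; a stricter parser would track hunk boundaries.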

If "Reject":

Update state.json: autofix.applied → "rejected".
Return to FAIL retries >= 3 HARD-GATE (Auto-fix option hidden).

Layer 2 FAIL path: Auto-fix proposal does NOT apply to Layer 2 structural failures (Step 7). Mechanical diff cannot fix structural issues.

After Verify Phase

If run_style == "phase" or (run_style == "step" and requested step was verify): Inform user of result, halt.

If run_style == "auto": Continue to Step 6.

Step 6: Evaluate Phase (Layer 2 + Layer 3)

Update state.json: phase → "evaluate_ready", updated_at → now.

Print: [harness] Phase: Evaluate (Layer 2+3)

Print: Dispatching evaluator sub-agent...

Read template: {CLAUDE_PLUGIN_ROOT}/templates/evaluator/evaluator_prompt.md
Prepare prompt:
- {spec_content} from spec.md
- {changed_files_list} — file paths only from changes.md, strip all "reason" descriptions (anchoring prevention)
- {test_available}, {build_cmd}, {test_cmd}, {round_num}, {scope}, {user_lang}
- {qa_report_path} = {docs_path}qa_report.md
- {verify_context}:
  - If verify.layer1_result == "PASS": "Layer 1 PASSED — build/test/lint/type-check verified. See {docs_path}verify_report.md"
  - If verify.layer1_result == "FAIL" (user chose Continue): "Layer 1 FAILED (user proceeded despite failures) — see {docs_path}verify_report.md. Pay extra attention to build/test correctness."
  - If verify skipped: "Layer 1 was not executed for this session."
- Do NOT include: Generator reasoning, implementation plans, advisor reviews, or references to "Generator"/"AI"/"agent".
Update phase → "evaluating", updated_at → now.
Dispatch Evaluator sub-agent using subagent_type: "superpowers:code-reviewer" if available.
- Model: if preset ≠ "default", use model_config.evaluator.
Parse return — first line:
- Contains "PASS" → verify.layer2_result → "PASS". Print: ✓ {first line}
- Contains "FAIL L2" → verify.layer2_result → "FAIL". Print: ✗ {first line}
- Contains "FAIL L3" → verify.layer2_result → "PASS" (Layer 2 passed). Print: ✗ {first line}
- Contains "FAIL" (no layer indicator) → treat as L3 FAIL. verify.layer2_result → "PASS".
Update phase → "evaluate_done", updated_at → now.
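
The evaluator first-line mapping can be sketched as below. The checks run in the order the rules are listed (PASS first), which is an assumption about precedence; the tuple return shape and the `None` fallback are also hypothetical.

```python
def classify_evaluator(first_line: str):
    """Return (layer2_result, failing_layer) for an evaluator's first line."""
    if "PASS" in first_line:
        return "PASS", None
    if "FAIL L2" in first_line:
        return "FAIL", 2
    if "FAIL" in first_line:
        # "FAIL L3" or a bare "FAIL": Layer 2 is treated as passed.
        return "PASS", 3
    return None, None  # assumption: unrecognized lines are left to the caller
```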

Print: [harness] Evaluate complete.

If run_style == "phase" or (run_style == "step" and requested step was evaluate): Inform user, halt.

If run_style == "auto": Continue to Step 7.

Step 7: Verdict & Loop

Read qa_report.md. Look for "### Verdict: PASS" or "### Verdict: FAIL". Also check verify.layer2_result from state.json to determine failing layer.

If PASS:

Update state.json: phase → "completed", updated_at → now. Print: [harness] ✓ QA PASS — task complete. Proceed to Step 8.

If FAIL — Layer 2 (verify.layer2_result == "FAIL") and layer2_retries < 2:

Layer 2 failed. Auto-retry without user gate (same pattern as Layer 1 retry).

Increment verify.layer2_retries in state.json. Print:

[harness] Evaluate FAIL (Layer 2) — retrying Generator (attempt {layer2_retries}/2)
  {first line from evaluator}

Launch one implementation sub-agent (retry, no re-plan/re-review) — same rules as Layer 1 retry:

single → generator_single.md
standard → implementation_standard.md (with existing plan.md + review_combined.md)
multi → implementation.md (with existing plan.md + reviews)

Add {verify_failure} = evaluator 1-line FAIL, {verify_report_path} = {docs_path}qa_report.md. Model: if preset ≠ "default", use model_config.executor.

Update phase → "generating", updated_at → now (skip generate_ready). After retry completes: phase → "generate_done", updated_at → now, then run full Verify → Evaluate pipeline (Step 5 → Step 6 → Step 7).

If FAIL — Layer 2 and layer2_retries >= 2:

Print:

[harness] Evaluate FAIL (Layer 2) — max retries reached (2/2)
  Failing items: {summary from qa_report.md}

Ask via AskUserQuestion (in user_lang):

header: "QA"
question: "Layer 2 structural verification failed after 2 retries. [failing items]"
options:
- "Fix" / "Run next round"
- "Accept as-is" / "Finish without fixing"

If "Fix": same as Layer 3 Fix below. If "Accept as-is": phase → "completed", proceed to Step 8.

If FAIL — Layer 3 (verify.layer2_result == "PASS") and rounds remaining (round < max_rounds):

Ask via AskUserQuestion (in user_lang):

header: "QA"
question: "QA result: FAIL (Layer 3). [failure summary from qa_report.md Fix Instructions]."
options:
- "Fix" / "Run next round to fix FAIL items"
- "Accept as-is" / "Finish without fixing"

If "Fix":

Increment round, reset verify.layer1_retries → 0, verify.layer1_result → null, verify.layer2_result → null, verify.layer2_retries → 0.
Update updated_at → now.
Go to Step 4 (Generate).

If "Accept as-is":

Update phase → "completed", updated_at → now.
Proceed to Step 8.

If FAIL and max rounds reached:

Update phase → "completed", updated_at → now. Print: [harness] Max rounds reached. Remaining issues in qa_report.md. Proceed to Step 8.
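
A compact way to read the Step 7 branches is as a routing function. This is a sketch: the action labels are hypothetical, and evaluating the Layer 2 branches before the max-rounds check is an assumption, since the text does not state which takes precedence.

```python
MAX_LAYER2_RETRIES = 2  # cap stated in this step


def verdict_action(verdict, layer2_result, layer2_retries, round_num, max_rounds):
    """Pick the next action after the verdict is read from qa_report.md."""
    if verdict == "PASS":
        return "complete"
    if layer2_result == "FAIL":
        if layer2_retries < MAX_LAYER2_RETRIES:
            return "auto_retry"        # no user gate
        return "ask_user_layer2"       # Fix / Accept as-is
    if round_num < max_rounds:
        return "ask_user_layer3"       # Fix / Accept as-is
    return "complete_max_rounds"
```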

Step 8: Cleanup & Finalize

Artifact Cleanup Safety Guard

Before deleting any output directory:

Read docs_path from state.json. If missing/null: reconstruct as "docs/harness/<slug>/". Extract <slug> = last path segment before the trailing /.
Validate slug: If empty/null/whitespace → ABORT, warn user.
Path depth check: docs_path must be a relative path exactly one level below its parent directory. slug must NOT be memory (reserved), must NOT contain .. or /, and must NOT be a single dot (.). Always verify: Path(docs_path).resolve() ⊆ Path.cwd() (symlink escape prevention — no has_git condition). If any fail → ABORT.
Display before delete: Print exact target path before executing.
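
A minimal sketch of the guard's slug and containment checks, assuming Python 3.9+ for `Path.is_relative_to`. The function name is hypothetical, and the "exactly one level below its parent" depth rule is not modeled here.

```python
from pathlib import Path, PurePosixPath


def cleanup_target_ok(docs_path: str, cwd: Path) -> bool:
    """Return True only if docs_path is safe to delete under the checks sketched here."""
    pure = PurePosixPath(docs_path)
    if pure.is_absolute() or ".." in pure.parts:
        return False
    slug = pure.name  # last path segment; a trailing "/" is ignored
    if not slug.strip() or slug in {"memory", "."} or ".." in slug:
        return False
    # Resolve symlinks, then require the target to stay inside the working directory.
    target = (cwd / docs_path).resolve()
    return target.is_relative_to(cwd.resolve())
```

The resolved path is what should be printed before deletion, so the user sees the real target rather than the symlinked one.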

Full output base cleanup (only on explicit user request):

List all subdirectories with file counts.
If docs/harness/memory/ exists, warn separately: "Contains team knowledge from /memory skill."
Warn: "docs/ is git-ignored — all artifacts permanently deleted."
Confirm via AskUserQuestion (yes/no).

If has_git == true:

Ask via AskUserQuestion (in user_lang):

header: "Commit"
question: "Implementation complete. Choose how to finish:"
options:
- "Commit code only (Recommended)" / "Clean artifacts, commit code changes only"
- "Commit all" / "Commit everything including artifacts"
- "No commit" / "Clean .harness/ only, keep changes in working tree"

Actions (apply Safety Guard before each delete):

"Commit code only": (NEW in 8.4 — protect persisted spec artifacts) Apply this exact 5-step sequence:
1. (M8) Safety Guard validation on {docs_path} — apply the full Artifact Cleanup Safety Guard above (slug check + path depth + Path.cwd() containment) BEFORE any staging or deletion. If validation fails, ABORT: do NOT stage, do NOT delete. Surface the failed check to the user. Both .harness/ and {docs_path} remain intact for manual recovery.
2. Stage spec-persistence files for commit (only if the source file exists — silently skip missing files):
  - {docs_path}qa_notes.md
  - {docs_path}critic_findings.md
  - {docs_path}conventions.md
  (s4) Per-file staging failure handling: if git add <file> fails for a specific file (permission, .gitignore conflict, etc.), warn the user (in user_lang): "Failed to stage <file>: . Spec artifact may not be in git history." Continue with remaining files — do NOT abort the whole sequence on a single staging failure. The code commit (step 5) is more critical than any individual artifact preservation.
3. Delete .harness/ (the Safety Guard already validated the parent context).
4. Delete {docs_path} working-directory contents.
5. Commit code changes plus the staged spec artifacts.
(m2) git index vs working directory note: between step 2 (stage) and step 4 (delete working dir), the spec artifact files are staged in the git index but then removed from the working directory. This is correct behavior — git add captures a snapshot to the index at stage-time; a subsequent working-directory rm does NOT mark the staged files as deleted in the index (a working-tree-only delete after index-stage is a no-op for the index — it only changes the working tree, not the staged content). The final commit (step 5) therefore includes the 3 staged artifacts as additions even though they no longer exist on disk; they remain recoverable from git history.
"Commit all": delete .harness/, stage + commit {docs_path} + code
"No commit": delete .harness/ only

If has_git == false:

Inform user artifacts are in {docs_path}. Delete .harness/ only. No git operations.

Model Selection

Preset	executor	advisor	evaluator	verifier
default	(parent)	(parent)	(parent)	haiku (default)
all-opus	opus	opus	opus	haiku (default)
balanced	sonnet	opus	opus	haiku (default)
economy	haiku	sonnet	sonnet	haiku (default)

Planner Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Architect	advisor	(no override)	opus	opus	sonnet
Senior Developer	advisor	(no override)	opus	opus	sonnet
QA Specialist	advisor	(no override)	opus	opus	sonnet

Generator Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Lead Developer	executor	(no override)	opus	sonnet	haiku
Code Quality Advisor	advisor	(no override)	opus	opus	sonnet
Test & Stability Advisor	advisor	(no override)	opus	opus	sonnet
Combined Advisor	advisor	(no override)	opus	opus	sonnet
Generator (single)	executor	(no override)	opus	sonnet	haiku

Evaluator Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Evaluator	evaluator	(no override)	opus	opus	sonnet

Verify Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Verify (Layer 1)	verifier	haiku (default)	haiku (default)	haiku (default)	haiku (default)

Verifier defaults to haiku across all presets. Layer 1 only executes commands and parses exit codes — lowest-cost model is always sufficient. Override with --verifier-model sonnet|opus for sensitive mechanical verification (e.g., concurrency, complex test failures). Opt-in only. When set to sonnet or opus, a cost warning is shown in Setup Summary.

Applying model config: When launching any sub-agent, if model_config.preset ≠ "default", pass model per the table. Sub-agents do NOT read model_config from state.json — the orchestrator passes the model at launch.
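
The preset table can be read as a lookup the orchestrator performs at launch. The sketch below uses `None` to mean "no override, inherit the parent model"; the function name and the `verifier_override` parameter (standing in for --verifier-model) are illustrative.

```python
PRESETS = {
    "default":  {"executor": None,     "advisor": None,     "evaluator": None},
    "all-opus": {"executor": "opus",   "advisor": "opus",   "evaluator": "opus"},
    "balanced": {"executor": "sonnet", "advisor": "opus",   "evaluator": "opus"},
    "economy":  {"executor": "haiku",  "advisor": "sonnet", "evaluator": "sonnet"},
}


def model_for(preset: str, role: str, verifier_override=None):
    """Return the model to pass at sub-agent launch, or None to keep the parent model."""
    if role == "verifier":
        # The verifier ignores the preset: haiku unless explicitly overridden.
        return verifier_override or "haiku"
    return PRESETS[preset][role]
```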

User Interaction Rules

All user-facing questions MUST use AskUserQuestion tool when available.

If AskUserQuestion available → use it
If not available or fails → present as text with numbered options
Every option: label (short) + description (specific)
"Other" (free text) is automatically appended
Translate all text to user_lang

Architecture Principles

The following 5 principles are invariant constraints for the harness Orchestrator.

1. Orchestrator reads no intermediate files. Exceptions:
- spec.md at plan gate
- qa_report.md at verdict gate
- verify_report.md path for user message
- verify_report.md failing-file extraction for Auto-fix Proposer dispatch: Orchestrator reads verify_report.md to extract failing file paths only (no content analysis). Extracted paths pass through Path Validator (kind=file_reference) and are capped at 5. See §Step 5 — Auto-fix dispatch for the exact procedure.
- Pre-apply read of the --- a/ and +++ b/ diff header lines (4 lines of metadata only — hunk body is delegated to Edit tool). This is NOT a violation of this principle.
2. Auto-fix Proposer is the only sub-agent that directly Reads source files. Other sub-agents receive content only through template variables.
3. Paths only to sub-agents; never file contents (ephemeral critiques/proposals at synthesis step excepted).
4. Session-wide invariants (see §State Machine — Auto-fix State Transition Table):
- Auto-fix: at most 1 attempt per session (verify.autofix_attempted once-only — not reset on round increment).
- Layer 1 retries: max 3. Do NOT reset after Auto-fix Apply.
5. All external paths pass through Path Validator before use (see §Path Validator below).

Path Validator

Orchestrator internal conceptual function. Call sites: --output-dir parsing (Step 1.2), {failing_files_list} injection (Step 5), Edit tool unified diff Apply (Step 5), Session Recovery re-validation (Session Recovery).

validate_path(path, kind) where kind ∈ {output_dir, file_reference, diff_target}

  0. (kind == output_dir only) Empty string → halt "output-dir cannot be empty."
  1. Normalize: \ → / (OS-independent). UNC (\\server\share or //server/share) → halt.
  2. Absolute path: ^/ or ^[A-Za-z]:/ → halt.
  3. Segment-level ..: path.split("/") — any segment == ".." → halt (exact segment match, not substring).
  4. kind-specific:
     - output_dir:
         First segment (path.split("/")[0]) ∉ {memory, spec, planner, generator,
         evaluator, verify, harness, .harness}.
         Special case (NEW in 8.4): if first segment == "docs", second segment
         MUST == "harness" (path startswith "docs/harness/"). Else halt:
         "output-dir under docs/ must be docs/harness/..." (allows /spec handoff
         path docs/harness/<slug>/ while still blocking other docs/* overrides.)
     - file_reference (failing_files_list):
         (a) relative path, (b) no .. segment, (c) inside repo_path,
         (d) outside .harness/, docs/harness/*, memory/.
     - diff_target (unified diff --- a/ / +++ b/ headers):
         file_reference conditions + inside scope filter +
         outside .harness/, docs/harness/, memory/, .git/.
  5. On failure: return specific halt message describing the violation.

Attack vector → Path Validator step mapping:

Attack vector	Blocked at step
`--output-dir .harness`	Step 4 (kind=output_dir, first segment reserved)
`--output-dir docs/../../etc`	Step 3 (segment `..` rejection)
`--output-dir \\server\share`	Step 1 (normalization + UNC rejection)
`--output-dir /absolute/path`	Step 2 (absolute path rejection)
`--output-dir memory/foo`	Step 4 (first segment reserved)
`--output-dir` (empty)	Step 0 (empty string, kind=output_dir)
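
For illustration, the `output_dir` branch of the validator might be written as follows. This is a sketch covering only kind=output_dir; `PathViolation` and `validate_output_dir` are hypothetical names, and every message except the two quoted in the pseudocode is invented for the example.

```python
import re

RESERVED = {"memory", "spec", "planner", "generator",
            "evaluator", "verify", "harness", ".harness"}


class PathViolation(ValueError):
    """Carries the validator step number at which the path was rejected."""

    def __init__(self, step: int, message: str):
        super().__init__(message)
        self.step = step


def validate_output_dir(path: str) -> str:
    if path == "":
        raise PathViolation(0, "output-dir cannot be empty.")
    normalized = path.replace("\\", "/")              # step 1: OS-independent form
    if normalized.startswith("//"):
        raise PathViolation(1, "UNC paths are not allowed.")
    if normalized.startswith("/") or re.match(r"^[A-Za-z]:/", normalized):
        raise PathViolation(2, "absolute paths are not allowed.")
    segments = normalized.split("/")
    if ".." in segments:                              # exact segment match, not substring
        raise PathViolation(3, "'..' segments are not allowed.")
    if segments[0] in RESERVED:
        raise PathViolation(4, f"'{segments[0]}' is a reserved first segment.")
    if segments[0] == "docs" and (len(segments) < 2 or segments[1] != "harness"):
        raise PathViolation(4, "output-dir under docs/ must be docs/harness/...")
    return normalized
```

Each attack vector in the table above is rejected at the step listed for it, while a path such as docs/harness/<slug>/ passes through unchanged.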

Key Rules

Never skip phases. Always Plan → Generate → Verify → Evaluate.
Confirmation gates are non-negotiable. No implicit approval.
Stay within scope. Do not modify files outside scope.
Evaluator must be isolated. Anchor-free input. Never pass Generator reasoning.
Planner proposals must be independent. Never share one persona's work with another during proposal.
Generator advisors review the plan, not code. Advisory before implementation.
Use available skills. Search by keyword, not plugin name. Proceed without if none found.
User language. All user-facing output in user_lang.
Intermediate outputs are ephemeral. Only final artifacts preserved in docs/.
Orchestrator reads no intermediate files. See §Architecture Principles for full exception list.
1-line return parsing. Only first line of sub-agent return is used for state decisions.

Version & Compatibility

This is state.json v2 (version "2.0"). When loading an existing state.json:

If version field is missing → treat as v1. Run v1 logic (see below).
If version is "2.0" → run v2 logic defined in this file.
Do NOT migrate v1 sessions to v2. v1 sessions continue with v1 behavior.

v1 Logic Summary (for sessions without version field)

v1 uses a simpler flow without the thin orchestrator pattern:

Phases: plan_ready → gen_ready → verify_ready → verifying → verify_done → eval_ready → completed
The orchestrator directly accumulates sub-agent results (no 1-line return constraint)
No run_style — always auto mode
No Layer 2 structural verification — Evaluator runs Layer 3 only
Verify step may be absent (pre-v7.1 sessions without verify field → skip verify entirely)
Resume v1 session by jumping to the v1 phase's corresponding step in the original flow

v1 → v2 Phase Mapping (for reference only)

v1 phase	v2 equivalent	Notes
`plan_ready`	`plan_ready`	same
`gen_ready`	`generate_ready`	renamed
`eval_ready`	`evaluate_ready`	renamed
`completed`	`completed`	same
`verify_ready`	`verify_ready`	v7.1, kept
`verifying`	`verifying`	v7.1, kept
`verify_done`	`verify_done`	v7.1, kept

Zero-Setup Environment Detection

At startup, detect whether the current directory is inside a git repository:

git rev-parse --is-inside-work-tree 2>/dev/null

If succeeds → has_git = true
If fails → has_git = false
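
If the detection were scripted rather than run by hand, it could look like this sketch. The function name is hypothetical; treating a missing git binary or an unusable working directory as `has_git = false` is an assumption that follows the "if fails" rule.

```python
import subprocess


def detect_has_git(cwd: str = ".") -> bool:
    """Run the detection command and map success or failure to has_git."""
    try:
        proc = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=cwd,
            capture_output=True,  # also hides stderr, like 2>/dev/null
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or cwd cannot be used: treat as "fails".
        return False
    return proc.returncode == 0
```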

User Language Detection

Detect the user's language from their most recent message. Store as user_lang in state.json.

All user-facing communication in user_lang: progress updates, questions, confirmations, errors, spec sections, QA narrative, commit messages (if has_git).

Stays in English: template instructions, state.json field names, file names, git branch names.

Re-detection: On every user message, check if language changed. If so, update user_lang.

Standard Status Format

Read .harness/state.json and print (in user_lang):

[harness]
  Task   : <task>
  Mode   : <single | standard | multi>
  Model  : <model_config preset name>
  Style  : <auto | phase | step>
  Phase  : <phase label>
  Round  : <round> / <max_rounds>
  Branch : <branch>          ← omit if has_git == false
  Scope  : <scope>

Phase labels:

plan_ready → "Plan — ready"
planning → "Plan — in progress"
plan_done → "Plan — complete"
generate_ready → "Generate — ready"
generating → "Generate — in progress"
generate_done → "Generate — complete"
verify_ready → "Verify — ready"
verifying → "Verify — running checks"
verify_done → "Verify — complete"
evaluate_ready → "Evaluate — ready"
evaluating → "Evaluate — in progress"
evaluate_done → "Evaluate — complete"
completed → "Completed"

Session Recovery

Before starting a new task, check if .harness/state.json exists:

Read state.json. Check skill field (if present):
- If skill field exists and is NOT "workflow" → warn user (in detected language): "A /{skill} skill session is active in this directory." Ask via AskUserQuestion: header "Session Conflict", question "A /{skill} session exists. Delete it and start /workflow?", options: "Delete and start" / "Delete .harness/ and proceed with /workflow", "Cancel" / "Keep existing session and halt". If "Cancel" → halt. If "Delete and start" → delete .harness/, proceed to Step 1.
- If skill field is "workflow" or missing → continue below.
Check version field:
- Missing → v1 session. Print status with [harness] Previous session detected (v1)., then run v1 recovery logic (pre-v8 behavior). Do NOT apply v2 state machine.
- "2.0" → v2 session. Continue below.
Print status in standard format, prefixed with [harness] Previous session detected.
Restore model_config from state.json. Apply to all subsequent sub-agent launches.
Restore conventions from state.json. If value starts with "file:", verify the referenced file exists. If file missing, set conventions → null (will trigger Step 1.5 on resume).
If has_git is not in state.json, re-detect and store.
Ask the user via AskUserQuestion (in user_lang):
- header: "Session"
- question: "[harness] Previous session detected. [standard status]. Resume, restart, or stop?"
- options:
  - "Resume" / "Continue from {phase}"
  - "Restart" / "Delete .harness/ and start fresh"
  - "Stop" / "Delete .harness/ and halt"
Actions:
- Resume: Before jumping to any step, run Safety Guard re-validation:
  - Read docs_path directly from state.json. Do NOT recompute from cli_flags.output_dir — cli_flags is for audit/record only.
  - Run validate_path(docs_path, kind=output_dir): slug validation + relative path + reserved name check.
  - If validation fails: print [harness] ⚠ Recovered docs_path failed validation: <path> and treat as Restart.
  Then jump to the state matching phase:
  - plan_ready → Step 1.5 (Convention Scan) if conventions is null (not yet executed), else Step 2 (Plan). Note: "skipped" means user already decided — go to Step 2. If conventions starts with "file:" but the file does not exist, treat as null and re-run Step 1.5.
  - planning / plan_done → Step 3 (Gate) if spec.md exists, else Step 2
  - generate_ready → Step 4 (Generate)
  - generating / generate_done → Step 5 (Verify)
  - verify_ready / verifying → Step 5 (Verify), reset layer1_retries to 0
  - verify_done:
    - if state.autofix == null → Step 5 (Verify), reset layer1_retries to 0 (existing behavior)
    - if autofix.applied == "proposed" → Step 5 "2nd HARD-GATE" direct re-entry (I3; do NOT reset retries)
    - if autofix.applied == "applied" → Step 5 re-verify from Layer 1 (retries from state.json, no reset)
    - if autofix.applied is "stopped" or "rejected" → Step 5 "1st HARD-GATE" (Auto-fix HIDE per I2; layer1_retries unchanged — I4 clamp applies to "stopped")
  - evaluate_ready → Step 6 (Evaluate)
  - evaluating / evaluate_done → Step 7 (Verdict)
  - completed → no active session, proceed to Step 1
- Restart: Delete .harness/ and proceed to Step 1
- Stop: Delete .harness/ and halt

If .harness/state.json does not exist, proceed to Step 1.
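
The Resume mapping above can be read as a lookup from phase to step. The following is a minimal sketch under stated assumptions, not the harness's own code: the function name `resume_target`, its arguments, and the returned labels are hypothetical, the Safety Guard re-validation is assumed to have passed already, and retry-counter resets are left out.

```python
def resume_target(state: dict, spec_exists: bool,
                  conventions_file_exists: bool = True) -> str:
    """Return an illustrative label for the step a resumed session jumps to."""
    phase = state.get("phase")
    conventions = state.get("conventions")
    autofix = state.get("autofix")

    if phase == "plan_ready":
        # A "file:" reference to a missing file is treated like null.
        stale = (isinstance(conventions, str)
                 and conventions.startswith("file:")
                 and not conventions_file_exists)
        return "Step 1.5" if conventions is None or stale else "Step 2"
    if phase in ("planning", "plan_done"):
        return "Step 3" if spec_exists else "Step 2"
    if phase == "generate_ready":
        return "Step 4"
    if phase in ("generating", "generate_done", "verify_ready", "verifying"):
        return "Step 5"
    if phase == "verify_done":
        if autofix is None:
            return "Step 5"
        applied = autofix.get("applied")
        if applied == "proposed":
            return "Step 5: 2nd HARD-GATE"
        if applied == "applied":
            return "Step 5: re-verify from Layer 1"
        return "Step 5: 1st HARD-GATE"  # "stopped" or "rejected"
    if phase == "evaluate_ready":
        return "Step 6"
    if phase in ("evaluating", "evaluate_done"):
        return "Step 7"
    # "completed" means no active session; sending unknown phases here
    # as well is an assumption of this sketch.
    return "Step 1"
```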

run_style (Execution Mode)

Three execution styles control how phases progress:

Style	Behavior	Session end points
`auto`	Automatic progression, user gates at plan_done and evaluate_done(FAIL) only	`completed`
`phase`	Stop at each `*_done` state, resume in next session	`plan_done`, `generate_done`, `verify_done`, `evaluate_done`
`step`	Execute only the specified step, then stop	Immediately after step

CLI Parsing

/workflow "task description"              → auto (default)
/workflow plan "task description"         → phase mode, plan step
/workflow generate                        → phase mode, generate step
/workflow verify                          → step mode, verify only
/workflow evaluate                        → step mode, evaluate only
/workflow --mode single "task"            → auto + single mode
/workflow --model-config balanced "task"  → auto + balanced preset

When state.json exists and /workflow is called with no arguments, read phase and suggest the next step, e.g. "Plan complete. Run generate?"

Step Mode Prerequisites

Step	Required files	Required phase (minimum)	Missing action
`/workflow plan`	(none)	(new session OK)	Normal start
`/workflow generate`	spec.md	after `plan_done`	Error: "Run plan first"
`/workflow verify`	changes.md	after `generate_done`	Error: "Run generate first"
`/workflow evaluate`	spec.md + changes.md + verify_report.md	after `verify_done`	Error: "Run verify first"

State Machine

State Transition Diagram

plan_ready → planning → plan_done → [User Gate] → generate_ready
  → generating → generate_done → verify_ready → verifying → verify_done
  → evaluate_ready → evaluating → evaluate_done → [Verdict Gate]
  → completed

Retry loops:
  verify_done(FAIL) + retries<3 → generating → generate_done → verifying → ...
  evaluate_done(FAIL) + user Fix → generating → generate_done → verifying → ...

Transition Rules

*_ready → *ing: sub-agent dispatch (immediate)
*ing → *_done: sub-agent completion
*_done → next *_ready: auto mode = automatic / phase mode = next session
Phase mode can end session at: plan_done, generate_done, verify_done, evaluate_done
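
As a rough illustration of the diagram and rules above, the happy path can be kept as an ordered list with the Layer 1 retry loop as a separate check. This is a sketch only: `HAPPY_PATH`, `next_phase`, `after_verify_fail`, and `session_can_end` are hypothetical names, and the user gates after plan_done and evaluate_done are deliberately not modelled.

```python
HAPPY_PATH = [
    "plan_ready", "planning", "plan_done",
    "generate_ready", "generating", "generate_done",
    "verify_ready", "verifying", "verify_done",
    "evaluate_ready", "evaluating", "evaluate_done",
    "completed",
]
MAX_LAYER1_RETRIES = 3


def next_phase(phase: str) -> str:
    """Next phase on the happy path; raises ValueError for unknown phases."""
    index = HAPPY_PATH.index(phase)
    return HAPPY_PATH[min(index + 1, len(HAPPY_PATH) - 1)]


def after_verify_fail(layer1_retries: int) -> str:
    """Retry loop: back to generating while retries remain, else stay at the gate."""
    return "generating" if layer1_retries < MAX_LAYER1_RETRIES else "verify_done"


def session_can_end(phase: str, run_style: str) -> bool:
    """Phase mode may end a session at any *_done state."""
    return run_style == "phase" and phase.endswith("_done")
```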

Auto-fix State Transition Table

`autofix.applied`	Meaning
`null` (idle)	Auto-fix not yet attempted
`"proposed"`	Proposer dispatched, awaiting 2nd HARD-GATE
`"applied"`	Patch applied, re-verification in progress
`"rejected"`	User rejected proposal
`"stopped"`	Patch applied but re-verification failed

Transitions:

From	Event	To
`null` (idle)	1st HARD-GATE "Auto-fix" selected	`proposed`
`proposed`	User "Apply patch" (2nd HARD-GATE)	`applied`
`proposed`	User "Reject" (2nd HARD-GATE)	`rejected`
`applied`	Re-verify PASS	(cleared — continues to Step 6)
`applied`	Re-verify FAIL	`stopped`

Invariants (I1–I4):

I1: verify.autofix_attempted == true ⟺ autofix != null ∧ autofix.applied ≠ "proposed"
I2: 1st HARD-GATE Auto-fix option is visible only when verify.autofix_attempted == false AND state.autofix == null
I3: On session resume, if autofix.applied == "proposed" → re-enter 2nd HARD-GATE directly (skip 1st GATE)
I4: autofix.applied == "stopped" ⟹ layer1_retries = min(layer1_retries, 3) (clamp — no further increment)
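
The transition table and invariants I2 and I4 can be sketched as a dictionary lookup plus two small helpers. The event names (`autofix_selected`, `apply`, `reject`, `reverify_pass`, `reverify_fail`) and all function names are hypothetical labels for this illustration, and modelling "cleared" as `None` is an assumption.

```python
AUTOFIX_TRANSITIONS = {
    (None, "autofix_selected"): "proposed",
    ("proposed", "apply"): "applied",
    ("proposed", "reject"): "rejected",
    ("applied", "reverify_pass"): None,   # cleared, continues to Step 6
    ("applied", "reverify_fail"): "stopped",
}


def autofix_next(applied, event: str):
    """Return the next autofix.applied value, or raise on an illegal move."""
    key = (applied, event)
    if key not in AUTOFIX_TRANSITIONS:
        raise ValueError(f"illegal auto-fix transition: {key!r}")
    return AUTOFIX_TRANSITIONS[key]


def autofix_option_visible(state: dict) -> bool:
    """I2: the option shows only before any attempt and with no autofix record."""
    return (not state["verify"]["autofix_attempted"]
            and state.get("autofix") is None)


def clamp_retries(layer1_retries: int) -> int:
    """I4: once stopped, the retry counter never exceeds 3."""
    return min(layer1_retries, 3)
```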

Workflow Steps

Step 1: Setup

Detect user language from task description. Store as user_lang.
Parse CLI arguments:
- Bare task → run_style: "auto"
- plan|generate prefix → run_style: "phase" (multi-step progression)
- verify|evaluate prefix → run_style: "step" (single step only)
- --mode single|standard|multi → set mode
- --model-config <preset> → set model config
- --lint-cmd <cmd> → override lint_cmd
- --type-check-cmd <cmd> → override type_check_cmd
- --verifier-model <haiku|sonnet|opus> → override verifier model (default: haiku). Validation: if value is not one of haiku, sonnet, opus → halt with error: "Invalid --verifier-model value. Allowed: haiku, sonnet, opus."
- --output-dir <path> → override output base directory (default: docs/harness). Validation — apply validate_path(path, kind=output_dir) (see §Architecture Principles §Path Validator):
  - Step 0 (before normalization): Empty string → halt with error: "output-dir cannot be empty."
  - Step 1 Normalize: \ → / (always, OS-independent). UNC pattern (\\server\… or //server/…) → halt with error: "UNC paths are not allowed."
  - Step 2 Absolute path: matches ^/ or ^[A-Za-z]:/ → halt with error: "output-dir must be a relative path."
  - Step 3 Segment ..: path.split("/") — if any segment == ".." → halt with error: "output-dir must not contain '..'." (segment-exact check, not substring)
  - Step 4 Reserved first segment: path.split("/")[0] ∈ {memory, spec, planner, generator, evaluator, verify, harness, .harness} → halt with error: "output-dir value starts with a reserved directory name." (first segment only — trailing slash stripped first; full-path comparison is NOT performed)
  - Step 4.5 (NEW in 8.4) docs first-segment exception for /spec → /workflow slug-safe handoff: if path.split("/")[0] == "docs", the second segment MUST be harness (i.e. path starts with docs/harness/...). Otherwise halt with error: "output-dir under docs/ must be docs/harness/..." Rationale: the default output_base = "docs/harness" always writes under this tree, so the standard /spec handoff value docs/harness/<slug>/ is the only legitimate docs/... override; any other docs/<other>/ first-segment override is rejected to prevent accidental writes outside the harness namespace.
  - If valid: normalize with trailing slash stripped, store in cli_flags.output_dir.
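
The --output-dir checks above could be sketched roughly as follows. This is an illustration, not the Path Validator itself: the function name `validate_output_dir` is hypothetical, errors are raised as `ValueError` instead of halting, and the UNC check is run after backslash normalization so both spellings are caught by one test.

```python
import re

RESERVED_FIRST_SEGMENTS = {
    "memory", "spec", "planner", "generator",
    "evaluator", "verify", "harness", ".harness",
}


def validate_output_dir(path: str) -> str:
    """Return the normalized path, or raise ValueError with the documented message."""
    if path == "":
        raise ValueError("output-dir cannot be empty.")
    path = path.replace("\\", "/")             # Step 1: normalize separators
    if path.startswith("//"):
        raise ValueError("UNC paths are not allowed.")
    if re.match(r"^/|^[A-Za-z]:/", path):      # Step 2: absolute paths
        raise ValueError("output-dir must be a relative path.")
    path = path.rstrip("/")                    # trailing slash stripped first
    segments = path.split("/")
    if ".." in segments:                       # Step 3: segment-exact check
        raise ValueError("output-dir must not contain '..'.")
    if segments[0] in RESERVED_FIRST_SEGMENTS: # Step 4: first segment only
        raise ValueError("output-dir value starts with a reserved directory name.")
    if segments[0] == "docs" and (len(segments) < 2 or segments[1] != "harness"):
        raise ValueError("output-dir under docs/ must be docs/harness/...")
    return path
```

Because the `..` test compares whole segments, a name such as `my..dir` passes, which matches the "not substring" note in Step 3.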
Slugify the task: lowercase, transliterate non-ASCII to ASCII, remove non-word chars except spaces and hyphens, replace spaces with hyphens, truncate to 50 chars. Store as <slug>.
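
A minimal slugify sketch, assuming Unicode NFKD decomposition is an acceptable stand-in for transliteration (it drops characters with no ASCII decomposition, so it is lossy for scripts such as CJK). The function name and the `max_len` parameter are hypothetical. Whitespace is kept until the hyphen replacement so that word boundaries survive.

```python
import re
import unicodedata


def slugify(task: str, max_len: int = 50) -> str:
    """Lowercase, ASCII-fold, strip punctuation, hyphenate, and truncate."""
    text = task.lower()
    # Decompose accented characters, then drop anything ASCII cannot encode.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove everything except word characters, whitespace, and hyphens.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse each whitespace run into a single hyphen.
    text = re.sub(r"\s+", "-", text.strip())
    return text[:max_len]
```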

Auto-detect project language and commands. Scan the working directory:

File	Language	Test Command	Build Command
`build.gradle(.kts)`	java	`./gradlew test`	`./gradlew build`
`pom.xml`	java	`mvn test`	`mvn compile`
`pyproject.toml` / `setup.py`	python	`pytest`	(none)
`package.json`	typescript	`npm test`	`npm run build`
`*.csproj`	csharp	`dotnet test`	`dotnet build`
`go.mod`	go	`go test ./...`	`go build ./...`
`Cargo.toml`	rust	`cargo test`	`cargo build`

If none match, set language to "unknown", test/build commands to null.
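
The detection table can be sketched as an ordered list of marker files. The source does not state a precedence when several markers are present, so checking in table order is an assumption of this sketch, as are the names `DETECTORS` and `detect_project`.

```python
from pathlib import Path

# (marker patterns, language, test command, build command), in table order.
DETECTORS = [
    (("build.gradle", "build.gradle.kts"), "java", "./gradlew test", "./gradlew build"),
    (("pom.xml",), "java", "mvn test", "mvn compile"),
    (("pyproject.toml", "setup.py"), "python", "pytest", None),
    (("package.json",), "typescript", "npm test", "npm run build"),
    (("*.csproj",), "csharp", "dotnet test", "dotnet build"),
    (("go.mod",), "go", "go test ./...", "go build ./..."),
    (("Cargo.toml",), "rust", "cargo test", "cargo build"),
]


def detect_project(root: str) -> dict:
    """Return language and commands for the first marker found in root."""
    base = Path(root)
    for patterns, lang, test_cmd, build_cmd in DETECTORS:
        if any(any(base.glob(pattern)) for pattern in patterns):
            return {"lang": lang, "test_cmd": test_cmd, "build_cmd": build_cmd}
    return {"lang": "unknown", "test_cmd": None, "build_cmd": None}
```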

Auto-detect lint command (skip if --lint-cmd provided). Check in order, stop at first match:

#	Detection	Condition	lint_cmd
1	Read `package.json`	`scripts.lint` key exists	`npm run lint`
2	Glob	`.eslintrc` / `eslint.config.` exists	`npx eslint .`
3	Read `pyproject.toml`	`[tool.ruff]` section exists	`ruff check .`
4	Glob+Read	`.pylintrc` exists OR `pyproject.toml` has `[tool.pylint]`	`pylint {scope}`
5	Glob	`.golangci.yml` / `.golangci.yaml` exists	`golangci-lint run`
6	Glob	`Cargo.toml` exists	`cargo clippy`

If none match → null (SKIPPED during verify).
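
The first-match lint detection could look roughly like this. It is a sketch with several assumptions labeled in the comments: the glob patterns for ESLint config files are guesses at the intended wildcards, TOML sections are found by plain substring search instead of a TOML parser, and `detect_lint_cmd` is a hypothetical name.

```python
import json
from pathlib import Path


def detect_lint_cmd(root: str):
    """Return the first matching lint command, or None when nothing matches."""
    base = Path(root)

    pkg = base / "package.json"
    if pkg.is_file():
        try:
            scripts = json.loads(pkg.read_text(encoding="utf-8")).get("scripts", {})
        except (json.JSONDecodeError, AttributeError):
            scripts = {}
        if isinstance(scripts, dict) and "lint" in scripts:
            return "npm run lint"

    # Assumed wildcards: ".eslintrc*" and "eslint.config.*".
    if any(base.glob(".eslintrc*")) or any(base.glob("eslint.config.*")):
        return "npx eslint ."

    pyproject = base / "pyproject.toml"
    toml_text = pyproject.read_text(encoding="utf-8") if pyproject.is_file() else ""
    # Loose substring match; also accepts subsections such as [tool.ruff.lint].
    if "[tool.ruff" in toml_text:
        return "ruff check ."
    if (base / ".pylintrc").is_file() or "[tool.pylint" in toml_text:
        return "pylint {scope}"

    if (base / ".golangci.yml").is_file() or (base / ".golangci.yaml").is_file():
        return "golangci-lint run"
    if (base / "Cargo.toml").is_file():
        return "cargo clippy"
    return None
```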

Auto-detect type-check command (skip if --type-check-cmd provided). Check in order, stop at first match:

#	Detection	Condition	type_check_cmd
1	Glob	`tsconfig.json` exists	`npx tsc --noEmit`
2	Glob+Read	`mypy.ini` exists OR `pyproject.toml` has `[tool.mypy]`	`mypy .`
3	Glob+Read	`pyrightconfig.json` exists OR `pyproject.toml` has `[tool.pyright]`	`pyright`
4-6	—	`*.csproj` / `go.mod` / `Cargo.toml`	null (build includes type-check)

If none match → null (SKIPPED during verify).

Determine docs_path:

output_base = cli_flags.output_dir ?? "docs/harness"
docs_path = output_base + "/" + <slug> + "/"

Create directories: .harness/, .harness/planner/, .harness/generator/, {docs_path}

Immediately after docs_path is determined, write partial state.json (crash recovery checkpoint):

{ "version": "2.0", "task": "<task>", "cli_flags": {...}, "user_lang": "<lang>",
  "has_git": <bool>, "created_at": "<ISO8601>", "docs_path": "<docs_path>", "slug": "<slug>" }

Remaining fields (mode, model_config, etc.) are null until Step 1.11 final write.

Create git branch (if has_git): git checkout -b harness/<slug>. Skip if has_git == false.
Mode selection: If --mode provided, use it. Otherwise, ask via AskUserQuestion (in user_lang):
- header: "Mode"
- question: "Select workflow mode:"
- options:
  - "standard (Recommended)" / "2 specialists analyze + synthesize. ~1.5x tokens. Balanced depth"
  - "single" / "1 agent. Fast, token-saving. Best for simple tasks"
  - "multi" / "3 specialists + cross-critique. ~2-2.5x tokens. Deepest analysis"
Model configuration: If --model-config provided, use it. Otherwise, ask via AskUserQuestion (in user_lang):
- header: "Model"
- question: "Select model configuration for sub-agents:"
- options:
  - "default" / "Inherit parent model, no changes"
  - "all-opus" / "All sub-agents use Opus (highest quality)"
  - "balanced (Recommended)" / "Sonnet executor + Opus advisor/evaluator (cost-efficient)"
  - "economy" / "Haiku executor + Sonnet advisor/evaluator (max savings)"
If "Other": parse executor:<model>,advisor:<model>,evaluator:<model>. Validate — only opus, sonnet, haiku. Max 3 retries, then default to balanced. Fill missing roles from balanced defaults.

Store as model_config: { "preset": "<name>", "executor": "<model|null>", "advisor": "<model|null>", "evaluator": "<model|null>", "verifier": "<resolved-verifier>" }. For default preset: { "preset": "default", "verifier": "<resolved-verifier>" }.
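
Parsing the "Other" input might be sketched as below. The balanced defaults are taken from the preset description above (Sonnet executor, Opus advisor and evaluator); the function name `parse_custom_model_config` is hypothetical, and the retry loop around it (max 3 attempts, then balanced) is left to the caller.

```python
ALLOWED_MODELS = {"opus", "sonnet", "haiku"}
BALANCED_DEFAULTS = {"executor": "sonnet", "advisor": "opus", "evaluator": "opus"}


def parse_custom_model_config(text: str) -> dict:
    """Parse 'executor:<model>,advisor:<model>,evaluator:<model>'.

    Missing roles are filled from the balanced defaults; unknown roles or
    models raise ValueError so the caller can re-prompt.
    """
    roles = {}
    for part in text.split(","):
        role, sep, model = part.strip().partition(":")
        role, model = role.strip(), model.strip()
        if not sep or role not in BALANCED_DEFAULTS or model not in ALLOWED_MODELS:
            raise ValueError(f"invalid entry: {part.strip()!r}")
        roles[role] = model
    return {**BALANCED_DEFAULTS, **roles}
```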

**Backward compat (Session Recovery):** When resuming a v2.0 session that lacks `cli_flags`, `verify.autofix_attempted`, `autofix`, or `docs_path`, resolve missing fields to defaults in memory without writing back to state.json:
- `cli_flags` → `{ "verifier_model": null, "output_dir": null }`
- `model_config.verifier` (missing) → `"haiku"`
- `verify.autofix_attempted` (missing) → `false`
- `autofix` (missing) → `null`
- `docs_path` (missing) → `"docs/harness/<slug>/"` (reconstruct from `task` slug)
This follows the `state.get(field, default)` pattern, so resume is KeyError-free for all pre-v8.1 sessions.
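
One way to resolve these defaults in memory is to work on a deep copy, so the loaded state.json data is never modified. This is a sketch: `with_resume_defaults` is a hypothetical name, and the slug is assumed to have been reconstructed from `task` by the caller.

```python
import copy


def with_resume_defaults(state: dict, slug: str) -> dict:
    """Return a copy of state with missing v2.0 fields filled in; never writes back."""
    resolved = copy.deepcopy(state)
    resolved.setdefault("cli_flags", {"verifier_model": None, "output_dir": None})

    # model_config or verify may be absent or null in older sessions.
    model_config = resolved.get("model_config") or {}
    model_config.setdefault("verifier", "haiku")
    resolved["model_config"] = model_config

    verify = resolved.get("verify") or {}
    verify.setdefault("autofix_attempted", False)
    resolved["verify"] = verify

    resolved.setdefault("autofix", None)
    resolved.setdefault("docs_path", f"docs/harness/{slug}/")
    return resolved
```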

**docs_path usage rule**: Always read `docs_path` directly from state.json. Do NOT recompute from `cli_flags.output_dir`. `cli_flags` is for audit/record purposes only. Safety Guard in Session Recovery also uses `docs_path` directly (not recomputed).

11. Write .harness/state.json:

{
  "version": "2.0",
  "task": "<task>",
  "mode": "single|standard|multi",
  "run_style": "auto|phase|step",
  "model_config": {
    "preset": "<name>",
    "executor": "<model|null>",
    "advisor": "<model|null>",
    "evaluator": "<model|null>",
    "verifier": "<haiku|sonnet|opus>"
  },
  "cli_flags": {
    "verifier_model": null,
    "output_dir": null
  },
  "user_lang": "<lang>",
  "has_git": true,
  "repo_name": "<name>",
  "repo_path": "<path>",
  "phase": "plan_ready",
  "round": 1,
  "max_rounds": 3,
  "max_files": 20,
  "scope": "<scope or (no limit)>",
  "branch": "harness/<slug>",
  "lang": "<detected>",
  "build_cmd": "<cmd or null>",
  "test_cmd": "<cmd or null>",
  "lint_cmd": "<cmd or null>",
  "type_check_cmd": "<cmd or null>",
  "verify": {
    "layer1_result": null,
    "layer1_retries": 0,
    "layer2_result": null,
    "layer2_retries": 0,
    "todo_blocking": false,
    "autofix_attempted": false
  },
  "autofix": null,
  "docs_path": "<output_base>/<slug>/",
  "conventions": null,
  "created_at": "<ISO8601>",
  "updated_at": "<ISO8601>"
}

cli_flags.verifier_model and cli_flags.output_dir are null by default (no CLI override). verify.autofix_attempted starts false each new session (session-wide once-only limit — not reset on round increment). autofix starts null; transitions to { "last_patch_path": "...", "applied": "proposed"|"applied"|"rejected"|"stopped", "triggered_at": "<ISO8601>" } during H2 flow.

Print setup summary (in user_lang):

[harness] Task started!
  Directory : <path>
  Branch    : harness/<slug>     ← omit if has_git == false
  Mode      : <single | standard | multi>
  Model     : <preset>
  Verifier  : <model_config.verifier>    ← always shown
  Style     : <auto | phase | step>
  Language  : <lang>
  Test      : <test_cmd or "none">
  Build     : <build_cmd or "none">
  Lint      : <lint_cmd or "none">
  TypeCheck : <type_check_cmd or "none">
  Scope     : <scope>
  Output    : <docs_path>

If model_config.verifier is sonnet or opus, also print:

  ⚠ Verifier set to <model> — high cost for mechanical verification. haiku is usually sufficient.

Proceed to Step 1.5 (Convention Scan). If run_style == "step" and the CLI step is not plan, check prerequisites and jump to the requested step after Step 1.5 completes.

Step 1.5: Convention Scan

This step runs after Setup and before Plan, in all modes.

Persisted Spec Artifacts Check (NEW in 8.4):

Evaluation order (NEW in 8.4 v2 hardening — explicit decision tree to remove M2 vs Resume idempotency ambiguity):

IF  state.conventions == "file:.harness/conventions.md"  THEN
    IF  .harness/conventions.md exists  THEN
        // (M2) Skip — live .harness/conventions.md is authoritative on /workflow resume.
        skip Persisted Spec Artifacts Check entirely.
        proceed to CLAUDE.md richness flow below.
    ELSE  IF  {docs_path}conventions.md exists  THEN
        // Resume idempotency — re-copy /spec snapshot.
        copy {docs_path}conventions.md → .harness/conventions.md.
        proceed to Step 2 (Plan) — skip rich/sparse/missing trichotomy.
    ELSE
        // Both files missing — reset state and fall through.
        state.conventions = null  (atomic single-write).
        proceed to CLAUDE.md richness flow below (treat as fresh execution).
    END
ELIF  state.conventions IN { null, "skipped" }  THEN
    // Fresh /workflow session OR explicitly skipped — fall through.
    IF  {docs_path}conventions.md exists  THEN
        copy {docs_path}conventions.md → .harness/conventions.md.
        set state.conventions = "file:.harness/conventions.md".
        proceed to Step 2 (Plan).
    ELSE
        proceed to CLAUDE.md richness flow below.
    END
END

Otherwise (fresh /workflow session, or .harness/conventions.md missing — see decision tree above):

File {docs_path}conventions.md exists → copy content to .harness/conventions.md, set state.conventions → "file:.harness/conventions.md", skip the rich/sparse/missing trichotomy entirely. Proceed to Step 2 (Plan).
File {docs_path}conventions.md does NOT exist → proceed with the existing CLAUDE.md richness flow below.
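
The decision tree can be mirrored branch for branch in a small function. This is a sketch under assumptions: the name `persisted_spec_check`, the `root` parameter, and the return labels `"step_2"` and `"richness_flow"` are hypothetical, and any `conventions` value the tree does not cover falls through to the richness flow.

```python
import shutil
from pathlib import Path

FILE_REF = "file:.harness/conventions.md"


def persisted_spec_check(state: dict, docs_path: str, root: str = ".") -> str:
    """Return "step_2" or "richness_flow"; may copy the snapshot and update state."""
    live = Path(root) / ".harness" / "conventions.md"
    snapshot = Path(root) / docs_path / "conventions.md"
    conventions = state.get("conventions")

    def restore_snapshot() -> None:
        live.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(snapshot, live)

    if conventions == FILE_REF:
        if live.exists():
            return "richness_flow"      # (M2) live file is authoritative
        if snapshot.exists():
            restore_snapshot()          # resume idempotency
            return "step_2"
        state["conventions"] = None     # both files missing: reset
        return "richness_flow"
    if conventions in (None, "skipped") and snapshot.exists():
        restore_snapshot()
        state["conventions"] = FILE_REF
        return "step_2"
    return "richness_flow"
```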

CLAUDE.md Richness Check:

Check if CLAUDE.md exists in the repository root.
If it exists, count lines: wc -l CLAUDE.md (or read and count).
Richness determination:
- Exists AND ≥ 50 lines → rich → skip scan, read CLAUDE.md content as conventions.
- Exists AND < 50 lines → sparse → proceed to scan Q&A.
- Does not exist → missing → proceed to scan Q&A.
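
The rich/sparse/missing determination reduces to a line count against the 50-line threshold. A minimal sketch, with the hypothetical name `claude_md_richness`; note that counting with `splitlines()` can differ by one from `wc -l` when the file lacks a trailing newline.

```python
from pathlib import Path

RICH_MIN_LINES = 50


def claude_md_richness(repo_root: str) -> str:
    """Classify CLAUDE.md in repo_root as "rich", "sparse", or "missing"."""
    path = Path(repo_root) / "CLAUDE.md"
    if not path.is_file():
        return "missing"
    text = path.read_text(encoding="utf-8", errors="replace")
    line_count = len(text.splitlines())
    return "rich" if line_count >= RICH_MIN_LINES else "sparse"
```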

conventions field contract: Always stores one of three values:

null → Step 1.5 not yet executed (initial state)
"skipped" → user explicitly chose to skip convention scan
"file:<path>" → conventions available at the given path (e.g., "file:.harness/conventions.md")

If rich (CLAUDE.md ≥ 50 lines):

Copy CLAUDE.md content to .harness/conventions.md (so all convention sources use the same path pattern).
Store conventions → "file:.harness/conventions.md" in state.json.
Print: [harness] Conventions: CLAUDE.md detected (rich). Copied to .harness/conventions.md
Proceed to Step 2 (Plan).

If sparse or missing:

Ask via AskUserQuestion (in user_lang):

header: "Convention Scan"
question: "No rich CLAUDE.md found. Scan codebase to auto-detect project conventions (DB, API, file structure, test patterns)? This helps the Planner align with existing patterns."
options:
- "Scan" / "Run convention scanner sub-agent (~1 token overhead)"
- "Skip" / "Proceed without convention data"

If "Skip": Set conventions → "skipped" in state.json. Print: [harness] Conventions: skipped. Proceed to Step 2.

If "Scan":

Read template: {CLAUDE_PLUGIN_ROOT}/templates/planner/convention_scanner.md
Fill variables: {repo_path}, {lang}, {scope}, {output_path} = .harness/conventions.md.
Dispatch 1 sub-agent (convention scanner). Model: if preset ≠ "default", use model_config.advisor (or haiku for economy).
Parse return — first line should contain "conventions written".
Verify .harness/conventions.md exists.
- If file does NOT exist (sub-agent reported success but file missing): warn user (in user_lang): "Convention scan completed but output file not found." Ask via AskUserQuestion: header "Convention Scan Failed", question "Output file missing. Retry or skip?", options: "Retry" / "Re-run scanner", "Skip" / "Proceed without conventions". If "Retry" → re-dispatch sub-agent (max 2 retries). If "Skip" → set conventions → "skipped". Do NOT store a "file:" reference to a non-existent file.
Store conventions → "file:.harness/conventions.md" in state.json.
Print: [harness] Conventions: scanned and saved to .harness/conventions.md

Update state.json: updated_at → now. Proceed to Step 2 (Plan).

Step 2: Plan Phase

Update state.json: phase → "plan_ready", updated_at → now.

Print: [harness] Phase: Plan

Discovery Notes Injection (NEW in 8.4) — applies to all planner dispatches in single/standard/multi mode:

Before any planner sub-agent dispatch, prepare:

qa_discovery_notes = read content of {docs_path}qa_notes.md:
- File missing → empty string "" (silent — pre-8.4 session or fresh /workflow run with no preceding /spec).
- (s2) File exists but read fails (permission, encoding, IO error) → warn user (in user_lang): "Failed to read {docs_path}qa_notes.md: . Discovery Notes will be empty for planner injection." Then fall back to empty string "" and proceed (do NOT abort the dispatch — empty Discovery Notes is harmless per the backward-compat note below).
critic_findings = read content of {docs_path}critic_findings.md using the same pattern (missing → empty silently; read failure → warn + empty fallback).
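
Both reads follow the same pattern: a missing file is silent, a failed read warns and falls back to an empty string. A minimal sketch, assuming a hypothetical helper `read_optional_notes` and a `warn` callback standing in for the user-facing warning; including the exception text in the message is this sketch's choice.

```python
from pathlib import Path


def read_optional_notes(path: str, warn=print) -> str:
    """Return the file content, or "" when the file is missing or unreadable."""
    file = Path(path)
    if not file.exists():
        return ""  # silent: no preceding /spec run
    try:
        return file.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        warn(f"Failed to read {path}: {exc}. "
             "Discovery Notes will be empty for planner injection.")
        return ""  # never abort the dispatch
```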

Backward compat: pre-8.4 sessions where these files do not exist receive empty strings — templates render the ## Discovery Notes from Spec Phase section with empty sub-bodies, which is harmless.

Common dispatch variable set (passed to every planner dispatch):

{task_description} — from state.json
{repo_path} — from state.json (or cwd for greenfield)
{lang} — from state.json (project primary language detection)
{scope} — from state.json (file scope filter)
{user_lang} — from state.json
{qa_discovery_notes} — prepared above (always pass; empty string fallback)
{critic_findings} — prepared above (always pass; empty string fallback)
{conventions} — derived from state.conventions per §Conventions injection rule (always pass; empty string fallback)

Mode-specific dispatch steps add {output_path} (architect/senior_developer/qa_specialist) or {spec_path} (planner_single only).

Step 2 — single mode

Update phase → "planning".
Read template: {CLAUDE_PLUGIN_ROOT}/templates/planner/planner_single.md
Dispatch 1 sub-agent with prompt built from the Common dispatch variable set (see §Step 2 Discovery Notes Injection above) plus mode-specific extra {spec_path} = {docs_path}spec.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return → extract first line. Print: ✓ {first line}
Verify spec.md exists.
Update phase → "plan_done", updated_at → now.

Step 2 — standard mode

2a: Independent Proposals (Parallel)

Update phase → "planning".
Read templates: architect.md, senior_developer.md
Dispatch 2 sub-agents in parallel. Each gets the Common dispatch variable set (see §Step 2 Discovery Notes Injection above) plus mode-specific extra {output_path} = .harness/planner/proposal_<persona>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ 2 proposals generated
Verify both proposal files exist.

2b: Synthesis

Read template: synthesis_standard.md
Dispatch 1 sub-agent with: {task_description}, {user_lang}, {all_proposals} (read both proposals, concatenate with author labels), {spec_path}.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return. Print: ✓ {first line}
Verify spec.md exists.
Update phase → "plan_done", updated_at → now.

Step 2 — multi mode

2a: Independent Proposals (Parallel)

Update phase → "planning".
Read templates: architect.md, senior_developer.md, qa_specialist.md
Dispatch 3 sub-agents in parallel. Each gets the Common dispatch variable set (see §Step 2 Discovery Notes Injection above) plus mode-specific extra {output_path} = .harness/planner/proposal_<persona>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ 3 proposals generated
Verify all 3 proposal files exist.

2b: Cross-Critique (Parallel)

Read template: cross_critique.md
Read all 3 proposals.
For each persona, prepare critique prompt with the OTHER two proposals.
Dispatch 3 sub-agents in parallel. Each writes to .harness/planner/critique_<persona>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ 3 cross-critiques completed
Verify all 3 critique files exist.

2c: Synthesis

Read template: synthesis.md
Read all 6 files (3 proposals + 3 critiques).
Dispatch 1 sub-agent with: {task_description}, {user_lang}, {all_proposals}, {all_critiques}, {spec_path}.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return. Print: ✓ {first line}
Verify spec.md exists.
Update phase → "plan_done", updated_at → now.

After Plan Phase

Print: [harness] Plan complete.

If run_style == "phase" or (run_style == "step" and requested step was plan): Print spec.md path, inform user session can end. Halt.

If run_style == "auto": Continue to Step 3 (Gate).

Step 3: HARD GATE — Spec Confirmation

If "Modify": update spec.md and re-present. If "Stop": halt. Only "Proceed" advances.

Update state.json: phase → "generate_ready", updated_at → now.

Step 4: Generate Phase

Print: [harness] Phase: Generate

Print: Dispatching generator sub-agent...

Step 4 — single mode

Update phase → "generating", updated_at → now.
Read template: generator_single.md
Prepare prompt: {spec_content} from spec.md, {qa_feedback} from qa_report.md if round > 1 else "(First round)", {round_num}, {scope}, {max_files}, {user_lang}, {changes_path} = {docs_path}changes.md.
- If retry (from verify/evaluate failure): add {verify_failure} = 1-line FAIL summary, {verify_report_path} = {docs_path}verify_report.md.
- Model: if preset ≠ "default", use model_config.executor.
Dispatch 1 sub-agent.
Parse return. Print: ✓ {first line}
Verify changes.md exists.
Update phase → "generate_done", updated_at → now.

Step 4 — standard mode

4a: Implementation Plan

Update phase → "generating", updated_at → now.
Read template: lead_developer.md
Dispatch 1 sub-agent (Lead Developer): {spec_content}, {qa_feedback}, {repo_path}, {lang}, {scope}, {max_files}, {user_lang}, {output_path} = .harness/generator/plan.md.
- Model: if preset ≠ "default", use model_config.executor.
Parse return. Print: ✓ Plan: {first line}
Verify .harness/generator/plan.md exists.

4b: Combined Advisory Review

Read template: combined_advisor.md
Read .harness/generator/plan.md.
Dispatch 1 sub-agent (Combined Advisor): {spec_content}, {plan_content}, {repo_path}, {lang}, {test_cmd}, {user_lang}, {output_path} = .harness/generator/review_combined.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse return. Print: ✓ Review: {first line}
Verify .harness/generator/review_combined.md exists.

4c: Implementation

Read template: implementation_standard.md
Read plan + review from .harness/generator/.
Prepare prompt: {spec_content}, {plan_content}, {advisor_review}, {qa_feedback}, {repo_path}, {lang}, {scope}, {max_files}, {user_lang}, {round_num}, {changes_path} = {docs_path}changes.md.
- If retry: add {verify_failure}, {verify_report_path} = {docs_path}verify_report.md.
- Model: if preset ≠ "default", use model_config.executor.
Dispatch 1 sub-agent.
Parse return. Print: ✓ Code: {first line}
Verify {docs_path}changes.md exists.
Update phase → "generate_done", updated_at → now.

Step 4 — multi mode

4a: Implementation Plan

Same as standard 4a.

4b: Advisory Review (Parallel)

Read templates: code_quality_advisor.md, test_stability_advisor.md
Read .harness/generator/plan.md.
Dispatch 2 sub-agents in parallel. Each writes to .harness/generator/review_<advisor>.md.
- Model: if preset ≠ "default", use model_config.advisor.
Parse returns. Print: ✓ Reviews: 2 advisors completed
Verify both review files exist.

4c: Implementation

Read template: implementation.md
Read plan + both reviews from .harness/generator/.
Prepare prompt: {spec_content}, {plan_content}, {code_quality_review}, {test_stability_review}, {qa_feedback}, {repo_path}, {lang}, {scope}, {max_files}, {user_lang}, {round_num}, {changes_path} = {docs_path}changes.md.
- If retry: add {verify_failure}, {verify_report_path} = {docs_path}verify_report.md.
- Model: if preset ≠ "default", use model_config.executor.
Dispatch 1 sub-agent.
Parse return. Print: ✓ Code: {first line}
Verify {docs_path}changes.md exists.
Update phase → "generate_done", updated_at → now.

After Generate Phase

Print: [harness] Generate complete.

If run_style == "phase" or (run_style == "step" and requested step was generate): Inform user, halt.

If run_style == "auto": Continue to Step 5 (Verify).

Step 5: Verify Phase (Layer 1 — Mechanical)

First entry only (from generate_done, not from retry loop): Update state.json: phase → "verify_ready", verify.layer1_result → null, verify.layer1_retries → 0, updated_at → now.

Print: [harness] Phase: Verify (Layer 1 — Mechanical)

Read template: {CLAUDE_PLUGIN_ROOT}/templates/verify/verify_layer1.md
Prepare prompt with:
- {build_cmd}: from state.json (or "SKIP" if null)
- {test_cmd}: from state.json (or "SKIP" if null)
- {lint_cmd}: from state.json (or "SKIP" if null)
- {type_check_cmd}: from state.json (or "SKIP" if null)
- {changes_md_path}: {docs_path}changes.md
- {verify_report_path}: {docs_path}verify_report.md
- {todo_blocking}: from state.json verify.todo_blocking
Update phase → "verifying", updated_at → now.
Dispatch Verify sub-agent with model: model_config.verifier (default: haiku; override via --verifier-model).
Parse return — first line:
- Contains "PASS" → verify.layer1_result → "PASS"
- Contains "FAIL" → verify.layer1_result → "FAIL"
Update phase → "verify_done", updated_at → now.
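
Parsing the Verify return can be sketched as a first-line keyword check. The source does not say which keyword wins if a line contains both, so checking FAIL first is a conservative assumption here; `parse_layer1_result` is a hypothetical name.

```python
def parse_layer1_result(return_value: str):
    """Return (result, first_line); result is "PASS", "FAIL", or None."""
    # Only the first line is used; everything after it is ignored.
    first_line = return_value.split("\n", 1)[0]
    if "FAIL" in first_line:
        return "FAIL", first_line
    if "PASS" in first_line:
        return "PASS", first_line
    return None, first_line  # caller decides how to handle a parse failure
```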

If PASS:

Print:

[harness] Verify (Layer 1) complete.
  Result : PASS
  {first line from sub-agent}

Continue to Step 6.

If FAIL and retries < 3:

Increment verify.layer1_retries in state.json. Print:

[harness] Verify (Layer 1) FAIL — retrying Generator (attempt {layer1_retries}/3)
  {first line error summary}

Generator retry — regardless of mode, launch one implementation sub-agent only (no re-plan, no re-review):

single → generator_single.md
standard → implementation_standard.md (with existing plan.md + review_combined.md)
multi → implementation.md (with existing plan.md + reviews)

Add to prompt:

{verify_failure} = the 1-line FAIL summary
{verify_report_path} = {docs_path}verify_report.md
Model: if preset ≠ "default", use model_config.executor.

If FAIL and retries >= 3:

Print:

[harness] Verify (Layer 1) FAIL — max retries reached (3/3)
  Latest error: {first line error summary}
  See: {docs_path}verify_report.md

If "Continue": proceed to Step 6. If "Stop": halt (keep phase as verify_done).

If "Auto-fix proposal":

verify.autofix_attempted is set to true only after the 2nd HARD-GATE decision (Apply/Reject/Stop), NOT at Proposer dispatch. This ensures session interruption between dispatch and the 2nd gate does not consume the once-only right (I1). On session resume with autofix.applied == "proposed": re-enter 2nd HARD-GATE directly using saved autofix.last_patch_path — skip 1st GATE (I3).

Update state.json: autofix → { "last_patch_path": ".harness/generator/auto_fix_patch.md", "applied": "proposed", "triggered_at": "<ISO8601>" }
Read template: {CLAUDE_PLUGIN_ROOT}/templates/generator/auto_fix_proposer.md
Fill variables (pass paths only — Proposer sub-agent reads files directly):
- {spec_path} = {docs_path}spec.md
- {changes_md_path} = {docs_path}changes.md
- {verify_report_path} = {docs_path}verify_report.md
- {failing_files_list} = Orchestrator reads verify_report.md directly to extract file paths (explicit exception to §Architecture Principles #1 — path extraction only, no content analysis). After extraction:
  - Apply validate_path(path, kind=file_reference) to each path.
  - Violations: drop path + print [harness] ⚠ Path validation failed: <path> — excluded from Proposer input
  - Cap: maximum 5 paths. Excess paths dropped silently.
  - If 0 valid paths remain: print [harness] ⚠ No valid file paths found — Proposer input will be empty
- {output_path} = .harness/generator/auto_fix_patch.md
Dispatch Auto-fix Proposer sub-agent with model: model_config.advisor ?? "opus".
- If model_config.preset == "default", use "opus" (explicit upgrade — 2nd GATE UI will warn cost).
Parse return 1-line. Extract confidence level. If return format is non-standard (cannot parse confidence), treat as confidence: Unknown and print [harness] ⚠ 1-line return parse failed — fallback: confidence Unknown.
Verify .harness/generator/auto_fix_patch.md exists.
Empty patch check: verify auto_fix_patch.md contains at least one ```diff code block AND at least one @@ hunk header.
- If absent: skip Apply, print [harness] ⚠ Patch file is empty or has no diff block — apply skipped, return to HARD-GATE (Auto-fix hidden).
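
The empty patch check could be expressed with two multiline regular expressions. This is a sketch: `patch_has_diff` is a hypothetical name, and the fence marker is built from single backticks only so the example can sit inside a fenced block.

```python
import re

FENCE = "`" * 3  # three backticks, assembled to avoid nesting a fence here


def patch_has_diff(patch_text: str) -> bool:
    """True when the text has a diff-fenced block and at least one @@ hunk header."""
    has_diff_block = re.search(
        rf"^{FENCE}diff\b", patch_text, flags=re.MULTILINE) is not None
    has_hunk_header = re.search(
        r"^@@ .* @@", patch_text, flags=re.MULTILINE) is not None
    return has_diff_block and has_hunk_header
```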

After 2nd HARD-GATE decision, set verify.autofix_attempted = true in state.json.

If "Apply patch":

Before applying: snapshot current state via git stash (if has_git == true) or copy changed files to .harness/autofix_pre_apply/ (if has_git == false).
Pre-apply path validation: parse all --- a/<path> and +++ b/<path> headers from auto_fix_patch.md (metadata only — 4 header lines per hunk; hunk body is not parsed). Apply validate_path(path, kind=diff_target) to each path.
- Print to user: [harness] Applying patch to: <path list>
- If any path fails validation: reject Apply, print [harness] ✗ Diff path validation failed: <path>, return to HARD-GATE (Auto-fix hidden).
Apply unified diff from .harness/generator/auto_fix_patch.md using Edit tool.
- If any hunk fails to apply: restore from snapshot, warn user "Apply failed — reverted to pre-apply state.", return to HARD-GATE (retries >= 3, Auto-fix hidden).
Update state.json: autofix.applied → "applied". Reset verify.layer1_result → null.
Re-run Layer 1 Verify (retry counter layer1_retries unchanged — do NOT increment):
- PASS → proceed to Step 6.
- FAIL → update state.json: autofix.applied → "stopped", layer1_retries = min(layer1_retries, 3) (clamp — see §State Machine I4). Return to FAIL retries >= 3 HARD-GATE (Auto-fix option hidden since verify.autofix_attempted == true).
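
Extracting the target paths for pre-apply validation might look like the sketch below. It is deliberately simple and has limits worth stating: paths containing spaces are truncated at the first space, a removed body line that happens to start with `-- a/` would be picked up as a header, and `diff_target_paths` is a hypothetical name.

```python
import re

# Matches "--- a/<path>" and "+++ b/<path>" at the start of a line.
HEADER_RE = re.compile(r"^(?:--- a/|\+\+\+ b/)(\S+)", re.MULTILINE)


def diff_target_paths(patch_text: str) -> list:
    """Return the unique target paths named in diff headers, in first-seen order."""
    paths = []
    for path in HEADER_RE.findall(patch_text):
        if path not in paths:
            paths.append(path)
    return paths
```

Each returned path would then go through validate_path with kind=diff_target before any hunk is applied.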

If "Reject":

Update state.json: autofix.applied → "rejected".
Return to FAIL retries >= 3 HARD-GATE (Auto-fix option hidden).

Layer 2 FAIL path: Auto-fix proposal does NOT apply to Layer 2 structural failures (Step 7). Mechanical diff cannot fix structural issues.

After Verify Phase

If run_style == "phase" or (run_style == "step" and requested step was verify): Inform user of result, halt.

If run_style == "auto": Continue to Step 6.

Step 6: Evaluate Phase (Layer 2 + Layer 3)

Update state.json: phase → "evaluate_ready", updated_at → now.

Print: [harness] Phase: Evaluate (Layer 2+3)

Print: Dispatching evaluator sub-agent...

Read template: {CLAUDE_PLUGIN_ROOT}/templates/evaluator/evaluator_prompt.md
Prepare prompt:
- {spec_content} from spec.md
- {changed_files_list} — file paths only from changes.md, strip all "reason" descriptions (anchoring prevention)
- {test_available}, {build_cmd}, {test_cmd}, {round_num}, {scope}, {user_lang}
- {qa_report_path} = {docs_path}qa_report.md
- {verify_context}:
  - If verify.layer1_result == "PASS": "Layer 1 PASSED — build/test/lint/type-check verified. See {docs_path}verify_report.md"
  - If verify.layer1_result == "FAIL" (user chose Continue): "Layer 1 FAILED (user proceeded despite failures) — see {docs_path}verify_report.md. Pay extra attention to build/test correctness."
  - If verify skipped: "Layer 1 was not executed for this session."
- Do NOT include: Generator reasoning, implementation plans, advisor reviews, or references to "Generator"/"AI"/"agent".
Update phase → "evaluating", updated_at → now.
Dispatch Evaluator sub-agent using subagent_type: "superpowers:code-reviewer" if available.
- Model: if preset ≠ "default", use model_config.evaluator.
Parse return — first line:
- Contains "PASS" → verify.layer2_result → "PASS". Print: ✓ {first line}
- Contains "FAIL L2" → verify.layer2_result → "FAIL". Print: ✗ {first line}
- Contains "FAIL L3" → verify.layer2_result → "PASS" (Layer 2 passed). Print: ✗ {first line}
- Contains "FAIL" (no layer indicator) → treat as L3 FAIL. verify.layer2_result → "PASS".
Update phase → "evaluate_done", updated_at → now.
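
The first-line mapping above could look roughly like this in code. The function name and the returned dictionary shape are illustrative assumptions. The checks run in the order the rules are listed, and returning None for an unrecognized line is also an assumption, since the rules above do not define that case.

```python
from typing import Optional


def parse_evaluator_return(return_value: str) -> Optional[dict]:
    """Map the first line of the evaluator's return to a layer2_result."""
    first_line = return_value.split("\n", 1)[0]
    if "PASS" in first_line:
        return {"layer2_result": "PASS", "failing_layer": None}
    if "FAIL L2" in first_line:
        return {"layer2_result": "FAIL", "failing_layer": 2}
    if "FAIL" in first_line:
        # Covers "FAIL L3" and a bare "FAIL": Layer 2 is treated as passed.
        return {"layer2_result": "PASS", "failing_layer": 3}
    return None  # assumption: caller decides how to handle an unparseable line
```

Only the first line is inspected, so keywords that appear later in the return value never affect state.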

Print: [harness] Evaluate complete.

If run_style == "phase" or (run_style == "step" and requested step was evaluate): Inform user, halt.

If run_style == "auto": Continue to Step 7.

Step 7: Verdict & Loop

Read qa_report.md. Look for "### Verdict: PASS" or "### Verdict: FAIL". Also check verify.layer2_result from state.json to determine failing layer.

If PASS:

Update state.json: phase → "completed", updated_at → now. Print: [harness] ✓ QA PASS — task complete. Proceed to Step 8.

If FAIL — Layer 2 (verify.layer2_result == "FAIL") and layer2_retries < 2:

Layer 2 failed. Auto-retry without user gate (same pattern as Layer 1 retry).

Increment verify.layer2_retries in state.json. Print:

[harness] Evaluate FAIL (Layer 2) — retrying Generator (attempt {layer2_retries}/2)
  {first line from evaluator}

Launch one implementation sub-agent (retry, no re-plan/re-review) — same rules as Layer 1 retry:

single → generator_single.md
standard → implementation_standard.md (with existing plan.md + review_combined.md)
multi → implementation.md (with existing plan.md + reviews)

Add {verify_failure} = evaluator 1-line FAIL, {verify_report_path} = {docs_path}qa_report.md. Model: if preset ≠ "default", use model_config.executor.

If FAIL — Layer 2 and layer2_retries >= 2:

Print:

[harness] Evaluate FAIL (Layer 2) — max retries reached (2/2)
  Failing items: {summary from qa_report.md}

Ask via AskUserQuestion (in user_lang):

header: "QA"
question: "Layer 2 structural verification failed after 2 retries. [failing items]"
options:
- "Fix" / "Run next round"
- "Accept as-is" / "Finish without fixing"

If "Fix": same as Layer 3 Fix below. If "Accept as-is": phase → "completed", proceed to Step 8.

If FAIL — Layer 3 (verify.layer2_result == "PASS") and rounds remaining (round < max_rounds):

Ask via AskUserQuestion (in user_lang):

header: "QA"
question: "QA result: FAIL (Layer 3). [failure summary from qa_report.md Fix Instructions]."
options:
- "Fix" / "Run next round to fix FAIL items"
- "Accept as-is" / "Finish without fixing"

If "Fix":

Increment round, reset verify.layer1_retries → 0, verify.layer1_result → null, verify.layer2_result → null, verify.layer2_retries → 0.
Update updated_at → now.
Go to Step 4 (Generate).

If "Accept as-is":

Update phase → "completed", updated_at → now.
Proceed to Step 8.

If FAIL and max rounds reached:

Update phase → "completed", updated_at → now. Print: [harness] Max rounds reached. Remaining issues in qa_report.md. Proceed to Step 8.
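
The Step 7 branches can be summarized as a small decision function. This is a sketch under assumptions: the action labels are made up for illustration, and the branch order simply follows the order in which the cases are written above (so a Layer 2 failure is handled before the max-rounds check).

```python
def next_action(
    verdict: str,
    layer2_result: str,
    layer2_retries: int,
    round_num: int,
    max_rounds: int,
) -> str:
    """Pick the next orchestrator action after reading the verdict."""
    if verdict == "PASS":
        return "complete"
    if layer2_result == "FAIL":
        # Layer 2: auto-retry up to 2 times, then ask the user.
        return "retry_generator" if layer2_retries < 2 else "ask_user_layer2"
    if round_num < max_rounds:
        # Layer 3 with rounds remaining: user chooses Fix or Accept as-is.
        return "ask_user_layer3"
    return "complete_max_rounds"
```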

Step 8: Cleanup & Finalize

Artifact Cleanup Safety Guard

Before deleting any output directory:

Read docs_path from state.json. If missing/null: reconstruct as "docs/harness/<slug>/". Extract <slug> = last path segment before the trailing /.
Validate slug: If empty/null/whitespace → ABORT, warn user.
Path depth check: docs_path must be a relative path exactly one level below its parent directory. slug must NOT be memory (reserved), must NOT contain .. or /, and must NOT be a single dot (.). Always verify: Path(docs_path).resolve() ⊆ Path.cwd() (symlink escape prevention; applies regardless of has_git). If any check fails → ABORT.
Display before delete: Print exact target path before executing.
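
A minimal sketch of the guard, assuming a hypothetical helper named check_cleanup_target that takes the working directory as a parameter (the text above uses Path.cwd() directly). It covers the slug checks, the relative-path requirement, and the resolved-path containment check. The "one level below its parent directory" rule is not modeled here.

```python
from pathlib import Path
from typing import Optional


def check_cleanup_target(docs_path: Optional[str], cwd: Path) -> Path:
    """Return the resolved delete target, or raise ValueError to signal ABORT."""
    if docs_path is None or not docs_path.strip():
        raise ValueError("docs_path is empty")
    if docs_path.startswith("/"):
        raise ValueError("docs_path must be relative")
    # Slug = last path segment before the trailing slash.
    slug = docs_path.rstrip("/").rsplit("/", 1)[-1]
    if not slug.strip():
        raise ValueError("slug is empty")
    if slug in ("memory", ".") or ".." in slug:
        raise ValueError(f"slug {slug!r} is not allowed")
    root = cwd.resolve()
    target = (root / docs_path).resolve()  # follows symlinks
    try:
        target.relative_to(root)  # raises if target is outside root
    except ValueError:
        raise ValueError(f"{target} escapes {root}") from None
    return target
```

A caller would print the returned path before deleting anything, matching the "Display before delete" rule.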

Full output base cleanup (only on explicit user request):

List all subdirectories with file counts.
If docs/harness/memory/ exists, warn separately: "Contains team knowledge from /memory skill."
Warn: "docs/ is git-ignored — all artifacts permanently deleted."
Confirm via AskUserQuestion (yes/no).

If has_git == true:

Ask via AskUserQuestion (in user_lang):

header: "Commit"
question: "Implementation complete. Choose how to finish:"
options:
- "Commit code only (Recommended)" / "Clean artifacts, commit code changes only"
- "Commit all" / "Commit everything including artifacts"
- "No commit" / "Clean .harness/ only, keep changes in working tree"

Actions (apply Safety Guard before each delete):

"Commit code only": (NEW in 8.4 — protect persisted spec artifacts) Apply this exact 5-step sequence:
1. (M8) Safety Guard validation on {docs_path} — apply the full Artifact Cleanup Safety Guard above (slug check + path depth + Path.cwd() containment) BEFORE any staging or deletion. If validation fails, ABORT: do NOT stage, do NOT delete. Surface the failed check to the user. Both .harness/ and {docs_path} remain intact for manual recovery.
2. Stage spec-persistence files for commit (only if the source file exists — silently skip missing files):
  - {docs_path}qa_notes.md
  - {docs_path}critic_findings.md
  - {docs_path}conventions.md
  (s4) Per-file staging failure handling: if git add <file> fails for a specific file (permission, .gitignore conflict, etc.), warn the user (in user_lang): "Failed to stage <file>: . Spec artifact may not be in git history." Continue with remaining files — do NOT abort the whole sequence on a single staging failure. The code commit (step 5) is more critical than any individual artifact preservation.
3. Delete .harness/ (the Safety Guard already validated the parent context).
4. Delete {docs_path} working-directory contents.
5. Commit code changes plus the staged spec artifacts.
(m2) git index vs working directory note: between step 2 (stage) and step 4 (delete working dir), the spec artifact files are staged in the git index and then removed from the working directory. This is correct behavior: git add snapshots file content into the index at stage time, and a later working-directory rm changes only the working tree, not the staged content. The final commit (step 5) therefore includes the 3 staged artifacts as additions even though they no longer exist on disk; they remain recoverable from git history.
"Commit all": delete .harness/, stage + commit {docs_path} + code
"No commit": delete .harness/ only

If has_git == false:

Inform user artifacts are in {docs_path}. Delete .harness/ only. No git operations.

Model Selection

Preset	executor	advisor	evaluator	verifier
default	(parent)	(parent)	(parent)	haiku (default)
all-opus	opus	opus	opus	haiku (default)
balanced	sonnet	opus	opus	haiku (default)
economy	haiku	sonnet	sonnet	haiku (default)

Planner Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Architect	advisor	(no override)	opus	opus	sonnet
Senior Developer	advisor	(no override)	opus	opus	sonnet
QA Specialist	advisor	(no override)	opus	opus	sonnet

Generator Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Lead Developer	executor	(no override)	opus	sonnet	haiku
Code Quality Advisor	advisor	(no override)	opus	opus	sonnet
Test & Stability Advisor	advisor	(no override)	opus	opus	sonnet
Combined Advisor	advisor	(no override)	opus	opus	sonnet
Generator (single)	executor	(no override)	opus	sonnet	haiku

Evaluator Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Evaluator	evaluator	(no override)	opus	opus	sonnet

Verify Phase Sub-agents

Sub-agent	Role	default	all-opus	balanced	economy
Verify (Layer 1)	verifier	haiku (default)	haiku (default)	haiku (default)	haiku (default)

Verifier defaults to haiku across all presets. Layer 1 only executes commands and parses exit codes — lowest-cost model is always sufficient. Override with --verifier-model sonnet|opus for sensitive mechanical verification (e.g., concurrency, complex test failures). Opt-in only. When set to sonnet or opus, a cost warning is shown in Setup Summary.
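
The preset table and the verifier rule can be expressed as a lookup. The sketch below is illustrative only: PRESETS and resolve_model are hypothetical names, None stands for "inherit the parent model", and the Auto-fix Proposer upgrade to opus under the default preset is not modeled.

```python
from typing import Optional

# None means "(parent)": no override, the sub-agent inherits the parent model.
PRESETS = {
    "default": {"executor": None, "advisor": None, "evaluator": None},
    "all-opus": {"executor": "opus", "advisor": "opus", "evaluator": "opus"},
    "balanced": {"executor": "sonnet", "advisor": "opus", "evaluator": "opus"},
    "economy": {"executor": "haiku", "advisor": "sonnet", "evaluator": "sonnet"},
}


def resolve_model(
    preset: str, role: str, verifier_model: Optional[str] = None
) -> Optional[str]:
    """Return the model for a role under a preset."""
    if role == "verifier":
        # Verifier ignores the preset; haiku unless explicitly overridden.
        return verifier_model or "haiku"
    return PRESETS[preset][role]
```

For example, resolve_model("balanced", "executor") gives "sonnet", while the verifier stays on "haiku" for every preset unless --verifier-model is set.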

User Interaction Rules

All user-facing questions MUST use AskUserQuestion tool when available.

If AskUserQuestion available → use it
If not available or fails → present as text with numbered options
Every option: label (short) + description (specific)
"Other" (free text) is automatically appended
Translate all text to user_lang

Architecture Principles

The following 5 principles are invariant constraints for the harness Orchestrator.

Orchestrator reads no intermediate files. Exceptions:
- spec.md at plan gate
- qa_report.md at verdict gate
- verify_report.md path for user message
- verify_report.md failing-file extraction for Auto-fix Proposer dispatch: Orchestrator reads verify_report.md to extract failing file paths only (no content analysis). Extracted paths pass through Path Validator (kind=file_reference) and are capped at 5. See §Step 5 — Auto-fix dispatch for the exact procedure.
- Pre-apply read of the --- a/ and +++ b/ diff header lines (4 lines of metadata only; hunk body is delegated to the Edit tool). This is NOT a violation of this principle.
Auto-fix Proposer is the only sub-agent that directly Reads source files. Other sub-agents receive content only through template variables.
Paths only to sub-agents; never file contents (ephemeral critiques/proposals at synthesis step excepted).
Session-wide invariants (see §State Machine — Auto-fix State Transition Table):
- Auto-fix: at most 1 attempt per session (verify.autofix_attempted once-only — not reset on round increment).
- Layer 1 retries: max 3. Do NOT reset after Auto-fix Apply.
All external paths pass through Path Validator before use (see §Path Validator below).

Path Validator

validate_path(path, kind) where kind ∈ {output_dir, file_reference, diff_target}

  0. (kind == output_dir only) Empty string → halt "output-dir cannot be empty."
  1. Normalize: \ → / (OS-independent). UNC (\\server\share or //server/share) → halt.
  2. Absolute path: ^/ or ^[A-Za-z]:/ → halt.
  3. Segment-level ..: path.split("/") — any segment == ".." → halt (exact segment match, not substring).
  4. kind-specific:
     - output_dir:
         First segment (path.split("/")[0]) ∉ {memory, spec, planner, generator,
         evaluator, verify, harness, .harness}.
         Special case (NEW in 8.4): if first segment == "docs", second segment
         MUST == "harness" (path startswith "docs/harness/"). Else halt:
         "output-dir under docs/ must be docs/harness/..." (allows /spec handoff
         path docs/harness/<slug>/ while still blocking other docs/* overrides.)
     - file_reference (failing_files_list):
         (a) relative path, (b) no .. segment, (c) inside repo_path,
         (d) outside .harness/, docs/harness/*, memory/.
     - diff_target (unified diff --- a/ / +++ b/ headers):
         file_reference conditions + inside scope filter +
         outside .harness/, docs/harness/, memory/, .git/.
  5. On failure: return specific halt message describing the violation.

Attack vector → Path Validator step mapping:

Attack vector	Blocked at step
`--output-dir .harness`	Step 4 (kind=output_dir, first segment reserved)
`--output-dir docs/../../etc`	Step 3 (segment `..` rejection)
`--output-dir \\server\share`	Step 1 (normalization + UNC rejection)
`--output-dir /absolute/path`	Step 2 (absolute path rejection)
`--output-dir memory/foo`	Step 4 (first segment reserved)
`--output-dir` (empty)	Step 0 (empty string, kind=output_dir)
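
The validator steps and the attack-vector table can be exercised with a short sketch. It is an approximation under assumptions: PathRejected is a hypothetical exception standing in for "halt", the output_dir branch follows steps 0 to 4 as written, and the file_reference and diff_target branches only apply the excluded-prefix checks (repo containment and the scope filter are left out).

```python
import re

RESERVED_FIRST = {
    "memory", "spec", "planner", "generator",
    "evaluator", "verify", "harness", ".harness",
}


class PathRejected(ValueError):
    """Hypothetical stand-in for the validator's halt."""


def validate_path(path: str, kind: str) -> str:
    if kind == "output_dir" and path == "":  # step 0
        raise PathRejected("output-dir cannot be empty.")
    norm = path.replace("\\", "/")  # step 1: normalize separators
    if norm.startswith("//"):
        raise PathRejected("UNC paths are not allowed.")
    if norm.startswith("/") or re.match(r"^[A-Za-z]:/", norm):  # step 2
        raise PathRejected("absolute paths are not allowed.")
    segments = norm.split("/")
    if ".." in segments:  # step 3: exact segment match, not substring
        raise PathRejected("'..' segments are not allowed.")
    if kind == "output_dir":  # step 4
        if segments[0] in RESERVED_FIRST:
            raise PathRejected(f"first segment '{segments[0]}' is reserved.")
        if segments[0] == "docs" and (len(segments) < 2 or segments[1] != "harness"):
            raise PathRejected("output-dir under docs/ must be docs/harness/...")
    elif kind in ("file_reference", "diff_target"):
        blocked = [".harness/", "docs/harness/", "memory/"]
        if kind == "diff_target":
            blocked.append(".git/")
        if any(norm.startswith(prefix) for prefix in blocked):
            raise PathRejected(f"'{norm}' is inside a protected directory.")
    else:
        raise ValueError(f"unknown kind: {kind}")
    return norm
```

Each row of the table maps to one raise: for instance, validate_path("docs/../../etc", "output_dir") stops at the segment check, and validate_path("memory/foo", "output_dir") stops at the reserved first segment.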

Key Rules

Never skip phases. Always Plan → Generate → Verify → Evaluate.
Confirmation gates are non-negotiable. No implicit approval.
Stay within scope. Do not modify files outside scope.
Evaluator must be isolated. Anchor-free input. Never pass Generator reasoning.
Planner proposals must be independent. Never share one persona's work with another during proposal.
Generator advisors review the plan, not code. Advisory before implementation.
Use available skills. Search by keyword, not plugin name. Proceed without if none found.
User language. All user-facing output in user_lang.
Intermediate outputs are ephemeral. Only final artifacts preserved in docs/.
Orchestrator reads no intermediate files. See §Architecture Principles for full exception list.
1-line return parsing. Only first line of sub-agent return is used for state decisions.