تشغيل أي مهارة في Manus بنقرة واحدة

plan-iterate

Name: Plan Iterate
Author: grahama1970

Evidence-gated phase iteration for implementation plans. Use when a task must proceed phase by phase with deterministic artifacts, validation logs, named external reviewer verdicts, default scillm GPT-5.5 high review, optional reviewer comparison, blocker tracking, and fail-closed acceptance before advancing; especially for security, correctness, deployment, or report-hardening work.

تشغيل في Manus

بيانات المهارة

النجوم٣

التفرعات١

آخر تحديث٢٧ مايو ٢٠٢٦ في ١٥:٠٦

مستكشف الملفات

12 ملفات

SKILL.md

readonly

المزيد من هذا المستودع

نفس المستودع

surf

grahama1970/agent-skills

Unified browser automation for AI agents. Uses surf-cli extension when available (full features), falls back to CDP (zero-config). Navigate, read with element refs, click, type, screenshot.

2026-06-033

scillm

grahama1970/agent-skills

Universal LLM proxy on localhost:4001. Surfaces: chat/batch completions, scillm exec, OpenCode serve (coding delegate), OpenCode transport (DAG/SSE), standing Codex agents. Chutes, Gemini, Claude/Codex OAuth, OpenCode Go, Ollama. Auto-routes by model name. ZIP/PDF, JSON repair, batch pools.

2026-06-013

ask

grahama1970/agent-skills

Zero cognitive-load learning and querying skill. Learn about a topic or persona (e.g., "Lisa Feldman Barrett") by discovering, ingesting, and extracting knowledge — or ask questions against what's been learned. Supports multi-hour deep learning with progress tracking, persona profiles, and nightly incremental updates. Uses Federated Taxonomy for multi-hop graph traversal across knowledge domains. Composes: dogpile, discover-books, ingest-youtube, fetcher, extractor, memory, taxonomy, task-monitor.

2026-05-303

ccopy

grahama1970/agent-skills

Copy the last complete Cursor user/assistant turn to the clipboard (Codex-style /copy). On modern Cursor Agent installs reads ~/.cursor/projects/*/agent-transcripts/*.jsonl when SQLite bubbleId rows are absent. Use for ccopy, cursor copy, or export last Cursor turn.

2026-05-293

phart-dag-chart

grahama1970/agent-skills

Validate ask/scillm DAG JSON and render PHART 1.5 ASCII decision-tree charts for terminals and dry-run output. DAG.json in → chart on stdout or actionable errors on stderr (no tracebacks). Python 3.14+ with PHART from github.com/scottvr/phart.

2026-05-273

best-practices-skills

grahama1970/agent-skills

Best practices for designing and structuring agent skills: SKILL.md frontmatter rules, triggers, progressive disclosure, and when to use scripts vs references.

2026-05-273

المصدر

grahama1970

grahama1970/agent-skills

فتح مستودع GitHub عرض مستودعات المنشئ

Install

Download

تشغيل في Manus

مفيد لـSOC

محللو ضمان جودة البرمجيات والمختبرونمهن الحاسوب والرياضيات15-1253L4

تشغيل أي مهارة بنقرة واحدة

name	plan-iterate
description	Evidence-gated phase iteration for implementation plans. Use when a task must proceed phase by phase with deterministic artifacts, validation logs, named external reviewer verdicts, default scillm GPT-5.5 high review, optional reviewer comparison, blocker tracking, and fail-closed acceptance before advancing; especially for security, correctness, deployment, or report-hardening work.

Plan Iterate

Use this skill to control implementation phases:

define phase -> implement -> validate locally -> package evidence -> external review -> patch blockers -> accept -> advance next phase or stop blocked

It does not replace /plan, /review-plan, /orchestrate, $hack, or review-extraction. It verifies whether an implemented phase satisfied its contract with evidence.

Parent Role

$plan-iterate is the parent phase controller. Domain review skills run inside the phase; they do not replace the phase ledger or acceptance gate.

Deterministic Outer Loop Contract

$plan-iterate must be implemented as a state machine, not as an instruction to a project agent to "keep going." A project-agent message is never a terminal condition. Only the $plan-iterate controller can make the phase terminal, and it may emit PASS/accepted only when both the stored aggregation gate and the deterministic validation/evidence gates pass.

Canonical control flow:

while overall goals remain unmet:
  select next runtime-ready phase from the plan graph
  if no runtime-ready phase exists but an agent-actionable blocked/failing phase exists: resume that phase
  if no runtime-ready or agent-actionable phase exists: stop only with proven BLOCKED or HUMAN_REQUIRED
  while iteration < externally_configured_max_iterations:
    project agent applies exactly one patch iteration
    project agent writes project-agent-result.json
    deterministic validators run and write logs/artifacts
    relevant $review-* skills build read-only bundles/artifacts
    $scillm gpt-5.5 high aggregates those bundles into review_result.json
    controller reads review_result.json
    if PASS: mark phase complete/accepted when deterministic evidence also passes, then immediately advance to the next runtime-ready phase
    if NEEDS_CHANGES: create the next patch iteration plan and continue
    if BLOCKED or INSUFFICIENT_EVIDENCE: stop and ask/interview human
    if max iterations reached: stop and ask/interview human

Cross-Phase Continuation Contract

When the human has supplied clear overall goals and a plan graph or active phase sequence exists, $plan-iterate continues across phase boundaries until one of the explicit stop states occurs. accepted is terminal only for the current phase. It is not permission for the project agent to yield, summarize, or wait for another prompt when the next phase is runtime-ready.

After a phase reaches accepted, the controller must inspect the active plan graph and do exactly one of the following:

start the next runtime-ready phase in the same run/session;
mark all overall goals complete only when there is no remaining phase or unmet acceptance contract item, and project knowledge does not name remaining executable work; or
stop with BLOCKED, MAX_ITERATIONS_REACHED, or HUMAN_REQUIRED and write the exact missing runtime fields, repeated blocker, exhausted iteration limit, or human decision needed to proceed.

If no later phase exists but PROJECT_KNOWLEDGE.md names a current blocker, unmet goal, or Next agent-executable candidate, the controller must not emit OVERALL_COMPLETE. For agent-executable remaining work it emits decision=PROJECT_GOALS_REMAIN, should_continue=true, and next_action.type=create_next_phase_from_project_knowledge. For unmet work that lacks an agent-executable next candidate, it stops with HUMAN_REQUIRED.

Do not ask the human to say "proceed" between accepted phases when the next phase has clear goals, runtime readiness, and no unresolved blocker. A handoff sentence such as "Next phase is ..." is not a controller terminal state and must not replace actually starting the next phase. The externally configured maximum iteration count limits patch attempts inside a phase; it does not convert a successful phase acceptance into a whole-plan stop.

Non-Ambiguous Continuation And Stop Taxonomy

$plan-iterate decisions are mechanical. The project agent must never infer a stop from phase prose, a ledger label, or an accepted current phase.

Controller decision meanings:

CONTINUE                The agent must keep working. No final response is allowed.
PROJECT_GOALS_REMAIN    The agent must create/resume a phase from project knowledge.
OVERALL_COMPLETE        A final response is allowed only after guard-final also permits it.
BLOCKED                 A final/blocking response is allowed only with repeated escalated stop evidence.
MAX_ITERATIONS_REACHED  A final/blocking response is allowed only when the configured iteration cap is exhausted.
HUMAN_REQUIRED          A final/blocking response is allowed only for a named human-only dependency.

HUMAN_REQUIRED must not be used for:

a failed validator with an obvious diagnostic;
a missing artifact the agent can regenerate;
a reviewer needs_changes finding;
external_review_blocked without repeated escalated blocker evidence;
a later phase that is blocked/failing but still agent-actionable;
missing runtime fields when the agent can create or supersede the phase from the known product goal.

external_review_blocked and local_validation_failed are recovery states by default. They become hard stops only when the phase ledger proves all of the following:

the active blocker repeated across at least two focused loops;
the blocker is marked escalated:true;
$dogpile or a recorded not-applicable rationale has been evaluated when outside research could plausibly help;
the stop record names the exact credential, admin action, policy decision, unavailable client/device, or product-scope decision that the agent cannot discover or safely assume.

After an accepted phase, the controller must check, in order:

later runtime-ready phases;
later agent-actionable local_validation_failed or external_review_blocked phases;
PROJECT_KNOWLEDGE.md for Next agent-executable candidate or equivalent remaining work;
non-actionable repeated/escalated blockers;
only then OVERALL_COMPLETE.

Any final response that claims completion, readiness, blocked status, or human need before this ordering has been evaluated is invalid.

Required project-agent result shape:

{
  "iteration": 2,
  "status": "patch_applied | blocked | needs_human",
  "files_changed": [],
  "commands_run": [],
  "remaining_blockers": [],
  "requires_human": false
}

Required controller terminal states:

PASS
BLOCKED
MAX_ITERATIONS_REACHED
HUMAN_REQUIRED

done, looks good, ready, empty output, stopped output, or any other project-agent prose must be ignored as closure evidence. If the project agent stops without a gate terminal state, the controller records a non-status event or blocker reason. This record is not a PHASE_STATUS.status value:

{
  "event_state": "needs_attention",
  "phase_status": "external_review_blocked",
  "reason": "project_agent_stopped_without_terminal_verdict",
  "safe_default": "do_not_mark_complete",
  "resume_hint": "Resume from review_result.json blockers."
}

The maximum iteration count must be supplied by the caller/plan graph. Do not hardcode a max iteration value inside prompts or project-agent instructions.

Before any final response or handoff summary, run the continuation guard against the active repo ledger:

/home/graham/workspace/experiments/agent-skills/skills/plan-iterate/run.sh guard-final --repo-root . --message-file /tmp/final-message.txt

The guard emits plan_iterate.continuation_guard.v1. If block_final=true, do not send the final response. If should_continue=true, no final response is allowed at all: continue execution and use only non-final progress updates while tools are still running. Do not downgrade to BLOCKED, MAX_ITERATIONS_REACHED, or HUMAN_REQUIRED. Those stop labels are allowed only when the guard also says should_continue=false and the message names the exact non-agent dependency. A Codex Stop hook calls the same guard, so this is a required preflight, not optional documentation.

guard-final must also block final responses when all phase ledgers are terminal but PROJECT_KNOWLEDGE.md names agent-executable remaining work. In that case the only valid action is to create or resume the next phase; a summary may be emitted only as a non-final progress update.

Fixable Work Is Not A Stop

Do not treat local_validation_failed, external_review_blocked, a failed canary, a network diagnostic failure, a missing artifact, or a reviewer needs_changes finding as permission to yield. Those are phase ledger states, not proof that the project agent is blocked.

A project agent may stop only when the controller state is one of the explicit hard stops and the stop record proves why the agent cannot take the next action:

HUMAN_REQUIRED: a credential, policy choice, external account/admin action, unavailable client device, or product-scope decision cannot be discovered or safely assumed.
MAX_ITERATIONS_REACHED: the externally configured phase iteration limit is exhausted and the current unresolved issues are recorded.
BLOCKED: the same blocker survived the required focused implementation or validation loops, the $dogpile escalation was evaluated when applicable, and the phase ledger records the repeated blocker, occurrence count, escalation evidence, and exact human decision or external dependency.

If the blocker is first-pass, has an obvious next diagnostic, has an alternate implementation path, or can be worked around without violating the product contract, it is agent-actionable. Continue debugging, patch the implementation, or create/supersede a phase that proves the same product goal through the fastest acceptable path. For example, if a Tailscale/Funnel route is failing but the product goal is "client can use Sparta Chat within five days", the agent must either continue Tailscale diagnostics or record a superseding client-access phase for another authenticated route; it must not stop merely because the original route failed.

Use this routing by default:

Phase Surface	Domain Loop / Reviewer	Deterministic Evidence
UI / UX / visual design	`$review-design`	`$test-interactions`, `$surf` screenshots/CDP
Code implementation	`$review-code`	tests, typecheck, lint, build, runtime smoke
Prompt contract	`$review-prompt`	validators, expected-response fixtures, consumer smoke
Security / extraction / compliance	domain-specific review skill	raw proof artifacts and domain validators

For design phases, $review-design should own the design reviewer loop, and $test-interactions should run inside that loop as the deterministic live-DOM verifier and focused screenshot source. $plan-iterate records the resulting validation logs, screenshots, reviewer receipts, blocker ledger, and acceptance decision.

For $review-design iterate, $review-code loops, and $review-prompt loops, the main project agent must treat reviewer calls as read-only unless the human explicitly authorizes an isolated implementation worker with a clear write scope. The reviewer batch may write review artifacts and suggested changes, but it must not edit production code, mutate the implementation, or mark the phase accepted. $plan-iterate records each loop as domain_review_loops[] with mutates_production=false, the reviewer persona, immutable goal, context artifact, relevant best-practices-* skill inputs, discrete end_state, pointers to state.json, events.jsonl, aggregate_verdict.json, and a human-facing matrix.

Each domain review iteration must also create or update three project-agent plan artifacts:

implementation/patch plan: what the main project agent will change or intentionally decline to change;
validation/evidence plan: deterministic commands, screenshots, fixtures, validators, or smoke checks that will prove the next state;
review/escalation plan: which read-only reviewer loop runs next, what it will inspect, and when $dogpile, $ask, WebGPT, or human clarification is required.

These are not three separate phase plans. They are per-round control artifacts inside the current $plan-iterate phase so a reviewer loop cannot silently become an implementation owner.

For code phases, $review-code may run a code reviewer loop, but tests, typecheck, lint, build, runtime smoke, and any required rendered verification still decide whether implementation evidence is admissible. Reviewer patches or diffs are suggestions to the main project agent, not repository ownership.

For prompt phases, standalone $review-prompt may own a coded prompt-improvement loop over an explicit workspace or candidate artifacts. When embedded inside $plan-iterate, $review-prompt is read-only against production scope and may mutate only candidate artifacts or a temporary workspace. If the human authorizes a mutating worker, record that as a separate project-agent patch iteration or implementation-worker artifact, not as a domain_review_loops[] reviewer entry. The main project agent applies any prompt, schema, fixture, validator, or consumer-smoke changes to production files. $plan-iterate records the terminal audit and blocks acceptance when deterministic prompt gates fail. Model findings cannot override validator, fixture, expected-response, schema, or consumer-smoke failures.

Core Rules

Do not mark a phase done from prose, status text, or absence of errors.
Keep raw artifacts, validation logs, and reviewer responses in the phase directory.
Treat scillm/WebGPT/external review as a bounded checkpoint, not an outsourced reasoning loop.
Give $scillm bounded, replayable progress context on repeated reviews; do not rely on hidden CLI transcript state.
Name the adjudicator for every review result: webgpt, scillm, human, or deterministic_verifier.
A reviewer verdict is a receipt, not closure. Only the $plan-iterate controller may close a phase, and only after both the stored aggregation gate and deterministic validation/evidence gates pass.
Domain reviewer/subagent calls must be read-only: they may critique and suggest changes to the main project agent, but the main project agent owns all file edits, Memory-significant mutations, deterministic validation, and phase state.
Domain review loops must leave a machine-readable and human-readable trace: state.json, events.jsonl, aggregate_verdict.json, a context artifact, a relevant best-practices-* skill list, three per-iteration plan artifacts, and a matrix suited to the surface: screenshot-to-finding for design, file/diff-to-finding for code, or prompt-fixture/gate-to-finding for prompts. Every loop event must end in a discrete state such as verified, needs_patch, blocked, failed, or skipped.
Canonical domain_review_loops[].end_state values are verified, needs_patch, blocked, failed, and skipped. Domain skills may keep local states, but must map them into this enum before phase aggregation: satisfied -> verified, needs_changes -> needs_patch, insufficient_evidence -> blocked, halted with blockers -> blocked, waiting_review/running -> failed, and non-applicable skip artifacts with verified applicability -> skipped.
Do not call heuristic classification, label copy, or report generation a review unless a named reviewer actually reviewed replayable evidence.
Use reviewer comparison when the phase crosses a trust boundary or prior solo iteration produced a false green.
Escalate before closure when raw evidence disagrees with report claims, a blocker repeats twice, or the human disproves a green claim.
When a blocker repeats after two focused implementation/review loops, run $dogpile for source-derived outside insight before another solo patch attempt, unless the blocker is provably local-only and external research has no plausible value. Record the dogpile query, report path, useful findings, and any "not applicable" rationale in phase progress context.
If the same blocker repeats after applying or evaluating $dogpile findings, stop execution instead of continuing solo iteration. Set the phase to external_review_blocked or local_validation_failed as appropriate, write the repeated blocker and dogpile summary into phase progress context, and ask the human concise clarifying questions tied to the unresolved decision or missing evidence.
Run canaries before batching: one positive, one negative, one ambiguous/insufficient-evidence, one expected failure, and one regression fixture.
For security-sensitive work, require raw proof artifacts, command logs, manifest hashes, and post-patch verification before accepting closure claims.

Phase States

Use only these states:

planned
implementing
local_validation_failed
ready_for_review
external_review_blocked
external_review_passed
accepted
superseded
abandoned

A phase is complete only when its status is accepted.

CLI

Run commands from the repository root that owns the phase work:

skills/plan-iterate/run.sh init --phase phase-01-report-cleanup
skills/plan-iterate/run.sh record-skill-context --phase phase-01-report-cleanup --context /tmp/headless-skill-context.md
skills/plan-iterate/run.sh record-context --phase phase-01-report-cleanup --context /tmp/phase-context.md
skills/plan-iterate/run.sh record-plan-graph --phase phase-01-report-cleanup --graph /tmp/phase-plan.dag.json
skills/plan-iterate/run.sh inspect-plan-graph --phase phase-01-report-cleanup --graph /tmp/phase-plan.dag.json
skills/plan-iterate/run.sh continue --phase phase-01-report-cleanup
skills/plan-iterate/run.sh package --phase phase-01-report-cleanup --output /tmp/phase-01-review.zip
skills/plan-iterate/run.sh record-review --phase phase-01-report-cleanup --verdict blocked --review scillm-gpt55-review.md
skills/plan-iterate/run.sh status

Use --root DIR when the phase state should live outside the current repository. The default root is .plan-iterate.

continue is the controller-owned cross-phase decision command. It emits a machine-readable plan_iterate.continue_decision.v1 JSON object with decision, controller_terminal_state, stop, should_continue, active_phase, and next_action. After an accepted phase, it refuses to treat the plan as yielded when a later runtime-ready phase exists or when PROJECT_KNOWLEDGE.md names an agent-executable next candidate. Later runtime-ready phases return decision=CONTINUE and name the next phase. Project-knowledge continuations return decision=PROJECT_GOALS_REMAIN and next_action.type=create_next_phase_from_project_knowledge. It returns a stopping decision only for explicit controller states such as BLOCKED, HUMAN_REQUIRED, or MAX_ITERATIONS_REACHED. Project agents should invoke $interview only when continue returns a stopping decision that requires human input or the agent is genuinely blocked/confused.

Run the medium-complexity local sanity before relying on the skill after edits:

skills/plan-iterate/sanity.sh

The sanity creates a temporary git repo, proves missing skill context fails closed, records headless skill context, records progress context, packages a review ZIP, records two distinct passing reviewer receipts against the same bundle, and closes reviewer comparison.

Phase Status Contract

Each phase owns:

.plan-iterate/<phase-id>/
  PHASE_STATUS.json
  PHASE_REVIEW_REQUEST.md
  reviews/

PHASE_STATUS.json must use schema plan_iterate.phase_status.v1 and include:

{
  "schema": "plan_iterate.phase_status.v1",
  "phase_id": "phase-03-evolution-decision-log",
  "plan_id": "",
  "status": "ready_for_review",
  "implementation_summary": "...",
  "acceptance_contract": [],
  "changed_files": [],
  "validation_commands": [],
  "evidence_artifacts": [],
  "progress_context_artifacts": [],
  "skill_context_artifacts": [],
  "plan_graph_artifacts": [],
  "active_plan_graph_artifact": "",
  "domain_review_loops": [],
  "review_artifacts": [],
  "known_caveats": [],
  "claims": [],
  "blockers": [],
  "memory_context": {
    "collection": "plan_iterate_phase_context",
    "keys": []
  },
  "reviewer_policy": {
    "required": true,
    "comparison_required": false,
    "closure_rule": "deterministic_validation_and_external_review"
  },
  "review_results": [],
  "review_comparison": {
    "agreement": "pending",
    "closure_allowed": false,
    "reason": ""
  },
  "review_status": "pending"
}

Validation command entries must include command, exit_code, and optional log. Claims must cite evidence artifacts. Blockers with two or more occurrences must be marked escalated: true. Project agents may create a DAG JSON and record it with record-plan-graph. The graph is the replayable plan input; PHASE_STATUS.json is the ledger. A recorded graph must include exec_graph_version, graph_id, graph_goal, positive integer self_improvement_iterations, optional review_iteration_limits with integer review_code, review_design, and review_prompt maxima, review_fanout_limits with integer review_code, review_design, and review_prompt maxima, and a non-empty nodes[] list. Newly recorded graphs must also have a plan_id. If the source graph omits plan_id, record-plan-graph generates one from graph_id, writes it into the stored graph copy, records it in PHASE_STATUS.plan_id, and records it on the matching plan_graph_artifacts[] entry. Existing older ledgers without plan_id remain readable history and should not be invalidated solely because they predate this field. When review_iteration_limits is omitted, each domain inherits self_improvement_iterations. Review fanout nodes must include review_scopes[]; each scope must name scope, model, agent, contract, review_level, and proof_level. active_plan_graph_artifact must point at an entry in plan_graph_artifacts. For cross-phase traceability, record-plan-graph also maintains a plan index:

.plan-iterate/plans/<plan-id>/
  PLAN_STATUS.json
  phases.json

PLAN_STATUS.json and phases.json map the durable plan_id to every phase ledger that has recorded a graph for that plan. The phase ledger remains the source of truth for phase acceptance; the plan index answers which phases belong to the same overall plan and which phase was most recently updated. Recording a graph also writes a sibling runtime_readiness_artifact with schema plan_iterate.graph_runtime_readiness.v1. The readiness artifact is the human/project-agent surface for missing execution fields. It lists every node, the adapter that would be used, required fields, present fields, missing fields, inferred fields, and whether the graph can be compiled to runtime execution. inspect-plan-graph prints the same report without recording it.

Recording a graph must also write a sibling plan_ascii_artifact, normally plan-graphs/<timestamp>-<graph-id>-plan-ascii.md. This artifact is the always-available human-readable execution map. It must show:

the graph id, graph goal, active phase ledger status, and legend;
every plan node in dependency order;
which nodes are completed, active, pending, blocked, manual, or runtime-ready;
which groups are sequential and which same-depth dependency lanes may run concurrently;
missing runtime fields and the next action for nodes that are blocked or manual.

The ASCII artifact is informational, not closure evidence. It may mark a node completed only from explicit node completion fields, an accepted matching phase ledger, or another deterministic status source recorded in the phase. Unknown or unproven work stays pending/manual/blocked; do not infer completion from intent, proximity, or reviewer prose.

Semantic node execution readiness is fail-closed:

review-code needs review context, files, and review_scopes[].scope, model, agent, contract, review_level, and proof_level. Files may be inferred from PHASE_STATUS.changed_files; context is never inferred.
review-design needs a persona plus screenshots or a test-interactions manifest.
test-interactions needs a manifest, or a URL plus persona, and an output directory.
review-prompt needs the prompt/template, concrete fixture/context, expected response, validator or smoke command, and consumer/schema.
project-agent nodes are manual until they include an explicit command, handoff, or task plus acceptance evidence.
Runtime scillm.exec nodes (local_command, scillm_call, scillm_batch, codex_exec, claude_print, deterministic render/verifier) must include the command, model, prompt, items, or manifest required by their runtime type.

domain_review_loops entries must include skill, persona, immutable_goal, context_artifact, non-empty best_practice_skills, state_artifact, events_artifact, aggregate_artifact, matrix_artifact, iteration_plans, end_state, and mutates_production:false. best_practice_skills entries must name best-practices-* skills that were available to the reviewer loop. matrix_artifact is the human-facing ledger for the domain loop: screenshots/findings for design, files/diffs/findings for code, or prompt fixtures/gates/findings for prompt contracts.

Each iteration_plans[] entry must include a positive round and exactly these three phase-relative artifacts:

implementation_plan_artifact
validation_plan_artifact
review_plan_artifact

The referenced context, state, event, aggregate, matrix, and iteration-plan artifacts must be listed in evidence_artifacts, review_artifacts, or progress_context_artifacts. external_review_passed and accepted cannot include unresolved loop end states: needs_patch, blocked, or failed. review_results must cite stored review_artifacts with response/request/bundle/receipt hashes, phase_subject_sha256, skill_context_sha256, invocation metadata, and recorded_at. For repeated reviews, review_results after the first must include progress_context_sha256, and the phase must include progress_context_artifacts or Arango-backed memory_context.keys. Normalized review_results[].verdict values are pass, needs_changes, blocked, insufficient_evidence, conditional_pass, malformed, and no_verdict. accepted fails closed on any unresolved non-pass review result or stale-subject pass. Every prior non-pass or stale-subject result must have resolution_status before the controller may emit accepted. When a repeated review passes after a prior blocked, needs_changes, or insufficient_evidence, conditional_pass, malformed, or no_verdict result, keep the earlier result as audit history and mark it with resolution_status: "resolved_by_later_pass" plus the resolving reviewer and timestamp. Do not delete or rewrite the earlier reviewer artifact. When a repeated review passes after an earlier PASS that was bound to an older phase subject, keep the earlier PASS as audit history and mark it with resolution_status: "superseded_by_later_pass" plus the resolving reviewer and timestamp. The latest unsuperseded PASS is the closure evidence. Implementation claims may cite only deterministic evidence_artifacts, not reviewer receipts. accepted always requires hashed changed file bytes, hashed deterministic evidence artifacts, hashed validation logs bound to the current phase subject, at least one claim covering the acceptance contract, and passing deterministic validation/evidence gates. If reviewer_policy.required=false, only external review/comparison is waived; deterministic validation is never waived. For review-gated $plan-iterate phases, accepted additionally requires review_comparison.closure_allowed=true and a current stored aggregation gate PASS. phase_subject_sha256 binds the acceptance contract, changed file paths and hashes, deterministic evidence, validation commands, claims, reviewer policy, known caveats, blockers, progress context references, skill context references, and memory context references.

Review Bundle

package creates a ZIP containing:

PHASE_STATUS.json
PHASE_REVIEW_REQUEST.md
changed-files.diff
manifest.json
validation-logs/
evidence-artifacts/
progress-context/
skill-context/
domain-review-loops/
plan-graphs/*-plan-ascii.md
reviews/

The package fails closed when claims lack artifacts, evidence files are missing, progress context files are missing, skill context files are missing, validation commands lack exit codes, accepted validation logs lack current hashes, review results lack named adjudicators/provenance, review artifact hashes do not match current bytes, repeated reviews lack progress context, accepted phases have unresolved non-pass review results or stale-subject passes, repeated blockers are un-escalated, reviewer comparison does not allow closure for review-gated phases, or accepted phases lack passing validation and deterministic evidence.

External Reviewer Loop

Use external reviewers only at phase checkpoints, canaries, or escalation triggers:

project agent implements phase
plan-iterate packages evidence
domain review skill bundles the phase completion for review
scillm/WebGPT/human reviews the domain bundle and phase evidence
project agent records verdict
project agent patches exact blockers
repeat until pass or stop

Preserve reviewer outputs as reviews/<timestamp>-<reviewer>-response.md through record-review.

Repeated Blocker Research Escalation

$plan-iterate should not keep cycling through the same local hypothesis when a blocker survives two focused patch/validation/review attempts. At that point, run $dogpile as the default research escalation before the next patch attempt.

Use $dogpile to search for:

upstream library issues, examples, and implementation patterns;
similar GitHub bugs, PRs, and code paths;
current documentation or migration notes;
known UI, browser, accessibility, or framework edge cases;
design precedents when the blocker is product/UX clarity rather than code.

The dogpile output is advisory, not closure. The project agent still owns the patch, deterministic validation, screenshots, tests, and reviewer loop. Store the dogpile report or partial-result artifacts in the phase as progress context and cite only deterministic evidence for acceptance claims.

If $dogpile returns an ambiguity handoff, ask the human or narrow the query from phase context before spending more review cycles. If provider lanes degrade, record the degraded provider status and use successful sources; do not treat a partial dogpile failure as total research failure.

If the blocker still repeats after $dogpile has been evaluated, stop the loop. Do not run another implementation pass from the same hypothesis. The stop record must include:

the blocker text and occurrence count;
the validation/review artifacts showing the repeated failure;
the dogpile query, report path, and relevant findings or degraded sources;
what was tried after dogpile;
the exact human clarification needed to proceed.

Ask at most three concise questions. Each question should map to a decision that cannot be resolved from code, deterministic evidence, reviewer receipts, or dogpile results.

Each phase completion review should use the domain-appropriate bundle:

$review-design bundle or loop artifacts for UI/UX phases. It must include fresh rendered screenshots and, when interaction matters, $test-interactions results plus focused/container screenshot evidence. For $review-design iterate, include the loop state.json, events.jsonl, aggregate_verdict.json, per-section verdicts, and the screenshot/review matrix, immutable goal, context artifact, relevant best-practice skills, and three per-round plans so the main project agent can evaluate the loop without reading raw model transcripts.
$review-code bundle for code phases. It must include the current scoped diff, relevant tests/checks, expected contracts, non-goals, known caveats, blockers, progress context, and headless skill context. For $review-code loops, include state.json, events.jsonl, aggregate_verdict.json, per-round reviewer verdicts, applied/rejected patch records, test results, immutable goal, context artifact, relevant best-practice skills, three per-round plans, and a file/diff-to-finding matrix. The project agent applies patches and deterministic tests/checks decide closure.
$review-prompt final audit bundle for prompt-contract phases. It must include prompt templates, concrete fixtures, expected responses, validators, smoke output, scoring/keep decisions, and any $ask/WebGPT final gate artifacts. For $review-prompt loops, include state.json, events.jsonl, aggregate_verdict.json or terminal audit.json, per-round model requests and responses, score/decision artifacts, validator/smoke outputs, and a prompt-fixture/gate-to-finding matrix, immutable goal, context artifact, relevant best-practice skills, and three per-round plans. Deterministic gates and keep/revert decisions decide whether a prompt candidate is admissible.

The default reviewer path for phase-level semantic review is the domain-appropriate bundle -> $scillm gpt-5.5 high reasoning. The reviewer may critique the phase; the project agent still owns code changes, prompt changes, UI changes, and deterministic validation.

For mixed phases, build all applicable domain review bundles concurrently:

$review-code bundle/artifacts   -> code findings
$review-design bundle/artifacts -> visual/interaction findings
$review-prompt final audit      -> prompt-contract findings or skipped_fail_closed with verdict not_applicable_verified

Then send the relevant bundle set plus deterministic evidence to $scillm gpt-5.5 with top-level reasoning_effort: "high" and no max_tokens. The $scillm response is the phase gate artifact and must answer:

{
  "verdict": "PASS | NEEDS_CHANGES | BLOCKED | INSUFFICIENT_EVIDENCE",
  "goals_met": false,
  "highest_severity": "critical | high | medium | low | none",
  "issues": [],
  "next_plan": [],
  "human_questions": []
}

Any unresolved medium, high, or critical issue means the controller must create the next project-agent patch plan or stop for human input; the phase cannot be accepted.

Default reviewer:

scillm:gpt-5.5-high: default external reviewer for targeted phase/code/contract review. Use $scillm with model: "gpt-5.5" and top-level reasoning_effort: "high", preferably streaming for long review bundles. It is replayable, scriptable, and exposes machine-readable reasoning proof.

Escalation and comparison reviewer use:

webgpt: optional escalation for strategy, taxonomy decisions, report clarity, cross-project process review, or an independent human-facing judgment check using the real $ask WebGPT route ($ask webgpt or --oracle-backend webgpt). $surf is only the browser transport owned by $ask; do not call it directly for review work, and do not make WebGPT the default when $scillm can review the same bundle.
scillm:gpt-5.5-high: hard semantic or multimodal canaries; expensive, not for broad batches.
scillm:claude-sonnet-high: adversarial plan, prompt, and implementation critique.
scillm:gemini-flash-high: long-context PDF/report review when latency and cost matter.
scillm:oc-kimi: bounded visual batches after canaries prove the prompt, schema, and image attachment contract.
scillm:opencode-deepseek: text-only code or schema review; do not use for visual bbox decisions.
human: policy, semantic, or residual ambiguity that the project agent and reviewers cannot resolve.

Record the default scillm GPT-5.5 high review:

Use a replayable request JSON that includes model: "gpt-5.5" and top-level reasoning_effort: "high". For long bundles, prefer SSE streaming and preserve the request JSON, SSE/raw response, extracted review markdown, and final scillm_reasoning proof. Do not set max_tokens; reasoning models can consume internal reasoning tokens and low caps can produce empty output.

For headless reviewers and subprocess agents, record a compact skill context artifact before packaging review evidence. Treat headless calls as skill-blind unless the request explicitly includes the relevant skill names, absolute SKILL.md paths, runtime entrypoints, artifact protocol, and role boundaries. The skill context should state which component is orchestrator, reviewer, implementer, memory store, and human escalation path.

Record headless skill context:

skills/plan-iterate/run.sh record-skill-context \
  --phase phase-01-report-cleanup \
  --context /tmp/headless-skill-context.md

For the default phase-review loop, the context should mention at minimum $plan-iterate, $review-code, $scillm, $memory, and $interview. Add $code-runner or $subagent-runner only when that phase actually uses them.

For repeated reviews, include a bounded progress context artifact and store the same compact context in ArangoDB through $memory using the plan_iterate_phase_context collection. ArangoDB $memory is the default source of progress history; local progress-context/ files are hashable mirrors for bundle replay, not the primary history store. The context should be source-derived: prior reviewer findings, blocker ledger, decision log, current delta, and artifact hashes. Store only compact progress context in ArangoDB; keep large raw artifacts on disk and reference them by path/hash.

Record progress context before a non-first review:

skills/plan-iterate/run.sh record-context \
  --phase phase-01-report-cleanup \
  --context /tmp/phase-01-progress-context.md \
  --memory-key phase-01-review-002

record-context copies the context into progress-context/, records its SHA-256 in progress_context_artifacts, and adds the ArangoDB key to memory_context.keys. By default it writes the compact context to $memory via /upsert. Use --skip-memory-upsert only for isolated tests or when memory is operationally unavailable; report that gap because repeated reviews should not depend only on hidden CLI state or local fallback files.

skills/plan-iterate/run.sh record-review \
  --phase phase-01-report-cleanup \
  --reviewer scillm-gpt55-high \
  --adjudicator-kind scillm \
  --verdict needs_changes \
  --review /tmp/scillm-gpt55-review.md \
  --review-request /tmp/scillm-request.json \
  --review-bundle /tmp/phase-review.zip \
  --invocation-command "curl /v1/chat/completions model=gpt-5.5 reasoning_effort=high" \
  --invocation-receipt /tmp/scillm-http-response.json \
  --model gpt-5.5

Record an optional WebGPT escalation review:

skills/ask/run.sh ask "Review the phase bundle at /tmp/phase-review.zip. Return PASS, NEEDS_CHANGES, or BLOCKED with specific fixes." \
  --oracle \
  --oracle-backend webgpt \
  --webgpt-tab-id 837343233 \
  --ask-id phase-01-webgpt-review \
  --json

skills/plan-iterate/run.sh record-review \
  --phase phase-01-report-cleanup \
  --reviewer webgpt \
  --adjudicator-kind webgpt \
  --verdict passed \
  --review .ask_artifacts/runs/phase-01-webgpt-review/review.md \
  --review-request .ask_artifacts/runs/phase-01-webgpt-review/phase-01-webgpt-review.request.json \
  --review-bundle /tmp/phase-review.zip \
  --invocation-command "ask webgpt --webgpt-tab-id 837343233 --deep-review-target /tmp/phase-review.zip" \
  --invocation-receipt .ask_artifacts/runs/phase-01-webgpt-review/phase-01-webgpt-review.status.json

When reviewer_policy.comparison_required=true, record-review --verdict passed stores each reviewer receipt without marking the phase externally passed. After all required distinct reviewers are recorded, set the comparison explicitly:

skills/plan-iterate/run.sh record-comparison \
  --phase phase-01-report-cleanup \
  --agreement agree \
  --closure-allowed \
  --reason "scillm GPT-5.5 high and comparison reviewer both passed the same bundle."

Reviewer Comparison

Set reviewer_policy.comparison_required=true when:

the phase changes correctness, extraction, security, compliance, deployment, memory, or user-facing verification;
the human disproved an earlier green claim;
a blocker survived two implementation attempts;
the phase combines code, prompt, schema, data, and UI/report judgment.

Comparison is explicit state, not an implied mood:

{
  "review_comparison": {
    "agreement": "agree",
    "closure_allowed": true,
    "reason": "scillm GPT-5.5 high and comparison reviewer agree; deterministic validation passed."
  }
}

Use only these agreement values:

pending
agree
partial
disagree
insufficient

If reviewers disagree, duplicate the same reviewer, return conditional_pass, return partial, or evidence is insufficient, set status=external_review_blocked, keep closure_allowed=false, patch the blocker, and rerun the relevant validation/review. Do not batch or accept the phase from a split review. review_comparison.closure_allowed=true is valid only when agreement=agree. Comparison-required phases require all passing reviewers to reference the same review_bundle_sha256, and every review result must match the current phase_subject_sha256 computed from the acceptance contract, changed file paths and hashes, deterministic evidence, validation commands, and claims.

name	plan-iterate
description	Evidence-gated phase iteration for implementation plans. Use when a task must proceed phase by phase with deterministic artifacts, validation logs, named external reviewer verdicts, default scillm GPT-5.5 high review, optional reviewer comparison, blocker tracking, and fail-closed acceptance before advancing; especially for security, correctness, deployment, or report-hardening work.

Plan Iterate

Use this skill to control implementation phases:

define phase -> implement -> validate locally -> package evidence -> external review -> patch blockers -> accept -> advance next phase or stop blocked

It does not replace /plan, /review-plan, /orchestrate, $hack, or review-extraction. It verifies whether an implemented phase satisfied its contract with evidence.

Parent Role

$plan-iterate is the parent phase controller. Domain review skills run inside the phase; they do not replace the phase ledger or acceptance gate.

Deterministic Outer Loop Contract

Canonical control flow:

while overall goals remain unmet:
  select next runtime-ready phase from the plan graph
  if no runtime-ready phase exists but an agent-actionable blocked/failing phase exists: resume that phase
  if no runtime-ready or agent-actionable phase exists: stop only with proven BLOCKED or HUMAN_REQUIRED
  while iteration < externally_configured_max_iterations:
    project agent applies exactly one patch iteration
    project agent writes project-agent-result.json
    deterministic validators run and write logs/artifacts
    relevant $review-* skills build read-only bundles/artifacts
    $scillm gpt-5.5 high aggregates those bundles into review_result.json
    controller reads review_result.json
    if PASS: mark phase complete/accepted when deterministic evidence also passes, then immediately advance to the next runtime-ready phase
    if NEEDS_CHANGES: create the next patch iteration plan and continue
    if BLOCKED or INSUFFICIENT_EVIDENCE: stop and ask/interview human
    if max iterations reached: stop and ask/interview human

Cross-Phase Continuation Contract

After a phase reaches accepted, the controller must inspect the active plan graph and do exactly one of the following:

start the next runtime-ready phase in the same run/session;
mark all overall goals complete only when there is no remaining phase or unmet acceptance contract item, and project knowledge does not name remaining executable work; or
stop with BLOCKED, MAX_ITERATIONS_REACHED, or HUMAN_REQUIRED and write the exact missing runtime fields, repeated blocker, exhausted iteration limit, or human decision needed to proceed.

Non-Ambiguous Continuation And Stop Taxonomy

$plan-iterate decisions are mechanical. The project agent must never infer a stop from phase prose, a ledger label, or an accepted current phase.

Controller decision meanings:

CONTINUE                The agent must keep working. No final response is allowed.
PROJECT_GOALS_REMAIN    The agent must create/resume a phase from project knowledge.
OVERALL_COMPLETE        A final response is allowed only after guard-final also permits it.
BLOCKED                 A final/blocking response is allowed only with repeated escalated stop evidence.
MAX_ITERATIONS_REACHED  A final/blocking response is allowed only when the configured iteration cap is exhausted.
HUMAN_REQUIRED          A final/blocking response is allowed only for a named human-only dependency.

HUMAN_REQUIRED must not be used for:

a failed validator with an obvious diagnostic;
a missing artifact the agent can regenerate;
a reviewer needs_changes finding;
external_review_blocked without repeated escalated blocker evidence;
a later phase that is blocked/failing but still agent-actionable;
missing runtime fields when the agent can create or supersede the phase from the known product goal.

external_review_blocked and local_validation_failed are recovery states by default. They become hard stops only when the phase ledger proves all of the following:

the active blocker repeated across at least two focused loops;
the blocker is marked escalated:true;
$dogpile or a recorded not-applicable rationale has been evaluated when outside research could plausibly help;
the stop record names the exact credential, admin action, policy decision, unavailable client/device, or product-scope decision that the agent cannot discover or safely assume.

After an accepted phase, the controller must check, in order:

later runtime-ready phases;
later agent-actionable local_validation_failed or external_review_blocked phases;
PROJECT_KNOWLEDGE.md for Next agent-executable candidate or equivalent remaining work;
non-actionable repeated/escalated blockers;
only then OVERALL_COMPLETE.

Any final response that claims completion, readiness, blocked status, or human need before this ordering has been evaluated is invalid.

Required project-agent result shape:

{
  "iteration": 2,
  "status": "patch_applied | blocked | needs_human",
  "files_changed": [],
  "commands_run": [],
  "remaining_blockers": [],
  "requires_human": false
}

Required controller terminal states:

PASS
BLOCKED
MAX_ITERATIONS_REACHED
HUMAN_REQUIRED

{
  "event_state": "needs_attention",
  "phase_status": "external_review_blocked",
  "reason": "project_agent_stopped_without_terminal_verdict",
  "safe_default": "do_not_mark_complete",
  "resume_hint": "Resume from review_result.json blockers."
}

The maximum iteration count must be supplied by the caller/plan graph. Do not hardcode a max iteration value inside prompts or project-agent instructions.

Before any final response or handoff summary, run the continuation guard against the active repo ledger:

/home/graham/workspace/experiments/agent-skills/skills/plan-iterate/run.sh guard-final --repo-root . --message-file /tmp/final-message.txt

Fixable Work Is Not A Stop

A project agent may stop only when the controller state is one of the explicit hard stops and the stop record proves why the agent cannot take the next action:

HUMAN_REQUIRED: a credential, policy choice, external account/admin action, unavailable client device, or product-scope decision cannot be discovered or safely assumed.
MAX_ITERATIONS_REACHED: the externally configured phase iteration limit is exhausted and the current unresolved issues are recorded.
BLOCKED: the same blocker survived the required focused implementation or validation loops, the $dogpile escalation was evaluated when applicable, and the phase ledger records the repeated blocker, occurrence count, escalation evidence, and exact human decision or external dependency.

Use this routing by default:

Phase Surface	Domain Loop / Reviewer	Deterministic Evidence
UI / UX / visual design	`$review-design`	`$test-interactions`, `$surf` screenshots/CDP
Code implementation	`$review-code`	tests, typecheck, lint, build, runtime smoke
Prompt contract	`$review-prompt`	validators, expected-response fixtures, consumer smoke
Security / extraction / compliance	domain-specific review skill	raw proof artifacts and domain validators

Each domain review iteration must also create or update three project-agent plan artifacts:

implementation/patch plan: what the main project agent will change or intentionally decline to change;
validation/evidence plan: deterministic commands, screenshots, fixtures, validators, or smoke checks that will prove the next state;
review/escalation plan: which read-only reviewer loop runs next, what it will inspect, and when $dogpile, $ask, WebGPT, or human clarification is required.

These are not three separate phase plans. They are per-round control artifacts inside the current $plan-iterate phase so a reviewer loop cannot silently become an implementation owner.

Core Rules

Do not mark a phase done from prose, status text, or absence of errors.
Keep raw artifacts, validation logs, and reviewer responses in the phase directory.
Treat scillm/WebGPT/external review as a bounded checkpoint, not an outsourced reasoning loop.
Give $scillm bounded, replayable progress context on repeated reviews; do not rely on hidden CLI transcript state.
Name the adjudicator for every review result: webgpt, scillm, human, or deterministic_verifier.
A reviewer verdict is a receipt, not closure. Only the $plan-iterate controller may close a phase, and only after both the stored aggregation gate and deterministic validation/evidence gates pass.
Domain reviewer/subagent calls must be read-only: they may critique and suggest changes to the main project agent, but the main project agent owns all file edits, Memory-significant mutations, deterministic validation, and phase state.
Domain review loops must leave a machine-readable and human-readable trace: state.json, events.jsonl, aggregate_verdict.json, a context artifact, a relevant best-practices-* skill list, three per-iteration plan artifacts, and a matrix suited to the surface: screenshot-to-finding for design, file/diff-to-finding for code, or prompt-fixture/gate-to-finding for prompts. Every loop event must end in a discrete state such as verified, needs_patch, blocked, failed, or skipped.
Canonical domain_review_loops[].end_state values are verified, needs_patch, blocked, failed, and skipped. Domain skills may keep local states, but must map them into this enum before phase aggregation: satisfied -> verified, needs_changes -> needs_patch, insufficient_evidence -> blocked, halted with blockers -> blocked, waiting_review/running -> failed, and non-applicable skip artifacts with verified applicability -> skipped.
Do not call heuristic classification, label copy, or report generation a review unless a named reviewer actually reviewed replayable evidence.
Use reviewer comparison when the phase crosses a trust boundary or prior solo iteration produced a false green.
Escalate before closure when raw evidence disagrees with report claims, a blocker repeats twice, or the human disproves a green claim.
When a blocker repeats after two focused implementation/review loops, run $dogpile for source-derived outside insight before another solo patch attempt, unless the blocker is provably local-only and external research has no plausible value. Record the dogpile query, report path, useful findings, and any "not applicable" rationale in phase progress context.
If the same blocker repeats after applying or evaluating $dogpile findings, stop execution instead of continuing solo iteration. Set the phase to external_review_blocked or local_validation_failed as appropriate, write the repeated blocker and dogpile summary into phase progress context, and ask the human concise clarifying questions tied to the unresolved decision or missing evidence.
Run canaries before batching: one positive, one negative, one ambiguous/insufficient-evidence, one expected failure, and one regression fixture.
For security-sensitive work, require raw proof artifacts, command logs, manifest hashes, and post-patch verification before accepting closure claims.

Phase States

Use only these states:

planned
implementing
local_validation_failed
ready_for_review
external_review_blocked
external_review_passed
accepted
superseded
abandoned

A phase is complete only when its status is accepted.

CLI

Run commands from the repository root that owns the phase work:

skills/plan-iterate/run.sh init --phase phase-01-report-cleanup
skills/plan-iterate/run.sh record-skill-context --phase phase-01-report-cleanup --context /tmp/headless-skill-context.md
skills/plan-iterate/run.sh record-context --phase phase-01-report-cleanup --context /tmp/phase-context.md
skills/plan-iterate/run.sh record-plan-graph --phase phase-01-report-cleanup --graph /tmp/phase-plan.dag.json
skills/plan-iterate/run.sh inspect-plan-graph --phase phase-01-report-cleanup --graph /tmp/phase-plan.dag.json
skills/plan-iterate/run.sh continue --phase phase-01-report-cleanup
skills/plan-iterate/run.sh package --phase phase-01-report-cleanup --output /tmp/phase-01-review.zip
skills/plan-iterate/run.sh record-review --phase phase-01-report-cleanup --verdict blocked --review scillm-gpt55-review.md
skills/plan-iterate/run.sh status

Use --root DIR when the phase state should live outside the current repository. The default root is .plan-iterate.

Run the medium-complexity local sanity before relying on the skill after edits:

skills/plan-iterate/sanity.sh

Phase Status Contract

Each phase owns:

.plan-iterate/<phase-id>/
  PHASE_STATUS.json
  PHASE_REVIEW_REQUEST.md
  reviews/

PHASE_STATUS.json must use schema plan_iterate.phase_status.v1 and include:

{
  "schema": "plan_iterate.phase_status.v1",
  "phase_id": "phase-03-evolution-decision-log",
  "plan_id": "",
  "status": "ready_for_review",
  "implementation_summary": "...",
  "acceptance_contract": [],
  "changed_files": [],
  "validation_commands": [],
  "evidence_artifacts": [],
  "progress_context_artifacts": [],
  "skill_context_artifacts": [],
  "plan_graph_artifacts": [],
  "active_plan_graph_artifact": "",
  "domain_review_loops": [],
  "review_artifacts": [],
  "known_caveats": [],
  "claims": [],
  "blockers": [],
  "memory_context": {
    "collection": "plan_iterate_phase_context",
    "keys": []
  },
  "reviewer_policy": {
    "required": true,
    "comparison_required": false,
    "closure_rule": "deterministic_validation_and_external_review"
  },
  "review_results": [],
  "review_comparison": {
    "agreement": "pending",
    "closure_allowed": false,
    "reason": ""
  },
  "review_status": "pending"
}

.plan-iterate/plans/<plan-id>/
  PLAN_STATUS.json
  phases.json

the graph id, graph goal, active phase ledger status, and legend;
every plan node in dependency order;
which nodes are completed, active, pending, blocked, manual, or runtime-ready;
which groups are sequential and which same-depth dependency lanes may run concurrently;
missing runtime fields and the next action for nodes that are blocked or manual.

Semantic node execution readiness is fail-closed:

review-code needs review context, files, and review_scopes[].scope, model, agent, contract, review_level, and proof_level. Files may be inferred from PHASE_STATUS.changed_files; context is never inferred.
review-design needs a persona plus screenshots or a test-interactions manifest.
test-interactions needs a manifest, or a URL plus persona, and an output directory.
review-prompt needs the prompt/template, concrete fixture/context, expected response, validator or smoke command, and consumer/schema.
project-agent nodes are manual until they include an explicit command, handoff, or task plus acceptance evidence.
Runtime scillm.exec nodes (local_command, scillm_call, scillm_batch, codex_exec, claude_print, deterministic render/verifier) must include the command, model, prompt, items, or manifest required by their runtime type.

Each iteration_plans[] entry must include a positive round and exactly these three phase-relative artifacts:

implementation_plan_artifact
validation_plan_artifact
review_plan_artifact

Review Bundle

package creates a ZIP containing:

PHASE_STATUS.json
PHASE_REVIEW_REQUEST.md
changed-files.diff
manifest.json
validation-logs/
evidence-artifacts/
progress-context/
skill-context/
domain-review-loops/
plan-graphs/*-plan-ascii.md
reviews/

External Reviewer Loop

Use external reviewers only at phase checkpoints, canaries, or escalation triggers:

project agent implements phase
plan-iterate packages evidence
domain review skill bundles the phase completion for review
scillm/WebGPT/human reviews the domain bundle and phase evidence
project agent records verdict
project agent patches exact blockers
repeat until pass or stop

Preserve reviewer outputs as reviews/<timestamp>-<reviewer>-response.md through record-review.

Repeated Blocker Research Escalation

Use $dogpile to search for:

upstream library issues, examples, and implementation patterns;
similar GitHub bugs, PRs, and code paths;
current documentation or migration notes;
known UI, browser, accessibility, or framework edge cases;
design precedents when the blocker is product/UX clarity rather than code.

If the blocker still repeats after $dogpile has been evaluated, stop the loop. Do not run another implementation pass from the same hypothesis. The stop record must include:

the blocker text and occurrence count;
the validation/review artifacts showing the repeated failure;
the dogpile query, report path, and relevant findings or degraded sources;
what was tried after dogpile;
the exact human clarification needed to proceed.

Ask at most three concise questions. Each question should map to a decision that cannot be resolved from code, deterministic evidence, reviewer receipts, or dogpile results.

Each phase completion review should use the domain-appropriate bundle:

$review-design bundle or loop artifacts for UI/UX phases. It must include fresh rendered screenshots and, when interaction matters, $test-interactions results plus focused/container screenshot evidence. For $review-design iterate, include the loop state.json, events.jsonl, aggregate_verdict.json, per-section verdicts, and the screenshot/review matrix, immutable goal, context artifact, relevant best-practice skills, and three per-round plans so the main project agent can evaluate the loop without reading raw model transcripts.
$review-code bundle for code phases. It must include the current scoped diff, relevant tests/checks, expected contracts, non-goals, known caveats, blockers, progress context, and headless skill context. For $review-code loops, include state.json, events.jsonl, aggregate_verdict.json, per-round reviewer verdicts, applied/rejected patch records, test results, immutable goal, context artifact, relevant best-practice skills, three per-round plans, and a file/diff-to-finding matrix. The project agent applies patches and deterministic tests/checks decide closure.
$review-prompt final audit bundle for prompt-contract phases. It must include prompt templates, concrete fixtures, expected responses, validators, smoke output, scoring/keep decisions, and any $ask/WebGPT final gate artifacts. For $review-prompt loops, include state.json, events.jsonl, aggregate_verdict.json or terminal audit.json, per-round model requests and responses, score/decision artifacts, validator/smoke outputs, and a prompt-fixture/gate-to-finding matrix, immutable goal, context artifact, relevant best-practice skills, and three per-round plans. Deterministic gates and keep/revert decisions decide whether a prompt candidate is admissible.

For mixed phases, build all applicable domain review bundles concurrently:

$review-code bundle/artifacts   -> code findings
$review-design bundle/artifacts -> visual/interaction findings
$review-prompt final audit      -> prompt-contract findings or skipped_fail_closed with verdict not_applicable_verified

{
  "verdict": "PASS | NEEDS_CHANGES | BLOCKED | INSUFFICIENT_EVIDENCE",
  "goals_met": false,
  "highest_severity": "critical | high | medium | low | none",
  "issues": [],
  "next_plan": [],
  "human_questions": []
}

Any unresolved medium, high, or critical issue means the controller must create the next project-agent patch plan or stop for human input; the phase cannot be accepted.

Default reviewer:

scillm:gpt-5.5-high: default external reviewer for targeted phase/code/contract review. Use $scillm with model: "gpt-5.5" and top-level reasoning_effort: "high", preferably streaming for long review bundles. It is replayable, scriptable, and exposes machine-readable reasoning proof.

Escalation and comparison reviewer use:

webgpt: optional escalation for strategy, taxonomy decisions, report clarity, cross-project process review, or an independent human-facing judgment check using the real $ask WebGPT route ($ask webgpt or --oracle-backend webgpt). $surf is only the browser transport owned by $ask; do not call it directly for review work, and do not make WebGPT the default when $scillm can review the same bundle.
scillm:gpt-5.5-high: hard semantic or multimodal canaries; expensive, not for broad batches.
scillm:claude-sonnet-high: adversarial plan, prompt, and implementation critique.
scillm:gemini-flash-high: long-context PDF/report review when latency and cost matter.
scillm:oc-kimi: bounded visual batches after canaries prove the prompt, schema, and image attachment contract.
scillm:opencode-deepseek: text-only code or schema review; do not use for visual bbox decisions.
human: policy, semantic, or residual ambiguity that the project agent and reviewers cannot resolve.

Record the default scillm GPT-5.5 high review:

Record headless skill context:

skills/plan-iterate/run.sh record-skill-context \
  --phase phase-01-report-cleanup \
  --context /tmp/headless-skill-context.md

Record progress context before a non-first review:

skills/plan-iterate/run.sh record-context \
  --phase phase-01-report-cleanup \
  --context /tmp/phase-01-progress-context.md \
  --memory-key phase-01-review-002

skills/plan-iterate/run.sh record-review \
  --phase phase-01-report-cleanup \
  --reviewer scillm-gpt55-high \
  --adjudicator-kind scillm \
  --verdict needs_changes \
  --review /tmp/scillm-gpt55-review.md \
  --review-request /tmp/scillm-request.json \
  --review-bundle /tmp/phase-review.zip \
  --invocation-command "curl /v1/chat/completions model=gpt-5.5 reasoning_effort=high" \
  --invocation-receipt /tmp/scillm-http-response.json \
  --model gpt-5.5

Record an optional WebGPT escalation review:

skills/ask/run.sh ask "Review the phase bundle at /tmp/phase-review.zip. Return PASS, NEEDS_CHANGES, or BLOCKED with specific fixes." \
  --oracle \
  --oracle-backend webgpt \
  --webgpt-tab-id 837343233 \
  --ask-id phase-01-webgpt-review \
  --json

skills/plan-iterate/run.sh record-review \
  --phase phase-01-report-cleanup \
  --reviewer webgpt \
  --adjudicator-kind webgpt \
  --verdict passed \
  --review .ask_artifacts/runs/phase-01-webgpt-review/review.md \
  --review-request .ask_artifacts/runs/phase-01-webgpt-review/phase-01-webgpt-review.request.json \
  --review-bundle /tmp/phase-review.zip \
  --invocation-command "ask webgpt --webgpt-tab-id 837343233 --deep-review-target /tmp/phase-review.zip" \
  --invocation-receipt .ask_artifacts/runs/phase-01-webgpt-review/phase-01-webgpt-review.status.json

skills/plan-iterate/run.sh record-comparison \
  --phase phase-01-report-cleanup \
  --agreement agree \
  --closure-allowed \
  --reason "scillm GPT-5.5 high and comparison reviewer both passed the same bundle."

Reviewer Comparison

Set reviewer_policy.comparison_required=true when:

the phase changes correctness, extraction, security, compliance, deployment, memory, or user-facing verification;
the human disproved an earlier green claim;
a blocker survived two implementation attempts;
the phase combines code, prompt, schema, data, and UI/report judgment.

Comparison is explicit state, not an implied mood:

{
  "review_comparison": {
    "agreement": "agree",
    "closure_allowed": true,
    "reason": "scillm GPT-5.5 high and comparison reviewer agree; deterministic validation passed."
  }
}

Use only these agreement values:

pending
agree
partial
disagree
insufficient