| name | plan-executor:run-reviewer-team |
| description | Use when a single parallel reviewer-team run is needed — launches the frozen Claude + Codex + Gemini + Security reviewer set (security:big-toni when available, plan-executor:lite-security-reviewer as fallback), collects all outputs, triages findings, and returns a structured review report. |
Run Reviewer Team
This skill executes one reviewer-team run: it freezes the four-reviewer set, builds reviewer prompts, dispatches all four reviewers in parallel as sub-agents, collects every output, triages findings, and returns a self-contained review report.
It does NOT decide whether to fix, retry, or escalate. That logic belongs to the caller.
Required Inputs
- plan_context — plan path or relevant plan excerpts that define the expected implementation
- execution_outputs — description or summary of what was built or changed during execution
- changed_files — list of files created or modified
- language — detected primary language of the changed files
- recipe_list — recipe skills relevant to the changed code (used to build reviewer prompts)
- prior_review_context — prior triage history for this review loop; must include already-fixed, rejected, and deferred findings so reviewers do not re-raise resolved items; pass empty object {} on the first run
Optional inputs (deviation journal)
- deviation_journal_path (optional) — absolute path to <execution_root>/.plan-executor/deviations.jsonl.
- deviation_digest (optional) — rendered digest of the journal as built by the orchestrator's between-wave read.
An empty digest is normal; surface it to reviewers as "no prior deviations" and proceed.
If any required input is missing, stop immediately and return status: blocked with the missing field in notes.
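The input check above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `check_required_inputs` is hypothetical, and it assumes "missing" means the key is absent or `None`, so an empty `recipe_list` or an empty `{}` for `prior_review_context` still passes.

```python
from typing import Optional

# Field names taken from the "Required Inputs" list.
REQUIRED_INPUTS = (
    "plan_context",
    "execution_outputs",
    "changed_files",
    "language",
    "recipe_list",
    "prior_review_context",
)


def check_required_inputs(inputs: dict) -> Optional[dict]:
    """Return a blocked report if any required input is absent, else None.

    Assumption: only an absent key or a None value counts as missing;
    empty containers such as {} or [] are accepted.
    """
    missing = [name for name in REQUIRED_INPUTS if inputs.get(name) is None]
    if not missing:
        return None
    return {
        "status": "blocked",
        "notes": "missing required input: " + ", ".join(missing),
    }
```

A caller would run this before any prompt is built and return the blocked report unchanged when it is not `None`.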
Reviewer Set
The frozen reviewer set for every invocation is exactly four reviewers:
- Claude — launched as a focused sub-agent via the Agent tool
- Codex — launched via the Bash tool invoking the local codex CLI (codex --dangerously-bypass-approvals-and-sandbox exec)
- Gemini — launched via the Bash tool invoking the local gemini CLI (gemini --yolo -p)
- Security — launched as a focused sub-agent via the Agent tool. Skill selection follows the availability check in "Security reviewer skill selection" below.
All four must be launched in the same parallel batch. Do not launch them sequentially.
A run is not complete until all four reviewer outputs have been collected. Do not produce a triage report from a partial batch.
CLI invocation parameters are fixed: codex --dangerously-bypass-approvals-and-sandbox exec for Codex and gemini --yolo -p for Gemini. JSON stream output is disabled. Do NOT pass --json to codex and do NOT pass -o stream-json to gemini. Plain stdout output is what this skill consumes.
If a reviewer tool is unavailable — Claude sub-agent dispatch fails, the codex binary is missing or returns a non-zero exit before producing any output, the gemini binary is missing or returns a non-zero exit before producing any output, or the security sub-agent dispatch fails — return status: blocked with the tool name and a concrete availability error in notes. Do not substitute a different reviewer or reduce the set below four.
The security reviewer is the one exception: if security:big-toni is not available, the orchestrator substitutes plan-executor:lite-security-reviewer as described below. This substitution is NOT a block — it is the documented fallback.
Security reviewer skill selection
Before building the four reviewer prompts, decide which skill the security reviewer will use.
- Check whether security:big-toni appears in the current session's available-skills list (the list provided in system-reminders by the harness).
- If security:big-toni is present, use it. This is the preferred path.
- If security:big-toni is NOT present, use plan-executor:lite-security-reviewer instead. Record this substitution in the run's notes field (e.g. "security reviewer: lite fallback (security:big-toni not installed)").
- If neither skill is available, return status: blocked with security reviewer unavailable in notes.
Do NOT fall back to a generic sub-agent without a security skill. The security slot must always be filled by one of the two named skills.
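The selection rule above reduces to a three-way decision. The sketch below is illustrative only: `select_security_skill` and the shape of its return value are hypothetical, and `available_skills` stands in for the harness-provided skills list.

```python
from typing import Iterable

PREFERRED_SKILL = "security:big-toni"
FALLBACK_SKILL = "plan-executor:lite-security-reviewer"


def select_security_skill(available_skills: Iterable[str]) -> dict:
    """Pick the security reviewer skill, or report a block if neither exists."""
    available = set(available_skills)
    if PREFERRED_SKILL in available:
        return {"status": "ok", "skill": PREFERRED_SKILL, "notes": ""}
    if FALLBACK_SKILL in available:
        # The fallback is a documented substitution, not a block.
        return {
            "status": "ok",
            "skill": FALLBACK_SKILL,
            "notes": "security reviewer: lite fallback (security:big-toni not installed)",
        }
    # Never fall back to a generic sub-agent without a security skill.
    return {"status": "blocked", "skill": None, "notes": "security reviewer unavailable"}
```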
Language detection and Claude recipe skill loading
Before building the Claude reviewer prompt, determine the project language and the matching production-code / test-code recipe skills. The Claude reviewer MUST invoke those skills at the start of its run so the review is anchored to the project's documented code standards.
Language resolution
- If the caller passed a non-empty language input, use it verbatim (lower-case it for lookup).
- If language is missing, empty, or unknown, infer it from the extensions of changed_files using this precedence (first match wins):
  - .ts, .tsx, .mts, .cts → typescript
  - .py, .pyi → python
  - .go → go
  - .rs → rust
- If neither the caller nor the extension check yields a known language, record language: unknown and skip recipe loading — do NOT block the run.
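A possible reading of the resolution rules, as a sketch. Two interpretations here are assumptions rather than something the text states: "first match wins" is taken to mean the languages are tried in the listed order against the whole set of changed files, and a caller value of "unknown" is treated the same as an empty value. The function name `resolve_language` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional, Sequence

# Ordered by the documented precedence: typescript, python, go, rust.
EXTENSION_PRECEDENCE = (
    ("typescript", {".ts", ".tsx", ".mts", ".cts"}),
    ("python", {".py", ".pyi"}),
    ("go", {".go"}),
    ("rust", {".rs"}),
)


def resolve_language(language: Optional[str], changed_files: Sequence[str]) -> str:
    """Resolve the project language from the caller input or file extensions."""
    candidate = (language or "").strip().lower()
    if candidate and candidate != "unknown":
        # A non-empty caller value is used verbatim, lower-cased for lookup.
        return candidate
    extensions = {PurePosixPath(path).suffix.lower() for path in changed_files}
    for name, known_extensions in EXTENSION_PRECEDENCE:
        if extensions & known_extensions:
            return name
    # No match: the run continues without recipe loading.
    return "unknown"
```

With a mixed change set such as `["a.go", "b.py"]` and no caller value, this sketch resolves to `python` because python precedes go in the list.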
Recipe skill mapping
| Language | Production skill | Test skill |
|---|---|---|
| typescript | typescript-services:production-code-recipe | typescript-services:test-code-recipe |
| python | python-services:production-code-recipe | python-services:test-code-recipe |
| go | go-services:go-expert-recipe | go-services:go-reviewer-recipe |
| rust | rust-services:production-code-recipe | rust-services:test-code-recipe |
Availability check
For each mapped skill, check whether it appears in the current session's available-skills list (the list provided in system-reminders by the harness).
- If a mapped skill is present, include it in the Claude recipe load list.
- If a mapped skill is missing, omit it and add a note to the run (e.g. "claude recipe: typescript-services:test-code-recipe not installed, skipped"). Do NOT block.
- If both the production-code and test-code skills for the detected language are missing, record claude recipe: no project recipes available for <language> and proceed without recipe loading.
Merging with caller-provided recipes
If the caller passed a non-empty recipe_list, merge the caller's list with the mapped list. Deduplicate by skill name. The Claude recipe load list is the union.
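The mapping, availability check, and merge can be combined into one small function. This is a sketch under stated assumptions: the name `resolve_recipe_load_list` is hypothetical, mapped skills are placed before caller-provided ones (the text only requires a deduplicated union), and caller-provided recipes are not checked for availability because the text does not say they should be.

```python
from typing import Iterable, List, Sequence, Tuple

# (production skill, test skill) per language, from the recipe skill mapping table.
RECIPE_MAP = {
    "typescript": ("typescript-services:production-code-recipe",
                   "typescript-services:test-code-recipe"),
    "python": ("python-services:production-code-recipe",
               "python-services:test-code-recipe"),
    "go": ("go-services:go-expert-recipe", "go-services:go-reviewer-recipe"),
    "rust": ("rust-services:production-code-recipe",
             "rust-services:test-code-recipe"),
}


def resolve_recipe_load_list(
    language: str,
    available_skills: Iterable[str],
    caller_recipes: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Return (recipe load list, notes) for the Claude reviewer."""
    available = set(available_skills)
    mapped = RECIPE_MAP.get(language, ())
    present = [skill for skill in mapped if skill in available]
    notes: List[str] = []
    if mapped and not present:
        notes.append(f"claude recipe: no project recipes available for {language}")
    else:
        notes.extend(
            f"claude recipe: {skill} not installed, skipped"
            for skill in mapped
            if skill not in available
        )
    # Union of mapped and caller-provided recipes, deduplicated by name.
    load_list = list(dict.fromkeys([*present, *caller_recipes]))
    return load_list, notes
```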
Claude reviewer prompt obligation
The Claude reviewer prompt MUST include a preamble that tells Claude to invoke every skill in the resolved recipe load list via the Skill tool BEFORE running any review step. Each invocation is for standards context — Claude should treat the returned content as the authoritative rules for its review.
This obligation applies only to the Claude reviewer. Codex, Gemini, and the security reviewer receive recipe context per their existing contracts and do NOT receive the Skill-tool preamble.
Reviewer Prompt Contract
Build one prompt per reviewer. Each prompt must include:
- the review scope: changed files, plan context, execution summary
- language and recipe context
- prior review context: already-fixed findings, rejected findings, deferred findings
- the reporting contract below
Claude prompt exception: The Claude prompt MUST begin with a "Load project recipe skills first" preamble that lists the resolved recipe load list from "Language detection and Claude recipe skill loading" and instructs Claude to invoke each one via the Skill tool before running any review step. If the resolved list is empty (unknown language, or no recipes available for the language), the preamble states that no project recipes were resolved and Claude proceeds without recipe context. Claude reviews the changed files against those loaded standards.
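One way to render the Claude preamble is sketched below. The heading text and sentence wording are illustrative assumptions; the text above fixes only what the preamble must convey, not its exact phrasing. `build_claude_preamble` is a hypothetical helper name.

```python
from typing import Sequence


def build_claude_preamble(recipe_load_list: Sequence[str]) -> str:
    """Render the 'Load project recipe skills first' preamble for the Claude prompt."""
    lines = ["## Load project recipe skills first", ""]
    if not recipe_load_list:
        # Unknown language, or no recipes installed for the detected language.
        lines.append("No project recipes were resolved. Proceed without recipe context.")
    else:
        lines.append(
            "Invoke each of the following skills via the Skill tool before "
            "running any review step, and treat the returned content as the "
            "authoritative rules for this review:"
        )
        lines.extend(f"- {name}" for name in recipe_load_list)
    return "\n".join(lines) + "\n"
```

The returned string is meant to be placed at the very start of the Claude prompt, ahead of the review scope.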
Gemini prompt exception: Gemini is invoked via the local gemini CLI as a single-shot headless run (gemini --yolo -p) with the prompt body delivered on stdin. The gemini CLI has filesystem access, but it does not reliably explore changes with git diff the way the Claude, Codex, and Security reviewers do, so the Gemini prompt MUST include the full diff of every changed file inline (run git diff or equivalent and embed the output). Without the inline diff, Gemini will hallucinate or review stale base content instead of the actual changes. The Claude, Codex, and Security reviewers have filesystem access and can run git diff themselves; do NOT embed the diff in their prompts.
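Embedding the diff for Gemini might look like the sketch below. Several details are assumptions: the helper names, the choice of `HEAD` as the base ref (which does not show untracked new files), the section heading, and the decision to raise on an empty diff rather than send Gemini a prompt without one.

```python
import subprocess
from typing import Sequence


def collect_diff(changed_files: Sequence[str], base_ref: str = "HEAD") -> str:
    """Run git diff for the changed files against base_ref (assumed to be HEAD)."""
    result = subprocess.run(
        ["git", "diff", base_ref, "--", *changed_files],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_gemini_prompt(review_prompt: str, diff_text: str) -> str:
    """Append the full inline diff to the Gemini reviewer prompt."""
    if not diff_text.strip():
        # Assumption: an empty diff is treated as an error, since Gemini
        # would otherwise review stale base content.
        raise ValueError("Gemini prompt requires a non-empty inline diff")
    return f"{review_prompt.rstrip()}\n\n## Diff of changed files\n\n{diff_text.rstrip()}\n"
```

Only the Gemini prompt goes through `build_gemini_prompt`; the other three prompts are left without an embedded diff.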
Security reviewer exception: The security reviewer prompt MUST invoke the skill selected in "Security reviewer skill selection" above as its entry point (use the Skill tool with skill: "security:big-toni" or skill: "plan-executor:lite-security-reviewer", depending on availability), providing the review scope and changed files as arguments. When security:big-toni is used it does NOT receive the standard recipe list or language context — security:big-toni determines its own methodology. When plan-executor:lite-security-reviewer is used it receives the review scope, changed files, prior review context, and the reporting contract directly (it has its own built-in checklist, so no recipe list is needed). In both cases the security reviewer MUST receive the prior review context and the reporting contract below so it does not re-raise already-resolved findings.
Reporting contract to include in every reviewer prompt:
Report only findings within the current review scope. For each finding include: file path, line reference if applicable, a concrete description, and your reasoning. Classify every finding as one of:
- FIX_REQUIRED — real, in-scope, must be fixed
- VERIFIED_FIX — a prior FIX_REQUIRED issue that is now correctly fixed
- REJECTED — invalid, out of scope, or based on incorrect assumptions
- DEFERRED — real but intentionally left unresolved (must state reason)
Do not re-raise findings already marked fixed, rejected, or deferred in prior review context unless you have new evidence that invalidates the prior decision. Do not make code changes directly.
Use deviation entries as leads. When deviation_digest is non-empty, include it verbatim in every reviewer's prompt with the heading ## Prior deviations to verify. Tell reviewers: "Re-read the evidence cited in each deviation before accepting the claim. Do not suppress a finding solely because a deviation exists." Stale or unverifiable evidence is itself a finding.
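The deviation handling can be sketched as a small prompt-section builder. The heading and the quoted instruction come from the text above; the function name `deviation_section` and the bare "no prior deviations" line for an empty digest are assumptions about presentation.

```python
from typing import Optional

DEVIATION_HEADING = "## Prior deviations to verify"
REVIEWER_INSTRUCTION = (
    "Re-read the evidence cited in each deviation before accepting the claim. "
    "Do not suppress a finding solely because a deviation exists."
)


def deviation_section(deviation_digest: Optional[str]) -> str:
    """Render the deviation block that is added to every reviewer prompt."""
    if not deviation_digest or not deviation_digest.strip():
        # An empty digest is normal and does not block the run.
        return "no prior deviations"
    # The digest is included verbatim, without trimming or rewriting.
    return f"{DEVIATION_HEADING}\n\n{deviation_digest}\n\n{REVIEWER_INSTRUCTION}"
```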
Execution
- Validate all required inputs are present.
- Run the security reviewer skill-selection check (see "Security reviewer skill selection"). Record the chosen skill and, if the fallback was selected, add a note to the run for later inclusion in the report.
- Run the language detection and Claude recipe load-list resolution (see "Language detection and Claude recipe skill loading"). Record the detected language and the final recipe load list. Add notes for any missing mapped skills.
- Build one reviewer prompt per reviewer using the contract above.
- Write each reviewer's prompt to a temp file under the working directory (e.g. .tmp-reviewer-claude.md, .tmp-reviewer-codex.md, .tmp-reviewer-gemini.md, .tmp-reviewer-security.md) so the large prompt bodies (the Gemini prompt in particular, which includes the inlined diff) are not passed on the command line.
- Launch all four reviewers in a single parallel batch (issue all four tool calls in one assistant message; do NOT chain them):
  - Claude — Agent tool (subagent_type: general-purpose). Pass the Claude reviewer prompt verbatim.
  - Codex — Bash tool. Concrete command: codex --dangerously-bypass-approvals-and-sandbox exec - < <abs-path-to-codex-prompt-file> (the - positional makes codex exec read instructions from stdin). Do NOT add --json.
  - Gemini — Bash tool. Concrete command: gemini --yolo -p "Begin the review now per the instructions provided on stdin." < <abs-path-to-gemini-prompt-file> (Gemini appends the -p value to anything it reads from stdin, so the stdin-piped prompt body forms the bulk of the input and -p ends with a kickoff sentence). Do NOT add -o stream-json.
  - Security — Agent tool (subagent_type: general-purpose), invoking either security:big-toni or plan-executor:lite-security-reviewer based on the skill-selection check.

  Pick a sensible Bash timeout for the Codex and Gemini invocations (e.g. timeout 600) so a hung CLI cannot stall the run indefinitely. A non-zero exit from codex or gemini whose stdout is empty MUST be surfaced as a status: blocked per "Reviewer Set" above; a non-zero exit with non-empty stdout is treated as a normal review with whatever output was produced.
- Wait for all four outputs before proceeding.
- Triage every finding from every reviewer into exactly one bucket:
  - FIX_REQUIRED
  - VERIFIED_FIX
  - REJECTED
  - DEFERRED
- Deduplicate across reviewers: if multiple reviewers raise the same issue, merge into one finding and note that N reviewers agreed.
- Produce the review report (see Completion Contract).
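The exit-code rule for the two CLI reviewers in the launch step can be illustrated with a Python equivalent of the Bash invocations. This is a sketch, not how the skill itself dispatches reviewers (it uses the Bash tool). The helper names are hypothetical, and two behaviours are assumptions the text does not specify: a timeout is reported as blocked, and a zero exit with empty stdout is passed through unchanged.

```python
import subprocess
from typing import Sequence

# Argument vectors matching the fixed CLI invocations described above.
CODEX_ARGV = ("codex", "--dangerously-bypass-approvals-and-sandbox", "exec", "-")
GEMINI_ARGV = ("gemini", "--yolo", "-p",
               "Begin the review now per the instructions provided on stdin.")


def classify_cli_result(tool: str, returncode: int, stdout: str) -> dict:
    """Apply the rule: non-zero exit with empty stdout blocks the run."""
    if returncode != 0 and not stdout.strip():
        return {"status": "blocked",
                "notes": f"{tool}: exited {returncode} before producing any output"}
    # Non-zero exit with output is still treated as a normal review.
    return {"status": "ok", "output": stdout}


def run_cli_reviewer(argv: Sequence[str], prompt_path: str, timeout_s: int = 600) -> dict:
    """Run one CLI reviewer with the prompt file on stdin and a hard timeout."""
    tool = argv[0]
    with open(prompt_path, "rb") as prompt_file:
        try:
            proc = subprocess.run(list(argv), stdin=prompt_file, capture_output=True,
                                  text=True, timeout=timeout_s)
        except FileNotFoundError:
            return {"status": "blocked", "notes": f"{tool}: binary not found"}
        except subprocess.TimeoutExpired:
            # Assumption: a hung CLI is reported as a blocker.
            return {"status": "blocked", "notes": f"{tool}: timed out after {timeout_s}s"}
    return classify_cli_result(tool, proc.returncode, proc.stdout)
```

For example, `run_cli_reviewer(CODEX_ARGV, "/abs/path/.tmp-reviewer-codex.md")` would mirror the Codex command, with the path shown here being a placeholder.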
Completion Contract
Return one structured report with these fields:
- status — complete | blocked
- reviewer_set — list of the four reviewers used. For the security reviewer entry, record the actual skill used (security:big-toni or plan-executor:lite-security-reviewer).
- attempt_note — free-text note about this run (e.g. first attempt, retry N). MUST mention the lite fallback when it was selected, the detected language, and any skipped Claude recipes.
- detected_language — the resolved project language (typescript | python | go | rust | unknown).
- claude_recipes_loaded — list of recipe skill names actually included in the Claude preamble.
- findings — list of triaged findings; each entry:
  - id — short unique identifier (e.g. F1, F2)
  - category — one of FIX_REQUIRED | VERIFIED_FIX | REJECTED | DEFERRED
  - file — affected file path
  - description — concrete description of the finding
  - reasoning — reviewer reasoning
  - reviewers — which reviewer(s) raised this finding
  - deferred_reason — populated only for DEFERRED
- triage_summary — counts per category: fix_required, verified_fix, rejected, deferred
- notes — any blocker detail, tool errors, or observations
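The findings and triage_summary fields can be assembled roughly as sketched here. Deciding whether two reviewers raised "the same issue" needs judgment in practice; this sketch substitutes a deliberately crude rule (same file and same description after trimming and lower-casing), which is an assumption. It also keeps the first reviewer's category and reasoning when duplicates are merged. The function names and the raw-finding shape (a single `reviewer` key per entry) are hypothetical.

```python
from collections import Counter
from typing import Dict, List

CATEGORIES = ("FIX_REQUIRED", "VERIFIED_FIX", "REJECTED", "DEFERRED")


def merge_findings(raw_findings: List[dict]) -> List[dict]:
    """Merge duplicate findings across reviewers and assign ids F1, F2, ..."""
    merged: Dict[tuple, dict] = {}
    for finding in raw_findings:
        # Crude duplicate key; a real triage step would compare meaning.
        key = (finding["file"], finding["description"].strip().lower())
        if key not in merged:
            entry = dict(finding)
            entry["reviewers"] = [entry.pop("reviewer")]
            merged[key] = entry
        elif finding["reviewer"] not in merged[key]["reviewers"]:
            merged[key]["reviewers"].append(finding["reviewer"])
    findings = list(merged.values())
    for index, entry in enumerate(findings, start=1):
        entry["id"] = f"F{index}"
    return findings


def triage_summary(findings: List[dict]) -> Dict[str, int]:
    """Count findings per category, including categories with zero entries."""
    counts = Counter(entry["category"] for entry in findings)
    return {category.lower(): counts.get(category, 0) for category in CATEGORIES}
```

The length of each entry's `reviewers` list gives the "N reviewers agreed" count for merged findings.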
status: complete
All four reviewers ran and produced output. Triage is complete. findings and triage_summary are populated.
status: blocked
A required tool was unavailable or a required input was missing. findings and triage_summary may be empty or partial. notes must contain the exact blocker.