Spawns multiple local CLI coding agents in parallel to review the same code change, then
synthesizes their findings into one report. Each panelist runs in its own subprocess with
no shared conversation state — they only see the prompt and the diff.
-
Pick a target. Default to --uncommitted. Map the user's phrasing to a flag:
- "PR 27" / "pr #27" / "/panel-review pr 27" / a
github.com/<owner>/<repo>/pull/N
URL → --pr <N or URL> (requires the gh CLI).
- "vs main" / "against develop" without a PR number →
--base <branch>.
- Specific SHA →
--commit <sha>.
- Otherwise →
--uncommitted. Ask only if the intent is genuinely ambiguous.
-
Pick panelists. Default: every supported CLI on PATH (codex, claude, opencode,
gemini). The user may name a subset.
-
Capture optional focus. If the user gave context ("look closely at the auth
changes"), pass --focus.
-
Decide if this needs deep mode. Default is read-only fan-out — fast and safe.
Pass --checkout if the user wants panelists to actually run tests, grep callers
in the PR's checkout, and chase downstream effects (asks like "really dig in",
"run the tests", "deep review", "check what this breaks downstream"). --checkout
only works with --pr / --base / --commit, not --uncommitted/--staged,
and it gives panelists write/exec access in a throwaway worktree — strictly less
safe than the default. Surface that tradeoff briefly to the user before opting
in if they didn't ask for it explicitly.
-
Run the script (path: skills/panel-review/panel-review.sh). You MUST
launch it as a background Bash (Bash tool with run_in_background: true)
and poll progress with BashOutput on the returned bash_id every 10
seconds until every panelist has emitted its done (exit N) heartbeat.
This skill explicitly overrides the default harness guidance that says
"do not poll background tasks — you'll be notified when they complete."
That guidance is wrong for this workflow: the heartbeats and per-section
streaming exist precisely so the coordinator can give the user live
progress, and waiting for the single completion notification defeats that.
Poll every 10 seconds. Do not back off to 30s/60s/90s — those long sleeps
are the symptom of obeying the wrong guidance.
Do NOT launch the script via the Agent tool / TaskCreate / any
subagent mechanism. Subagents run in a separate Claude context and there
is no streaming-output API for in-flight subagents — the only thing the
parent sees is the subagent's final response when it terminates. The
script's stderr heartbeats become invisible, the per-section streaming is
useless, and progress polling silently degrades into hacks like
sleep 90 && grep panel-review: tasks/<id>.output | tail against the
subagent's transcript file. If you catch yourself reaching for the Agent
tool here, stop and use background Bash instead.
Do NOT poll by shelling out to sleep N && grep against any output
file. BashOutput is the only correct progress-polling mechanism for
this script — it returns new stdout/stderr since the last call, including
the stderr heartbeats, with no parsing required.
Reason all of this matters: panelists run in parallel, but Codex is slow
and dominates wall clock. Without backgrounded Bash + BashOutput, the
call blocks silently for minutes and the user sees nothing.
If you absolutely cannot use run_in_background (rare — usually only when
the harness lacks BashOutput), run it in the foreground and pass
timeout: 600000 (10 min) since the default 2-minute Bash timeout will
kill the call before Codex returns. You will lose live progress in this
mode; warn the user.
-
Set up live progress UX. Right before (or right after) launching the script:
- Call
TodoWrite with one todo per chosen panelist (Review: codex,
Review: claude, …) plus a final Synthesize findings todo. Initial
statuses: the first panelist's in_progress, the rest pending. (Some
harnesses expose this as TaskCreate / TaskUpdate instead — use whichever
todo-list tool your environment provides.)
- Each
BashOutput poll: scan stderr for panel-review: <name> started
and panel-review: <name> done (exit N). On a started, set that
panelist's todo to in_progress (re-call TodoWrite with the updated list).
On a done, set it to completed, post a single-line user-visible
status (✓ <name> done — N findings, top severity <SEV> or ✓ <name> — NO_FINDINGS / ✗ <name> failed (exit N)), and immediately post that
panelist's full ## <name> (exit N) section to chat as soon as it
appears in stdout. Do not wait until every panelist is done — surfacing
each section the moment it lands gives the user actionable findings 5–10
minutes before synthesis is possible. The synthesis step still adds value
by deduping/ranking across panelists; it is not a substitute for the
individual sections.
- After the last
done heartbeat, set Synthesize findings to in_progress,
proceed to steps 7–8, then mark it completed.
-
Read the script's combined output — it prints one section per panelist with their
raw findings, plus a tempdir path containing each panelist's stdout/stderr. Wait for
all panelists to finish before synthesizing; partial output is fine to show the
user during the wait, but consensus / disagreement analysis needs every panelist's
verdict.
-
Synthesize the findings in your reply to the user:
- Consensus — issues raised by 2+ panelists, deduplicated. List file:line + the
core problem and a suggested fix.
- Unique findings — per panelist, only the findings no one else mentioned that
still pass the "would a competent reviewer ask for this change" bar.
- Action list — must-fix → should-fix → optional polish.
- Disagreements — if panelists contradict each other, surface that explicitly
rather than picking a side.
-
Don't paraphrase or invent. Surface what the panelists actually said. If a
panelist returned NO_FINDINGS, note it; don't drop the panelist from the report.