subagent-driven-codex
// Use when executing an implementation plan in this session, sequentially, by routing every implementer and reviewer dispatch through Codex CLI instead of Claude subagents. Two-stage review per task.
| name | subagent-driven-codex |
| description | Use when executing an implementation plan in this session, sequentially, by routing every implementer and reviewer dispatch through Codex CLI instead of Claude subagents. Two-stage review per task. |
Execute a plan by dispatching every role — implementer, spec reviewer, quality reviewer — to Codex CLI via the bridge script, not to Claude subagents. The orchestrator (you) stays in this session, plans tasks, ferries context between Codex sessions, and gates progress on the same two-stage review used by subagent-driven.
Core principle: One Codex SESSION_ID per role per task + per-agent planning dir + two-stage review (spec then quality) = high-quality delegation to a second model, all from a single Claude session.
Announce at start: "I'm using the subagent-driven-codex skill — every implementer and reviewer dispatch will go to Codex via the bridge."
subagent-driven dispatches Claude subagents through the Task tool. subagent-driven-codex is the same workflow, but every dispatch goes to Codex through ${CLAUDE_PLUGIN_ROOT}/skills/collaborating-with-codex/scripts/codex_bridge.py.
Use this variant when you want Codex CLI, not Claude subagents, to do the implementation and review work: a second model's perspective on every task, billed to OpenAI Codex credits rather than Claude credits.
Stick with plain subagent-driven when the task needs the Claude subagents' richer tooling (Skill tool, Agent forks) or when interactive back-and-forth with the orchestrator beats Codex's batch model. Codex CLI does support MCP if the user has it configured — that is not a reason to avoid Codex.
Before passing any prompt body to codex_bridge.py, the orchestrator MUST resolve ${CLAUDE_PLUGIN_ROOT} to an absolute path and substitute every occurrence in the rendered prompt.
# In the orchestrator's shell (Claude side):
PLUGIN_ROOT="$(realpath "${CLAUDE_PLUGIN_ROOT:-/path/to/superpower-planning}")"
# Render template → final prompt with absolute paths:
sed "s|\${CLAUDE_PLUGIN_ROOT}|${PLUGIN_ROOT}|g" /tmp/codex_<role>_<task>.tpl \
> /tmp/codex_<role>_<task>.txt
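A quick guard after the render keeps an unrendered prompt from ever reaching Codex. This is a minimal sketch; the helper name is illustrative:

```shell
# Fail fast if the literal placeholder survived the sed pass.
assert_rendered() {  # $1 = rendered prompt file
  if grep -q '\${CLAUDE_PLUGIN_ROOT}' "$1"; then
    echo "error: unrendered \${CLAUDE_PLUGIN_ROOT} in $1" >&2
    return 1
  fi
}
```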
If CLAUDE_PLUGIN_ROOT is not set in the orchestrator shell either, fall back to the plugin's installed path (commonly ~/.claude/plugins/cache/superpower-planning/superpower-planning/<version>/ or wherever this skill file itself lives — dirname from this SKILL.md's known-good path is reliable).
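The fallback logic can be sketched as follows; `SKILL_MD_PATH` is a stand-in name for however you locate this SKILL.md, not an existing variable:

```shell
# Prefer CLAUDE_PLUGIN_ROOT; otherwise derive the root from a known-good
# SKILL.md path (SKILL_MD_PATH is a hypothetical stand-in).
resolve_plugin_root() {
  if [ -n "${CLAUDE_PLUGIN_ROOT:-}" ]; then
    realpath "${CLAUDE_PLUGIN_ROOT}"
  else
    dirname "${SKILL_MD_PATH:?no plugin root and no SKILL.md path known}"
  fi
}
```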
Empirical evidence: a probe asking Codex to read ${CLAUDE_PLUGIN_ROOT}/skills/.../findings.md returned cat: 'No such file or directory' and CLAUDE_PLUGIN_ROOT=''. Codex treats the placeholder as literal text.
Every task MUST pass two independent Codex reviews before it can be marked complete:
1. Spec review: ./spec-reviewer-prompt.md
2. Quality review: ./quality-reviewer-prompt.md (only after spec review passes)

A task is NOT complete until BOTH reviews return APPROVED. The Task Status Dashboard in .planning/progress.md has Spec Review, Quality Review, and Plan Align columns. All three MUST show PASS before status can be complete.
Each review loop is capped at 3 fix-review rounds per task.
The initial review does not count as a round. A "round" is one fix-then-re-review cycle: initial review → fix → re-review (round 1) → fix → re-review (round 2) → fix → re-review (round 3) → STOP.
After 3 rounds without approval, STOP and escalate to the user.
Track round count in the Task Status Dashboard (e.g. FAIL (round 2/3)).
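The cap bookkeeping can live in a small counter file; this sketch assumes one counter file per reviewer per task (the file name is illustrative):

```shell
# Increment the round counter for one reviewer+task; refuse to pass round 3.
next_round() {  # $1 = counter file, e.g. .planning/agents/spec-reviewer/rounds_task2.txt
  n="$(cat "$1" 2>/dev/null || echo 0)"
  n=$((n + 1))
  if [ "$n" -gt 3 ]; then
    echo "round cap hit: escalate to the user" >&2
    return 1
  fi
  echo "$n" > "$1"
  echo "$n"
}
```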
digraph when_to_use {
"Have implementation plan?" [shape=diamond];
"Tasks mostly independent?" [shape=diamond];
"Stay in this session?" [shape=diamond];
"Want Codex as executor?" [shape=diamond];
"subagent-driven-codex" [shape=box];
"subagent-driven" [shape=box];
"executing-plans" [shape=box];
"Manual execution or brainstorm first" [shape=box];
"Have implementation plan?" -> "Tasks mostly independent?" [label="yes"];
"Have implementation plan?" -> "Manual execution or brainstorm first" [label="no"];
"Tasks mostly independent?" -> "Stay in this session?" [label="yes"];
"Tasks mostly independent?" -> "Manual execution or brainstorm first" [label="no - tightly coupled"];
"Stay in this session?" -> "Want Codex as executor?" [label="yes"];
"Stay in this session?" -> "executing-plans" [label="no - parallel session"];
"Want Codex as executor?" -> "subagent-driven-codex" [label="yes"];
"Want Codex as executor?" -> "subagent-driven" [label="no - use Claude subagents"];
}
Each role gets ONE directory, reused across all tasks:
mkdir -p .planning/agents/implementer/
mkdir -p .planning/agents/spec-reviewer/
mkdir -p .planning/agents/quality-reviewer/
Each directory contains:
- findings.md — discoveries, decisions, critical items (appended across tasks)
- progress.md — step-by-step progress log (appended across tasks)
- session.txt — current Codex SESSION_ID for this role for the current task only (overwritten when the next task starts)
- base_sha_taskN.txt — git HEAD captured before the implementer ran (one file per task; reviewers need it for diff)

findings.md and progress.md are role-persistent: the same Codex implementer keeps appending what it learned across all tasks. session.txt is per-task: each new task gets a fresh Codex SESSION_ID for each role, written by the first dispatch of that role for that task and reused only for fix-rounds within the same task.
This skill defaults to sticky reviewers: the spec reviewer and quality reviewer for a given task keep the same SESSION_ID across re-review rounds. The original subagent-driven instead spins up a fresh subagent per round.
Pick deliberately: sticky preserves the reviewer's context across rounds; fresh gives each round unbiased eyes.
To switch a reviewer to fresh mode for a particular task, simply do not write its SESSION_ID to session.txt and dispatch each round as an initial call. Note the choice in .planning/progress.md so the user can see which mode was used.
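One way to implement the choice, assuming the session-file layout above (the helper name is illustrative):

```shell
# Print the extra bridge args for a re-review round. Sticky mode with a
# recorded session reuses it; anything else is dispatched as an initial call.
session_args() {  # $1 = role, $2 = sticky|fresh
  sess=".planning/agents/$1/session.txt"
  if [ "$2" = "sticky" ] && [ -s "$sess" ]; then
    printf -- '--SESSION_ID %s' "$(cat "$sess")"
  fi
}
```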
When extracting tasks from plan.md to dispatch to Codex:
- Copy the task's full text verbatim from plan.md, no paraphrase or summary
- Note which section of plan.md contains this task (e.g. ### Task 3: Recovery modes)
- Reference .planning/plan.md and .planning/design.md so Codex can cross-reference originals

Verbatim copying + plan references let Codex and reviewers verify against the source of truth.
The orchestrator's shell is responsible for resolving ${CLAUDE_PLUGIN_ROOT} (see Render Step). Below, ${PLUGIN_ROOT} is the already-resolved absolute path. ${WORKSPACE} is the absolute workspace path. ${SLUG} is a per-task stable slug used to namespace /tmp files. Recommended value: "$(basename "${WORKSPACE}")". Do NOT use $$ (the bash PID) — every Bash tool call is a fresh shell with a different PID, so a PID-based SLUG would not match between dispatch and fix-round. If you genuinely need cross-worktree disambiguation, persist the slug to .planning/agents/slug.txt once at task start and read it back on every dispatch.
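A minimal sketch of the persisted-slug option (the helper name is illustrative):

```shell
# Write the slug once at task start; every later dispatch reads it back,
# so the value survives across fresh Bash tool shells.
stable_slug() {  # $1 = absolute workspace path
  slug_file=".planning/agents/slug.txt"
  if [ ! -s "$slug_file" ]; then
    mkdir -p "$(dirname "$slug_file")"
    basename "$1" > "$slug_file"
  fi
  cat "$slug_file"
}
```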
Initial dispatch (any role) — capture SESSION_ID from output:
python3 "${PLUGIN_ROOT}/skills/collaborating-with-codex/scripts/codex_bridge.py" \
--cd "${WORKSPACE}" \
--PROMPT "$(cat /tmp/codex_${SLUG}_<role>_<task>.txt)" \
> /tmp/codex_${SLUG}_<role>_<task>.json 2>&1 &
Required: run_in_background: true on the Bash call. Read the .json after the bridge returns and extract SESSION_ID and agent_messages.
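The extraction can be done with the stdlib alone; the field names (success, SESSION_ID) follow this skill's text, and the exact schema is the bridge's to define:

```shell
# Pull SESSION_ID out of the bridge's JSON output, failing loudly if the
# run did not succeed (schema assumed from this skill's description).
extract_session_id() {  # $1 = /tmp/codex_..._task.json
  python3 - "$1" <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    d = json.load(f)
if not d.get("success"):
    sys.exit("codex run failed; inspect the JSON before retrying")
print(d["SESSION_ID"])
PY
}
```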
Follow-up (fix-round, re-review, etc.):
python3 "${PLUGIN_ROOT}/skills/collaborating-with-codex/scripts/codex_bridge.py" \
--cd "${WORKSPACE}" \
--SESSION_ID "$(cat .planning/agents/<role>/session.txt)" \
--PROMPT "$(cat /tmp/codex_${SLUG}_<role>_<task>_round<N>.txt)" \
> /tmp/codex_${SLUG}_<role>_<task>_round<N>.json 2>&1 &
Non-git workspace: add --skip-git-repo-check if ${WORKSPACE} isn't a git repository. Most plan-driven runs are git-tracked, so this flag is rarely needed.
Heavy debug trace (rare — when reasoning steps matter): add --return-all-messages.
Sandbox: the bridge defaults to danger-full-access. --sandbox read-only is rejected by the bridge, and workspace-write silently downgrades on hosts without bubblewrap (Ubuntu 24.04+ commonly fails the bwrap probe). The practical effect is that every reviewer Codex has filesystem write access. Restrict review behavior through the prompt body ("do not modify files; return findings only"). Treat any modification by a reviewer as a review-invalidating event — see Hard Requirement #9.
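A cheap way to detect such an event is to ask git whether anything changed while the reviewer ran; this sketch assumes the implementer left a clean tree at its last sanctioned commit:

```shell
# True (exit 0) if the working tree or index changed, including new
# untracked files, since the last commit.
reviewer_left_changes() {  # $1 = workspace path
  [ -n "$(git -C "$1" status --porcelain)" ]
}
```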
After installing the plugin, verify the bridge actually runs from this skill's path before sending a real implementer task. From the orchestrator shell:
PLUGIN_ROOT="$(realpath "${CLAUDE_PLUGIN_ROOT}")"
python3 "${PLUGIN_ROOT}/skills/collaborating-with-codex/scripts/codex_bridge.py" \
--cd "$(pwd)" \
--skip-git-repo-check \
--PROMPT "Reply with the literal string 'codex bridge ok' and nothing else." \
> /tmp/codex_smoke.json 2>&1 &
# (run in background, then check /tmp/codex_smoke.json for success: true and the literal reply)
If the smoke test fails, fix the bridge or auth before dispatching real tasks — review loops are far harder to diagnose mid-flight.
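A sketch of that check, assuming the JSON fields named in this skill (success, agent_messages):

```shell
# Verify the smoke-test JSON carries both success and the expected literal reply.
smoke_ok() {  # $1 = /tmp/codex_smoke.json
  python3 - "$1" <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    d = json.load(f)
msgs = " ".join(str(m) for m in d.get("agent_messages", []))
sys.exit(0 if d.get("success") and "codex bridge ok" in msgs else 1)
PY
}
```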
digraph process {
rankdir=TB;
"Read plan, extract all tasks with full text, note context, create tasks via TaskCreate" [shape=box];
"Per-task loop" [shape=box style=filled fillcolor=lightyellow];
"Plan Alignment Gate" [shape=box];
"Final code review (Codex)" [shape=box];
"Use superpower-planning:finishing-branch" [shape=box style=filled fillcolor=lightgreen];
"Read plan, extract all tasks with full text, note context, create tasks via TaskCreate" -> "Per-task loop";
"Per-task loop" -> "Plan Alignment Gate";
"Plan Alignment Gate" -> "Final code review (Codex)";
"Final code review (Codex)" -> "Use superpower-planning:finishing-branch";
}
Per-task loop (every task runs through this):
1. Ensure .planning/agents/{implementer,spec-reviewer,quality-reviewer}/ exist. Record current git HEAD as the task's base SHA: git -C "${WORKSPACE}" rev-parse HEAD > .planning/agents/base_sha_taskN.txt. Reviewers will use this to scope their git diff to this task only.
2. Read ./implementer-prompt.md, fill in placeholders ({{N}}, {{task_name}}, {{FULL_TEXT_OF_TASK}}, etc.), THEN run the Render Step (substitute ${CLAUDE_PLUGIN_ROOT} with the absolute plugin root). Write the final body to /tmp/codex_${SLUG}_implementer_taskN.txt.
3. Dispatch the implementer via codex_bridge.py (background). Capture SESSION_ID from the JSON output and write it to .planning/agents/implementer/session.txt. Output lands in /tmp/codex_${SLUG}_implementer_taskN.json.
4. Run git diff $(cat .planning/agents/base_sha_taskN.txt)..HEAD yourself and confirm the changes look reasonable before invoking reviewers. Capture the head SHA: git -C "${WORKSPACE}" rev-parse HEAD > .planning/agents/head_sha_taskN.txt. Pass both base and head into reviewer prompts.
5. Dispatch ./spec-reviewer-prompt.md (fresh Codex session — new SESSION_ID stored in .planning/agents/spec-reviewer/session.txt). Run the Render Step before writing the prompt body.
6. Dispatch ./quality-reviewer-prompt.md once spec PASSes (fresh Codex session, new SESSION_ID in .planning/agents/quality-reviewer/session.txt).
7. Run aggregate-agent-findings.sh for each role, update the Task Status Dashboard, mark the task complete via TaskUpdate.

After all tasks: Plan Alignment Gate (re-read plan.md/design.md, check for cumulative drift), then a final whole-implementation Codex review, then finishing-branch.
After each task passes both reviews, aggregate Codex's findings:
${CLAUDE_PLUGIN_ROOT}/scripts/aggregate-agent-findings.sh "<role>" "Task N: <name>"
This extracts "Critical for Orchestrator" items from each role's findings.md and appends them to top-level .planning/findings.md and .planning/progress.md. Then manually:
Example aggregation:
<!-- .planning/findings.md -->
## Task 2: Recovery modes (Codex-driven)
- [From implementer/codex] Database migration requires careful ordering
- [From spec-reviewer/codex] All requirements met after fix pass
- [From quality-reviewer/codex] Approved with no issues
<!-- .planning/progress.md Task Status Dashboard -->
| Task | Status | Spec Review | Quality Review | Plan Align | Agent Dir | Notes |
|---|---|---|---|---|---|---|
| Task 1: Hook installation | ✅ complete | PASS | PASS | PASS | agents/implementer/ | 5 tests passing (codex) |
| Task 2: Recovery modes | ✅ complete | PASS (2nd pass) | PASS | PASS | agents/implementer/ | 8 tests passing (codex) |
| Task 3: Config parser | ⏳ pending | - | - | - | - | - |
- ./implementer-prompt.md — body fed to Codex for implementation work (initial + fix-rounds)
- ./spec-reviewer-prompt.md — body fed to Codex for spec compliance review (initial + re-reviews)
- ./quality-reviewer-prompt.md — body fed to Codex for code quality review (initial + re-reviews)

Each template explains exactly what to render, where Codex's output lands, and how to feed follow-up rounds.
Things subagent-driven does that change for Codex:
| Concern | subagent-driven (Claude) | subagent-driven-codex |
|---|---|---|
| Dispatch | Task tool, general-purpose subagent | codex_bridge.py background bash |
| Tooling inside agent | Skill tool, Agent forks, MCP | Codex CLI's native tools — no Skill tool, no Agent fork. MCP works if the user has it configured for Codex. |
| Reading plugin instructions | Read of ${CLAUDE_PLUGIN_ROOT}/... paths | Same — BUT all ${CLAUDE_PLUGIN_ROOT} references in the prompt MUST be substituted with the absolute path before sending; Codex treats placeholders literally. See Render Step. |
| Multi-turn within a task | Same subagent invocation re-dispatched | Same Codex SESSION_ID re-used (per task; overwritten when next task starts) |
| Asking the orchestrator a question | Subagent text reply | Codex agent_messages text — read it and follow up via SESSION_ID |
| "Critical for orchestrator" markers | Subagent writes to findings.md | Same — Codex writes to the same findings.md paths because it can edit files |
| Cost model | Claude credits | OpenAI Codex credits |
You: I'm using subagent-driven-codex to execute this plan.
[Read .planning/plan.md once, extract all 5 tasks verbatim, TaskCreate]
[Resolve PLUGIN_ROOT="$(realpath "${CLAUDE_PLUGIN_ROOT}")"]
[Set WORKSPACE="<absolute workspace path>", SLUG="$(basename "${WORKSPACE}")"]
[Run smoke test once before the first real dispatch]
Task 1: Hook installation script
[mkdir -p .planning/agents/{implementer,spec-reviewer,quality-reviewer}]
[Capture base SHA: git -C "${WORKSPACE}" rev-parse HEAD > .planning/agents/base_sha_task1.txt]
[Render: fill placeholders in ./implementer-prompt.md → write implementer-task1.tpl
then sed-substitute ${CLAUDE_PLUGIN_ROOT} → /tmp/codex_${SLUG}_implementer_task1.txt]
[Verify: grep '\${CLAUDE_PLUGIN_ROOT}' /tmp/codex_${SLUG}_implementer_task1.txt is empty]
[Dispatch implementer in background:
python3 "${PLUGIN_ROOT}/skills/collaborating-with-codex/scripts/codex_bridge.py" \
--cd "${WORKSPACE}" \
--PROMPT "$(cat /tmp/codex_${SLUG}_implementer_task1.txt)" \
> /tmp/codex_${SLUG}_implementer_task1.json 2>&1 &]
[After notification: parse JSON; write SESSION_ID to .planning/agents/implementer/session.txt]
Codex (implementer): "Before I begin — should the hook be installed at user or system level?"
You: "User level (~/.config/superpowers/hooks/)."
[Render follow-up prompt → /tmp/codex_${SLUG}_implementer_task1_round1.txt]
[Dispatch with --SESSION_ID "$(cat .planning/agents/implementer/session.txt)" in background]
Codex: "Got it. Implementing now…"
[Codex edits files, runs tests, commits sha abcd123]
Codex final reply: Implemented, 5/5 tests pass, self-review caught --force flag, logged to findings.md.
[Verify: git diff "$(cat .planning/agents/base_sha_task1.txt)"..HEAD, run tests yourself]
[Capture head SHA: git -C "${WORKSPACE}" rev-parse HEAD > .planning/agents/head_sha_task1.txt]
[Render spec reviewer prompt with base_sha and head_sha filled in →
/tmp/codex_${SLUG}_specrev_task1.txt (run sed substitution as before)]
[Dispatch fresh Codex session; capture SESSION_ID into .planning/agents/spec-reviewer/session.txt]
Codex (spec reviewer): Verdict PASS — spec compliant.
[Render quality reviewer prompt → /tmp/codex_${SLUG}_qualrev_task1.txt]
[Dispatch fresh Codex session; capture SESSION_ID into .planning/agents/quality-reviewer/session.txt]
Codex (quality reviewer): Verdict APPROVED.
[Run aggregate-agent-findings.sh implementer "Task 1: Hook installation"]
[Run aggregate-agent-findings.sh spec-reviewer "Task 1: Hook installation"]
[Run aggregate-agent-findings.sh quality-reviewer "Task 1: Hook installation"]
[TaskUpdate Task 1 → completed]
Task 2: Recovery modes
[Capture fresh base SHA into .planning/agents/base_sha_task2.txt]
[session.txt for each role gets overwritten with the new task's SESSION_IDs]
[Same flow — but spec reviewer finds 2 issues; render fix prompt, dispatch on
implementer's SESSION_ID; then re-render reviewer round1 prompt and dispatch
on spec-reviewer's SESSION_ID (sticky). Max 3 rounds.]
…
vs. subagent-driven (Claude subagents):
vs. executing-plans (parallel session):
Quality gates:
- Record agent_messages size and SESSION_IDs.

Never:
- Run codex_bridge.py in the foreground (freezes session — see Hard Requirements).
- Merge to main/master without explicit user consent.
- Send plan.md blind — provide full task text in the prompt and reference the path for cross-check.
- Pass --model or --profile unless the user explicitly named one.

If Codex asks questions:
If reviewer Codex finds issues:
If a Codex run fails (exit non-zero, JSON success: false):
If a reviewer modifies files (Hard Requirement #9):
- git stash or revert the unauthorized changes (git checkout -- <paths>) so the workspace returns to the implementer's last sanctioned commit.
- Record the event in .planning/findings.md so the user can audit later.

Required workflow skills:
- superpower-planning:collaborating-with-codex — provides the bridge script. This skill cannot work without it.
- superpower-planning:git-worktrees — RECOMMENDED: set up isolated workspace unless already on a feature branch (Codex commits into the workspace it sees).
- superpower-planning:writing-plans — creates the plan this skill executes.
- superpower-planning:requesting-review — code review template referenced by reviewer prompts.
- superpower-planning:finishing-branch — complete development after all tasks.

Codex follows internally:
Alternative workflows:
- superpower-planning:subagent-driven — same flow, Claude subagents instead of Codex.
- superpower-planning:team-driven — parallel execution via Agent Team.
- superpower-planning:executing-plans — parallel session with batched human checkpoints.