package.json

"author": "peteromallet"

"repository": "peteromallet/megaplan"

name	megaplan
description	AI agent harness for coordinating Claude and GPT to make and execute extremely robust plans.

Megaplan

Scope: This skill covers tooling — how to invoke and drive megaplan. For the decisions that come before invocation (scoping, brief, profile, robustness, depth), consult the megaplan-decision skill. If anything here contradicts megaplan-decision on decision-making content, megaplan-decision wins.

Route every step through the megaplan CLI. Never call agents directly. Before the first CLI call, resolve a working launcher and reuse it for the whole run. Do not assume megaplan itself is on PATH; command presence alone is not enough. Prove the launcher works by successfully running a harmless CLI call with it first. In the instructions below, treat <launcher> as that verified command. Launcher resolution order:

Try python -m megaplan config show.
If that fails, try ./.venv/bin/python -m megaplan config show.
If that fails, try uv run python -m megaplan config show.
If that fails, try a version-selected shim such as PYENV_VERSION=3.11.11 megaplan config show.
Only use bare megaplan ... if that exact form already succeeded during this check.
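
The resolution order above can be sketched in Python as a first-success probe. This is a minimal illustration, not part of megaplan: `CANDIDATES`, `probe`, and `resolve_launcher` are hypothetical names, and the 60-second timeout is an assumption.

```python
import os
import subprocess

# Candidate launchers in the documented order: (extra env vars, argv prefix).
CANDIDATES = [
    ({}, ["python", "-m", "megaplan"]),
    ({}, ["./.venv/bin/python", "-m", "megaplan"]),
    ({}, ["uv", "run", "python", "-m", "megaplan"]),
    ({"PYENV_VERSION": "3.11.11"}, ["megaplan"]),
]


def probe(env_extra, argv):
    """Return True only if `<launcher> config show` exits with status 0."""
    try:
        result = subprocess.run(
            argv + ["config", "show"],
            env={**os.environ, **env_extra},
            capture_output=True,
            timeout=60,  # assumed limit; tune for slow environments
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hung call: this candidate is not usable.
        return False
    return result.returncode == 0


def resolve_launcher(candidates=CANDIDATES, probe_fn=probe):
    """Return the first (env, argv) pair whose harmless call succeeds, else None."""
    for env_extra, argv in candidates:
        if probe_fn(env_extra, argv):
            return env_extra, argv
    return None
```

The probe is injectable so the ordering logic can be exercised without megaplan installed. Reuse the returned pair for every later call in the run.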

Triage

Decision-making (scoping, profile, robustness, depth, brief structure) lives in the megaplan-decision skill — consult it before running megaplan init. Always run megaplan, even for tiny work — bare robustness is the floor, never skip the harness. The few seconds of overhead pay back in the captured brief, plan, and outcome record.

Modes

Megaplan has two output modes, picked with --mode at init:

--mode code (default): the run produces a code diff. Execute workers emit per-task file changes. Use for features, refactors, bug fixes, migrations — anything whose deliverable is source code.
--mode metaplan (alias: --mode doc): the run produces a single document artifact at --output <relative/path> (e.g. docs/design.md). The prep, execute, and review phases use authoring-specific prompts; the execute schema uses sections_written instead of file changes; auditing reasons about section delivery. Use for design docs, architecture specs, research notes, RFCs, proposals, post-mortems, migration plans — anything whose deliverable is prose, not code. This is the "design-first / preplan" workflow; prep is the visible repository-investigation phase inside every run (both modes have it), not a separate mode.

--from-doc <relative/path> works with either mode. The path must be relative to --project-dir, must stay inside that directory, and must point to an existing file. When present, init imports any ## Settled Decisions section from that prior doc artifact and stores the source path for later planning and execution context.

All other flags (--robustness, --auto-approve, --phase-model, --hermes, subagent mode, overrides, step editing) behave identically in both modes. The workflow phases are the same: prep → plan → critique → gate → revise → finalize → execute → review.

A common pattern is two runs: first --mode metaplan to produce a rigorous design document, then --mode code --from-doc docs/design.md on a new idea that references that document to implement it.

--mode and --output go together. init rejects --output without --mode metaplan (error invalid_args), and rejects --mode metaplan without --output. Don't try to pass one without the other.
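
A caller can catch a bad flag pairing before invoking init. The following is a rough sketch of the documented rule only; `check_mode_output` is a hypothetical helper and does not reproduce megaplan's own validation code or error payloads.

```python
from typing import Optional

# `doc` is documented as an alias for `metaplan`.
DOC_MODES = {"metaplan", "doc"}


def check_mode_output(mode: str = "code", output: Optional[str] = None) -> None:
    """Raise ValueError when --mode and --output are not passed together."""
    is_doc_mode = mode in DOC_MODES
    if output is not None and not is_doc_mode:
        # The source names this error `invalid_args`.
        raise ValueError("invalid_args: --output requires --mode metaplan")
    if is_doc_mode and output is None:
        raise ValueError("--mode metaplan requires --output <relative/path>")
```

For example, `check_mode_output("metaplan", "docs/design.md")` passes silently, while `check_mode_output("code", "docs/design.md")` raises.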

Working tree default

Default to building on top of any existing uncommitted changes in the working tree, not stashing or resetting them. The plan author should treat the dirty tree as in-progress context the new work composes with. Only deviate when the existing changes directly contradict what the new plan needs to do — and then flag the conflict explicitly rather than silently overwriting.

Start

Run <launcher> config show before init. If raw_config.execution.auto_approve is explicitly present, do not ask the execution-mode question and honor that configured override, including configured false. If that raw key is absent, ask execution mode (auto-approve or review) before init. In the same config check, respect execution.robustness as a settable override when it is configured; otherwise pick robustness yourself per the megaplan-decision skill.

<launcher> init --project-dir "$PROJECT_DIR" [--auto-approve] [--robustness bare|light|full|thorough|extreme] [--mode code|metaplan] [--output docs/foo.md] [--from-doc docs/prior.md] "$IDEA"

Legacy robustness names (tiny|standard|robust|superrobust) are still accepted on the CLI and in stored config, where they map to bare|full|thorough|extreme respectively, but new plans should use the canonical names.

For metaplan-mode runs, pass --mode metaplan --output <relative/path> (the path is where the final document artifact is written, relative to the project dir). Everything else is identical to code mode.

Pass --from-doc <relative/path> when the new run should inherit decisions from a prior doc artifact. The path must be relative to the project dir, must exist as a file, and can be used with either --mode code or --mode metaplan. When the source doc contains a ## Settled Decisions section, megaplan imports those decisions and automatically promotes them into success criteria for the new plan: load_bearing: true decisions become must criteria and load_bearing: false decisions become info criteria.

Settled Decisions Section Format

When authoring a doc artifact that makes design decisions, use either of these canonical markdown shapes (the parser accepts both):

Bold-dash inline shape (preferred for short decisions):

## Settled Decisions

- **SD-001** — Keep the current storage model. _load_bearing: true_
  Rationale: External integrations depend on it.
- **SD-002** — Default model is claude-sonnet-4-6. _load_bearing: false_
  Rationale: Balance of speed and capability.

YAML-ish shape (preferred when decisions need more fields):

## Settled Decisions
- id: SD-001
  load_bearing: true
  decision: Keep the current storage model
  rationale: External integrations depend on it.

Use one list item per decision. Keep the SD-NNN convention (SD- prefix plus a number), store load_bearing as true or false, and indent continuation lines by two spaces beneath the list-item marker.

Report the plan name, execution mode, robustness, mode (and --output path when metaplan mode), current state, and next step.
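
To make the promotion rule concrete, here is a minimal sketch that reads the YAML-ish shape and maps load_bearing onto criterion priorities. It is not megaplan's parser: both function names are hypothetical, the bold-dash shape is not handled, and using the decision text as the criterion text is an assumption.

```python
from typing import Dict, List


def parse_settled_decisions(markdown: str) -> List[Dict[str, str]]:
    """Collect `key: value` fields from each list item under ## Settled Decisions."""
    decisions: List[Dict[str, str]] = []
    in_section = False
    for line in markdown.splitlines():
        if line.startswith("#"):
            # Any heading opens or closes the section.
            in_section = line.strip() == "## Settled Decisions"
            continue
        if not in_section or not line.strip():
            continue
        if line.startswith("- "):
            decisions.append({})  # a new list item starts a new decision
            body = line[2:]
        elif line.startswith("  ") and decisions:
            body = line.strip()  # two-space continuation line
        else:
            continue
        key, sep, value = body.partition(":")
        if sep:
            decisions[-1][key.strip()] = value.strip()
    return decisions


def to_success_criteria(decisions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """load_bearing: true becomes `must`; load_bearing: false becomes `info`."""
    return [
        {
            "criterion": d.get("decision", ""),
            "priority": "must" if d.get("load_bearing") == "true" else "info",
        }
        for d in decisions
    ]
```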

Workflow

Run the loop in this order:

prep
plan
critique
gate
revise when gate recommends iteration
finalize
execute
review

Use next_step and valid_next for CLI routing. After gate, follow orchestrator_guidance instead of manually interpreting gate signals. When a response includes next_step_runtime, use its duration_hint and recommended_next_check_seconds to calibrate timing.

The loop depends on the robustness level:

--robustness bare: plan → finalize → execute. There is no prep, no critique, no gate, and no review.
--robustness light: plan → critique → revise → finalize → execute. There is no prep, no gate, and no review.
--robustness full: prep → plan → critique → gate → ...
--robustness thorough: also prep → plan → critique → gate → ..., but with 8 critique checks instead of 4 and parallel critique enabled.
--robustness extreme: the same as thorough, with parallel review also enabled.
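
As a quick reference, the per-level loops can be written as a lookup table. This is an illustrative sketch only: `LOOPS`, `LEGACY_NAMES`, and `phases_for` are hypothetical names, and the full sequence is taken from the workflow order stated in the Modes section. At runtime, next_step and valid_next remain the source of truth, and revise only runs when the gate recommends iteration.

```python
# Legacy CLI names and the canonical names they map to.
LEGACY_NAMES = {
    "tiny": "bare",
    "standard": "full",
    "robust": "thorough",
    "superrobust": "extreme",
}

# Full workflow order as documented for both modes.
FULL_LOOP = ["prep", "plan", "critique", "gate", "revise", "finalize", "execute", "review"]

LOOPS = {
    "bare": ["plan", "finalize", "execute"],
    "light": ["plan", "critique", "revise", "finalize", "execute"],
    "full": FULL_LOOP,
    "thorough": FULL_LOOP,  # differs from full in critique checks, not phases
    "extreme": FULL_LOOP,   # differs from thorough in parallel review, not phases
}


def phases_for(robustness):
    """Return the nominal phase order for a canonical or legacy robustness name."""
    canonical = LEGACY_NAMES.get(robustness, robustness)
    return list(LOOPS[canonical])
```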

Step Rules

plan: inspect the repository first; produce the plan plus questions, assumptions, and success_criteria. Each criterion is {"criterion": "...", "priority": "must|should|info"}. must = hard gate (reviewer blocks), should = quality target (reviewer flags but doesn't block), info = human reference (reviewer skips).
prep: make repository investigation explicit before planning. Respect skip: true when the task is already concrete enough.
critique: surface concrete flags with concern, evidence, category, and severity; reuse open flag IDs; call out scope creep. Also validate that success criteria priorities are well-calibrated — must criteria should be verifiable yes/no, subjective goals should be should.
gate: read the response, warnings, and orchestrator_guidance. (Skipped at bare and light robustness.)
revise: show the delta, flags addressed, and flags remaining. At light robustness, routes to finalize; otherwise loops back through critique and gate.
review: judge success against the success criteria and the user's intent, not plan elegance. Only block on must criteria failures. should failures are flagged but don't require rework. info criteria are waived.
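
The priority semantics can be illustrated with a small sketch. The criterion and priority fields come from the schema above; the `passed` field and the `summarize_review` helper are hypothetical and exist only for this example.

```python
def summarize_review(results):
    """Split failed criteria into blocking (`must`) and flagged (`should`).

    `results` is a list of dicts with `criterion`, `priority`, and a
    hypothetical boolean `passed`. `info` criteria are never evaluated.
    """
    blocking = [r["criterion"] for r in results
                if r["priority"] == "must" and not r["passed"]]
    flagged = [r["criterion"] for r in results
               if r["priority"] == "should" and not r["passed"]]
    return {"blocked": bool(blocking), "blocking": blocking, "flagged": flagged}
```

A failed should criterion shows up under `flagged` without setting `blocked`, which matches the rule that only must failures require rework.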

Gate Principle

The gate response tells the orchestrator what to do next. Follow orchestrator_guidance unless you have a concrete reason to disagree after investigating the repository or plan artifacts yourself. Investigate before disagreeing: read the current plan and critique artifacts, check the project code to verify whether a flagged issue is real, or use megaplan status --plan <name> / megaplan audit --plan <name>. If you disagree with the guidance, explain why briefly and use an override. Do not manually reinterpret score trajectory, flag quality, or loop state when the gate already did that work for you.

Execute

After a successful gate, run megaplan finalize to produce the execution-ready briefing document.
In auto-approve mode, run megaplan execute --confirm-destructive after finalize.
In review mode, pause at the finalize-to-execute checkpoint and wait for explicit approval before running:

megaplan execute --confirm-destructive --user-approved

Long-Running Execution

For plans with multiple batches, use per-batch mode to drive execution incrementally:

megaplan execute --plan <name> --confirm-destructive --user-approved --batch 1
megaplan execute --plan <name> --confirm-destructive --user-approved --batch 2
# ... continue until all batches complete

Between batches, poll progress:

megaplan progress --plan <name>

Use megaplan status --plan <name> for the full plan state, including active-step timing and any next_step_runtime guidance from the latest response. Per-batch mode uses global batch numbering (1-indexed, computed from ALL tasks). Each --batch N call:

Validates that batches 1..N-1 are complete
Executes only batch N's tasks
Writes execution_batch_N.json as evidence
On the final batch, produces aggregate execution.json and transitions to executed

Timeout recovery: re-run the same --batch N. The harness checks prerequisite completion and merges only untracked tasks.

Note: progress shows completed state only (between-batch granularity). With per-batch mode, each batch is a separate CLI call, so the orchestrator has full visibility.
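
An orchestrator driving per-batch mode needs to know which batch to run next and what command to issue. A minimal sketch, assuming the caller tracks completed batch numbers itself; `next_batch` and `execute_argv` are hypothetical helpers, not megaplan APIs.

```python
def next_batch(completed, total):
    """Return the lowest incomplete batch (1-indexed), or None when all are done.

    Picking the lowest incomplete batch keeps the 1..N-1 prerequisite rule
    satisfied, and returns the same N again after a timeout.
    """
    for n in range(1, total + 1):
        if n not in completed:
            return n
    return None


def execute_argv(plan, batch, user_approved=True):
    """Build the documented per-batch execute command as an argv list."""
    argv = ["megaplan", "execute", "--plan", plan, "--confirm-destructive"]
    if user_approved:
        argv.append("--user-approved")
    argv += ["--batch", str(batch)]
    return argv
```

Replace the leading `megaplan` with the verified launcher when running the command.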

Overrides

megaplan override add-note --plan <name> --note "..."
megaplan override force-proceed --plan <name> --reason "..."
megaplan override replan --plan <name> --reason "..." [--note "..."]
megaplan override abort --plan <name> --reason "..."

force-proceed is available from critiqued (routes to finalize, not execute). replan is available from gated, finalized, or critiqued. add-note is safe from any active state.

Replan

Use replan when the orchestrator itself needs to edit the plan directly instead of asking the revise worker to do it.

megaplan override replan --plan <name> --reason "expanding scope" --note "Also clean up the display layer"

After replan, read the returned plan file, edit it directly, then run megaplan critique.

Step Editing

Use step when you need to insert, remove, or reorder step sections (## Step N: or ### Step N:) without hand-editing the markdown.

megaplan step add --plan <name> --after S3 "Add regression coverage for the parser"
megaplan step remove --plan <name> S4
megaplan step move --plan <name> S4 --after S2

Each edit writes a new same-iteration plan artifact, preserves the latest plan meta questions/success criteria/assumptions, and resets the plan to planned so it re-enters critique.

Sessions And Autonomy

Agents default to persistent sessions.
--fresh: start a new persistent session.
--ephemeral: one-off call with no saved session.
--persist: explicit persistent mode.
Keep moving and show results at each step.
Only pause at the finalize-to-execute checkpoint in review mode.

Configuration

View current defaults with megaplan config show. Override with megaplan config set <key> <value>. Reset with megaplan config reset. When routing or behavior depends on config, check megaplan config show and respect user overrides instead of assuming defaults. Settable execution keys: execution.auto_approve, execution.robustness.

Profiles

A profile is a named preset that maps each workflow phase to an agent/model spec. Pass --profile <name> to any command that accepts --phase-model (init, loop-init, tiebreaker, etc.) to apply the preset. See megaplan-decision for profile selection. Inspect available profiles with megaplan config profiles list and megaplan config profiles show <name>.

Resolution order, later overrides earlier within the same name: built-in (megaplan/profiles/*.toml) → user (~/.config/megaplan/profiles.toml, or $XDG_CONFIG_HOME/megaplan/profiles.toml) → project (<project_dir>/.megaplan/profiles.toml).

File format: TOML with a [profiles.<name>] table. Keys are phase names (plan, prep, critique, revise, gate, finalize, execute, loop_plan, loop_execute, review, tiebreaker_researcher, tiebreaker_challenger); values are agent specs like "claude", "codex", "hermes:fireworks:accounts/fireworks/models/kimi-k2p6", "hermes:glm-5.1". Example:

[profiles.my-mix]
plan     = "claude"
critique = "codex"
execute  = "hermes:fireworks:accounts/fireworks/models/kimi-k2p6"
review   = "codex"

--phase-model overrides on the CLI stack on top of any profile.
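
The layering can be sketched as follows. This is an illustration under a stated assumption: the source says later layers override earlier ones within the same name, and the sketch treats that as whole-profile replacement. Whether megaplan instead merges individual phase keys across layers is not stated. `resolve_profile` is a hypothetical helper.

```python
def resolve_profile(name, layers, phase_overrides=None):
    """Resolve a profile from layers ordered built-in, user, project.

    Each layer maps profile name -> {phase: agent spec}. The last layer
    that defines `name` wins (assumed whole-profile replacement), then
    --phase-model style overrides are applied per phase on top.
    """
    resolved = None
    for layer in layers:
        if name in layer:
            resolved = dict(layer[name])
    if resolved is None:
        raise KeyError(name)
    resolved.update(phase_overrides or {})
    return resolved
```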

Bakeoff

See the bakeoff skill for methodology and the megaplan-decision skill for when bake-offs earn their cost. This section covers the CLI mechanics once you've decided to run one.

megaplan bakeoff run runs the same idea through multiple profiles concurrently, each in its own git worktree, each driven autonomously by megaplan auto. Use it when the user wants to compare profiles head-to-head on the same task (e.g., "run this with kimi and the default profile side-by-side").

Supports --mode code (default) and --mode doc / --mode metaplan (alias). For doc-mode bake-offs, --output <relative/path> is required and is threaded into each profile's megaplan init; merge brings the chosen profile's doc artifact back to main instead of applying a code patch. Joke mode is not yet supported.

Requires a clean main worktree by default; pass --allow-dirty when there are unrelated uncommitted changes you want to keep on main. Those changes stay on main and are NOT copied into the worktrees, since worktrees branch off the current commit's SHA.

The idea must be a file (--idea-file <path>), not an inline string. Write the idea to a file first.

Bakeoff is inherently autonomous (it spawns megaplan auto), so the execution-mode question doesn't apply to bakeoff runs. --robustness is forwarded to each profile's init. When a project-layer .megaplan/profiles.toml exists, it's automatically copied into each worktree so project-only profiles resolve.

Lifecycle:

megaplan bakeoff run --idea-file <path> --profiles <p1> <p2> [--mode code|doc|metaplan] [--output <relative/path>] [--exp-id <id>] [--detach] [--robustness <level>] [--allow-dirty] — kicks off N concurrent profile runs. Without --detach it streams a live status table every 5s and blocks until all profiles finish; with --detach it returns immediately and the user polls via status. --output is required with --mode doc|metaplan and rejected with --mode code.
megaplan bakeoff status [--exp <id>] — current state of each profile (running / completed / crashed).
megaplan bakeoff tail [--exp <id>] — tail the per-profile auto logs.
megaplan bakeoff compare --exp <id> [--judge <model>] — collect metrics across profiles; with --judge, an LLM judge ranks the outputs.
megaplan bakeoff pick --exp <id> --profile <name> --rationale "..." — record the human-selected winner.
megaplan bakeoff merge --exp <id> — merge the chosen profile's worktree back to main.
megaplan bakeoff resume --exp <id> — resume unfinished profile runs.
megaplan bakeoff abandon --exp <id> — discard worktrees but keep audit data.

Cloud Mode

megaplan cloud runs a plan inside a provider-managed container with a persistent workspace volume, so the run survives the user's terminal session. Suggest it for long-running plans that would outlast a local session, multi-repo work, or when the user wants an isolated persistent sandbox. Sprint 1 ships the railway provider only; ssh and local are planned.

Quick subcommand reference: init, build, deploy, chain, status, attach, logs, exec, resume, down, destroy. Typical flow: megaplan cloud init → edit cloud.yaml → export secrets → megaplan cloud deploy → megaplan cloud chain <chain.yaml>.

For the full reference — cloud.yaml fields, the extra_repos[] + chain_session multi-tenancy model, the operator loop, and the gotchas that wedge fresh runs (committed chain_state.json, profile-alias gap, secret-upload behavior, "internal_error" masking credit failures) — see the megaplan-cloud skill. Read it before launching the first cloud chain in a new project; the gotchas section will save hours.

Tickets

megaplan ticket new creates a repo-scoped issue ticket. Use it when:

During epic/plan work you notice an out-of-scope problem, bug, or rough edge
A user explicitly asks you to capture something for later attention
You want to log an observation that doesn't block the current task but should be tracked

The command prints only a ULID to stdout on success. Tickets live as .megaplan/tickets/{ulid}-{slug}.md files and are auto-discovered by the planner for future epics. Link them to epics with megaplan ticket link <ticket> <epic> --resolves so they auto-address when the epic completes.

Feedback

See megaplan-decision for when to add the feedback phase (--with-feedback). This section covers the CLI mechanics once you've decided to use it.

megaplan feedback --plan <name> scaffolds a feedback.md file in the plan directory and opens it in $EDITOR (or $VISUAL). The file has one section per workflow stage — prep, plan, critique, revise, gate, tiebreaker, finalize, execute, review — plus an Overall section. Each section has a rating: (integer 0–10) and a free-form comment: field; leave any field blank to skip it.

This is user feedback, owned by the human after a run finishes — megaplan only scaffolds the template and parses it back on load, it never overwrites edits. Old plans without a feedback.md simply have no feedback attached; running megaplan feedback --plan <name> on an older plan scaffolds the template on demand (backwards compatible).

Use megaplan feedback --plan <name> --show to print the parsed summary, and --no-edit to just scaffold the template and print the path without launching an editor. Parsed feedback is exposed on the in-memory Plan record as Plan.feedback (a dict shaped {"overall": {...}, "stages": {stage: {...}}}), so downstream tooling can read it the same way as any other artifact. When --actor/MEGAPLAN_ACTOR_ID is set, parsed feedback is also written to the plans.feedback jsonb column so the DB and file backends stay in sync.
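
Downstream tooling might consume that dict roughly like this. The outer shape ({"overall": ..., "stages": ...}) is from the description above; the inner `rating` key is an assumption based on the field names in feedback.md, and `low_rated_stages` is a hypothetical helper.

```python
def low_rated_stages(feedback, threshold=6):
    """Return (stage, rating) pairs at or below `threshold`, lowest first.

    Stages with a missing or blank rating are skipped rather than guessed.
    """
    found = []
    for stage, entry in feedback.get("stages", {}).items():
        rating = entry.get("rating")  # assumed key name
        if rating is not None and rating <= threshold:
            found.append((stage, rating))
    return sorted(found, key=lambda pair: pair[1])
```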

Filling feedback with subagents

Recommended process when an agent (rather than the human) is producing the initial assessment:

Scaffold: run megaplan feedback --plan <name> --no-edit to create the empty feedback.md. Note the plan directory it prints — that is where the per-stage artifacts live (plan_v*.md, critique*.json, gate.json, tiebreaker_*.json, finalize.json, execution*.json, review.json, etc.).
Per-stage assessment: dispatch one read-only subagent per stage that actually ran (skip stages with no artifacts). Brief each subagent narrowly — give it the plan idea, the stage name, and the artifact filenames for that stage only. Ask it to return a 0–10 rating plus a 1–3 sentence comment grounded in what the artifacts show (what worked, what was weak, what was missed). Run these in parallel; they have no dependencies on each other.
Synthesize Overall: after the per-stage results come back, you (the orchestrating agent, not a subagent) read the per-stage ratings and comments together with the final outcome (final.md, review.json, any latest_failure) and decide an Overall rating and comment. The Overall is a judgment call about whether the run delivered the goal, not an average of stage ratings.
Write: edit feedback.md with the ratings and comments. Leave a stage blank if it didn't run or you can't form a defensible opinion — empty is better than guessed. Run megaplan feedback --plan <name> --show to confirm the parser picked everything up.

Keep comments grounded in specific artifact evidence ("critique flagged X but reviewer didn't catch the regression in Y") rather than vibes. The point of feedback is signal for future runs, not a participation score.

Searching feedback across plans

megaplan feedback search queries every plan with non-empty feedback across both backends — local feedback.md files in this project tree plus, when an actor is configured, the plans.feedback jsonb column in the DB. Duplicates between backends are de-duped by (plan name, project_dir). Use this to answer "which profile actually scored well on this repo?" or "where did the executor get a 6 or below?". Filters:

--profile <substr> — substring match on the plan's profile (e.g. --profile claude matches all-claude, claude-led, etc.).
--repo <substr> — substring match on the plan's project_dir / repo path.
--min-rating N / --max-rating N — bounds on the Overall rating.
--stage <name> — only plans that recorded a rating for that stage (plan, critique, execute, …).
--has-comment — only plans whose Overall comment is non-empty.
--all — scan every megaplan project root on this machine, not just the current tree.
--json — emit raw rows instead of a table.

Default output is a compact table (plan, profile, overall rating, backend, repo, plus the first line of the Overall comment). Combine with megaplan feedback --plan <name> --show to drill into a specific match.

Observability

Plans started after this layer landed write an append-only events.ndjson event journal to their plan dir. Four CLI surfaces read from it; reach for them whenever a run is in flight or you need to reconstruct what happened. See the megaplan-observe skill for the full failure-mode catalog and worked invocation chains.

megaplan introspect --plan <name> — single structured-JSON snapshot. Always check this first when something looks stuck; the active_phase.liveness enum (progressing | quiet | stalled | timeout-imminent) and block_details.recoverable_via together tell you whether to wait, intervene, or override. Also surfaces outstanding_flags_count, useful when a plan is sitting on unresolved critique signals.
megaplan trace --plan <name> [--follow] [--format json|pretty|narrative] [--phase <p>] [--since <duration>] — stream events. narrative format is the agent-facing affordance; --follow tails as the plan progresses; --phase and --since filter. Prints trace: no events.ndjson for plan <name> cleanly when a plan predates the journal.
megaplan doctor [--plan <name> | --repo] — diagnostic. --repo catches rubric/binary drift (skill profile names vs profiles list) and editable-install dirtiness before megaplan init, so reach for it after a branch switch / pull / fresh checkout. --plan reports lock, phase liveness, LLM liveness, cost-vs-cap, orphan subprocesses, and outstanding flags.
megaplan record-tag --plan <name> --tag <name> --note "..." — annotate a moment in the journal so later trace/introspect calls can find it. All three args are required.

Stale-timestamp inference, opaque blocked state, and "model thinking vs TCP wedged" are the three confusions this layer is designed to kill — if you find yourself running lsof, ps, or doing manual ls -lat time math on a plan dir, you should be running introspect or trace instead.
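
A first-pass triage over an introspect snapshot could look like the sketch below. The field paths active_phase.liveness and block_details.recoverable_via come from the description above; the mapping from liveness value to action is an assumption made for illustration, and `triage` is a hypothetical helper.

```python
import json


def triage(snapshot_json):
    """Map an introspect JSON snapshot to a coarse (action, recoverable_via) pair."""
    snap = json.loads(snapshot_json)
    liveness = (snap.get("active_phase") or {}).get("liveness")
    recoverable = (snap.get("block_details") or {}).get("recoverable_via")
    if liveness in ("progressing", "quiet"):
        return ("wait", recoverable)  # assumed: still healthy, keep polling
    if liveness in ("stalled", "timeout-imminent"):
        return ("intervene", recoverable)  # assumed: needs operator attention
    return ("unknown", recoverable)  # no active phase or unrecognized value
```

Feed it the stdout of `<launcher> introspect --plan <name>`, then consult the megaplan-observe skill for what each recovery path means.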

Commands

megaplan status --plan <name>
megaplan progress --plan <name>
megaplan audit --plan <name>
megaplan list
megaplan prep --plan <name>
megaplan plan --plan <name>
megaplan critique --plan <name>
megaplan revise --plan <name>
megaplan gate --plan <name>
megaplan finalize --plan <name>
megaplan execute --plan <name> --confirm-destructive [--batch N]
megaplan review --plan <name>
megaplan step add --plan <name> [--after S<N>] "description"
megaplan step remove --plan <name> S<N>
megaplan step move --plan <name> S<N> --after S<M>
megaplan override add-note --plan <name> --note "..."
megaplan override force-proceed --plan <name> --reason "..."
megaplan override replan --plan <name> --reason "..." [--note "..."]
megaplan override abort --plan <name> --reason "..."
megaplan config show
megaplan config set <key> <value>
megaplan config reset
megaplan config profiles list
megaplan config profiles show <name>
megaplan bakeoff run --idea-file <path> --profiles <p1> <p2> [--mode code|doc|metaplan] [--output <relative/path>] [--exp-id <id>] [--detach] [--robustness <level>] [--allow-dirty]
megaplan bakeoff status [--exp <id>]
megaplan bakeoff tail [--exp <id>]
megaplan bakeoff compare --exp <id> [--judge <model>]
megaplan bakeoff pick --exp <id> --profile <name> --rationale "..."
megaplan bakeoff merge --exp <id>
megaplan bakeoff resume --exp <id>
megaplan bakeoff abandon --exp <id>
megaplan ticket new "title" -b "body"
megaplan ticket list [--status <s>] [--tags <t>] [--json]
megaplan ticket show <id> [--json]
megaplan ticket edit <id> [--title <t>] [--body <b>] [--status <s>]
megaplan ticket link <ticket> <epic> [--resolves]
megaplan ticket unlink <ticket> <epic>
megaplan ticket addressed <id> [--note <n>]
megaplan ticket dismiss <id> --reason "..."
megaplan ticket reopen <id>
megaplan feedback --plan <name> [--show] [--no-edit]
megaplan feedback search [--profile <s>] [--repo <s>] [--min-rating N] [--max-rating N] [--stage <name>] [--has-comment] [--all] [--json]
megaplan introspect --plan <name> [--json]
megaplan trace --plan <name> [--follow] [--format json|pretty|narrative] [--phase <p>] [--since <duration>]
megaplan doctor [--plan <name> | --repo]
megaplan record-tag --plan <name> --tag <name> --note "..."

Subagent Mode

This appendix is Codex-specific. It adds only the orchestration delta for Codex. The base skill remains the workflow source of truth.

Activation

Default to subagent unless an inline override is explicit for this run or megaplan config show reports "orchestration": {"mode": "inline"}.
Prefer subagent for long multi-phase runs where keeping the outer conversation clean matters.
Prefer inline for small edits, quick clarifications, or when the user wants to watch each phase in the main thread.

Tool Mapping

spawn_agent: launch the autonomous megaplan runner.
wait_agent: wait for either a breakpoint or completion.
resume_agent: reopen the orchestrator after a breakpoint.
send_input: resume after a breakpoint, or interrupt a still-running agent when the user needs an immediate change.
close_agent: hard-stop a stuck orchestrator before relaunching.

Launch

When subagent mode is active, the outer skill becomes a launcher plus breakpoint relay. Start a Codex subagent with:

agent_type: default
model: prefer gpt-5.4 when available
reasoning_effort: high
fork_context: false unless the current thread contains important constraints that are not restated in the prompt
message: fill the template below with {IDEA}, {PROJECT_DIR}, {AUTO_APPROVE}, {AUTO_APPROVE_FLAG}, and {ROBUSTNESS_FLAG}
Expand {AUTO_APPROVE_FLAG} to an empty string when raw_config.execution.auto_approve is explicitly set; otherwise expand it to --auto-approve for auto-approve runs and an empty string for review runs.
Expand {ROBUSTNESS_FLAG} to an empty string when raw_config.execution.robustness is explicitly set; otherwise expand it to --robustness {ROBUSTNESS}.
After editing this source file, rerun megaplan setup --force so installed SKILL.md files pick up the refreshed appendix.
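
The two expansion rules can be sketched as one function. `expand_flags` is a hypothetical helper, and `raw_config` is assumed to be the parsed raw_config object from config show, represented as a nested dict.

```python
def expand_flags(raw_config, auto_approve, robustness):
    """Compute template values for {AUTO_APPROVE_FLAG} and {ROBUSTNESS_FLAG}.

    A key that is explicitly present in raw_config.execution suppresses the
    matching CLI flag, even when its configured value is False.
    """
    execution = raw_config.get("execution", {})
    if "auto_approve" in execution:
        auto_flag = ""
    else:
        auto_flag = "--auto-approve" if auto_approve else ""
    if "robustness" in execution:
        robustness_flag = ""
    else:
        robustness_flag = "--robustness " + robustness
    return {"AUTO_APPROVE_FLAG": auto_flag, "ROBUSTNESS_FLAG": robustness_flag}
```

Checking key presence rather than truthiness is the point here: a configured `auto_approve: false` still counts as an explicit override.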

Outer Skill Rules

Decide inline vs subagent before starting the workflow.
Launch once, remember the spawned agent id, then wait_agent for a final message that starts with either BREAKPOINT: or COMPLETE:.
Parse only the explicit first header line when deciding whether the stop was intentional.
If a breakpoint arrives, relay it to the user, collect the answer, then resume_agent and send_input to the same agent.
If the user adds context while the subagent is running, default to megaplan override add-note --plan <name> --note "..." and let the next phase boundary pick it up.
If the user needs an immediate redirect, add the note first, then send_input with interrupt: true telling the orchestrator to rerun megaplan status, read all notes, and continue from the current state.
If the orchestrator is stuck, close_agent, add a note, and relaunch a new subagent with a resume prompt on the same plan.
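
The header-line rule can be sketched as a small parser. `parse_final_message` is a hypothetical helper. It assumes the final message follows the line-oriented `Key: value` layout used by the breakpoint and completion formats, and it treats any other first line as an unintentional stop.

```python
def parse_final_message(message):
    """Classify a subagent's final message by its first non-empty line only."""
    lines = [ln.strip() for ln in message.strip().splitlines() if ln.strip()]
    if not lines:
        return {"kind": None, "detail": "", "fields": {}}
    head, _, detail = lines[0].partition(":")
    kind = head if head in ("BREAKPOINT", "COMPLETE") else None
    fields = {}
    for ln in lines[1:]:
        key, sep, value = ln.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {
        "kind": kind,  # None means the stop was not an intentional one
        "detail": detail.strip() if kind else "",
        "fields": fields,
    }
```

A `kind` of None is the cue to check megaplan status and consider the stuck-orchestrator path rather than relaying anything to the user.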

Agent Prompt Template

You are the autonomous megaplan runner for this single run.

Project: {PROJECT_DIR}
Idea: {IDEA}
Execution mode: {AUTO_APPROVE}

Operate through the `megaplan` CLI only. Do not call workers or agents directly.
Use the same verified `<launcher>` for every CLI call in this run. Verify it with a successful harmless CLI call first; command presence alone is not enough.

Startup:
1. Run `<launcher> init --project-dir "{PROJECT_DIR}" {AUTO_APPROVE_FLAG} {ROBUSTNESS_FLAG} "{IDEA}"`.
2. Capture the returned plan name.
3. Output `PLAN_NAME: <name>` on its own line before any `BREAKPOINT:` or `COMPLETE:`.
4. Run `<launcher> status --plan <name>`.

Routing:
- Use `next_step` and `valid_next` from `<launcher> status --plan <name>` for every move.
- Trust CLI state over memory.
- If `notes_count > 0`, read the full `notes` array before acting.
- After each step, read `next_step_runtime.duration_hint` and `next_step_runtime.recommended_next_check_seconds` when present to calibrate the next status check.
- For `bare`: `plan -> finalize -> execute`.
- For `light`: `plan -> critique -> revise -> finalize -> execute`.
- For `full` (legacy: `standard`), `thorough` (legacy: `robust`), or `extreme` (legacy: `superrobust`): follow the base skill workflow, including `prep`, `gate`, `review`, and any revise/rework loops.
- Build on top of uncommitted changes in the working tree by default; only override if they directly contradict the plan.
- After `gate`, follow `orchestrator_guidance` unless repository evidence proves it wrong.

Breakpoints:
- Stop only for `GATE_ESCALATE`, `GATE_BLOCKED`, `EXECUTE_APPROVAL`, `PHASE_ESCALATE`, or `EXECUTE_ESCALATE`.
- Format every breakpoint exactly as:
  `BREAKPOINT: <type>`
  `Plan: <name>`
  `State: <state>`
  `Summary: <short reason>`
  `Context: <artifacts, warnings, or the exact user decision needed>`

Safeguards:
- Retry a non-execute phase once with `--fresh` before escalating.
- If `execute` makes no forward progress for 3 attempts, stop with `BREAKPOINT: EXECUTE_ESCALATE`.
- Treat `review` returning `needs_rework` as a normal branch, not a breakpoint.

Completion:
- When the run finishes, return exactly:
  `COMPLETE: megaplan run finished`
  `Plan: <name>`
  `Final state: <state>`
  `Summary: <outcome>`
  `Artifacts: <key files or reports>`
  `Follow-up: <only if something remains>`
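
On the outer-skill side, the first-header-line rule can be sketched as a small parser. This is a minimal illustration, not megaplan's own code: `FinalMessage` and `parse_final_message` are hypothetical names, and the only formats assumed are the `BREAKPOINT:` and `COMPLETE:` layouts given in the template above.

```python
from dataclasses import dataclass, field

# Breakpoint types listed in the prompt template.
BREAKPOINT_TYPES = {
    "GATE_ESCALATE",
    "GATE_BLOCKED",
    "EXECUTE_APPROVAL",
    "PHASE_ESCALATE",
    "EXECUTE_ESCALATE",
}


@dataclass
class FinalMessage:
    kind: str    # "BREAKPOINT" or "COMPLETE"
    detail: str  # breakpoint type, or the completion text
    fields: dict = field(default_factory=dict)


def parse_final_message(text: str) -> FinalMessage:
    """Classify a subagent's final message by its first header line only."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty final message")
    kind, sep, detail = lines[0].partition(":")
    if not sep or kind not in ("BREAKPOINT", "COMPLETE"):
        # Anything else is treated as an unintentional stop.
        raise ValueError(f"unintentional stop, first line was: {lines[0]!r}")
    detail = detail.strip()
    if kind == "BREAKPOINT" and detail not in BREAKPOINT_TYPES:
        raise ValueError(f"unknown breakpoint type: {detail!r}")
    fields = {}
    for ln in lines[1:]:
        # Split on the first colon so values may contain colons themselves.
        key, sep, value = ln.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return FinalMessage(kind, detail, fields)
```

A caller would relay `fields` to the user on `BREAKPOINT` and treat a `ValueError` as a signal to inspect the agent rather than resume it.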

name	megaplan
description	AI agent harness for coordinating Claude and GPT to make and execute extremely robust plans.

Megaplan

Try python -m megaplan config show.
If that fails, try ./.venv/bin/python -m megaplan config show.
If that fails, try uv run python -m megaplan config show.
If that fails, try a version-selected shim such as PYENV_VERSION=3.11.11 megaplan config show.
Only use bare megaplan ... if that exact form already succeeded during this check.
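
The fallback chain above can be sketched as a loop that accepts a launcher only after `config show` actually succeeds with it. This is an illustrative sketch: `find_launcher` and the `check` hook are hypothetical names, the 60-second timeout is an arbitrary assumption, and the version-selected shim is left out because its exact form depends on the machine.

```python
import subprocess

# Launcher prefixes in the order the text tries them.
CANDIDATES = [
    ["python", "-m", "megaplan"],
    ["./.venv/bin/python", "-m", "megaplan"],
    ["uv", "run", "python", "-m", "megaplan"],
]


def _succeeds(argv):
    """Return True only if the command runs and exits with status 0."""
    try:
        return subprocess.run(argv, capture_output=True, timeout=60).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def find_launcher(candidates=CANDIDATES, check=_succeeds):
    """Return the first launcher whose `config show` call succeeds, else None."""
    for launcher in candidates:
        if check([*launcher, "config", "show"]):
            return list(launcher)
    return None
```

The returned list is then reused as the `<launcher>` prefix for every later call in the run.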

Triage

Modes

Megaplan has two output modes, picked with --mode at init:

--mode code (default): the run produces a code diff. Execute workers emit per-task file changes. Use for features, refactors, bug fixes, migrations — anything whose deliverable is source code.
--mode metaplan (alias: --mode doc): the run produces a single document artifact at --output <relative/path> (e.g. docs/design.md). The prep, execute, and review phases use authoring-specific prompts; the execute schema uses sections_written instead of file changes; auditing reasons about section delivery. Use for design docs, architecture specs, research notes, RFCs, proposals, post-mortems, migration plans — anything whose deliverable is prose, not code. This is the "design-first / preplan" workflow; prep is the visible repository-investigation phase inside every run (both modes have it), not a separate mode.

A common pattern is two runs: first --mode metaplan to produce a rigorous design document, then --mode code --from-doc docs/design.md on a new idea that references that document to implement it.

Working tree default

Start

<launcher> init --project-dir "$PROJECT_DIR" [--auto-approve] [--robustness bare|light|full|thorough|extreme] [--mode code|metaplan] [--output docs/foo.md] [--from-doc docs/prior.md] "$IDEA"

Settled Decisions Section Format

When authoring a doc artifact that makes design decisions, use either of these canonical markdown shapes (the parser accepts both):

Bold-dash inline shape (preferred for short decisions):

## Settled Decisions

- **SD-001** — Keep the current storage model. _load_bearing: true_
  Rationale: External integrations depend on it.
- **SD-002** — Default model is claude-sonnet-4-6. _load_bearing: false_
  Rationale: Balance of speed and capability.

YAML-ish shape (preferred when decisions need more fields):

## Settled Decisions
- id: SD-001
  load_bearing: true
  decision: Keep the current storage model
  rationale: External integrations depend on it.
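
To make the bold-dash shape concrete, here is a minimal sketch of how such entries could be read. It is not megaplan's parser: the regular expressions and `parse_settled_decisions` are hypothetical, and the sketch assumes the separator is the em dash character (written as `\u2014` in the code).

```python
import re

# One entry line: "- **SD-001** <em dash> decision text _load_bearing: true_"
ENTRY = re.compile(
    r"^- \*\*(?P<id>SD-\d+)\*\* \u2014 (?P<decision>.+?) "
    r"_load_bearing: (?P<load_bearing>true|false)_\s*$"
)
# Indented follow-up line carrying the rationale.
RATIONALE = re.compile(r"^\s+Rationale:\s*(?P<rationale>.+)$")


def parse_settled_decisions(markdown: str) -> list:
    """Collect bold-dash decisions, attaching each rationale to its entry."""
    decisions = []
    for line in markdown.splitlines():
        entry = ENTRY.match(line)
        if entry:
            decisions.append({
                "id": entry["id"],
                "decision": entry["decision"],
                "load_bearing": entry["load_bearing"] == "true",
                "rationale": None,
            })
            continue
        rationale = RATIONALE.match(line)
        if rationale and decisions:
            decisions[-1]["rationale"] = rationale["rationale"]
    return decisions


SAMPLE = (
    "## Settled Decisions\n"
    "\n"
    "- **SD-001** \u2014 Keep the current storage model. _load_bearing: true_\n"
    "  Rationale: External integrations depend on it.\n"
    "- **SD-002** \u2014 Default model is claude-sonnet-4-6. _load_bearing: false_\n"
    "  Rationale: Balance of speed and capability.\n"
)
```

Calling `parse_settled_decisions(SAMPLE)` yields two dictionaries, one per decision, with `load_bearing` converted to a boolean.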

Workflow

Run the loop in this order:

prep
plan
critique
gate
revise when gate recommends iteration
finalize
execute
review

Use next_step and valid_next for CLI routing. After gate, follow orchestrator_guidance instead of manually interpreting gate signals. When a response includes next_step_runtime, use its duration_hint and recommended_next_check_seconds to calibrate timing.

The loop varies with --robustness:

bare: plan → finalize → execute. There is no prep, no critique, no gate, and no review.
light: plan → critique → revise → finalize → execute. There is no prep, no gate, and no review.
full: prep → plan → critique → gate → ...
thorough: the same loop as full, but with 8 critique checks instead of 4 and parallel critique enabled.
extreme: the same as thorough, with parallel review also enabled.
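
The per-level differences can be summarised as a lookup table. The sketch below is illustrative only: `Loop`, `LOOPS`, and `loop_for` are hypothetical names, the phase list for `full` and above is assumed from the ordered loop at the top of this section, and `revise` is left out of that list because it runs only when gate recommends iteration. The legacy level names are the ones given in the agent prompt template.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Loop:
    phases: Tuple[str, ...]
    critique_checks: Optional[int] = None  # None where the text gives no number
    parallel_critique: bool = False
    parallel_review: bool = False


# Unconditional phases for full and above; revise is conditional on gate.
_FULL_PHASES = ("prep", "plan", "critique", "gate", "finalize", "execute", "review")

LOOPS = {
    "bare": Loop(("plan", "finalize", "execute")),
    "light": Loop(("plan", "critique", "revise", "finalize", "execute")),
    "full": Loop(_FULL_PHASES, critique_checks=4),
    "thorough": Loop(_FULL_PHASES, critique_checks=8, parallel_critique=True),
    "extreme": Loop(_FULL_PHASES, critique_checks=8,
                    parallel_critique=True, parallel_review=True),
}

# Legacy level names mapped to their current equivalents.
LEGACY_NAMES = {"standard": "full", "robust": "thorough", "superrobust": "extreme"}


def loop_for(robustness: str) -> Loop:
    """Look up the loop for a robustness level, accepting legacy names."""
    return LOOPS[LEGACY_NAMES.get(robustness, robustness)]
```

In a real run the CLI's `next_step` and `valid_next` remain the routing authority; a table like this is only useful for sanity-checking expectations.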

Step Rules

plan: inspect the repository first; produce the plan plus questions, assumptions, and success_criteria. Each criterion is {"criterion": "...", "priority": "must|should|info"}. must = hard gate (reviewer blocks), should = quality target (reviewer flags but doesn't block), info = human reference (reviewer skips).
prep: make repository investigation explicit before planning. Respect skip: true when the task is already concrete enough.
critique: surface concrete flags with concern, evidence, category, and severity; reuse open flag IDs; call out scope creep. Also validate that success criteria priorities are well-calibrated — must criteria should be verifiable yes/no, subjective goals should be should.
gate: read the response, warnings, and orchestrator_guidance. (Skipped at bare and light robustness.)
revise: show the delta, flags addressed, and flags remaining. At light robustness, routes to finalize; otherwise loops back through critique and gate.
review: judge success against the success criteria and the user's intent, not plan elegance. Only block on must criteria failures. should failures are flagged but don't require rework. info criteria are waived.
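
The priority semantics in the review rule can be sketched as follows. This is an assumption-laden illustration: `review_verdict` and its input shape (pairs of a criterion dictionary and a pass/fail boolean) are hypothetical, and only the `must`, `should`, and `info` behaviour comes from the text.

```python
def review_verdict(results):
    """Summarise a review from (criterion, passed) pairs.

    Each criterion is a dict shaped like
    {"criterion": "...", "priority": "must|should|info"}.
    """
    # Failed "must" criteria block the review.
    blocking = [c["criterion"] for c, ok in results
                if c["priority"] == "must" and not ok]
    # Failed "should" criteria are reported but do not require rework.
    flagged = [c["criterion"] for c, ok in results
               if c["priority"] == "should" and not ok]
    # "info" criteria are waived, so they never appear in either list.
    return {"needs_rework": bool(blocking), "blocking": blocking, "flagged": flagged}
```

For example, a run with one failed `should` criterion and no failed `must` criteria would come back with `needs_rework` set to `False` and a single flagged entry.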

Gate Principle

Execute

After a successful gate, run megaplan finalize to produce the execution-ready briefing document.
In auto-approve mode, run megaplan execute --confirm-destructive after finalize.
In review mode, pause at the finalize-to-execute checkpoint and wait for explicit approval before running:

megaplan execute --confirm-destructive --user-approved

Long-Running Execution

For plans with multiple batches, use per-batch mode to drive execution incrementally:

megaplan execute --plan <name> --confirm-destructive --user-approved --batch 1
megaplan execute --plan <name> --confirm-destructive --user-approved --batch 2
# ... continue until all batches complete

Between batches, poll progress:

megaplan progress --plan <name>

Each --batch N call:

Validates that batches 1..N-1 are complete
Executes only batch N's tasks
Writes execution_batch_N.json as evidence
On the final batch, produces aggregate execution.json and transitions to executed

Timeout recovery: re-run the same --batch N. The harness checks prerequisite completion and merges only untracked tasks.

Note: progress shows completed state only (between-batch granularity). With per-batch mode, each batch is a separate CLI call, so the orchestrator has full visibility.
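
A driver for per-batch execution with timeout recovery might look like the sketch below. It is illustrative only: `batch_command` and `run_batches` are hypothetical helpers, the caller is assumed to know `total_batches` already, and the one-hour timeout and three-attempt limit are arbitrary choices, not values read from the harness.

```python
import subprocess


def batch_command(launcher, plan, batch, user_approved=True):
    """Build the argv for one `execute --batch N` call."""
    argv = [*launcher, "execute", "--plan", plan, "--confirm-destructive"]
    if user_approved:
        argv.append("--user-approved")
    return [*argv, "--batch", str(batch)]


def run_batches(launcher, plan, total_batches, run=subprocess.run,
                max_attempts=3, timeout=3600):
    """Run batches 1..total_batches in order, re-running a batch on timeout."""
    for batch in range(1, total_batches + 1):
        argv = batch_command(launcher, plan, batch)
        for _attempt in range(max_attempts):
            try:
                run(argv, check=True, timeout=timeout)
                break
            except subprocess.TimeoutExpired:
                # Timeout recovery: re-run the same --batch N.
                continue
        else:
            raise RuntimeError(f"batch {batch} timed out {max_attempts} times")
```

The `run` parameter exists so the loop can be exercised without invoking the CLI; a non-zero exit still raises `subprocess.CalledProcessError` and stops the loop.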

Overrides

megaplan override add-note --plan <name> --note "..."
megaplan override force-proceed --plan <name> --reason "..."
megaplan override replan --plan <name> --reason "..." [--note "..."]
megaplan override abort --plan <name> --reason "..."

force-proceed is available from critiqued (routes to finalize, not execute). replan is available from gated, finalized, or critiqued. add-note is safe from any active state.
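
The availability rules for overrides can be written down as a small lookup. This is a sketch under stated assumptions: `override_allowed` is a hypothetical helper, the `active` flag stands in for whatever the harness counts as an active state, and `abort` is deliberately left unmapped because the text does not say which states allow it.

```python
# States from which each override is available, per the text.
OVERRIDE_STATES = {
    "force-proceed": {"critiqued"},
    "replan": {"gated", "finalized", "critiqued"},
}


def override_allowed(action, state, *, active=True):
    """Return whether an override action is available in the given state."""
    if action == "add-note":
        # add-note is safe from any active state.
        return active
    if action in OVERRIDE_STATES:
        return state in OVERRIDE_STATES[action]
    # abort (and anything else) has no stated availability rule.
    raise ValueError(f"no availability rule known for {action!r}")
```

The CLI itself remains the authority; a check like this only avoids issuing an override that the text already rules out.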

Replan

Use replan when the orchestrator itself needs to edit the plan directly instead of asking the revise worker to do it.

megaplan override replan --plan <name> --reason "expanding scope" --note "Also clean up the display layer"

After replan, read the returned plan file, edit it directly, then run megaplan critique.

Step Editing

Use step when you need to insert, remove, or reorder step sections (## Step N: or ### Step N:) without hand-editing the markdown.

megaplan step add --plan <name> --after S3 "Add regression coverage for the parser"
megaplan step remove --plan <name> S4
megaplan step move --plan <name> S4 --after S2

Each edit writes a new same-iteration plan artifact, preserves the latest plan meta questions/success criteria/assumptions, and resets the plan to planned so it re-enters critique.

Sessions And Autonomy

Agents default to persistent sessions.
--fresh: start a new persistent session.
--ephemeral: one-off call with no saved session.
--persist: explicit persistent mode.
Keep moving and show results at each step.
Pause only at the finalize-to-execute checkpoint, and only in review mode.

Configuration

Profiles

[profiles.my-mix]
plan     = "claude"
critique = "codex"
execute  = "hermes:fireworks:accounts/fireworks/models/kimi-k2p6"
review   = "codex"

--phase-model overrides on the CLI stack on top of any profile.

Bakeoff

See the bakeoff skill for methodology and the megaplan-decision skill for when bake-offs earn their cost. This section covers the CLI mechanics once you've decided to run one.

megaplan bakeoff run --idea-file <path> --profiles <p1> <p2> [--mode code|doc|metaplan] [--output <relative/path>] [--exp-id <id>] [--detach] [--robustness <level>] [--allow-dirty] — kicks off N concurrent profile runs. Without --detach it streams a live status table every 5s and blocks until all profiles finish; with --detach it returns immediately and the user polls via status. --output is required with --mode doc|metaplan and rejected with --mode code.
megaplan bakeoff status [--exp <id>] — current state of each profile (running / completed / crashed).
megaplan bakeoff tail [--exp <id>] — tail the per-profile auto logs.
megaplan bakeoff compare --exp <id> [--judge <model>] — collect metrics across profiles; with --judge, an LLM judge ranks the outputs.
megaplan bakeoff pick --exp <id> --profile <name> --rationale "..." — record the human-selected winner.
megaplan bakeoff merge --exp <id> — merge the chosen profile's worktree back to main.
megaplan bakeoff resume --exp <id> — resume unfinished profile runs.
megaplan bakeoff abandon --exp <id> — discard worktrees but keep audit data.
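
The `--mode` and `--output` constraint on `bakeoff run` can be checked before launching anything. The function below is a hypothetical pre-flight helper, not part of the megaplan CLI; it encodes only the rule stated above.

```python
DOC_MODES = {"doc", "metaplan"}


def validate_bakeoff_args(mode, output=None):
    """Raise ValueError if --mode and --output are combined incorrectly."""
    if mode not in DOC_MODES | {"code"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode in DOC_MODES and not output:
        raise ValueError("--output is required with --mode doc|metaplan")
    if mode == "code" and output:
        raise ValueError("--output is rejected with --mode code")
```

Calling `validate_bakeoff_args("metaplan", "docs/design.md")` returns quietly, while `validate_bakeoff_args("code", "docs/design.md")` raises.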

Cloud Mode

Tickets

megaplan ticket new creates a repo-scoped issue ticket. Use it when:

During epic/plan work you notice an out-of-scope problem, bug, or rough edge
A user explicitly asks you to capture something for later attention
You want to log an observation that doesn't block the current task but should be tracked

Feedback

See megaplan-decision for when to add the feedback phase (--with-feedback). This section covers the CLI mechanics once you've decided to use it.

Filling feedback with subagents

Recommended process when an agent (rather than the human) is producing the initial assessment:

Scaffold: run megaplan feedback --plan <name> --no-edit to create the empty feedback.md. Note the plan directory it prints — that is where the per-stage artifacts live (plan_v*.md, critique*.json, gate.json, tiebreaker_*.json, finalize.json, execution*.json, review.json, etc.).
Per-stage assessment: dispatch one read-only subagent per stage that actually ran (skip stages with no artifacts). Brief each subagent narrowly — give it the plan idea, the stage name, and the artifact filenames for that stage only. Ask it to return a 0–10 rating plus a 1–3 sentence comment grounded in what the artifacts show (what worked, what was weak, what was missed). Run these in parallel; they have no dependencies on each other.
Synthesize Overall: after the per-stage results come back, you (the orchestrating agent, not a subagent) read the per-stage ratings and comments together with the final outcome (final.md, review.json, any latest_failure) and decide an Overall rating and comment. The Overall is a judgment call about whether the run delivered the goal, not an average of stage ratings.
Write: edit feedback.md with the ratings and comments. Leave a stage blank if it didn't run or you can't form a defensible opinion — empty is better than guessed. Run megaplan feedback --plan <name> --show to confirm the parser picked everything up.

Searching feedback across plans

--profile <substr> — substring match on the plan's profile (e.g. --profile claude matches all-claude, claude-led, etc.).
--repo <substr> — substring match on the plan's project_dir / repo path.
--min-rating N / --max-rating N — bounds on the Overall rating.
--stage <name> — only plans that recorded a rating for that stage (plan, critique, execute, …).
--has-comment — only plans whose Overall comment is non-empty.
--all — scan every megaplan project root on this machine, not just the current tree.
--json — emit raw rows instead of a table.
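
The filter semantics can be illustrated with a predicate over one result row. The row layout here (`profile`, `project_dir`, `overall`, `stages`) is a hypothetical shape chosen for the example, not the schema the command emits with `--json`; only the matching rules come from the flag descriptions above.

```python
def matches(row, *, profile=None, repo=None, min_rating=None,
            max_rating=None, stage=None, has_comment=False):
    """Return True if a feedback row passes every filter that was supplied."""
    overall = row.get("overall", {})
    rating = overall.get("rating")
    # --profile and --repo are substring matches.
    if profile is not None and profile not in row.get("profile", ""):
        return False
    if repo is not None and repo not in row.get("project_dir", ""):
        return False
    # --min-rating / --max-rating bound the Overall rating.
    if min_rating is not None and (rating is None or rating < min_rating):
        return False
    if max_rating is not None and (rating is None or rating > max_rating):
        return False
    # --stage keeps only plans that recorded a rating for that stage.
    if stage is not None and row.get("stages", {}).get(stage) is None:
        return False
    # --has-comment requires a non-empty Overall comment.
    if has_comment and not (overall.get("comment") or "").strip():
        return False
    return True
```

With this shape, `--profile claude` corresponds to `matches(row, profile="claude")`, which accepts both `all-claude` and `claude-led`.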

Observability

megaplan introspect --plan <name> — single structured-JSON snapshot. Always check this first when something looks stuck; the active_phase.liveness enum (progressing | quiet | stalled | timeout-imminent) and block_details.recoverable_via together tell you whether to wait, intervene, or override. Also surfaces outstanding_flags_count, useful when a plan is sitting on unresolved critique signals.
megaplan trace --plan <name> [--follow] [--format json|pretty|narrative] [--phase <p>] [--since <duration>] — stream events. narrative format is the agent-facing affordance; --follow tails as the plan progresses; --phase and --since filter. Prints trace: no events.ndjson for plan <name> cleanly when a plan predates the journal.
megaplan doctor [--plan <name> | --repo] — diagnostic. --repo catches rubric/binary drift (skill profile names vs profiles list) and editable-install dirtiness before megaplan init, so reach for it after a branch switch / pull / fresh checkout. --plan reports lock, phase liveness, LLM liveness, cost-vs-cap, orphan subprocesses, and outstanding flags.
megaplan record-tag --plan <name> --tag <name> --note "..." — annotate a moment in the journal so later trace/introspect calls can find it. All three args are required.

Commands

megaplan status --plan <name>
megaplan progress --plan <name>
megaplan audit --plan <name>
megaplan list
megaplan prep --plan <name>
megaplan plan --plan <name>
megaplan critique --plan <name>
megaplan revise --plan <name>
megaplan gate --plan <name>
megaplan finalize --plan <name>
megaplan execute --plan <name> --confirm-destructive [--batch N]
megaplan review --plan <name>
megaplan step add --plan <name> [--after S<N>] "description"
megaplan step remove --plan <name> S<N>
megaplan step move --plan <name> S<N> --after S<M>
megaplan override add-note --plan <name> --note "..."
megaplan override force-proceed --plan <name> --reason "..."
megaplan override replan --plan <name> --reason "..." [--note "..."]
megaplan override abort --plan <name> --reason "..."
megaplan config show
megaplan config set <key> <value>
megaplan config reset
megaplan config profiles list
megaplan config profiles show <name>
megaplan bakeoff run --idea-file <path> --profiles <p1> <p2> [--mode code|doc|metaplan] [--output <relative/path>] [--exp-id <id>] [--detach] [--robustness <level>] [--allow-dirty]
megaplan bakeoff status [--exp <id>]
megaplan bakeoff tail [--exp <id>]
megaplan bakeoff compare --exp <id> [--judge <model>]
megaplan bakeoff pick --exp <id> --profile <name> --rationale "..."
megaplan bakeoff merge --exp <id>
megaplan bakeoff resume --exp <id>
megaplan bakeoff abandon --exp <id>
megaplan ticket new "title" -b "body"
megaplan ticket list [--status <s>] [--tags <t>] [--json]
megaplan ticket show <id> [--json]
megaplan ticket edit <id> [--title <t>] [--body <b>] [--status <s>]
megaplan ticket link <ticket> <epic> [--resolves]
megaplan ticket unlink <ticket> <epic>
megaplan ticket addressed <id> [--note <n>]
megaplan ticket dismiss <id> --reason "..."
megaplan ticket reopen <id>
megaplan feedback --plan <name> [--show] [--no-edit]
megaplan feedback search [--profile <s>] [--repo <s>] [--min-rating N] [--max-rating N] [--stage <name>] [--has-comment] [--all] [--json]
megaplan introspect --plan <name> [--json]
megaplan trace --plan <name> [--follow] [--format json|pretty|narrative] [--phase <p>] [--since <duration>]
megaplan doctor [--plan <name> | --repo]
megaplan record-tag --plan <name> --tag <name> --note "..."

Subagent Mode

This appendix is Codex-specific. It adds only the orchestration delta for Codex. The base skill remains the workflow source of truth.

Activation

Default to subagent unless an inline override is explicit for this run or megaplan config show reports "orchestration": {"mode": "inline"}.
Prefer subagent for long multi-phase runs where keeping the outer conversation clean matters.
Prefer inline for small edits, quick clarifications, or when the user wants to watch each phase in the main thread.

Tool Mapping

spawn_agent: launch the autonomous megaplan runner.
wait_agent: wait for either a breakpoint or completion.
resume_agent: reopen the orchestrator after a breakpoint.
send_input: resume after a breakpoint, or interrupt a still-running agent when the user needs an immediate change.
close_agent: hard-stop a stuck orchestrator before relaunching.

Launch

When subagent mode is active, the outer skill becomes a launcher plus breakpoint relay. Start a Codex subagent with:

agent_type: default
model: prefer gpt-5.4 when available
reasoning_effort: high
fork_context: false unless the current thread contains important constraints that are not restated in the prompt
message: fill the template below with {IDEA}, {PROJECT_DIR}, {AUTO_APPROVE}, {AUTO_APPROVE_FLAG}, and {ROBUSTNESS_FLAG}
Expand {AUTO_APPROVE_FLAG} to an empty string when raw_config.execution.auto_approve is explicitly set; otherwise expand it to --auto-approve for auto-approve runs and an empty string for review runs.
Expand {ROBUSTNESS_FLAG} to an empty string when raw_config.execution.robustness is explicitly set; otherwise expand it to --robustness {ROBUSTNESS}.
After editing this source file, rerun megaplan setup --force so installed SKILL.md files pick up the refreshed appendix.
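
The two flag-expansion rules can be sketched in a few lines. This is an illustration, not the skill's implementation: `expand_flags` is a hypothetical helper, and "explicitly set" is assumed to mean that the key is present under `execution` in the raw config.

```python
def expand_flags(raw_config, auto_approve, robustness):
    """Return the ({AUTO_APPROVE_FLAG}, {ROBUSTNESS_FLAG}) expansions."""
    execution = raw_config.get("execution", {})

    if "auto_approve" in execution:
        # The config already decides, so pass nothing on the command line.
        auto_flag = ""
    else:
        auto_flag = "--auto-approve" if auto_approve else ""

    if "robustness" in execution:
        robustness_flag = ""
    else:
        robustness_flag = f"--robustness {robustness}"

    return auto_flag, robustness_flag
```

For an auto-approve run with an empty config, `expand_flags({}, True, "full")` gives `("--auto-approve", "--robustness full")`.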

