| name | agentic-milestone-planning |
| description | Decomposes complex, multi-day tasks into optimized milestones using parallel reviewer agents (ultraplan). Spawns 3 independent reviewers that analyze the problem from different angles, then synthesizes their findings into a milestone dependency DAG. Triggers when the user says "plan milestones", "break this into milestones", "ultraplan", or when the agentic-long-run harness needs milestone generation. |
Milestone Planning (Ultraplan)
Decomposes a complex task into milestones by spawning 3 parallel reviewer agents, synthesizing their independent analyses, and producing a milestone dependency DAG.
Core Principle
Milestones are the unit of execution in agentic-long-run. A bad milestone decomposition cascades into days of wasted work, so milestone generation must be adversarial: multiple independent perspectives must challenge each other before milestones are locked.
Hard Gates
- All 3 reviewer agents must run in parallel. Sequential execution is prohibited. Dispatch all 3 concurrently using the subagent tool's parallel mode (tasks array).
- Each reviewer receives the full problem statement. Do not split or filter the problem per reviewer. Every reviewer sees everything.
- Reviewers must not see each other's findings. Each reviewer operates independently. No cross-pollination during the review phase.
- Synthesis must address every reviewer's concern. The synthesis agent must explicitly respond to each finding: accepted, rejected with reason, or deferred to a specific milestone.
- Every milestone must have measurable success criteria. "Working correctly" is not a criterion. Specific test commands, file existence checks, or behavioral assertions are required.
- Milestone dependencies must form a DAG. Circular dependencies are a plan failure; the milestones must admit a clear topological ordering.
- Do not generate milestones for trivial tasks. If the problem can be solved in a single agentic-plan-crafting cycle (fewer than ~8 tasks), tell the user to use agentic-plan-crafting directly.
- Reviewer outputs must be passed verbatim to the synthesis agent. Do not summarize, filter, or reframe. Copy each reviewer's full output into the designated placeholder. The main agent must not editorialize the handoff.
When To Use
- When the user presents a complex, multi-day task
- When the agentic-long-run harness needs milestone decomposition
- When the user says "plan milestones", "break this into milestones", or "ultraplan"
- When a task clearly requires multiple independent implementation phases
When NOT To Use
- Single-day tasks (use agentic-plan-crafting directly)
- Tasks with fewer than ~8 implementation steps
- When milestones are already defined and the user wants execution (use agentic-long-run)
- When work scope is still ambiguous (use agentic-clarification first)
Input
The skill requires a clear problem statement as input. This can come from:
- A Context Brief file produced by the agentic-clarification skill (preferred)
- A direct, detailed request from the user (must include goal, scope, constraints)
If the input is ambiguous, return to the agentic-clarification skill before proceeding.
Process
Phase 1: Problem Framing
Before dispatching reviewers, frame the problem:
- Read the input (Context Brief or user request)
- Identify: goal, scope boundaries, technical constraints, success criteria
- If a codebase is involved, dispatch an Explore agent to map relevant architecture
- Compose the Problem Brief, a self-contained document that each reviewer will receive:
## Problem Brief
**Goal:** [What must be achieved]
**Scope:**
- In: [What is included]
- Out: [What is explicitly excluded]
**Technical Context:**
[Relevant architecture, existing code, constraints]
**Constraints:**
[Time, compatibility, dependencies, performance requirements]
**Success Criteria:**
[Specific, measurable outcomes]
**Verification Strategy:**
- **Level:** [e2e | integration | skill/agent | test-suite | build-only]
- **Command:** [exact command to run the verification]
- **What it validates:** [what passing this verification proves]
Phase 2: Parallel Reviewer Dispatch
Dispatch all 3 reviewer agents concurrently using the subagent tool's parallel mode (tasks array). Each receives the full Problem Brief and its reviewer-specific prompt.
Dispatch example:
Use the subagent tool with the tasks parameter to run all 3 reviewers in parallel:
tasks: [
{ agent: "reviewer-feasibility", task: "[Problem Brief + feasibility prompt]" },
{ agent: "reviewer-architecture", task: "[Problem Brief + architecture prompt]" },
{ agent: "reviewer-risk", task: "[Problem Brief + risk prompt]" }
]
Reviewers are read-only analysts; they do not modify code. If a reviewer appears stuck (no response after extended time), this is likely a rate limit or timeout; see Phase 2.5 for failure handling.
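The following is a minimal Python sketch of the fan-out. It uses asyncio as a stand-in for the subagent tool's parallel mode. `run_reviewer` is a hypothetical placeholder for one subagent call, not a real harness API. The sketch shows three properties: all three calls start together, each receives the full Problem Brief, and no reviewer sees another's output.

```python
from __future__ import annotations

import asyncio

REVIEWERS = ["reviewer-feasibility", "reviewer-architecture", "reviewer-risk"]


async def run_reviewer(agent: str, task: str) -> str:
    """Hypothetical stand-in for one subagent call (not a real harness API)."""
    await asyncio.sleep(0)  # yield to the event loop so the calls interleave
    return f"{agent} analysed {len(task)} characters of input"


async def dispatch_all(problem_brief: str, prompts: dict[str, str]) -> dict[str, str | BaseException]:
    # Every reviewer gets the full, unfiltered Problem Brief plus its own prompt.
    calls = [run_reviewer(a, problem_brief + "\n\n" + prompts[a]) for a in REVIEWERS]
    # gather() runs all three concurrently. With return_exceptions=True a failed
    # reviewer's exception is returned in its slot instead of being raised, so the
    # other results are still collected for Phase 2.5 to inspect.
    results = await asyncio.gather(*calls, return_exceptions=True)
    return dict(zip(REVIEWERS, results))
```

For example, `asyncio.run(dispatch_all(brief, prompts))` returns a dict keyed by reviewer name. Each value is either that reviewer's text or the exception it raised.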
Reviewer 1: Feasibility Analyst
You are a feasibility analyst reviewing a problem decomposition.
## Problem Brief
{PROBLEM_BRIEF}
## Your Task
Analyze the feasibility of solving this problem. For each major component:
1. **Technical feasibility:** Can this be built with the stated tech stack?
Identify any components that require research, prototyping, or may not
be possible as described.
2. **Effort estimation:** Classify each component as:
- Small (1-3 tasks, < 1 plan cycle)
- Medium (4-8 tasks, 1 plan cycle)
- Large (9+ tasks, multiple plan cycles; candidate for milestone)
- Uncertain (requires spike/prototype before estimation)
3. **Risk of underestimation:** Flag components that appear simple but
have hidden complexity (integration points, edge cases, data migration,
backward compatibility).
4. **Suggested milestone boundaries:** Based on effort and risk, suggest
where natural milestone boundaries should fall. A milestone should be
independently deliverable and testable.
## Output Format
For each suggested milestone:
- **Name:** [milestone name]
- **Effort:** [Small/Medium/Large/Uncertain]
- **Feasibility risk:** [Low/Medium/High] ([reason])
- **Key deliverable:** [what this milestone produces]
Also list:
- **Spike candidates:** Components needing prototype before planning
- **Underestimation risks:** Components likely harder than they appear
Reviewer 2: Architecture Analyst
You are an architecture analyst reviewing a problem decomposition.
## Problem Brief
{PROBLEM_BRIEF}
## Your Task
Analyze the architectural implications and suggest milestone boundaries
that respect architectural constraints.
1. **Interface boundaries:** Identify the key interfaces, contracts, and
APIs that must be defined. Milestones should align with interface
boundaries; one milestone should not half-define an interface.
2. **Data flow:** Map how data flows through the system. Milestones that
cut across data flows create integration risk.
3. **Dependency direction:** Identify which components depend on which.
Milestones should be ordered so dependencies are built before dependents.
4. **Incremental deliverability:** Each milestone should leave the system
in a working state. No milestone should produce a half-built component
that only works after the next milestone.
5. **Existing pattern alignment:** Where possible, milestones should follow
existing patterns in the codebase rather than introducing new patterns.
## Output Format
For each suggested milestone:
- **Name:** [milestone name]
- **Architectural rationale:** [why this is a natural boundary]
- **Interfaces defined:** [what contracts this milestone establishes]
- **Depends on:** [which milestones must complete first]
- **Leaves system in working state:** [Yes/No, with explanation]
Also list:
- **Interface risks:** Interfaces that may need revision after initial implementation
- **Pattern conflicts:** Where the proposed work conflicts with existing patterns
Reviewer 3: Risk Analyst
You are a risk analyst reviewing a problem decomposition.
## Problem Brief
{PROBLEM_BRIEF}
## Your Task
Identify risks that could derail multi-day execution and suggest milestone
ordering that minimizes cumulative risk.
1. **Integration risk:** Which components have the highest risk of not
working together? These should be integrated early, not in the last
milestone.
2. **Ambiguity risk:** Which requirements are most likely to change or
be misunderstood? These should be tackled early so course corrections
are cheap.
3. **Dependency risk:** Which external dependencies (APIs, libraries,
services) are least reliable? Milestones depending on them should
include fallback plans.
4. **Regression risk:** Which changes are most likely to break existing
functionality? These milestones need heavier test coverage.
5. **Recovery cost:** If a milestone fails validation, how expensive is
it to redo? High-cost milestones should be smaller and more frequent.
## Output Format
For each identified risk:
- **Risk:** [description]
- **Severity:** [Low/Medium/High/Critical]
- **Affected milestone(s):** [which milestones]
- **Mitigation:** [how to structure milestones to reduce this risk]
Overall risk-ordered milestone sequence:
1. [milestone]: [why first: highest ambiguity / integration risk / ...]
2. [milestone]: [why second]
...
Phase 2.5: Reviewer Failure Handling
After dispatching all 3 reviewers, wait for all to complete. If any reviewer fails:
- Timeout or error: Re-dispatch the failed reviewer once with the same prompt. If it fails again, proceed without it.
- Empty or unusable output: If a reviewer returns fewer than 3 sentences or clearly did not address the Problem Brief, re-dispatch once. If still unusable, proceed without it.
- Proceeding with fewer than 3 reviewers: Log the missing perspective(s) in the synthesis handoff. The synthesis agent must note the gap in its Conflict Resolution Log: "Missing perspective: [reviewer name] ([reason]). Milestone plan may have blind spot in [area]."
- Minimum viable count: At least 2 of 3 reviewers must succeed. If fewer than 2 complete successfully, stop and report to the user; the problem may be too ambiguous for automated review.
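The retry-once rule and the 2-of-3 floor can be sketched as follows. The sketch makes three assumptions:

- A timeout or error is represented as `None`.
- `retry` is a hypothetical callable that re-dispatches one reviewer with its original prompt.
- The "fewer than 3 sentences" test is approximated by splitting on sentence punctuation.

Whether an output "clearly did not address the Problem Brief" still needs judgment and is not modeled.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

MIN_REVIEWERS = 2


def is_usable(output: Optional[str]) -> bool:
    """Heuristic: None (timeout/error) or fewer than 3 sentences is unusable."""
    if not output:
        return False
    sentences = [s for s in re.split(r"[.!?]+", output) if s.strip()]
    return len(sentences) >= 3


def collect_reviews(
    first_pass: dict[str, Optional[str]],
    retry: Callable[[str], Optional[str]],
) -> tuple[dict[str, str], list[str]]:
    """Apply the retry-once rule; return usable reviews and missing reviewer names."""
    reviews: dict[str, str] = {}
    missing: list[str] = []
    for name, output in first_pass.items():
        if not is_usable(output):
            output = retry(name)  # exactly one re-dispatch with the same prompt
        if is_usable(output):
            reviews[name] = output
        else:
            missing.append(name)  # proceed without it; the gap is noted for synthesis
    if len(reviews) < MIN_REVIEWERS:
        raise RuntimeError(
            f"only {len(reviews)} of {len(first_pass)} reviewers succeeded; "
            "stop and report to the user"
        )
    return reviews, missing
```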
Phase 3: Synthesis
After the reviewers complete (all 3, or at least 2 per Phase 2.5), dispatch a Synthesis Agent that receives every successful reviewer output and produces the final milestone plan.
Verbatim handoff rule (Hard Gate equivalent): The main agent must copy each reviewer's full output into the designated {..._OUTPUT} placeholder without summarizing, filtering, reframing, or adding commentary. This follows the same principle as the agentic-run-plan validator's fixed template. The main agent has read all 3 outputs and may unconsciously bias the synthesis through selective framing; a verbatim copy eliminates this channel.
What must NOT happen during handoff:
- Summarizing a reviewer's output ("The feasibility analyst mainly said...")
- Filtering out findings the main agent considers irrelevant
- Adding framing language ("Pay special attention to the risk analyst's concerns about...")
- Reordering findings by perceived importance
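One way to enforce the verbatim rule mechanically is to fill the placeholders in a single pass with a replacement function. This inserts reviewer text exactly as written, and the inserted text is never rescanned for further placeholders. The `missing` mapping (reviewer name to failure reason) and the wording of the gap note are illustrative assumptions based on Phase 2.5.

```python
from __future__ import annotations

import re

PLACEHOLDERS = {
    "{FEASIBILITY_OUTPUT}": "feasibility",
    "{ARCHITECTURE_OUTPUT}": "architecture",
    "{RISK_OUTPUT}": "risk",
}
_PATTERN = re.compile("|".join(re.escape(p) for p in PLACEHOLDERS))


def fill_synthesis_prompt(template: str, reviews: dict[str, str], missing: dict[str, str]) -> str:
    """Substitute each reviewer's full output verbatim, in one pass."""

    def substitute(match: re.Match) -> str:
        name = PLACEHOLDERS[match.group(0)]
        if name in reviews:
            # Returned unchanged: no summary, filtering, reordering, or framing.
            return reviews[name]
        # Gap note for a reviewer dropped in Phase 2.5 (wording is illustrative).
        return f"Missing perspective: {name} ({missing.get(name, 'no usable output')})."

    # A function replacement is inserted literally and never rescanned, so
    # backslashes or placeholder-like text inside a review survive intact.
    return _PATTERN.sub(substitute, template)
```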
The synthesis agent prompt:
You are a milestone synthesis agent. You have received analyses from 3
independent reviewers who each examined the same problem from a different
angle. Your job is to produce the final milestone decomposition.
## Reviewer Outputs
### Feasibility Analysis
{FEASIBILITY_OUTPUT}
### Architecture Analysis
{ARCHITECTURE_OUTPUT}
### Risk Analysis
{RISK_OUTPUT}
## Your Task
1. **Cross-reference findings.** Identify where reviewers agree and
where they conflict. Agreements are high-confidence decisions.
Conflicts require resolution.
2. **Resolve conflicts explicitly.** For each conflict:
- State the conflict
- State your resolution
- State why (which reviewer's reasoning is stronger in this case)
3. **Produce the milestone DAG.** Each milestone must have:
- Name
- Goal (1 sentence)
- Success criteria (measurable, specific)
- Dependencies (which milestones must complete first)
- Files affected
- Risk level (from risk analysis)
- Estimated effort (from feasibility analysis)
- User value
4. **Validate the DAG.** Verify:
- No circular dependencies
- Valid topological ordering exists
- No file conflicts between parallel milestones
- Each milestone leaves system in working state
- First milestone is the minimum viable milestone
5. **Produce execution order.** List milestones in execution order,
marking which can run in parallel.
## Output Format
## Conflict Resolution Log
| Conflict | Resolution | Rationale |
|----------|-----------|-----------|
| [description] | [decision] | [why] |
## Milestone DAG
### M1: [Name]
- **Goal:** [one sentence]
- **Success Criteria:**
- [ ] [specific, measurable criterion]
- [ ] [specific, measurable criterion]
- **Dependencies:** None
- **Files:** [list]
- **Risk:** [Low/Medium/High]
- **Effort:** [Small/Medium/Large]
- **User Value:** [what user sees after completion]
- **Abort Point:** [Yes/No]
### M2: [Name]
...
## Execution Order
Phase 1 (parallel): M1, M2
Phase 2 (after Phase 1): M3
Phase 3 (parallel): M4, M5
## Rejected Proposals
| Proposal | Source | Reason for rejection |
|----------|--------|---------------------|
| [what was proposed] | [which reviewer] | [why rejected] |
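The Execution Order requested above is a layered topological sort: each phase contains the milestones whose dependencies all finished in earlier phases. The sketch below assumes the DAG is given as a mapping from milestone ID to its dependency IDs. It does not check file conflicts, which Phase 3.6 covers.

```python
from __future__ import annotations


def execution_phases(deps: dict[str, list[str]]) -> list[list[str]]:
    """Layered topological sort: phase k holds milestones whose dependencies all sit in phases < k."""
    unknown = {d for ds in deps.values() for d in ds} - set(deps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    remaining = {m: set(ds) for m, ds in deps.items()}
    done: set[str] = set()
    phases: list[list[str]] = []
    while remaining:
        # Everything whose dependencies are already complete can run in parallel.
        ready = sorted(m for m, ds in remaining.items() if ds <= done)
        if not ready:
            raise ValueError(f"cycle among: {sorted(remaining)}")
        phases.append(ready)
        done.update(ready)
        for m in ready:
            del remaining[m]
    return phases
```

With the example order above (M3 after M1 and M2; M4 and M5 after M3), this yields `[["M1", "M2"], ["M3"], ["M4", "M5"]]`.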
Phase 3.5: Integration Verification Milestone
After synthesis, the main agent automatically appends an Integration Verification Milestone as the final milestone in the DAG. This milestone is not generated by reviewers or synthesis; it is a structural guarantee.
### M_final: Integration Verification
- **Goal:** Validate that all milestones work together as a complete system
- **Success Criteria:**
- [ ] Highest-level project verification passes (e2e, integration, or discovered verification)
- [ ] All milestone success criteria remain valid after full integration
- [ ] No regressions in pre-existing functionality
- [ ] Cross-milestone interfaces are exercised end-to-end
- **Dependencies:** ALL other milestones
- **Files:** None (read-only verification; no new code)
- **Risk:** Medium (integration issues between independently-verified milestones)
- **Effort:** Small (verification only, no implementation)
- **User Value:** Confidence that the system works as a whole, not just per-milestone
- **Abort Point:** No (this is the final gate)
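The append step is sketched below. It assumes milestones are held in a dictionary keyed by ID; all field names are hypothetical. Because M_final depends on every other milestone, it always lands alone in the last execution phase.

```python
from __future__ import annotations


def append_integration_milestone(milestones: dict[str, dict]) -> dict[str, dict]:
    """Return a copy of the DAG with M_final depending on every other milestone."""
    if "M_final" in milestones:
        # Design choice in this sketch: M_final is structural, so a synthesized one is rejected.
        raise ValueError("M_final must be appended by the main agent, not synthesized")
    dag = dict(milestones)
    dag["M_final"] = {
        "name": "Integration Verification",
        "dependencies": sorted(milestones),  # ALL other milestones
        "files": [],  # read-only verification; no new code
        "risk": "Medium",
        "effort": "Small",
        "abort_point": False,  # the final gate
    }
    return dag
```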
Verification Discovery: During Phase 1 (Problem Framing), run the same verification discovery as agentic-plan-crafting:
- Search for e2e tests → integration tests → verification skills/agents → test suite → build+lint
- Record the result in the Problem Brief under a Verification Strategy section
- The Integration Verification Milestone uses this discovered verification as its primary check
If no verification infrastructure exists: The Integration Verification Milestone's agentic-plan-crafting phase (during agentic-long-run execution) will create the necessary verification as Task 0, same as agentic-plan-crafting's behavior.
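The discovery order reduces to "take the highest level that exists." This sketch leaves the actual search to the caller. `found` is a hypothetical mapping from level name to the exact command discovered. A return value of `None` signals that M_final's plan must create verification as Task 0.

```python
from __future__ import annotations

from typing import Optional

# Priority order of the discovery search, highest-level first.
LEVELS = ["e2e", "integration", "skill/agent", "test-suite", "build-only"]


def choose_verification(found: dict[str, str]) -> Optional[dict[str, str]]:
    """Pick the highest-level verification that discovery found, or None if nothing exists."""
    for level in LEVELS:
        if level in found:
            return {"Level": level, "Command": found[level]}
    return None
```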
Phase 3.6: Independent DAG Validation
After appending the Integration Verification Milestone, the main agent independently validates the full DAG structure (including M_final) before presenting to the user. Do not rely on the synthesis agent's self-reported validation.
- Circular dependency check: For each milestone, trace its dependency chain. If any milestone appears as both an ancestor and a descendant of another, the DAG is invalid. Reject and re-dispatch synthesis with the specific cycle identified.
- File conflict check for parallel milestones: For milestones with no direct or transitive dependency relationship, verify their "Files Affected" lists do not overlap. If they overlap, they cannot run in parallel; add a dependency or flag for user decision.
- Orphan check: Every milestone except the first must have at least one dependency, OR be explicitly marked as independently parallelizable with rationale.
- Success criteria check: Every milestone must have at least 2 measurable success criteria. "Working correctly" or similar vague criteria trigger re-dispatch.
If validation fails: re-dispatch synthesis with the specific error(s) as additional constraint. Do not present an invalid DAG to the user.
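The four checks are mechanical enough to script. The sketch assumes each milestone is a dictionary with hypothetical `dependencies`, `files` and `criteria` keys, plus an optional `parallel_rationale` key. It takes "first" to mean the first milestone in declaration order. The vague-phrase list is illustrative, not exhaustive, so criteria still deserve a careful read.

```python
from __future__ import annotations

from itertools import combinations

VAGUE = ("working correctly", "works correctly", "works as expected")  # illustrative only


def validate_dag(milestones: dict[str, dict]) -> list[str]:
    """Return validation errors for the full DAG (including M_final); empty means it passes."""
    deps = {m: set(spec["dependencies"]) for m, spec in milestones.items()}

    # 1. Circular dependency check (Kahn's algorithm). Anything never freed is on
    #    a cycle, downstream of one, or depends on an unknown milestone.
    order: list[str] = []
    done: set[str] = set()
    pending = dict(deps)
    while True:
        ready = [m for m, d in pending.items() if d <= done]
        if not ready:
            break
        order.extend(ready)
        done.update(ready)
        for m in ready:
            del pending[m]
    if pending:
        return [f"cycle or unknown dependency involving: {sorted(pending)}"]

    # Transitive ancestors, built in topological order (safe now that there is no cycle).
    anc: dict[str, set[str]] = {}
    for m in order:
        anc[m] = set().union(*(anc[d] | {d} for d in deps[m]))

    errors: list[str] = []
    # 2. File conflicts: milestones with no dependency path between them may run in parallel.
    for a, b in combinations(milestones, 2):
        if a in anc[b] or b in anc[a]:
            continue
        shared = set(milestones[a]["files"]) & set(milestones[b]["files"])
        if shared:
            errors.append(f"{a} and {b} may run in parallel but share files: {sorted(shared)}")

    # 3. Orphan check: only the first milestone may lack dependencies without a rationale.
    first = next(iter(milestones))
    for m, spec in milestones.items():
        if m != first and not deps[m] and not spec.get("parallel_rationale"):
            errors.append(f"{m}: no dependencies and no parallelization rationale")

    # 4. Success criteria: at least 2, none matching a known-vague phrasing.
    for m, spec in milestones.items():
        criteria = spec["criteria"]
        if len(criteria) < 2:
            errors.append(f"{m}: fewer than 2 success criteria")
        if any(v in c.lower() for c in criteria for v in VAGUE):
            errors.append(f"{m}: vague success criterion")
    return errors
```

A non-empty result maps directly onto the re-dispatch rule: the error strings become the additional constraints passed back to synthesis.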
Phase 4: User Review and Lock
Milestone count guard: The recommended milestone count is 3-7 for most projects. If the synthesis produces more than 7, present a warning: "This plan has N milestones. Consider whether the problem should be split into separate projects." If more than 10, require explicit user approval to proceed.
- Present the synthesized milestone plan to the user
- Show the conflict resolution log: the user must see where reviewers disagreed
- Show the execution order with parallelization
- Show the total milestone count with the count guard warning if applicable
- Ask the user to approve, modify, or reject the milestone plan
- If approved: save the milestone plan to the harness state directory
- If modifications requested: apply changes and re-present
- If rejected: return to Phase 1 with updated constraints
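The count guard fits in a small function. This sketch assumes `n` is the total milestone count presented to the user.

```python
from __future__ import annotations


def count_guard(n: int) -> tuple[bool, str | None]:
    """Return (requires_explicit_approval, warning) for a plan with n milestones."""
    if n <= 7:  # up to 7: no warning (3-7 is the recommended range)
        return False, None
    warning = (
        f"This plan has {n} milestones. "
        "Consider whether the problem should be split into separate projects."
    )
    return n > 10, warning  # more than 10 additionally requires explicit approval
```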
Phase 5: Save Milestone Artifacts
Save all artifacts to the harness state directory:
docs/engineering-discipline/harness/<session-slug>/
├── state.md # Master state file
├── milestones/
│   ├── M1-<name>.md # Individual milestone definition
│   ├── M2-<name>.md
│   └── ...
└── reviews/
    ├── feasibility.md
    ├── architecture.md
    ├── risk.md
    └── synthesis.md
state.md format:
# Long Run State: [Session Name]
**Created:** YYYY-MM-DD HH:MM
**Last Updated:** YYYY-MM-DD HH:MM
**Status:** agentic-milestone-planning-complete | executing | paused | completing | completed | failed
**Verification Strategy:**
- **Level:** [e2e | integration | skill/agent | test-suite | build-only]
- **Command:** [exact verification command]
- **What it validates:** [what passing proves]
## Milestones
| ID | Name | Status | Attempts | Dependencies | Plan File | Review File |
|----|------|--------|----------|-------------|-----------|-------------|
| M1 | [name] | pending | 0 | - | - | - |
| M2 | [name] | pending | 0 | M1 | - | - |
| M3 | [name] | pending | 0 | M1, M2 | - | - |
Status values: pending | planning | executing | validating | completed | failed | skipped
Attempts: number of plan-execute-review cycles attempted (incremented at each Step 2-3 start)
## Execution Log
| Timestamp | Event | Details |
|-----------|-------|---------|
| YYYY-MM-DD HH:MM | milestones-locked | N milestones approved by user |
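The sketch below renders the Milestones table of state.md from milestone records. The dictionary keys (`id`, `name`, `status`, `attempts`, `dependencies`, `plan_file`, `review_file`) are hypothetical, and `-` marks an empty cell. It rejects any status outside the allowed values listed above.

```python
from __future__ import annotations

STATUSES = {"pending", "planning", "executing", "validating", "completed", "failed", "skipped"}


def milestone_table(milestones: list[dict]) -> str:
    """Render the state.md Milestones table; missing fields fall back to initial values."""
    lines = [
        "| ID | Name | Status | Attempts | Dependencies | Plan File | Review File |",
        "|----|------|--------|----------|-------------|-----------|-------------|",
    ]
    for m in milestones:
        status = m.get("status", "pending")
        if status not in STATUSES:
            raise ValueError(f"{m['id']}: unknown status {status!r}")
        deps = ", ".join(m.get("dependencies", [])) or "-"
        lines.append(
            f"| {m['id']} | {m['name']} | {status} | {m.get('attempts', 0)} | {deps} "
            f"| {m.get('plan_file') or '-'} | {m.get('review_file') or '-'} |"
        )
    return "\n".join(lines)
```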
Structured state initialization: After creating milestone definitions, initialize the canonical structured state via harness_milestone:
{ "runId": "<run-id>", "action": "create", "id": "M1", "name": "Milestone Name", "status": "pending", "dependencies": [] }
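Generating one create payload per milestone is a loop over the locked plan. This sketch only builds the JSON strings in the shape shown above. How they are passed to harness_milestone depends on the harness and is not shown. The sketch assumes milestones arrive in execution order, so that dependencies are created before their dependents.

```python
from __future__ import annotations

import json


def create_payloads(run_id: str, milestones: list[dict]) -> list[str]:
    """Build one harness_milestone 'create' payload per milestone (sending them is out of scope)."""
    return [
        json.dumps({
            "runId": run_id,
            "action": "create",
            "id": m["id"],
            "name": m["name"],
            "status": "pending",  # every milestone starts pending
            "dependencies": m.get("dependencies", []),
        })
        for m in milestones
    ]
```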
Individual milestone file (M1-<name>.md) format:
# Milestone: [Name]
**ID:** M1
**Status:** pending
**Dependencies:** [None | M1, M2, ...]
**Risk:** [Low/Medium/High]
**Effort:** [Small/Medium/Large]
## Goal
[One sentence goal]
## Success Criteria
- [ ] [Specific, measurable criterion]
- [ ] [Specific, measurable criterion]
- [ ] [Specific, measurable criterion]
## Files Affected
- Create: [files to create]
- Modify: [files to modify]
## User Value
[What the user sees/can test after this milestone]
## Abort Point
[Yes/No: can the user stop here and have something useful?]
## Notes
[Any special considerations from reviewer analysis]
Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Running reviewers sequentially | Wastes time; reviewers are independent |
| Skipping synthesis and just merging reviewer outputs | Conflicts go unresolved; milestone boundaries are incoherent |
| Accepting milestones without measurable success criteria | Cannot validate completion; "done" becomes subjective |
| Creating milestones too large (>12 tasks each) | Exceeds single agentic-plan-crafting cycle; risk of context loss |
| Creating milestones too small (1-2 tasks each) | Overhead of agentic-plan-crafting + agentic-run-plan + agentic-review-work exceeds the work itself |
| Creating more than 10 milestones without user approval | Compounding risk across milestones; likely needs project split |
| Ignoring reviewer conflicts | Unresolved conflicts surface during execution when they're expensive to fix |
| Not saving reviewer outputs | Loses the reasoning behind milestone decisions; cannot audit later |
| Letting user skip approval | User discovers misalignment mid-execution after days of work |
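The two size anti-patterns can be flagged from per-milestone task estimates. This sketch assumes the estimates are hypothetical integer task counts taken from the feasibility analysis.

```python
from __future__ import annotations


def size_warnings(task_estimates: dict[str, int]) -> dict[str, str]:
    """Flag milestones outside the size bounds from the anti-pattern table."""
    warnings: dict[str, str] = {}
    for milestone, tasks in task_estimates.items():
        if tasks > 12:
            warnings[milestone] = "too large: exceeds a single agentic-plan-crafting cycle"
        elif tasks <= 2:
            warnings[milestone] = "too small: plan/run/review overhead exceeds the work"
    return warnings
```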
Minimal Checklist
Transition
After milestone planning is complete:
- To begin execution → agentic-long-run skill
- If ambiguity is discovered → return to the agentic-clarification skill
- If the task is too small for milestones → use agentic-plan-crafting directly
This skill itself does not invoke the next skill. It ends by presenting the milestone plan and letting the user choose the next step.