| name | retrospective |
| description | Structured retrospective after completing a delivery increment or diamond. Captures learning for continuous improvement. |
| metadata | {"instruction_budget":"58","framework_dependency":"mycelium","framework_dependency_note":"This skill is designed to run within the Mycelium framework (https://github.com/haabe/mycelium). Standalone use will skip the canvas state, theory gates, and harness behavior the skill assumes. Install: /plugin install mycelium@haabe/mycelium."} |
Retrospective
Run after every completed delivery diamond or significant milestone. Source: Forsgren (learning culture).
Preflight: Read target canvas file(s) before any Write/Edit
Hard rule. Before issuing Write or Edit against any .claude/canvas/*.yml, use the Read tool on that file in this session. Claude Code's Read-before-Write check requires the Read tool specifically — cat/head/grep via Bash do NOT satisfy it.
Edit vs Write — different cost profiles (verified 2026-05-14):
Edit (exact-string replacement): Read with limit: 1 satisfies the check at ~50 tokens. State-tracking is per-file, not per-byte — subsequent Edit calls work anywhere in the file. Use this for partial updates against large canvas files (e.g., purpose.yml at 800+ lines).
Write (full replacement): do a full Read first. Write obliterates the file; you should see what you're about to replace. The limit:1 shortcut is not appropriate here.
ID-bearing entries — scan the ID space before assigning (added 2026-05-15, v0.23.19): When adding a new component, opportunity, solution, or any other ID-bearing entry to a canvas file, run a Bash grep first to confirm the next ID in your prefix sequence is actually free:
grep "^ - id: <prefix>-" .claude/canvas/<file>.yml | sort -u
Replace <prefix> with the canvas's ID prefix (comp for landscape, opp for opportunities, sol for solutions, ht for human-tasks, etc.). Then pick the next free integer. validate_canvas.py has a duplicate-ID check (lines 230-239) that catches the failure on CI, but a duplicate can persist in the working tree for days if CI isn't run between edit and discovery — see roadmap-repo corrections.md 2026-05-15 "Duplicate canvas ID created in landscape.yml" for the worked example.
Original failure mode: anti-pattern #7 instance #5, 2026-05-09 — agent conflated Bash head with the Read tool, lost ~14k tokens to a Write-fail → remedial-full-Read → re-Write loop. The limit:1 discipline (graduated 2026-05-14, v0.23.18) prevents the second-order cost where the agent correctly follows the rule but full-Reads every time. The ID-scan discipline (graduated 2026-05-15, v0.23.19) prevents the related class where the agent reads enough of the file to satisfy the Edit check but not enough to see existing ID assignments — kin to anti-pattern #8 (Stale State Read).
If this skill writes to multiple canvas files, register each one first (limit:1 for Edit-only paths; full Read for Write paths) AND ID-scan any prefix you intend to assign.
See CLAUDE.md Canvas writes — Read before Write for the canonical rule.
Workflow
Run these steps IN ORDER. Do not skip any step. Step 1 (cycle recording) MUST be completed FIRST — before any reflective analysis.
Step 1. Record Cycle in .claude/canvas/cycle-history.yml AND Decision Log (MANDATORY — DO THIS FIRST)
This step is critical. Without it, the learning metabolism has no data. You MUST do BOTH parts (5a and 5b).
Step 5a. Write cycle record to .claude/canvas/cycle-history.yml
Find the leaf_id and opportunity_id for the delivered solution (from .claude/canvas/opportunities.yml or .claude/canvas/gist.yml). Then write a cycle record:
- cycle_id: cycle-NNN
leaf_id: "opp-XXX-sol-X"
opportunity_id: "opp-XXX"
diamond_id: "d-XXX"
completed_at: "YYYY-MM-DDTHH:MM:SSZ"
outcome: shipped | partial | failed | discarded
cycle_class: product-leaf | meta-dogfood | observation
predicted:
ice_score: {i: X, c: X, e: X, total: XXX}
feasibility_risk: low | medium | high
effort_estimate: "X days/weeks"
actual:
effort: "X days/weeks"
dora:
deploy_frequency: "..."
lead_time: "..."
change_failure_rate: "..."
mttr: "..."
calibration:
ice_accuracy: "predicted XXX vs actual [outcome description]"
effort_accuracy: "predicted X days vs actual X days (delta: +/-X)"
risk_accuracy: "feasibility was [predicted] — actual was [description]"
learnings: "Key learning from this cycle"
Update calibration_summary.total_cycles count. If total_cycles reaches a multiple of 5, prompt: "5 cycles since last review. Run /mycelium:framework-health to check calibration?"
Hard gate on cycle_class: product-leaf: if the cycle being closed shipped an OST solution leaf, predicted.ice_score.total must be non-zero. If it is zero, do NOT write the record yet — stop and ask: "This cycle shipped a product leaf but has no recorded ICE prediction. Was /mycelium:ice-score run before the cycle opened? If yes, copy the score from opportunities.yml. If no, this is a Check 38 violation — class the cycle as meta-dogfood if no design tradeoff was actually scored, or backfill the ICE score with an honest reconstruction noted as reconstructed_post_hoc: true." Reconstructed scores are excluded from calibration aggregates but preserved for the audit trail.
Step 5b. Log cycle calibration summary in .claude/harness/decision-log.md
Write a decision log entry titled "Cycle calibration record" that includes ALL of the following (use these exact words):
- cycle number and diamond ID
- predicted ICE score and effort estimate (from the original canvas)
- actual outcome and effort (from what really happened)
- calibration assessment: was the prediction accurate?
- effort delta: if the estimate was an underestimate or overestimate, state the accuracy gap (e.g., "effort accuracy: predicted 5 days vs actual 7 days, 40% underestimate")
- Risk dimension accuracy (e.g., "feasibility was predicted medium — actual confirmed, analytics pipeline was indeed the hardest part")
This decision log entry ensures the calibration data is auditable alongside other decisions, not just buried in cycle-history.yml.
Step 2. What Went Well?
- Which patterns from patterns.md were reused successfully?
- What new approaches worked?
- Where did the theory gates catch a real problem?
Step 3. What Didn't Go Well?
- What mistakes were made? (Add to corrections.md)
- Where did we skip a guardrail and regret it?
- What took longer than expected and why?
Step 4. What Should Change?
- New corrections to add
- New patterns to capture
- Process improvements
- Guardrail adjustments
- ADR review (if
docs/adr/ exists): did implementation follow the decided approach? Any consequences that turned out differently than expected? Mark superseded ADRs.
Step 5. BVSSH Dimension Check
- Better: Did quality improve or degrade?
- Value: Did we deliver actual user value?
- Sooner: Was our flow efficient?
- Safer: Did we maintain security and trust?
- Happier: How is team satisfaction? Customer advocacy? Societal impact? Was compute usage proportionate to value (not wasteful)?
Step 6. Rework Follow-Up (14-day window)
If this retrospective is for a cycle completed more than 14 days ago, check:
- How many corrections were logged against this delivery since completion? →
rework.post_delivery_corrections
- How many regressions occurred? →
rework.post_delivery_regressions
- Days to first regression? →
rework.days_to_first_regression
Update the cycle record in .claude/canvas/cycle-history.yml with the rework fields. This is the denominator — the hidden cost of delivery that velocity metrics miss.
If this retrospective is for a just-completed cycle, prompt: "Set a reminder to check rework in 14 days. Run /mycelium:retrospective rework-check [cycle-id] after that."
Source: Paddo (the denominator problem — 43% of AI-assisted code requires post-delivery debugging). Forsgren (change failure rate as a trailing indicator).
Root Cause Analysis (when "What Didn't Go Well" surfaces a significant problem)
Use these two complementary techniques. Fishbone gives breadth (all possible causes). 5 Whys gives depth (one cause traced to its root).
Fishbone Diagram (Ishikawa)
Map all potential causes before investigating any. Structure:
┌─ People (skills, handoffs, communication)
├─ Process (gates, cadence, workflow)
Problem ◄───────────────├─ Product (canvas, evidence, assumptions)
(effect) ├─ Platform (tools, infra, dependencies)
├─ Principles (which theory/guardrail failed?)
└─ Pressures (deadlines, scope, external)
- Write the specific problem at the head
- Brainstorm causes in each category — add as branches
- Drill into sub-causes until you reach actionable items
- Vote/rank the most likely root causes for investigation
Ishikawa's original 6M manufacturing categories: Man (Manpower), Machine, Method, Material, Measurement, Mother Nature (Environment). Adapted for product development as: Man→People, Machine→Platform, Method→Process, Material→Product (inputs to the work), Measurement→Principles (what we measure against), Mother Nature→Pressures (external forces).
5 Whys (Toyoda)
For the top-ranked cause from the fishbone, ask "why?" five times:
- Why did this happen? → [first-level cause]
- Why did that happen? → [second-level cause]
- Why? → [deeper]
- Why? → [deeper]
- Why? → [root cause — usually systemic]
Stop rule: Stop when ANY of these conditions are met:
- You reach something you can change systemically (a guardrail, gate, or process step)
- Asking "why" again would require speculation rather than verifiable fact
- You reach a cause outside your sphere of influence (an escalation point, not a dead end)
- The answer would be the same regardless of asking "why" (you've hit bedrock)
Anti-pattern: Stopping at "human error" — that's never the root cause. Ask why the system allowed the error.
Source: Ishikawa (cause-and-effect diagrams), Toyoda/Ohno (5 Whys), adapted for agentic product development.
Waste Identification (Ohno — 7 Wastes / TIMWOOD)
"Eliminating waste is the foundation of lean." (Ohno)
Before root cause analysis, identify which waste category the problem falls into:
| Waste | Product Development Form | Detection |
|---|
| Transportation | Handoffs between people/teams, between discovery and delivery | Count handoffs in the value stream |
| Inventory | WIP, unshipped code, unfinished features, unmerged branches, open PRs | Check WIP limits, branch age |
| Motion | Context switching between tasks, tools, codebases | Track focus time vs fragmented time |
| Waiting | Blocked tasks, review queues, approval bottlenecks, blocked dependencies | Measure wait-to-work ratio |
| Overproduction | Building features nobody uses, YAGNI violations | Compare shipped features to validated needs |
| Overprocessing | Gold-plating, unnecessary abstraction, premature optimization | "Would removing this step reduce value?" |
| Defects | Bugs, rework, corrections, failed deployments | Track defect escape rate |
Also watch for: Muri (overburden → BVSSH Happier / sustainable pace) and Mura (unevenness → delivery cadence variation).
Source: Taiichi Ohno, Sakichi Toyoda (Toyota Production System). Mapped to product development via Poppendieck (Lean Software Development).
Blameless Post-Mortem Format (SRE)
For incidents or significant failures, use the SRE blameless post-mortem:
- Timeline: What happened, when, in what order
- Impact: Who was affected, how severely, for how long
- Contributing factors: What conditions led to this (NOT "who caused it")
- Root cause: The systemic issue, not the human action (use fishbone + 5 Whys above)
- Action items: Specific, assigned, time-bound improvements
- What went well: What prevented it from being worse
Rule: No blame. Focus on the system, not the person. Source: Beyer et al. (SRE)
Refactoring Prompt
After delivery retrospective, always ask:
- "Are there refactoring opportunities? Duplicated logic (DRY)? Unnecessary complexity (KISS)?"
Source: Beck (XP), Fowler (Refactoring)
Output
- Update
.claude/memory/corrections.md with new corrections
- Update
.claude/memory/patterns.md with new patterns
- Update
.claude/memory/delivery-journal.md with retrospective entry
- Update .claude/canvas/bvssh-health.yml if dimensions changed
- Log in .claude/harness/decision-log.md
- Record cycle in
.claude/canvas/cycle-history.yml (see Cycle History Recording above)
Counter-Argument Check (Bias Mitigation)
Before finalizing the retrospective, draft a one-line counter-argument for each major claim: "What's the strongest case that this 'went well' was actually luck? That this 'went wrong' was actually unavoidable? That this 'pattern' is actually noise?" If you can't articulate counter-cases, run /mycelium:devils-advocate before locking in the corrections/patterns.
This addresses the bias cluster documented in corrections.md (L5 sycophancy 2026-04-20, eval overfitting 2026-04-30, sharper-framing-isn't-righter 2026-05-03). Retrospectives are particularly bias-prone — narrative coherence is rewarded, the agent is incentivized to find tidy patterns, and post-hoc rationalization is the natural mode. Counter-arguments break that gravity.
Hindsight Bias Check
Retrospectives are the natural home of hindsight bias — the "I knew it all along" effect that rewrites uncertainty as foreknowledge. For every claim of the form "we should have seen X coming," ask: would I have predicted X with the evidence available BEFORE the outcome? If the honest answer is "no, that evidence only became diagnostic in retrospect," log it as a learning about evidence interpretation, not as a missed signal. This protects future retrospectives from manufacturing false should-have-knowns that distort confidence calibration.
Source: Fischhoff, "Hindsight ≠ Foresight: The Effect of Outcome Knowledge on Judgment Under Uncertainty" (1975).
Especially important when proposing graduation candidates (recurring corrections → guardrails) — make sure the recurrence is real, not 3 instances of pattern-matching by the agent itself.