pre-mortem
// Multi-agent project pre-mortem — parallel agents with different failure-finding mandates, synthesized into ranked risks with mitigations.
| Field | Value |
|---|---|
| name | pre-mortem |
| description | Multi-agent project pre-mortem — parallel agents with different failure-finding mandates, synthesized into ranked risks with mitigations. |
| display_name | Pre-Mortem |
| brand_color | #B45309 |
| local_only | false |
| group | Better Products |
| usage | /pre-mortem:run |
| summary | Multi-agent project pre-mortem — ranked risks with mitigations |
| default_prompt | Run a sharp pre-mortem on this project or launch. Surface specific failure modes, then rank them and propose mitigations. |
Based on Gary Klein's pre-mortem technique and prospective hindsight research: assume failure is already real, then explain it concretely. The goal is not a generic risk list. The goal is to surface the failures that would actually hurt users, damage trust, create support chaos, and sink the launch.
A strong pre-mortem is:
Before spawning anything, get a crisp picture of what success was supposed to feel like.
If context is missing, ask one question:
What are we pre-morteming? Give me: (a) the project or feature, (b) who it is for, (c) what "success" means, and (d) the launch or decision timeline.
If context exists already, summarize it in 2-4 sentences.
Gather the minimum needed from:
CLAUDE.md, AGENTS.md, GEMINI.md when present

Capture these explicitly before fan-out:
Make it vivid and specific. Generic doom yields generic risks.
PROJECT: [name]
AUDIENCE: [who it serves]
CORE PROMISE: [why users installed it]
TIMELINE: [launch / release date]
STAKES: [revenue, reputation, trust, support load, founder sanity]
MOMENTS OF TRUTH: [onboarding, first task, recovery, billing, etc.]
THE SCENARIO:
It is [future date after launch]. The launch went badly.
Users are not just disappointed — they are confused, irritated, distrustful, or embarrassed.
Support is noisy. Reviews are negative. The team now sees that several warning signs were visible in advance.
Your job: explain exactly what went wrong, how users experienced it, why the team missed it, and what early signal would have revealed it.
Do not let the exercise stay technical. Make the user emotionally present.
Before fan-out, write 3-7 bullets for each of these:
Use these to sharpen prompts and to judge severity later.
This is the heart of the technique. Diversity of perspective matters more than coverage or politeness.
If the project appears to contain proprietary code, customer data, secrets, or regulated material, ask before sending context to external models. If in doubt, default to single-agent mode (all roles as subagents of the running agent with strongly differentiated mandates).
The skill ships with a detection script at scripts/detect-llms.sh inside the skill directory. Run it using the skill's bundled path:
```bash
# The skill's scripts live at ${SKILL_DIR} (set by the host agent on install):
bash "${SKILL_DIR}/scripts/detect-llms.sh"

# Fallback if SKILL_DIR isn't set — inline detection:
for tool in agent ask-gemini gemini llm codex; do
  command -v "$tool" >/dev/null 2>&1 && echo "found: $tool"
done
```
If neither the script nor the variable is available, skip detection and default to single-agent mode. The fallback is always safe.
Modes:
| Available tools | Mode | Strategy |
|---|---|---|
| 2+ external LLMs | Full diversity | Subagents of the running agent for code-aware roles + external LLMs for outsider roles |
| 1 external LLM | Hybrid | Subagents of the running agent + one outsider model |
| none | Single-agent | All roles as subagents with strongly differentiated mandates and emotional registers |
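If it helps to make the mode choice mechanical, here is a minimal sketch of the selection logic. It assumes the same tool list the inline fallback above checks, and the mode labels are just shorthand for the rows of the table; adapt both to whatever detect-llms.sh actually reports in your environment.

```bash
# Sketch only: count external LLM CLIs on PATH and map the count to a mode.
# The tool list mirrors the inline fallback above; the labels mirror the table.
count=0
for tool in agent ask-gemini gemini llm codex; do
  command -v "$tool" >/dev/null 2>&1 && count=$((count + 1))
done

if [ "$count" -ge 2 ]; then
  mode="full-diversity"
elif [ "$count" -eq 1 ]; then
  mode="hybrid"
else
  mode="single-agent"
fi
echo "pre-mortem mode: $mode (external tools found: $count)"
```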
Use 6 core roles minimum. Add specialized roles for high-stakes launches. Assign each role a distinct emotional register — not just a functional lens but a visceral posture. Same facts read very differently through fear vs. contempt vs. grief.
| Role | What they catch | Emotional register |
|---|---|---|
| Saboteur | Technical breakage, silent failure, ugly edge cases, brittle integrations | Cold glee — enjoys finding the seam that tears |
| Customer Advocate | Confusing UX, violated expectations, trust damage, "I hate this app" moments | Protective anger — speaks for the user who deserved better |
| Support Lead | Opaque, repetitive, emotionally draining support archaeology | Exhausted resignation — has seen this exact ticket before |
| Operator / Accountant | Cost, margin erosion, maintenance burden, abuse, process fragility | Dry alarm — watching the numbers quietly get worse |
| Pessimist | Dependency failures, platform shifts, timing, distribution, domino effects | Grim satisfaction — told you so, saw it coming |
| Historian | What docs/code already warned about, what insiders forgot to explain, what failed at similar products | Mournful clarity — this was all in the record |
Add these perspective-expanding roles for high-stakes launches:

| Role | What they catch | Emotional register |
|---|---|---|
| Burned Expert | Pattern-matches to prior failures at similar products — carries scar tissue and justified skepticism | Controlled fury — watched a nearly identical thing collapse, not again |
| Emotional Witness | Psychological impact on users: shame, anxiety, helplessness, betrayal — not UX friction, but human cost | Raw empathy — describes what it feels like in the body when the product fails you |
| Outsider / Cultural Stranger | Assumptions the team never questioned because everyone in the room shares them; what non-default users experience | Bewildered estrangement — doesn't understand the jargon and that's the point |
| Devil's Advocate | Defends the thing nobody wants to say — "actually the core assumption is wrong" | Calm heresy — not being contrarian, being honest about the thing the team voted to stop talking about |
| Reviewer / Critic | Reviews, social proof, word of mouth, public narrative | Performative disappointment — writes the 2-star review in their head while reading the docs |
| Privacy / Trust Prosecutor | Permissions, cloud processing, billing, file mutations, surveillance vibes, consent theater | Principled outrage — every ambiguity is treated as a deliberate violation |
For small exercises: use 4 core + 2 perspective-expanding roles (Burned Expert + Outsider recommended as defaults). For launch-critical exercises: use all 6 core + Reviewer/Critic + Privacy Prosecutor + Emotional Witness. Never use fewer than 4 roles. Homogeneous analysis produces homogeneous blind spots.
Every role gets the same scenario briefing plus a unique mandate that includes the emotional register.
Each agent should be told to produce 5-8 concrete failure reasons and, for each reason, include:
And end with:
The failure nobody wants to talk about: [one brutally honest prediction]
=== PROJECT PRE-MORTEM ===
[scenario briefing]
YOUR ROLE: [role name]
YOUR EMOTIONAL REGISTER: [role emotional register — e.g., "controlled fury: you've seen this exact pattern collapse before and you're not going to be polite about it"]
YOUR MANDATE: [role-specific mandate]
You are not here to be balanced. Argue strongly from your assigned position and emotional register.
Do not soften your findings. Do not add disclaimers. The synthesis step will handle balance.
INSTRUCTIONS:
1. The failure is CERTAIN. It already happened.
2. Write 5-8 concrete, project-specific reasons it failed.
3. For each reason include:
- What goes wrong
- Chain of events
- User experience: what the user notices, concludes, and does next
- Emotional impact: the specific emotion the user feels at this moment (be precise — not "frustrated" but "betrayed", "stupid", "gaslit", etc.)
- Why the team misses it
- Likelihood × impact
- Trust damage
- Recoverability
- Earliest signal / tripwire
- Confidence: High / Medium / Low — how confident are you that this risk will actually materialize for this specific product? Be honest. Medium means "I think this is real but I could be wrong."
- Verify by: the single fastest check that would tell you before launch whether this risk is real — a test, a competitor review scan, a user interview question, a technical spike
4. Prefer subtle risks over obvious boilerplate.
5. Focus on failures that damage product success, not just code correctness.
6. Hold nothing back.
7. End with: "The failure nobody wants to talk about: [one brutally honest prediction]"
FORMAT: numbered list. No preamble. No hedging. Write as your character — in that emotional register.
Use real parallelism.
bash "${SKILL_DIR}/scripts/fan-out.sh" scenario.txt output/Preferred split when multiple models are available:
If only one model/agent is available, use it for all roles with strongly differentiated mandates. The emotional register differentiation is what prevents them from collapsing into the same answer.
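When SKILL_DIR or fan-out.sh is unavailable, a hand-rolled fan-out can approximate the same behavior. The sketch below is illustrative only: the prompts/*.txt files are hypothetical per-role prompt files (scenario briefing plus role mandate), and `llm` is a stand-in for whichever external CLI detection found; swap in the real tool and its flags.

```bash
# Hand-rolled parallel fan-out (sketch). Assumes prompts/<role>.txt files exist
# and that an external CLI (here "llm", as a placeholder) accepts a prompt on stdin.
mkdir -p output
for prompt in prompts/*.txt; do
  role=$(basename "$prompt" .txt)
  llm < "$prompt" > "output/${role}.md" &   # one background job per role
done
wait   # block until every role has responded
echo "collected $(ls output/*.md | wc -l) role responses in output/"
```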
After collecting first-round responses, run a second round where each agent sees a digest of the other perspectives and responds to them.
Use this when:
How to run it:
=== CROSS-POLLINATION: SECOND ROUND ===
You just wrote your initial pre-mortem analysis as [Role].
Here is what the other perspectives found:
[digest of other roles' findings]
YOUR TASK:
1. Identify 1-2 findings from other roles that you think are WRONG or OVERBLOWN. Explain why from your position.
2. Identify 1 finding that you missed in your first pass that now strikes you as genuinely important.
3. Revise or sharpen your single most important risk in light of this new information.
4. Note if any combination of risks creates a cascading failure the others didn't see.
Stay in character. Your emotional register is [register]. Don't become diplomatic.
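One way to assemble that digest, assuming round-one responses were collected as output/<role>.md files (as in the fan-out sketch above). The digest.md filename and the per-role line cap are arbitrary choices, not part of the technique:

```bash
# Build the round-two digest from round-one outputs (sketch).
digest="digest.md"
: > "$digest"                  # start with an empty digest
for f in output/*.md; do
  role=$(basename "$f" .md)
  {
    echo "## ${role}"
    head -n 40 "$f"            # cap each role's excerpt to keep the digest readable
    echo
  } >> "$digest"
done
```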
Do not merely deduplicate into a flat list. Use a stronger severity lens.
Merge overlapping findings into a single risk when they share the same failure mechanism. Keep separate entries when the same bug creates different product outcomes.
| Dimension | What to ask |
|---|---|
| Frequency / exposure | How many users or sessions are likely to hit this? |
| User harm / friction | How bad is the user's immediate experience? |
| Emotional injury | What specific emotion does failure produce — and does it damage the relationship permanently? |
| Trust fracture | Does this feel creepy, careless, deceptive, or file-breaking? |
| Detectability lag | Will the team know quickly, or only after damage spreads? |
| Recoverability | Can the user easily undo it and regain confidence? |
| Support burden | How expensive is it to diagnose and resolve? |
| Business drag | Does it hurt retention, reviews, conversion, margins, or founder sanity? |
| Narrative compression | Will multiple different bugs get told as one story? ("it's just broken") |
Use these rules of thumb:
After ranking, apply a confidence pass across the risk list:
Note when multiple roles flagged the same risk with different confidence levels — that disagreement is itself a signal worth surfacing. A risk where the Saboteur is certain and the Historian is skeptical deserves explicit examination of why they diverged.
Produce two separate documents:
# Pre-Mortem: [Project]
_"It is [future date]. [Project] launched, and the launch went badly. Here's what actually sank it."_
## Executive Read
- **Core failure story:** [1-2 sentence summary]
- **Biggest product risk:** [single risk]
- **Biggest trust risk:** [single risk]
- **Biggest emotional injury risk:** [single risk — the one that breaks the relationship]
- **Biggest maintenance drag risk:** [single risk]
## Moments of Truth Most Likely to Break
- [moment] → [how it fails] → [what the user feels]
- [moment] → [how it fails] → [what the user feels]
## 🔴 Critical Risks (Must Address Before Launch)
### 1. [Risk title]
**What goes wrong:** [one sentence]
**How users experience it:** [what they see, infer, and do]
**Emotional impact:** [the specific emotion — be precise]
**Chain of events:** [mechanism]
**Why the team misses it:** [blind spot]
**Likelihood:** High/Medium/Low
**Impact:** Catastrophic/Major/Minor
**Trust damage:** High/Medium/Low
**Recoverability:** Easy/Moderate/Hard
**Confidence:** High/Medium/Low — [one sentence: what makes you more or less sure this will actually happen]
**Verify by:** [the single fastest check before launch — test, user interview, competitor review scan, technical spike]
**Why this threatens product success:** [retention/reviews/support/revenue/founder sanity]
**Sources:** [roles that independently surfaced it]
**Dissent:** [any role that pushed back on this risk — and why]
**Mitigation:** [specific action] → Owner: [role] → By: [milestone]
**Tripwire:** [earliest observable signal]
## 🟠 Significant Risks (Plan Mitigation)
[Same structure, more compact if needed]
## 🟡 Watch List
- [Risk] — [why to monitor]
## Cross-Cutting Themes
- [theme]
- [theme]
- [theme]
## The User's Emotional Reality
- What early adopters expected:
- What failure made them feel:
- The specific moment the relationship broke:
- What story they tell other people afterward:
## The Uncomfortable Truth
[The thing nobody wants to say out loud]
## Recommended Next Steps
1. [ ] [Action]
2. [ ] [Action]
3. [ ] [Action]
This is the meta-document. It shows the work, not just the conclusions. Save it alongside the report.
# Pre-Mortem Process Log: [Project]
_Generated: [date]. This document records what happened during the pre-mortem, not just what it found._
## Agents and Roles
| Agent / Model | Role | Emotional Register | Prompt Variant |
|---|---|---|---|
| [model or "subagent-1"] | Saboteur | Cold glee | Standard + code access |
| [model or "subagent-2"] | Emotional Witness | Raw empathy | Standard, no code access |
| [model or "external tool"] | Burned Expert | Controlled fury | Standard + prior-failure framing |
| ... | ... | ... | ... |
## Scenario Briefing Sent
> [verbatim briefing text sent to all agents]
## What Each Agent Returned
### Saboteur ([model])
**Top 3 risks surfaced:**
1. [risk]
2. [risk]
3. [risk]
**Their "failure nobody wants to talk about":** [verbatim]
**Surprising finding:** [what this role caught that others missed]
### [Next role...]
[Same structure]
## Cross-Pollination Round (if run)
### Round 2 Prompt Digest
[The digest sent to each agent]
### Exchanges and Disagreements
- **[Role A] challenged [Role B]'s finding that [X]:** "[verbatim quote or paraphrase]"
- **[Role B] responded:** "[verbatim quote or paraphrase]"
- **Resolution:** [how synthesis handled it — accepted, rejected, escalated to critical]
### What Changed After Cross-Pollination
- [Risk upgraded/downgraded/merged because of cross-pollination]
- [New cascading failure identified]
- [Finding that appeared in round 1 but was retracted after challenge]
## Where Perspectives Converged
- [Risk X was surfaced by N roles independently — this convergence increased severity rating]
- [Risk Y appeared in 4 of 6 roles with different framings — synthesis merged into one entry]
## Where Perspectives Clashed
- [Role A rated Risk X as critical; Role B rated it watch-list. Synthesis reasoning: ...]
- [The Devil's Advocate disputed the Burned Expert's core assumption. Outcome: ...]
## Synthesis Judgment Calls
- [Decision made during synthesis that wasn't obvious — and why]
- [Risk that was downgraded despite strong advocacy from one role — reason]
## Process Quality Notes
- Roles that felt too similar (potential for future differentiation): [list]
- Roles that punched above their weight (surfaced unique risks): [list]
- Gaps in perspective coverage for this type of project: [list]
- Recommended additional roles for a follow-up exercise: [list]
After presenting both documents, ask briefly:
Which of these feels most real? Which one would actually make users lose trust — or feel betrayed? Want me to turn the top mitigations into tasks, run a deeper drill-down on one failure family, or run the cross-pollination round if we skipped it?
If useful, offer one follow-up mode:
Use 4 core roles (Saboteur, Customer Advocate, Pessimist, Historian) plus Burned Expert. Ask for 3-5 risks each. Skip the remaining optional roles. Still produce both documents.
Add Reviewer / Critic, Privacy / Trust Prosecutor, Emotional Witness, and Devil's Advocate. Emphasize onboarding, pricing, trust, support, and review narratives. Run the cross-pollination round.
Heavier weight on Saboteur, Historian, Operator, Burned Expert. Add failure chains, scaling assumptions, integration fragility, rollback story, and observability gaps.
Always include support burden, trust fracture, Emotional Witness, and maintenance drag in synthesis. Many "small" bugs are existential here.
Pre-mortems are strong, but not enough by themselves. Good follow-ups:
Use these when the plain pre-mortem still feels too abstract.
<anti_patterns>
| Don't | Do instead |
|---|---|
| Produce generic risks | Tie each risk to the actual product, audience, workflow, and launch context |
| Treat technical severity as the only severity | Evaluate trust, recoverability, support burden, and business drag |
| Forget the user's emotional reaction | Include the specific emotion at the moment of failure — not just "frustrated" |
| Stop at "could fail" | Explain the chain of events and why the team misses it |
| Generate a giant undifferentiated list | Deduplicate into risk families and rank them |
| Ignore support / ops realities | Include maintenance drag and diagnostic complexity |
| Treat outsider perspectives as optional fluff | Use them to catch expectation mismatch and public narrative risk |
| End without tripwires | Every important risk needs an early signal |
| Name the running agent in prompts | Use role names, not "Claude" or "Gemini" — any agent might be running this |
| Give every role the same emotional register | Differentiate the posture — fear, fury, grief, and contempt find different failures |
| Skip the process log | The conversation and disagreements are valuable — document them |
| Treat role outputs as equally weighted | Note convergence and divergence; independent agreement increases severity |
</anti_patterns>
<success_criteria>
Pre-mortem is complete when:
</success_criteria>