Run any Skill in Manus with one click

incident-response

Stars0

Forks0

UpdatedMarch 28, 2026 at 22:12

Incident investigation protocol — structured debugging with 5-why analysis, evidence gathering, and post-mortem documentation.

Installation

Install with Codex or Claude Copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

Run Skill in Manus

Source

kennedym-ds

kennedym-ds/claudecode-orchestrator

View GitHub Repository View Creator Repositories

Download

Run Skill in Manus

Related occupationsSOC

Based on SOC occupation classification

Network and Computer Systems AdministratorsComputer and Mathematical Occupations·SOC 15-1244

SKILL.md

readonly

name	incident-response
description	Incident investigation protocol — structured debugging with 5-why analysis, evidence gathering, and post-mortem documentation.
user-invocable	false

Incident Response

Severity Classification

Level	Impact	Response Time	Escalation
SEV1	Service down, data loss	Immediate	Architect + Security
SEV2	Major feature broken	< 1 hour	Lead + Reviewer
SEV3	Minor feature degraded	< 4 hours	Implementer
SEV4	Cosmetic, non-blocking	Next sprint	Doc in backlog

Investigation Protocol

1. Triage (5 minutes)

What is the symptom?
What is the impact radius?
When did it start?
Is it ongoing or resolved?

2. Evidence Gathering

Error logs and stack traces
Recent deployments or changes (git log)
System metrics (CPU, memory, latency)
User reports or reproduction steps

3. Root Cause Analysis (5 Whys)

Why did the service fail? → {direct cause}
  Why did that happen? → {contributing factor}
    Why wasn't it caught? → {detection gap}
      Why wasn't it prevented? → {process gap}
        What systemic change prevents this? → {root cause fix}

4. Fix Categorization

Hotfix: Minimal change to restore service (deploy immediately)
Proper fix: Correct solution with tests (deploy in next cycle)
Systemic fix: Process or architecture change (plan and schedule)

5. Post-Mortem Template

## Incident: {title}
**Date:** {date}
**Severity:** {level}
**Duration:** {start} → {end}
**Impact:** {who/what affected}

### Timeline
- {HH:MM} — {event}

### Root Cause
{5-why chain}

### Resolution
{what was done}

### Action Items
- [ ] {preventive action} — Owner: {name} — Due: {date}

Debugging Guardrail

If 3 fix attempts fail → stop and escalate to architecture review. Don't keep trying random fixes.

More from this repository

same repository

demo-flow

kennedym-ds/claudecode-orchestrator

Demo pipeline state machine — 7-phase autonomous sequence with delegation context templates, phase transition logic, BLOCKED recovery strategies, and demo-state.json schema. Used exclusively by demo-conductor.

2026-03-310

demo-narration

kennedym-ds/claudecode-orchestrator

Cinematic narration style guide for demo-conductor — ANSI-coloured banner formats, live pipeline scoreboard, audience-facing language, phase summaries, and error narration patterns. Keeps the demo presentation-quality throughout.

2026-03-310

completion-protocol

kennedym-ds/claudecode-orchestrator

Standardized completion and escalation protocol for subagent responses. Ensures the conductor can machine-parse every subagent return. Use when reporting completion status back to the orchestrator.

2026-03-300

learnings-mgmt

kennedym-ds/claudecode-orchestrator

Cross-session learnings lifecycle — schema, storage, retrieval, and pruning of lessons learned during orchestrator sessions. Use when managing learnings via the /learn command.

2026-03-300

team-routing

kennedym-ds/claudecode-orchestrator

Agent Teams assembly and task injection — selects appropriate team, validates prerequisites, estimates cost, injects tasks into the shared task list, and manages team lifecycle.

2026-03-300

budget-gatekeeper

kennedym-ds/claudecode-orchestrator

Token and cost tracking with model tier enforcement

2026-03-290

name	incident-response
description	Incident investigation protocol — structured debugging with 5-why analysis, evidence gathering, and post-mortem documentation.
user-invocable	false

Incident Response

Severity Classification

Level	Impact	Response Time	Escalation
SEV1	Service down, data loss	Immediate	Architect + Security
SEV2	Major feature broken	< 1 hour	Lead + Reviewer
SEV3	Minor feature degraded	< 4 hours	Implementer
SEV4	Cosmetic, non-blocking	Next sprint	Doc in backlog

Investigation Protocol

1. Triage (5 minutes)

What is the symptom?
What is the impact radius?
When did it start?
Is it ongoing or resolved?

2. Evidence Gathering

Error logs and stack traces
Recent deployments or changes (git log)
System metrics (CPU, memory, latency)
User reports or reproduction steps

3. Root Cause Analysis (5 Whys)

Why did the service fail? → {direct cause}
  Why did that happen? → {contributing factor}
    Why wasn't it caught? → {detection gap}
      Why wasn't it prevented? → {process gap}
        What systemic change prevents this? → {root cause fix}

4. Fix Categorization

Hotfix: Minimal change to restore service (deploy immediately)
Proper fix: Correct solution with tests (deploy in next cycle)
Systemic fix: Process or architecture change (plan and schedule)

5. Post-Mortem Template

## Incident: {title}
**Date:** {date}
**Severity:** {level}
**Duration:** {start} → {end}
**Impact:** {who/what affected}

### Timeline
- {HH:MM} — {event}

### Root Cause
{5-why chain}

### Resolution
{what was done}

### Action Items
- [ ] {preventive action} — Owner: {name} — Due: {date}

Debugging Guardrail

If 3 fix attempts fail → stop and escalate to architecture review. Don't keep trying random fixes.