error-recovery
Systematically recover from failures — isolate, diagnose, fix, verify, prevent. The three-attempts rule, evidence gathering before guessing, and when to roll back vs push forward.
Use this skill when something goes wrong: a test fails, code doesn't compile, the UI doesn't render, an API returns an error, or a deployment breaks. It gives you a structured recovery procedure instead of guess-and-retry.
Errors are data, not failures. Every error message, stack trace, and unexpected behavior contains information about what's wrong. The goal is to extract that information systematically, apply the smallest effective fix, and confirm it worked — then prevent it from happening again.
┌─────────────────────────────────────┐
│ 1. ISOLATE                          │
│ "What is the smallest reproduction?"│
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ 2. DIAGNOSE                         │
│ "What does the evidence say?"       │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ 3. FIX                              │
│ "What is the smallest change?"      │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ 4. VERIFY                           │
│ "Did it work without side effects?" │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│ 5. PREVENT                          │
│ "How do we stop this recurring?"    │
└─────────────────────────────────────┘
Goal: Narrow the problem to the smallest possible reproduction.
Questions to ask:
- Does this happen consistently or intermittently?
- Does it happen in isolation (minimal example)?
- Does it happen with different inputs?
- Did it work before? If so, what changed?
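You can answer the first question with data instead of impression: rerun the failing check many times and count the failures. The sketch below assumes the check can be wrapped in a zero-argument callable that raises on failure. `failure_rate`, `flaky_check`, and `stable_check` are hypothetical names used for illustration.

```python
import random
from typing import Callable


def failure_rate(check: Callable[[], None], runs: int = 50) -> float:
    """Run `check` repeatedly and return the fraction of runs that raised."""
    failures = 0
    for _ in range(runs):
        try:
            check()
        except Exception:
            failures += 1
    return failures / runs


# Hypothetical check that fails about 30% of the time (simulated flakiness).
rng = random.Random(0)  # seeded so the demo is repeatable


def flaky_check() -> None:
    if rng.random() < 0.3:
        raise AssertionError("intermittent failure")


def stable_check() -> None:
    assert 1 + 1 == 2


print(failure_rate(stable_check))  # 0.0: consistent
print(failure_rate(flaky_check))   # strictly between 0 and 1: intermittent
```

A rate of exactly 0.0 or 1.0 means the behavior is consistent. Anything in between points toward timing, ordering, shared state, or randomness, and that changes how you isolate the problem.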
Actions:
- Create a minimal reproduction (strip away unrelated code)
- Check if the issue reproduces in a clean environment
- Check recent changes (git log, git blame)
- Check environment differences (local vs CI vs prod)
Isolation exit criterion: You can reproduce the error with a specific, minimal set of conditions.
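When the failing input is a sequence (lines of a file, records, list items), stripping away the unrelated parts can be automated. The sketch below is a simplified, greedy relative of delta debugging, not the full algorithm. It repeatedly tries dropping one element and keeps the removal whenever the failure still reproduces. `shrink` and `crashes` are hypothetical names. The predicate must return True exactly when the original error reproduces.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def shrink(items: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove elements while `still_fails` keeps returning True.

    The result is 1-minimal: removing any single remaining element
    makes the failure disappear.
    """
    if not still_fails(items):
        raise ValueError("input does not reproduce the failure")
    current = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the smaller input
    return current


# Hypothetical bug: the code under test crashes only when both 3 and 7 appear.
def crashes(xs: List[int]) -> bool:
    return 3 in xs and 7 in xs


print(shrink([1, 2, 3, 4, 5, 6, 7, 8], crashes))  # [3, 7]
```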
Goal: Understand the root cause before attempting a fix.
Evidence to gather before any fix:
- Exact error message and stack trace
- Input that triggers the error
- Expected vs actual output
- State at the time of failure (logs, variables, database)
- Recent changes that touch the failing area
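Recording this evidence in one place before editing anything makes it harder to skip a step. The sketch below assumes the failure can be triggered by calling a single function with the offending input. `gather_evidence` and `average` are hypothetical. The standard-library `traceback.format_exc()` captures the exact stack trace as text.

```python
import traceback
from typing import Any, Callable, Dict


def gather_evidence(fn: Callable[[Any], Any], arg: Any, expected: Any) -> Dict[str, Any]:
    """Call fn(arg) and record the input, expected vs actual output, and any error."""
    record: Dict[str, Any] = {"input": arg, "expected": expected}
    try:
        actual = fn(arg)
        record["actual"] = actual
        record["error"] = None
        record["matches"] = actual == expected
    except Exception as exc:
        record["actual"] = None
        record["error"] = f"{type(exc).__name__}: {exc}"
        record["traceback"] = traceback.format_exc()  # exact trace, verbatim
        record["matches"] = False
    return record


def average(xs):  # hypothetical function under investigation
    return sum(xs) / len(xs)


ev = gather_evidence(average, [], expected=0.0)
print(ev["error"])  # ZeroDivisionError: division by zero
```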
Diagnostic techniques:
- Read the whole stack trace and find the frame where it crashed: in Python ("most recent call last") that is the last frame; in Java and JavaScript it is the first
- Check logs at the time of failure
- Add targeted logging (don't guess — add a log, run, observe)
- Check documentation for the API/library being used
- Compare with known working examples
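The sketch below combines two of these techniques in Python: a targeted debug log at the suspect line, and reading the crashing frame from the traceback. Python orders traceback frames oldest first, so the last entry from `traceback.extract_tb` is where the exception was raised. `parse_price` and `load` are hypothetical functions under investigation.

```python
import logging
import traceback

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("diagnose")


def parse_price(raw: str) -> float:  # hypothetical suspect function
    # Targeted log: record the exact value reaching the suspect line, then observe.
    log.debug("parse_price raw=%r", raw)
    return float(raw.strip("$"))


def load(rows):
    return [parse_price(r) for r in rows]


try:
    load(["$3.50", "$1,200.00"])
except ValueError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    crash = frames[-1]  # last frame = where the exception was raised
    print(f"crashed in {crash.name} at line {crash.lineno}: {exc}")
```

The debug line for `'$1,200.00'` appears immediately before the crash. That turns a guess ("maybe the currency symbol?") into an observation: the thousands separator is the problem.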
Key rule: Before writing any fix, write down what you think is wrong in one sentence. If you can't, you haven't diagnosed enough.
Goal: Apply the smallest possible change that addresses the root cause.
Fix principles:
- One change at a time (resist the urge to fix multiple things)
- Change the code that's wrong, not the test/check that caught it
- If the fix is complex, the diagnosis might be wrong
- Prefer local fixes over global changes (don't refactor while fixing)
The "three attempts" rule:
1st attempt: Direct fix based on diagnosis
2nd attempt: Re-diagnose, different approach
3rd attempt: Escalate — ask for help, change strategy entirely
After three failed attempts, stop and:
- Re-examine your assumptions (is the diagnosis correct?)
- Look for similar solved problems
- Consider a different approach altogether
- If working with someone, share what you've tried
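The three-attempts rule can be made mechanical. Each attempt is applied, verified, and reverted if verification fails. After the third failure the loop stops and escalates instead of trying a fourth guess. The sketch below uses hypothetical stand-ins: the `Attempt` structure, its `apply`/`revert` callables, and the `verify` predicate take the place of real edits and a real test run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_ATTEMPTS = 3  # the three-attempts rule


@dataclass
class Attempt:
    description: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def recover(attempts: List[Attempt], verify: Callable[[], bool]) -> Optional[str]:
    """Try at most three fixes in order; return the first that verifies, else None."""
    tried: List[str] = []
    for attempt in attempts[:MAX_ATTEMPTS]:
        attempt.apply()
        if verify():
            return attempt.description
        attempt.revert()  # undo before the next attempt: never pile fixes on fixes
        tried.append(attempt.description)
    # Out of attempts: stop guessing and escalate with a record of what was tried.
    print("Escalate. Re-examine the diagnosis. Tried:", "; ".join(tried))
    return None


# Toy state standing in for the code under repair (hypothetical).
state = {"timeout": 1}


def set_timeout(value: int) -> Attempt:
    """Build an attempt that sets state['timeout'] and can restore the old value."""
    old = state["timeout"]
    return Attempt(
        description=f"timeout={value}",
        apply=lambda: state.update(timeout=value),
        revert=lambda: state.update(timeout=old),
    )


def verify() -> bool:
    return state["timeout"] >= 5  # stand-in for "the reproduction now passes"


print(recover([set_timeout(2), set_timeout(5)], verify))  # timeout=5
```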
Goal: Confirm the fix works AND nothing else broke.
Verify checklist:
[ ] The original reproduction no longer errors
[ ] The fix works for edge cases of the same issue
[ ] Related tests still pass
[ ] Adjacent functionality still works (no regression)
[ ] The fix doesn't introduce new issues
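Writing the checklist as assertions makes it repeatable. The sketch below is built around a hypothetical bug: a page count that dropped the last, partially filled page. `page_count` and its cases are illustrative, not from a real codebase. In a real project, the adjacent-functionality checks would be the existing test suite rather than a handful of asserts.

```python
def page_count(total_items: int, per_page: int) -> int:
    """Number of pages needed to show total_items, per_page at a time.

    Hypothetical fix: the buggy version returned total_items // per_page,
    which dropped the last, partially filled page.
    """
    return (total_items + per_page - 1) // per_page  # ceiling division


def verify_fix() -> None:
    # The original reproduction now gives the right answer (was 10 pages)
    assert page_count(101, 10) == 11
    # Edge cases of the same issue
    assert page_count(0, 10) == 0
    assert page_count(1, 10) == 1
    assert page_count(100, 10) == 10  # exact multiple: no extra empty page
    # Cases that passed before the fix still pass (no regression)
    assert page_count(20, 5) == 4
    assert page_count(7, 1) == 7


verify_fix()
print("verification passed")
```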
If verification fails:
- Undo the fix (git checkout -- file, or git restore file on Git 2.23+)
- Go back to Diagnose (you missed something)
- Don't pile fixes on top of fixes
Goal: Stop this error from happening again — or catch it faster when it does.
Prevention strategies:
- Add a test that covers this exact case
- Add input validation if the error was caused by unexpected data
- Add better error messages if the error was hard to diagnose
- Add monitoring/alerting if it happened in production
- Add documentation if the fix involved non-obvious logic
- Consider if similar bugs exist in related code (pattern fix)
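Three of these strategies often land together: early input validation, an error message that names the bad value, and a test that pins the exact input from the bug report. The sketch below uses the standard `unittest` module. `parse_port` and the `"80a"` input are hypothetical.

```python
import unittest


def parse_port(value: str) -> int:
    """Hypothetical function whose bad input ("80a") once failed far from here."""
    try:
        port = int(value)
    except ValueError:
        # Prevention: validate early with a message that names the bad input.
        raise ValueError(f"port must be a number, got {value!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return port


class ParsePortRegressionTest(unittest.TestCase):
    def test_exact_input_from_bug_report(self):
        # Pin the input that caused the original failure so it cannot regress.
        with self.assertRaisesRegex(ValueError, "got '80a'"):
            parse_port("80a")

    def test_out_of_range_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_valid_port_still_works(self):
        self.assertEqual(parse_port("8080"), 8080)


if __name__ == "__main__":
    # exit=False keeps the interpreter alive after the test report.
    unittest.main(argv=["regression"], exit=False)
```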
| Situation | Action |
|---|---|
| Error in development, easy fix | Fix forward |
| Error in development, unclear cause | Roll back, investigate |
| Error in production, hotfix available | Fix forward with urgent review |
| Error in production, unclear cause | Roll back immediately |
| Fix introduces new error | Roll back, re-diagnose |
| Can't reproduce | Add logging; defer the fix until more information is available |
Rolling back is never a failure. It's the safest path when you don't fully understand the error or can't verify the fix.
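The table can also be read as a small decision procedure. The sketch below encodes it under two assumptions not stated in the table. First, "easy fix" and "hotfix available" are both approximated as "cause understood". Second, the rule order matters: a fix that causes a new error is checked first, and production rollback takes priority over "can't reproduce". `choose_action` and `Action` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    FIX_FORWARD = "fix forward"
    FIX_FORWARD_URGENT_REVIEW = "fix forward with urgent review"
    ROLL_BACK_INVESTIGATE = "roll back, investigate"
    ROLL_BACK_IMMEDIATELY = "roll back immediately"
    ROLL_BACK_REDIAGNOSE = "roll back, re-diagnose"
    ADD_LOGGING = "add logging, defer the fix"


def choose_action(in_production: bool, cause_understood: bool,
                  reproducible: bool = True,
                  fix_caused_new_error: bool = False) -> Action:
    """Encode the roll-back-vs-fix-forward table (rule order is an assumption)."""
    if fix_caused_new_error:
        return Action.ROLL_BACK_REDIAGNOSE
    if in_production:
        if cause_understood:
            return Action.FIX_FORWARD_URGENT_REVIEW
        return Action.ROLL_BACK_IMMEDIATELY
    if not reproducible:
        return Action.ADD_LOGGING
    if cause_understood:
        return Action.FIX_FORWARD
    return Action.ROLL_BACK_INVESTIGATE


print(choose_action(in_production=True, cause_understood=False).value)
# roll back immediately
```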
| Symptom | First things to check |
|---|---|
| Compile/build error | Read the FIRST error message (not the cascade that follows) |
| Test fails | Did the test exist before? Did the code change? Is the test wrong? |
| UI not rendering | Console errors, network requests, component props |
| API returns 500 | Server logs, request payload, database connectivity |
| Wrong data | Check the query, check the transformation, check the display |
| Slow performance | Profile first, guess second. Measure before optimizing |
| Regression | Check git log for recent changes to the affected area |
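For a regression, a binary search over history is the fastest way from "it worked before" to the change that broke it. This is what git bisect does, and git bisect run can drive the search from a test command's exit code. The sketch below shows the underlying search on a hypothetical list of commit IDs. `first_bad` and `suite_fails_at` are illustrative names, not Git APIs. Like git bisect, it assumes every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an oldest-to-newest history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad, every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


history = [f"c{i}" for i in range(1, 11)]  # hypothetical commits c1..c10
broken_from = 7                              # pretend c7 introduced the regression
checked = []


def suite_fails_at(commit: str) -> bool:
    checked.append(commit)
    return int(commit[1:]) >= broken_from


print(first_bad(history, suite_fails_at))  # c7
print(len(checked), "checks instead of", len(history))
```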