원클릭으로 Manus에서 모든 스킬 실행

ctx-architecture-failure-analysis

스타69

포크12

업데이트2026년 5월 23일 22:04

Adversarial failure analysis for codebases. Generates falsifiable incident hypotheses: race conditions, ordering assumptions, cache staleness, error swallowing, ownership gaps, idempotency failures, and scaling cliffs. Produces DANGER-ZONES.md with evidence-backed, ranked findings.

설치

Codex 또는 Claude로 설치 이 Prompt를 복사해 Codex, Claude 또는 다른 어시스턴트에 붙여 넣으면 Skill 페이지를 검토하고 설치를 진행할 수 있습니다.

Manus에서 실행

출처

ActiveMemory

ActiveMemory/ctx

GitHub 저장소 열기 Creator 저장소 보기

다운로드

Manus에서 실행

Design Principle

Generate falsifiable incident hypotheses. /ctx-architecture maps what exists. /ctx-architecture-enrich improves map fidelity. This skill generates concrete, disprovable claims about where the map will break under real-world conditions. Every finding is a hypothesis with evidence, not a suspicion or a vibe.

The goal is to find failure modes that code review misses: the ones that ship, pass tests, and break in production at 3am.

This skill requires /ctx-architecture artifacts as input. If they don't exist, stop and tell the user to run /ctx-architecture first.

When to Use

After /ctx-architecture has run and artifacts exist
Before a release or major deployment
After a significant refactor that changed data flow
When investigating a production incident category
When onboarding to an unfamiliar codebase and need to know where the dragons are

When NOT to Use

Without architecture artifacts (run /ctx-architecture first)
For security analysis (that's ctx-threat-model, a separate concern: auth bypass, injection, privilege escalation)
On trivial or small codebases where the analysis cost exceeds the risk
Mid-refactor when the code is intentionally in flux

Inputs

Required (must exist before running):

.context/ARCHITECTURE.md: system map
.context/DETAILED_DESIGN*.md: module-level detail
.context/map-tracking.json: coverage data

Optional (enhances analysis):

.context/DANGER-ZONES.md: existing danger zones from /ctx-architecture principal mode (used as starting points, not as the final word)
Code-intelligence MCP (canonical: GitNexus; equivalents include sourcegraph-cody): blast radius estimation, shared-state detection
Web-search-with-citations MCP (canonical: Gemini Search; equivalents include Firecrawl, Exa, Tavily): cross-reference against known failure patterns

Process

Phase 0: Validate Prerequisites

Check that architecture artifacts exist. If missing:

Architecture artifacts not found. Run /ctx-architecture first; this skill analyzes existing maps, it doesn't create them.
Load map-tracking.json to identify which modules have sufficient coverage (confidence >= 0.7). Low-confidence modules get flagged as "unanalyzed, risk unknown" rather than skipped.
If .context/DANGER-ZONES.md exists, load it as seed findings to extend, not as the complete picture.

Phase 1: Build the Attack Surface

Read architecture artifacts and identify mutation points: places where state changes, data transforms, or side effects occur. These are the attack surface for correctness failures.

For each module with confidence >= 0.7:

Read the DETAILED_DESIGN entry for the module
Identify:
- Shared mutable state (package-level vars, singletons, caches, registries)
- Concurrent access points (goroutines, channels, locks)
- External I/O boundaries (file, network, database)
- Error handling chains (where errors are caught, wrapped, or swallowed)
- Implicit ordering dependencies (init order, registration order, shutdown sequence)
- State machines and transition points
For each mutation point, read the actual source code, DETAILED_DESIGN summaries are not enough for failure analysis. You need to see the actual lock scope, the actual error check, the actual nil guard.

Phase 2: Adversarial Analysis

Apply each failure category systematically to the mutation points identified in Phase 1. For each category, ask: "How would a correctness chaos monkey make this fail silently?"

Every candidate finding must meet the evidence standard before it can be recorded:

Code path observed: Exact file, function, line range
Triggering precondition: What must be true for the failure to occur
Failure path: Step-by-step sequence from trigger to observable effect
Why it is silent: Why tests, logs, or monitoring miss it
Code evidence: The specific code pattern that supports the claim (the missing lock, the unchecked error, the unbounded loop)

If you cannot provide all five, the finding is a hypothesis, not a confirmed hazard. Label it accordingly (see Confidence below).

Category 1: Concurrency

Races: Shared state accessed from multiple goroutines without synchronization. Check: is every field of every shared struct protected? Are map reads concurrent-safe?
Deadlocks: Lock ordering violations, channel operations that can block forever
Goroutine leaks: Goroutines started but never joined, context cancellation not propagated

Category 2: Ordering Assumptions

Init order: Code that assumes packages initialize in a specific order. init() functions with side effects that depend on other packages' init()
Registration order: Slices or maps where order matters but insertion order isn't guaranteed
Shutdown sequence: Resources closed in wrong order, goroutines still running during cleanup

Category 3: Cache Staleness

TTL-less caches: In-memory caches with no expiration or invalidation
Read-your-writes: Code that writes then reads expecting the write to be visible, but a cache serves stale data
Cross-process: Caches that don't account for multiple instances or processes

Category 4: Fan-out Amplification

N+1 patterns: Loops that make one call per item instead of batching
Recursive expansion: Tree traversals that expand exponentially
Retry storms: Retries without backoff that amplify under load

Category 5: Ownership and Lifecycle

Orphaned resources: Objects created but never cleaned up (file handles, goroutines, temp files)
Double-close: Resources closed by multiple owners
Use-after-close: References held to closed resources
Force-delete orphans: Deletion that doesn't cascade to dependent resources

Category 6: Error Handling

Silent swallowing: _ = someFunc() or empty if err blocks
Error shadowing: Inner errors that mask outer context
Partial failure: Operations that succeed partially and leave inconsistent state
Panic recovery: recover() that catches too broadly and masks real bugs

Category 7: Scaling Cliffs

O(n^2) hidden in loops: Quadratic behavior that's fine for 10 items but kills at 10,000
Unbounded growth: Data structures that grow without limit (in-memory lists, log files, caches)
Single-threaded bottlenecks: Sequential operations that can't be parallelized
Global locks: Locks that serialize all operations across unrelated requests

Category 8: Idempotency Failures

Duplicate processing: At-least-once semantics without deduplication, causing double writes or double side effects
Retry-induced mutations: Retries that re-execute non-idempotent operations (increment counters, send emails, append to lists)
Missing idempotency keys: Operations that should be keyed but aren't, making replay indistinguishable from first execution

Category 9: State Machine and Invariant Drift

Illegal intermediate states: Objects that pass through states the system never validates (half-initialized structs, partially-migrated records)
Unvalidated transitions: State changes that skip validation (e.g., moving from "pending" to "complete" without passing through "running")
Cross-store inconsistency: State split across multiple stores (file + memory, database + cache) that can diverge after partial updates

Phase 3: Challenge Each Finding

Before a candidate finding is accepted, attempt to disprove it. For each candidate, explicitly check:

Is there existing locking or serialization that invalidates the race condition concern?
Is the cache intentionally immutable or write-once?
Is the shutdown ordering already coordinated by context cancellation or a shutdown hook?
Is the apparent N+1 actually bounded by a small constant?
Is the "missing" error check handled by a defer or a higher-level wrapper?
Does the test suite already cover this failure path?

If the challenge succeeds (the concern is invalid), drop the finding. If the challenge partially succeeds (the concern is mitigated but not eliminated), note the mitigation and adjust confidence downward.

This phase is mandatory. Skipping it produces smart fiction.

Phase 4: Quantify and Cross-Reference

For each surviving finding:

Blast radius: If this fails, what breaks?
- If a code-intelligence MCP is available (canonical: GitNexus's impact; equivalents include sourcegraph-cody), use it to get caller chains and dependency graphs
- Otherwise, estimate from the architecture dependency graph
Detection gap: Would existing tests catch this?
- Check test coverage for the affected code path
- Check if the failure mode is tested (not just the happy path)
Likelihood: How likely is this to trigger?
- Is the precondition common (every request) or rare (only under load)?
- Has this pattern caused issues before? (check git log for related fixes)
Risk score: Assign explicit scores on a 1-3 scale:
- Likelihood: 1 (rare) | 2 (uncommon) | 3 (common)
- Blast radius: 1 (localized) | 2 (module) | 3 (system)
- Detection gap: 1 (tested) | 2 (partially tested) | 3 (untested/silent)
- Total: Sum of all three (range 3-9)
- Critical: Total >= 7 AND failure is silent or cascading
- Elevated: All others
Cross-reference: If a web-search-with-citations MCP is available (canonical: Gemini Search; equivalents include Firecrawl, Exa, Tavily), search for the pattern in known incident databases, blog posts, and similar project post-mortems. This grounds findings in real-world evidence. If no such MCP is connected, fall back to built-in web search.

Phase 5: Write DANGER-ZONES.md

Write .context/DANGER-ZONES.md with findings ranked by risk score (highest first within each tier).

# Danger Zones

_Generated YYYY-MM-DD by /ctx-architecture-failure-analysis._
_Run after /ctx-architecture for full coverage._

## Critical (risk score >= 7, silent or cascading)

### DZ-1: [Location]: [Failure Mode]

**Category**: Concurrency | Ordering | Cache | Amplification
  | Ownership | Error Handling | Scaling | Idempotency
  | State Machine
**Confidence**: High | Medium | Low
**Risk score**: L:N + B:N + D:N = N
**Location**: `package/file.go:function`
**Failure mode**: What goes wrong and why
**Triggering precondition**: What must be true for this to fire
**Failure path**: Step-by-step from trigger to effect
**Blast radius**: What breaks when this fails
**Detection gap**: Why tests don't catch it
**Evidence**: The specific code pattern (with file:line)
**Suggested fix**: Concrete code change, not vague advice

### DZ-2: ...

## Elevated (risk score < 7 or non-silent)

### DZ-3: ...

## Unanalyzed Modules

Modules with coverage < 0.7 in map-tracking.json:
- `module/path` (confidence: 0.3): risk unknown

Confidence levels:

High: All five evidence fields present, challenge phase did not weaken the finding
Medium: Evidence is strong but triggering precondition is hard to verify statically (may need runtime proof)
Low: Plausible hypothesis based on code patterns, but could not be fully confirmed or disproved from static reading

DZ numbering: Sequential across the file. Stable across re-runs (findings for the same location keep their number).

Phase 6: Summary Report

Print a summary to the terminal:

## Failure Analysis Report

Modules analyzed: N (of M total)
Findings: N critical, M elevated
  High confidence: N
  Medium confidence: N
  Low confidence: N
Unanalyzed modules: K (coverage < 0.7)
Findings challenged and dropped: N

### Critical Findings
- DZ-1: [one-line summary] (L:N B:N D:N = N)
- DZ-2: [one-line summary] (L:N B:N D:N = N)

### Top Recommendation
[Single most impactful fix across all findings]

Relationship to Other Skills

Skill	Mode
`/ctx-architecture`	Map what exists
`/ctx-architecture-enrich`	Improve map fidelity
`/ctx-architecture-failure-analysis`	Generate falsifiable incident hypotheses
`ctx-threat-model` (future)	Security-focused analysis
`/ctx-architecture` P4	Surface danger zones noticed during mapping

The key distinction: P4 extracts danger zones that were noticed during mapping. This skill generates hypotheses and tests them against the code. P4 findings are observations; this skill's findings are tested claims.

Quality Checklist

Before writing DANGER-ZONES.md, verify:

name	ctx-architecture-failure-analysis
description	Adversarial failure analysis for codebases. Generates falsifiable incident hypotheses: race conditions, ordering assumptions, cache staleness, error swallowing, ownership gaps, idempotency failures, and scaling cliffs. Produces DANGER-ZONES.md with evidence-backed, ranked findings.
allowed-tools	Bash(ctx:), Bash(git:), Bash(go:), Read, Write, Edit, Glob, Grep, mcp__gitnexus__, mcp__gemini-search__*

name	ctx-architecture-failure-analysis
description	Adversarial failure analysis for codebases. Generates falsifiable incident hypotheses: race conditions, ordering assumptions, cache staleness, error swallowing, ownership gaps, idempotency failures, and scaling cliffs. Produces DANGER-ZONES.md with evidence-backed, ranked findings.
allowed-tools	Bash(ctx:), Bash(git:), Bash(go:), Read, Write, Edit, Glob, Grep, mcp__gitnexus__, mcp__gemini-search__*

ctx-architecture-failure-analysis

이 저장소의 다른 Skills

Design Principle

When to Use

When NOT to Use

Inputs

Process

Phase 0: Validate Prerequisites

Phase 1: Build the Attack Surface

Phase 2: Adversarial Analysis

Category 1: Concurrency

Category 2: Ordering Assumptions

Category 3: Cache Staleness

Category 4: Fan-out Amplification

Category 5: Ownership and Lifecycle

Category 6: Error Handling

Category 7: Scaling Cliffs

Category 8: Idempotency Failures

Category 9: State Machine and Invariant Drift

Phase 3: Challenge Each Finding

Phase 4: Quantify and Cross-Reference

Phase 5: Write DANGER-ZONES.md

Phase 6: Summary Report

Relationship to Other Skills

Quality Checklist

Design Principle

When to Use

When NOT to Use

Inputs

Process

Phase 0: Validate Prerequisites

Phase 1: Build the Attack Surface

Phase 2: Adversarial Analysis

Category 1: Concurrency

Category 2: Ordering Assumptions

Category 3: Cache Staleness

Category 4: Fan-out Amplification

Category 5: Ownership and Lifecycle

Category 6: Error Handling

Category 7: Scaling Cliffs

Category 8: Idempotency Failures

Category 9: State Machine and Invariant Drift

Phase 3: Challenge Each Finding

Phase 4: Quantify and Cross-Reference

Phase 5: Write DANGER-ZONES.md

Phase 6: Summary Report

Relationship to Other Skills

Quality Checklist

이 저장소의 다른 Skills