| name | perrow-system-safety |
| description | Apply Charles Perrow's Normal Accident Theory to analyze complex systems for interactive complexity, tight coupling, and accident potential. Use when the user wants to review system architecture, diagnose incidents, assess risk in high-risk systems, or invokes /perrow-system-safety. |
| license | MIT |
| metadata | {"author":"Mason-1011","compatibility":"claude-code"} |
Normal Accident System Safety
Based on Charles Perrow's Normal Accidents: Living with High-Risk Technologies (1984). Apply these frameworks when analyzing systems for safety, diagnosing incidents, or reviewing architecture.
Core Thesis
In systems with interactive complexity AND tight coupling, accidents are inevitable ("normal"). They arise from unanticipated interactions of multiple small failures, not from single catastrophic failures.
The Interaction/Coupling (I/C) Chart
Classify any system on two axes to assess accident potential:
INTERACTION
Linear ◄─────────► Complex
┌─────────────────┬─────────────────┐
Loose │ Quadrant I │ Quadrant II │
COUPLING │ Low risk │ Moderate risk │
│ e.g. Auto │ e.g. University│
│ manufacturing │ research lab │
├─────────────────┼─────────────────┤
Tight │ Quadrant III │ Quadrant IV │
│ Moderate risk │ HIGHEST RISK │
│ e.g. Hydro │ e.g. Nuclear │
│ electric dam │ power, chemical│
│ │ plants, aviation│
└─────────────────┴─────────────────┘
Quadrant IV systems (complex + tight) are where normal accidents occur. Most high-profile disasters live here.
Interactive Complexity — Assessment Checklist
A system has high interactive complexity if it has:
| Factor | Complex (score +1) | Linear (score 0) |
|---|
| Proximity | Components in close proximity, shorting possible | Spatially segregated subsystems |
| Common-mode connections | Shared connections that couple subsystems | Dedicated, independent connections |
| Interconnected subsystems | Tight interconnections between subsystems | Subsystems can operate independently |
| Substitutability | Limited ability to substitute components or roles | Easy to substitute supplies/equipment/personnel |
| Feedback loops | Multiple, unfamiliar feedback loops | Few feedback loops, all understood |
| Controls | Multiple controls per function | Single-purpose controls |
| Information | Indirect, inferred information | Direct, observable information |
| Understanding | Unfamiliar, unexpected sequences | Well-understood production sequences |
Scoring: 0-3 = Linear; 4-6 = Moderately Complex; 7-8 = Highly Complex
Tight Coupling — Assessment Checklist
A system is tightly coupled if:
| Factor | Tight (score +1) | Loose (score 0) |
|---|
| Timing | Delays not possible, must proceed on schedule | Can delay, hold, or pause processes |
| Sequence | Invariant sequence required | Multiple valid sequences possible |
| Methods | Only one way to achieve goal | Multiple methods available |
| Slack | No slack/buffer between components | Some buffer exists between stages |
| Buffers | No designed-in buffers | Fortuitous buffers can absorb failures |
Scoring: 0-2 = Loose; 3 = Moderate; 4-5 = Tight
DEPOSE Failure Analysis Framework
When diagnosing any failure or incident, systematically examine six components:
- Design — Was the system designed with inadequate safety margins? Are safety features built in but flawed?
- Equipment — Did hardware/components fail? Were redundancies present but defeated?
- Procedures — Were established procedures followed? Were they adequate for the actual conditions?
- Operators — Did human error contribute? (But: "human error" is often a symptom, not a cause)
- Supplies & Materials — Were materials substandard? Were consumables adequate?
- Environment — What environmental conditions contributed? (Weather, terrain, temperature, etc.)
DEPOSE Analysis Process
- List all observed failures across all six DEPOSE categories
- Identify which failures were independent vs interacting
- Map the interaction sequence — which failures combined to produce the accident?
- Determine if this was a system accident (unanticipated interactions) or component failure accident (single anticipated failure)
- Ask: were there designed-in safety features that were themselves defeated by the interactions?
System Accident vs Component Failure
| Characteristic | System Accident | Component Failure |
|---|
| Number of failures | Multiple, interacting | Single point of failure |
| Predictability | Unanticipated sequences | Expected failure mode |
| Root cause | Interactive complexity + tight coupling | Design flaw, wear, operator error |
| Post-incident blame | Often misattributed to "operator error" | Correctly attributed to component |
| Prevention | Redesign for loose coupling / linearity | Redundancy, maintenance, training |
Error-Inducing Systems
Some systems are structurally configured to induce errors and defeat attempts at error reduction. Key indicators:
- Blame inversion: Victims are low-status, unorganized, or anonymous; perpetrators face no accountability
- Regulatory fragmentation: International/multi-jurisdictional systems with weak enforcement (e.g., flags of convenience)
- Production pressure dominance: Safety is subordinated to schedule/economic pressure at every level
- Information asymmetry: Operators have poor information; surveillance is inadequate or absent
- Authority concentration: Single-person authority with insufficient checks (the "captain" problem)
- Safety feature defeat: Designed-in safety features are bypassed, degraded, or rendered ineffective by the system's own structure
- Error normalization: Frequent small failures become invisible through familiarity ("the traditions of the sea")
Key insight: In error-inducing systems, improving any single component may be inconsequential because other components will be allowed to express more risk. Only wholesale reconfiguration can make the system error-avoiding.
System Type Classification
Perrow distinguishes three types of systems by how they transform inputs:
| Type | Description | Risk Profile |
|---|
| Transformation | Changes the nature of materials (nuclear, chemical) | Highest risk — poorly understood dynamics, unobservable processes |
| Fabricating | Assembles components into products (manufacturing) | Lower risk — linear, well-understood sequences |
| Additive | Moves things without changing them (transport) | Variable — depends on complexity/coupling |
Aviation is largely additive but becomes transformational at high speed/altitude ("exceeding the buffet boundary"), where it takes on the dangerous characteristics of transformation systems.
Production Pressure Analysis
Production pressures are a systemic force, not individual greed. Assess:
- Schedule pressure: Are operators incentivized to meet timelines at the expense of safety?
- Economic pressure: Does the organization profit from tighter coupling or reduced margins?
- Regulatory capture: Does the industry influence its own regulators?
- Risk homeostasis: Do safety improvements get consumed by increased risk-taking rather than increased safety? (e.g., better brakes → drivers go faster)
- Blame patterns: Does the organization attribute systemic failures to individual "operator error"?
- Insurance failure: Does insurance actually penalize risky behavior, or are costs passed to consumers?
- International fragmentation: Can organizations evade regulation through flags of convenience, multiple jurisdictions?
Practical Application: System Design Review
When reviewing any system (software architecture, organizational process, physical plant):
- Classify on I/C chart — Where does it fall? If Quadrant IV, it needs special attention.
- Run DEPOSE checklist — Identify weak points across all six categories.
- Look for linearization opportunities — Can subsystems be decoupled? Can sequences be made flexible?
- Look for coupling reduction — Can buffers, slack, or delays be introduced?
- Check for automation paradoxes — Does automation reduce workload but increase consequence of failure when humans must intervene?
- Assess error-inducing features — Does the system's structure encourage risky behavior?
- Evaluate redundancy independence — Are backup systems truly independent, or can a single event defeat multiple "redundant" systems?
Practical Application: Incident Diagnosis
When an incident occurs:
- Don't stop at "operator error" — Ask what system conditions made the error likely or inevitable
- Map all concurrent failures — Use DEPOSE to identify every contributing factor
- Trace the interaction chain — How did independent failures combine in unexpected ways?
- Identify defeated safety features — Were there safeguards that should have caught this but didn't?
- Check for tight coupling indicators — Could the cascade have been halted? Were there buffers?
- Look for organizational factors — Production pressures, training gaps, regulatory failures
Policy Framework: System Classification for Action
Perrow proposes classifying high-risk systems along two dimensions: net catastrophic potential and cost of alternatives. This yields three action categories:
Category 1: Abandon (High Risk + Low Alternative Cost)
Systems where inevitable risks outweigh reasonable benefits:
- Nuclear power (complex + tightly coupled transformation system, no organizational fix possible)
- Nuclear weapons (catastrophic potential + arms race dynamics)
Category 2: Constrain & Improve (Moderate-High Risk + High Alternative Value)
Systems we cannot easily do without, but where risks should be reduced:
- Marine transport (error-inducing system, needs wholesale restructuring)
- Recombinant DNA (enormous potential benefits, but needs strict controls)
Category 3: Self-Correcting with Modest Effort (Lower Risk + Existing Self-Correction)
Systems that are partially self-correcting and can be further improved:
- Chemical plants (mostly linear, loosely coupled)
- Airliners and air traffic control (strong safety culture, self-reporting systems like ASRS)
- Mining, fossil fuel plants, highway/automobile safety
Risk Assessment Critique
Perrow argues that conventional risk assessment is flawed because it:
- Monetizes social goods and human life ($300,000 per life in 1984)
- Treats 50 scattered highway deaths as equivalent to 50 deaths in one catastrophe
- Ignores the difference between voluntary risks (skiing) and imposed risks (nuclear plants)
- Fails to account for third-party and fourth-party victims (innocent bystanders + future generations)
- Is dominated by industry-funded researchers with a vested interest
Key insight: "Ultimately, the issue is not risk, but power; the power to impose risks on the many for the benefit of the few."
Key Quotes for Context
"In tightly coupled systems, failures can cascade rapidly, and there is little slack or buffer to absorb them."
"Complex systems produce more unfamiliar sequences than are actually displayed in any given accident."
"Operator error is a convenient catch-all for mishaps whose real cause is uncertain, complex, or embarrassing to the system."
"There is no organizational structure that we would or should tolerate that could prevent [system accidents in nuclear power]."
"The issue is not risk, but power; the power to impose risks on the many for the benefit of the few."
References
- Perrow, C. (1984). Normal Accidents: Living with High-Risk Technologies. Basic Books.
- Case studies covered: Nuclear power (TMI, Fermi, Dresden), Petrochemical (Bhopal, Texas City, Flixborough), Aviation (DC-10, Orange County, ATC), Marine transport, Space (Mercury, Apollo), DNA research