name

architecture-discipline

description

Use when designing/modifying system architecture or evaluating technology choices. Enforces 7-section TodoWrite with 22+ items. Triggers: "design architecture", "system design", "architectural decision", "should we use [tech]", "compare [A] vs [B]", "add new service", "microservices", "database choice", "API design", "scale to [X] users", "infrastructure decision". If thinking ANY of these, USE THIS SKILL: "quick recommendation is fine", "obvious choice", "we already know the answer", "just need to pick one", "simple architecture question".

Architecture Discipline

When to Use

Use when decisions affect:

Data models or schema changes
Service boundaries or new services
Deployment topology or infrastructure
Scale characteristics (10x growth implications)
Technology stack choices
API contracts or external integrations

Skip for:

Parameter tweaks or configuration changes
UI/styling changes
Localization or copy changes
Bug fixes within existing architecture
Adding fields to existing models (unless schema migration)

Threshold: If the decision could cause a 3+ month re-architecture project if wrong, use this skill.

CRITICAL: This Is Reasoning Discipline, Not a Checklist

The 7 sections are a reasoning sequence, not boxes to check:

Alternatives BEFORE choosing
Scale requirements BEFORE design
Failure modes built in, not bolted on

🚨 If you wrote architecture without starting with all 7 sections: DELETE and restart. Retrofitting analysis is rationalization, not evaluation.

MANDATORY FIRST STEP

CREATE TodoWrite with these 7 sections (22+ items total):

Section	Minimum Items
Scale Analysis	4+
Architectural Options	3+
Ripple Effect Analysis	5+
Failure Modes	3+
Observability	3+
Documentation	2+
Migration/Compatibility	2+

Do not design, propose solutions, or implement until TodoWrite is verified.
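The minimums in the table can be checked mechanically before moving on. The following is a minimal sketch, assuming the TodoWrite is represented as a plain dict mapping section names to lists of item strings; that shape and the `missing_items` helper are illustrative, not part of TodoWrite itself.

```python
# Minimum item counts per section, copied from the table above.
MINIMUMS = {
    "Scale Analysis": 4,
    "Architectural Options": 3,
    "Ripple Effect Analysis": 5,
    "Failure Modes": 3,
    "Observability": 3,
    "Documentation": 2,
    "Migration/Compatibility": 2,
}


def missing_items(todo: dict[str, list[str]]) -> dict[str, int]:
    """Return how many items each section still lacks (passing sections are omitted)."""
    shortfalls = {}
    for section, minimum in MINIMUMS.items():
        have = len(todo.get(section, []))
        if have < minimum:
            shortfalls[section] = minimum - have
    return shortfalls


# Hypothetical draft with only two sections started.
draft = {
    "Scale Analysis": ["Current: 100K DAU, 50 req/sec"],
    "Documentation": ["ADR: chosen option, rejected alternatives", "Diagram: data flows"],
}

print("Minimum total:", sum(MINIMUMS.values()))
print("Still missing:", missing_items(draft))
```

An empty result from `missing_items` only confirms the counts; the quality test in the Verification Checkpoint still applies to the items themselves.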

Verification Checkpoint

After creating TodoWrite, verify 3 random items pass this test:

Each item must have ALL THREE:

✓ Concrete numbers/thresholds ("100K users", "$500/mo", "P95 < 500ms")
✓ Specific tools/technologies ("PostgreSQL", "Redis", "CloudWatch")
✓ Measurable outcome ("handles 1M req/sec", "costs $X at 10x")

❌ FAILS	✅ PASSES
"Add monitoring"	"CloudWatch: `websocket.connections.active`, alert if >5% error rate via PagerDuty"
"Evaluate caching"	"Compare: Redis (1ms, $300/mo) vs In-memory LRU (0.1ms, $0) vs No cache (100ms)"
"Analyze scale"	"Current: 100K DAU, 50 req/sec. 10x: 1M DAU, 500 req/sec. Bottleneck: PostgreSQL connection pool"

DO NOT PROCEED until 22+ items AND quality check passes.

Section Requirements

1. Scale Analysis (4+ items)

NEVER design for current scale only. Before proposing any solution:

Current scale: Users (DAU/MAU), requests/sec, data volume, read/write ratio
10x scale: What numbers at 10x? When expected?
Bottlenecks: What breaks at 10x? (DB connections, API limits, memory)
Mitigation: Specific solution for each bottleneck
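A worked example of the 10x projection, using the 50 req/sec figure from the Verification Checkpoint table. The 40 ms connection hold time and the pool size of 10 are assumptions chosen for illustration. The estimate applies Little's law (average in-flight requests = arrival rate × time in system), so it gives an average, not peak demand.

```python
import math


def connections_needed(req_per_sec: float, hold_ms: float) -> int:
    """Little's law: average busy connections = arrival rate x hold time."""
    return math.ceil(req_per_sec * hold_ms / 1000)


CURRENT_RPS = 50   # from the "Analyze scale" example item
GROWTH = 10
HOLD_MS = 40       # assumed: average time a request holds a DB connection
POOL_SIZE = 10     # assumed: configured PostgreSQL pool size

for label, rps in (("current", CURRENT_RPS), ("10x", CURRENT_RPS * GROWTH)):
    need = connections_needed(rps, HOLD_MS)
    status = "ok" if need <= POOL_SIZE else "BOTTLENECK"
    print(f"{label}: {rps} req/sec -> ~{need} connections (pool {POOL_SIZE}): {status}")
```

Under these assumptions the pool is comfortable today and exhausted at 10x, which is the kind of concrete finding a Bottlenecks item should record along with its mitigation.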

2. Architectural Options (3+ items)

NEVER present a single solution. Minimum 3 distinct options, each with:

Performance: Latency (P50/P95/P99), throughput, scale limit
Complexity: LOC estimate, services involved, operational burden
Cost: Infrastructure ($X/mo current, $Y/mo at 10x), development (engineer-weeks)
Trade-offs: Specific advantages (✅) and disadvantages (❌)

If a stakeholder suggests a solution: add it as Option A and evaluate it with the SAME rigor as the alternatives.
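Recording options as structured data makes it harder to skip a dimension. A minimal sketch using the caching figures from the Verification Checkpoint table; the `Option` type, the $0 cost for "No cache", and the latency and cost budgets are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    latency_ms: float
    monthly_cost_usd: float


options = [
    Option("Redis", 1.0, 300.0),
    Option("In-memory LRU", 0.1, 0.0),
    Option("No cache", 100.0, 0.0),  # cost assumed; the source gives latency only
]


def within_budget(candidates, max_latency_ms, max_cost_usd):
    """Keep only the options that meet both the latency and the cost budget."""
    return [
        o for o in candidates
        if o.latency_ms <= max_latency_ms and o.monthly_cost_usd <= max_cost_usd
    ]


# Hypothetical budgets: 10 ms latency, $500/mo.
viable = within_budget(options, max_latency_ms=10, max_cost_usd=500)
for o in viable:
    print(f"{o.name}: {o.latency_ms} ms, ${o.monthly_cost_usd:.0f}/mo")
```

A filter like this narrows the field; it does not pick a winner. Complexity, operational burden, and cost at 10x still have to be weighed per option.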

3. Ripple Effect Analysis (5+ items)

Changes propagate across layers. Analyze ALL:

Data layer: Schema changes, migrations, indexes, query performance
Services: Which need updates? API contracts changed?
API: Breaking changes? Version bump? Backward compatibility?
Clients: Mobile updates? Web UI changes?
Operations: Deployment changes? New monitoring? Cost changes?

4. Failure Modes (3+ items)

For each mode:

Scenario: [Component] fails because [reason]
Detection: How we know (metrics drop, error rate spike)
Impact: What breaks (user features, data integrity)
Mitigation: Circuit breaker, fallback, redundancy
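To make the circuit breaker and fallback mitigations concrete, here is a minimal sketch. The class, its parameters, and the `flaky` dependency are illustrative; a production system would more likely use an established resilience library than hand-rolled code.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, then serves the
    fallback until `reset_after` seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()      # open: do not touch the failing dependency
            self.opened_at = None      # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0              # success closes the circuit
        return result


def flaky():
    # Hypothetical dependency that is currently down.
    raise ConnectionError("recommendation service unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
for _ in range(3):
    print(breaker.call(flaky, fallback=lambda: "cached default"))
```

The third call never reaches `flaky`: the breaker is already open, so the caller gets the fallback immediately instead of waiting on a dead dependency.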

5. Observability (3+ items)

Metrics: Specific (latency P95, error rate %, throughput)
Alerts: Conditions (error rate > 5%, latency P95 > 500ms)
Dashboards: Key visualizations
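The two example alert conditions (error rate > 5%, latency P95 > 500ms) can be expressed as a small check. This sketch uses the nearest-rank percentile definition; the function names and the sample data are assumptions, and a real deployment would define these rules in its monitoring tool instead.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def firing_alerts(latencies_ms, errors, total, max_error_rate=0.05, max_p95_ms=500):
    """Return the names of the alert conditions that are currently breached."""
    alerts = []
    if total and errors / total > max_error_rate:
        alerts.append("error_rate")
    if latencies_ms and percentile(latencies_ms, 95) > max_p95_ms:
        alerts.append("latency_p95")
    return alerts


# Hypothetical window: 100 requests with latencies 10 ms..1000 ms, 2 errors.
window = list(range(10, 1010, 10))
print("P95:", percentile(window, 95), "ms")
print("Firing:", firing_alerts(window, errors=2, total=100))
```

Writing the threshold and the percentile definition down this precisely is what separates a passing Observability item from "Add monitoring".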

6. Documentation (2+ items)

ADR: Chosen option, rejected alternatives, trade-offs, constraints
Diagram: New components, data flows, failure paths

7. Migration/Compatibility (2+ items)

Backward compatibility: Old clients work? API versioning?
Migration path: Phased rollout, feature flags, rollback procedure
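A phased rollout behind a feature flag is often implemented with deterministic bucketing, sketched below. The `in_rollout` function and the `new-api` flag name are hypothetical; a managed feature-flag service would normally replace this.

```python
import hashlib


def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """Place the user in one of 100 stable buckets; enable the flag for the first `percent`."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


users = [f"user-{i}" for i in range(1000)]
for percent in (0, 10, 50, 100):
    enabled = sum(in_rollout(u, "new-api", percent) for u in users)
    print(f"{percent:>3}% rollout -> {enabled} of {len(users)} users enabled")
```

Because each user's bucket is fixed, raising the percentage only adds users and never flips anyone back, and the rollback procedure is simply setting the percentage to 0.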

Red Flags - STOP When You Think:

Thought	Reality
"Analysis paralysis"	This IS the analysis that prevents expensive mistakes
"We'll add scale/alternatives/failure modes later"	Retrofitting costs 5-10x more
"CTO already decided"	Still needs independent evaluation
"Being pragmatic not dogmatic"	These requirements ARE pragmatic
"Just a simple feature"	Simple becomes complex at scale
"We already know the solution"	Compare 3 alternatives first
"Keep it simple"	Simple for current scale = complex re-architecture at 10x
"I can add missing sections to existing work"	DELETE and restart

Override Requirements

To skip ANY requirement, you MUST provide ALL 4:

Specific retrofit date (not "later")
Budget allocated (engineer-weeks)
Risk acceptance signed by decision maker
Interim mitigation plan

Skipped	Risk	Cost
Scale Analysis	Re-architecture in 6-12 months	3-6 month project, 5-10x cost
Alternatives	Optimize wrong dimension	2-4 month migration
Failure Modes	Production incidents	$5-50K per incident
Ripple Effects	Broken clients, data issues	Deployment failures

Verification Before Complete

Category	Requirements
Scale	✓ Current + 10x projected ✓ Bottlenecks ✓ Mitigations
Trade-offs	✓ 3+ options ✓ Performance/complexity/cost ✓ Rationale
Impact	✓ All layers analyzed ✓ Breaking changes identified
Failure	✓ Specific modes ✓ Detection ✓ Mitigation ✓ Rollback
Documentation	✓ ADR ✓ Diagram updated

If any item is missing, do not proceed to implementation.
