
$pwd:

chaos-testing

Name: Chaos Testing
Author: irahardianto

// Controlled failure injection: hypothesis design, blast radius control, safety mechanisms, game day planning, and resilience verification.


stars:146

forks:47

updated:May 25, 2026 at 01:39

SKILL.md


name	chaos-testing
description	Controlled failure injection: hypothesis design, blast radius control, safety mechanisms, game day planning, and resilience verification.

Chaos Testing Principles

Controlled failure injection to build confidence in system resilience.

When to Invoke

Verifying system resilience before production deployment
Designing game day exercises
Testing circuit breakers, retries, and failover
Validating disaster recovery plans

Methodology

1. Define Steady State

Identify measurable indicators of normal system behavior:

Request success rate ≥ 99.9%
P99 latency < 500ms
Error rate < 0.1%
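As a minimal sketch, the three example thresholds above can be turned into a single pass/fail check. The `Metrics` shape and the `is_steady_state` name are assumptions for illustration; a real check would read these values from whatever monitoring system is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """One observation window of service metrics (hypothetical shape)."""
    total_requests: int
    failed_requests: int
    p99_latency_ms: float


def is_steady_state(m: Metrics) -> bool:
    """Return True when the window meets all three example thresholds."""
    if m.total_requests == 0:
        return False  # no traffic means no evidence of normal behavior
    succeeded = m.total_requests - m.failed_requests
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    success_ok = succeeded * 1000 >= m.total_requests * 999  # success rate >= 99.9%
    latency_ok = m.p99_latency_ms < 500                      # P99 latency < 500 ms
    error_ok = m.failed_requests * 1000 < m.total_requests   # error rate < 0.1%
    return success_ok and latency_ok and error_ok
```

Running the same check before, during, and after an experiment gives a consistent definition of "normal" to compare against.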

2. Form Hypothesis

"When [failure condition], the system will [expected behavior] because [mechanism]."

Example: "When the database primary fails, the system will fail over to a replica within 30s because automatic failover is configured."
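One way to keep hypotheses in this shape is to store the three parts separately and render the sentence from them. The `Hypothesis` class below is a hypothetical helper, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """The three slots of the hypothesis template (hypothetical helper)."""
    failure_condition: str
    expected_behavior: str
    mechanism: str

    def statement(self) -> str:
        """Render the hypothesis using the template wording."""
        return (
            f"When {self.failure_condition}, the system will "
            f"{self.expected_behavior} because {self.mechanism}."
        )


db_failover = Hypothesis(
    failure_condition="the database primary fails",
    expected_behavior="fail over to a replica within 30s",
    mechanism="automatic failover is configured",
)
print(db_failover.statement())
```

Keeping the mechanism as its own field makes it harder to write a hypothesis that predicts an outcome without saying why.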

3. Design Experiment

Element	Description
Target	Which component to perturb
Failure mode	What kind of failure (latency, crash, partition)
Blast radius	Scope of impact (single instance, AZ, region)
Duration	How long the failure persists
Abort criteria	When to immediately stop the experiment
Rollback plan	How to restore normal operation
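The elements in the table above map naturally onto a small record type. This is a sketch under assumptions: the `Experiment` and `BlastRadius` names, the field names, and the `problems` check are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import Enum


class BlastRadius(Enum):
    """Scope of impact, ordered from smallest to largest."""
    SINGLE_INSTANCE = 1
    AVAILABILITY_ZONE = 2
    REGION = 3


@dataclass(frozen=True)
class Experiment:
    """One row per element of the design table (hypothetical field names)."""
    target: str
    failure_mode: str
    blast_radius: BlastRadius
    duration_s: int
    abort_criteria: str
    rollback_plan: str

    def problems(self) -> list[str]:
        """List design gaps; an empty list means every element is filled in."""
        issues = []
        for name in ("target", "failure_mode", "abort_criteria", "rollback_plan"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if self.duration_s <= 0:
            issues.append("duration_s must be positive")
        return issues
```

An experiment whose `problems()` list is non-empty is not ready to execute, which makes missing abort criteria or rollback plans visible before anything is injected.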

4. Execute

Start with smallest blast radius
Monitor continuously during experiment
Have rollback ready at all times
Stop immediately if abort criteria met
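The execution rules above can be sketched as a loop that polls an abort check and guarantees rollback. All names here (`run_experiment` and its callable parameters) are assumptions; the injection, rollback, and abort functions would be supplied by whatever tooling performs the experiment. The `clock` and `sleep` parameters exist only so the loop can be tested without waiting.

```python
import time
from typing import Callable


def run_experiment(
    inject: Callable[[], None],
    rollback: Callable[[], None],
    should_abort: Callable[[], bool],
    max_duration_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Inject a failure, watch the abort check, and always roll back.

    Returns "aborted" if the abort criteria fired, otherwise "completed".
    """
    deadline = clock() + max_duration_s
    outcome = "completed"
    try:
        inject()
        while clock() < deadline:
            if should_abort():
                outcome = "aborted"
                break
            sleep(poll_interval_s)
    finally:
        # Runs on normal completion, on abort, and on unexpected exceptions,
        # including a failure partway through injection.
        rollback()
    return outcome
```

Placing the rollback in `finally` is the design choice that matters: the experiment cannot exit, cleanly or otherwise, without attempting to restore normal operation.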

5. Analyze & Learn

Did system behave as hypothesized?
What broke unexpectedly?
What recovery mechanisms worked/failed?
Document findings and action items

Failure Injection Types

Type	Examples
Process	Kill process, OOM, CPU spike
Network	Latency injection, packet loss, partition
Infrastructure	Instance termination, AZ failure, disk full
Application	Exception injection, slow dependency, config error
Data	Corrupt cache, stale data, schema mismatch
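For the Application row, a minimal sketch of exception injection and a slow dependency is a decorator that wraps a call. The `inject_faults` decorator and `InjectedFault` exception are hypothetical names; process, network, and infrastructure failures are normally injected by external tooling rather than in application code.

```python
import random
import time
from functools import wraps
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(RuntimeError):
    """Raised in place of a real dependency error during an experiment."""


def inject_faults(
    latency_s: float = 0.0,
    error_rate: float = 0.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Wrap a function so calls are delayed and fail with a given probability."""
    source = rng or random.Random()

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            if latency_s > 0:
                sleep(latency_s)  # simulate a slow dependency
            if source.random() < error_rate:
                raise InjectedFault(f"injected fault in {fn.__name__}")
            return fn(*args, **kwargs)

        return wrapper

    return decorator
```

A caller protected by a retry or circuit breaker can then be exercised by decorating its dependency with, for example, `@inject_faults(latency_s=0.5, error_rate=0.1)` and checking that the steady-state indicators hold.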

Safety Mechanisms (Non-Negotiable)

Abort button: the ability to terminate an experiment immediately
Blast radius limits: never affect more than 5% of production traffic initially
Time-boxed: every experiment has a maximum duration
Monitoring: real-time dashboards stay open during experiments
Business hours only: run experiments while the team is staffed, avoiding both peak-traffic windows and off-hours
Stakeholder communication: relevant teams are informed before experiments begin
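These mechanisms can be enforced as a pre-flight gate that refuses to start an experiment while any of them is unmet. This is a sketch under stated assumptions: the function name, its parameters, and the Monday to Friday 09:00 to 17:00 definition of business hours are illustrative, and peak windows would come from the team's own traffic data.

```python
from datetime import datetime, time
from typing import Sequence

MAX_INITIAL_TRAFFIC_FRACTION = 0.05          # the 5% limit stated above
BUSINESS_HOURS = (time(9, 0), time(17, 0))   # assumption: Mon-Fri, 09:00-17:00


def preflight_violations(
    traffic_fraction: float,
    max_duration_s: int,
    start: datetime,
    peak_windows: Sequence[tuple[time, time]],
    abort_available: bool,
    dashboards_ready: bool,
    stakeholders_notified: bool,
) -> list[str]:
    """Return every unmet safety mechanism; an empty list means go."""
    violations = []
    if not abort_available:
        violations.append("no abort mechanism")
    if traffic_fraction > MAX_INITIAL_TRAFFIC_FRACTION:
        violations.append("blast radius exceeds 5% of production traffic")
    if max_duration_s <= 0:
        violations.append("experiment is not time-boxed")
    if not dashboards_ready:
        violations.append("no real-time monitoring")
    now = start.time()
    in_business_hours = (
        start.weekday() < 5 and BUSINESS_HOURS[0] <= now < BUSINESS_HOURS[1]
    )
    in_peak = any(begin <= now < end for begin, end in peak_windows)
    if not in_business_hours or in_peak:
        violations.append("outside business hours or inside a peak window")
    if not stakeholders_notified:
        violations.append("stakeholders not informed")
    return violations
```

Returning the full list rather than failing on the first problem lets the experiment owner fix everything in one pass.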

Game Day Planning

Checklist

Related

Monitoring and Alerting Principles @.agents/rules/monitoring-and-alerting-principles.md
Incident Response @.agents/skills/incident-response/SKILL.md
Error Handling Principles @.agents/rules/error-handling-principles.md

related-skills.json

same repository

omni.md

from "irahardianto/awesome-agv"

Token-efficient communication protocol. Activate ONLY when: (1) user explicitly requests it (e.g., "use omni", "be concise", "compress output"), (2) dispatched as a sub-agent in /workflow-team pipelines where token budget matters, or (3) agent-to-agent communication via /omni headless modifier. Never activate by default in normal conversations — users expect natural language responses unless they opt in. Compresses prose form while preserving 100% technical accuracy. Code blocks, tool calls, file paths, and data are NEVER compressed.

2026-05-25 · 146 stars

code-review.md

from "irahardianto/awesome-agv"

Structured code review protocol for inspecting code quality against the full rule set. Use when auditing code written by yourself or another agent, during the /audit workflow, or when the user asks for a code review.

2026-05-25 · 146 stars

guardrails.md

from "irahardianto/awesome-agv"

Pre-flight checklist and post-implementation self-review protocol. Use before generating any code (pre-flight) and after writing code but before verification (self-review) to catch issues early.

2026-05-25 · 146 stars

debugging-protocol.md

from "irahardianto/awesome-agv"

Comprehensive protocol for validating root causes of software issues. Use when you need to systematically debug a complex bug, flaky test, or unknown system behavior by forming hypotheses and validating them with specific tasks.

2026-05-25 · 146 stars

perf-optimization.md

from "irahardianto/awesome-agv"

Profile-driven performance optimization protocol. Use when profiling data (CPU, heap, trace) is available or when the user requests performance analysis. Covers methodology, pattern catalog, safety invariants, and when-to-stop heuristics. Language-specific tooling is in languages/*.md.

2026-05-25 · 146 stars

adr.md

from "irahardianto/awesome-agv"

Architecture Decision Record skill for documenting significant architectural decisions with context, options, and consequences. Use during the Research phase when choosing between approaches, or whenever the user asks to document an architectural decision.

2026-05-25 · 146 stars

package.json

"author": "irahardianto"

"repository": "irahardianto/awesome-agv"

