kubernetes-debug
Kubernetes debugging methodology and scripts. Use for pod crashes, CrashLoopBackOff, OOMKilled, deployment issues, resource problems, or container failures.
| Field | Value |
|---|---|
| name | kubernetes-debug |
| description | Kubernetes debugging methodology and scripts. Use for pod crashes, CrashLoopBackOff, OOMKilled, deployment issues, resource problems, or container failures. |
| allowed-tools | Bash(python *) |
ALWAYS check pod events BEFORE logs. Events explain about 80% of issues, and they point to the cause faster than logs do.
All scripts are in `.claude/skills/infrastructure-kubernetes/scripts/`.

List pods in a namespace, optionally filtered by a label selector:

```bash
python .claude/skills/infrastructure-kubernetes/scripts/list_pods.py -n <namespace> [--label <selector>]
# Examples:
python .claude/skills/infrastructure-kubernetes/scripts/list_pods.py -n default
python .claude/skills/infrastructure-kubernetes/scripts/list_pods.py -n default --label app=myapp
```
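The implementation of list_pods.py is not shown here. As a rough illustration of what "check pod status" involves, the sketch below condenses the JSON returned by `kubectl get pods -n <namespace> -o json` into one row per pod and flags the pods worth a closer look. The helper names `summarize_pods` and `needs_attention` are hypothetical and the sample input is made up; the field names (`status.phase`, `containerStatuses[].restartCount`, `ready`, `state.waiting.reason`) follow the Kubernetes Pod API.

```python
import json
from typing import Any


def summarize_pods(pod_list: dict[str, Any]) -> list[dict[str, Any]]:
    """Condense a `kubectl get pods -o json` document into one row per pod."""
    rows = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        statuses = status.get("containerStatuses", [])
        # A waiting reason such as CrashLoopBackOff says more than the pod phase does.
        waiting = [
            s["state"]["waiting"].get("reason", "")
            for s in statuses
            if "waiting" in s.get("state", {})
        ]
        rows.append(
            {
                "name": pod["metadata"]["name"],
                "phase": status.get("phase", "Unknown"),
                "restarts": sum(s.get("restartCount", 0) for s in statuses),
                "ready": bool(statuses) and all(s.get("ready", False) for s in statuses),
                "waiting": waiting,
            }
        )
    return rows


def needs_attention(row: dict[str, Any]) -> bool:
    """Flag pods that are not running cleanly (completed pods are left alone)."""
    if row["phase"] not in ("Running", "Succeeded"):
        return True
    if row["phase"] == "Running" and not row["ready"]:
        return True
    return row["restarts"] > 0 or bool(row["waiting"])


# Made-up sample shaped like `kubectl get pods -o json` output.
SAMPLE_JSON = """
{"items": [
  {"metadata": {"name": "web-1"},
   "status": {"phase": "Running",
              "containerStatuses": [{"ready": true, "restartCount": 0, "state": {"running": {}}}]}},
  {"metadata": {"name": "web-2"},
   "status": {"phase": "Running",
              "containerStatuses": [{"ready": false, "restartCount": 5,
                                     "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}}
]}
"""
sample = json.loads(SAMPLE_JSON)

for row in summarize_pods(sample):
    if needs_attention(row):
        print(row["name"], row["phase"], row["restarts"], row["waiting"])
# prints: web-2 Running 5 ['CrashLoopBackOff']
```

Completed pods (phase `Succeeded`) report their containers as not ready, which is why readiness is only checked for `Running` pods.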
Get the events for a pod (check these first):

```bash
python .claude/skills/infrastructure-kubernetes/scripts/get_events.py <pod-name> -n <namespace>
# Example:
python .claude/skills/infrastructure-kubernetes/scripts/get_events.py my-pod-7f8b9c6d5-x2k4m -n default
```
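Events are what the "events before logs" rule relies on. A minimal sketch of filtering them for one pod, assuming the input is the JSON from `kubectl get events -n <namespace> -o json` (get_events.py may work differently). It keeps only events whose `involvedObject` is the named pod, orders them by timestamp, and formats each as `[timestamp] reason: message`, the same shape the Events section of the report uses. `pod_events` is a hypothetical helper and the sample events are invented.

```python
from typing import Any


def pod_events(event_list: dict[str, Any], pod_name: str, warnings_only: bool = False) -> list[str]:
    """Return '[timestamp] reason: message' lines for one pod, oldest first."""
    selected = []
    for ev in event_list.get("items", []):
        obj = ev.get("involvedObject", {})
        if obj.get("kind") != "Pod" or obj.get("name") != pod_name:
            continue
        if warnings_only and ev.get("type") != "Warning":
            continue
        # lastTimestamp can be null for events written through the events.k8s.io API,
        # so fall back to eventTime and then firstTimestamp.
        ts = ev.get("lastTimestamp") or ev.get("eventTime") or ev.get("firstTimestamp") or ""
        selected.append((ts, ev.get("reason", ""), ev.get("message", "")))
    # Same-format UTC RFC 3339 timestamps sort chronologically as plain strings.
    selected.sort(key=lambda item: item[0])
    return [f"[{ts}] {reason}: {message}" for ts, reason, message in selected]


# Invented sample shaped like `kubectl get events -o json` output.
sample = {"items": [
    {"involvedObject": {"kind": "Pod", "name": "api-0"}, "type": "Warning",
     "reason": "BackOff", "message": "Back-off restarting failed container",
     "lastTimestamp": "2024-05-01T10:05:00Z"},
    {"involvedObject": {"kind": "Pod", "name": "api-0"}, "type": "Normal",
     "reason": "Pulled", "message": "Container image already present",
     "lastTimestamp": "2024-05-01T10:00:00Z"},
    {"involvedObject": {"kind": "Pod", "name": "other"}, "type": "Warning",
     "reason": "FailedScheduling", "message": "0/3 nodes are available",
     "lastTimestamp": "2024-05-01T09:00:00Z"},
]}

for line in pod_events(sample, "api-0"):
    print(line)
# prints:
# [2024-05-01T10:00:00Z] Pulled: Container image already present
# [2024-05-01T10:05:00Z] BackOff: Back-off restarting failed container
```

Passing `warnings_only=True` narrows the list to `Warning` events, which is usually where scheduling, pull and crash problems show up.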
Get container logs, optionally limited to the last N lines or to one container of a multi-container pod:

```bash
python .claude/skills/infrastructure-kubernetes/scripts/get_logs.py <pod-name> -n <namespace> [--tail N] [--container NAME]
# Examples:
python .claude/skills/infrastructure-kubernetes/scripts/get_logs.py my-pod-7f8b9c6d5-x2k4m -n default --tail 100
python .claude/skills/infrastructure-kubernetes/scripts/get_logs.py my-pod-7f8b9c6d5-x2k4m -n default --container mycontainer
```
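When events don't explain a failure, the most useful part of a log tail is often the last error before the container exited. A small sketch of extracting that error with a few lines of context from get_logs.py output; the regex is an assumption and should be adapted to the application's log format, and `last_error_context` is a hypothetical name. For a container that has already restarted, the crash output usually belongs to the previous container instance, which `kubectl logs --previous` retrieves; whether get_logs.py exposes that option is not stated here.

```python
import re
from typing import Iterable

# Hypothetical error markers; adjust to the application's actual log format.
ERROR_PATTERN = re.compile(r"\b(?:ERROR|FATAL|CRITICAL)\b|panic:|Traceback|Exception\b")


def last_error_context(lines: Iterable[str], before: int = 2, after: int = 3) -> list[str]:
    """Return the lines around the LAST error-looking line, or [] if nothing matches."""
    buf = list(lines)
    # Scan backwards: the final error before exit is usually closest to the cause.
    for i in range(len(buf) - 1, -1, -1):
        if ERROR_PATTERN.search(buf[i]):
            return buf[max(0, i - before): i + after + 1]
    return []


# Made-up log tail for illustration.
log = """2024-05-01 10:00:00 INFO starting server
2024-05-01 10:00:01 INFO loading config
2024-05-01 10:00:01 ERROR config key DATABASE_URL missing
2024-05-01 10:00:01 INFO shutting down""".splitlines()

print("\n".join(last_error_context(log, before=1, after=1)))
```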
Describe a pod (conditions, container states, restart count):

```bash
python .claude/skills/infrastructure-kubernetes/scripts/describe_pod.py <pod-name> -n <namespace>
```
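Among the container states describe_pod.py reports, the fields that usually matter most are the current waiting reason, the reason and exit code of the last termination, and the restart count. A sketch that summarizes one entry of `status.containerStatuses`, assuming the Pod JSON shape from `kubectl get pod <pod-name> -o json`. By convention, an exit code above 128 means the process was killed by signal (code - 128), so 137 means SIGKILL, which is what an OOM kill produces. The signal table below uses Linux numbering and covers only a few common signals; `describe_container` is a hypothetical name.

```python
from typing import Any

# Linux signal numbers for a few signals that commonly end containers.
LINUX_SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_container(status: dict[str, Any]) -> list[str]:
    """Summarize one entry of pod.status.containerStatuses as short text lines."""
    notes = [f"{status.get('name', '?')}: restarts={status.get('restartCount', 0)}"]
    waiting = status.get("state", {}).get("waiting")
    if waiting:
        notes.append(f"waiting: {waiting.get('reason', '')}")
    last = status.get("lastState", {}).get("terminated")
    if last:
        code = last.get("exitCode")
        line = f"last terminated: reason={last.get('reason', '')} exitCode={code}"
        # Exit codes above 128 conventionally mean "killed by signal (code - 128)".
        if isinstance(code, int) and code > 128 and (code - 128) in LINUX_SIGNALS:
            line += f" ({LINUX_SIGNALS[code - 128]})"
        notes.append(line)
    return notes


# Made-up container status for a crash-looping, OOM-killed container.
sample_status = {
    "name": "app",
    "restartCount": 4,
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
}
print("\n".join(describe_container(sample_status)))
# prints:
# app: restarts=4
# waiting: CrashLoopBackOff
# last terminated: reason=OOMKilled exitCode=137 (SIGKILL)
```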
Get a pod's resource usage alongside its requests and limits:

```bash
python .claude/skills/infrastructure-kubernetes/scripts/get_resources.py <pod-name> -n <namespace>
```
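Comparing usage against limits means parsing Kubernetes resource quantities such as `250m` (CPU millicores) and `512Mi` (mebibytes). A minimal sketch, assuming usage and limits arrive as quantity strings (for example from `kubectl top pod` and the container's `resources.limits`); the parser handles only the common suffixes, not the full quantity grammar, and `parse_quantity` and `over_threshold` are hypothetical names. A container that exceeds its memory limit is OOM-killed, whereas one that hits its CPU limit is throttled, so a high memory ratio is the more urgent signal.

```python
from decimal import Decimal

# Common quantity suffixes (a subset of what the Kubernetes API accepts).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {
    "m": Decimal("0.001"),
    "k": Decimal(10**3),
    "M": Decimal(10**6),
    "G": Decimal(10**9),
    "T": Decimal(10**12),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '250m', '512Mi' or '2' to a plain Decimal."""
    q = q.strip()
    # Two-letter binary suffixes are checked first so 'Mi' is not read as 'M'.
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    if q and q[-1] in _DECIMAL:
        return Decimal(q[:-1]) * _DECIMAL[q[-1]]
    return Decimal(q)


def over_threshold(usage: dict[str, str], limits: dict[str, str], threshold: float = 0.9) -> dict[str, float]:
    """Return {resource: usage/limit} for resources at or above `threshold` of their limit."""
    flagged = {}
    for name, limit in limits.items():
        limit_value = parse_quantity(limit)
        if name not in usage or limit_value == 0:
            continue
        ratio = float(parse_quantity(usage[name]) / limit_value)
        if ratio >= threshold:
            flagged[name] = ratio
    return flagged


print(over_threshold({"cpu": "120m", "memory": "480Mi"}, {"cpu": "500m", "memory": "512Mi"}))
# prints {'memory': 0.9375}
```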
General pod investigation:

1. list_pods.py - Check pod status
2. get_events.py - Look for scheduling/pull/crash events
3. describe_pod.py - Check conditions and container states
4. get_logs.py - Only if events don't explain the problem

Crashing or OOMKilled containers:

1. get_events.py - Check for OOMKilled or error events
2. get_resources.py - Compare usage vs limits
3. get_logs.py - Check for errors before the crash
4. describe_pod.py - Check restart count and state

Stuck deployments or pods:

1. list_pods.py - Find stuck pods
2. get_events.py - Check events on stuck pods
3. describe_pod.py - Check conditions for clues
4. get_resources.py - Check whether resource constraints are blocking progress

| Reason (event or container status) | Meaning | Action |
|---|---|---|
| OOMKilled | Container exceeded memory limit | Increase limits or fix memory leak |
| ImagePullBackOff | Can't pull image | Check image name, registry auth |
| CrashLoopBackOff | Container keeps crashing | Check logs for startup errors |
| FailedScheduling | No node can run pod | Check node resources, taints |
| Unhealthy | A liveness, readiness, or startup probe failed | Check probe config, app health |
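The reason table can also be kept as data, so the next step is looked up rather than recalled. A sketch with rows based on the table above and a fallback for reasons it does not list; `Triage`, `TRIAGE`, `FALLBACK` and `triage` are hypothetical names. Note that the kubelet typically records a crash loop as an event with reason `BackOff` ("Back-off restarting failed container"), while `CrashLoopBackOff` appears as the container's waiting reason, so this sketch sends a raw `BackOff` event to the fallback.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    meaning: str
    action: str


# Rows based on the reason table above.
TRIAGE = {
    "OOMKilled": Triage("Container exceeded memory limit", "Increase limits or fix memory leak"),
    "ImagePullBackOff": Triage("Can't pull image", "Check image name, registry auth"),
    "CrashLoopBackOff": Triage("Container keeps crashing", "Check logs for startup errors"),
    "FailedScheduling": Triage("No node can run pod", "Check node resources, taints"),
    "Unhealthy": Triage("A probe failed", "Check probe config, app health"),
}

FALLBACK = Triage("Not in the table", "Read the full event message and describe_pod.py output")


def triage(reasons: list[str]) -> list[tuple[str, Triage]]:
    """Map each distinct reason to its table row, in first-seen order."""
    seen: dict[str, Triage] = {}
    for reason in reasons:
        if reason not in seen:
            seen[reason] = TRIAGE.get(reason, FALLBACK)
    return list(seen.items())


for reason, row in triage(["BackOff", "OOMKilled", "OOMKilled"]):
    print(f"{reason}: {row.action}")
```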
When reporting findings, use this structure:
```markdown
## Kubernetes Analysis
**Pod**: <name>
**Namespace**: <namespace>
**Status**: <phase> (Restarts: N)
### Events
- [timestamp] <reason>: <message>
### Issues Found
1. [Issue description with evidence]
### Root Cause Hypothesis
[Based on events and logs]
### Recommended Action
[Specific remediation step]
```
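If findings are collected programmatically, the report structure above can be rendered from a small record. A sketch; the `Finding` fields mirror the template's placeholders, and both `Finding` and `render_report` are hypothetical names. The example values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    pod: str
    namespace: str
    phase: str
    restarts: int
    events: list[str] = field(default_factory=list)  # "[timestamp] reason: message" lines
    issues: list[str] = field(default_factory=list)  # one issue per entry, with evidence
    hypothesis: str = ""
    action: str = ""


def render_report(f: Finding) -> str:
    """Fill in the Kubernetes Analysis template, one Markdown line per element."""
    lines = [
        "## Kubernetes Analysis",
        f"**Pod**: {f.pod}",
        f"**Namespace**: {f.namespace}",
        f"**Status**: {f.phase} (Restarts: {f.restarts})",
        "### Events",
        *(f"- {event}" for event in f.events),
        "### Issues Found",
        *(f"{n}. {issue}" for n, issue in enumerate(f.issues, start=1)),
        "### Root Cause Hypothesis",
        f.hypothesis,
        "### Recommended Action",
        f.action,
    ]
    return "\n".join(lines)


example = Finding(
    pod="api-0",
    namespace="default",
    phase="Running",
    restarts=3,
    events=["[2024-05-01T10:05:00Z] BackOff: Back-off restarting failed container"],
    issues=["Last termination was OOMKilled (exit code 137)"],
    hypothesis="Memory usage exceeds the container limit under load",
    action="Raise the memory limit or investigate the leak",
)
print(render_report(example))
```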