kubernetes-debugging
Kubernetes debugging for pod failures and networking.
| name | kubernetes-debugging |
| description | Kubernetes debugging for pod failures and networking. |
| user-invocable | false |
| context | fork |
| agent | kubernetes-helm-engineer |
| routing | {"triggers":["kubernetes debug","pod failure","pod crashloop","kubectl logs","OOMKilled","pod pending"],"category":"kubernetes","pairs_with":["kubernetes-security","service-health-check"]} |
Systematic diagnosis of pod failures, networking issues, and resource problems using a structured triage flow: describe, logs, events, exec.
| Signal | Reference | Size |
|---|---|---|
| CrashLoopBackOff, OOMKilled, config error, health check, liveness probe, ImagePullBackOff, image pull, registry auth, Pending, FailedScheduling, node affinity, taint, PVC | references/crash-diagnosis.md | ~140 lines |
| service resolution, DNS, nslookup, CoreDNS, port-forward, NetworkPolicy, ingress, egress | references/network-debugging.md | ~50 lines |
| CPU throttling, memory limit, OOMKill, ephemeral storage, DiskPressure, debug container, distroless, kubectl reference, rollout, exec | references/resource-debugging.md | ~100 lines |
Load greedily: if the user's question touches any signal keyword, load the matching reference before responding. If several signals match, load every matching reference.
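As a rough sketch of the greedy loading rule, the shell functions below lowercase the question and print every reference whose signal keywords appear in it. The names `has_any` and `match_references` are hypothetical, and plain substring matching is an assumption: it can over-match (for example, `pending` also matches "depending"), which errs toward loading more references, in line with the rule above.

```bash
# Hypothetical helper: succeed if TEXT contains any of the KEYWORDs.
has_any() {
  local text=$1 kw
  shift
  for kw in "$@"; do
    case "$text" in *"$kw"*) return 0 ;; esac
  done
  return 1
}

# Hypothetical router: print every reference whose signal keywords appear
# in the question (case-insensitive substring match, keywords from the table).
match_references() {
  local q
  q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  if has_any "$q" crashloopbackoff oomkilled "config error" "health check" \
      "liveness probe" imagepullbackoff "image pull" "registry auth" pending \
      failedscheduling "node affinity" taint pvc; then
    echo references/crash-diagnosis.md
  fi
  if has_any "$q" "service resolution" dns nslookup coredns port-forward \
      networkpolicy ingress egress; then
    echo references/network-debugging.md
  fi
  if has_any "$q" "cpu throttling" "memory limit" oomkill "ephemeral storage" \
      diskpressure "debug container" distroless "kubectl reference" rollout exec; then
    echo references/resource-debugging.md
  fi
}
```

For example, a question mentioning "OOMKilled" prints both `references/crash-diagnosis.md` and `references/resource-debugging.md`, because the keyword appears in both rows.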
Follow this sequence for every pod or workload issue. Do not skip steps: many failures (scheduling, image pull, volume mount) show up only in events and describe output, not in logs, so jumping straight to logs misses them.
Always specify -n <namespace> explicitly in every command; never rely on the default context namespace, because the wrong namespace silently returns empty or misleading results.
```bash
# 1. Get an overview of the resource state
kubectl get pods -n <namespace> -o wide
# 2. Describe the resource for events, conditions, and status
kubectl describe pod <pod-name> -n <namespace>
# 3. Check previous container logs first (critical for CrashLoopBackOff).
# Capture them before deleting or replacing the pod, because doing so
# destroys the previous container's logs permanently.
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
# 4. Check current container logs
kubectl logs <pod-name> -n <namespace> -c <container-name>
# 5. Check namespace events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# 6. If the container is running, exec in for live inspection.
# Distroless images may have no shell; use a debug container instead
# (see references/resource-debugging.md).
kubectl exec -it <pod-name> -n <namespace> -c <container-name> -- /bin/sh
```
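The read-only part of the triage flow can be wrapped in a small shell function so the namespace is never implicit. This is a sketch: the name `triage_pod` is hypothetical, it fetches `--previous` logs before current logs as the comments above recommend, it tolerates the `--previous` call failing when the container has never restarted, and it leaves out the interactive `exec` step.

```bash
# Hypothetical helper: run the read-only triage steps for one container,
# always passing -n explicitly. All three arguments are required.
triage_pod() {
  local ns=$1 pod=$2 container=$3
  if [ -z "$ns" ] || [ -z "$pod" ] || [ -z "$container" ]; then
    echo "usage: triage_pod <namespace> <pod-name> <container-name>" >&2
    return 2
  fi
  kubectl get pods -n "$ns" -o wide
  kubectl describe pod "$pod" -n "$ns"
  # --previous errors out if the container has never restarted; keep going.
  kubectl logs "$pod" -n "$ns" -c "$container" --previous || true
  kubectl logs "$pod" -n "$ns" -c "$container"
  kubectl get events -n "$ns" --sort-by='.lastTimestamp'
}
```

Redirecting the output, e.g. `triage_pod <namespace> <pod-name> <container-name> > triage.txt 2>&1`, keeps the evidence for later comparison.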
Use read-only commands (describe, logs, get) to gather evidence before proposing any modifications. Never suggest changes based on assumptions; gather diagnostic output first.
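One way to enforce the read-only rule in a scripted session is a wrapper that forwards only read verbs to kubectl. The name `kubectl_ro` and the exact allow-list are assumptions; `exec`, `debug`, and `port-forward` are deliberately left out because they start processes or open connections in the cluster rather than only reading state.

```bash
# Hypothetical wrapper: forward only read-only kubectl verbs, so evidence
# gathering cannot mutate the cluster by accident.
kubectl_ro() {
  case "$1" in
    get|describe|logs|top|explain|api-resources|version)
      kubectl "$@" ;;
    *)
      echo "kubectl_ro: refusing non-read-only verb '$1'" >&2
      return 1 ;;
  esac
}
```

For example, `kubectl_ro logs <pod-name> -n <namespace> --previous` runs normally, while `kubectl_ro delete pod <pod-name> -n <namespace>` is refused.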
Based on triage output, load the appropriate reference and follow its diagnosis flow:
| Symptom | Reference |
|---|---|
| Pod status CrashLoopBackOff, ImagePullBackOff, or Pending | references/crash-diagnosis.md |
| Service unreachable, DNS failure, connection refused | references/network-debugging.md |
| CPU throttling, OOMKill, disk pressure, need debug container | references/resource-debugging.md |
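To go from triage output to a reference mechanically, the pod's phase and its container reasons can be read with jsonpath and mapped onto the table above. In this sketch, `reference_for_status` and `reference_for_pod` are hypothetical names, and treating `ErrImagePull` (the reason that precedes `ImagePullBackOff`) as a crash-diagnosis signal is an assumption. Service and DNS problems do not appear in pod status, so the network row is not covered.

```bash
# Hypothetical helper: map a pod phase plus a space-separated list of
# container reasons to every matching reference from the symptom table.
reference_for_status() {
  local phase=$1 reasons=" $2 " crash=0 resource=0
  case "$reasons" in
    *" CrashLoopBackOff "*|*" ImagePullBackOff "*|*" ErrImagePull "*) crash=1 ;;
  esac
  [ "$phase" = Pending ] && crash=1
  case "$reasons" in
    *" OOMKilled "*) resource=1 ;;
  esac
  [ "$crash" = 1 ] && echo references/crash-diagnosis.md
  [ "$resource" = 1 ] && echo references/resource-debugging.md
  return 0
}

# Hypothetical wrapper: read phase, current waiting reasons, and last
# termination reasons from the live pod, then classify them.
reference_for_pod() {
  local ns=$1 pod=$2 phase reasons
  phase=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.status.phase}')
  reasons=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.status.containerStatuses[*].state.waiting.reason} {.status.containerStatuses[*].lastState.terminated.reason}')
  reference_for_status "$phase" "$reasons"
}
```

A container that keeps getting OOMKilled usually shows `CrashLoopBackOff` as its waiting reason and `OOMKilled` as its last termination reason, so both references are printed.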
Error: Service has no endpoints
Cause: The Service selector does not match the labels of any running pod.
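Solution: Compare the Service selector (kubectl get svc <name> -n <namespace> -o jsonpath='{.spec.selector}') with the pod labels (kubectl get pods -n <namespace> --show-labels), then fix the mismatched labels on the Service or the pods.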
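The comparison can also be scripted: the sketch below renders the Service's selector as a label selector with a go-template and lists the pods it matches. If no pods are listed, the selector matches nothing and the labels need fixing. The helper name `pods_for_service` is hypothetical, and it assumes the Service uses a plain key/value `spec.selector`; a Service without a selector is reported as such.

```bash
# Hypothetical helper: list the pods selected by a Service's spec.selector.
pods_for_service() {
  local ns=$1 svc=$2 sel
  # Render spec.selector as "k1=v1,k2=v2," (Go templates visit map keys in sorted order).
  sel=$(kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}') || return 1
  sel=${sel%,}  # drop the trailing comma
  if [ -z "$sel" ]; then
    echo "pods_for_service: service $svc in $ns has no selector" >&2
    return 1
  fi
  # An empty pod list here means the selector matches no pods.
  kubectl get pods -n "$ns" -l "$sel" --show-labels
}
```

Because the pod listing includes `--show-labels`, running it with a deliberately loosened selector (or comparing against a plain `kubectl get pods -n <namespace> --show-labels`) shows which label key or value is off.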