| name | kubernetes-debugging |
| description | Inspect pod logs, analyze resource quotas, trace network policies, check deployment rollout status, and run cluster health checks for Kubernetes. Use this skill when diagnosing Kubernetes cluster issues, debugging failing pods, investigating network connectivity problems, analyzing resource usage, troubleshooting deployments, or performing cluster health checks. |
Kubernetes Debugging Skill
Quick Diagnostic Patterns
Pod Not Starting
kubectl get pod <pod-name> -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
python3 scripts/pod_diagnostics.py <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
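When the `kubectl describe` output is long, the reason a container is stuck can be pulled out directly. The following is a minimal sketch, assuming the pod has already reported container statuses (an unscheduled pod prints nothing). `pod_waiting_reasons` is a hypothetical helper name and is not one of the bundled scripts.

```shell
# Hypothetical helper: print "<container>: <reason>" for every container
# that is currently in a waiting state (e.g. ImagePullBackOff, CrashLoopBackOff).
# Usage: pod_waiting_reasons <pod-name> <namespace>
pod_waiting_reasons() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}' |
    # Keep only containers whose waiting reason is non-empty.
    awk -F '\t' '$2 != "" { print $1 ": " $2 }'
}
```

Empty output means no container is waiting, in which case the previous container logs are usually the next place to look.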
Service Connectivity Issues
kubectl get svc <service-name> -n <namespace>
kubectl get endpoints <service-name> -n <namespace>
./scripts/network_debug.sh <namespace> <pod-name>
kubectl run tmp-shell -n <namespace> --rm -i --tty --image=nicolaka/netshoot -- /bin/bash
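A Service with no ready endpoint addresses cannot route traffic, so the endpoints check above is worth turning into a pass/fail test. This is a sketch under the assumption that the Service is backed by a classic Endpoints object. `service_has_endpoints` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the Service has at least one ready endpoint IP.
# Usage: service_has_endpoints <service-name> <namespace>
service_has_endpoints() {
  # Collect the IPs of all ready addresses across every subset.
  endpoint_ips=$(kubectl get endpoints "$1" -n "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$endpoint_ips" ]; then
    echo "ready endpoints: $endpoint_ips"
  else
    echo "no ready endpoints: check the Service selector and pod readiness"
    return 1
  fi
}
```

A non-zero exit status here usually points at a selector that matches no pods, or at pods that are failing their readiness probes.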
Performance Degradation
kubectl top pods -n <namespace> --containers
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 lastState
kubectl logs <pod-name> -n <namespace> --tail=100 --timestamps
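As an alternative to grepping the YAML for `lastState`, the restart count and last termination reason can be read per container with a JSONPath query. This is a rough sketch; `last_termination` is a hypothetical helper name, and a reason of `OOMKilled` would suggest that the container exceeded its memory limit.

```shell
# Hypothetical helper: summarize restarts and the last termination reason
# for each container in a pod.
# Usage: last_termination <pod-name> <namespace>
last_termination() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' |
    # Print "none" when the container has never been terminated.
    awk -F '\t' '{ printf "%s restarts=%s last=%s\n", $1, $2, ($3 == "" ? "none" : $3) }'
}
```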
Cluster Health Check
./scripts/cluster_health.sh > cluster-health-$(date +%Y%m%d-%H%M%S).txt
Key Debugging Commands
The commands below emphasize the flags and patterns that are most useful during debugging and easiest to overlook:
Pod Debugging
kubectl get pods -A -o wide --field-selector=status.phase!=Running
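On a busy cluster, the list of non-running pods is easier to read as counts per namespace and status. The sketch below assumes the default `kubectl get pods` column layout (namespace first, status fourth). `unhealthy_by_namespace` is a hypothetical helper name.

```shell
# Hypothetical helper: count pods that are not in the Running phase,
# grouped by namespace and displayed status.
# Note: completed Job pods (phase Succeeded) are included in the counts.
unhealthy_by_namespace() {
  kubectl get pods -A --no-headers --field-selector='status.phase!=Running' |
    awk '{ count[$1 "\t" $4]++ }
         END { for (key in count) print key "\t" count[key] }' |
    sort
}
```

Each output line has the form `<namespace>`, `<status>`, `<count>`, separated by tabs.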
kubectl logs <pod-name> -n <namespace> --previous
kubectl logs <pod-name> -n <namespace> -c <container>
kubectl logs <pod-name> -n <namespace> -f --timestamps
kubectl describe pod <pod-name> -n <namespace>
kubectl get pod <pod-name> -n <namespace> -o yaml
Service and Network Debugging
kubectl get endpoints <service-name> -n <namespace>
kubectl exec <pod-name> -n <namespace> -- nslookup kubernetes.default
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
Resource Monitoring
kubectl top pod <pod-name> -n <namespace> --containers
kubectl describe resourcequota -n <namespace>
Emergency Operations
⚠️ These commands are destructive or disruptive. Follow the verification steps before and after each operation.
Restart Deployment
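# Pre-check: confirm no other rollout is still in progress
kubectl rollout status deployment/<name> -n <namespace>
# Trigger a rolling restart of all pods in the deployment
kubectl rollout restart deployment/<name> -n <namespace>
# Verify: wait up to 120s for the new pods to become ready
kubectl rollout status deployment/<name> -n <namespace> --timeout=120s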
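The restart and its verification can be combined so that a stalled rollout is reported rather than missed. This is a minimal sketch; `restart_and_wait` is a hypothetical helper name, and the default timeout of `120s` simply mirrors the value used above.

```shell
# Hypothetical helper: restart a deployment and wait for the rollout to finish.
# Usage: restart_and_wait <deployment-name> <namespace> [timeout]
restart_and_wait() {
  kubectl rollout restart "deployment/$1" -n "$2" || return 1
  # rollout status exits non-zero if the timeout expires first.
  if kubectl rollout status "deployment/$1" -n "$2" --timeout="${3:-120s}"; then
    echo "rollout complete"
  else
    echo "rollout did not finish within the timeout; inspect the new pods before retrying"
    return 1
  fi
}
```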
Rollback Deployment
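# Pre-check: list revisions to confirm there is an earlier one to return to
kubectl rollout history deployment/<name> -n <namespace>
# Roll back to the previous revision (use --to-revision=<n> for a specific one)
kubectl rollout undo deployment/<name> -n <namespace>
# Verify: wait for the rollback to finish, then check the pods
# (the label selector assumes the pods are labeled app=<name>)
kubectl rollout status deployment/<name> -n <namespace>
kubectl get pods -n <namespace> -l app=<name>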
Force Delete Stuck Pod
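# Pre-check: confirm the pod is stuck in Terminating (Ctrl+C stops the watch)
kubectl get pod <pod-name> -n <namespace> -w
# Force delete: skips graceful shutdown; the container may keep running on the node
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
# Verify: expect NotFound, unless a StatefulSet has recreated the pod under the same name
kubectl get pod <pod-name> -n <namespace>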
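Because force deletion bypasses graceful shutdown, it helps to refuse the operation unless the pod is already marked for deletion. The sketch below checks `metadata.deletionTimestamp`, which is set once deletion has been requested. `force_delete_if_terminating` is a hypothetical helper name.

```shell
# Hypothetical helper: force delete a pod only if it is already terminating.
# Usage: force_delete_if_terminating <pod-name> <namespace>
force_delete_if_terminating() {
  # deletionTimestamp is empty for pods that have not been asked to terminate.
  deletion_ts=$(kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.metadata.deletionTimestamp}') || return 1
  if [ -z "$deletion_ts" ]; then
    echo "pod $1 is not terminating; refusing to force delete"
    return 1
  fi
  kubectl delete pod "$1" -n "$2" --force --grace-period=0
}
```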
Drain Node (Maintenance)
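# Pre-check: list the pods that will be evicted from the node
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>
# Stop new pods from being scheduled, then evict the existing ones
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Verify: only DaemonSet-managed and static pods should remain
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>
# After maintenance: make the node schedulable again and confirm it is Ready
kubectl uncordon <node-name>
kubectl get node <node-name>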
Advanced Debugging Techniques
Debug Containers (Kubernetes 1.23+)
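# Attach an ephemeral debug container to the running pod
kubectl debug <pod-name> -n <namespace> -it --image=nicolaka/netshoot
# Or debug a copy of the pod, replacing the container's command with a shell
kubectl debug <pod-name> -n <namespace> -it --copy-to=<debug-pod-name> --container=<container> -- sh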
Port Forwarding for Testing
kubectl port-forward pod/<pod-name> -n <namespace> <local-port>:<pod-port>
kubectl port-forward svc/<service-name> -n <namespace> <local-port>:<service-port>
Proxy for API Access
kubectl proxy --port=8080
curl http://localhost:8080/api/v1/namespaces/<namespace>/pods/<pod-name>
Custom Column Output
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,IP:.status.podIP
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
Reference Documentation
Detailed Troubleshooting Guides
Consult references/troubleshooting_workflow.md for:
- Step-by-step workflows for each issue type
- Decision trees for diagnosis
- Command sequences for systematic debugging
- Quick reference command cheat sheet
Common Issues Database
Consult references/common_issues.md for:
- Detailed explanations of each common issue
- Symptoms and causes
- Specific debugging steps
- Solutions and fixes
- Prevention strategies