| name | remediation |
| description | Safe remediation actions for Kubernetes. Use when proposing or executing pod restarts, deployment scaling, or rollbacks. Always use dry-run first. |
Remediation Actions
Safety Principles
- ALWAYS dry-run first - All scripts support the --dry-run flag
- Confirm before executing - Show what will happen, ask for confirmation
- Document the action - Log what was done and why
- Have a rollback plan - Know how to undo the action
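One way to make the "document the action" and "have a rollback plan" principles concrete is to write a structured log entry for each action. The sketch below is illustrative only: the `RemediationRecord` class and its field names are hypothetical and are not part of the remediation scripts.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class RemediationRecord:
    """Hypothetical audit entry: what was done, why, and how to undo it."""
    action: str           # e.g. "scale_deployment"
    target: str           # resource name
    namespace: str
    reason: str           # why the action is expected to help
    rollback_plan: str    # how to undo the action
    dry_run: bool = True  # default to the safe mode

    def to_log_line(self) -> str:
        """Serialize the record as one JSON line with a UTC timestamp."""
        entry = asdict(self)
        entry["timestamp"] = datetime.now(timezone.utc).isoformat()
        return json.dumps(entry, sort_keys=True)


record = RemediationRecord(
    action="scale_deployment",
    target="web",
    namespace="prod",
    reason="sustained high CPU on all replicas",
    rollback_plan="scale back to the previous replica count",
)
print(record.to_log_line())
```

Requiring `rollback_plan` as a mandatory field means a record cannot be created without stating how to undo the action.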
Available Scripts
All scripts are in .claude/skills/remediation/scripts/
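restart_pod.py - Restart a pod by deleting it so that its controller (for example, a Deployment) recreates it; a pod with no controller is not recreated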
python .claude/skills/remediation/scripts/restart_pod.py <pod-name> -n <namespace> --dry-run
python .claude/skills/remediation/scripts/restart_pod.py <pod-name> -n <namespace>
scale_deployment.py - Scale a deployment
python .claude/skills/remediation/scripts/scale_deployment.py <deployment> -n <namespace> --replicas <N> --dry-run
python .claude/skills/remediation/scripts/scale_deployment.py <deployment> -n <namespace> --replicas <N>
rollback_deployment.py - Roll back to the previous revision
python .claude/skills/remediation/scripts/rollback_deployment.py <deployment> -n <namespace> --dry-run
python .claude/skills/remediation/scripts/rollback_deployment.py <deployment> -n <namespace>
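The scripts themselves are not reproduced here. As a rough illustration of how a command-line interface like the one above could be wired up, the following sketch parses the same arguments as scale_deployment.py and maps them to a `kubectl scale` invocation. Both function names are hypothetical, and the use of `kubectl` with `--dry-run=server` is an assumption: the actual script may use the Kubernetes Python client or a different dry-run mechanism.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments shown in the usage lines above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Scale a deployment (sketch).")
    parser.add_argument("deployment", help="deployment name")
    parser.add_argument("-n", dest="namespace", required=True, help="namespace")
    parser.add_argument("--replicas", type=int, required=True, help="target replica count")
    parser.add_argument("--dry-run", action="store_true", help="preview without changing anything")
    return parser.parse_args(argv)


def kubectl_command(args):
    """Build the kubectl argv; assumes kubectl is the backend, which may not be the case."""
    cmd = [
        "kubectl", "scale", f"deployment/{args.deployment}",
        "-n", args.namespace,
        f"--replicas={args.replicas}",
    ]
    if args.dry_run:
        # Server-side dry run: the API server validates the request but persists nothing.
        cmd.append("--dry-run=server")
    return cmd
```

For example, `kubectl_command(parse_args(["web", "-n", "prod", "--replicas", "5", "--dry-run"]))` returns the argv for a server-side dry run against `deployment/web`.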
Remediation Workflow
1. Diagnose first - Use k8s-debugger to understand the issue
2. Propose action - State what you plan to do and why
3. Dry run - Show what will happen
4. Get confirmation - Ask user to confirm
5. Execute - Run the action
6. Verify - Check that the issue is resolved
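The dry run, confirmation, and execute steps of this workflow can be sketched as a small gate function. This is a minimal sketch under stated assumptions, not how the skill is implemented: `remediate` and `run_command` are hypothetical helpers, and diagnosis and verification are left to the caller.

```python
import subprocess


def run_command(cmd):
    """Run a command and return its combined output; raises if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout + result.stderr


def remediate(base_cmd, confirm, run=run_command):
    """Dry-run first, ask for confirmation, then execute.

    base_cmd: argv list WITHOUT --dry-run, e.g. ["python", "<script>", "<pod>", "-n", "<namespace>"]
    confirm:  callable that receives the dry-run output and returns True to proceed
    Returns (executed, output).
    """
    preview = run(base_cmd + ["--dry-run"])
    if not confirm(preview):
        # Nothing was changed; hand back the preview so it can be shown or logged.
        return False, preview
    return True, run(base_cmd)
```

Passing `confirm` and `run` as callables keeps the gate testable without a cluster: a test can supply a fake `run` and check that the real command is never issued when confirmation is refused.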
Common Remediation Scenarios
Pod stuck in CrashLoopBackOff
python .claude/skills/infrastructure/kubernetes/scripts/get_events.py <pod> -n <namespace>
python .claude/skills/remediation/scripts/restart_pod.py <pod> -n <namespace> --dry-run
python .claude/skills/remediation/scripts/restart_pod.py <pod> -n <namespace>
Deployment stuck with bad image
python .claude/skills/infrastructure/kubernetes/scripts/get_history.py <deployment> -n <namespace>
python .claude/skills/remediation/scripts/rollback_deployment.py <deployment> -n <namespace> --dry-run
python .claude/skills/remediation/scripts/rollback_deployment.py <deployment> -n <namespace>
Service under high load
python .claude/skills/infrastructure/kubernetes/scripts/describe_deployment.py <deployment> -n <namespace>
python .claude/skills/remediation/scripts/scale_deployment.py <deployment> -n <namespace> --replicas 5 --dry-run
python .claude/skills/remediation/scripts/scale_deployment.py <deployment> -n <namespace> --replicas 5
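The three scenarios share one shape: a diagnostic command, a dry run, then the real action. A lookup table can capture that pattern. In the sketch below the symptom keys (`crashloopbackoff`, `bad-image`, `high-load`) and the `plan_commands` helper are hypothetical names chosen for illustration; the script paths and the `--replicas 5` value are taken from the scenarios above.

```python
REMEDIATION_SCRIPTS = ".claude/skills/remediation/scripts"
DIAGNOSTIC_SCRIPTS = ".claude/skills/infrastructure/kubernetes/scripts"

# symptom -> (diagnostic script, remediation script, extra remediation args)
PLAYBOOK = {
    "crashloopbackoff": ("get_events.py", "restart_pod.py", []),
    "bad-image": ("get_history.py", "rollback_deployment.py", []),
    "high-load": ("describe_deployment.py", "scale_deployment.py", ["--replicas", "5"]),
}


def plan_commands(symptom, target, namespace):
    """Return [diagnostic, dry-run, real] argv lists for a known symptom."""
    diagnostic, remediation, extra = PLAYBOOK[symptom]
    scope = [target, "-n", namespace]
    fix = ["python", f"{REMEDIATION_SCRIPTS}/{remediation}", *scope, *extra]
    return [
        ["python", f"{DIAGNOSTIC_SCRIPTS}/{diagnostic}", *scope],
        fix + ["--dry-run"],  # always preview before the real action
        fix,
    ]
```

An unknown symptom raises `KeyError`, which is deliberate here: an unrecognized problem should go back to diagnosis rather than fall through to a default action.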
Output Format
When proposing remediation, use this structure:
## Proposed Remediation
**Action**: [e.g., Restart pod, Scale deployment, Rollback]
**Target**: [resource name and namespace]
**Reason**: [why this action will help]
**Risk**: [potential side effects]
### Dry Run Output
[output from --dry-run]
### Confirmation Required
Please confirm you want to proceed with this action.
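If proposals are generated programmatically, the structure above can be filled in by a small formatter. The function below is a hypothetical helper, shown only to illustrate the template; its name and parameters are not part of the skill.

```python
def render_proposal(action, target, namespace, reason, risk, dry_run_output):
    """Fill in the proposal template with the details of one remediation."""
    return "\n".join([
        "## Proposed Remediation",
        f"**Action**: {action}",
        f"**Target**: {target} (namespace: {namespace})",
        f"**Reason**: {reason}",
        f"**Risk**: {risk}",
        "### Dry Run Output",
        dry_run_output.strip(),
        "### Confirmation Required",
        "Please confirm you want to proceed with this action.",
    ])
```

The dry-run output is inserted verbatim (apart from trimming surrounding whitespace) so the user sees exactly what the script reported.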