k8s-triage
// Kubernetes alert triage — dedup via YT search, deep control plane investigation, auto-escalation for recurring/flapping/control-plane alerts.
| name | k8s-triage |
| description | Kubernetes alert triage — dedup via YT search, deep control plane investigation, auto-escalation for recurring/flapping/control-plane alerts. |
| allowed-tools | Bash |
| user-invocable | true |
| metadata | {"openclaw":{"always":true}} |
When you see a message containing k8s-triage.sh in #infra-nl-prod or #infra-gr-prod, you MUST run it using the exec tool immediately. Do NOT ask questions.
./skills/k8s-triage/k8s-triage.sh "<alertname>" "<severity>" "<namespace>" "<summary>" "<node>" "<pod>" [--site nl|gr]
Site is auto-detected from node name prefix (grskg* → GR, otherwise NL). Use --site to override.
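The site selection rule above can be sketched roughly as follows. This is an illustrative guess at the logic, not code taken from k8s-triage.sh: the function name `detect_site` and the example node names are hypothetical.

```shell
# Hypothetical sketch of site detection; the real script may differ.
# $1 = node name, $2 = optional override (the value passed to --site).
detect_site() {
  local node="$1"
  local override="${2:-}"
  if [ -n "$override" ]; then
    # An explicit override wins; normalise nl/gr to NL/GR.
    printf '%s\n' "$override" | tr '[:lower:]' '[:upper:]'
    return 0
  fi
  case "$node" in
    grskg*) echo "GR" ;;  # node name starts with grskg -> GR site
    *)      echo "NL" ;;  # anything else falls back to NL
  esac
}
```

For example, `detect_site grskg-node01` would select GR, while `detect_site grskg-node01 nl` would select NL because the override takes precedence.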
| Variable | Purpose |
|---|---|
| FORCE_ESCALATE=true | Escalate regardless of severity (set by n8n for flapping alerts) |
| EXISTING_ISSUE=ID | Reuse this YT issue instead of creating a new one |
| SKIP_ESCALATION=true | Skip the escalation step (for testing) |
Use the exec tool to run the script. Do NOT describe what to do.

Before creating a new YT issue, the script searches YouTrack for existing open issues with the same Alert Rule created within the last 24h. If one is found, the script reuses it instead of creating a duplicate. It also searches for related issues (same node or namespace within the last 12h) and lists them in the findings.
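The two look-back windows (24h for the same Alert Rule, 12h for related issues) amount to a simple age check on each candidate issue. A minimal sketch, assuming issue creation times are available as epoch seconds; the helper `within_window` and the variable names are hypothetical and do not come from the script:

```shell
# Hypothetical helper; not taken from k8s-triage.sh.
DEDUP_WINDOW_HOURS=24     # same Alert Rule -> reuse the existing issue
RELATED_WINDOW_HOURS=12   # same node or namespace -> list as related

# Succeeds (exit status 0) when an issue created at $1 falls inside a
# window of $3 hours ending at $2. Both timestamps are epoch seconds.
within_window() {
  local created="$1"
  local now="$2"
  local hours="$3"
  local age=$(( now - created ))
  [ "$age" -ge 0 ] && [ "$age" -le $(( hours * 3600 )) ]
}
```

A caller could then test `within_window "$created" "$(date +%s)" "$DEDUP_WINDOW_HOURS"` for each open issue returned by the search and reuse the first match.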
When the alert involves a control plane component (apiserver, etcd, controller-manager, scheduler), the script performs cross-component investigation:
Escalation is triggered by ANY of these conditions:
FORCE_ESCALATE=true, set by n8n for flapping alerts)

After creating or reusing the YouTrack issue, the script POSTs a register callback to n8n so the Prometheus Alert Receiver can track the issue:
POST /webhook/prometheus-alert (or /webhook/prometheus-alert-gr for GR site)
{
"action": "register",
"alertKey": "<alertname>:<namespace>",
"issueId": "<IFRNLLEI01PRD-NNN or IFRGRSKG01PRD-NNN>"
}
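The register callback could be assembled along these lines. This is a sketch under assumptions: the function names `webhook_path` and `register_payload` and the variable `N8N_BASE_URL` are hypothetical, and the payload builder does no JSON escaping, so it only suits values without quotes or backslashes.

```shell
# Hypothetical sketch; the actual script may build the request differently.

# Pick the webhook path for the site (GR has its own endpoint).
webhook_path() {
  case "$1" in
    GR) echo "/webhook/prometheus-alert-gr" ;;
    *)  echo "/webhook/prometheus-alert" ;;
  esac
}

# Build the register payload. $1 = alertname, $2 = namespace, $3 = issue ID.
# No JSON escaping is applied (assumes plain values).
register_payload() {
  printf '{"action":"register","alertKey":"%s:%s","issueId":"%s"}\n' "$1" "$2" "$3"
}

# Possible use, with N8N_BASE_URL as an assumed variable:
#   curl -sS -X POST -H 'Content-Type: application/json' \
#     -d "$(register_payload "$alertname" "$namespace" "$issue_id")" \
#     "${N8N_BASE_URL}$(webhook_path "$site")"
```

Keeping the path selection and payload construction in separate functions makes each piece easy to check without contacting n8n.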
The script sets 6 custom fields on newly created issues:
KubePodCrashLooping)
critical or warning
prometheus (distinguishes from LibreNMS alerts)