with one click
infra-triage
// Infrastructure alert triage — dedup via YT search, deep PVE/K8s investigation, auto-escalation for recurring/flapping alerts, control plane deep dive for K8s controller nodes.
// Infrastructure alert triage — dedup via YT search, deep PVE/K8s investigation, auto-escalation for recurring/flapping alerts, control plane deep dive for K8s controller nodes.
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | infra-triage |
| description | Infrastructure alert triage — dedup via YT search, deep PVE/K8s investigation, auto-escalation for recurring/flapping alerts, control plane deep dive for K8s controller nodes. |
| allowed-tools | Bash |
| user-invocable | true |
| metadata | {"openclaw":{"always":true}} |
When you see a message starting with [LibreNMS] ALERT in #infra-nl-prod or #infra-gr-prod, you MUST automatically run this full triage flow using the exec tool. Do NOT ask questions — just execute.
./skills/infra-triage/infra-triage.sh <hostname> "<rule_name>" <severity> [--site nl|gr]
Site is auto-detected from hostname prefix (grskg* → GR, otherwise NL). Use --site to override.
| Variable | Purpose |
|---|---|
FORCE_ESCALATE=true | Escalate regardless (set by n8n for flapping alerts) |
EXISTING_ISSUE=ID | Reuse this YT issue instead of creating new |
SKIP_ESCALATION=true | Skip escalation step (for burst/correlated triage) |
Searches YouTrack for existing open issues with the same hostname (created within 24h). If found, reuses instead of creating a duplicate. Reopens if in "To Verify". Also searches for related issues (same alert rule on different hosts within 12h).
Creates issue in the site's YT project (IFRNLLEI01PRD for NL, IFRGRSKG01PRD for GR) or reuses existing. Registers callback with n8n for dedup tracking. Sets custom fields (Hostname, Alert Rule, Severity, Alert Source, VMID, PVE Host).
Posts investigation results as YT comment, including related issues and recurring alert flag.
Always escalates to Claude Code unless SKIP_ESCALATION=true. Includes escalation reason (standard/flapping/recurring).
exec tool to run the script. Do NOT describe what to do — execute it.pct start, pct stop, docker restart, config edits, etc.