| name | grafana |
| description | Use when someone needs the right Grafana dashboard for a symptom — high latency, error-rate spike, saturated nodes, queue backlog, a cost/egress jump — or asks for a dashboard link, a datasource UID, or which cluster/region variables to set. Reach for this whenever building a Grafana URL or picking a panel, because the datasource UIDs and org/cluster values are not guessable and a link built with the wrong one silently shows an empty or wrong-org dashboard. Not for building/designing dashboards in code, configuring a new datasource or alert rule, writing PromQL, debugging the Grafana instance, or Grafana pricing — this only finds the right existing dashboard. |
grafana
Overview
This is a pure reference skill — a lookup table, no scripts. It maps symptoms to the correct dashboard, records the canonical datasource UIDs, cluster/region names, and org IDs, and lists the variables a link needs before it resolves. Grafana links are fragile: a dashboard rendered against the wrong datasource UID or without the org pre-selected looks broken even when nothing is wrong. This skill exists so Claude builds a link that actually loads the right data the first time.
All the data lives in references/dashboards.md. This file tells Claude when to open it and the few rules that keep a constructed link from silently failing.
When to Use
Use this skill when the request is to:
- Find the dashboard for a symptom ("p99 latency is up," "5xx spike," "nodes are saturated," "the bill jumped").
- Get a datasource UID, cluster name, region, or org ID to build a Grafana link or query.
- Construct a Grafana URL with the correct var-* template variables set.
- Sanity-check why a Grafana link someone shared shows no data.
Do NOT use this for: querying the event warehouse or building funnels (funnel-query), or cohort/retention statistics (cohort-compare). This skill does not fetch or compute anything — it tells you which pre-built dashboard to open and how to address it correctly. If the question needs raw data manipulation, it's the wrong skill.
How to use the reference
- Read references/dashboards.md.
- Match the user's symptom in the symptom → dashboard table to a dashboard UID and the panel/row to look at.
- Pull the datasource UID, org ID, cluster, and region from the datasource UIDs and clusters / regions tables.
- Build the link with every required var-* set, and confirm the org matches before sharing it.
Don't paraphrase UIDs or cluster names from memory — copy them from the reference. They drift across environments and are not guessable.
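The steps above can be sketched as a small URL builder. This is a minimal illustration, not part of the skill: the function name build_dashboard_url, the host grafana.example.com, and every UID, slug, org ID, cluster, and region value shown are hypothetical placeholders. Real values must be copied from references/dashboards.md. The sketch assumes the common Grafana link shape /d/<dashboard UID>/<slug> with orgId and var-* query parameters.

```python
from urllib.parse import urlencode


def build_dashboard_url(base_url, dashboard_uid, slug, org_id, variables):
    """Return a dashboard URL with orgId and every var-* set explicitly.

    `variables` maps template variable names (without the "var-" prefix)
    to the values copied from the reference tables.
    """
    # Refuse to build a link with an empty variable: an unset var-* is the
    # failure mode that yields an empty or all-clusters dashboard.
    missing = [name for name, value in variables.items() if not value]
    if missing:
        raise ValueError(f"unset template variables: {', '.join(sorted(missing))}")

    # orgId goes first so the org is pre-selected when the link opens.
    params = [("orgId", str(org_id))]
    params += [(f"var-{name}", value) for name, value in variables.items()]
    return f"{base_url.rstrip('/')}/d/{dashboard_uid}/{slug}?{urlencode(params)}"


# Hypothetical placeholder values; copy the real ones from the reference.
url = build_dashboard_url(
    "https://grafana.example.com",
    "EXAMPLE_UID",
    "service-latency",
    1,
    {"cluster": "example-cluster", "region": "example-region"},
)
print(url)
```

Raising on an empty variable, rather than dropping it from the URL, keeps a half-specified link from being shared at all.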
Gotchas
ALWAYS treat these as real, observed failure modes — each has produced a "dashboard is broken" report that was actually a wrong-UID/wrong-org link.
- Datasource UIDs drift across orgs and environments. The same logical datasource ("prod metrics") has a different UID in staging, in each region's org, and after a datasource is recreated. A link hard-coding a stale UID renders an empty panel, not an error. Always copy the UID from the table for the specific env/org you're targeting; never reuse a UID from another environment.
- You must switch to the right org before a dashboard link resolves. Grafana is multi-org; a dashboard UID is only valid within its org. Opening a link while signed into a different org shows "dashboard not found" or, worse, a same-named dashboard from the wrong org with the wrong data. Set orgId= in the URL (values in the reference) and verify the org name in the top-left before trusting anything you see.
- Dashboard template variables (cluster / region) must be set, or the dashboard shows nothing or everything. Most dashboards default to a placeholder or All for var-cluster / var-region. Left unset, these variables produce either an empty graph or an unreadable all-clusters overlay. Always append the var-cluster= / var-region= (and any var-namespace=) the dashboard requires; the reference lists which variables each dashboard needs and the valid values.
- All on a high-cardinality variable can time out or mislead. Selecting All clusters/regions on a busy dashboard either times the query out or averages away the very spike you're chasing. Narrow to the specific cluster/region from the symptom before reading the panel.
- A symptom can map to more than one dashboard: start at the one in the table, then follow its drill-down. The table gives the entry dashboard for a symptom; the real cause often lives one drill-down deeper (e.g. latency → service dashboard → dependency dashboard). Don't stop at the first panel if it only confirms the symptom without localizing it.
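Several of these gotchas can be checked mechanically when someone shares a link that shows no data. The sketch below is one hedged way to do it: lint_dashboard_link is a hypothetical helper, the example URL and all values in it are placeholders, and treating both All and $__all as the "All" selection is an assumption about how the URL encodes it. It cannot detect a stale datasource UID, which is not part of the dashboard URL; that still has to be compared against the reference by hand.

```python
from urllib.parse import parse_qs, urlparse

# Assumed spellings of the "All" selection in a var-* query parameter.
ALL_VALUES = {"All", "$__all"}


def lint_dashboard_link(url, expected_org_id, required_vars):
    """Return a list of human-readable problems found in a dashboard link."""
    # parse_qs drops blank values by default, so "var-cluster=" counts as unset.
    query = parse_qs(urlparse(url).query)
    problems = []

    org = query.get("orgId", [None])[0]
    if org is None:
        problems.append("orgId is missing")
    elif org != str(expected_org_id):
        problems.append(f"orgId is {org}, expected {expected_org_id}")

    for name in required_vars:
        values = query.get(f"var-{name}", [])
        if not values:
            problems.append(f"var-{name} is not set")
        elif any(value in ALL_VALUES for value in values):
            problems.append(f"var-{name} is set to All; narrow it")

    return problems


# Hypothetical shared link with the wrong org and an over-broad cluster.
shared = "https://grafana.example.com/d/EXAMPLE_UID/service-latency?orgId=2&var-cluster=All"
for problem in lint_dashboard_link(shared, 1, ["cluster", "region"]):
    print(problem)
```

An empty result means only that the link is well-formed for the expected org and variables, not that the values themselves are valid for that dashboard.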
Files
references/dashboards.md — the lookup tables: datasource UIDs (per org/env), cluster and region names, org IDs, and the symptom → dashboard map with the required var-* for each. Read this; do not reconstruct UIDs or cluster names from memory.