add-metrics
// Add or change Prometheus metrics — cache WithLabelValues, hot path, cleanup, avoid high cardinality, naming, MustRegister, backward compatibility. From tikv/pd metrics PRs.
| name | add-metrics |
| description | Add or change Prometheus metrics — cache WithLabelValues, hot path, cleanup, avoid high cardinality, naming, MustRegister, backward compatibility. From tikv/pd metrics PRs. |
Before adding or changing metrics, ensure:
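Cache WithLabelValues — Call it once at init and store the result in package-level vars; never call it on every request. Each WithLabelValues call hashes the label values and looks up (or creates) the child series, which adds overhead on hot paths; with unbounded dynamic values it also creates new series and increases cardinality.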
Hot path — Remind developers to judge whether the current path is a hot path; if it is, suggest moving metrics operations to a background (periodic) task so the request path stays fast.
Related paths & cleanup — Instrument add + delete (or create/teardown). When an entity (store, group, stream) is removed, call vec.DeleteLabelValues(...) so cardinality does not grow and stale series are removed. Same for create/delete, register/unregister, connect/disconnect.
Avoid high-cardinality labels — Prefer reducing or aggregating; avoid many unique label values on hot paths. High cardinality can overwhelm Prometheus (memory, scrape cost).
Naming — Align with existing metrics in the same module: reuse its Namespace and Subsystem (e.g. pd_client, request), use snake_case names with the conventional suffixes (_seconds, _total, _count), and reuse existing label names (e.g. type, host, stream) so dashboards and queries stay consistent.
Encapsulation — If you wrap metrics logic in a func, include Metrics in the name (e.g. recordRequestDurationMetrics, initMetrics, registerMetrics) so it’s easy to find and grep.
Backward compatibility — Do not change the type of existing metrics (e.g. Counter → Gauge) or remove/rename existing labels; dashboards and alerts may break. Prefer adding new metrics or deprecating old ones in docs.
Registration — Use prometheus.MustRegister(...) (not Register); MustRegister panics on a duplicate or invalid registration, so mistakes fail fast at init, whereas Register returns an error that is easy to drop.
Dashboard/panel alignment — When adding or changing a metric, ensure Grafana panels and alerts use the same definition (e.g. store used vs user storage size).
Don’t record when there’s nothing to record — Skip updating metrics when the operation didn’t happen (e.g. no retry, no forward); avoid recording zeros or empty aggregates that add noise.
Const labels for stable dimensions — For dimensions fixed per process/component (e.g. resource group name in client), use const labels at init instead of dynamic label values on every call.
Fix wrong metrics and document semantics — If a metric measures the wrong thing (e.g. “processing time” including network), fix the measurement or add a new metric/version; do not silently change semantics.
Instrument full lifecycle — For gRPC streams or long-lived resources, add metrics for the full lifecycle and reflect cleanup/end so dashboards don’t show stuck or misleading series.