add-metrics

Name: Add Metrics
Author: tikv

// Add or change Prometheus metrics — cache WithLabelValues, hot path, cleanup, avoid high cardinality, naming, MustRegister, backward compatibility. From tikv/pd metrics PRs.

stars:1,152

forks:766

updated:March 5, 2026 at 05:57

SKILL.md

name	add-metrics
description	Add or change Prometheus metrics — cache WithLabelValues, hot path, cleanup, avoid high cardinality, naming, MustRegister, backward compatibility. From tikv/pd metrics PRs.

Principles (checklist + detail)

Before adding or changing metrics, ensure:

Cache WithLabelValues: Call it once at init and store the result in package-level vars; never call it on every request. WithLabelValues performs a lookup and creates the child series on first use, so on hot paths it adds overhead and, when fed dynamic values, can grow cardinality unnoticed.
Hot path: Remind developers to judge whether the current path is a hot path; if it is, suggest moving metrics operations to a background (periodic) task so the request path stays fast.
Related paths & cleanup: Instrument both sides of a paired operation (add and delete, create and teardown, register and unregister, connect and disconnect). When an entity (store, group, stream) is removed, call vec.DeleteLabelValues(...) so stale series are removed and cardinality does not grow.
Avoid high-cardinality labels: Prefer reducing or aggregating label values; avoid many unique label values, especially on hot paths. High cardinality can overwhelm Prometheus (memory, scrape cost).
Naming: Align with existing metrics in the same module. Reuse the Namespace and Subsystem (e.g. pd_client, request), use snake_case names with a _seconds/_total/_count suffix, and reuse label names (e.g. type, host, stream) so dashboards and queries stay consistent.
Encapsulation: If you wrap metrics logic in a func, include Metrics in the name (e.g. recordRequestDurationMetrics, initMetrics, registerMetrics) so it’s easy to find and grep.
Backward compatibility: Do not change the type of existing metrics (e.g. Counter → Gauge) or remove/rename existing labels; dashboards and alerts may break. Prefer adding new metrics or deprecating old ones in docs.
Registration: Use prometheus.MustRegister(...) (not Register), so that a duplicate or invalid registration panics at init and fails fast.
Dashboard/panel alignment: When adding or changing a metric, ensure Grafana panels and alerts use the same definition (e.g. store used vs user storage size).
Don’t record when there’s nothing to record: Skip updating metrics when the operation didn’t happen (e.g. no retry, no forward); avoid recording zeros or empty aggregates that add noise.
Const labels for stable dimensions: For dimensions fixed per process/component (e.g. resource group name in client), use const labels at init instead of dynamic label values on every call.
Fix wrong metrics and document semantics: If a metric measures the wrong thing (e.g. “processing time” including network), fix the measurement or add a new metric/version; do not silently change semantics.
Instrument full lifecycle: For gRPC streams or long-lived resources, add metrics for the full lifecycle and reflect cleanup/end so dashboards don’t show stuck or misleading series.

related-skills.json

same repository

create-issue.md

from "tikv/pd"

Use when a problem, task, or idea needs to be tracked as a GitHub issue on tikv/pd. Searches for duplicates, picks the right issue template, drafts the title and body, shows the draft for user approval, and submits via `gh issue create`.

2026-03-24 · 1.2k

fix-cherry-pick-pr.md

from "tikv/pd"

Repair cherry-pick pull requests in tikv/pd when automated cherry-picks leave committed conflict markers, drift from the source PR, or need parity verification against the original PR. Use when given a source PD PR and its cherry-pick PR, or a cherry-pick PR that references an original PR, and asked to compare diffs, resolve release-branch cherry-pick conflicts, run failpoint-aware verification, and push the fixed cherry-pick branch.

2026-03-12 · 1.2k

create-pr.md

from "tikv/pd"

Push the current branch and create a pull request on tikv/pd following the repository's PR template. Analyzes commits, generates PR title/body from changes, and submits via `gh pr create`. Use when you have local commits ready to submit as a PR.

2026-03-04 · 1.2k

package.json

"author": "tikv"

"repository": "tikv/pd"
