| name | isutools-prometheus |
| description | Query isutools (isucon-go-tools) metrics via PromQL on a Prometheus server. Use when the user wants to analyze ISUCON performance — slow endpoints, slow SQL queries, cache hit rate, DB connection pool, lock contention, queue depth, object pool usage, or benchmark scores — using Prometheus (`/api/v1/query`, `/api/v1/query_range`, Grafana, `promtool`). Triggers on phrases like "isutools metrics", "isucon-go-tools metrics", "isutools_api_*", "isutools_db_*", "ISUCON のメトリクスをクエリ", "PromQL for isutools". |
Querying isutools Metrics via Prometheus
PromQL reference for metrics emitted by isucon-go-tools v2. Assume a Prometheus server is already scraping the application — this skill only covers querying, not setting up the exporter.
All metrics use the isutools namespace. Subsystems: api, db, cache, locker, pool, queue, benchmark.
How to issue queries
- Prometheus HTTP API: `GET /api/v1/query?query=<PromQL>` (instant), `GET /api/v1/query_range?query=...&start=...&end=...&step=...` (range).
- promtool: `promtool query instant http://<prom>:9090 '<PromQL>'` / `promtool query range ...`.
- Grafana: paste the PromQL into a panel.
When invoking via curl, URL-encode the PromQL expression. Example:
```sh
curl -sG http://prometheus:9090/api/v1/query \
  --data-urlencode 'query=topk(5, sum by (url) (rate(isutools_api_request_duration_seconds_sum[1m])))'
```
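The same request can be prepared from a script. The following is a minimal Python sketch, assuming the server address from the curl example; the helper names `instant_query_url` and `vector_to_rows` are hypothetical, and the sample payload is hand-written for illustration rather than real output. It percent-encodes the PromQL and unpacks the instant-vector JSON shape that the Prometheus HTTP API returns.

```python
import json
from urllib.parse import quote, urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Return a /api/v1/query URL with the PromQL expression percent-encoded."""
    query_string = urlencode({"query": promql}, quote_via=quote)
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def vector_to_rows(payload: str) -> list[tuple[dict, float]]:
    """Turn an instant-vector API response into (labels, value) pairs."""
    body = json.loads(payload)
    if body["status"] != "success":
        raise RuntimeError(body.get("error", "query failed"))
    # Each sample looks like {"metric": {labels}, "value": [unix_ts, "value-as-string"]}.
    return [(s["metric"], float(s["value"][1])) for s in body["data"]["result"]]


PROMQL = "topk(5, sum by (url) (rate(isutools_api_request_duration_seconds_sum[1m])))"
URL = instant_query_url("http://prometheus:9090", PROMQL)

# Hand-written sample payload, for illustration only.
SAMPLE = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"url":"/users/<number>"},"value":[1700000000,"0.42"]}]}}'
)
ROWS = vector_to_rows(SAMPLE)
print(URL)
print(ROWS)
```

To run it against a live server, the body returned by `urllib.request.urlopen(URL).read()` can be passed to `vector_to_rows` in place of `SAMPLE`.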
Metric reference
api — HTTP server
| Metric | Type | Labels |
|---|---|---|
| `isutools_api_request_total` | Counter | `code`, `method`, `host`, `url` |
| `isutools_api_request_duration_seconds` | Histogram | `code`, `method`, `url` |
| `isutools_api_request_size_bytes` | Histogram | `code`, `method`, `url` |
| `isutools_api_response_size_bytes` | Histogram | `code`, `method`, `url` |
| `isutools_api_flow_total` | Counter | `source_method`, `source_path`, `target_method`, `target_path` |
`url` is pre-normalized: UUIDs → `<uuid>`, digit runs → `<number>`. Use the normalized form when filtering.
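When a user quotes a concrete path, it helps to convert it to the normalized form before writing a label matcher. The sketch below is an approximation in Python: the two regular expressions and the helper names are assumptions made for illustration, and the actual rules inside isucon-go-tools may differ in detail, so it is worth confirming against a real `url` label value.

```python
import re

# Assumed patterns; not taken from the isucon-go-tools source.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
DIGITS_RE = re.compile(r"\d+")


def normalize_path(path: str) -> str:
    """Replace UUIDs first, then any remaining digit runs."""
    return DIGITS_RE.sub("<number>", UUID_RE.sub("<uuid>", path))


def url_matcher(path: str) -> str:
    """Build a PromQL label matcher for the normalized path."""
    return f'url="{normalize_path(path)}"'


print(url_matcher("/users/42"))
print(normalize_path("/items/123e4567-e89b-12d3-a456-426614174000/comments/7"))
```

UUIDs are replaced before digit runs so that the digits inside a UUID are not rewritten piecemeal.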
db — database/sql wrapper
| Metric | Type | Labels |
|---|---|---|
| `isutools_db_query_count` | Counter | `driver`, `addr`, `query` |
| `isutools_db_query_duration_seconds` | Histogram | `driver`, `addr`, `query` |
| `isutools_db_max_open_connections` | Gauge | `driver`, `addr`, `connection_id` |
| `isutools_db_connection_pool` | Gauge | `driver`, `addr`, `connection_id`, `status` (idle/open/in_use) |
| `isutools_db_wait_count` | Gauge | `driver`, `addr`, `connection_id` |
| `isutools_db_wait_duration` | Gauge | `driver`, `addr`, `connection_id` |
| `isutools_db_max_idle_closed` | Gauge | `driver`, `addr`, `connection_id` |
| `isutools_db_max_lifetime_closed` | Gauge | `driver`, `addr`, `connection_id` |
| `isutools_db_max_idle_time_closed` | Gauge | `driver`, `addr`, `connection_id` |
query is a normalized SQL string (high cardinality). isutools_db_wait_duration is in nanoseconds.
cache — motoki317/sc and isutools maps/slices
| Metric | Type | Labels |
|---|---|---|
| `isutools_cache_hit_count` | Gauge | `name`, `stat` (hit/grace_hit/miss/replace) |
| `isutools_cache_load_count` | Gauge | `name`, `status` (hit/miss) |
| `isutools_cache_store_count` | Gauge | `name`, `status` (replace/new/remove) |
| `isutools_cache_index_access` | Histogram | `name` |
| `isutools_cache_length` | Gauge | `name` |
isutools_cache_hit_count is a Gauge that mirrors the cumulative sc.Stats() snapshot — query it directly, don't wrap it in rate().
locker — RWMutex
| Metric | Type | Labels |
|---|---|---|
| `isutools_locker_index_access` | Histogram | `name`, `type` (read/write) |
pool — object pool
| Metric | Type | Labels |
|---|---|---|
| `isutools_pool_count` | Counter | `name`, `type` (alloc/get/put) |
queue — channel-backed queue
| Metric | Type | Labels |
|---|---|---|
| `isutools_queue_counter` | Gauge | `name`, `status` (in/out) |
benchmark
| Metric | Type | Labels |
|---|---|---|
| `isutools_benchmark_score` | Gauge | (none) |
| `isutools_benchmark_duration` | Gauge | (none) |
PromQL recipes
Slowest endpoints (p95 over the last 1m)
```promql
topk(10,
  histogram_quantile(0.95,
    sum by (method, url, le) (
      rate(isutools_api_request_duration_seconds_bucket[1m])
    )
  )
)
```
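To make the p95 recipe less of a black box, here is a simplified Python sketch of the two steps it performs: adding bucket rates per `le` (the `sum by (..., le)` part) and then interpolating linearly inside the bucket that contains the target rank. It follows the behaviour Prometheus documents for classic histograms but omits edge cases such as `q` outside (0, 1] and negative bucket bounds. The bucket values are made up for illustration, and the function names merely mirror the PromQL ones.

```python
import math

Bucket = tuple[float, float]  # (le upper bound, cumulative count or rate)


def sum_by_le(series: list[list[Bucket]]) -> list[Bucket]:
    """Add up several bucket series per `le`, like `sum by (le) (...)`."""
    totals: dict[float, float] = {}
    for buckets in series:
        for le, count in buckets:
            totals[le] = totals.get(le, 0.0) + count
    return sorted(totals.items())


def histogram_quantile(q: float, buckets: list[Bucket]) -> float:
    """Estimate quantile q by linear interpolation within the matching bucket."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the overall count
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0.0  # lower bound and cumulative count of the previous bucket
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return buckets[-2][0]  # rank falls in +Inf: report the highest finite bound
            return lower + (le - lower) * (rank - below) / (count - below)
        lower, below = le, count
    return math.nan  # not reached when the last bucket is +Inf


# Illustrative buckets for one endpoint, split by status code.
ok = [(0.1, 40.0), (0.5, 70.0), (1.0, 78.0), (math.inf, 80.0)]
err = [(0.1, 10.0), (0.5, 20.0), (1.0, 20.0), (math.inf, 20.0)]

merged = sum_by_le([ok, err])
P95 = histogram_quantile(0.95, merged)
print(P95)
```

With these numbers the merged p95 lands between the 0.5 s and 1.0 s bounds, while each status-code series on its own would give a different estimate. That is why the recipes aggregate before calling `histogram_quantile`.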
Endpoints consuming the most total time (the score-killers)
```promql
topk(10,
  sum by (method, url) (
    rate(isutools_api_request_duration_seconds_sum[1m])
  )
)
```
Throughput and 5xx error rate per endpoint
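```promql
# Throughput per endpoint (requests per second)
sum by (method, url) (rate(isutools_api_request_total[1m]))

# Fraction of requests answered with 5xx, per endpoint
  sum by (url) (rate(isutools_api_request_total{code=~"5.."}[1m]))
/ ignoring(code)
  sum by (url) (rate(isutools_api_request_total[1m]))
```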
Endpoint transition flows
```promql
topk(20,
  sum by (source_path, target_path) (
    rate(isutools_api_flow_total[5m])
  )
)
```
Slowest SQL queries (p95)
```promql
topk(10,
  histogram_quantile(0.95,
    sum by (query, le) (
      rate(isutools_db_query_duration_seconds_bucket[1m])
    )
  )
)
```
SQL queries by total time consumed
```promql
topk(10,
  sum by (query) (
    rate(isutools_db_query_duration_seconds_sum[1m])
  )
)
```
DB connection pool saturation
```promql
  isutools_db_connection_pool{status="in_use"}
/ on(connection_id)
  isutools_db_max_open_connections
```
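A ratio approaching 1.0 means nearly every allowed connection is in use, so the pool is likely the bottleneck. Cross-check with growth in `isutools_db_wait_count` and `isutools_db_wait_duration`:
```promql
# Waits started per second
rate(isutools_db_wait_count[1m])

# Seconds spent waiting per second (the metric is in nanoseconds)
rate(isutools_db_wait_duration[1m]) / 1e9
```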
Cache hit rate (sc.Cache)
```promql
  sum by (name) (isutools_cache_hit_count{stat=~"hit|grace_hit"})
/
  sum by (name) (isutools_cache_hit_count{stat=~"hit|grace_hit|miss"})
```
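Because the gauge mirrors a cumulative snapshot, the ratio above is a hit rate since the counters started, not a recent one. The Python sketch below works through the arithmetic with two made-up snapshots to show how a lifetime rate can hide a recent drop. The function names are hypothetical; in PromQL the windowed variant would correspond to wrapping each selector in `delta(...[5m])`.

```python
STATS = ("hit", "grace_hit", "miss")


def hit_rate(stats: dict[str, float]) -> float:
    """(hit + grace_hit) / (hit + grace_hit + miss), as in the recipe above."""
    hits = stats["hit"] + stats["grace_hit"]
    return hits / (hits + stats["miss"])


def window_hit_rate(earlier: dict[str, float], later: dict[str, float]) -> float:
    """Hit rate over the interval between two cumulative snapshots."""
    return hit_rate({k: later[k] - earlier[k] for k in STATS})


# Made-up snapshots of one cache, taken a few minutes apart.
before = {"hit": 800.0, "grace_hit": 100.0, "miss": 100.0}
after = {"hit": 850.0, "grace_hit": 100.0, "miss": 150.0}

LIFETIME = hit_rate(after)
RECENT = window_hit_rate(before, after)
print(LIFETIME, RECENT)
```

Both functions divide by zero when there was no traffic, so a caller would need to guard against an idle cache.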
Map cache hit rate
```promql
  sum by (name) (isutools_cache_load_count{status="hit"})
/
  sum by (name) (isutools_cache_load_count)
```
Slice index access distribution (p99)
```promql
histogram_quantile(0.99,
  sum by (name, le) (
    rate(isutools_cache_index_access_bucket[1m])
  )
)
```
Lock contention (RWMutex acquisition latency p99)
```promql
histogram_quantile(0.99,
  sum by (name, type, le) (
    rate(isutools_locker_index_access_bucket[1m])
  )
)
```
Object pool reuse ratio
```promql
  sum by (name) (rate(isutools_pool_count{type="get"}[1m]))
/
  sum by (name) (rate(isutools_pool_count{type="alloc"}[1m]))
```
A high ratio means most gets reuse a pooled object instead of allocating.
Queue depth and throughput
```promql
# Current depth: items enqueued minus items dequeued
  sum by (name) (isutools_queue_counter{status="in"})
- sum by (name) (isutools_queue_counter{status="out"})

# Enqueue rate
sum by (name) (rate(isutools_queue_counter{status="in"}[1m]))

# Dequeue rate
sum by (name) (rate(isutools_queue_counter{status="out"}[1m]))
```
Benchmark score and duration
```promql
isutools_benchmark_score

isutools_benchmark_duration
```
Workflow when answering an isutools metrics question
- Map the question to a subsystem and metric from the reference above.
- Aggregate first with `sum by (...)` before applying `histogram_quantile`. `url` and `query` are high-cardinality, and querying buckets without aggregation is slow and returns one quantile per raw label combination (for example, per status code) instead of one per endpoint or query.
- Pick the time window deliberately: `[1m]` for live load, `[5m]`–`[15m]` for trends, range queries for benchmarks.
- Report the exact PromQL used so the user can paste it into Grafana / promtool and verify.
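For the range-query case in the workflow above, the request needs `start`, `end` and `step` in addition to the expression. The following Python sketch builds such a URL for a benchmark window; the timestamps, the 60-second window and the 5-second step are placeholder values, and `range_query_url` and `expected_points` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode


def range_query_url(base_url: str, promql: str, start: int, end: int, step: int) -> str:
    """Return a /api/v1/query_range URL; start/end are Unix seconds, step is in seconds."""
    params = {"query": promql, "start": start, "end": end, "step": f"{step}s"}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params, quote_via=quote)}"


def expected_points(start: int, end: int, step: int) -> int:
    """Evaluation timestamps per series: start, start+step, ... up to end."""
    return (end - start) // step + 1


# Placeholder window: a run assumed to last 60 seconds.
RUN_END = 1_700_000_060
RUN_START = RUN_END - 60
STEP = 5

RANGE_PROMQL = "sum(rate(isutools_api_request_total[1m]))"
RANGE_URL = range_query_url("http://prometheus:9090", RANGE_PROMQL, RUN_START, RUN_END, STEP)
print(RANGE_URL)
print(expected_points(RUN_START, RUN_END, STEP))
```

Printing the URL alongside the PromQL keeps the "report the exact query" step verifiable, since the user can replay the same window.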
Gotchas
- Histograms expose `_bucket` / `_sum` / `_count`. Always `rate(..._bucket[...])` and `sum by (..., le)` before `histogram_quantile`.
- `isutools_cache_hit_count` is a Gauge (cumulative snapshot). Use `delta(...[5m])` if you need the change over a window; never `rate()`.
- `isutools_db_wait_duration` is in nanoseconds; divide by `1e9` for seconds.
- Counter resets happen on app restart; use `rate()` / `increase()` (which handle resets) rather than raw `... - ... offset ...`.
- `url` and `query` labels are pre-normalized inside the application; queries should use the normalized form (e.g. `/users/<number>`, not `/users/42`).
- `isutools_api_request_total` carries a `host` label that the duration/size histograms do not; don't `on()`-join across them on `host`.
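The counter-reset gotcha is easiest to see with numbers. The Python sketch below contrasts a raw last-minus-first subtraction with a reset-aware sum of the kind `increase()` performs. It is a simplification: Prometheus additionally extrapolates to the window boundaries, which is left out here. The sample values are invented to simulate one application restart.

```python
def increase_with_resets(samples: list[float]) -> float:
    """Sum the growth between consecutive samples, treating any drop as a restart."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again from 0,
        # so the whole current value counts as growth since the reset.
        total += (cur - prev) if cur >= prev else cur
    return total


# Invented scrape values; the app restarts between the second and third sample.
SAMPLES = [100.0, 120.0, 5.0, 25.0]

NAIVE = SAMPLES[-1] - SAMPLES[0]
CORRECTED = increase_with_resets(SAMPLES)
print(NAIVE, CORRECTED)
```

The raw subtraction goes negative across the restart, whereas the reset-aware sum keeps counting the growth observed on both sides of it.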