| name | metrics-query-cookbook |
| description | Cookbook of ready-to-use PromQL queries, preset catalog, metric name dictionaries, and label references for Ceph storage, network traffic, pod statistics, and MTV migrations. Use when you need specific queries, exact metric names, or label filters. |
Metrics Query Cookbook
Ready-to-use queries, preset catalog, and metric name/label references for OpenShift clusters with ODF, OVN-Kubernetes, KubeVirt, and Forklift/MTV.
All examples use the kubectl-metrics MCP server tools (metrics_read and metrics_help).
Output format guidance: Use the default (markdown) output when presenting results to the user. Use output: "json" only when you need to parse values programmatically. Use selector to filter results by label after the query has run.
Preset Catalog
Every preset works as both an instant (default) and range query. Pass start to get a time-series trend.
Cluster & Namespace
| Preset | Description |
|---|---|
| cluster_cpu_utilization | Cluster CPU utilization percentage |
| cluster_memory_utilization | Cluster memory utilization percentage |
| cluster_pod_status | Pod counts by phase (Running, Pending, Failed, Succeeded, Unknown) |
| cluster_node_readiness | Node readiness status counts |
| namespace_cpu_usage | Top 10 namespaces by CPU usage (cores) |
| namespace_memory_usage | Top 10 namespaces by memory usage (bytes) |
| namespace_network_rx | Top 10 namespaces by network receive rate |
| namespace_network_tx | Top 10 namespaces by network transmit rate |
| namespace_network_errors | Network errors + drops by namespace (top 10) |
| pod_restarts_top10 | Top 10 pods by container restart count |
Forklift / MTV Migration
| Preset | Description |
|---|---|
| mtv_migration_status | Migration counts by status (succeeded/failed/running) |
| mtv_plan_status | Plan-level status counts |
| mtv_migration_duration | Migration duration per plan (seconds) |
| mtv_avg_migration_duration | Average migration duration (seconds) |
| mtv_data_transferred | Total bytes migrated per plan |
| mtv_net_throughput | Migration network throughput |
| mtv_storage_throughput | Migration storage throughput |
| mtv_migration_pod_rx | Migration pod receive rate (bytes/sec, top 20) |
| mtv_migration_pod_tx | Migration pod transmit rate (bytes/sec, top 20) |
| mtv_forklift_traffic | Forklift operator pod network traffic (bytes/sec) |
| mtv_vmi_migrations_pending | KubeVirt VMI migrations in pending phase |
| mtv_vmi_migrations_running | KubeVirt VMI migrations in running phase |
Storage Metrics (Ceph / ODF)
Cluster-wide storage health
metrics_read { "command": "query", "flags": { "query": "ceph_health_status", "output": "markdown" } }
Result: 0 = OK, 1 = WARN, 2 = ERR.
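When the value is parsed programmatically (for example from output: "json"), the numeric code can be mapped back to its meaning. A minimal Python sketch; the function name describe_ceph_health is hypothetical, and the mapping is copied from the legend above:

```python
# Mapping copied from the result legend above; the function name is illustrative.
CEPH_HEALTH = {0: "OK", 1: "WARN", 2: "ERR"}


def describe_ceph_health(value: float) -> str:
    """Translate a ceph_health_status sample into a readable label."""
    code = int(value)
    # Anything that is not exactly 0, 1, or 2 is reported rather than guessed at.
    if code != value or code not in CEPH_HEALTH:
        return f"UNKNOWN({value})"
    return CEPH_HEALTH[code]


print(describe_ceph_health(1))
```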
Storage capacity
metrics_read { "command": "query", "flags": { "query": "ceph_cluster_total_bytes", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "ceph_cluster_total_used_bytes", "output": "markdown" } }
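The two capacity values can be combined into a utilization percentage. The expression below is ordinary PromQL arithmetic over the two metrics already shown, and it assumes both series carry the same label set so that they match one-to-one. The Python wrapper and the names capacity_percent_query and percent_used are illustrative assumptions, not part of the MCP server:

```python
def capacity_percent_query() -> str:
    """PromQL expression for cluster-wide used capacity, in percent."""
    return "100 * ceph_cluster_total_used_bytes / ceph_cluster_total_bytes"


def percent_used(used_bytes: float, total_bytes: float) -> float:
    """Same arithmetic done client-side on two instant-query values."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


# Example with made-up numbers: 3 TiB used out of 12 TiB total.
TIB = 1024 ** 4
print(capacity_percent_query())
print(percent_used(3 * TIB, 12 * TIB))  # 25.0
```

The query string can be passed as the query flag of a metrics_read query call, in the same way as the two calls above.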
Pool-level statistics
metrics_read { "command": "query", "flags": { "query": "ceph_pool_percent_used * 100", "output": "markdown" } }
Pool I/O rates
metrics_read { "command": "query", "flags": { "query": "rate(ceph_pool_rd[5m])", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "rate(ceph_pool_wr[5m])", "output": "markdown" } }
OSD operation latency
metrics_read { "command": "query", "flags": { "query": "rate(ceph_osd_op_latency_sum[5m]) / rate(ceph_osd_op_latency_count[5m])", "output": "markdown" } }
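The ratio works because ceph_osd_op_latency_sum and ceph_osd_op_latency_count are cumulative: dividing the growth in total latency by the growth in operation count over the same window gives the mean latency per operation in that window, in the unit of the _sum series. A small Python illustration with made-up counter readings; the helper name is hypothetical, and unlike rate() this sketch ignores counter resets and extrapolation:

```python
def mean_latency(sum_start: float, sum_end: float,
                 count_start: float, count_end: float) -> float:
    """Mean latency per operation between two pairs of counter readings."""
    ops = count_end - count_start
    if ops <= 0:
        return float("nan")  # no operations completed in the window
    return (sum_end - sum_start) / ops


# Made-up readings taken 5 minutes apart:
# the latency sum grew by 6.0 and the operation count grew by 3000.
print(mean_latency(120.0, 126.0, 50_000, 53_000))  # 0.002
```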
Placement group health
metrics_read { "command": "query", "flags": { "query": "ceph_pg_total", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "ceph_pg_degraded", "output": "markdown" } }
Available labels on ceph_* metrics
| Label | Description | Example values |
|---|---|---|
| pool_id | Ceph pool identifier (pool-level metrics) | 1, 2, 3, 4 |
| ceph_daemon | OSD daemon name (OSD-level metrics) | osd.0, osd.1, osd.2 |
| namespace | Storage operator namespace | openshift-storage |
| managedBy | Managing resource | ocs-storagecluster |
| job | Scrape job | rook-ceph-mgr, rook-ceph-exporter |
Storage metrics reference
| Metric | Description |
|---|---|
| ceph_health_status | Overall cluster health (0=OK, 1=WARN, 2=ERR) |
| ceph_cluster_total_bytes | Total cluster capacity (bytes) |
| ceph_cluster_total_used_bytes | Used cluster capacity (bytes) |
| ceph_pool_percent_used | Per-pool usage as a 0-1 ratio (multiply by 100 for percent) |
| ceph_pool_stored | Bytes stored per pool |
| ceph_pool_max_avail | Available bytes per pool |
| ceph_pool_rd, ceph_pool_wr | Read/write operation counters per pool (rate() gives IOPS) |
| ceph_pool_rd_bytes, ceph_pool_wr_bytes | Read/write byte counters per pool (rate() gives bytes/sec) |
| ceph_osd_op_latency_sum/count | OSD operation latency (use as rate ratio) |
| ceph_pg_total, ceph_pg_active, ceph_pg_degraded | Placement group counts |
| node_filesystem_avail_bytes, node_filesystem_size_bytes | Node filesystem capacity |
Network Traffic Metrics
Network traffic by namespace
metrics_read { "command": "preset", "flags": { "name": "namespace_network_rx", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "namespace_network_tx", "output": "markdown" } }
Network traffic by pod in a namespace
Replace TARGET_NAMESPACE with the actual namespace -- ASK the user if not known.
metrics_read {
"command": "query",
"flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_receive_bytes_total{namespace=\"TARGET_NAMESPACE\"}[5m]))))", "output": "markdown" }
}
metrics_read {
"command": "query",
"flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_transmit_bytes_total{namespace=\"TARGET_NAMESPACE\"}[5m]))))", "output": "markdown" }
}
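The \" sequences in these queries are JSON escaping for the double quotes that PromQL requires around label values. One way to avoid hand-escaping is to build the PromQL string first and let a JSON serializer add the backslashes. A Python sketch under that assumption; pod_rx_flags is a hypothetical helper that only reuses the flags shown above:

```python
import json


def pod_rx_flags(namespace: str, top: int = 10) -> dict:
    """Build the flags object for the per-pod receive-rate query above."""
    # Reject characters that would break out of the quoted label value.
    if not namespace or '"' in namespace or "\\" in namespace:
        raise ValueError("namespace must be a plain Kubernetes name")
    query = (
        f"topk({top}, sort_desc(sum by (pod)("
        f'rate(container_network_receive_bytes_total{{namespace="{namespace}"}}[5m]))))'
    )
    return {"query": query, "output": "markdown"}


# json.dumps inserts the backslashes before the inner quotes automatically.
print(json.dumps({"command": "query", "flags": pod_rx_flags("openshift-mtv")}))
```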
Network errors and drops by namespace
metrics_read { "command": "preset", "flags": { "name": "namespace_network_errors", "output": "markdown" } }
Node-level network throughput
metrics_read {
"command": "query",
"flags": { "query": "instance:node_network_receive_bytes_excluding_lo:rate1m + instance:node_network_transmit_bytes_excluding_lo:rate1m", "output": "markdown" }
}
Available labels on network metrics
| Label | Description | Example values |
|---|---|---|
| namespace | Pod namespace | openshift-storage, konveyor-forklift |
| pod | Pod name | forklift-controller-6df77f6bf5-jtt7q |
| interface | Network interface (per-pod metrics) | eth0 |
| instance | Node instance (node-level metrics) | 10.0.0.5:9100 |
| node | Node name (node-level metrics) | worker-0 |
Network metrics reference
| Metric | Description |
|---|---|
| container_network_receive_bytes_total | Bytes received per pod/namespace |
| container_network_transmit_bytes_total | Bytes transmitted per pod/namespace |
| container_network_receive_errors_total | Receive errors per pod/namespace |
| container_network_transmit_errors_total | Transmit errors per pod/namespace |
| container_network_receive_packets_dropped_total | Dropped receive packets |
| container_network_transmit_packets_dropped_total | Dropped transmit packets |
| node_network_receive_bytes_total | Bytes received per node/interface |
| node_network_transmit_bytes_total | Bytes transmitted per node/interface |
| instance:node_network_receive_bytes_excluding_lo:rate1m | Pre-computed node receive rate |
| instance:node_network_transmit_bytes_excluding_lo:rate1m | Pre-computed node transmit rate |
Pod and Container Statistics
Pod count by namespace
metrics_read { "command": "query", "flags": { "query": "topk(15, count by (namespace)(kube_pod_info))", "output": "markdown" } }
Pod phase summary
metrics_read { "command": "preset", "flags": { "name": "cluster_pod_status", "output": "markdown" } }
Container CPU usage by namespace
metrics_read { "command": "preset", "flags": { "name": "namespace_cpu_usage", "output": "markdown" } }
Container memory usage by namespace
metrics_read { "command": "preset", "flags": { "name": "namespace_memory_usage", "output": "markdown" } }
Container restart counts (instability indicator)
metrics_read { "command": "preset", "flags": { "name": "pod_restarts_top10", "output": "markdown" } }
Pods with high recent restarts (use debug_read for details)
After finding pods with high restarts, use debug_read to get pod details and logs:
debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "query": "where status.containerStatuses[0].restartCount > 5", "output": "markdown" } }
debug_read { "command": "logs", "flags": { "name": "<POD_NAME>", "namespace": "<NAMESPACE>", "tail": 100, "query": "where level = 'ERROR'", "output": "markdown" } }
Available labels on pod/container metrics
| Label | Description | Example values |
|---|---|---|
| namespace | Pod namespace | konveyor-forklift, openshift-cnv |
| pod | Pod name | forklift-controller-6df77f6bf5-jtt7q |
| container | Container name | main, inventory, extract |
| node | Node the pod runs on | worker-0, worker-1 |
| phase | Pod phase (on status metrics) | Running, Pending, Failed, Succeeded |
| uid | Pod UID | 793fb1cb-3e58-4eef-b95a-733f237365a3 |
| created_by_kind | Owner resource kind (on kube_pod_info) | ReplicaSet, DaemonSet, StatefulSet |
| created_by_name | Owner resource name (on kube_pod_info) | forklift-controller-6df77f6bf5 |
| host_ip | Node IP (on kube_pod_info) | 192.168.0.77 |
| pod_ip | Pod IP (on kube_pod_info) | 10.129.3.3 |
Pod/container metrics reference
| Metric | Description |
|---|---|
| kube_pod_info | Pod metadata (node, namespace, IPs, owner) |
| kube_pod_status_phase | Pod phase (Running/Pending/Failed/Succeeded) |
| kube_pod_container_status_restarts_total | Container restart count |
| kube_pod_container_status_waiting_reason | Waiting reason (CrashLoopBackOff, ImagePullBackOff, etc.) |
| container_cpu_usage_seconds_total | Cumulative container CPU time in seconds (counter; rate() gives cores) |
| container_memory_working_set_bytes | Container memory usage (working set, bytes) |
| namespace:container_cpu_usage:sum | Pre-aggregated CPU by namespace |
| namespace:container_memory_usage_bytes:sum | Pre-aggregated memory by namespace |
Forklift / MTV Migration Metrics
Available labels on mtv_* metrics
All mtv_* metrics share these labels for filtering and grouping:
| Label | Description | Example values |
|---|---|---|
| provider | Source provider type | vsphere, ovirt, openstack, ova, ec2 |
| mode | Migration mode | Cold, Warm |
| target | Target cluster | Local (host cluster) or remote cluster name |
| owner | User who owns the migration | admin@example.com |
| plan | Migration plan UUID | 363ce137-dace-4fb4-b815-759c214c9fec |
| namespace | Forklift operator namespace | konveyor-forklift, openshift-mtv |
| status | Migration/plan status (on status metrics) | Succeeded, Failed, Executing |
MTV migration metrics reference
| Metric | Description |
|---|---|
| mtv_migrations_status_total | Migration counts by status (succeeded/failed/running) |
| mtv_plans_status | Plan-level status counts |
| mtv_migration_data_transferred_bytes | Total bytes migrated per plan |
| mtv_migration_net_throughput | Migration network throughput |
| mtv_migration_storage_throughput | Migration storage throughput |
| mtv_migration_duration_seconds | Migration duration per plan |
| mtv_plan_alert_status | Alerts on migration plans |
| mtv_workload_migrations_status_total | Per-workload migration status (per plan + status) |
| kubevirt_vmi_migrations_in_pending_phase | Live VMI migrations pending |
| kubevirt_vmi_migrations_in_running_phase | Live VMI migrations in progress |
Migration status overview
metrics_read { "command": "preset", "flags": { "name": "mtv_migration_status", "output": "markdown" } }
Migration plan status
metrics_read { "command": "preset", "flags": { "name": "mtv_plan_status", "output": "markdown" } }
Migration data transfer and throughput
metrics_read { "command": "preset", "flags": { "name": "mtv_data_transferred", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_net_throughput", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_storage_throughput", "output": "markdown" } }
Migration duration
metrics_read { "command": "preset", "flags": { "name": "mtv_migration_duration", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_avg_migration_duration", "output": "markdown" } }
Migration alerts
metrics_read { "command": "query", "flags": { "query": "mtv_plan_alert_status", "output": "markdown" } }
Narrowing migration metrics with label filters
Use {label="value"} in PromQL or use the selector flag:
metrics_read { "command": "query", "flags": { "query": "mtv_migration_data_transferred_bytes", "selector": "provider=vsphere", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_migration_data_transferred_bytes{mode=\"Cold\"}", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_migration_data_transferred_bytes{provider=\"ovirt\", mode=\"Warm\"}", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_migrations_status_total{status=\"Failed\"}", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "mtv_workload_migrations_status_total{plan=\"PLAN_UUID\", status=\"Failed\"}", "output": "markdown" } }
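The PromQL form generalizes to any combination of the labels listed earlier. A Python sketch that renders a label dictionary into a selector; with_labels is a hypothetical helper, and it only produces exact-match (=) matchers:

```python
def with_labels(metric: str, labels: dict[str, str]) -> str:
    """Render metric{k="v", ...} with exact-match label filters."""
    if not labels:
        return metric
    parts = []
    for key in sorted(labels):  # sorted for stable, comparable output
        # Escape backslashes and quotes so the value stays inside its string.
        value = labels[key].replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{value}"')
    return metric + "{" + ", ".join(parts) + "}"


print(with_labels("mtv_migration_data_transferred_bytes",
                  {"provider": "ovirt", "mode": "Warm"}))
# mtv_migration_data_transferred_bytes{mode="Warm", provider="ovirt"}
```

Label values are case-sensitive in PromQL, so Warm and warm select different series.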
Grouping migration metrics
metrics_read { "command": "query", "flags": { "query": "sum by (provider)(mtv_migration_data_transferred_bytes)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (mode)(mtv_migration_data_transferred_bytes)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (provider, mode)(mtv_migration_data_transferred_bytes)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (status, provider)(mtv_migrations_status_total)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "avg by (provider)(mtv_migration_duration_seconds)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (plan, status)(mtv_workload_migrations_status_total)", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "sum by (provider, status)(mtv_plans_status)", "output": "markdown" } }
Network traffic of migration pods
During active Forklift migrations, data-transfer pods run in the target namespace. Migration pod names follow the pattern {plan-name}-{vm-id}-{random} (e.g. test-vmware-metrics-vm-43-tws62).
Step 1 -- Discover migration pods:
VMware/general migration pods (carry a plan label):
debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "selector": "plan", "output": "markdown" } }
oVirt/OpenStack populator pods (named populate-{uuid}-...):
debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "query": "where name ~= '^populate-'", "output": "markdown" } }
Step 2 -- Query network traffic for discovered pods:
Use the pod names from Step 1 to build a regex filter (replace POD1|POD2 with the actual names):
metrics_read {
"command": "query",
"flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_receive_bytes_total{namespace=\"TARGET_NAMESPACE\",pod=~\"POD1|POD2\"}[5m]))))", "output": "markdown" }
}
metrics_read {
"command": "query",
"flags": { "query": "topk(10, sort_desc(sum by (pod)(rate(container_network_transmit_bytes_total{namespace=\"TARGET_NAMESPACE\",pod=~\"POD1|POD2\"}[5m]))))", "output": "markdown" }
}
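Building the pod=~ filter by hand is error-prone when there are many pods. PromQL regex matchers are fully anchored, so an alternation of exact names matches only those pods. A Python sketch of that step; pod_regex_matcher is a hypothetical helper, and it assumes the names are standard Kubernetes pod names (lowercase alphanumerics, '-' and '.'):

```python
import re

# Kubernetes pod names are DNS subdomain names: lowercase alphanumerics, '-' and '.'.
_POD_NAME = re.compile(r"^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$")


def pod_regex_matcher(pod_names: list[str]) -> str:
    """Build pod=~"name1|name2" from a list of exact pod names."""
    if not pod_names:
        raise ValueError("at least one pod name is required")
    escaped = []
    for name in pod_names:
        if not _POD_NAME.match(name):
            raise ValueError(f"unexpected pod name: {name!r}")
        # '.' is the only regex metacharacter allowed in a pod name;
        # '[.]' matches it literally without needing backslashes.
        escaped.append(name.replace(".", "[.]"))
    return 'pod=~"' + "|".join(escaped) + '"'


print(pod_regex_matcher(["test-vmware-metrics-vm-43-tws62", "populate-abc.1"]))
# pod=~"test-vmware-metrics-vm-43-tws62|populate-abc[.]1"
```

The result replaces the pod=~"POD1|POD2" fragment in the queries above; its quotes still need JSON escaping when placed in the query flag.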
Short-lived pod network metrics
Pods that run for less than about 60 seconds (e.g. oVirt/OpenStack populator pods) may have no container-level network metrics (container_network_*). cAdvisor needs 1-2 collection cycles (~10-20s) to establish network namespace tracking, and the pod may complete before tracking starts. CPU and memory metrics are unaffected.
Node-level network metrics still capture the transfer. Determine which node ran the pod (spec.nodeName or kube_pod_info), then query RX and TX together:
metrics_read {
"command": "query_range",
"flags": {
"query": [
"instance:node_network_receive_bytes_excluding_lo:rate1m{instance=~\"NODE_NAME.*\"}",
"instance:node_network_transmit_bytes_excluding_lo:rate1m{instance=~\"NODE_NAME.*\"}"
],
"name": ["node_rx", "node_tx"],
"start": "<MIGRATION_START>",
"end": "<MIGRATION_END>",
"step": "30s",
"output": "markdown"
}
}
Compare against baseline before/after the migration window to isolate transfer traffic.
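The baseline comparison can be reduced to simple arithmetic: subtract the average rate outside the window from each sample inside it, then multiply by the step. A rough Python sketch with made-up samples; transfer_estimate is a hypothetical helper, and because the :rate1m series is itself a smoothed rate, the result is an estimate rather than an exact byte count:

```python
def transfer_estimate(window_rates: list[float],
                      baseline_rates: list[float],
                      step_seconds: float) -> float:
    """Estimate bytes attributable to the migration.

    window_rates:   bytes/sec samples inside the migration window
    baseline_rates: bytes/sec samples from before/after the window
    step_seconds:   spacing between samples (the query_range step)
    """
    if not baseline_rates:
        raise ValueError("need at least one baseline sample")
    baseline = sum(baseline_rates) / len(baseline_rates)
    # Clamp at zero so dips below the baseline do not count as negative transfer.
    excess = (max(rate - baseline, 0.0) for rate in window_rates)
    return sum(excess) * step_seconds


# Made-up samples at a 30s step: baseline of 1 MB/s, migration peaks at 51 MB/s.
MB = 1_000_000
print(transfer_estimate([1 * MB, 51 * MB, 51 * MB, 1 * MB], [1 * MB, 1 * MB], 30))
# 3000000000.0
```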
CPU activity confirms the pod was active during the window:
metrics_read { "command": "query_range", "flags": { "query": "rate(container_cpu_usage_seconds_total{pod=\"<POD>\",namespace=\"<NS>\"}[1m])", "start": "<START>", "end": "<END>", "step": "30s", "output": "markdown" } }
Completed migration historical queries
When querying metrics for a migration that already finished, use the plan's start/completion timestamps as absolute time bounds:
- Get timestamps from the plan:
mtv_read { "command": "describe plan", "flags": { "name": "<PLAN>", "namespace": "<NS>", "output": "markdown" } }
- Use ISO-8601 start/end in query_range:
metrics_read {
"command": "query_range",
"flags": {
"query": "sum by (pod)(rate(container_network_receive_bytes_total{namespace=\"<NS>\"}[5m]))",
"start": "2025-06-15T10:00:00Z",
"end": "2025-06-15T12:30:00Z",
"step": "60s",
"output": "markdown"
}
}
Do not use relative offsets like -1h for completed migrations -- the data may fall outside that window.
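Absolute bounds in the format used above can be produced from the plan's timestamps with a few lines of Python. This is a sketch: iso_utc and padded_window are hypothetical helpers, and the 5-minute padding on each side is an assumption chosen to include ramp-up and ramp-down, not something the tool requires:

```python
from datetime import datetime, timedelta, timezone


def iso_utc(ts: datetime) -> str:
    """Format a timezone-aware datetime as ISO-8601 UTC with a trailing Z."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def padded_window(started: datetime, completed: datetime,
                  pad: timedelta = timedelta(minutes=5)) -> tuple[str, str]:
    """Return (start, end) strings widened by `pad` on both sides."""
    if completed < started:
        raise ValueError("completed precedes started")
    return iso_utc(started - pad), iso_utc(completed + pad)


# Made-up plan timestamps matching the example window above.
started = datetime(2025, 6, 15, 10, 5, tzinfo=timezone.utc)
completed = datetime(2025, 6, 15, 12, 25, tzinfo=timezone.utc)
print(padded_window(started, completed))
# ('2025-06-15T10:00:00Z', '2025-06-15T12:30:00Z')
```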
Checking migration pod status with debug_read
To investigate migration pod issues alongside metrics:
debug_read { "command": "list", "flags": { "resource": "pods", "namespace": "<NAMESPACE>", "selector": "plan", "output": "markdown" } }
debug_read { "command": "logs", "flags": { "name": "<POD_NAME>", "namespace": "<NAMESPACE>", "tail": 100, "query": "where level = 'ERROR'", "output": "markdown" } }
Network traffic of the Forklift operator itself
metrics_read { "command": "preset", "flags": { "name": "mtv_forklift_traffic", "output": "markdown" } }
KubeVirt VMI migration metrics
These track live VM migrations (vMotion-style), not Forklift cold migrations:
metrics_read { "command": "preset", "flags": { "name": "mtv_vmi_migrations_pending", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_vmi_migrations_running", "output": "markdown" } }
Quick Health Dashboard
Run key queries for a cluster overview:
metrics_read { "command": "preset", "flags": { "name": "cluster_cpu_utilization", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "cluster_memory_utilization", "output": "markdown" } }
metrics_read { "command": "query", "flags": { "query": "ceph_health_status", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "namespace_network_rx", "output": "markdown" } }
metrics_read { "command": "preset", "flags": { "name": "mtv_migration_status", "output": "markdown" } }