| name | observability |
| description | Analyzes distributed systems using Prometheus (PromQL), Loki (LogQL), and Tempo (TraceQL). Constructs efficient queries for metrics, logs, and traces. Interprets results with token-efficient structured output. Use when debugging performance issues, investigating errors, analyzing latency, or correlating observability signals across metrics, logs, and traces. |
Query construction and analysis for Prometheus, Loki, and Tempo.
Start with metrics for a broad view, then drill down to logs and traces for context.
Progressive Query Construction
Multi-Signal Correlation
trace_id, service.name, timestamp for correlation
Token-Efficient Results
## Finding: [One-sentence summary]
**Evidence**: [Specific values/metrics]
**Impact**: [User/business effect]
**Cause**: [Root issue if identified]
**Action**: [Next step]
Target: <500 tokens for complete analysis
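As a rough illustration of the template and the token target above, the following Python sketch renders a finding and checks it against the budget. The `Finding` class, `estimate_tokens`, and `within_budget` are hypothetical helpers, and the four-characters-per-token ratio is only a coarse assumption; a real tokenizer will give different counts.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical container for the five template fields."""
    summary: str
    evidence: str
    impact: str
    cause: str
    action: str

    def render(self) -> str:
        # Emit the fields in the same order and markup as the template.
        return "\n".join([
            f"## Finding: {self.summary}",
            f"**Evidence**: {self.evidence}",
            f"**Impact**: {self.impact}",
            f"**Cause**: {self.cause}",
            f"**Action**: {self.action}",
        ])


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; real tokenizers differ.
    return (len(text) + 3) // 4


def within_budget(text: str, budget: int = 500) -> bool:
    # The stated target is fewer than 500 tokens for a complete analysis.
    return estimate_tokens(text) < budget


# Illustrative values only; they do not come from a real system.
example = Finding(
    summary="Checkout error rate rose after the latest deploy",
    evidence="5xx ratio 2% over 5m, p95 latency 1.8s",
    impact="About 1 in 50 checkout requests fails",
    cause="Not yet identified",
    action="Inspect error traces for the checkout service",
)
report = example.render()
```

Calling `within_budget(report)` gives a quick check before returning the analysis; if it fails, trimming the evidence field is usually the first place to look.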
Common starting points (adapt based on context):
# Metrics: Error rate, latency percentiles, traffic patterns
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
histogram_quantile(0.95, sum by (le) (rate(http_duration_bucket[5m])))
sum(rate(http_requests_total[5m])) by (endpoint)
# Logs: Error details, slow operations
{job="service"} |= "error" | json
{job="service"} | json | unwrap duration_ms | duration_ms > threshold
# Traces: Error traces, slow requests, request flow
{ status = error && resource.service.name = "service" }
{ duration > threshold && resource.service.name = "service" }
{ kind = server && resource.service.name = "service" }
Labels: Use specific labels, avoid high cardinality aggregations
Time ranges: Match analysis needs (5m for rate, adjust as needed)
Aggregations: Filter first, then aggregate for efficiency
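The "adjust as needed" part of the time-range guidance can be sketched as a small helper. This Python example assumes a common rule of thumb that is not stated in this document: a `rate()` window should span at least four scrape intervals so that a missed scrape still leaves enough samples. Both function names are hypothetical, and the 5m floor mirrors the default mentioned above.

```python
def rate_window_seconds(scrape_interval_s: int, minimum_s: int = 300) -> int:
    # Assumption: keep at least 4 scrape intervals inside the range,
    # and never go below the 5m default used in the starting queries.
    return max(minimum_s, 4 * scrape_interval_s)


def promql_duration(seconds: int) -> str:
    # Render whole minutes as "Nm", anything else as "Ns".
    if seconds % 60 == 0:
        return f"{seconds // 60}m"
    return f"{seconds}s"


# A 15s scrape interval keeps the 5m default; a 2m interval widens it.
default_window = promql_duration(rate_window_seconds(15))
slow_scrape_window = promql_duration(rate_window_seconds(120))
```

The resulting string can be dropped into the range selector, for example `rate(http_requests_total[8m])` for a two-minute scrape interval.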
Extract key information:
Quantify impact: Convert metrics to business/user impact
Prioritize: Focus on severity, scope, and trend
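Converting a metric into user impact is often simple arithmetic: error ratio times request rate times the length of the window. The Python sketch below shows one possible form; the function names and the input values are illustrative assumptions, not measurements.

```python
def failed_requests(error_ratio: float, requests_per_second: float,
                    window_seconds: int) -> float:
    # Failed requests = share of errors * traffic rate * duration.
    return error_ratio * requests_per_second * window_seconds


def describe_impact(error_ratio: float, requests_per_second: float,
                    window_seconds: int) -> str:
    # Summarize the estimate as one line suitable for the Impact field.
    failed = round(failed_requests(error_ratio, requests_per_second, window_seconds))
    minutes = window_seconds // 60
    return f"~{failed} failed requests in {minutes} min ({error_ratio:.1%} of traffic)"


# Hypothetical inputs: 2% error ratio at 150 req/s over a 5-minute window.
impact_line = describe_impact(0.02, 150, 300)
```

The result is only an estimate, since it treats the error ratio and the traffic rate as constant over the window.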
Consult references for detailed syntax, patterns, and workflows:
When to use references:
DO:
DON'T:
Effective analysis provides: