| name | exploring-apm-traces |
| description | Investigates distributed application performance using PostHog APM (OpenTelemetry span) data via MCP. Use when the user asks about service traces, slow HTTP/database spans, error spans, trace IDs, or span attributes (not AI observability traces or product logs). Uses posthog:query-apm-spans, posthog:apm-trace-get, posthog:apm-services-list, posthog:apm-attributes-list, and posthog:apm-attribute-values-list. |
Exploring APM traces (OpenTelemetry spans)
PostHog captures distributed traces from OpenTelemetry. Each trace is a tree of spans representing a request's path through services.
Disambiguation: This skill is for APM / OpenTelemetry traces. Do not confuse with AI observability traces (agent/model $ai_* events) or logs (posthog:query-logs, posthog:logs-*).
Available tools
| Tool | Purpose |
|---|---|
| posthog:query-apm-spans | Search and filter spans (compact list view) |
| posthog:apm-trace-get | Get the full span list for one hex trace_id |
| posthog:apm-spans-aggregate | Per-operation aggregates (count, p50/p95, errors) |
| posthog:apm-spans-tree | Call-tree aggregates per (parent, child) edge |
| posthog:apm-services-list | List distinct service names |
| posthog:apm-attributes-list | List span or resource attribute keys |
| posthog:apm-attribute-values-list | List values for a specific attribute key |
See references/spans-and-fields.md for the response schema and the kind/status_code enums.
Workflow: debug a trace from a URL
Step 1 — Fetch the trace
posthog:apm-trace-get
{
  "trace_id": "<hex_trace_id>"
}
The response is { results: [span, span, …] } — a flat list of every span in the trace.
The list can be very large for fan-out request flows; when it exceeds the inline limit, Claude Code auto-persists it to a file.
From the result you get:
- Every span with name, service_name, kind, status_code, parent_span_id, duration_nano, is_root_span
- The _posthogUrl: always include this in your response so the user can click through to the UI
Step 2 — Parse large results with scripts
When the result is persisted to a file (traces with hundreds of spans across services), use the parsing scripts to explore it.
Start with the summary to get the full picture, then drill into specifics:
python3 scripts/print_summary.py /path/to/persisted-file.json
python3 scripts/print_timeline.py /path/to/persisted-file.json
SPAN="HTTP GET /api/users" python3 scripts/extract_span.py /path/to/persisted-file.json
SEARCH="keyword" python3 scripts/search_spans.py /path/to/persisted-file.json
python3 scripts/show_structure.py /path/to/persisted-file.json
All scripts support a MAX_LEN=N environment variable to control truncation (0 = unlimited).
Tree reconstruction (parent_span_id → span_id)
The flat span list encodes a tree. Each span carries:
- trace_id: same on every span in the trace
- span_id: this span's unique hex ID
- parent_span_id: points to the parent's span_id (zero-padded hex 000…000 for the root)
- is_root_span: convenience flag for the trace entry
To rebuild the tree:
- Spans where is_root_span is true (or parent_span_id == "00000000…") are root spans.
- Every other span is a child of the span whose span_id matches its parent_span_id.
- Group by parent_span_id, then walk from each root downward.
scripts/print_timeline.py does this for you and prints a DFS-indented tree.
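The rebuild steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual contents of scripts/print_timeline.py; the function names (build_children, find_roots, render_tree) and the sample spans are hypothetical, and only the field names come from the trace payload described in this document.

```python
from collections import defaultdict

# Assumption: root spans carry a 16-character all-zero parent_span_id.
ROOT_PARENT = "0" * 16


def build_children(spans):
    """Group spans by parent_span_id so each parent ID maps to its children."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)
    return children


def find_roots(spans):
    """Prefer the is_root_span flag; fall back to the all-zero parent ID."""
    return [
        s for s in spans
        if s.get("is_root_span") or s["parent_span_id"] == ROOT_PARENT
    ]


def render_tree(spans):
    """Return DFS-indented lines, one per span, starting from each root."""
    children = build_children(spans)
    lines = []

    def walk(span, depth):
        ms = span["duration_nano"] / 1_000_000  # nanoseconds to milliseconds
        lines.append(f"{'  ' * depth}{span['service_name']}: {span['name']} ({ms:.1f} ms)")
        # Visit siblings in start order; same-format ISO 8601 strings sort lexically.
        for child in sorted(children[span["span_id"]], key=lambda s: s["timestamp"]):
            walk(child, depth + 1)

    for root in find_roots(spans):
        walk(root, 0)
    return lines


# Hypothetical three-span trace used only for illustration.
sample_spans = [
    {"span_id": "aaaaaaaaaaaaaaa1", "parent_span_id": ROOT_PARENT, "is_root_span": True,
     "name": "GET /checkout", "service_name": "api-gateway",
     "timestamp": "2024-01-01T00:00:00.000Z", "duration_nano": 120_000_000},
    {"span_id": "aaaaaaaaaaaaaaa2", "parent_span_id": "aaaaaaaaaaaaaaa1", "is_root_span": False,
     "name": "POST /charge", "service_name": "payments",
     "timestamp": "2024-01-01T00:00:00.010Z", "duration_nano": 80_000_000},
    {"span_id": "aaaaaaaaaaaaaaa3", "parent_span_id": "aaaaaaaaaaaaaaa2", "is_root_span": False,
     "name": "SELECT orders", "service_name": "payments",
     "timestamp": "2024-01-01T00:00:00.020Z", "duration_nano": 30_000_000},
]

tree_lines = render_tree(sample_spans)
print("\n".join(tree_lines))
```

Grouping by parent_span_id first keeps the walk linear in the number of spans, which matters for fan-out traces with hundreds of spans.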
Investigation patterns
"Where is time going?"
- Run print_summary.py; it surfaces the top-5 slowest spans by duration_nano.
- For a noisy trace, run print_timeline.py and scan the indented durations to see whether time is dominated by one child span or spread across a fan-out of many.
- To dig into one slow span, run SPAN="<name>" python3 scripts/extract_span.py FILE.
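For ad-hoc analysis outside the bundled scripts, the same question can be answered directly on the span list. The sketch below is an assumption-laden illustration: slowest_spans and child_breakdown are hypothetical helper names, and the sample spans are invented. It ranks spans by duration_nano and shows what share of a parent's duration each direct child accounts for.

```python
def slowest_spans(spans, top_n=5):
    """Return the top_n spans ordered by duration_nano, longest first."""
    return sorted(spans, key=lambda s: s["duration_nano"], reverse=True)[:top_n]


def child_breakdown(spans, parent_span_id):
    """List (name, share of parent duration) for the direct children of one span."""
    parent = next(s for s in spans if s["span_id"] == parent_span_id)
    kids = [s for s in spans if s["parent_span_id"] == parent_span_id]
    total = parent["duration_nano"]
    # Guard against zero-length parents to avoid dividing by zero.
    return [(k["name"], k["duration_nano"] / total if total else 0.0) for k in kids]


# Hypothetical spans: one root with a slow query and a fast cache lookup.
sample_spans = [
    {"span_id": "b1", "parent_span_id": "0" * 16, "name": "GET /api/users",
     "duration_nano": 200_000_000},
    {"span_id": "b2", "parent_span_id": "b1", "name": "SELECT users",
     "duration_nano": 150_000_000},
    {"span_id": "b3", "parent_span_id": "b1", "name": "GET cache",
     "duration_nano": 10_000_000},
]

top_two = [s["name"] for s in slowest_spans(sample_spans, top_n=2)]
shares = child_breakdown(sample_spans, "b1")
print(top_two)
for name, share in shares:
    print(f"{name}: {share:.0%} of parent")
```

Child durations can overlap when calls run in parallel, so the shares are a rough guide and need not sum to 100%.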
"Where did the error happen?"
- print_summary.py lists every span with status_code == 2 (Error). Each entry shows service, span name, and parent context.
- Walk up the tree from an error span via parent_span_id to see what request path led there.
- apm-attribute-values-list is the only way to fetch error message attributes; they are not in the trace payload.
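Walking up from an error span can be sketched as follows. The helpers ancestry and error_paths are hypothetical names, and the sample spans are invented; the only assumptions taken from this document are the span_id / parent_span_id linkage and status_code == 2 meaning Error.

```python
STATUS_ERROR = 2  # status_code value for Error


def ancestry(span, by_id):
    """Return the path from the root down to span, following parent_span_id."""
    path = [span]
    seen = {span["span_id"]}
    while True:
        parent = by_id.get(path[-1]["parent_span_id"])
        # Stop at the root (no matching parent) or if the data contains a cycle.
        if parent is None or parent["span_id"] in seen:
            break
        path.append(parent)
        seen.add(parent["span_id"])
    return list(reversed(path))


def error_paths(spans):
    """Return one root-to-span path for every span marked as Error."""
    by_id = {s["span_id"]: s for s in spans}
    return [ancestry(s, by_id) for s in spans if s["status_code"] == STATUS_ERROR]


# Hypothetical trace in which only the innermost database span failed.
sample_spans = [
    {"span_id": "c1", "parent_span_id": "0" * 16, "name": "GET /checkout", "status_code": 0},
    {"span_id": "c2", "parent_span_id": "c1", "name": "POST /charge", "status_code": 0},
    {"span_id": "c3", "parent_span_id": "c2", "name": "INSERT payments", "status_code": 2},
]

paths = error_paths(sample_spans)
for path in paths:
    print(" > ".join(s["name"] for s in path))
```

A missing parent (for example, a span dropped by sampling) simply ends the walk early, so the path may start below the true root.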
"Did the request hit service X?"
- Run print_summary.py; it prints the set of services involved in the trace.
- If service X is missing, either the request never reached it or its instrumentation is missing. Check apm-services-list to confirm X has emitted spans recently at all.
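The service check is a one-liner on the span list. A minimal sketch, assuming each span carries service_name as described in this document; services_in_trace and the sample data are hypothetical.

```python
from collections import Counter


def services_in_trace(spans):
    """Count spans per service_name across the whole trace."""
    return Counter(s["service_name"] for s in spans)


# Hypothetical spans for illustration.
sample_spans = [
    {"service_name": "api-gateway", "name": "GET /checkout"},
    {"service_name": "payments", "name": "POST /charge"},
    {"service_name": "payments", "name": "INSERT payments"},
]

counts = services_in_trace(sample_spans)
print(sorted(counts))             # services that appear in the trace
print("inventory" in counts)      # did the request reach the inventory service?
```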
"Did the fan-out look right?"
- print_timeline.py shows the indentation: wide trees mean parallel calls, deep trees mean sequential dependencies.
- Look for spans of kind Client (3) followed by matching Server (2) spans on the called service; that pattern is a synchronous downstream call.
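Client/Server pairing can be detected mechanically. The sketch below assumes that a Server span produced by a downstream call names the caller's Client span as its parent, which is the usual result of OpenTelemetry context propagation but is not stated in this document. downstream_calls and the sample spans are hypothetical.

```python
KIND_SERVER = 2
KIND_CLIENT = 3


def downstream_calls(spans):
    """Pair each Server span with the Client span it names as its parent.

    Returns (caller service, called service, server span name) tuples.
    """
    by_id = {s["span_id"]: s for s in spans}
    calls = []
    for span in spans:
        parent = by_id.get(span["parent_span_id"])
        if span["kind"] == KIND_SERVER and parent and parent["kind"] == KIND_CLIENT:
            calls.append((parent["service_name"], span["service_name"], span["name"]))
    return calls


# Hypothetical trace: the gateway calls the payments service once.
sample_spans = [
    {"span_id": "d1", "parent_span_id": "0" * 16, "kind": 2,
     "service_name": "api-gateway", "name": "GET /checkout"},
    {"span_id": "d2", "parent_span_id": "d1", "kind": 3,
     "service_name": "api-gateway", "name": "HTTP POST payments"},
    {"span_id": "d3", "parent_span_id": "d2", "kind": 2,
     "service_name": "payments", "name": "POST /charge"},
    {"span_id": "d4", "parent_span_id": "d3", "kind": 1,
     "service_name": "payments", "name": "validate card"},
]

calls = downstream_calls(sample_spans)
print(calls)
```

A Client span with no matching Server child suggests the called service is uninstrumented or its spans were not captured.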
Searching by attribute (e.g. http.method=POST)
The trace payload does not contain attributes — only the span's built-in fields. To filter or search by attributes:
- Use apm-attributes-list / apm-attribute-values-list to discover keys and values.
- Re-issue query-apm-spans with a filterGroup entry of type span_attribute or span_resource_attribute.
Constructing UI links
apm-trace-get and query-apm-spans both return _posthogUrl. Always include it when presenting findings so the user can verify them in the PostHog UI.
Finding traces
Use posthog:query-apm-spans to search and filter spans. Note this returns spans, not a tree — pass query.traceId or grab a trace_id from the results and feed it to apm-trace-get for the tree.
Discover before filtering
Before constructing filters, discover what's actually in the project:
- Confirm services exist: call apm-services-list to see which services have emitted spans.
- Find filterable attributes: call apm-attributes-list with attribute_type: "span" or "resource".
- Get actual values: call apm-attribute-values-list with a key to see the real values in use.
Only then construct query-apm-spans filters. Custom attributes vary per project and cannot be guessed.
By filters
posthog:query-apm-spans
{
  "query": {
    "serviceNames": ["api-gateway"],
    "dateRange": {"date_from": "-1h"},
    "filterGroup": [
      {"key": "http.status_code", "operator": "gt", "type": "span_attribute", "value": "499"}
    ]
  }
}
By trace ID (when known)
posthog:apm-trace-get
{
  "trace_id": "0123456789abcdef0123456789abcdef"
}
Common gotchas
- Durations are nanoseconds. 1 second = 1_000_000_000. Filter values in query-apm-spans for duration are also nanoseconds.
- status_code == 2 is Error. 0 is Unset, 1 is OK. Use OK to match {0, 1} in the UI filter.
- kind is an integer 0–5: 0 Unspecified, 1 Internal, 2 Server, 3 Client, 4 Producer, 5 Consumer.
- parent_span_id of a root span is "0000000000000000" (16 zero hex chars, matching the 8-byte span ID width, not the 16-byte trace ID width), not null.
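When writing analysis code, it can help to pin these conventions down as named constants. The sketch below mirrors the values listed above; the class and function names (SpanKind, StatusCode, seconds_to_nanos, matches_ui_ok) are hypothetical and not part of any PostHog API.

```python
from enum import IntEnum


class SpanKind(IntEnum):
    UNSPECIFIED = 0
    INTERNAL = 1
    SERVER = 2
    CLIENT = 3
    PRODUCER = 4
    CONSUMER = 5


class StatusCode(IntEnum):
    UNSET = 0
    OK = 1
    ERROR = 2


NANOS_PER_SECOND = 1_000_000_000
ROOT_PARENT_SPAN_ID = "0" * 16  # 8-byte span ID rendered as 16 hex characters


def seconds_to_nanos(seconds):
    """Convert seconds to integer nanoseconds, e.g. for a duration filter value."""
    return round(seconds * NANOS_PER_SECOND)


def nanos_to_millis(duration_nano):
    """Convert duration_nano to milliseconds for display."""
    return duration_nano / 1_000_000


def matches_ui_ok(status_code):
    """Mirror the UI's OK filter, which covers both Unset (0) and OK (1)."""
    return status_code in (StatusCode.UNSET, StatusCode.OK)


print(SpanKind(3).name, StatusCode(2).name)
print(seconds_to_nanos(0.5), nanos_to_millis(250_000_000))
print(matches_ui_ok(0), matches_ui_ok(2))
```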
Parsing large trace results
Trace tool results are JSON. When too large to read inline, Claude Code persists them to a file.
Persisted file format
[{ "type": "text", "text": "{\"results\": [...], \"_posthogUrl\": \"...\"}" }]
Every script in scripts/ unwraps this envelope before parsing.
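For custom analysis, the unwrapping step might look like the sketch below. It is an illustration of the envelope format shown above, not the code in scripts/; unwrap_envelope is a hypothetical name, and the fallback for an already-unwrapped payload is an added assumption.

```python
import json


def unwrap_envelope(raw_text):
    """Return the trace payload dict from a persisted tool result.

    Handles the wrapped form ([{"type": "text", "text": "<json>"}]) and,
    as a convenience, an already-unwrapped payload ({"results": [...]}).
    """
    data = json.loads(raw_text)
    if isinstance(data, list):
        for item in data:
            if isinstance(item, dict) and item.get("type") == "text":
                # The inner payload is itself a JSON-encoded string.
                return json.loads(item["text"])
        raise ValueError("no text item found in persisted envelope")
    return data


# Hypothetical persisted content; the URL is a placeholder.
inner = {"results": [{"span_id": "aaaaaaaaaaaaaaa1"}],
         "_posthogUrl": "https://example.invalid/trace"}
persisted = json.dumps([{"type": "text", "text": json.dumps(inner)}])

payload = unwrap_envelope(persisted)
print(len(payload["results"]), payload["_posthogUrl"])
```

To read a real file, pass its contents, for example unwrap_envelope(open(path).read()).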
Trace JSON structure
results (array of span dicts)
└── each span:
    ├── uuid, trace_id, span_id, parent_span_id (hex strings)
    ├── name, kind (int 0–5), service_name
    ├── status_code (int 0–2), is_root_span (bool)
    ├── timestamp, end_time (ISO 8601)
    ├── duration_nano (int, nanoseconds)
    └── matched_filter (bool; only meaningful when prefetching from query-apm-spans)
Available scripts
| Script | Purpose | Usage |
|---|---|---|
| print_summary.py | Trace metadata, services, slowest spans, errors | python3 scripts/print_summary.py FILE |
| print_timeline.py | DFS-indented tree from parent_span_id walk | python3 scripts/print_timeline.py FILE |
| extract_span.py | Full row + parent/children for spans matching a name | SPAN="name" python3 scripts/extract_span.py FILE |
| search_spans.py | Find a keyword across name, service_name, IDs | SEARCH="kw" python3 scripts/search_spans.py FILE |
| show_structure.py | Show JSON keys and types without values | python3 scripts/show_structure.py FILE |
Tips
- Always set dateRange on query-apm-spans; queries without a time range are slow. Default is -1h; widen only when needed.
- Always include the _posthogUrl in your response so the user can click through.
- Attributes are not in the apm-trace-get payload; use apm-attribute-values-list for those.
- is_root_span is the cheap way to find the trace entry; don't string-match 00000000….
- For aggregates (p95 by operation, slowest children of a span), use apm-spans-aggregate for a flat view or apm-spans-tree for parent→child edges; don't reach for SQL.