| name | log-correlator |
| description | Use when you have one request ID and need its whole story — a customer escalation citing a request_id, a trace that dead-ends, or a "what happened to this specific request" question during an incident. Fans out to every system that touched the request (gateway, model server, billing, traces), merges by request_id, and prints one timeline sorted by timestamp. Reach for this instead of grepping each log store by hand. |
Log Correlator
Overview
A single inference request touches several systems on its way through: the
gateway accepts it, a model server runs it, billing meters it, and a trace records
the spans. When something goes wrong with one specific request, the answer is
spread across all of those stores, each with its own UI, query language, and
clock. Stitching them together by hand is slow and error-prone — exactly the wrong
thing to be doing while a customer waits.
scripts/correlate.py takes a request ID, fans out to every system in parallel,
merges the results by request_id, sorts the merged events by timestamp, and
prints one unified timeline. You get the request's whole story in one view.
When to Use
Use this skill when:
- A customer escalation cites a request_id: "request req_a1b2c3 failed,
what happened?"
- A trace dead-ends — you have a request_id from a partial trace and need the
log lines the trace didn't capture (sampling dropped spans, or a hop wasn't
instrumented).
- You're root-causing one request during an incident and need to see the
gateway → model-server → billing handoffs in order.
Do NOT use this skill when:
- You don't have a request_id — for symptom-first or aggregate investigation, use
inference-api-debugging; for alert triage, use oncall-runner.
- You need aggregate behavior across many requests — this tool is single-request
by design.
How to Use
Run the script with the request ID:
python scripts/correlate.py req_a1b2c3d4
The script will:
- Fan out to all four systems (gateway logs, model-server logs, billing,
traces) concurrently, querying each for the request_id.
- Merge the returned events into one list, tagging each with its source
system.
- Sort by timestamp (normalized to UTC) and print a single timeline, one
line per event: timestamp source level message.
- Flag correlation problems at the bottom — systems that returned nothing,
suspected clock skew, and a note if the trace looks sampled-out.
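The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/correlate.py: the fetch_* functions are hypothetical stand-ins for real log-store clients, and the event fields (timestamp, level, message) are assumed from the timeline format described above.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

# Hypothetical fetchers: each stands in for a real log-store client.
def fetch_gateway(request_id):
    return [{"timestamp": "2024-05-01T12:00:00.120+00:00", "level": "INFO",
             "message": f"accepted {request_id}"}]

def fetch_model_server(request_id):
    # Deliberately in a +02:00 offset to show normalization to UTC.
    return [{"timestamp": "2024-05-01T14:00:00.450+02:00", "level": "ERROR",
             "message": "inference timed out"}]

def fetch_billing(request_id):
    return []  # nothing found for this id

def fetch_traces(request_id):
    return []  # e.g. the trace was sampled out

FETCHERS = {
    "gateway": fetch_gateway,
    "model-server": fetch_model_server,
    "billing": fetch_billing,
    "traces": fetch_traces,
}

def to_utc(ts):
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

def correlate(request_id, fetchers=FETCHERS):
    """Fan out concurrently, merge, tag with source, sort by UTC timestamp."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn, request_id)
                   for name, fn in fetchers.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    events = [
        {**row, "source": source, "timestamp": to_utc(row["timestamp"])}
        for source, rows in results.items()
        for row in rows
    ]
    events.sort(key=lambda e: e["timestamp"])
    empty = [name for name, rows in results.items() if not rows]
    return events, empty

def format_line(event):
    """One line per event: timestamp source level message."""
    return (f'{event["timestamp"].isoformat()} {event["source"]} '
            f'{event["level"]} {event["message"]}')

if __name__ == "__main__":
    timeline, empty_systems = correlate("req_a1b2c3d4")
    for event in timeline:
        print(format_line(event))
    print("no events from:", ", ".join(empty_systems) or "(none)")
```

Threads suit this kind of fan-out because each query mostly waits on I/O; a real implementation would also want per-system timeouts and error handling, which this sketch leaves out.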
Options:
- --since 1h / --window 2h: bound the query window (helps when retention
differs across systems).
- --systems gateway,model-server: restrict the fan-out to specific systems.
- --json: emit the merged timeline as JSON for further processing.
Read the timeline top to bottom: the last event before things go quiet, or the
first error-level line, is usually where the request died.
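The same reading rule can be applied to the --json output. The sketch below assumes the JSON is a list of objects with timestamp, source, level, and message keys, already sorted by timestamp; that shape is a guess from the printed timeline format, not a documented schema, and the set of error-level names is likewise an assumption.

```python
import json

# Assumption: these level names count as "error-level".
ERROR_LEVELS = {"ERROR", "CRITICAL", "FATAL"}

def likely_failure_point(events):
    """Return the first error-level event, else the last event, else None."""
    for event in events:
        if str(event.get("level", "")).upper() in ERROR_LEVELS:
            return event
    return events[-1] if events else None

# Hypothetical sample in the assumed shape.
sample = json.loads("""
[
  {"timestamp": "2024-05-01T12:00:00.120+00:00", "source": "gateway",
   "level": "INFO", "message": "accepted"},
  {"timestamp": "2024-05-01T12:00:00.450+00:00", "source": "model-server",
   "level": "ERROR", "message": "inference timed out"},
  {"timestamp": "2024-05-01T12:00:00.900+00:00", "source": "gateway",
   "level": "INFO", "message": "returned 504"}
]
""")

if __name__ == "__main__":
    hit = likely_failure_point(sample)
    print(hit["source"], hit["level"], hit["message"])
```

To try it on real output, redirect the --json output to a file and pass the result of json.load on that file to likely_failure_point.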
Gotchas
- request_id propagation has gaps — absence in a system is a clue, not proof of
nothing. A downstream hop may receive the request under a different
correlation field (e.g. billing keys off meter_id and carries the request_id
in a sub-field), or may simply not log the id at all. If a system returns zero
events, check whether the id is propagated there before concluding the request
never reached it. The script flags empty systems explicitly for this reason.
- Clocks are skewed between systems: sort carefully and don't over-read
sub-second ordering. Gateway, model-server, and billing run on different hosts
with independent clocks; skew of tens to hundreds of milliseconds is normal. The
script normalizes to UTC and sorts, but two events milliseconds apart from
different systems may actually be in the opposite order. Trust causal ordering
(a response can't precede its request) over raw timestamp ordering when they
conflict; the script warns when it detects a likely-skewed adjacent pair.
- Retention windows differ — an old request may be gone from some stores but not
others. Model-server debug logs may be kept for only days while gateway access
logs are kept for weeks. A partial timeline for an old request_id is often a
retention artifact, not a sign the request skipped a hop. Use --since/--window
to set expectations, and read the per-system retention note the script prints.
- Traces are sampled — a missing trace doesn't mean the request didn't run.
Distributed tracing samples a fraction of requests, so many perfectly normal
requests have no trace at all. If the trace system returns nothing but the logs
show a full successful path, that's sampling, not a problem. The script notes
when traces are empty but logs are present.
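The checks behind these gotchas could look something like the sketch below. It is one possible heuristic, not the script's actual logic: the 500 ms tolerance, the function name, and the flag wording are all assumptions, and events are assumed to be dicts with a source name and a timezone-aware UTC timestamp, already sorted by time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: cross-system events closer than this have uncertain order.
SKEW_TOLERANCE = timedelta(milliseconds=500)

def correlation_flags(events, systems, tolerance=SKEW_TOLERANCE):
    """Return human-readable warnings for a sorted, merged timeline."""
    flags = []
    seen = {e["source"] for e in events}

    # Empty systems: absence is a clue, not proof the hop was skipped.
    for name in systems:
        if name in seen:
            continue
        if name == "traces" and seen:
            flags.append("traces: empty but logs present, likely sampled out")
        else:
            flags.append(f"{name}: no events, check id propagation and retention")

    # Adjacent events from different systems that sit within the tolerance
    # may really have happened in the opposite order.
    for prev, cur in zip(events, events[1:]):
        gap = cur["timestamp"] - prev["timestamp"]
        if prev["source"] != cur["source"] and gap <= tolerance:
            gap_ms = gap // timedelta(milliseconds=1)
            flags.append(
                f'order uncertain: {prev["source"]} -> {cur["source"]} '
                f"only {gap_ms} ms apart"
            )
    return flags

if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
    timeline = [
        {"source": "gateway", "timestamp": t0},
        {"source": "model-server", "timestamp": t0 + timedelta(milliseconds=80)},
        {"source": "model-server", "timestamp": t0 + timedelta(seconds=2)},
    ]
    all_systems = ["gateway", "model-server", "billing", "traces"]
    for flag in correlation_flags(timeline, all_systems):
        print(flag)
```

A fixed tolerance is the simplest choice; it cannot tell which of the two events really came first, so the flag only marks the pair as one where causal reasoning should override the timestamps.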
Files
- SKILL.md: this file (when to trigger, how to run the script, gotchas).
- scripts/correlate.py: fans out to gateway logs, model-server logs, billing,
and traces for a request_id; merges by request_id, sorts by timestamp, prints a
unified timeline; flags empty systems, clock skew, and sampled-out traces.