
log-correlator

Updated June 24, 2026, 03:26

Use when you have one request ID and need its whole story — a customer escalation citing a request_id, a trace that dead-ends, or a "what happened to this specific request" question during an incident. Fans out to every system that touched the request (gateway, model server, billing, traces), merges by request_id, and prints one timeline sorted by timestamp. Reach for this instead of grepping each log store by hand.

Source

az9713

az9713/skill-best-practices


name: log-correlator

Log Correlator

Overview

A single inference request touches several systems on its way through: the gateway accepts it, a model server runs it, billing meters it, and a trace records the spans. When something goes wrong with one specific request, the answer is spread across all of those stores, each with its own UI, query language, and clock. Stitching them together by hand is slow and error-prone — exactly the wrong thing to be doing while a customer waits.

scripts/correlate.py takes a request ID, fans out to every system in parallel, merges the results by request_id, sorts the merged events by timestamp, and prints one unified timeline. You get the request's whole story in one view.

When to Use

Use this skill when:

A customer escalation cites a request_id — "request req_a1b2c3 failed, what happened?"
A trace dead-ends — you have a request_id from a partial trace and need the log lines the trace didn't capture (sampling dropped spans, or a hop wasn't instrumented).
You're root-causing one request during an incident and need to see the gateway → model-server → billing handoffs in order.

Do NOT use this skill when:

You don't have a request_id — for symptom-first or aggregate investigation, use inference-api-debugging; for alert triage, use oncall-runner.
You need aggregate behavior across many requests — this tool is single-request by design.

How to Use

Run the script with the request ID:

python scripts/correlate.py req_a1b2c3d4

The script will:

Fan out to all four systems (gateway logs, model-server logs, billing, traces) concurrently, querying each for the request_id.
Merge the returned events into one list, tagging each with its source system.
Sort by timestamp (normalized to UTC) and print a single timeline, one line per event: timestamp source level message.
Flag correlation problems at the bottom — systems that returned nothing, suspected clock skew, and a note if the trace looks sampled-out.
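The steps above can be sketched as follows. This is a minimal illustration, not the contents of scripts/correlate.py: the fetcher functions, their hard-coded sample events, and the event field names (`ts`, `level`, `message`) are all assumptions made for the example, and a thread pool stands in for whatever concurrency mechanism the real script uses.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

# Hypothetical per-system fetchers. Real ones would query each log store;
# these return canned events so the sketch runs on its own.
def fetch_gateway(request_id):
    return [{"ts": "2026-06-24T03:26:00.100+00:00", "level": "INFO",
             "message": "accepted"}]

def fetch_model_server(request_id):
    # Reported in a +02:00 local offset to show UTC normalization.
    return [{"ts": "2026-06-24T05:26:00.450+02:00", "level": "ERROR",
             "message": "OOM during decode"}]

def fetch_billing(request_id):
    return []  # nothing recorded for this request

FETCHERS = {
    "gateway": fetch_gateway,
    "model-server": fetch_model_server,
    "billing": fetch_billing,
}

def correlate(request_id, fetchers=FETCHERS):
    """Fan out concurrently, tag each event with its source, sort by UTC time."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn, request_id)
                   for name, fn in fetchers.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    events = []
    for source, rows in results.items():
        for row in rows:
            ts = datetime.fromisoformat(row["ts"]).astimezone(timezone.utc)
            events.append({"ts": ts, "source": source,
                           "level": row["level"], "message": row["message"]})
    events.sort(key=lambda e: e["ts"])

    # Systems that returned nothing are reported, not silently dropped.
    empty = sorted(name for name, rows in results.items() if not rows)
    return events, empty

def format_line(event):
    """One timeline line: timestamp source level message."""
    return (f'{event["ts"].isoformat()} {event["source"]} '
            f'{event["level"]} {event["message"]}')
```

Calling `correlate("req_a1b2c3d4")` with these sample fetchers returns the gateway event first, the model-server event second, and `["billing"]` as the list of empty systems.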

Options:

--since 1h / --window 2h — bound the query window (helps when retention differs across systems).
--systems gateway,model-server — restrict the fan-out to specific systems.
--json — emit the merged timeline as JSON for further processing.
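For illustration, the options above could be parsed with `argparse` along these lines. The flag names come from the list above; everything else is an assumption, including the `parse_duration` helper, the supported unit suffixes (s, m, h, d), and the defaults. How the real script interprets `--since` versus `--window` is not specified here, so the sketch only parses them.

```python
import argparse
import re
from datetime import timedelta

# Assumed unit suffixes; the source only shows "1h" and "2h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_duration(text):
    """Turn a string such as '1h' or '30m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if not match:
        raise argparse.ArgumentTypeError(
            f"expected a duration like 1h or 30m, got {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})

def build_parser():
    parser = argparse.ArgumentParser(prog="correlate.py")
    parser.add_argument("request_id", help="request ID to correlate")
    parser.add_argument("--since", type=parse_duration, default=None,
                        help="bound the query window, e.g. 1h")
    parser.add_argument("--window", type=parse_duration, default=None,
                        help="bound the query window, e.g. 2h")
    parser.add_argument("--systems", type=lambda s: s.split(","), default=None,
                        help="comma-separated systems to query")
    parser.add_argument("--json", action="store_true",
                        help="emit the merged timeline as JSON")
    return parser
```

Leaving `--systems` as `None` by default lets the caller distinguish "query everything" from an explicit list.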

Read the timeline top to bottom: the last event before things go quiet, or the first error-level line, is usually where the request died.
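That reading rule can be written down as a small helper. This is a sketch for use on the `--json` output, not part of the script: the function name, the `level` field, and the set of level names treated as errors are assumptions.

```python
# Assumed names for error-level log lines; adjust to the stores in use.
ERROR_LEVELS = {"ERROR", "CRITICAL", "FATAL"}

def likely_failure_point(timeline):
    """Return the first error-level event, else the last event, else None.

    `timeline` is a list of event dicts already sorted by timestamp.
    """
    for event in timeline:
        if event["level"].upper() in ERROR_LEVELS:
            return event
    # No explicit error: the last event before things go quiet.
    return timeline[-1] if timeline else None
```

The result is a starting point for investigation, not a verdict, since a quiet tail can also come from retention or propagation gaps.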

Gotchas

request_id propagation has gaps — absence in a system is a clue, not proof of nothing. A downstream hop may receive the request under a different correlation field (e.g. billing keys off meter_id carrying the request_id in a sub-field), or may simply not log the id at all. If a system returns zero events, check whether the id is propagated there before concluding the request never reached it. The script flags empty systems explicitly for this reason.
Clocks are skewed between systems: sort carefully and don't over-read sub-second ordering. Gateway, model-server, and billing run on different hosts with independent clocks; skew of tens to hundreds of milliseconds is normal. The script normalizes to UTC and sorts, but two events milliseconds apart from different systems may actually be in the opposite order. Trust causal ordering (a response can't precede its request) over raw timestamp ordering when they conflict; the script warns when it detects a likely-skewed adjacent pair.
Retention windows differ — an old request may be gone from some stores but not others. Model-server debug logs may retain for days while gateway access logs retain for weeks. A partial timeline for an old request_id is often a retention artifact, not a sign the request skipped a hop. Use --since/--window to set expectations, and read the per-system retention note the script prints.
Traces are sampled — a missing trace doesn't mean the request didn't run. Distributed tracing samples a fraction of requests, so many perfectly normal requests have no trace at all. If the trace system returns nothing but the logs show a full successful path, that's sampling, not a problem. The script notes when traces are empty but logs are present.
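The gotchas above map onto a few simple checks over the merged timeline. The sketch below is one hedged way to produce such flags and does not reproduce the script's actual logic: the function name, the flag wording, the `"traces"` system name, and the 200 ms tolerance are all assumptions (the tolerance is picked from the "tens to hundreds of milliseconds" range mentioned above).

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance: cross-system events closer than this may be misordered.
SKEW_TOLERANCE = timedelta(milliseconds=200)

def correlation_flags(timeline, queried_systems):
    """Return human-readable warnings for a timeline sorted by timestamp."""
    flags = []
    seen = {event["source"] for event in timeline}

    for system in queried_systems:
        if system in seen:
            continue
        if system == "traces" and seen:
            # Logs exist but no trace: most likely sampling, not a failure.
            flags.append("traces: empty but logs present; likely sampled out")
        else:
            flags.append(
                f"{system}: no events; check id propagation and retention")

    # Adjacent events from different systems that sit within the tolerance
    # cannot be ordered reliably from timestamps alone.
    for prev, curr in zip(timeline, timeline[1:]):
        gap = curr["ts"] - prev["ts"]
        if prev["source"] != curr["source"] and gap <= SKEW_TOLERANCE:
            millis = gap.total_seconds() * 1000
            flags.append(
                f"possible skew: {prev['source']} -> {curr['source']} "
                f"only {millis:.0f} ms apart")
    return flags
```

A flag here means "do not trust this ordering or this absence on its own", which matches the advice to prefer causal ordering over raw timestamps.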

Files

SKILL.md — this file: when to trigger, how to run the script, gotchas.
scripts/correlate.py — fans out to gateway logs, model-server logs, billing, and traces for a request_id; merges by request_id, sorts by timestamp, prints a unified timeline; flags empty systems, clock skew, and sampled-out traces.

Other Skills in this repository

adversarial-review

az9713/skill-best-practices

Use when a change is written and "looks done" but has not had a hostile second pass before merge — especially diffs touching auth, money, migrations, concurrency, or anything the author is quietly unsure about. Spawns a fresh-eyes reviewer subagent that sees ONLY the diff and the spec, collects findings, drives fixes, and re-dispatches until findings degrade to nitpicks. Reach for this instead of self-reviewing; the author is the worst reviewer of their own diff.

2026-06-24

babysit-pr

az9713/skill-best-practices

Use when a PR is open and green-but-blocked, or red on CI for reasons that smell like flake: a timed-out test runner, a transient network 500 in a setup step, a check that passed locally but failed in CI. Reach for this whenever someone says "this PR keeps failing CI but the test is flaky", "can you babysit this PR to merge", "it's just a flaky check, retry it", or wants a PR shepherded through retries, conflict resolution, and auto-merge without sitting on it manually. Prefer this over hand-clicking "Re-run failed jobs" in the GitHub UI, which gives no signal on flaky-vs-real and forgets to enable auto-merge.

2026-06-24

billing-lib

az9713/skill-best-practices

Use when writing or reviewing code that meters API token usage, bills accounts, issues invoices, applies credit grants, or computes balances with the internal `billing` library — especially around retries, mid-cycle plan changes, cache-read vs cache-write token pricing, or any place where double-billing or rounding drift would be a problem.

2026-06-24

checkout-verifier

az9713/skill-best-practices

Use when an API-credits checkout or paid-plan upgrade needs to be proven end-to-end against Stripe test mode — confirming a card charge actually creates the invoice and subscription in the right state, reproducing a "I paid but my credits didn't show up" report, checking that a declined or 3DS card fails the way the UI claims, or wiring a billing smoke test into CI so a checkout regression is caught before a customer's money is.

2026-06-24

cherry-pick-prod

az9713/skill-best-practices

Use when a specific fix that's already on main needs to land on a production/release branch without dragging along everything else — a hotfix to backport, a "cherry-pick this commit onto release-2.4", a "we need just that one PR on prod" request. Reach for this whenever someone wants to port one or a few commits to a release branch and open a PR for it, especially before doing it by hand in their main checkout, which pollutes their working tree and routinely leaves conflict markers committed or loses the original commit's provenance.

2026-06-24

code-style

az9713/skill-best-practices

Use when writing or editing code in this org's Python or JS/TS, especially before committing or opening a PR — and proactively the moment a diff adds an import, an except/catch, or any logging. Enforces the style rules Claude gets wrong by default: import grouping, error-wrapping (no bare except / empty catch), no leftover debug prints, explicit over clever. Runs scripts/check_style.sh (ruff, mypy --strict, eslint + grep guards) which exits nonzero so it drops into a pre-commit hook or CI.

2026-06-24
