exploring-llm-clusters
Investigate AI observability clusters: understand usage patterns in AI/LLM traffic, compare cluster behavior, compute cost/latency metrics, and drill into individual traces within clusters.
| name | exploring-llm-clusters |
| description | Investigate AI observability clusters — understand usage patterns in AI/LLM traffic, compare cluster behavior, compute cost/latency metrics, and drill into individual traces within clusters. |
Use this skill when investigating AI observability clusters — understanding what patterns exist in your AI/LLM traffic, comparing cluster behavior, and drilling into individual clusters.
| Tool | Purpose |
|---|---|
| `posthog:llma-clustering-job-list` | List clustering job configurations for the team |
| `posthog:llma-clustering-job-get` | Get a specific clustering job by ID |
| `posthog:execute-sql` | Query cluster run events and compute metrics |
| `posthog:query-llm-traces-list` | Find traces belonging to a cluster |
| `posthog:query-llm-trace` | Inspect a specific trace in detail |
PostHog clusters LLM traces (or individual generations) by embedding similarity.
A Temporal workflow runs periodically or on-demand, producing cluster events stored as
$ai_trace_clusters (trace-level) or $ai_generation_clusters (generation-level).
Each cluster event contains:
- `$ai_clustering_run_id`: unique run identifier (format: `<team_id>_<level>_<YYYYMMDD>_<HHMMSS>[_<job_id>]`)
- `$ai_clustering_level`: `"trace"` or `"generation"`
- `$ai_window_start` / `$ai_window_end`: time window analyzed
- `$ai_total_items_analyzed`: number of traces/generations processed
- `$ai_clusters`: JSON array of cluster objects
- `$ai_clustering_params`: algorithm parameters used

Each object in the `$ai_clusters` array has this shape:

```json
{
  "cluster_id": 0,
  "size": 42,
  "title": "User authentication flows",
  "description": "Traces involving login, signup, and token refresh operations",
  "traces": {
    "<trace_or_generation_id>": {
      "distance_to_centroid": 0.123,
      "rank": 0,
      "x": -2.34,
      "y": 1.56,
      "timestamp": "2026-03-28T10:00:00Z",
      "trace_id": "abc-123",
      "generation_id": "gen-456"
    }
  },
  "centroid_x": -2.1,
  "centroid_y": 1.4
}
```
- `cluster_id: -1` is the noise/outlier cluster (items that didn't fit any cluster)
- `traces` is keyed by trace ID (trace-level) or generation event UUID (generation-level)
- `rank` orders items by proximity to the centroid (0 = closest)
- `x`, `y` are 2D coordinates for visualization (UMAP/PCA/t-SNE reduced)

Each team can have up to 5 clustering jobs. A job defines, among other settings, the level it clusters at: "trace" or "generation".

Default jobs named "Default - trace" and "Default - generation" are auto-created
and disabled when a custom job is created for the same level.
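The run ID format listed above (`<team_id>_<level>_<YYYYMMDD>_<HHMMSS>[_<job_id>]`) can be split back into its parts when you need to group or sort runs. The following is a minimal sketch, not part of the skill itself: the `RunId` type and `parse_run_id` function are hypothetical names, and it assumes the team ID is numeric and that the optional job ID is everything after the time component.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class RunId(NamedTuple):
    team_id: int
    level: str
    timestamp: datetime  # naive datetime taken from the ID; timezone not specified
    job_id: Optional[str]


# Assumption: team_id is numeric and level is "trace" or "generation".
_RUN_ID_RE = re.compile(
    r"(?P<team_id>\d+)_(?P<level>trace|generation)"
    r"_(?P<date>\d{8})_(?P<time>\d{6})(?:_(?P<job_id>.+))?"
)


def parse_run_id(run_id: str) -> RunId:
    """Split a clustering run ID into its components."""
    match = _RUN_ID_RE.fullmatch(run_id)
    if match is None:
        raise ValueError(f"unrecognized run id: {run_id!r}")
    timestamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return RunId(int(match["team_id"]), match["level"], timestamp, match["job_id"])
```

For example, `parse_run_id("2_trace_20260328_100000").level` gives `"trace"`, and `job_id` is `None` when the ID has no job suffix.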
List recent clustering runs (last 7 days):

posthog:execute-sql

```sql
SELECT
    JSONExtractString(properties, '$ai_clustering_run_id') AS run_id,
    JSONExtractString(properties, '$ai_clustering_level') AS level,
    JSONExtractString(properties, '$ai_window_start') AS window_start,
    JSONExtractString(properties, '$ai_window_end') AS window_end,
    JSONExtractInt(properties, '$ai_total_items_analyzed') AS total_items,
    timestamp
FROM events
WHERE event IN ('$ai_trace_clusters', '$ai_generation_clusters')
    AND timestamp >= now() - INTERVAL 7 DAY
ORDER BY timestamp DESC
LIMIT 10
```
Fetch a single run, including its clusters and parameters:

posthog:execute-sql

```sql
SELECT
    JSONExtractString(properties, '$ai_clustering_run_id') AS run_id,
    JSONExtractString(properties, '$ai_clustering_level') AS level,
    JSONExtractString(properties, '$ai_clustering_job_id') AS job_id,
    JSONExtractString(properties, '$ai_clustering_job_name') AS job_name,
    JSONExtractString(properties, '$ai_window_start') AS window_start,
    JSONExtractString(properties, '$ai_window_end') AS window_end,
    JSONExtractInt(properties, '$ai_total_items_analyzed') AS total_items,
    JSONExtractRaw(properties, '$ai_clusters') AS clusters,
    JSONExtractRaw(properties, '$ai_clustering_params') AS params
FROM events
WHERE event IN ('$ai_trace_clusters', '$ai_generation_clusters')
    AND JSONExtractString(properties, '$ai_clustering_run_id') = '<run_id>'
LIMIT 1
```
The clusters field is a JSON array. Parse it to see cluster titles, sizes, and descriptions.
Important: The clusters JSON can be very large (thousands of trace IDs with coordinates).
When the result is too large for inline display, it auto-persists to a file.
Use print_clusters.py from scripts/ to get a readable summary.
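If the script is not at hand, a compact summary can also be produced directly from the `clusters` value. The sketch below is an illustration only and is not the contents of `print_clusters.py`; `summarize_clusters` is a hypothetical helper, and it assumes the cluster objects follow the shape documented earlier. It also tolerates the value arriving as a JSON string that itself contains JSON, which is an assumption about how the raw extract may be encoded.

```python
import json
from typing import Any


def summarize_clusters(clusters_raw: str) -> list[dict[str, Any]]:
    """Return one compact row per cluster, largest first, noise cluster last."""
    clusters = json.loads(clusters_raw)
    if isinstance(clusters, str):
        # Assumption: the property may be stored as a string-encoded JSON array.
        clusters = json.loads(clusters)

    rows = []
    for cluster in clusters:
        rows.append({
            "cluster_id": cluster["cluster_id"],
            "title": cluster.get("title", ""),
            # Fall back to counting items if "size" is missing.
            "size": cluster.get("size", len(cluster.get("traces", {}))),
            "is_noise": cluster["cluster_id"] == -1,
        })

    # False sorts before True, so real clusters come first, then by size descending.
    rows.sort(key=lambda row: (row["is_noise"], -row["size"]))
    return rows
```

Dropping the per-item coordinates and keeping only ID, title, and size keeps the summary small enough to read inline even for runs with thousands of items.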
For trace-level clusters, compute cost/latency/token metrics:
posthog:execute-sql

```sql
SELECT
    JSONExtractString(properties, '$ai_trace_id') AS trace_id,
    sum(toFloat(properties.$ai_total_cost_usd)) AS total_cost,
    max(toFloat(properties.$ai_latency)) AS latency,
    sum(toInt(properties.$ai_input_tokens)) AS input_tokens,
    sum(toInt(properties.$ai_output_tokens)) AS output_tokens,
    countIf(properties.$ai_is_error = 'true') AS error_count
FROM events
WHERE event IN ('$ai_generation', '$ai_embedding', '$ai_span')
    AND timestamp >= parseDateTimeBestEffort('<window_start>')
    AND timestamp <= parseDateTimeBestEffort('<window_end>')
    AND JSONExtractString(properties, '$ai_trace_id') IN ('<trace_id_1>', '<trace_id_2>', ...)
GROUP BY trace_id
```
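The `IN (...)` list in the query above has to be filled with IDs taken from a cluster's `traces` keys. One way to do that is sketched here; `quoted_id_list` and `batches` are hypothetical helpers, the allowed character set is an assumption about what trace IDs look like, and the batch size of 500 is an arbitrary illustrative value rather than a documented limit.

```python
import re
from typing import Iterable, Iterator

# Assumption: IDs contain only letters, digits, hyphens, underscores, dots and colons.
_SAFE_ID = re.compile(r"[A-Za-z0-9_.:-]+")


def quoted_id_list(ids: Iterable[str]) -> str:
    """Render IDs as the body of a SQL IN list: 'a', 'b', 'c'."""
    quoted = []
    for item_id in ids:
        # Reject anything outside the allowlist instead of trying to escape it.
        if not _SAFE_ID.fullmatch(item_id):
            raise ValueError(f"refusing to inline suspicious id: {item_id!r}")
        quoted.append(f"'{item_id}'")
    return ", ".join(quoted)


def batches(ids: list[str], size: int = 500) -> Iterator[list[str]]:
    """Split a long ID list into chunks so each query stays a manageable size."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Calling `quoted_id_list(list(cluster["traces"]))` yields the text to place between the parentheses. Rejecting unexpected characters is a deliberately conservative choice, since the IDs are inlined into the query text.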
For generation-level clusters, match by event UUID:
posthog:execute-sql

```sql
SELECT
    toString(uuid) AS generation_id,
    toFloat(properties.$ai_total_cost_usd) AS cost,
    toFloat(properties.$ai_latency) AS latency,
    toInt(properties.$ai_input_tokens) AS input_tokens,
    toInt(properties.$ai_output_tokens) AS output_tokens,
    if(properties.$ai_is_error = 'true', 1, 0) AS is_error
FROM events
WHERE event = '$ai_generation'
    AND timestamp >= parseDateTimeBestEffort('<window_start>')
    AND timestamp <= parseDateTimeBestEffort('<window_end>')
    AND toString(uuid) IN ('<gen_uuid_1>', '<gen_uuid_2>', ...)
```
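Both queries return one row per item, so cluster-level numbers still have to be rolled up on the client side. A minimal sketch of that roll-up follows. `ItemMetrics` and `cluster_metrics` are hypothetical names, and the caller is assumed to have already mapped query rows into `ItemMetrics` keyed by trace ID or generation UUID, replacing any NULL values with zeros.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class ItemMetrics:
    cost: float     # total_cost (trace-level) or cost (generation-level)
    latency: float  # latency column from either query
    errors: int     # error_count (trace-level) or is_error (generation-level)


def cluster_metrics(
    cluster: Mapping[str, Any], metrics: Mapping[str, ItemMetrics]
) -> dict[str, float]:
    """Aggregate per-item metrics for the items listed in one cluster."""
    # Items without a metrics row (for example, outside the window) are skipped.
    items = [metrics[item_id] for item_id in cluster["traces"] if item_id in metrics]
    if not items:
        return {"items": 0, "total_cost": 0.0, "avg_cost": 0.0,
                "avg_latency": 0.0, "error_rate": 0.0}

    count = len(items)
    total_cost = sum(m.cost for m in items)
    return {
        "items": count,
        "total_cost": total_cost,
        "avg_cost": total_cost / count,
        "avg_latency": sum(m.latency for m in items) / count,
        # Share of items with at least one error.
        "error_rate": sum(1 for m in items if m.errors > 0) / count,
    }
```

Running this once per cluster object gives comparable rows that can be sorted by total cost, average latency, or error rate.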
Once you've identified interesting clusters, use the trace tools to inspect individual traces:
posthog:query-llm-trace

```json
{
  "traceId": "<trace_id_from_cluster>",
  "dateRange": {"date_from": "<window_start>", "date_to": "<window_end>"}
}
```
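To decide which traces to inspect first, the items closest to the centroid are a natural starting point. The sketch below builds `query-llm-trace` argument payloads, in the shape shown above, for the top-ranked items of one cluster. `representative_trace_queries` is a hypothetical helper. It assumes each item carries a `rank`, and that for generation-level clusters the item's `trace_id` field names the parent trace.

```python
from typing import Any, Mapping


def representative_trace_queries(
    cluster: Mapping[str, Any],
    window_start: str,
    window_end: str,
    top_n: int = 3,
) -> list[dict[str, Any]]:
    """Build query-llm-trace arguments for the items closest to the centroid."""
    # rank 0 is the item closest to the centroid.
    ranked = sorted(cluster["traces"].items(), key=lambda item: item[1]["rank"])

    queries: list[dict[str, Any]] = []
    seen: set[str] = set()
    for item_id, info in ranked:
        if len(queries) >= top_n:
            break
        # Assumption: fall back to the key itself when no trace_id field is present.
        trace_id = info.get("trace_id") or item_id
        if trace_id in seen:
            continue  # several generations can belong to the same trace
        seen.add(trace_id)
        queries.append({
            "traceId": trace_id,
            "dateRange": {"date_from": window_start, "date_to": window_end},
        })
    return queries
```

Each returned dictionary can be passed as the arguments of one `posthog:query-llm-trace` call.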
- Compute `avg(cost)`, `avg(latency)`, `sum(cost)` per cluster
- Get the item IDs from the cluster (`traces` field)
- Sort by `rank` (closest to centroid = most representative)
- Inspect the top-ranked items with `query-llm-trace` to understand the pattern
- Check `title` and `description` for the AI-generated summary
- Check `error_count`
- Compute the error rate as `items_with_errors / total_items`

Links to the PostHog UI:

- Clusters overview: `https://app.posthog.com/ai-observability/clusters`
- A specific run: `https://app.posthog.com/ai-observability/clusters/<url_encoded_run_id>`
- A specific cluster: `https://app.posthog.com/ai-observability/clusters/<url_encoded_run_id>/<cluster_id>`

Always surface these links so the user can verify visually in the PostHog UI.
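The three URL patterns above differ only in how many path segments follow the base, and the run ID must be URL-encoded. A small sketch of a link builder is shown here. `cluster_url` is a hypothetical helper, and the base URL is copied from the patterns above, so it would need changing if your project lives on a different host.

```python
from typing import Optional
from urllib.parse import quote

# Base taken from the URL patterns in this document.
BASE_URL = "https://app.posthog.com/ai-observability/clusters"


def cluster_url(run_id: Optional[str] = None, cluster_id: Optional[int] = None) -> str:
    """Build a link to the clusters overview, a run, or a cluster within a run."""
    if run_id is None:
        if cluster_id is not None:
            raise ValueError("cluster_id requires a run_id")
        return BASE_URL

    # safe="" also encodes "/" so the run ID stays a single path segment.
    url = f"{BASE_URL}/{quote(run_id, safe='')}"
    if cluster_id is not None:
        url += f"/{cluster_id}"
    return url
```

For example, `cluster_url("2_trace_20260328_100000", 0)` points at cluster 0 of that run, and `cluster_url()` returns the overview page.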
- The noise cluster (`cluster_id: -1`) contains outliers that didn't fit any pattern
- Use `llma-clustering-job-list` to understand what clustering configs are active
- Use `query-llm-trace` for deep inspection