metrics-collector
| Field | Value |
|---|---|
| id | metrics-collector |
| name | metrics-collector |
| type | skill |
| version | 1.0.0 |
| created | 20/03/2026 |
| modified | 20/03/2026 |
| status | active |
| metadata | {"author":"NodeJS-Starter-V1","version":"1.0.0","locale":"en-AU"} |
| context | fork |

Standardised patterns for collecting, storing, and querying application metrics. Codifies the project's existing database-backed approach (Supabase + PostgreSQL) and defines conventions for metric naming, aggregation, and display. Designed for Vercel/serverless: no Prometheus scraping required.

Codifies database-backed metrics instrumentation for NodeJS-Starter-V1 using Supabase/PostgreSQL, covering standardised metric types (counters, gauges, histograms), naming conventions, time-series aggregation queries, and optional OpenTelemetry export for serverless-compatible observability.
Out of scope for this skill (use the named skill instead): structured-logging, dashboard-patterns (when available), error-taxonomy, cron-scheduler. Metric names follow the {domain}_{entity}_{measurement} convention.

Existing backend components:

| Component | Location | Purpose |
|---|---|---|
| `AgentMetrics` class | apps/backend/src/monitoring/agent_metrics.py | Task execution tracking, health reports |
| `TaskMetrics` model | apps/backend/src/monitoring/agent_metrics.py | Per-task metric schema |
| `AgentHealthReport` model | apps/backend/src/monitoring/agent_metrics.py | Agent health aggregation |
| Analytics routes | apps/backend/src/api/routes/analytics.py | /metrics/overview, /metrics/agents, /metrics/costs |
| Dashboard routes | apps/backend/src/api/routes/agent_dashboard.py | /stats, /list, /{id}/health, /performance/trends |
| Health routes | apps/backend/src/api/routes/health.py | /health, /ready |

Existing database tables:

| Table | Stores | Key Columns |
|---|---|---|
| `agent_runs` | Agent execution records | id, agent_type, status, metadata, started_at, completed_at |
| `api_usage` | LLM API call costs | model, cost_usd, input_tokens, output_tokens, created_at |
| `tool_usage_events` | Tool invocation records | agent_run_id, tool details, timestamps |

Existing frontend components:

| Component | Location | Purpose |
|---|---|---|
| `MetricTile` | apps/web/components/status-command-centre/ | Stat tile with trend indicator |
| Analytics dashboard | apps/web/app/dashboard-analytics/page.tsx | Overview metrics display |
| Proxy route | apps/web/app/api/analytics/metrics/overview/route.ts | Backend proxy |

Metric types:

| Type | Behaviour | Storage | Examples |
|---|---|---|---|
| Counter | Monotonically increasing | INSERT per event, SUM(value) for total | agent_run_total, llm_token_total |
| Gauge | Point-in-time value (up or down) | UPSERT, latest value wins | agent_run_active, queue_depth |
| Histogram | Value distribution | INSERT per observation, percentile queries | api_request_duration_ms |

All three types are stored in PostgreSQL, because in-memory counters vanish between serverless invocations. Counters and histograms go to metrics_events; gauges go to metrics_gauges.
All metric names follow the pattern:

    {domain}_{entity}_{measurement}[_{unit}]

| Segment | Examples | Rules |
|---|---|---|
| `domain` | agent, api, cron, auth | snake_case, matches module |
| `entity` | run, request, token, job | singular noun |
| `measurement` | total, duration, rate, size | what is being measured |
| `unit` (optional) | ms, bytes, usd, percent | SI or currency unit |

Standard metrics:

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `agent_run_total` | Counter | agent_type, status | Total agent executions |
| `agent_run_duration_ms` | Histogram | agent_type | Execution time |
| `agent_run_active` | Gauge | agent_type | Currently running agents |
| `api_request_total` | Counter | method, route, status_code | HTTP requests |
| `api_request_duration_ms` | Histogram | method, route | Request latency |
| `llm_token_total` | Counter | model, direction | Token usage (input/output) |
| `llm_cost_usd` | Counter | model | LLM API cost |
| `cron_job_duration_ms` | Histogram | job_name | Cron execution time |
| `cron_job_total` | Counter | job_name, status | Cron executions |
| `auth_login_total` | Counter | method, result | Login attempts |

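The naming convention above can be checked mechanically before a metric is written. The following is a minimal sketch, not part of the project: `METRIC_NAME_RE`, `is_valid_metric_name`, and `metric_unit` are hypothetical names, and the regex assumes each segment is a single lowercase word, which holds for every standard metric listed above but may be too strict for multi-word entities.

```python
import re
from typing import Optional

# Assumption: three or four lowercase segments joined by underscores,
# i.e. {domain}_{entity}_{measurement} with an optional trailing unit.
METRIC_NAME_RE = re.compile(r"[a-z][a-z0-9]*(_[a-z][a-z0-9]*){2,3}")

# Units taken from the segment table in this document.
KNOWN_UNITS = {"ms", "bytes", "usd", "percent"}


def is_valid_metric_name(name: str) -> bool:
    """Return True if the name has the shape the convention describes."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def metric_unit(name: str) -> Optional[str]:
    """Return the trailing unit segment if it is a known unit, else None."""
    last = name.rsplit("_", 1)[-1]
    return last if last in KNOWN_UNITS else None
```

A check like this could back the `DATA_VALIDATION_METRIC_NAME` error mentioned in the integration notes, although how that error is actually raised is not shown in this document.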
MetricsRegistry is a thin wrapper around the existing AgentMetrics pattern, extended with the standard metric types. It exposes three methods: increment() (counter), observe() (histogram), and set_gauge() (gauge).

    from datetime import datetime, UTC  # datetime.UTC requires Python 3.11+

    from src.state.supabase import SupabaseStateStore
    from src.utils import get_logger

    logger = get_logger(__name__)


    class MetricsRegistry:
        """Centralised metrics collection and query interface."""

        def __init__(self) -> None:
            self.store = SupabaseStateStore()
            # execute() is not awaited below, so this is assumed to be the synchronous client.
            self.client = self.store.client

        async def increment(self, metric: str, value: int = 1, labels: dict[str, str] | None = None) -> None:
            """Increment a counter metric."""
            self.client.table("metrics_events").insert(
                {"metric_name": metric, "metric_type": "counter", "value": value,
                 "labels": labels or {}, "recorded_at": datetime.now(UTC).isoformat()}
            ).execute()
            logger.debug("metric_recorded", metric=metric, type="counter", value=value)

        async def observe(self, metric: str, value: float, labels: dict[str, str] | None = None) -> None:
            """Record a histogram observation."""
            self.client.table("metrics_events").insert(
                {"metric_name": metric, "metric_type": "histogram", "value": value,
                 "labels": labels or {}, "recorded_at": datetime.now(UTC).isoformat()}
            ).execute()

        async def set_gauge(self, metric: str, value: float, labels: dict[str, str] | None = None) -> None:
            """Set a gauge value (replaces previous)."""
            # Sorting the items makes the key independent of dict insertion order.
            label_key = str(sorted((labels or {}).items()))
            self.client.table("metrics_gauges").upsert(
                {"metric_name": metric, "label_key": label_key, "labels": labels or {},
                 "value": value, "recorded_at": datetime.now(UTC).isoformat()},
                on_conflict="metric_name,label_key",
            ).execute()


    # Singleton instance
    metrics = MetricsRegistry()
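For unit tests that should not touch Supabase, a stand-in with the same three methods can make the storage semantics concrete. The sketch below is an assumption for illustration only: `InMemoryMetrics` is a hypothetical test double, not part of the project, and the `env` label is invented for the demo. It is meant for tests, not for production use on serverless, where in-memory state is lost between invocations.

```python
import asyncio


class InMemoryMetrics:
    """Hypothetical test double mirroring the three MetricsRegistry methods."""

    def __init__(self) -> None:
        self.events: list = []   # stands in for the metrics_events table
        self.gauges: dict = {}   # stands in for the metrics_gauges table

    async def increment(self, metric, value=1, labels=None):
        # Counters append one row per event, like the INSERT in MetricsRegistry.
        self.events.append({"metric_name": metric, "metric_type": "counter",
                            "value": value, "labels": labels or {}})

    async def observe(self, metric, value, labels=None):
        # Histograms also append one row per observation.
        self.events.append({"metric_name": metric, "metric_type": "histogram",
                            "value": value, "labels": labels or {}})

    async def set_gauge(self, metric, value, labels=None):
        # Same key derivation as MetricsRegistry.set_gauge: sorted label items.
        label_key = str(sorted((labels or {}).items()))
        self.gauges[(metric, label_key)] = value  # latest value wins


async def demo() -> InMemoryMetrics:
    m = InMemoryMetrics()
    await m.increment("agent_run_total", labels={"agent_type": "research", "status": "ok"})
    await m.observe("api_request_duration_ms", 12.5, labels={"method": "GET", "route": "/health"})
    # Same labels in a different order map to the same gauge row.
    await m.set_gauge("agent_run_active", 3, {"agent_type": "research", "env": "test"})
    await m.set_gauge("agent_run_active", 1, {"env": "test", "agent_type": "research"})
    return m


store = asyncio.run(demo())
```

After the demo runs, `store.events` holds two rows while `store.gauges` holds a single entry, which is the behaviour the `on_conflict="metric_name,label_key"` upsert is intended to give in the database.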
Use Starlette's BaseHTTPMiddleware to record api_request_total (counter) and api_request_duration_ms (histogram) on every request, labelled with method, route, and status_code:

    import time

    from starlette.middleware.base import BaseHTTPMiddleware

    from src.monitoring.metrics_registry import metrics


    class MetricsMiddleware(BaseHTTPMiddleware):
        async def dispatch(self, request, call_next):
            start = time.monotonic()
            response = await call_next(request)
            duration_ms = (time.monotonic() - start) * 1000
            # request.url.path is the concrete path, so parameterised routes
            # yield one label value per distinct URL.
            labels = {"method": request.method, "route": request.url.path, "status_code": str(response.status_code)}
            await metrics.increment("api_request_total", labels=labels)
            await metrics.observe("api_request_duration_ms", duration_ms, labels=labels)
            return response

Extend AgentMetrics.track_task_execution to also record the standard metrics:

    await metrics.increment("agent_run_total", labels={"agent_type": agent_type, "status": status})
    await metrics.observe("agent_run_duration_ms", task_metrics.duration_seconds * 1000, labels={"agent_type": agent_type})

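Both call sites await the metric writes inline, so a failed insert would surface as a request or task error. The integration notes for this skill state that metric failures are logged but never thrown to the caller. One way to get that behaviour is a small wrapper around each write; this is a sketch under assumptions, where `record_safely` is a hypothetical helper and the standard `logging` module stands in for the project's own logger.

```python
import asyncio
import logging

logger = logging.getLogger("metrics")


async def record_safely(write) -> bool:
    """Await a metric write; log and swallow any failure.

    `write` is an awaitable such as metrics.increment(...).
    Returns True if the write succeeded, False if it raised.
    """
    try:
        await write
        return True
    except Exception:
        logger.warning("metric_write_failed", exc_info=True)
        return False


# Stand-ins for registry calls, used only to exercise the wrapper.
async def _ok_write() -> None:
    return None


async def _failing_write() -> None:
    raise ConnectionError("database unavailable")


async def demo() -> list:
    return [await record_safely(_ok_write()), await record_safely(_failing_write())]


results = asyncio.run(demo())
```

In the middleware this would read `await record_safely(metrics.increment("api_request_total", labels=labels))`, so the response is still returned when the database is unreachable.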
Query metrics grouped by time period for trend charts. Fetch events from metrics_events, group by truncated timestamp (minute/hour/day), and compute count, sum, avg, min, and max per bucket. Use PostgreSQL date_trunc via Supabase RPC for server-side efficiency, or bucket in Python for small datasets:

    from datetime import datetime, timedelta, UTC
    from typing import Any

    # Method of MetricsRegistry
    async def get_metric_timeseries(
        self, metric_name: str, bucket: str = "hour", since_hours: int = 24,
    ) -> list[dict[str, Any]]:
        since = (datetime.now(UTC) - timedelta(hours=since_hours)).isoformat()
        result = self.client.table("metrics_events").select("value, recorded_at").eq(
            "metric_name", metric_name
        ).gte("recorded_at", since).order("recorded_at").execute()
        # Any bucket other than "hour" or "day" falls back to per-minute buckets.
        fmt = {"hour": "%Y-%m-%dT%H:00:00Z", "day": "%Y-%m-%dT00:00:00Z"}.get(bucket, "%Y-%m-%dT%H:%M:00Z")
        buckets: dict[str, list[float]] = {}
        for row in (result.data or []):
            # Normalise to UTC so the literal "Z" suffix in fmt is accurate.
            ts = datetime.fromisoformat(row["recorded_at"]).astimezone(UTC)
            buckets.setdefault(ts.strftime(fmt), []).append(row["value"])
        return [{"timestamp": ts, "count": len(v), "sum": sum(v), "avg": sum(v) / len(v), "min": min(v), "max": max(v)} for ts, v in buckets.items()]

For histogram metrics, compute p50/p90/p95/p99 using statistics.quantiles. Query metrics_events filtered by metric_name and time range, sort the values, then calculate:

    import statistics
    from datetime import datetime, timedelta, UTC

    # Method of MetricsRegistry
    async def get_percentiles(self, metric_name: str, since_hours: int = 24) -> dict[str, float]:
        since = (datetime.now(UTC) - timedelta(hours=since_hours)).isoformat()
        result = self.client.table("metrics_events").select("value").eq(
            "metric_name", metric_name
        ).gte("recorded_at", since).execute()
        values = sorted(r["value"] for r in (result.data or []))
        # statistics.quantiles can raise StatisticsError with fewer than two data points.
        if len(values) < 2:
            return {"p50": 0, "p90": 0, "p95": 0, "p99": 0}
        # n=100 yields 99 cut points; q[k - 1] is the k-th percentile.
        q = statistics.quantiles(values, n=100)
        return {"p50": q[49], "p90": q[89], "p95": q[94], "p99": q[98]}

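The index arithmetic is easier to trust with a worked example that runs without a database. The following sketch isolates the calculation in a hypothetical pure function, `percentiles`, and differs from the method above in one deliberate way: it returns `None` rather than `0` when there are too few samples, so that "no data" cannot be mistaken for a zero-millisecond latency. That change is a suggestion, not the project's behaviour.

```python
import statistics
from typing import Dict, List, Optional


def percentiles(values: List[float]) -> Dict[str, Optional[float]]:
    """Compute p50/p90/p95/p99 from raw observations."""
    if len(values) < 2:
        # Too few samples to interpolate between.
        return {"p50": None, "p90": None, "p95": None, "p99": None}
    # n=100 returns 99 cut points, so index k - 1 holds the k-th percentile.
    q = statistics.quantiles(values, n=100)
    return {"p50": q[49], "p90": q[89], "p95": q[94], "p99": q[98]}


# Observations 1.0, 2.0, ..., 100.0 (for example, latencies in ms).
sample = [float(v) for v in range(1, 101)]
result = percentiles(sample)
```

With the default `exclusive` method, the median of this sample is interpolated between the 50th and 51st values, so `result["p50"]` comes out at 50.5 rather than 50.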
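Expose a /metrics/summary endpoint returning counters, gauges, and histogram percentiles:

    from datetime import datetime, UTC
    from typing import Any

    from fastapi import APIRouter, Query

    from src.monitoring.metrics_registry import MetricsRegistry

    router = APIRouter()  # or reuse the existing analytics router


    @router.get("/metrics/summary")
    async def get_metrics_summary(since_hours: int = Query(24, ge=1, le=720)) -> dict[str, Any]:
        registry = MetricsRegistry()
        # get_counter_totals and get_current_gauges are registry query methods
        # that are not shown in this document.
        return {
            "counters": await registry.get_counter_totals(since_hours),
            "gauges": await registry.get_current_gauges(),
            "histograms": {
                "api_request_duration_ms": await registry.get_percentiles("api_request_duration_ms", since_hours),
                "agent_run_duration_ms": await registry.get_percentiles("agent_run_duration_ms", since_hours),
            },
            "since_hours": since_hours,
            "generated_at": datetime.now(UTC).isoformat(),
        }
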
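The endpoint relies on a `get_counter_totals` method whose body is not given here. As a rough illustration of the aggregation it would need, the sketch below sums counter rows by metric name in plain Python. `sum_counter_rows` is a hypothetical helper and the sample rows are made up; on real data volumes the same grouping would more likely be done in SQL.

```python
from collections import defaultdict
from typing import Dict, Iterable


def sum_counter_rows(rows: Iterable[dict]) -> Dict[str, float]:
    """Sum the value column per metric name, ignoring histogram rows."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        if row.get("metric_type") == "counter":
            totals[row["metric_name"]] += row["value"]
    return dict(totals)


# Made-up rows shaped like metrics_events records.
rows = [
    {"metric_name": "llm_token_total", "metric_type": "counter", "value": 120},
    {"metric_name": "llm_token_total", "metric_type": "counter", "value": 80},
    {"metric_name": "agent_run_total", "metric_type": "counter", "value": 1},
    {"metric_name": "api_request_duration_ms", "metric_type": "histogram", "value": 42.5},
]
totals = sum_counter_rows(rows)
```

Summing `value` rather than counting rows matters for counters such as `llm_token_total`, where a single event can add more than one to the total.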
Define a MetricSummary TypeScript interface mirroring the backend response. Use the existing proxy route pattern (apps/web/app/api/analytics/) to forward requests. Poll at 30-second intervals (matching the existing analytics dashboard) or use Supabase Realtime for gauge updates. Display via the existing MetricTile component from status-command-centre/.

Database schema for the two metrics tables:

    -- Counters and histograms: one row per event or observation.
    CREATE TABLE IF NOT EXISTS metrics_events (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        metric_name TEXT NOT NULL,
        metric_type TEXT NOT NULL CHECK (metric_type IN ('counter', 'histogram')),
        value DOUBLE PRECISION NOT NULL,
        labels JSONB NOT NULL DEFAULT '{}',
        recorded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );

    -- IF NOT EXISTS keeps the migration re-runnable, matching the tables.
    CREATE INDEX IF NOT EXISTS idx_metrics_events_name_time
        ON metrics_events (metric_name, recorded_at DESC);
    CREATE INDEX IF NOT EXISTS idx_metrics_events_labels
        ON metrics_events USING GIN (labels);

    -- Gauges: one row per metric and label set, overwritten on each update.
    CREATE TABLE IF NOT EXISTS metrics_gauges (
        metric_name TEXT NOT NULL,
        label_key TEXT NOT NULL DEFAULT '',
        labels JSONB NOT NULL DEFAULT '{}',
        value DOUBLE PRECISION NOT NULL,
        recorded_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
        PRIMARY KEY (metric_name, label_key)
    );

Schedule a cron job (/api/cron/metrics-cleanup) to DELETE FROM metrics_events WHERE recorded_at < NOW() - INTERVAL '90 days'.
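The cleanup handler itself is not shown in this document. A minimal sketch of what it might do from Python follows, assuming the handler has access to a Supabase client like the one held by `MetricsRegistry`; `retention_cutoff` and `delete_expired_events` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 90  # matches the 90-day interval in the DELETE statement


def retention_cutoff(now: Optional[datetime] = None, days: int = RETENTION_DAYS) -> str:
    """Return the ISO timestamp before which events are considered expired."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=days)).isoformat()


def delete_expired_events(client, now: Optional[datetime] = None) -> None:
    """Delete expired rows from metrics_events.

    Assumption: `client` is a Supabase client such as MetricsRegistry.client.
    """
    cutoff = retention_cutoff(now)
    client.table("metrics_events").delete().lt("recorded_at", cutoff).execute()
```

Computing the cutoff in the application rather than with `NOW()` in SQL makes the boundary easy to unit-test by passing a fixed `now`.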
For production with full observability infrastructure (Grafana, Datadog), optionally export metrics to an OTel collector by creating MeterProvider instruments that mirror the database metric names. Only enable when OTEL_EXPORTER_OTLP_ENDPOINT is configured. The database-backed approach works standalone for Vercel/serverless.
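The gating on `OTEL_EXPORTER_OTLP_ENDPOINT` could look like the sketch below. It is an illustration under assumptions, not the project's implementation: `build_otel_instruments` is a hypothetical function, it assumes the `opentelemetry-sdk` and OTLP gRPC exporter packages are installed when export is enabled, and it mirrors only two of the standard metric names.

```python
import os


def build_otel_instruments():
    """Return OTel instruments mirroring database metric names, or None when export is off."""
    if not os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return None  # the database-backed metrics remain the only sink

    # Imported lazily so these packages are only required when export is enabled.
    from opentelemetry import metrics as otel_metrics
    from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
    from opentelemetry.sdk.metrics import MeterProvider
    from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

    # The exporter reads its endpoint from the environment variable checked above.
    reader = PeriodicExportingMetricReader(OTLPMetricExporter())
    otel_metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

    meter = otel_metrics.get_meter("metrics-collector")
    return {
        "agent_run_total": meter.create_counter("agent_run_total"),
        "api_request_duration_ms": meter.create_histogram("api_request_duration_ms", unit="ms"),
    }
```

On serverless platforms a periodic reader may not get a chance to export before the process is frozen, so a real integration would probably need an explicit flush at the end of each invocation.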
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| In-memory counters on serverless | Lost between invocations, no persistence | Database-backed metrics |
| Prometheus scrape endpoint on Vercel | No persistent process to scrape | Database storage + query endpoints |
| Logging metrics as unstructured strings | Cannot aggregate or query | MetricsRegistry with typed methods |
| Recording per-request without labels | Cannot filter by route or status | Always include method, route, status_code labels |
| Querying raw events for dashboards | Slow on large datasets | Pre-aggregate via cron or use time-bucket queries |
| Unbounded metrics_events growth | Storage costs, slow queries | 90-day retention cron job |
- `{domain}_{entity}_{measurement}[_{unit}]` convention
- `MetricsRegistry` singleton (`metrics.increment`, `metrics.observe`, `metrics.set_gauge`)
- `structured-logging` (debug-level metric recording logs)

Response format:

    [AGENT_ACTIVATED]: Metrics Collector
    [PHASE]: {Design | Implementation | Review}
    [STATUS]: {in_progress | complete}
    {metrics analysis or implementation guidance}
    [NEXT_ACTION]: {what to do next}

- `MetricsRegistry` emits `metric_recorded` debug-level logs
- `correlation_id` from request context for tracing
- `SYS_RUNTIME_METRICS` (500): logged but never thrown to caller
- `DATA_VALIDATION_METRIC_NAME` (422)
- `metrics-cleanup` cron job deletes events older than 90 days
- Cron metrics (`cron_job_total`, `cron_job_duration_ms`) recorded by cron handlers
- `/metrics/summary` endpoint included in OpenAPI docs
- `MetricSummary` response model typed with Pydantic (backend) and TypeScript interface (frontend)
- `MetricsRegistry` provides the data layer for dashboard visualisations
- `recorded_at` stored as UTC, converted in display
- `llm_cost_usd` stored in USD (API billing currency); convert to AUD for display