---
name: monitoring-observability
license: MIT
compatibility: Claude Code 2.1.76+
description: Monitoring and observability patterns for Prometheus metrics, Grafana dashboards, Langfuse v4 LLM tracing (as_type, score_current_span, should_export_span, LangfuseMedia), and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.
tags: ["monitoring","observability","prometheus","grafana","langfuse","tracing","metrics","drift-detection","logging"]
context: fork
version: 3.0.0
author: OrchestKit
user-invocable: false
disable-model-invocation: true
complexity: medium
persuasion-type: reference
targets: [{"library":"langfuse","version":">=4.0.0"}]
metadata: {"category":"document-asset-creation"}
allowed-tools: ["Read","Glob","Grep","WebFetch","WebSearch"]
path_patterns: ["**/metrics/**","**/tracing/**","prometheus.*","grafana/**"]
---
# Monitoring & Observability

Comprehensive patterns for infrastructure monitoring, LLM observability, and quality drift detection. Each category has individual rule files in `rules/`, loaded on demand.
## Quick Reference

Total: 12 rules across 4 categories.

## Quick Start
```python
# Prometheus: RED-method metrics
from prometheus_client import Counter, Histogram

http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
                          buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])
```

```python
# Langfuse v4: trace, enrich, and score an LLM call
from langfuse import observe, get_client

@observe(as_type="generation", name="analyze_content")
async def analyze_content(content: str):
    get_client().update_current_trace(
        user_id="user_123", session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    result = await llm.generate(content)
    get_client().score_current_span(name="response_quality", value=0.85)
    return result
```

```python
# Drift detection: alert when PSI crosses the 0.25 significance threshold
# (calculate_psi and alert are project helpers; see rules/drift-statistical.md)
import numpy as np

psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:
    alert("Significant quality drift detected!")
```
## Infrastructure Monitoring
Prometheus metrics, Grafana dashboards, and alerting for application health.
| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | rules/monitoring-prometheus.md | RED method, counters, histograms, cardinality |
| Grafana Dashboards | rules/monitoring-grafana.md | Golden Signals, SLO/SLI, health checks |
| Alerting Rules | rules/monitoring-alerting.md | Severity levels, grouping, escalation, fatigue prevention |
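As a sketch of the fatigue-prevention pattern from the alerting rules, alerts can be grouped by a `(service, alertname)` key and repeats suppressed within a cooldown window. The `AlertDeduper` class and its parameters are illustrative, not part of any library:

```python
import time


class AlertDeduper:
    """Suppress repeat alerts for the same (service, alertname) group
    within a cooldown window -- a minimal fatigue-prevention sketch."""

    def __init__(self, cooldown_seconds=300.0, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock  # injectable for testing
        self.last_fired = {}

    def should_fire(self, service: str, alertname: str) -> bool:
        key = (service, alertname)
        now = self.clock()
        last = self.last_fired.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # duplicate within the window: swallow it
        self.last_fired[key] = now
        return True
```

Alertmanager provides this natively via `group_by` and `repeat_interval`; a sketch like this is mainly useful when alerts are raised directly from application code.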
## LLM Observability
Langfuse-based tracing, cost tracking, and evaluation for LLM applications.
| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | rules/llm-langfuse-traces.md | @observe decorator, OTEL spans, agent graphs |
| Cost Tracking | rules/llm-cost-tracking.md | Token usage, spend alerts, Metrics API v2 |
| Eval Scoring | rules/llm-eval-scoring.md | Custom scores, evaluator tracing, quality monitoring |
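At its core, token-level cost tracking multiplies usage counts by per-model prices. A minimal sketch — the price table below is hypothetical, and real rates come from your provider or from Langfuse's model definitions:

```python
# Hypothetical per-million-token prices in USD; replace with real provider rates.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}


def generation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single generation, computed from its token usage."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

Langfuse infers costs automatically for models it knows about; an explicit calculation like this is mainly useful for custom or self-hosted models whose prices you maintain yourself.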
## Drift Detection
Statistical and quality drift detection for production LLM systems.
| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | rules/drift-statistical.md | PSI, KS test, KL divergence, EWMA |
| Quality Drift | rules/drift-quality.md | Score regression, baseline comparison, canary prompts |
| Drift Alerting | rules/drift-alerting.md | Dynamic thresholds, correlation, anti-patterns |
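The `calculate_psi` helper used in the Quick Start is not defined there. One possible pure-Python implementation, using equal-width bins over the baseline's range (one of several reasonable binning choices):

```python
import math


def calculate_psi(baseline, current, bins=10, eps=1e-4):
    """Population Stability Index between two score samples.

    Buckets both samples into equal-width bins over the baseline's range,
    then sums (p - q) * ln(p / q) over bins. eps floors empty bins so the
    log is always defined.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1  # clamp values outside baseline range
        return [max(c / len(sample), eps) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

Under the common interpretation, PSI below 0.1 means a stable distribution, 0.1–0.25 moderate shift, and 0.25 or above significant drift — matching the 0.25 alert threshold in the Quick Start.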
## Silent Failures
Detection and alerting for silent failures in LLM agents.
| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | rules/silent-tool-skipping.md | Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | rules/silent-degraded-quality.md | Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | rules/silent-alerting.md | Loop detection, token spikes, escalation workflow |
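The z-score baseline check behind quality-degradation detection can be sketched in a few lines; the function name and the -2.0 threshold are illustrative choices, not fixed by any library:

```python
import statistics


def is_degraded(baseline_scores, current_score, z_threshold=-2.0):
    """Flag a score that falls more than |z_threshold| standard deviations
    below the rolling baseline mean. Returns (degraded, z)."""
    mean = statistics.fmean(baseline_scores)
    stdev = statistics.stdev(baseline_scores)
    if stdev == 0:  # flat baseline: any drop below the mean counts
        return current_score < mean, 0.0
    z = (current_score - mean) / stdev
    return z <= z_threshold, z
```

In practice the baseline would be a sliding window of recent eval scores (for example, the last N `response_quality` scores pulled from Langfuse), recomputed as new traces arrive.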
## Key Decisions
| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hosted, built-in prompt management |
| LLM tracing API | @observe(as_type=...) + score_current_span() | v4: semantic types, inline scoring, span filtering |
| Langfuse APIs | Observations API v2 + Metrics API v2 | v4 (Mar 2026): faster querying, aggregations at scale |
| Drift method | PSI for production, KS for small samples | PSI is stable on large datasets; KS is more sensitive on small samples |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |
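The dynamic-threshold decision can be sketched as a percentile over a sliding window of recent values. This is plain-Python linear interpolation, equivalent to numpy's default `np.percentile` method:

```python
def dynamic_threshold(history, percentile=95.0):
    """95th-percentile (by default) threshold over recent history,
    using linear interpolation between the two nearest order statistics."""
    ordered = sorted(history)
    rank = (len(ordered) - 1) * percentile / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)
```

Recomputing this per evaluation window lets the alert threshold track normal load, which is what reduces the fatigue a static constant causes.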
## Detailed Documentation
| Resource | Description |
|---|---|
| ${CLAUDE_SKILL_DIR}/references/ | Logging, metrics, tracing, Langfuse, drift analysis guides |
| ${CLAUDE_SKILL_DIR}/checklists/ | Implementation checklists for monitoring and Langfuse setup |
| ${CLAUDE_SKILL_DIR}/examples/ | Real-world monitoring dashboard and trace examples |
| ${CLAUDE_SKILL_DIR}/scripts/ | Templates: Prometheus, OpenTelemetry, health checks, Langfuse |
## Related Skills

- defense-in-depth - Layer 8 observability as part of security architecture
- devops-deployment - Observability integration with CI/CD and Kubernetes
- resilience-patterns - Monitoring circuit breakers and failure scenarios
- llm-evaluation - Evaluation patterns that integrate with Langfuse scoring
- caching - Caching strategies that reduce costs tracked by Langfuse