| name | setup-wide-logging |
| description | Set up wide-event logging with tail sampling to replace scattered logs with canonical log lines. Auto-detects framework (express|koa|fastify|nextjs) and logger (pino|winston|bunyan|console). |
External Output Boundary (MANDATORY)
Workflow artifacts and command internals are private implementation context. Never expose them in external-facing outputs.
- Internal context includes workflow artifact paths, stage names or numbers, skill names, sub-agent names, prompt/tooling details, control-file metadata, and private chain-of-thought or reasoning traces.
- External-facing outputs include commit messages, branch names, PR titles/bodies/comments, release notes, changelog entries, user documentation, README content, code comments/docstrings, issue comments, deployment notes, and any file outside the private workflow artifact directories.
- When producing external-facing output, translate workflow context into product/project language: user-visible change, rationale, affected areas, verification, risks, migration notes, and follow-up work.
- Before writing, committing, pushing, opening a PR, updating docs/comments, or publishing anything, perform a leak check and remove internal workflow references unless the user explicitly asks for a private/internal artifact.
Setup Wide-Event Logging
You are an observability architect implementing wide events / canonical log lines with tail sampling following the philosophy from loggingsucks.com.
Arguments: $ARGUMENTS — optional FRAMEWORK and LOGGER (e.g. express pino, fastify winston). If not provided, auto-detect from the project.
Core Philosophy
Traditional logging is broken:
- Optimized for writing, not querying - scattered logs create noise, not insight
- Missing business context - logs lack user tier, feature flags, cart value, account age
- String search inadequacy - grep can't correlate events across services
- Multi-search debugging nightmare - requires multiple searches to understand one request
The Solution: Emit ONE comprehensive event per request per service containing:
- Technical metadata (timestamps, IDs, duration)
- Business context (user subscription, cart value, feature flags)
- Error details when applicable
- Complete request context in a single queryable event
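As a minimal sketch of the idea, the snippet below builds up one event object over the life of a request and serializes it once at the end. The field names, values, and the `emit` helper are illustrative assumptions, not a prescribed schema.

```typescript
// Hypothetical example: field names and values are illustrative only.
type CheckoutEvent = {
  timestamp: string;
  request_id: string;
  service: string;
  method: string;
  path: string;
  status_code?: number;
  duration_ms?: number;
  user?: { id: string; subscription: string };
  cart?: { total_cents: number; item_count: number };
  error?: { code: string; message: string };
};

// Serialize the whole event as one JSON line (one line per request).
function emit(event: CheckoutEvent): string {
  return JSON.stringify(event);
}

const event: CheckoutEvent = {
  timestamp: new Date(0).toISOString(),
  request_id: "req_123",
  service: "checkout",
  method: "POST",
  path: "/api/checkout",
};

// Context is attached as the request progresses instead of being logged piecemeal.
event.user = { id: "user_42", subscription: "premium" };
event.cart = { total_cents: 15999, item_count: 3 };
event.status_code = 200;
event.duration_ms = 184;

const line = emit(event);
console.log(line);
```

Because technical and business fields sit on the same record, a single filter such as "premium users with a cart over a given value" needs no cross-line correlation.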
Step 1: Detect Current Environment
1.1 Detect Web Framework
Search project files to identify the web framework:
grep -E '"(express|koa|fastify|next)"' package.json
grep -r "from 'express'" src/
grep -r "from 'koa'" src/
grep -r "from 'fastify'" src/
grep -r "from 'next'" src/
Common patterns:
- Express: import express from 'express', app.get(), app.post()
- Koa: import Koa from 'koa', app.use(async (ctx, next) => {})
- Fastify: import fastify from 'fastify', fastify.get(), fastify.post()
- Next.js: pages/api/, app/api/, getServerSideProps
Set FRAMEWORK to detected value.
1.2 Detect Logger
Search project files to identify the logger:
grep -E '"(pino|winston|bunyan)"' package.json
grep -r "from 'pino'" src/
grep -r "from 'winston'" src/
grep -r "from 'bunyan'" src/
grep -r "console.log" src/
Common patterns:
- Pino: import pino from 'pino', logger.info({ ... }, 'message')
- Winston: import winston from 'winston', logger.info('message', { ... })
- Bunyan: import bunyan from 'bunyan', logger.info({ ... }, 'message')
- Console: console.log(), console.error()
Set LOGGER to detected value.
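The dependency checks in 1.1 and 1.2 can also be done programmatically. The sketch below is one possible approach: it inspects a parsed package.json object and returns the first known framework and logger it finds. The `detectStack` and `firstMatch` names, the priority order, and the console fallback are assumptions made for illustration.

```typescript
type PackageJson = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// Assumed priority order: earlier entries win when several are installed.
const FRAMEWORKS = ["next", "fastify", "koa", "express"] as const;
const LOGGERS = ["pino", "winston", "bunyan"] as const;

type Framework = (typeof FRAMEWORKS)[number];
type Logger = (typeof LOGGERS)[number] | "console";

// Return the first candidate that appears in dependencies or devDependencies.
function firstMatch<T extends string>(pkg: PackageJson, candidates: readonly T[]): T | undefined {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  return candidates.find((name) => name in deps);
}

function detectStack(pkg: PackageJson): { framework?: Framework; logger: Logger } {
  return {
    framework: firstMatch(pkg, FRAMEWORKS),
    // Fall back to console when no logging library is declared.
    logger: firstMatch(pkg, LOGGERS) ?? "console",
  };
}

const detected = detectStack({
  dependencies: { express: "^4.19.2", pino: "^9.0.0" },
});
console.log(detected);
```

A dependency entry only shows that a package is installed, so it is still worth confirming with the import searches that the package is actually used in src/.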
1.3 Analyze Current Logging Approach
Search for existing log statements:
grep -r "logger\.\(info\|error\|debug\|warn\)" src/
grep -r "console\.\(log\|error\|info\)" src/
grep -r "logger\.info" src/ | cut -d: -f1 | sort | uniq -c | sort -nr
grep -B5 -A5 "logger\.info.*started" src/
grep -B5 -A5 "logger\.info.*completed" src/
Identify anti-patterns:
- Multiple log statements per request handler (diary logging)
- Log statements at start, middle, end of function
- Missing correlation IDs
- Logging primitive values instead of structured objects
- Secrets or PII in logs
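To get a rough ranking of where diary logging is worst, the log-call counts can be computed per file. This is a sketch under simple assumptions: it matches only calls written as `logger.<level>(` or `console.<level>(`, and the threshold of 3 is an arbitrary illustrative default.

```typescript
// Matches calls such as logger.info(...) or console.error(...).
const LOG_CALL = /\b(?:logger|console)\.(?:log|info|warn|error|debug)\s*\(/g;

function countLogCalls(source: string): number {
  return (source.match(LOG_CALL) ?? []).length;
}

// Return the names of sources with more log calls than the threshold.
function findDiaryLogging(files: Record<string, string>, threshold = 3): string[] {
  return Object.entries(files)
    .filter(([, source]) => countLogCalls(source) > threshold)
    .map(([name]) => name);
}

// Hypothetical file contents for illustration.
const sampleFiles = {
  "checkout.ts": `
    logger.info('started');
    logger.debug('loaded cart');
    logger.info('charging');
    logger.info('completed');
  `,
  "health.ts": `console.log('ok');`,
};

const flagged = findDiaryLogging(sampleFiles);
console.log(flagged);
```

Files flagged this way are good candidates for the first round of migration in Step 7.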
Step 2: Design Wide Event Schema
Based on the detected framework and application type, design a TypeScript interface for wide events.
2.1 Base Wide Event Schema
export interface WideEvent {
timestamp: string;
request_id: string;
trace_id?: string;
span_id?: string;
parent_span_id?: string;
service: string;
version: string;
deployment_id: string;
region: string;
environment: string;
hostname: string;
method: string;
path: string;
route?: string;
status_code?: number;
duration_ms?: number;
outcome?: 'success' | 'error';
user?: {
id: string;
subscription?: 'free' | 'premium' | 'enterprise';
account_age_days?: number;
lifetime_value_cents?: number;
cohort?: string;
is_internal?: boolean;
};
client?: {
ip?: string;
user_agent?: string;
country?: string;
device_type?: 'mobile' | 'tablet' | 'desktop';
};
feature_flags?: Record<string, boolean | string>;
performance?: {
db_query_count?: number;
db_duration_ms?: number;
cache_hit?: boolean;
external_api_calls?: number;
external_api_duration_ms?: number;
};
error?: {
type: string;
code: string;
message: string;
stack?: string;
retriable: boolean;
provider_code?: string;
};
[key: string]: any;
}
2.2 Domain-Specific Extensions
Add fields based on application type:
E-commerce:
export interface EcommerceWideEvent extends WideEvent {
cart?: {
total_cents: number;
item_count: number;
currency: string;
coupon_applied?: string;
};
payment?: {
provider: 'stripe' | 'paypal' | 'square';
method: 'card' | 'bank' | 'wallet';
latency_ms: number;
attempt: number;
decline_reason?: string;
};
order?: {
id: string;
total_cents: number;
items: number;
shipping_method?: string;
};
}
SaaS Application:
export interface SaaSWideEvent extends WideEvent {
workspace?: {
id: string;
plan: 'free' | 'pro' | 'enterprise';
seat_count: number;
mrr_cents: number;
};
usage?: {
api_calls_today: number;
quota_limit: number;
quota_remaining: number;
};
}
Step 3: Implement Tail Sampling
export function shouldSample(event: WideEvent): boolean {
// Always-keep rules run first so that a probabilistic rule below cannot drop
// an event that should be retained (for example, a 4xx from an enterprise user).
if (event.status_code && event.status_code >= 500) return true;
if (event.error) return true;
if (event.duration_ms && event.duration_ms > 2000) return true;
if (event.user?.subscription === 'enterprise') return true;
if (event.user?.lifetime_value_cents && event.user.lifetime_value_cents > 10000_00) return true;
if (event.user?.is_internal) return true;
// Note: if flags are populated on every request this keeps everything;
// narrow the check to specific flags in that case.
if (event.feature_flags && Object.keys(event.feature_flags).length > 0) return true;
if (event.user?.account_age_days !== undefined && event.user.account_age_days < 7) return true;
// Probabilistic rules, highest rate first.
if (event.duration_ms && event.duration_ms > 1000) return Math.random() < 0.50;
const criticalPaths = ['/api/checkout', '/api/payment', '/api/auth/login', '/api/auth/signup'];
if (event.path && criticalPaths.some(path => event.path.startsWith(path))) {
return Math.random() < 0.20;
}
if (event.status_code && event.status_code >= 400 && event.status_code < 500) {
return Math.random() < 0.10;
}
// Baseline rate for everything else.
return Math.random() < 0.05;
}
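The sampler above calls Math.random directly, which makes its probabilistic branches awkward to unit test. One option, sketched here with a reduced rule set, is to express the rules as data and inject the random source. The `Rule` shape, the `decide` function, and the `sample_rate` field are assumptions for illustration and are not part of the schema in Step 2.

```typescript
type SampledEvent = {
  status_code?: number;
  duration_ms?: number;
  error?: unknown;
  sample_rate?: number; // hypothetical field: 1 / probability of being kept
};

type Rule = { name: string; matches: (e: SampledEvent) => boolean; rate: number };

// Reduced, illustrative rule set; the first matching rule wins.
const rules: Rule[] = [
  { name: "server_error", matches: (e) => (e.status_code ?? 0) >= 500 || e.error !== undefined, rate: 1 },
  { name: "very_slow", matches: (e) => (e.duration_ms ?? 0) > 2000, rate: 1 },
  { name: "client_error", matches: (e) => (e.status_code ?? 0) >= 400, rate: 0.1 },
  { name: "baseline", matches: () => true, rate: 0.05 },
];

// `random` defaults to Math.random but can be replaced with a fixed source in tests.
function decide(event: SampledEvent, random: () => number = Math.random): boolean {
  // The baseline rule matches everything, so a rule is always found.
  const rule = rules.find((r) => r.matches(event))!;
  const keep = random() < rule.rate;
  if (keep) event.sample_rate = 1 / rule.rate;
  return keep;
}

const always = () => 0.0; // forces every probabilistic rule to keep
const never = () => 0.99; // forces every rule with rate < 0.99 to drop

console.log(decide({ status_code: 200 }, never)); // baseline rule, dropped
console.log(decide({ status_code: 503 }, never)); // server error, kept
```

Recording the rate on each kept event lets a query re-weight counts (summing `sample_rate` instead of counting rows) to approximate pre-sampling totals.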
Step 4: Implement Wide Event Middleware (Framework-Specific)
4.1 Express Middleware
import type { Request, Response, NextFunction } from 'express';
import crypto from 'crypto';
import { WideEvent } from '../wideEvent';
import { shouldSample } from '../tailSampling';
declare global {
namespace Express {
interface Request {
wideEvent: WideEvent;
}
}
}
export function wideEventMiddleware(options: {
logger: { info: (obj: any, msg?: string) => void; error: (obj: any, msg?: string) => void };
serviceName?: string;
version?: string;
deploymentId?: string;
region?: string;
environment?: string;
}) {
const {
logger,
serviceName = process.env.SERVICE_NAME || 'unknown',
version = process.env.SERVICE_VERSION || process.env.npm_package_version || '0.0.0',
deploymentId = process.env.DEPLOYMENT_ID || 'local',
region = process.env.AWS_REGION || process.env.REGION || 'local',
environment = process.env.NODE_ENV || 'development',
} = options;
return function (req: Request, res: Response, next: NextFunction) {
const start = Date.now();
const request_id = req.header('x-request-id') ?? crypto.randomUUID();
const event: WideEvent = {
timestamp: new Date().toISOString(),
request_id,
service: serviceName,
version,
deployment_id: deploymentId,
region,
environment,
hostname: process.env.HOSTNAME || require('os').hostname(),
method: req.method,
path: req.path,
client: { ip: req.ip, user_agent: req.header('user-agent') },
};
req.wideEvent = event;
res.setHeader('x-request-id', request_id);
res.on('finish', () => {
event.status_code = res.statusCode;
event.duration_ms = Date.now() - start;
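// An error recorded by a handler counts as a failure even when the response is 4xx.
event.outcome = event.error || res.statusCode >= 500 ? 'error' : 'success';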
if (req.route) event.route = req.route.path;
if (shouldSample(event)) {
if (event.error || res.statusCode >= 500) {
logger.error(event, 'request_complete');
} else {
logger.info(event, 'request_complete');
}
}
});
next();
};
}
// Register after all routes. Express identifies error handlers by their four-argument signature.
export function wideEventErrorHandler(err: any, req: Request, res: Response, next: NextFunction) {
if (req.wideEvent) {
req.wideEvent.error = {
type: err.name || 'Error',
code: err.code || 'unknown',
message: err.message || String(err),
stack: process.env.NODE_ENV === 'production' ? undefined : err.stack,
retriable: err.retriable ?? false,
};
}
// Pass the error on so the application's own error handler still sends the response.
next(err);
}
4.2 Fastify Plugin
import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
import fp from 'fastify-plugin';
import crypto from 'crypto';
import { WideEvent } from '../wideEvent';
import { shouldSample } from '../tailSampling';
declare module 'fastify' {
interface FastifyRequest { wideEvent: WideEvent; }
}
async function wideEventPlugin(fastify: FastifyInstance, options: {
serviceName?: string; version?: string; deploymentId?: string; region?: string; environment?: string;
}) {
const {
serviceName = process.env.SERVICE_NAME || 'unknown',
version = process.env.SERVICE_VERSION || '0.0.0',
deploymentId = process.env.DEPLOYMENT_ID || 'local',
region = process.env.AWS_REGION || 'local',
environment = process.env.NODE_ENV || 'development',
} = options;
fastify.addHook('onRequest', async (request, reply) => {
const request_id = request.headers['x-request-id'] as string || crypto.randomUUID();
request.wideEvent = {
timestamp: new Date().toISOString(),
request_id,
service: serviceName, version, deployment_id: deploymentId,
region, environment,
hostname: process.env.HOSTNAME || require('os').hostname(),
method: request.method, path: request.url,
route: request.routerPath,
client: { ip: request.ip, user_agent: request.headers['user-agent'] },
};
reply.header('x-request-id', request_id);
});
fastify.addHook('onResponse', async (request, reply) => {
const event = request.wideEvent;
if (!event) return;
event.status_code = reply.statusCode;
event.duration_ms = reply.getResponseTime();
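// An error recorded on the event counts as a failure even when the response is 4xx.
event.outcome = event.error || reply.statusCode >= 500 ? 'error' : 'success';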
if (shouldSample(event)) {
if (event.error || reply.statusCode >= 500) fastify.log.error(event, 'request_complete');
else fastify.log.info(event, 'request_complete');
}
});
fastify.addHook('onError', async (request, reply, error) => {
if (request.wideEvent) {
request.wideEvent.error = {
type: error.name || 'Error',
code: (error as any).code || 'unknown',
message: error.message,
stack: process.env.NODE_ENV === 'production' ? undefined : error.stack,
retriable: (error as any).retriable ?? false,
};
}
});
}
export default fp(wideEventPlugin, { name: 'wide-event-logging', fastify: '4.x' });
4.3 Koa Middleware
import type { Context, Next } from 'koa';
import crypto from 'crypto';
import { WideEvent } from '../wideEvent';
import { shouldSample } from '../tailSampling';
export function wideEventMiddleware(options: {
logger: { info: (obj: any, msg?: string) => void; error: (obj: any, msg?: string) => void };
serviceName?: string; version?: string; deploymentId?: string; region?: string; environment?: string;
}) {
const {
logger,
serviceName = process.env.SERVICE_NAME || 'unknown',
version = process.env.SERVICE_VERSION || '0.0.0',
deploymentId = process.env.DEPLOYMENT_ID || 'local',
region = process.env.AWS_REGION || 'local',
environment = process.env.NODE_ENV || 'development',
} = options;
return async function (ctx: Context, next: Next) {
const start = Date.now();
const request_id = ctx.request.header['x-request-id'] as string || crypto.randomUUID();
const event: WideEvent = {
timestamp: new Date().toISOString(), request_id,
service: serviceName, version, deployment_id: deploymentId,
region, environment, hostname: process.env.HOSTNAME || require('os').hostname(),
method: ctx.method, path: ctx.path,
route: ctx._matchedRoute,
client: { ip: ctx.ip, user_agent: ctx.header['user-agent'] },
};
ctx.wideEvent = event;
ctx.set('x-request-id', request_id);
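// Koa's default error handler sets the response status only after the error has
// left the middleware stack, so ctx.status is still the default (404) inside finally.
let thrownStatus: number | undefined;
try {
await next();
} catch (err: any) {
thrownStatus = err.status || err.statusCode || 500;
event.error = {
type: err.name || 'Error', code: err.code || 'unknown',
message: err.message || String(err),
stack: process.env.NODE_ENV === 'production' ? undefined : err.stack,
retriable: err.retriable ?? false,
};
throw err;
} finally {
// _matchedRoute is set by the router during routing, so read it after next().
event.route = ctx._matchedRoute;
const status = thrownStatus ?? ctx.status;
event.status_code = status;
event.duration_ms = Date.now() - start;
event.outcome = event.error || status >= 500 ? 'error' : 'success';
if (shouldSample(event)) {
if (event.error || status >= 500) logger.error(event, 'request_complete');
else logger.info(event, 'request_complete');
}
}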
};
}
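Next.js is listed as a supported framework, but API route handlers are plain functions rather than middleware chains. One possible approach is a higher-order wrapper, sketched below. It uses structural stand-in types instead of importing from next, and the names `withWideEvent`, `WideEventLike`, and the third `event` argument are assumptions for illustration. Sampling and the full schema are omitted for brevity.

```typescript
import { randomUUID } from "node:crypto";

// Structural stand-ins for the request/response objects (assumption: they expose
// the same members as Node's IncomingMessage and ServerResponse).
type Req = { method?: string; url?: string; headers: Record<string, string | string[] | undefined> };
type Res = { statusCode: number; setHeader(name: string, value: string): void };
type WideEventLike = Record<string, unknown> & { request_id: string };
type Handler = (req: Req, res: Res, event: WideEventLike) => Promise<void> | void;

// Wrap a handler so that exactly one event is emitted when it settles.
function withWideEvent(handler: Handler, emit: (event: WideEventLike) => void) {
  return async (req: Req, res: Res): Promise<void> => {
    const start = Date.now();
    const header = req.headers["x-request-id"];
    const request_id = (Array.isArray(header) ? header[0] : header) ?? randomUUID();
    const event: WideEventLike = { request_id, method: req.method, path: req.url };
    res.setHeader("x-request-id", request_id);
    let failed = false;
    try {
      await handler(req, res, event);
    } catch (err) {
      failed = true;
      event.error = { message: err instanceof Error ? err.message : String(err) };
      throw err;
    } finally {
      // A thrown error has not reached the framework's error handling yet,
      // so record 500 explicitly instead of trusting res.statusCode.
      event.status_code = failed ? 500 : res.statusCode;
      event.duration_ms = Date.now() - start;
      emit(event);
    }
  };
}

// Usage with fake request/response objects.
const emitted: WideEventLike[] = [];
const wrapped = withWideEvent(
  async (_req, res, event) => {
    event.user = { id: "user_42" };
    res.statusCode = 201;
  },
  (event) => emitted.push(event),
);
const done = wrapped(
  { method: "POST", url: "/api/orders", headers: { "x-request-id": "req_abc" } },
  { statusCode: 200, setHeader: () => {} },
);
```

In a real route the `emit` callback would apply `shouldSample` and call the configured logger.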
Step 5: Enrich Wide Events in Route Handlers
router.post('/api/checkout', authenticateUser, async (req, res) => {
const event = req.wideEvent;
try {
const user = req.user;
event.user = {
id: user.id, subscription: user.subscription,
account_age_days: Math.floor((Date.now() - user.createdAt.getTime()) / (1000 * 60 * 60 * 24)),
lifetime_value_cents: user.lifetimeValueCents,
is_internal: user.email.endsWith('@company.com'),
};
event.feature_flags = await getFeatureFlags(user.id);
const cart = await getCart(user.id);
event.cart = { total_cents: cart.totalCents, item_count: cart.items.length, currency: cart.currency };
const paymentStart = Date.now();
const payment = await processPayment(cart, user);
event.payment = { provider: payment.provider, method: payment.method, latency_ms: Date.now() - paymentStart, attempt: payment.attempt || 1 };
event.order = { id: payment.orderId, total_cents: payment.amountCents, items: cart.items.length };
res.json({ success: true, orderId: payment.orderId, requestId: event.request_id });
} catch (err: any) {
event.error = {
type: err.name, code: err.code || 'unknown', message: err.message,
retriable: err.retriable ?? false, provider_code: err.providerCode,
};
res.status(err.statusCode || 500).json({ error: err.code, message: err.userMessage || 'Payment failed', requestId: event.request_id });
}
});
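The handler above enriches the event directly because it has access to req. Code deeper in the call stack (services, repositories) usually does not. One option, offered here as an assumption rather than part of the original design, is Node's AsyncLocalStorage, which makes the current request's event reachable without threading it through every function. The `eventStore`, `enrich`, `chargeCard`, and `handleRequest` names are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type RequestEvent = Record<string, unknown>;

const eventStore = new AsyncLocalStorage<RequestEvent>();

// Merge fields into the current request's event; a no-op outside a request.
function enrich(fields: RequestEvent): void {
  const event = eventStore.getStore();
  if (event) Object.assign(event, fields);
}

// Hypothetical service-layer function with no access to req/res.
async function chargeCard(amountCents: number): Promise<string> {
  const started = Date.now();
  await new Promise((resolve) => setTimeout(resolve, 5)); // stands in for a provider call
  enrich({ payment: { provider: "stripe", amount_cents: amountCents, latency_ms: Date.now() - started } });
  return "order_1";
}

// A middleware would wrap the downstream call in eventStore.run(event, ...);
// the same pattern is shown here directly.
async function handleRequest(): Promise<RequestEvent> {
  const event: RequestEvent = { request_id: "req_1", path: "/api/checkout" };
  await eventStore.run(event, async () => {
    const orderId = await chargeCard(15999);
    enrich({ order: { id: orderId } });
  });
  return event;
}

const finished = handleRequest();
finished.then((event) => console.log(JSON.stringify(event)));
```

Object.assign replaces top-level keys, so two callers writing the same key (for example `payment`) overwrite each other; nested merging would need extra handling.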
Step 6: Configure Logger with Redaction
Pino Logger Configuration
import pino from 'pino';
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
...(process.env.NODE_ENV === 'development' && {
transport: { target: 'pino-pretty', options: { colorize: true, ignore: 'pid,hostname', translateTime: 'SYS:HH:MM:ss' } },
}),
formatters: { level: (label) => ({ level: label }) },
redact: {
// Pino redacts exact paths, not key names at any depth, so nested wide-event
// fields such as client.ip must be listed by their full path.
paths: [
'password', 'passwordHash', 'token', 'accessToken', 'refreshToken',
'apiKey', 'api_key', 'secret', 'authorization', 'cookie', 'session',
'creditCard', 'credit_card', 'cardNumber', 'card_number', 'cvv', 'ssn',
'bankAccount', 'bank_account', 'email', 'phone', 'address', 'ip', 'client.ip',
'req.headers.authorization', 'req.headers.cookie', 'req.body.password', 'error.stack',
],
remove: true,
},
});
export default logger;
Winston Logger Configuration
import winston from 'winston';
const redactFields = ['password', 'token', 'apiKey', 'secret', 'creditCard', 'ssn', 'authorization', 'cookie'];
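function redactSensitive(info: any): any {
// Build redacted deep copies so the caller's event object is never mutated.
function redactValue(value: any): any {
if (Array.isArray(value)) return value.map(redactValue);
if (typeof value !== 'object' || value === null) return value;
const copy: any = {};
for (const key of Object.keys(value)) {
copy[key] = redactFields.some(field => key.toLowerCase().includes(field.toLowerCase()))
? '[REDACTED]'
: redactValue(value[key]);
}
return copy;
}
// Spread info first so winston's Symbol-keyed properties are preserved,
// then overwrite the string-keyed fields with their redacted copies.
return { ...info, ...redactValue(info) };
}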
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format((info) => redactSensitive(info))(),
winston.format.json()
),
transports: [new winston.transports.Console({
format: process.env.NODE_ENV === 'development'
? winston.format.combine(winston.format.colorize(), winston.format.simple())
: winston.format.json(),
})],
});
export default logger;
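The middleware in Step 4 calls logger.info(event, 'request_complete') with the object first, which matches the Pino and Bunyan patterns listed in 1.2, while Winston takes the message first. A small adapter can bridge the two. This is a sketch using structural types rather than the winston package itself; `toObjectFirst` and both type names are hypothetical.

```typescript
type Fields = Record<string, unknown>;

// Shape the Step 4 middleware expects (Pino/Bunyan argument order).
type ObjectFirstLogger = {
  info(obj: Fields, msg?: string): void;
  error(obj: Fields, msg?: string): void;
};

// Message-first shape, as in logger.info('message', { ... }).
type MessageFirstLogger = {
  info(msg: string, meta?: Fields): unknown;
  error(msg: string, meta?: Fields): unknown;
};

// Swap the argument order so a message-first logger can be passed to the middleware.
function toObjectFirst(logger: MessageFirstLogger, defaultMessage = "request_complete"): ObjectFirstLogger {
  return {
    info: (obj, msg) => { logger.info(msg ?? defaultMessage, obj); },
    error: (obj, msg) => { logger.error(msg ?? defaultMessage, obj); },
  };
}

// Usage with a fake message-first logger that records its calls.
const calls: Array<{ level: string; msg: string; meta?: Fields }> = [];
const fakeLogger: MessageFirstLogger = {
  info: (msg, meta) => calls.push({ level: "info", msg, meta }),
  error: (msg, meta) => calls.push({ level: "error", msg, meta }),
};

const adapted = toObjectFirst(fakeLogger);
adapted.info({ request_id: "req_1", status_code: 200 }, "request_complete");
adapted.error({ request_id: "req_2", status_code: 500 });
console.log(calls);
```

With an adapter like this, the event fields arrive as metadata rather than as the log message, so the JSON output keeps them as top-level queryable fields.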
Step 7: Migrate Existing Log Statements
DO NOT migrate all at once. Follow this phased approach:
Phase 1: Add middleware (Week 1)
- Add wide event middleware; dual log (keep existing + new wide events); verify events; test sampling.
Phase 2: Enrich critical paths (Week 2)
- Add business context to top 5 endpoints; add user context, feature flags, domain-specific fields.
Phase 3: Remove scattered logs (Week 3)
- Remove redundant log statements in migrated endpoints; keep infrastructure logs; update runbooks.
Phase 4: Tune sampling (Week 4)
- Monitor sampling rates; adjust thresholds; optimize for cost vs signal.
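When tuning in Phase 4, it helps to estimate how much volume a given set of rates will keep before deploying it. The expected kept fraction is the sum over traffic segments of (share of traffic × keep rate). The sketch below works through that arithmetic. The traffic mix is invented for illustration and should be replaced with measured shares.

```typescript
type Segment = { name: string; share: number; keepRate: number };

// Expected fraction of events kept = sum over segments of share * keepRate.
function keptFraction(segments: Segment[]): number {
  return segments.reduce((sum, s) => sum + s.share * s.keepRate, 0);
}

// Hypothetical traffic mix; shares should sum to 1.
const mix: Segment[] = [
  { name: "5xx / errors", share: 0.01, keepRate: 1.0 },
  { name: "4xx", share: 0.05, keepRate: 0.1 },
  { name: "slow (>2s)", share: 0.02, keepRate: 1.0 },
  { name: "everything else", share: 0.92, keepRate: 0.05 },
];

const kept = keptFraction(mix);
console.log(`kept ${(kept * 100).toFixed(1)}% of events`);
```

With this assumed mix the terms are 0.01 + 0.005 + 0.02 + 0.046 = 0.081, so roughly 8% of events are kept. Raising the baseline rate from 0.05 to 0.10 would add another 0.046, which shows that the baseline segment dominates cost.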
Migration Patterns
// Pattern 1: start/completed log pairs
// Before
logger.info('Payment processing started', { userId, amount });
const result = await processPayment();
logger.info('Payment processing completed', { userId, duration: Date.now() - start });
// After
event.payment = { provider: 'stripe', amount_cents: amount };
const result = await processPayment();
// Pattern 2: log-and-rethrow error handling
// Before
try { await processPayment(); } catch (err) { logger.error('Payment failed', { error: err.message }); throw err; }
// After
try { await processPayment(); } catch (err: any) {
event.error = { type: err.name, code: err.code, message: err.message, retriable: err.retriable };
throw err;
}
// Pattern 3: separate context logs
// Before
logger.info('User info', { userId, subscription });
logger.info('Cart info', { cartTotal, items });
// After
event.user = { id: userId, subscription };
event.cart = { total_cents: cartTotal, item_count: items };
Step 8: Document Query Examples
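-- Illustrative SQL-style queries. Adapt field access and time functions such as ago(1h)
-- to the query language of your log platform.
-- Checkout failures for premium users on the new checkout flow, by error code (last hour)
SELECT error.code, COUNT(*) as failure_count, AVG(duration_ms) as avg_duration_ms
FROM logs
WHERE path = '/api/checkout' AND outcome = 'error'
AND user.subscription = 'premium' AND feature_flags.new_checkout_flow = true
AND @timestamp > ago(1h)
GROUP BY error.code ORDER BY failure_count DESC;
-- Payment latency percentiles by provider and region (last 24 hours)
SELECT payment.provider, region,
PERCENTILE(payment.latency_ms, 95) as p95, PERCENTILE(payment.latency_ms, 99) as p99
FROM logs
WHERE path = '/api/checkout' AND payment.provider IS NOT NULL AND @timestamp > ago(24h)
GROUP BY payment.provider, region ORDER BY p95 DESC;
-- Error rate and latency with and without the new checkout flow (last hour)
-- The 1.0 factor avoids integer division, which would truncate the rate to 0 in many dialects.
SELECT feature_flags.new_checkout_flow as has_new_flow,
COUNT(*) as request_count,
1.0 * SUM(CASE WHEN outcome = 'error' THEN 1 ELSE 0 END) / COUNT(*) as error_rate,
AVG(duration_ms) as avg_latency_ms
FROM logs
WHERE path = '/api/checkout' AND @timestamp > ago(1h)
GROUP BY has_new_flow;
-- Slowest requests in the last hour, with business context
SELECT path, user.subscription, cart.total_cents, duration_ms, error.code
FROM logs WHERE duration_ms > 2000 AND @timestamp > ago(1h)
ORDER BY duration_ms DESC LIMIT 100;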
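To check that emitted events actually support queries like these before a log platform is wired up, the same aggregation can be run in memory over captured events. The sketch below mirrors the first query (failures grouped by error code with count and average duration). The `LoggedEvent` shape and the sample data are assumptions for illustration.

```typescript
type LoggedEvent = {
  path: string;
  outcome: "success" | "error";
  duration_ms: number;
  error?: { code: string };
};

// In-memory equivalent of GROUP BY error.code with COUNT(*) and AVG(duration_ms).
function failuresByCode(events: LoggedEvent[], path: string) {
  const groups = new Map<string, { failure_count: number; total_ms: number }>();
  for (const e of events) {
    if (e.path !== path || e.outcome !== "error") continue;
    const code = e.error?.code ?? "unknown";
    const group = groups.get(code) ?? { failure_count: 0, total_ms: 0 };
    group.failure_count += 1;
    group.total_ms += e.duration_ms;
    groups.set(code, group);
  }
  return Array.from(groups.entries())
    .map(([code, g]) => ({ code, failure_count: g.failure_count, avg_duration_ms: g.total_ms / g.failure_count }))
    .sort((a, b) => b.failure_count - a.failure_count);
}

// Hypothetical captured events.
const captured: LoggedEvent[] = [
  { path: "/api/checkout", outcome: "error", duration_ms: 300, error: { code: "card_declined" } },
  { path: "/api/checkout", outcome: "error", duration_ms: 500, error: { code: "card_declined" } },
  { path: "/api/checkout", outcome: "error", duration_ms: 2000, error: { code: "provider_timeout" } },
  { path: "/api/checkout", outcome: "success", duration_ms: 150 },
  { path: "/api/cart", outcome: "error", duration_ms: 90, error: { code: "not_found" } },
];

const report = failuresByCode(captured, "/api/checkout");
console.log(report);
```

If a field needed by a query is missing from the captured events, that is a signal to add it during the enrichment phase rather than after an incident.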
Deliverables
At the end of this skill, you should have:
- Wide Event Schema (TypeScript interface in src/observability/wideEvent.ts)
- Tail Sampling Logic (shouldSample function in src/observability/tailSampling.ts)
- Framework Middleware (Express/Fastify/Koa/Next.js, whichever applies)
- Logger Configuration (Pino/Winston/Bunyan with redaction, whichever applies)
- Migration Examples (before/after code for the main request handlers)
- Query Examples (SQL for CloudWatch/Datadog/Elastic)
Success Metrics
Track these to measure improvement:
- Log Volume Reduction: target an 80-90% reduction with tail sampling (the actual figure depends on traffic mix and sampling rates)
- MTTR (Mean Time to Resolution): Faster debugging with full context
- Query Simplicity: Multi-step grep → single structured query
- Context Completeness: % of events with business fields populated
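Two of these metrics reduce to simple arithmetic over emitted events. The sketch below shows one way to compute them. The function names and the choice of which fields count as "business fields" are assumptions for illustration.

```typescript
type Emitted = Record<string, unknown>;

// Percentage of events in which every listed field is present (not undefined or null).
function contextCompleteness(events: Emitted[], fields: string[]): number {
  if (events.length === 0) return 0;
  const complete = events.filter((e) => fields.every((f) => e[f] !== undefined && e[f] !== null));
  return (complete.length / events.length) * 100;
}

// Percentage reduction in emitted log lines relative to the pre-migration baseline.
function volumeReduction(linesBefore: number, linesAfter: number): number {
  return ((linesBefore - linesAfter) * 100) / linesBefore;
}

// Hypothetical sample: three of four events carry both user and feature_flags.
const recent: Emitted[] = [
  { user: { id: "u1" }, feature_flags: { new_checkout_flow: true } },
  { user: { id: "u2" }, feature_flags: {} },
  { user: { id: "u3" }, feature_flags: { new_checkout_flow: false } },
  { user: { id: "u4" } },
];

const completeness = contextCompleteness(recent, ["user", "feature_flags"]);
const reduction = volumeReduction(1000, 150);
console.log({ completeness, reduction });
```

Tracking completeness per endpoint rather than globally makes it easier to see which handlers still need enrichment.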
References
- Logging Sucks - Core philosophy
- OpenTelemetry - Trace context integration
- Pino - High-performance logger
- Wide-Event Observability skill - Full philosophy and examples