| Field | Value |
| --- | --- |
| name | incident-log-analyzer |
| description | Analyze incident logs, extract error patterns, identify root causes, generate insights and metrics. Use when user mentions logs, incidents, errors, failures, debugging, troubleshooting, log analysis, or investigating production issues. |
| allowed-tools | Read, Grep, Glob, RunCommand |
# Incident Log Analyzer

Analyze logs to identify patterns, extract errors, and generate actionable insights for incident response.
## Workflow

1. **Discovery Phase**: locate the relevant log files and detect their format (sketched below).
2. **Analysis Phase**: parse entries, filter by time window and level, and cluster error patterns.
3. **Report Phase**: summarize findings into a report with a timeline, root-cause analysis, and recommendations.
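As an illustration of the discovery step, here is a minimal sketch; the root path and file extensions are assumptions for the example, not part of the Skill:

```python
# Hypothetical discovery sketch: find candidate log files under a directory,
# newest first. The root path and extensions are assumed for illustration.
from pathlib import Path

def discover_logs(root: str = "/var/logs/app") -> list:
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    candidates = [
        p for p in root_path.rglob("*")
        if p.is_file() and p.suffix in {".log", ".json", ".txt"}
    ]
    # Newest files first, since recent incidents are usually the target.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)

if __name__ == "__main__":
    for path in discover_logs():
        print(path)
```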
User: "Analyze the logs in /var/logs/app/ for errors in the last hour"
Claude executes:
1. python scripts/parse_logs.py /var/logs/app/ --since "1 hour ago"
2. python scripts/pattern_detector.py --input parsed_logs.json
3. Generates report with findings
User: "Why did the API start failing at 3 PM?"
Claude executes:
1. Filters logs around 3 PM
2. Identifies spike in 500 errors
3. Traces error source to database connection pool exhaustion
4. Provides evidence and recommendations
User: "Check if the frontend errors are related to backend issues"
Claude:
1. Analyzes frontend logs
2. Analyzes backend logs
3. Correlates timestamps and error patterns
4. Maps frontend errors to backend failures
## Scripts

### scripts/parse_logs.py

Parses log files and extracts structured data.

Usage:

```
python scripts/parse_logs.py <log_directory> [options]
```

Options:

```
--format json|text|auto   Log format (default: auto)
--since "time"            Start time (e.g., "1 hour ago", "2024-01-01")
--until "time"            End time
--level error|warn|info   Filter by log level
--output FILE             Output file (default: parsed_logs.json)
```

Output: a JSON file with structured log entries.
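The entry schema is not documented above; here is a minimal sketch of consuming the output, assuming each entry carries `timestamp`, `level`, `service`, and `message` fields:

```python
# Sketch of reading parse_logs.py output. The field names used here
# (timestamp, level, service, message) are assumptions, not a documented schema.
import json

with open("parsed_logs.json") as f:
    entries = json.load(f)

errors = [e for e in entries if e.get("level") == "error"]
print(f"{len(errors)} error entries out of {len(entries)} total")
for entry in errors[:5]:
    print(entry["timestamp"], entry["service"], entry["message"])
```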
### scripts/pattern_detector.py

Detects patterns, clusters similar errors, and generates statistics.

Usage:

```
python scripts/pattern_detector.py [options]
```

Options:

```
--input FILE      Input JSON from parse_logs.py
--threshold N     Minimum occurrences to report (default: 5)
--output FILE     Output report file
```

Output: a JSON report with error patterns and statistics.
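As an illustration of the clustering idea (not necessarily how pattern_detector.py is implemented), variable parts of messages can be normalized away so that near-identical errors count as one pattern:

```python
# Illustrative clustering sketch: collapse variable tokens (hex IDs, numbers)
# so messages that differ only in details group under a single pattern.
import re
from collections import Counter

def normalize(message):
    message = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
    return re.sub(r"\d+", "<N>", message)

def top_patterns(messages, threshold=5):
    counts = Counter(normalize(m) for m in messages)
    return [(pattern, n) for pattern, n in counts.most_common() if n >= threshold]

messages = [
    "Cannot get connection from pool (exhausted) after 30s",
    "Cannot get connection from pool (exhausted) after 31s",
    "Timeout waiting for payment processor response (req 8841)",
]
print(top_patterns(messages, threshold=2))
# [('Cannot get connection from pool (exhausted) after <N>s', 2)]
```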
### scripts/timeline_visualizer.py

Generates an ASCII timeline visualization of incidents.

Usage:

```
python scripts/timeline_visualizer.py --input parsed_logs.json
```

Output: an ASCII chart showing error frequency over time.
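A minimal sketch of how such a chart can be produced, by bucketing error timestamps into fixed intervals and scaling bars against the busiest bucket (the bucket size and bar width are arbitrary choices here):

```python
# Illustrative timeline sketch: bucket error timestamps into 15-minute bins
# and render each bin as a bar scaled against the busiest bin.
from datetime import datetime

def render_timeline(timestamps, bucket_minutes=15, width=12):
    buckets = {}
    for ts in timestamps:
        key = ts.replace(minute=(ts.minute // bucket_minutes) * bucket_minutes,
                         second=0, microsecond=0)
        buckets[key] = buckets.get(key, 0) + 1
    peak = max(buckets.values())
    for key in sorted(buckets):
        filled = round(width * buckets[key] / peak)
        print(f"{key:%H:%M} {'█' * filled}{'░' * (width - filled)} {buckets[key]} errors")

render_timeline([
    datetime(2024, 1, 1, 14, 5),
    datetime(2024, 1, 1, 14, 17),
    datetime(2024, 1, 1, 14, 18),
    datetime(2024, 1, 1, 14, 31),
])
```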
## Example Report

```markdown
# Incident Log Analysis Report
**Analysis Period**: 2024-01-01 14:00 - 15:00
**Logs Analyzed**: 45,234 entries
**Errors Found**: 1,247
## Summary
Critical errors detected in payment service causing cascade failures
across dependent services.
## Timeline
14:05 ████░░░░░░░░ First errors appear (DB connection)
14:15 ████████████ Error spike (payment service)
14:30 ████████░░░░ Partial recovery
14:45 ██░░░░░░░░░░ Normal operation resumed
## Top Errors
1. **DatabaseConnectionError** (423 occurrences)
- First seen: 14:05:23
- Last seen: 14:32:15
- Affected: payment-service, order-service
- Pattern: Connection pool exhausted
2. **PaymentTimeoutException** (312 occurrences)
- First seen: 14:08:45
- Last seen: 14:28:33
- Affected: payment-service
- Pattern: Downstream service timeout
## Root Cause Analysis
**Primary**: Database connection pool exhausted
**Contributing factors**:
- Sudden traffic spike (3x normal)
- Connection timeout too high (30s)
- No connection pooling limits
## Recommendations
1. Increase connection pool size
2. Reduce connection timeout to 5s
3. Implement circuit breaker
4. Add connection pool monitoring
## Evidence
    [14:05:23] ERROR [payment-service] DatabaseConnectionError: Cannot get connection from pool (exhausted) at ConnectionPool.getConnection()
    [14:08:45] ERROR [payment-service] PaymentTimeoutException: Timeout waiting for payment processor response at PaymentGateway.processPayment()
```
## Capabilities

**Cross-service correlation**: correlates errors across multiple services by matching timestamps and shared error patterns, mapping errors in one service to failures in another; a sketch follows below.
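Here is a minimal sketch of time-window correlation, assuming parsed entries with `ts` and `message` fields (the 10-second window is an arbitrary choice):

```python
# Illustrative correlation sketch: pair each frontend error with any backend
# error that occurred in the `window` seconds before it.
from datetime import datetime, timedelta

def correlate(frontend, backend, window=timedelta(seconds=10)):
    pairs = []
    for fe in frontend:
        for be in backend:
            if timedelta(0) <= fe["ts"] - be["ts"] <= window:
                pairs.append((fe["message"], be["message"]))
    return pairs

frontend = [{"ts": datetime(2024, 1, 1, 14, 8, 50), "message": "checkout failed (502)"}]
backend = [{"ts": datetime(2024, 1, 1, 14, 8, 45), "message": "PaymentTimeoutException"}]
print(correlate(frontend, backend))
# [('checkout failed (502)', 'PaymentTimeoutException')]
```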
**Anomaly detection**: flags unusual patterns, such as sudden error spikes relative to the baseline error rate.

**Metric extraction**: automatically extracts occurrence counts, first and last seen timestamps, and affected services for each error pattern.
This Skill can delegate to:
## Requirements

Scripts require Python 3.8+ with:

```
pip install python-dateutil pandas numpy
```

For JSON logs, the `jq` command-line tool is optional but improves performance.