| name | running-performance-benchmarks |
| description | Use this skill when running performance benchmarks, measuring API endpoint response times, detecting performance regressions, running load tests, comparing performance before/after changes, analyzing performance trends, or validating optimization improvements. Handles baseline comparisons, historical tracking, and report generation. |
You are an expert at performance benchmarking for the UFC Pokedex project. You leverage existing benchmark infrastructure while adding regression detection, historical analysis, and comprehensive reporting.
Invoke this skill when the user wants to:
Location: scripts/benchmark_performance.sh
What it tests:
Baseline Expectations:
Location: frontend/scripts/benchmarks/tanstack-query-benchmark.cjs
What it tests:
Location: .benchmarks/ directory
Format: Store results as timestamped JSON files
.benchmarks/2025-11-05_12-30-45_postgresql.json
User requests:
Steps:
# 1. Ensure backend is running
if ! lsof -ti :8000 > /dev/null; then
echo "Backend not running. Starting..."
make api &
sleep 5
fi
# 2. Run benchmark script
bash scripts/benchmark_performance.sh
# 3. Save results
# Parse output and save to .benchmarks/YYYY-MM-DD_HH-MM-SS_<backend>.json
# 4. Compare to baseline
# Check if endpoints meet target times (<100ms, <50ms)
# 5. Compare to last run (if exists)
# Find most recent file in .benchmarks/
# Calculate % difference for each endpoint
# 6. Report findings
# Show: Current times, baseline comparison, regression status
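A minimal sketch of steps 1-4, assuming curl and jq are available. The endpoint names, URLs, and targets mirror the JSON structure documented later in this skill; parsing the benchmark script's own output works just as well, and the record written here omits fields such as backend and fighter_count:
API_BASE=${API_BASE:-http://localhost:8000}
STAMP=$(date +%Y-%m-%d_%H-%M-%S)
mkdir -p .benchmarks

measure() {  # measure <name> <path> <target_ms>: time one endpoint and emit a JSON record
  local ms
  ms=$(curl -s -o /dev/null -w '%{time_total}' "${API_BASE}$2" | awk '{printf "%.1f", $1 * 1000}')
  jq -n --arg name "$1" --arg url "$2" --argjson time_ms "$ms" --argjson target "$3" \
    '{name: $name, url: $url, time_ms: $time_ms, baseline_target_ms: $target, passed: ($time_ms <= $target)}'
}

{
  measure fighter_list    "/fighters/?limit=20&offset=0" 100
  measure search_division "/search/?q=&division=Welterweight" 100
} | jq -s --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg commit "$(git rev-parse --short HEAD)" \
    '{timestamp: $ts, git_commit: $commit, endpoints: .}' \
  > ".benchmarks/${STAMP}_postgresql.json"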
Output format:
=== Performance Benchmark Results ===
Date: 2025-11-05 12:30:45
Backend: PostgreSQL
Git commit: abc123def456
Endpoint                 Time (ms)   Target    Status     vs Last Run
---------------------------------------------------------------------------
Fighter List (20)        45          <100ms    ✅ PASS    -5ms (-10%)
Search by Division       82          <100ms    ✅ PASS    +2ms (+2%)
Search by Stance         78          <100ms    ✅ PASS    -1ms (-1%)
Search with Win Streak   125         N/A       ⚠️ SLOW    +15ms (+14%)
Fighter Detail           32          <50ms     ✅ PASS    -3ms (-9%)
Overall: ✅ 4/5 passed baseline targets
Regressions: ⚠️ Win Streak search 14% slower (needs investigation)
User requests:
Steps:
# 1. Run initial benchmark
bash scripts/benchmark_performance.sh
# 2. Save with "before" tag
# .benchmarks/2025-11-05_12-00-00_before.json
# 3. Wait for user to make changes
# (User makes code changes, database optimizations, etc.)
# 4. Run benchmark again
bash scripts/benchmark_performance.sh
# 5. Save with "after" tag
# .benchmarks/2025-11-05_12-30-00_after.json
# 6. Generate comparison report
# Calculate improvement percentages
# Highlight significant changes (>20% improvement or >10% regression)
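Step 6 can be sketched with jq; the BEFORE/AFTER paths are placeholders taken from the example filenames above, and the layout assumed is the JSON structure documented later in this skill:
BEFORE=.benchmarks/2025-11-05_12-00-00_before.json
AFTER=.benchmarks/2025-11-05_12-30-00_after.json

# Pair endpoints by name and compute the percentage change for each.
jq -n --slurpfile b "$BEFORE" --slurpfile a "$AFTER" '
  [$b[0].endpoints[] as $old
   | ($a[0].endpoints[] | select(.name == $old.name)) as $new
   | {name: $old.name,
      before_ms: $old.time_ms,
      after_ms: $new.time_ms,
      delta_pct: ((($new.time_ms - $old.time_ms) / $old.time_ms) * 1000 | round / 10)}]'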
Output format:
=== Before/After Performance Comparison ===
Endpoint                 Before    After    Δ Time    Δ %       Verdict
---------------------------------------------------------------------------------
Fighter List (20)        120ms     45ms     -75ms     -62.5%    🚀 IMPROVED
Search by Division       150ms     82ms     -68ms     -45.3%    🚀 IMPROVED
Search by Stance         140ms     78ms     -62ms     -44.3%    🚀 IMPROVED
Search with Win Streak   200ms     125ms    -75ms     -37.5%    🚀 IMPROVED
Fighter Detail           60ms      32ms     -28ms     -46.7%    🚀 IMPROVED
Overall: 🚀 All endpoints improved significantly!
Average improvement: 47.3%
Phase 1 optimization: SUCCESS ✅
- Division/Stance indexes working effectively
- Target metrics achieved on all endpoints
User requests:
Steps:
# 1. Ensure backend is running with production-like data
# Check fighter count: should be >1000 for realistic testing
# 2. Run Apache Bench tests
# Start with low concurrency, increase gradually
# Test 1: Low concurrency (baseline)
ab -n 100 -c 1 "http://localhost:8000/fighters/?limit=20"
# Test 2: Moderate concurrency
ab -n 1000 -c 10 "http://localhost:8000/fighters/?limit=20"
# Test 3: High concurrency
ab -n 1000 -c 50 "http://localhost:8000/fighters/?limit=20"
# Test 4: Stress test
ab -n 1000 -c 100 "http://localhost:8000/fighters/?limit=20"
# 3. Parse results
# Extract: Requests/sec, Mean time, 95th percentile, Failed requests
# 4. Test other critical endpoints
# Repeat for search endpoints, fighter detail
# 5. Generate load test report
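A sketch of steps 2-3 as a single loop, assuming Apache Bench is installed; the awk patterns rely on ab's standard summary lines (Requests per second, Time per request, the percentile table, Failed requests):
URL="http://localhost:8000/fighters/?limit=20"
printf '%-12s %-9s %-10s %-10s %s\n' Concurrency Req/sec "Mean(ms)" "95th(ms)" Failed
for c in 1 10 50 100; do
  out=$(ab -q -n 1000 -c "$c" "$URL")
  rps=$(echo "$out"    | awk '/Requests per second/ {print $4}')
  mean=$(echo "$out"   | awk '/Time per request/ && /mean\)$/ {print $4}')
  p95=$(echo "$out"    | awk '/^ *95% / {print $2}')
  failed=$(echo "$out" | awk '/Failed requests/ {print $3}')
  printf '%-12s %-9s %-10s %-10s %s\n' "$c" "$rps" "$mean" "$p95" "$failed"
done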
Output format:
=== Load Test Results ===
Endpoint: /fighters/?limit=20
Database: PostgreSQL (2000 fighters)
Concurrency   Requests   Req/sec   Mean (ms)   95th %ile   Failed
---------------------------------------------------------------
1             100        45.2      22.1        25.0        0
10            1000       234.5     42.7        55.0        0
50            1000       421.8     118.5       145.0       0
100           1000       398.2     251.2       320.0       0
Analysis:
✅ System stable up to 50 concurrent users
⚠️ Performance degrades at 100 concurrent (mean latency >250ms)
✅ No failed requests - error handling robust
💡 Recommendation: Add connection pooling if expecting >50 concurrent users
Breaking point: ~80 concurrent users (estimated)
Recommended max concurrency: 50 users
User requests:
Steps:
# 1. Find two most recent benchmark results
# .benchmarks/2025-11-05_12-00-00.json (latest)
# .benchmarks/2025-11-04_15-30-00.json (previous)
# 2. Compare all endpoints
# Calculate % difference
# 3. Flag regressions
# Regression = >10% slower
# Warning = 5-10% slower
# Acceptable = <5% change
# Improvement = >20% faster
# 4. Identify potential causes
# Check git commits between benchmarks
# Look for schema changes, query modifications
# 5. Report findings with recommendations
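Steps 1-3 sketched with jq, using the thresholds above and assuming results were saved in the JSON format documented later in this skill:
latest=$(ls -t .benchmarks/*.json | sed -n 1p)
previous=$(ls -t .benchmarks/*.json | sed -n 2p)

jq -n --slurpfile new "$latest" --slurpfile old "$previous" '
  [$old[0].endpoints[] as $o
   | ($new[0].endpoints[] | select(.name == $o.name)) as $n
   | ((($n.time_ms - $o.time_ms) / $o.time_ms) * 100) as $pct
   | {name: $o.name,
      latest_ms: $n.time_ms,
      previous_ms: $o.time_ms,
      delta_pct: ($pct * 10 | round / 10),
      status: (if $pct > 10 then "REGRESSION"
               elif $pct >= 5 then "WARNING"
               elif $pct <= -20 then "IMPROVEMENT"
               else "OK" end)}]'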
Output format:
=== Regression Detection Report ===
Comparing:
- Latest: 2025-11-05 12:00:00 (commit abc123)
- Previous: 2025-11-04 15:30:00 (commit def456)
Endpoint                 Latest    Previous    Δ        Status
------------------------------------------------------------------------
Fighter List (20)        45ms      43ms        +2ms     ✅ OK (+4.7%)
Search by Division       82ms      80ms        +2ms     ✅ OK (+2.5%)
Search by Stance         78ms      75ms        +3ms     ✅ OK (+4.0%)
Search with Win Streak   145ms     125ms       +20ms    ❌ REGRESSION (+16%)
Fighter Detail           32ms      31ms        +1ms     ✅ OK (+3.2%)
Regressions Found: 1
Warnings: 0
Improvements: 0
🚨 REGRESSION DETECTED: Search with Win Streak
- 16% slower (145ms vs 125ms)
- Commits since last benchmark:
* abc123 - "Add streak computation caching"
* def123 - "Refactor search filters"
Recommendations:
1. Use performance-investigator sub-agent to analyze Win Streak query
2. Check if streak computation caching is working correctly
3. Run EXPLAIN ANALYZE on the search query
4. Consider reverting commit abc123 if issue persists
User requests:
Steps:
# 1. Read all benchmark files in .benchmarks/
# Sort by timestamp
# 2. Extract data for each endpoint over time
# Build time series for each endpoint
# 3. Calculate statistics
# - Average response time
# - Best/worst times
# - Trend (improving/degrading/stable)
# - Volatility
# 4. Identify inflection points
# When did performance significantly change?
# 5. Generate trend report with visualization
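Steps 1-3 can be sketched as a single jq pass over every stored file, again assuming the JSON format documented later in this skill:
jq -s '
  sort_by(.timestamp)
  | [.[] | .timestamp as $ts | .endpoints[] | {ts: $ts, name, time_ms}]
  | group_by(.name)
  | map({endpoint: .[0].name,
         runs: length,
         avg_ms: ((map(.time_ms) | add / length) * 10 | round / 10),
         best_ms: (map(.time_ms) | min),
         worst_ms: (map(.time_ms) | max),
         series: (sort_by(.ts) | map({ts, time_ms}))})' \
  .benchmarks/*.json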
Output format:
=== Performance Trend Analysis ===
Period: Last 30 days (15 benchmark runs)
Fighter List (/fighters/?limit=20):
Average: 52ms
Best: 38ms (2025-11-01)
Worst: 120ms (2025-10-15 - before Phase 1 optimization)
Trend: ⬇️ IMPROVING (-62% since Oct 15)
Volatility: LOW (±5ms)
Search by Division:
Average: 85ms
Best: 75ms (2025-11-03)
Worst: 150ms (2025-10-15 - before Phase 1 optimization)
Trend: ⬇️ IMPROVING (-43% since Oct 15)
Volatility: LOW (±8ms)
Fighter Detail:
Average: 34ms
Best: 28ms (2025-11-02)
Worst: 60ms (2025-10-15 - before Phase 1 optimization)
Trend: ⬇️ IMPROVING (-43% since Oct 15)
Volatility: LOW (±4ms)
Key Events:
📈 Oct 15: Baseline measurements (before optimization)
🚀 Oct 18: Phase 1 indexes deployed (685cededf16b)
✅ Oct 20: Performance targets achieved
📊 Oct 25-Nov 5: Stable performance maintained
Overall Verdict: ✅ HEALTHY
- All endpoints consistently meet targets
- Low volatility indicates stable performance
- Phase 1 optimizations effective and sustained
The UFC Pokedex supports two database backends, PostgreSQL and SQLite, with different performance characteristics:
Benchmark PostgreSQL with:
# Ensure PostgreSQL is running
docker-compose up -d
make api
# Run benchmarks
bash scripts/benchmark_performance.sh
Benchmark SQLite with:
# Use SQLite mode
USE_SQLITE=1 make api:dev
# Run benchmarks
API_BASE=http://localhost:8000 bash scripts/benchmark_performance.sh
Note: SQLite benchmarks are useful for development but don't represent production performance. Always validate optimizations on PostgreSQL with production-sized data.
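When benchmarking both backends, tag each saved result with the backend under test so runs stay comparable. A minimal sketch; detecting the backend via USE_SQLITE is an assumption (the benchmark script itself does not report which backend it hit), and the raw output is saved as a .log here rather than the JSON format described below:
BACKEND=${USE_SQLITE:+sqlite}
BACKEND=${BACKEND:-postgresql}
STAMP=$(date +%Y-%m-%d_%H-%M-%S)
mkdir -p .benchmarks
bash scripts/benchmark_performance.sh | tee ".benchmarks/${STAMP}_${BACKEND}.log"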
Store benchmark results as JSON files in the .benchmarks/ directory:
Filename format: YYYY-MM-DD_HH-MM-SS_<backend>_<tag>.json
Examples:
.benchmarks/2025-11-05_12-30-45_postgresql.json
.benchmarks/2025-11-05_12-00-00_before.json
.benchmarks/2025-11-05_12-30-00_after.json
JSON structure:
{
  "timestamp": "2025-11-05T12:30:45Z",
  "backend": "postgresql",
  "git_commit": "abc123def456",
  "fighter_count": 2000,
  "fight_count": 50000,
  "endpoints": [
    {
      "name": "fighter_list",
      "url": "/fighters/?limit=20&offset=0",
      "time_ms": 45.2,
      "status_code": 200,
      "baseline_target_ms": 100,
      "passed": true
    },
    {
      "name": "search_division",
      "url": "/search/?q=&division=Welterweight",
      "time_ms": 82.1,
      "status_code": 200,
      "baseline_target_ms": 100,
      "passed": true
    }
  ],
  "load_tests": [
    {
      "endpoint": "/fighters/?limit=20",
      "concurrency": 10,
      "total_requests": 1000,
      "requests_per_second": 234.5,
      "mean_time_ms": 42.7,
      "percentile_95_ms": 55.0,
      "failed_requests": 0
    }
  ],
  "summary": {
    "endpoints_tested": 5,
    "endpoints_passed": 4,
    "endpoints_failed": 1,
    "average_time_ms": 76.4,
    "regressions_detected": 1
  }
}
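With results stored in this shape, ad-hoc questions become one-liners. For example, listing the endpoints that missed their baseline target in the most recent run (jq assumed available):
jq -r '.endpoints[] | select(.passed == false)
       | "\(.name): \(.time_ms)ms (target \(.baseline_target_ms)ms)"' \
  "$(ls -t .benchmarks/*.json | head -1)"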
When benchmarks reveal issues, delegate to performance-investigator:
Workflow:
Example:
Benchmark shows: Search with Win Streak: 145ms (was 125ms)
Recommendation:
"Use the performance-investigator sub-agent to analyze the Win Streak search query:
Use performance-investigator to analyze why /search/?q=&streak_type=win&min_streak_count=3 is slow
Then re-run benchmarks to validate the fix."
Add benchmark targets to Makefile:
benchmark: ## Run performance benchmarks
	bash scripts/benchmark_performance.sh

benchmark-save: ## Run benchmarks and save results
	# (Invoke this skill to handle saving and analysis)

benchmark-load: ## Run load tests
	# (Invoke this skill with load testing workflow)
Consistent Environment: run benchmarks on the same machine, backend, and data size so results stay comparable.
Multiple Runs: repeat each benchmark and compare across runs rather than trusting a single measurement.
Production-Like Data: benchmark against a production-sized dataset (see the SQLite note above).
Document Context: record the git commit, backend, and fighter/fight counts alongside each result, as in the JSON structure above.
Good Performance:
Warning Signs:
Critical Issues:
Context: Added index on fighters.division
Steps:
# 1. Benchmark before migration
# Tag as "before_division_index"
# 2. Run migration
make db-upgrade
# 3. Benchmark after migration
# Tag as "after_division_index"
# 4. Compare
# Expect: Search by Division 40-60% faster
Expected improvement: Search by Division from ~150ms to <100ms
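Sketched end to end; the file tags follow the before/after convention from the comparison workflow above, and saving raw script output as .log files is an assumption (converting to the JSON format is a separate step):
bash scripts/benchmark_performance.sh > ".benchmarks/$(date +%Y-%m-%d_%H-%M-%S)_before_division_index.log"
make db-upgrade
bash scripts/benchmark_performance.sh > ".benchmarks/$(date +%Y-%m-%d_%H-%M-%S)_after_division_index.log"
# Compare the two runs as in the before/after comparison workflow above.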
Context: Refactored search logic
Steps:
# 1. Run benchmark after changes
# 2. Compare to last run
# Notice: Search endpoints 20% slower
# 3. Investigate with performance-investigator
# 4. Identify issue: Missing eager loading
# 5. Fix and re-benchmark
Context: Preparing for production deploy
Steps:
# 1. Benchmark with production data size
# 2. Run load tests with expected concurrency
# Expected: 20-50 concurrent users
# 3. Identify breaking points
# 4. Adjust connection pool if needed
# 5. Re-test and validate
# Run basic benchmark
bash scripts/benchmark_performance.sh
# Run with custom API base
API_BASE=https://api.ufc.wolfgangschoenberger.com bash scripts/benchmark_performance.sh
# Run load test (1000 requests, 10 concurrent)
ab -n 1000 -c 10 "http://localhost:8000/fighters/?limit=20"
# Run load test (silent, just summary)
ab -q -n 1000 -c 10 "http://localhost:8000/fighters/?limit=20"
# Check stored benchmarks
ls -lh .benchmarks/
# View latest benchmark
cat .benchmarks/$(ls -t .benchmarks/ | head -1)
# Count benchmarks
ls .benchmarks/ | wc -l
# Get current git commit
git rev-parse --short HEAD
Remember: Performance benchmarking is most valuable when done consistently over time. Run benchmarks regularly and track trends to catch regressions early!