بنقرة واحدة
canary
Post-deploy monitoring: watch production after a deploy and alert on regressions
القائمة
Post-deploy monitoring: watch production after a deploy and alert on regressions
Audit Claude Code agents defined in .claude/agents/ for description specificity, model tier appropriateness, tools scoping, and system prompt quality. Detects dispatch ambiguity between agents, flags over-permissive tool grants, and checks for human-in-the-loop patterns that break programmatic orchestration. Use when onboarding to a project with existing agents, after adding new agents to a fleet, or when an orchestrator consistently selects the wrong agent.
Audit Claude Code hooks defined in settings.json files for validity, performance safety, and correctness. Resolves each command against the filesystem, checks exit-code strategy for blocking hooks, flags missing timeouts, and reviews interactive vs async patterns. Use when setting up hooks for the first time, debugging a hook that never fires or hangs the agent, or doing a periodic hooks hygiene pass.
Autonomous improvement loop: scan codebase metrics, scaffold experiment files, run agent-driven iterations until metric improves
Restore context after /clear by summarizing recent work and project state
Launch and navigate the ccboard TUI/Web dashboard for Claude Code. Use when monitoring token usage, tracking costs, browsing sessions, or checking MCP server status across projects.
Audit Claude Code setup for cache bugs (CC#40524): sentinel, --resume/--continue, attribution header + ArkNill B3/B4/B5
Watch a live application after deployment. Alert on errors and regressions. Compare against a pre-deploy baseline.
Two modes:
--baseline: capture the current state BEFORE deployingParse the user's arguments and detect the deployment context.
# Detect current branch and recent deploy commit
git branch --show-current
git log --oneline -5
# Auto-detect platform from config files
[ -f fly.toml ] && echo "PLATFORM: fly"
[ -f render.yaml ] && echo "PLATFORM: render"
[ -f vercel.json ] && echo "PLATFORM: vercel"
[ -f netlify.toml ] && echo "PLATFORM: netlify"
[ -f Procfile ] && echo "PLATFORM: heroku"
[ -f railway.toml ] && echo "PLATFORM: railway"
# Check for health endpoint
curl -sf "${URL}/health" -w "\n%{http_code}" 2>/dev/null | tail -1
curl -sf "${URL}/api/health" -w "\n%{http_code}" 2>/dev/null | tail -1
Create the working directory:
mkdir -p .canary/baselines .canary/reports .canary/screenshots
--baseline mode)Run this BEFORE deploying to capture the current healthy state.
For each page to monitor, record:
# For each page URL
for PAGE_PATH in "/" "/dashboard" "/settings" "/api/health"; do
SLUG=$(echo "$PAGE_PATH" | tr '/' '_' | tr -d '?&=')
RESULT=$(curl -sf -o /dev/null -w "%{http_code}|%{time_total}" "${BASE_URL}${PAGE_PATH}" 2>/dev/null)
STATUS=$(echo "$RESULT" | cut -d'|' -f1)
TIME_MS=$(echo "$RESULT" | awk -F'|' '{printf "%.0f", $2 * 1000}')
echo " ${PAGE_PATH}: HTTP ${STATUS}, ${TIME_MS}ms"
done
Save baseline to .canary/baselines/baseline.json:
{
"url": "<base-url>",
"timestamp": "<ISO-8601>",
"branch": "<branch-name>",
"commit": "<git-SHA>",
"pages": {
"/": { "status": 200, "time_ms": 450 },
"/dashboard": { "status": 200, "time_ms": 680 },
"/api/health": { "status": 200, "time_ms": 45 }
}
}
Then STOP and tell the user: "Baseline captured. Deploy your changes, then run /canary <url> to monitor."
If no pages were specified, auto-discover pages to monitor.
From the application:
# Check sitemap if available
curl -sf "${URL}/sitemap.xml" 2>/dev/null | grep -oP '(?<=<loc>)[^<]+' | head -10
# Check robots.txt for known paths
curl -sf "${URL}/robots.txt" 2>/dev/null | grep -i "allow\|disallow" | head -10
# Common paths to always check
echo "Always check: / /login /dashboard /settings /api/health"
Default pages to monitor if nothing found: /, and the homepage only.
Monitor for the specified duration (default: 10 minutes). Run a check every 60 seconds.
Each check cycle:
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
CHECK_NUM=$((CHECK_NUM + 1))
for PAGE_PATH in "${PAGES[@]}"; do
# Check HTTP status and response time
RESULT=$(curl -sf -o /dev/null -w "%{http_code}|%{time_total}" \
--max-time 10 "${BASE_URL}${PAGE_PATH}" 2>/dev/null || echo "0|0")
STATUS=$(echo "$RESULT" | cut -d'|' -f1)
TIME_MS=$(echo "$RESULT" | awk -F'|' '{printf "%.0f", $2 * 1000}')
# Compare against baseline
BASELINE_STATUS=$(jq -r ".pages[\"${PAGE_PATH}\"].status // 200" .canary/baselines/baseline.json 2>/dev/null)
BASELINE_TIME=$(jq -r ".pages[\"${PAGE_PATH}\"].time_ms // 1000" .canary/baselines/baseline.json 2>/dev/null)
echo " [Check #${CHECK_NUM}] ${PAGE_PATH}: HTTP ${STATUS} (${TIME_MS}ms)"
done
Alert levels:
| Level | Condition | Trigger |
|---|---|---|
| CRITICAL | Page load failure | HTTP status is not 2xx, curl timeout, DNS failure |
| HIGH | New errors | Error rate increased vs baseline (console errors, 5xx responses) |
| MEDIUM | Performance regression | Response time exceeds 2x baseline |
| LOW | New broken links | Previously-working routes now return 404 |
Key principles:
When a CRITICAL or HIGH alert fires (2 consecutive checks):
CANARY ALERT
=====================================
Time: [check #N at Xs elapsed]
Page: [URL]
Level: [CRITICAL / HIGH / MEDIUM / LOW]
Finding: [what changed, be specific]
Baseline: [baseline value]
Current: [current value]
=====================================
Options:
A) Investigate now: stop monitoring, focus on this issue
B) Continue monitoring: wait for next check to confirm
C) Rollback: revert the deploy
D) Dismiss: known issue, continue monitoring
After monitoring completes (or user stops), produce a summary.
CANARY REPORT: [url]
=========================================
Duration: [X minutes]
Checks: [N total per page]
Pages: [N pages monitored]
Commit: [deployed SHA]
Status: [HEALTHY / DEGRADED / BROKEN]
Per-Page Results:
-----------------------------------------
Page Status Avg Time Alerts
/ HEALTHY 450ms 0
/dashboard DEGRADED 1100ms 1 medium (was 450ms)
/settings HEALTHY 380ms 0
/api/health HEALTHY 45ms 0
Alerts Fired: [N] (X critical, Y high, Z medium, W low)
VERDICT: [DEPLOY HEALTHY / DEPLOY HAS ISSUES (see alerts above)]
=========================================
Save report to .canary/reports/<date>-canary.md.
If the deploy is healthy and the user wants to update the baseline:
cp .canary/reports/latest-snapshot.json .canary/baselines/baseline.json
echo "Baseline updated to commit $(git rev-parse --short HEAD)"
See Phase 5 above for the full CANARY REPORT template.
Inline alert format (during monitoring):
[08:42:15] Check #3: /dashboard: ALERT HIGH, response time 1250ms (baseline: 420ms)
[08:43:15] Check #4: /dashboard: ALERT HIGH, response time 1180ms (baseline: 420ms)
-> Consistent across 2 checks. Firing alert.
/canary https://app.example.com # Monitor homepage for 10 min
/canary https://app.example.com --baseline # Capture baseline before deploying
/canary https://app.example.com --duration 5m # Monitor for 5 minutes
/canary https://app.example.com --quick # Single-pass health check (no loop)
/canary https://app.example.com --pages /,/dashboard,/api/health
/canary <url> --baseline.canary/baselines/ in git so any team member can run canary against the same baseline/ship: pre-deploy checklist (run before deploying)/land-and-deploy: full merge-to-verify pipeline (runs canary automatically)/qa: interactive QA testing before shipping$ARGUMENTS