---
name: monitor
description: >-
  Use when the user asks to monitor, health-check, watch, or check status of
  services, pipelines, code, teams, builds, deployments, or CI. Also triggers
  for "is X healthy", "check on Y", "watch Z". Runs one-shot or recurring,
  parameterized by data source.
user-invocable: true
allowed-tools: Bash, Read, Grep, WebSearch, WebFetch, Agent
---
# Monitor

## Strategy
- Select preset (determines data source and key metrics)
- Gather health data from the source
- Compare against baseline/thresholds
- Rate: healthy / degraded / critical
- Format report
- Alert if critical
- If `--recurring`: schedule next run
Exit: report delivered. For recurring: runs until cancelled.
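
As a sketch, the one-shot and recurring paths reduce to the following. The `gather`, `assess`, and `send_alert` helpers are illustrative stand-ins for the agent calls defined in the Agents section below, not real tooling shipped with this skill:

```python
# Minimal orchestration sketch of the strategy above; helper functions are
# hypothetical stand-ins for the GATHER and ASSESS agents, not real tooling.
import time

def gather(preset: str, target: str) -> dict:
    # GATHER phase (haiku agent): raw metrics with timestamps.
    return {"latency_p99_ms": 412.0, "error_rate": 0.012}

def assess(metrics: dict, baseline_days: int = 14) -> tuple[str, str]:
    # ASSESS + REPORT phase (sonnet agent): compare vs baseline, rate, format.
    rating = "DEGRADED" if metrics["error_rate"] > 0.01 else "HEALTHY"
    return rating, f"{rating}: {metrics}"

def send_alert(report: str) -> None:
    # Critical results go to the --alert channel.
    print("ALERT:", report)

def monitor_once(preset: str, target: str) -> str:
    rating, report = assess(gather(preset, target))
    if rating == "CRITICAL":
        send_alert(report)
    return report

def monitor_recurring(preset: str, target: str, interval_s: int) -> None:
    while True:  # --recurring: repeats until cancelled
        monitor_once(preset, target)
        time.sleep(interval_s)
```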
## Report requirements (every report MUST include)
- Specific metric values: exact numbers, not just "high" or "degraded"
- Trend direction: is each metric improving, stable, or worsening vs baseline?
- Actionable items: concrete next steps ranked by urgency, not just observations
- Anomaly callouts: flag anything outside expected range with the specific threshold breached
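
For example, a report meeting all four requirements might look like this (all numbers invented for illustration):

```
Health report: my-api (--service): DEGRADED

- Latency p99: 412 ms (baseline 380 ms), worsening
- Error rate: 1.2% (baseline 0.4%), worsening. ANOMALY: >2 stddev above baseline
- Throughput: 1.8k rps (baseline 1.9k rps), stable

Next steps (by urgency):
1. Investigate the error-rate spike; correlate with the most recent deploy.
2. Re-check latency once the error rate recovers.
```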
Note: Placeholders like {user_question} in Agent prompts are filled by you (Claude)
from the current task context. They are not template variables — read the user input,
gather the relevant context, and substitute before spawning the agent.
## Agents

### GATHER phase

```
Agent(subagent_type="Explore", model="haiku", prompt="""
Gather health data for: {target}
Preset: {preset}

Data sources to check:
{preset_data_sources}

Key metrics to collect:
{preset_metrics}

Output: raw metrics with timestamps.
""")
```
### ASSESS + REPORT phase

```
Agent(model="sonnet", prompt="""
Health data:
{gathered_metrics}

Baseline (last {baseline_days} days):
{baseline_data}

1. Compare current vs baseline
2. Flag anomalies (>2 stddev from baseline)
3. Rate overall: HEALTHY / DEGRADED / CRITICAL
4. Format as a concise health report
""")
```
## Presets

| Preset | Sources | Metrics |
|---|---|---|
| --service NAME | Observability platform, tracing | Latency p50/p99, error rate, throughput, instance count |
| --pipeline NAME | Pipeline orchestrator | Success rate, SLA compliance, last failure |
| --ci | CI/CD platform | Build success rate, flaky test %, avg build time |
| --deploy APP | Deployment platform | Deploy state, canary score, pending constraints |
| --ml FLOW | ML platform | Run status, latest metrics, accuracy trend |
| --code REPO | Git history | TODO/FIXME count, test coverage, PR merge rate |
| --deps REPO | pip/npm audit, CVE DBs | Outdated count, critical CVEs, last updated |
| --docs | Documentation platform | Pages not updated in 90d, broken links |
| --team NAME | GitHub, Slack, Jira | PR velocity, open issues, Slack activity |
| --oncall | PagerDuty, incident log | Open incidents, MTTR, handoff notes |
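
The Sources and Metrics columns are what fill the {preset_data_sources} and {preset_metrics} placeholders in the GATHER prompt. A sketch of that substitution (the registry shape is illustrative, not part of the skill):

```python
# Hypothetical preset registry mirroring the table above (abridged).
PRESETS: dict[str, dict[str, str]] = {
    "--service":  {"sources": "Observability platform, tracing",
                   "metrics": "Latency p50/p99, error rate, throughput, instance count"},
    "--pipeline": {"sources": "Pipeline orchestrator",
                   "metrics": "Success rate, SLA compliance, last failure"},
    "--ci":       {"sources": "CI/CD platform",
                   "metrics": "Build success rate, flaky test %, avg build time"},
    # ... remaining presets follow the same shape
}

def fill_gather_prompt(template: str, preset: str, target: str) -> str:
    p = PRESETS[preset]
    return (template.replace("{target}", target)
                    .replace("{preset}", preset)
                    .replace("{preset_data_sources}", p["sources"])
                    .replace("{preset_metrics}", p["metrics"]))
```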
## Cross-provider review
When cross-provider tools are available, run verification on a non-Claude model
in parallel for maximum blind-spot diversity.
## Flags

- `--recurring=INTERVAL` — repeat on schedule (daily, hourly, etc.)
- `--alert=CHANNEL` — notification channel for critical alerts
- `--baseline=N` — days of baseline data (default: 14)
## Examples

```
/monitor --service my-api
/monitor --pipeline etl-daily --recurring=daily
/monitor --ci --alert=#builds
/monitor --team platform-eng
/monitor --code --baseline=30
/monitor --oncall
```
## Execution routing (sagaflow-first)
Sagaflow is the default execution path. The in-session workflow above is the FALLBACK, used only when the sagaflow worker is confirmed unavailable.
Routing sequence (mandatory before any in-session work):

0. If this is a one-shot health check (not --recurring) → produce the report directly in-session. Do NOT route to sagaflow.
1. Run `sagaflow doctor`.
2. If healthy → launch via sagaflow below. Stop. Do not run in-session.
3. If unhealthy → log `SAGAFLOW_UNAVAILABLE` and proceed with the in-session fallback.
Launch command:

```
Bash(
  run_in_background=true,
  command="sagaflow launch monitor --arg target='<TARGET>' --arg preset=auto --arg recurring=false --await"
)
```
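
A sketch of this routing, assuming only that `sagaflow doctor` exits nonzero when unhealthy (that exit-code behavior is an assumption, not documented here):

```python
# Illustrative routing sketch; assumes `sagaflow doctor` exits nonzero when unhealthy.
import subprocess
import sys

def route(target: str) -> None:
    doctor = subprocess.run(["sagaflow", "doctor"])
    if doctor.returncode == 0:
        # Healthy: hand off to sagaflow (mirrors the Bash launch command above).
        subprocess.Popen([
            "sagaflow", "launch", "monitor",
            "--arg", f"target={target}",
            "--arg", "preset=auto",
            "--arg", "recurring=false",
            "--await",
        ])
    else:
        print("SAGAFLOW_UNAVAILABLE", file=sys.stderr)
        # Fall back to the in-session workflow described above.
```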