| name | dba |
| description | Multi-cluster MySQL/Vitess database health check specialist. Scans ALL clusters via dba-investigate.py orchestrator, classifies per-cluster health against 7-day baselines, runs causal analysis, and produces structured verdicts. USE WHEN investigating database issues, MySQL/Vitess health, replication lag, freno throttling, lock contention, query performance problems, or table traffic anomalies. |
| metadata | {"triggers":["dba","database","mysql","vitess","replication lag","freno","deadlock","lock contention","slow queries","proxysql","cluster health"],"provides":["database-health-check","mysql-analysis","structured-verdict"],"requires":["pup-cli","DD_API_KEY","DD_APP_KEY"]} |
DBA Skill
This skill wraps the DBA agent. Full instructions live in the agent definition:
Agent file: $HOME/.pi/agent/agents/dba.md
Read that file and follow its execution procedure exactly.
Quick Summary
The DBA agent uses dba-investigate.py to:
- Plan — generate Datadog + Kusto query plans for all clusters
- Execute — run all planned queries via pup CLI and Kusto REST API
- Analyze — classify per-cluster health (5-tier: normal → critical)
- Deep-dive — symptom-driven diagnostic queries on problem clusters
- Causal analysis — apply 25 causal rules to identify root causes
- Report — markdown or JSON output
Orchestrator
$HOME/.pi/agent/skills/datadog/tools/dba-investigate.py
All metric definitions, classification thresholds, causal rules, and I/O logic live in the orchestrator. The agent runs it and executes the queries it generates.
Primary Command
Use the run subcommand — it chains all phases (plan → execute → collect-kusto → analyze → deep-dive-plan → execute deep-dive → causal analysis → report) in a single invocation:
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py run \
--start "$START" --end "$END" --id "$ID" \
--service "$SERVICE" \
--concurrency 6 \
--format markdown
Subagent mode (orchestrated): when running as a subagent, pass --deep-dive-top 30 --format subagent. The default of 200 deep-dive clusters is too slow for multi-agent orchestration. In subagent mode the total number of deep-dive queries is also auto-capped at 200 (see --max-deep-dive-queries).
Key flags:
--service — auto-resolves owned + dependency MySQL clusters via services-context
--clusters — comma-separated explicit cluster filter (overrides --service)
--deep-dive-top N — max clusters for deep-dive (default: 200, use 30 for subagent mode)
--max-deep-dive-queries N — cap total deep-dive queries (auto: 200 in subagent mode)
--concurrency N — parallel pup queries (default: 6, max safe: 6)
--all-tiers — include all tiers (default: tier 0+1 only)
--format — report format: markdown (default), json, subagent
--html — also generate an HTML report (a separate flag, independent of --format)
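The flag interactions above (--clusters overriding --service, and the different deep-dive settings for subagent mode) can be summarised in a small sketch. This is only an illustration of how the documented flags combine. The helper name `build_run_argv` and its parameters are hypothetical and not part of the orchestrator. It only builds the argument list for `dba-investigate.py run` and does not run any pup queries itself.

```python
import os

# Path taken from the Orchestrator section; the helper below is hypothetical.
ORCHESTRATOR = os.path.expandvars(
    "$HOME/.pi/agent/skills/datadog/tools/dba-investigate.py"
)


def build_run_argv(start, end, run_id, service=None, clusters=None, subagent=False):
    """Return the argv for `dba-investigate.py run` using only documented flags."""
    argv = ["python3", ORCHESTRATOR, "run",
            "--start", start, "--end", end, "--id", run_id]
    if clusters:
        # --clusters overrides --service, so only one of the two is passed.
        argv += ["--clusters", ",".join(clusters)]
    elif service:
        argv += ["--service", service]
    # 6 is both the documented default and the documented safe maximum.
    argv += ["--concurrency", "6"]
    if subagent:
        # Subagent mode: fewer deep-dive clusters and the subagent report format.
        argv += ["--deep-dive-top", "30", "--format", "subagent"]
    else:
        argv += ["--format", "markdown"]
    return argv
```

The resulting list can be handed to whatever mechanism the agent already uses to invoke the orchestrator. The values for start, end, and the run id are whatever the agent definition computes as defaults.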
Individual Phase Commands (Reference)
For debugging or partial re-runs, individual subcommands are available:
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py plan --start "$START" --end "$END" --id "$ID" [--service "$SERVICE"]
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py execute --id "$ID"
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py collect-kusto --id "$ID"
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py analyze --id "$ID"
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py deep-dive-plan --id "$ID"
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py execute --id "$ID" --plan deep-dive
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py collect-kusto --id "$ID" --deep-dive
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py deep-dive-status --id "$ID"
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py analyze-deep-dive --id "$ID"
python3 $HOME/.pi/agent/skills/datadog/tools/dba-investigate.py report --id "$ID" --format markdown
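For a partial re-run it can help to see which commands remain after a given phase. The sketch below assumes the subcommands are meant to run in the order listed above, once `plan` has already produced its output. `PHASES` and `commands_from` are hypothetical names used only for illustration. The function returns the command lines to run and executes nothing itself.

```python
# Post-plan subcommands in the order they are listed in this section,
# each with the extra flags shown there. Assumed to be the execution order.
PHASES = [
    ("execute", []),
    ("collect-kusto", []),
    ("analyze", []),
    ("deep-dive-plan", []),
    ("execute", ["--plan", "deep-dive"]),
    ("collect-kusto", ["--deep-dive"]),
    ("deep-dive-status", []),
    ("analyze-deep-dive", []),
    ("report", ["--format", "markdown"]),
]


def commands_from(first_index, run_id, script="dba-investigate.py"):
    """Return the argv lists for every phase from PHASES[first_index] onward."""
    return [
        ["python3", script, subcommand, "--id", run_id, *extra]
        for subcommand, extra in PHASES[first_index:]
    ]


if __name__ == "__main__":
    # Example: resume at deep-dive-plan (index 3) for a placeholder run id.
    for argv in commands_from(3, "example-id"):
        print(" ".join(argv))
```

Resuming by index rather than by name is deliberate here, because `execute` and `collect-kusto` each appear twice with different flags.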
Rules
- Never ask for inputs — compute defaults and proceed immediately
- Use run for full investigations — ALWAYS prefer dba-investigate.py run which chains all phases. Use individual subcommands only for partial re-runs or debugging. NEVER write custom scripts, shell loops, or inline code to run pup queries.
- Read $HOME/.pi/agent/agents/dba.md for the complete execution procedure
- Use the report skill for publishing — see $HOME/.pi/agent/skills/report/SKILL.md if available for Pages, Slack, Discussion targets
- Plan posting — use src.lib.plan_engine for live Slack/GitHub progress tracking. Phase definitions in phase-steps.json.
Plan Posting (Slack)
Post live progress as a Slack plan block:
echo '{"phase": 3, "content": "Classifying per-cluster health..."}' | \
python3 $HOME/.pi/agent/skills/report/tools/post-plan.py \
--slack "$PERMALINK" --state-file /tmp/dba-state.json
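Hand-quoting the JSON payload in a shell string breaks as soon as the progress message contains a quote or a backslash. A minimal sketch of building the same payload safely is shown below. It assumes the payload has exactly the two keys used in the example above, `phase` and `content`. The helper name `plan_update` is hypothetical and not part of post-plan.py.

```python
import json


def plan_update(phase, content):
    """Serialize one progress update in the shape used by the echo example."""
    if not isinstance(phase, int) or isinstance(phase, bool):
        raise TypeError("phase must be an integer")
    if not isinstance(content, str) or not content:
        raise ValueError("content must be a non-empty string")
    # json.dumps escapes quotes, backslashes and newlines inside content.
    return json.dumps({"phase": phase, "content": content})


if __name__ == "__main__":
    # Prints the same JSON text as the echo command in the example above.
    print(plan_update(3, "Classifying per-cluster health..."))
```

The printed line can be piped to post-plan.py in place of the `echo` command.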