en un clic
bat-adhoc
// Run bot acceptance tests to validate MCP tools work correctly from a real AI agent's perspective. Use when testing PRs, detecting regressions, or verifying tool changes end-to-end with Claude/Gemini CLIs.
// Run bot acceptance tests to validate MCP tools work correctly from a real AI agent's perspective. Use when testing PRs, detecting regressions, or verifying tool changes end-to-end with Claude/Gemini CLIs.
Implement a GitHub issue end-to-end — create a worktree branch, implement the feature with tests, create a draft PR, then iteratively resolve all CI failures and review comments until the PR is clean. Use when you need to fully implement a GitHub issue from start to merge-ready. Triggers on "implement issue", "resolve issue", "/issue-to-pr-resolver <number>".
Manage your own GitHub pull requests — check CI status, inline review comments, PR-level comments, resolve review threads, fix issues, and iterate until all checks pass and threads are resolved. Use for managing your own PRs (not external contributions). Triggers on "check my PR", "check PR", "/my-pr-checker <number>".
Deep analysis of a single GitHub issue with codebase exploration, implementation planning, and architectural assessment. Use when you need to analyze a GitHub issue, assess its complexity, plan implementation approaches, and post a structured analysis comment. Triggers on "analyze issue", "deep analysis", "/issue-analysis <number>".
Create a git worktree in worktree/ subdirectory with up-to-date master
Review a contribution PR for safety, quality, and readiness. Checks for security concerns, test coverage, size appropriateness, and intent alignment. Use when reviewing external contributions.
Compare MCP tool behavior between target and baseline versions using pre-built and custom stories with diff-based triage.
| name | bat-adhoc |
| description | Run bot acceptance tests to validate MCP tools work correctly from a real AI agent's perspective. Use when testing PRs, detecting regressions, or verifying tool changes end-to-end with Claude/Gemini CLIs. |
| disable-model-invocation | true |
| argument-hint | ["scenario-description or --help"] |
| allowed-tools | Bash, Read, Write |
Bot acceptance testing validates that MCP tools work correctly from a real AI agent's perspective. You design test scenarios dynamically, run them via tests/uat/run_uat.py, and evaluate results.
python tests/uat/run_uat.pyall_passed per agent. If true, you're done.results_file for full output, stderr, raw JSON--branch master to compareThe runner returns a concise summary to stdout (saves context when all passes):
{
"results_file": "/tmp/bat_results_abc123.json",
"agents": {
"gemini": {
"all_passed": true,
"test": {
"completed": true,
"duration_ms": 8100,
"exit_code": 0,
"num_turns": 5,
"tool_stats": { "totalCalls": 4, "totalSuccess": 4, "totalFail": 0 }
},
"aggregate": {
"total_duration_ms": 15300,
"total_turns": 12,
"total_tool_calls": 9,
"total_tool_success": 9,
"total_tool_fail": 0
}
}
}
}
num_turns, tool_stats (per phase) for fine-grained comparisonoutput and stderr for diagnosisresults_filecat <<'EOF' | python tests/uat/run_uat.py --agents gemini
{
"setup_prompt": "Create a test automation called 'bat_error_test' with a time trigger at 23:59 and action to turn on light.bed_light.",
"test_prompt": "Try to get automation 'automation.nonexistent_xyz'. Report if the tool signaled an error or returned a normal response. Then get automation 'automation.bat_error_test' and report its structure.",
"teardown_prompt": "Delete automation 'bat_error_test' if it exists."
}
EOF
Full BAT comparison (recommended):
git fetch origin master && git checkout master && git pullgit checkout feat/my-branchCompare these metrics:
Primary (decide pass/fail on these):
aggregate.total_tool_calls vs total_tool_failSecondary (report but don't decide on these alone):
aggregate.total_tool_calls, aggregate.total_turns — directional signal, not conclusive (agent exploration varies between runs)aggregate.total_duration_ms — noisy due to network, cache misses, server load. Only flag large (>2x) regressions.Robustness tip: Ask the same task in different ways (variation testing) to check if results are consistent across phrasings.
Quick comparison (single command):
# Test the PR branch
echo '{"test_prompt":"..."}' | python tests/uat/run_uat.py --branch feat/tool-errors --agents gemini
# Compare against master
echo '{"test_prompt":"..."}' | python tests/uat/run_uat.py --branch master --agents gemini
Each scenario invocation costs API credits (one per agent per phase). Design scenarios efficiently:
When /bat-adhoc is invoked with arguments:
If arguments contain a scenario description, generate the JSON scenario and run it:
/bat-adhoc test automation create with sunrise trigger then modify to sunset
→ Generate appropriate scenario JSON and execute
If --help or no arguments, show this help text.
Otherwise, treat $ARGUMENTS as instructions for what to test and design+run the scenario accordingly.
For complete CLI reference and output format, see tests/uat/README.md.