bat-adhoc

Name: Bat Adhoc
Author: homeassistant-ai

// Run bot acceptance tests to validate MCP tools work correctly from a real AI agent's perspective. Use when testing PRs, detecting regressions, or verifying tool changes end-to-end with Claude/Gemini CLIs.


stars: 3,173

forks: 124

updated: February 17, 2026 at 07:44


Related skills (same repository)

issue-to-pr-resolver.md

from "homeassistant-ai/ha-mcp"

Implement a GitHub issue end-to-end — create a worktree branch, implement the feature with tests, create a draft PR, then iteratively resolve all CI failures and review comments until the PR is clean. Use when you need to fully implement a GitHub issue from start to merge-ready. Triggers on "implement issue", "resolve issue", "/issue-to-pr-resolver <number>".

2026-05-18 · 3.2k

my-pr-checker.md

from "homeassistant-ai/ha-mcp"

Manage your own GitHub pull requests — check CI status, inline review comments, PR-level comments, resolve review threads, fix issues, and iterate until all checks pass and threads are resolved. Use for managing your own PRs (not external contributions). Triggers on "check my PR", "check PR", "/my-pr-checker <number>".

2026-05-18 · 3.2k

issue-analysis.md

from "homeassistant-ai/ha-mcp"

Deep analysis of a single GitHub issue with codebase exploration, implementation planning, and architectural assessment. Use when you need to analyze a GitHub issue, assess its complexity, plan implementation approaches, and post a structured analysis comment. Triggers on "analyze issue", "deep analysis", "/issue-analysis <number>".

2026-05-16 · 3.2k

wt.md

from "homeassistant-ai/ha-mcp"

Create a git worktree in worktree/ subdirectory with up-to-date master

2026-05-06 · 3.2k

contrib-pr-review.md

from "homeassistant-ai/ha-mcp"

Review a contribution PR for safety, quality, and readiness. Checks for security concerns, test coverage, size appropriateness, and intent alignment. Use when reviewing external contributions.

2026-03-13 · 3.2k

bat-story-eval.md

from "homeassistant-ai/ha-mcp"

Compare MCP tool behavior between target and baseline versions using pre-built and custom stories with diff-based triage.

2026-02-17 · 3.2k

package.json

"author": "homeassistant-ai"

"repository": "homeassistant-ai/ha-mcp"


Relevant occupation (SOC 15-1253): Software Quality Assurance Analysts and Testers; Computer and Mathematical Occupations; L4

name	bat-adhoc
description	Run bot acceptance tests to validate MCP tools work correctly from a real AI agent's perspective. Use when testing PRs, detecting regressions, or verifying tool changes end-to-end with Claude/Gemini CLIs.
disable-model-invocation	true
argument-hint	["scenario-description or --help"]
allowed-tools	Bash, Read, Write

BAT - Bot Acceptance Testing

Bot acceptance testing validates that MCP tools work correctly from a real AI agent's perspective. You design test scenarios dynamically, run them via tests/uat/run_uat.py, and evaluate results.

When to Use BAT

PR validation: Test that tool changes work correctly from an agent's perspective
Regression detection: Compare behavior between branches
Integration verification: Ensure MCP tools work end-to-end with real agent CLIs

Workflow

Analyze the change: Read the diff, identify which tools are affected
Design scenario: Generate a scenario JSON with setup/test/teardown prompts
Run the script: Pipe the scenario to python tests/uat/run_uat.py
Evaluate summary: Check all_passed per agent. If true, you're done.
Dig deeper on failure: Read results_file for full output, stderr, raw JSON
Regression check: If the test fails, re-run with --branch master to compare against the baseline

Output Structure

The runner prints a concise summary to stdout (this saves context when everything passes):

{
  "results_file": "/tmp/bat_results_abc123.json",
  "agents": {
    "gemini": {
      "all_passed": true,
      "test": {
        "completed": true,
        "duration_ms": 8100,
        "exit_code": 0,
        "num_turns": 5,
        "tool_stats": { "totalCalls": 4, "totalSuccess": 4, "totalFail": 0 }
      },
      "aggregate": {
        "total_duration_ms": 15300,
        "total_turns": 12,
        "total_tool_calls": 9,
        "total_tool_success": 9,
        "total_tool_fail": 0
      }
    }
  }
}

Phase stats: num_turns, tool_stats (per phase) for fine-grained comparison
Aggregate stats: Total counts across all phases for overall efficiency comparison
On failure: also includes output and stderr for diagnosis
Full results: raw JSON, complete output always available at results_file
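
As a minimal sketch of the "Evaluate summary" step, the snippet below parses a summary in the shape shown above and lists the agents that did not pass. The helper name failed_agents is hypothetical, and the sample is a trimmed-down copy of the example output rather than real runner output.

```python
import json


def failed_agents(summary: dict) -> list[str]:
    """Return the names of agents whose all_passed flag is missing or false."""
    return [
        name
        for name, result in summary.get("agents", {}).items()
        if not result.get("all_passed", False)
    ]


# Trimmed-down summary in the documented shape; a real one comes from the runner's stdout.
raw_summary = """
{
  "results_file": "/tmp/bat_results_abc123.json",
  "agents": {
    "gemini": {"all_passed": true}
  }
}
"""

summary = json.loads(raw_summary)
failing = failed_agents(summary)
if failing:
    # On failure, the full details live in the file named by results_file.
    print(f"Failed agents: {failing}; inspect {summary['results_file']}")
else:
    print("All agents passed")
```

Treating a missing all_passed key as a failure is a deliberately conservative choice in this sketch.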

Scenario Design Guidelines

setup_prompt: Create any entities/state the test needs
test_prompt: Exercise the tools being tested, ask the agent to report results clearly
teardown_prompt: Clean up created entities
Keep prompts focused - each scenario tests ONE behavior
Ask the agent to report: what succeeded, what failed, any unexpected behavior
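
The guidelines above can be sketched as a small scenario builder. build_scenario is a hypothetical helper, not part of the repository. It assumes the runner accepts a scenario that contains only test_prompt, as the quick-comparison commands in this document suggest.

```python
import json
from typing import Optional


def build_scenario(
    test_prompt: str,
    setup_prompt: Optional[str] = None,
    teardown_prompt: Optional[str] = None,
) -> str:
    """Serialize a scenario to JSON, omitting phases that were not provided."""
    if not test_prompt.strip():
        raise ValueError("test_prompt is required and must not be empty")
    phases = {
        "setup_prompt": setup_prompt,
        "test_prompt": test_prompt,
        "teardown_prompt": teardown_prompt,
    }
    # Drop unused phases so the scenario stays focused on one behavior.
    return json.dumps({key: value for key, value in phases.items() if value}, indent=2)


scenario_json = build_scenario(
    test_prompt=(
        "Get automation 'automation.bat_error_test'. Report what succeeded, "
        "what failed, and any unexpected behavior."
    ),
    setup_prompt="Create a test automation called 'bat_error_test'.",
    teardown_prompt="Delete automation 'bat_error_test' if it exists.",
)
print(scenario_json)
```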

Example: Testing Error Signaling

cat <<'EOF' | python tests/uat/run_uat.py --agents gemini
{
  "setup_prompt": "Create a test automation called 'bat_error_test' with a time trigger at 23:59 and action to turn on light.bed_light.",
  "test_prompt": "Try to get automation 'automation.nonexistent_xyz'. Report if the tool signaled an error or returned a normal response. Then get automation 'automation.bat_error_test' and report its structure.",
  "teardown_prompt": "Delete automation 'bat_error_test' if it exists."
}
EOF

Regression Comparison Workflow

Full BAT comparison (recommended):

Pull latest master: git fetch origin master && git checkout master && git pull
Run on master: Save the scenario to a file, run it, and save the results
Switch to branch: git checkout feat/my-branch
Run on branch: Run the same scenario and compare the stats

Compare these metrics:

Primary (decide pass/fail on these):

Task completion: Did both pass? Any new failures?
Accuracy: Check agent output quality - did it understand the task correctly?
Tool success rate: Compare aggregate.total_tool_calls vs aggregate.total_tool_fail

Secondary (report but don't decide on these alone):

Tool call count: Compare aggregate.total_tool_calls, aggregate.total_turns — directional signal, not conclusive (agent exploration varies between runs)
Duration: Compare aggregate.total_duration_ms — noisy due to network, cache misses, server load. Only flag large (>2x) regressions.
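
One way to apply these metrics is sketched below. The functions tool_success_rate and compare_aggregates are hypothetical, and the candidate numbers are made up for illustration; only the baseline values are taken from the example summary earlier in this document. The sketch covers the numeric metrics only, since task completion and accuracy still need a human or agent reading the output.

```python
def tool_success_rate(aggregate: dict) -> float:
    """Fraction of tool calls that did not fail (1.0 when no calls were made)."""
    calls = aggregate["total_tool_calls"]
    if calls == 0:
        return 1.0
    return (calls - aggregate["total_tool_fail"]) / calls


def compare_aggregates(baseline: dict, candidate: dict, slow_factor: float = 2.0) -> dict:
    """Compare two aggregate blocks; only duration regressions above slow_factor are flagged."""
    return {
        # Primary metric: a negative delta means the candidate fails more tool calls.
        "success_rate_delta": tool_success_rate(candidate) - tool_success_rate(baseline),
        # Secondary metrics: directional signals only.
        "extra_tool_calls": candidate["total_tool_calls"] - baseline["total_tool_calls"],
        "extra_turns": candidate["total_turns"] - baseline["total_turns"],
        "duration_flagged": candidate["total_duration_ms"]
        > slow_factor * baseline["total_duration_ms"],
    }


# Baseline: the aggregate block from the example summary.
baseline = {
    "total_duration_ms": 15300,
    "total_turns": 12,
    "total_tool_calls": 9,
    "total_tool_success": 9,
    "total_tool_fail": 0,
}
# Candidate: hypothetical numbers for a branch run.
candidate = {
    "total_duration_ms": 33000,
    "total_turns": 14,
    "total_tool_calls": 10,
    "total_tool_success": 9,
    "total_tool_fail": 1,
}

report = compare_aggregates(baseline, candidate)
print(report)
```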

Robustness tip: Ask the same task in different ways (variation testing) to check if results are consistent across phrasings.

Quick comparison (single command):

# Test the PR branch
echo '{"test_prompt":"..."}' | python tests/uat/run_uat.py --branch feat/tool-errors --agents gemini

# Compare against master
echo '{"test_prompt":"..."}' | python tests/uat/run_uat.py --branch master --agents gemini
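
The same two runs can be driven from Python, sketched here under some assumptions: stdout is assumed to contain only the JSON summary, a single agent name is passed to --agents, and the helper names build_command and run_scenario are hypothetical. Only the --agents and --branch flags shown above are used.

```python
import json
import subprocess
import sys
from typing import Optional


def build_command(agents: str, branch: Optional[str] = None) -> list[str]:
    """Build the runner command line using only flags shown in this document."""
    command = [sys.executable, "tests/uat/run_uat.py", "--agents", agents]
    if branch:
        command += ["--branch", branch]
    return command


def run_scenario(scenario: dict, agents: str = "gemini", branch: Optional[str] = None) -> dict:
    """Pipe the scenario to the runner on stdin and parse the summary from stdout."""
    completed = subprocess.run(
        build_command(agents, branch),
        input=json.dumps(scenario),
        capture_output=True,
        text=True,
        check=False,  # the runner's exit-code convention is not documented here
    )
    # Assumption: stdout holds exactly one JSON document (the summary).
    return json.loads(completed.stdout)


if __name__ == "__main__":
    scenario = {"test_prompt": "..."}  # placeholder prompt, as in the commands above
    branch_summary = run_scenario(scenario, branch="feat/tool-errors")
    master_summary = run_scenario(scenario, branch="master")
    print(branch_summary["agents"].keys(), master_summary["agents"].keys())
```

Run it from the repository root so that the relative path tests/uat/run_uat.py resolves.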

Cost Awareness

Each scenario invocation costs API credits (one per agent per phase). Design scenarios efficiently:

Combine related checks in a single test_prompt when possible
Only use setup/teardown when the test needs specific state
Start with one agent, expand to both only when cross-agent comparison matters
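
To make the cost rule concrete, the sketch below counts agent invocations for a scenario. It assumes that "one per agent per phase" means every non-empty phase triggers one run per agent. The function count_invocations and the agent name "claude" are illustrative assumptions; only "gemini" appears in the commands in this document.

```python
PHASES = ("setup_prompt", "test_prompt", "teardown_prompt")


def count_invocations(scenario: dict, agents: list[str]) -> int:
    """Estimate agent runs: one per agent for each phase that has a prompt."""
    active_phases = [phase for phase in PHASES if scenario.get(phase)]
    return len(active_phases) * len(agents)


full_scenario = {
    "setup_prompt": "Create a test automation called 'bat_error_test'.",
    "test_prompt": "Get automation 'automation.bat_error_test' and report its structure.",
    "teardown_prompt": "Delete automation 'bat_error_test' if it exists.",
}
test_only_scenario = {"test_prompt": "Get automation 'automation.nonexistent_xyz'."}

print(count_invocations(full_scenario, ["gemini", "claude"]))  # 3 phases x 2 agents
print(count_invocations(test_only_scenario, ["gemini"]))       # 1 phase x 1 agent
```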

Handling Arguments

When /bat-adhoc is invoked with arguments:

If arguments contain a scenario description, generate the JSON scenario and run it:

/bat-adhoc test automation create with sunrise trigger then modify to sunset

→ Generate appropriate scenario JSON and execute

If --help or no arguments, show this help text.

Otherwise, treat $ARGUMENTS as instructions for what to test and design+run the scenario accordingly.
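
The argument rules above reduce to a two-way decision, sketched here for illustration. choose_action is a hypothetical function; in practice the agent applies these rules itself, and a scenario description and free-form instructions both lead to designing and running a scenario.

```python
def choose_action(arguments: str) -> str:
    """Map the raw argument string to 'show_help' or 'design_and_run'."""
    tokens = arguments.split()
    # No arguments, or an explicit --help flag, shows the help text.
    if not tokens or "--help" in tokens:
        return "show_help"
    # Anything else is treated as a description of what to test.
    return "design_and_run"


print(choose_action(""))
print(choose_action("--help"))
print(choose_action("test automation create with sunrise trigger then modify to sunset"))
```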

Full Documentation

For complete CLI reference and output format, see tests/uat/README.md.
