Run any Skill in Manus with one click

muggle-test

Run change-driven E2E acceptance testing using Muggle AI — detect the user's recent code changes (local diff or a PR), map them to affected user flows, then run real-browser acceptance tests on localhost or a deployed preview/staging URL, publish results to the dashboard, and post a screenshot summary to the PR. Use this whenever the user wants to test, validate, check, or regression-test their own changes or work — 'test my changes', 'validate my changes', 'regression test my work', 'make sure I didn't break anything', 'did my recent commits/changes break anything or any user flows?', 'test before I push' — and especially as the acceptance-test gate before opening a pull request or merging (e.g. 'validate my changes before I open the PR' means run the acceptance suite, not just a completion checklist). The defining signal is change-driven validation of in-progress work tied to a commit, push, PR, or merge. For testing one specific named feature/flow use muggle-test-feature-local; this is not importing existi

Run Skill in Manus

Overview

Install command

npx skills add https://github.com/multiplex-ai/muggle-ai-works --skill muggle-test

Copy and paste this command into Claude Code to install the skill

Source

multiplex-ai/muggle-ai-works

Stars2

Forks2

UpdatedJune 3, 2026 at 22:23

SKILL.md

readonly

Muggle Test — Change-Driven E2E Acceptance Router

Telemetry first step: see _shared/telemetry-emit.md. Use skillName: "muggle-test".

A router skill that detects code changes, resolves impacted test cases, executes them locally or remotely, publishes results to the Muggle AI dashboard, and posts E2E acceptance summaries to the PR. The user can invoke this at any moment, in any state.

UX Guidelines — Minimize Typing

Every selection-based question MUST use the AskUserQuestion tool (or the platform's equivalent structured selection tool). Never ask the user to "reply with a number" in a plain text message — always present clickable options.

Selections (project, use case, test case, mode, approval): Use AskUserQuestion with labeled options the user can click.
Multi-select (use cases, test cases): Use AskUserQuestion with allow_multiple: true.
Free-text inputs (URLs, descriptions): Only use plain text prompts when there is no finite set of options. Even then, offer a detected/default value when possible.
Batch related questions: If two questions are independent, present them together in a single AskUserQuestion call rather than asking sequentially.
Parallelize job-creation calls: Whenever you're kicking off N independent cloud jobs — creating multiple use cases, generating/creating multiple test cases, fetching details for multiple test cases, starting multiple remote workflows, publishing multiple local runs, or fetching per-step screenshots for multiple runs — issue all N tool calls in a single message so they run in parallel. Never loop them sequentially unless there is a real ordering constraint (e.g. a single local Electron browser that can only run one test at a time).

Test Case Design: One Atomic Behavior Per Test Case

Every test case verifies exactly one user-observable behavior. Never bundle multiple concerns, sequential flows, or bootstrap/setup into a single test case — even if you think it would be "cleaner" or "more efficient."

Ordering, dependencies, and bootstrap are Muggle Test's service responsibility, not yours. Muggle Test's cloud handles test case dependencies, prerequisite state, and execution ordering. Your job is to describe the atomic behavior to verify — never the flow that gets there.

❌ Wrong: one test case that "signs up, logs in, navigates to the detail modal, verifies icon stacking, verifies tab order, verifies history format, and verifies reference layout."
✅ Right: four separate test cases — one per verifiable behavior — each with instruction text like "Verify the detail modal shows stacked pair of icons per card" with no signup / login / navigation / setup language.

Never bake bootstrap into a test case description. Signup, login, seed data, prerequisite navigation, tear-down — none of these belong inside the test case body. Write only the verification itself. The service will prepend whatever setup is needed based on its own dependency graph.

Never consolidate the generator's output. When muggle-remote-test-case-generate-from-prompt returns N micro-tests from a single prompt, that decomposition is the authoritative one. Do not "merge them into 1 for simplicity," do not "rewrite them to share bootstrap," do not "collapse them to match a 4 UC / 4 TC plan." Accept what the generator gave you.

Never skip the generate→review cycle. Even when you are 100% confident about the right shape, always present the generated test cases to the user before calling muggle-remote-test-case-create. "I'll skip the generate→review cycle and create directly" is a sign you're about to get it wrong.

Preferences

Gates run per preference-gates/README.md.

Preference	Step	Decision it gates
`autoLogin`	3	Reuse saved credentials when auth is required
`autoSelectProject`	4	Reuse last-used Muggle Test project for this repo
`autoSelectLocalHost`	7A	Reuse last-used local dev server URL for this repo
`autoDetectChanges`	2	Scan local git changes and map to affected test cases
`defaultExecutionMode`	1	Default to local or remote test execution
`autoPublishLocalResults`	7A	Upload local results to Muggle Test cloud after run
`showElectronBrowser`	7A	Show the Electron browser window during local test execution (vs. run headless)
`postPRVisualWalkthrough`	9	Post visual walkthrough to PR after results are available
`autoCreatePR`	9 (if no PR)	Auto-create the PR when posting the walkthrough has no PR to target
`autoWatchPR`	9.5 (if a PR exists)	Start a `muggle-pr-followup` watcher on the PR after the run

Step 1: Confirm Scope of Work (Always First)

Parse the user's query and explicitly confirm their expectation. There are exactly two modes:

Mode A: Local Test Generation (default for PRs)

Test impacted use cases/test cases against localhost using the Electron browser.

Execution tool: muggle-local-execute-test-generation

Signs the user wants this: mentions "localhost", "local", "my machine", "dev server", "my changes locally", or just "test my changes" in a repo context. Also: passing a GitHub PR/issue/repo URL (github.com/<org>/<repo>/pull/<n>) defaults to Local mode — PR review almost always means checking out the branch and validating against the dev server, not testing the PR's preview deployment.

Mode B: Remote Test Generation

Ask Muggle Test's cloud to generate test scripts against a preview/staging URL.

Execution tool: muggle-remote-workflow-start-test-script-generation

Signs the user wants this: mentions "preview", "staging", "deployed", "preview URL", "test on preview", "test the deployment", or provides an actual deployed preview/staging URL (e.g. *.vercel.app, staging.foo.com, custom preview domains). GitHub PR URLs do not count — see Mode A.

Confirming (gated by `defaultExecutionMode`)

Gate defaultExecutionMode (per preference-gates/README.md). Uses local/remote/ask.

local → proceed in Local mode.
remote → proceed in Remote mode.
ask + intent clear → skip Picker 1, confirm one-shot then skip Picker 2.
ask + ambiguous → run Picker 1 from gate file.

Only proceed after selection.

Step 2: Detect Local Changes (gated by `autoDetectChanges`)

Gate autoDetectChanges (per preference-gates/README.md):

always → run the scan and proceed to analysis below.
never → ask "What would you like to test?" then jump to Step 3.
ask → run Picker 1 from preference-gates/autoDetectChanges.md via AskUserQuestion; map the answer back to one of the actions above.

Analysis (when scan is enabled)

Analyze the changes to understand what's impacted. Two sources, picked by what the user passed:

Working directory (default):

Run git status and git diff --stat for an overview
Run git diff (or git diff --cached if staged) to read actual diffs

PR URL (user passed github.com/<org>/<repo>/pull/<n>):

gh pr diff <n> --repo <org>/<repo> --name-only for the changed file list
gh pr diff <n> --repo <org>/<repo> for the actual diff
Materialize the PR branch in a dedicated worktree per _shared/pr-branch-worktree.md. Use that worktree path as the cwd for the rest of the run (including the cwd parameter on local execute tools).

Either way:

Identify impacted feature areas:
- Changed UI components, pages, routes
- Modified API endpoints or data flows
- Updated form fields, validation, user interactions
Produce a concise change summary — a list of impacted features

Present:

"Here's what changed: [list]. I'll scope E2E acceptance testing to these areas."

If no changes detected (clean tree), tell the user and ask what they want to test.

Step 3: Authenticate

Call muggle-remote-auth-status. Three states: valid (authenticated: true), expired (authenticated: false + isExpired: true, email still present), absent (authenticated: false, no email).
Valid OR expired (any stored identity) → gate autoLogin (per preference-gates/README.md). An expired token is NOT a reason to silently re-login the same account — surface the switch choice:
- always → reuse if valid; if expired, re-login the same account (muggle-remote-auth-login, then muggle-remote-auth-poll).
- never → muggle-remote-auth-login with forceNewSession: true, then muggle-remote-auth-poll.
- ask → run Picker 1 from preference-gates/autoLogin.md via AskUserQuestion; map the answer back to one of the actions above.
Absent (no stored identity) → muggle-remote-auth-login directly, then muggle-remote-auth-poll.
If login pending → call muggle-remote-auth-poll.

Account-switch caveat (never / "Switch account"). The device flow has no prompt=select_account; switching relies on forceNewSession first clearing the Auth0 session via /v2/logout?returnTo=<device-activation URL>. That redirect only works if the activation URL is in the app's Auth0 Allowed Logout URLs — otherwise the browser shows an Auth0 error page and the session is silently reused. If that happens, tell the user to complete login in a fresh incognito window (no live SSO session) so Auth0 presents an account login.

If auth fails repeatedly, suggest: muggle logout && muggle login from terminal.

Step 4: Select Project (gated by `autoSelectProject`)

A project is where all your test results, use cases, and test scripts are grouped on the Muggle AI dashboard. Pick the project that matches what you're working on.

The per-repo cache lives at <cwd>/.muggle-ai/last-project.json (managed via the muggle-local-last-project-get / muggle-local-last-project-set MCP tools). Look for the Muggle Test Last Project: id=… url=… name="…" line in session context — if present, that's this repo's cached pick.

Gate autoSelectProject (per preference-gates/README.md). Cache: Muggle Test Last Project session line.

always + cache → use cached projectId, skip to Step 5. No cache → fall through to ask.
never → full project list; skip Picker 2.
ask → project list picker (see gate file for spec + Picker 2 override). Skip Picker 2 if "Create new project".

Logic

Resolve the chosen project per the gate above.
Call muggle-remote-project-list only when the gate doesn't already give a projectId from the cache.
Wait for the user to explicitly choose when presenting the picker — do NOT auto-select based on repo name or URL matching.
If user chooses "Create new project":
- Ask for projectName, description, and the production/preview URL
- Call muggle-remote-project-create

Store the projectId only after user confirms (or after silent reuse from the cache).

Step 5: Select Use Case (Best-Effort Shortlist)

5a: List existing use cases

Call muggle-remote-use-case-list with the project ID.

5b: Best-effort match against the change summary

Using the change summary from Step 2, pick the use cases whose title/description most plausibly relate to the impacted areas. Produce a short shortlist (typically 1–5) — don't try to be exhaustive, and don't dump the full project list on the user. A confident best-effort match is the goal.

If nothing looks like a confident match, fall back to asking the user which use case(s) they have in mind.

5c: Present the shortlist for confirmation

Use AskUserQuestion with allow_multiple: true:

Prompt: "These use cases look most relevant to your changes — confirm which to test:"

Pre-check the shortlisted items so the user can accept with one click
Include "Pick a different use case" to reveal the full project list
Include "Create new use case" at the end

5d: If user picks "Pick a different use case"

Re-present the full list from 5a via AskUserQuestion with allow_multiple: true, then continue.

5e: If user chooses "Create new use case"

Ask the user to describe the use case(s) in plain English — they may want more than one
Call muggle-remote-use-case-create-from-prompts once with all descriptions batched into the instructions array (this endpoint natively fans out the jobs server-side — do NOT make one call per use case):
- projectId: The project ID
- instructions: A plain array of strings, one per use case — e.g. ["<description 1>", "<description 2>", ...]
Present the created use cases and confirm they're correct

Step 6: Select Test Case (Best-Effort Shortlist)

For the selected use case(s):

6a: List existing test cases

Call muggle-remote-test-case-list-by-use-case with each use case ID.

6b: Best-effort match against the change summary

Using the change summary from Step 2, pick the test cases that look most relevant to the impacted areas. Keep the shortlist small and confident — don't enumerate every test case attached to the use case(s).

If nothing looks like a confident match, fall back to offering to run all test cases for the selected use case(s), or ask the user what they had in mind.

6c: Present the shortlist for confirmation

Use AskUserQuestion with allow_multiple: true:

Prompt: "These test cases look most relevant — confirm which to run:"

Pre-check the shortlisted items so the user can accept with one click
Include "Show all test cases" to reveal the full list
Include "Generate new test case" at the end

6d: If user chooses "Generate new test case"

Ask the user to describe what they want to test in plain English — they may want more than one test case
For N descriptions, issue N muggle-remote-test-case-generate-from-prompt calls in parallel (single message, multiple tool calls — never loop sequentially):
- projectId, useCaseId, instruction (one description per call)
- Each instruction must describe exactly one atomic behavior to verify. No signup, no login, no "first navigate to X, then click Y, then verify Z" chains, no seed data, no cleanup. Just the verification. See Test Case Design above.
Accept the generator's decomposition as-is. If the generator returns 4 micro-tests from a single prompt, that's 4 correct test cases — never merge, consolidate, or rewrite them to bundle bootstrap.
Present the generated test case(s) for user review — always do this review cycle, even when you think you already know the right shape. Skipping straight to creation is the anti-pattern this skill most frequently gets wrong.
For the ones the user approves, issue muggle-remote-test-case-create calls in parallel

6e: Confirm final selection

Use AskUserQuestion to confirm: "You selected [N] test case(s): [list titles]. Ready to proceed?"

Option 1: "Yes, run them"
Option 2: "No, let me re-select"

Wait for user confirmation before moving to execution.

6f: Classify execution mode per test case (replay vs regen)

For each selected test case, decide whether the run should be a replay of an existing script or a fresh regen, using the rules in _shared/failure-mode-handling.md section A. Inputs: the change summary from Step 2, the test case body, and the result of muggle-remote-test-script-list for that test case (last passing timestamp + whether any replayable script exists).

Per test case, fire one muggle-local-telemetry-event-emit with eventType: "pre-execution-classification" capturing the picked mode, the rule that fired, and the matched changed-file paths.

Then show the per-case decision in one AskUserQuestion:

"Here's how I plan to run each test case — replay reuses the saved script, regen rebuilds it from scratch:

[REPLAY] Login with valid creds — selectors look unchanged

[REGEN] Sign up with valid email — last passed > 30 days ago

[REGEN] Add to cart — app/cart/page.tsx changed (UI/markup)"

Options: "Looks good — proceed", "Override one or more", "Cancel"

If the user picks "Override one or more", let them flip the mode for any test case via a second multi-select AskUserQuestion. Emit a follow-up pre-execution-classification event with userAction set whenever the user overrides.

Step 7A: Execute — Local Mode

Local environment readiness

Before anything else, invoke muggle-test-prepare — the readiness/service-start owner (idempotent; halt on what it surfaces). The URL gate below only selects the target; prepare is what guarantees something is listening and compiled.

Pre-flight question — Local URL (gated by `autoSelectLocalHost`)

Skill responsibilities (the rest is in preference-gates/autoSelectLocalHost.md):

Read the cache: Muggle Test Last Host: <url> session-context line, or muggle-local-last-host-get. Pass as {lastHost} substitution.
Auto-detect a suggested URL: lsof -iTCP -sTCP:LISTEN -nP | grep -E ':(3000|3001|4200|5173|8080)'. Pass as {suggestedHost}.
Save the cache: call muggle-local-last-host-set after the user picks (the gate file requires this on every pick).

Gate autoSelectLocalHost per preference-gates/README.md + preference-gates/autoSelectLocalHost.md.

Pre-flight visibility (gated by `showElectronBrowser`)

Gate showElectronBrowser (per preference-gates/README.md). Resolve once; apply same showUi to every test case.

always → omit showUi (defaults visible).
never → pass showUi: false.
ask → run Picker 1 from preference-gates/showElectronBrowser.md via AskUserQuestion; map the answer back to one of the actions above.

Fetch test case details (in parallel)

Before execution, fetch full test case details for all selected test cases by issuing all muggle-remote-test-case-get calls in parallel (single message, multiple tool calls).

Run the dev loop

Execute each selected test case via the shared loop in ../_shared/dev-loop/run.md: sequential replay/regen, actionScript as-is, freshSession, and timeoutMs.

Caller glue:

mode per test case comes from Step 6f; localUrl from the pre-flight question; showUi from the showElectronBrowser resolution.
cwd = the PR-branch worktree from Step 2 if one was created, else the user's repo root — it drives the cross-worktree single-flight lock so concurrent muggle-test runs from different branches serialize.
On a failed run, continue the batch and route it through Step 7C after completion.

Collect results

Fetch every runId per ../_shared/dev-loop/failures.md, reading structured fields and interpreting failures — never execute's stdout tail. Issue the muggle-local-run-result-get calls in parallel; use Error as the headline for failures and route through Step 7C.

Publish each run to cloud (gated by `autoPublishLocalResults`)

Gate autoPublishLocalResults (per preference-gates/README.md):

always → proceed to publish logic below.
never → skip to report summary; tell user Steps 8/9 and per-step screenshots are unavailable without publishing.
ask → run Picker 1 from preference-gates/autoPublishLocalResults.md via AskUserQuestion; map the answer back to one of the actions above.

Publish logic (when publishing is enabled)

Publish every completed run per ../_shared/dev-loop/publish.md — parallel muggle-local-publish-test-script with the zero-step muggle-remote-local-run-upload fallback. Store every viewUrl, testScriptId, actionScriptId — used in the next steps.

Report summary

Test Case                  Status    Duration   Steps   View Steps on Muggle AI
─────────────────────────────────────────────────────────────────────────
Login with valid creds     PASSED    12.3s      8       https://www.muggle-ai.com/...
Login with invalid creds   PASSED    9.1s       6       https://www.muggle-ai.com/...
Checkout flow              FAILED    15.7s      12      https://www.muggle-ai.com/...
─────────────────────────────────────────────────────────────────────────
Total: 3 tests | 2 passed | 1 failed | 37.1s

For failures: show which step failed, the local screenshot path, and a suggestion.

Step 7B: Execute — Remote Mode

Ask for target URL

"What's the preview/staging URL to test against?"

Fetch test case details (in parallel)

Issue all muggle-remote-test-case-get calls in parallel (single message, multiple tool calls) to hydrate the test case bodies.

Trigger remote workflows (in parallel)

Branch each test case on the mode chosen in Step 6f, then issue all workflow-start calls in parallel — never loop them sequentially. Mix regen and replay starts in the same parallel batch.

Regen-mode test case — muggle-remote-workflow-start-test-script-generation:

projectId: The project ID
useCaseId: The use case ID
testCaseId: The test case ID
name: "muggle-test: {test case title}"
url: The preview/staging URL
goal: From the test case
precondition: From the test case (use "None" if empty)
instructions: From the test case
expectedResult: From the test case

Replay-mode test case — muggle-remote-workflow-start-test-script-replay against the latest replayable script for that test case (resolve via muggle-remote-test-script-list if not already in hand from Step 6f). Tag results with mode: "replay" so Step 7C can route failures correctly.

Store each returned workflow runtime ID along with its mode tag.

Monitor and report (in parallel)

Issue all muggle-remote-wf-get-ts-gen-latest-run calls in parallel, one per runtime ID.

Test Case                  Workflow Status   Runtime ID
────────────────────────────────────────────────────────
Login with valid creds     RUNNING           rt-abc123
Login with invalid creds   COMPLETED         rt-def456
Checkout flow              QUEUED            rt-ghi789

Step 7C: Route failures through the failure-mode handler

For every run with status: "failed" (or any non-passing terminal state) from 7A or 7B, follow _shared/failure-mode-handling.md:

Replay-mode failures — section B (buckets: infra / stale-script / product-defect).
Regen-mode failures — section C (buckets: transient / infra / agent-course / product-uxux).

For each failed run:

Read the run with muggle-local-run-result-get (local) or muggle-remote-wf-get-ts-gen-latest-run / muggle-remote-wf-get-ts-replay-latest-run (remote) and extract signals per the heuristics in the shared doc.
Emit replay-failure-classified or regen-failure-classified via muggle-local-telemetry-event-emit before asking the user.
Present the recommended action via AskUserQuestion along with the alternatives the shared doc lists for that bucket.
After the user picks, emit the matching *-resolved event with userAction set to what they chose.

Process failures one at a time so the user isn't drowning in pickers — but emit telemetry per failure regardless.

Step 8: Open Results in Browser

After execution and publishing are complete, open the Muggle AI dashboard so the user can visually inspect results and screenshots.

Mode A (Local) — open each published viewUrl

For each published run's viewUrl:

open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/scripts?modal=script-details&testCaseId={testCaseId}"

If there are many runs (>3), open just the project-level runs page instead of individual tabs:

open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs"

Mode B (Remote) — open the project runs page

open "https://www.muggle-ai.com/muggleTestV0/dashboard/projects/{projectId}/runs"

Tell the user:

"I've opened the Muggle AI dashboard in your browser — you can see the test results, step-by-step screenshots, and action scripts there."

Step 9: Offer to Post Visual Walkthrough to PR

After reporting results:

Fire postPRVisualWalkthrough. On skip → Step 9.5.
gh pr view --json number,title,url 2>/dev/null — find the PR.
If no PR: fire autoCreatePR. On skip → Step 9.5.
Assemble the E2eReport — see ../muggle-pr-visual-walkthrough/e2e-report-assembly.md. Include all runs from Step 7A (passed and failed).
Invoke ../muggle-pr-visual-walkthrough/SKILL.md Mode A with the E2eReport.

Step 9.5: Offer to watch the PR for review follow-ups

Once a PR exists for this work, offer to keep watching its review thread.

Identify the PR — reuse the gh pr view --json number,title,url result from Step 9 if available, else run it now. No PR (none exists, none created) → Step 10.
Fire autoWatchPR with {pr} = <owner>/<repo>#<number>. On skip → Step 10.
On proceed: start the watcher reusing this run's context so it never re-prompts —
- Seed the muggle-pr-followup session slot and dispatch its loop per the stage-8 seeding in ../do/open-prs/forward.md (default slug <repo>-pr<number>).
- Additionally write state.md's ## Pre-flight answers block from the context resolved this run — validation strategy, local URL, project, credentials, auth, working tree — per ../_shared/resolve-e2e-validation-context.md. Strategy = local-e2e (local run), staging-replay (remote), or unit-only/skip if no E2E ran.

The /mprfollowup shortcut starts the same watcher manually at any time.

Step 10: Offer feedback on failures

After the report is complete, if any test in the run had a failed or unexpected status (or the user verbally flags something looked off), suggest the feedback skill:

"Looks like <N> test(s) didn't go as expected. Want to leave feedback on what should've happened? It triggers regeneration on the affected scripts."

Use AskUserQuestion:

Yes — give feedback → invoke the muggle-feedback skill via the Skill tool. Pass the failed run's runId (local) or testScriptId (remote) as anchor context so the submit flow opens with the correct script already loaded.
No — skip

This is a suggestion, not automatic invocation. Skip silently if every test passed cleanly.

Tool Reference

Phase	Tool	Mode
Auth	`muggle-remote-auth-status`	Both
Auth	`muggle-remote-auth-login`	Both
Auth	`muggle-remote-auth-poll`	Both
Project	`muggle-remote-project-list`	Both
Project	`muggle-remote-project-create`	Both
Use Case	`muggle-remote-use-case-list`	Both
Use Case	`muggle-remote-use-case-create-from-prompts`	Both
Test Case	`muggle-remote-test-case-list-by-use-case`	Both
Test Case	`muggle-remote-test-case-generate-from-prompt`	Both
Test Case	`muggle-remote-test-case-create`	Both
Test Case	`muggle-remote-test-case-get`	Both
Execute (regen)	`muggle-local-execute-test-generation`	Local
Execute (replay)	`muggle-local-execute-replay`	Local
Replay action script fetch	`muggle-remote-test-script-get`, `muggle-remote-action-script-get`	Local replay
Execute (regen)	`muggle-remote-workflow-start-test-script-generation`	Remote
Execute (replay)	`muggle-remote-workflow-start-test-script-replay`	Remote
Failure-mode telemetry	`muggle-local-telemetry-event-emit`	Both
Results	`muggle-local-run-result-get`	Local
Results	`muggle-remote-wf-get-ts-gen-latest-run`, `muggle-remote-wf-get-ts-replay-latest-run`	Remote
Publish	`muggle-local-publish-test-script`	Local
Per-step screenshots (for walkthrough)	`muggle-remote-test-script-get`	Both
Browser	`open` (shell command)	Both
PR walkthrough	`muggle-pr-visual-walkthrough` (shared skill)	Both

Guardrails

Always confirm intent first — never assume local vs remote without asking
PR URLs always run in a dedicated worktree — never switch the user's main checkout. Materialize per _shared/pr-branch-worktree.md and pass that path as cwd to local execute tools; the cross-worktree single-flight lock relies on it to serialize concurrent runs from different branches.
User MUST select project — present clickable options via AskUserQuestion, wait for explicit choice, never auto-select
Best-effort shortlist use cases — use the change summary to narrow the list to the most relevant 1–5 use cases and pre-check them; never dump every use case in the project on the user. Always leave an escape hatch to reveal the full list.
Best-effort shortlist test cases — same idea: pre-check the test cases most relevant to the change summary; never enumerate every test case attached to a use case. Always leave an escape hatch to reveal the full list.
Use AskUserQuestion for every selection — never ask the user to type a number; always present clickable options
Auto-detect localhost URL when possible; only fall back to free-text when nothing is listening on a common port
Parallelize independent cloud jobs — when creating N use cases, generating/creating N test cases, fetching N test case details, starting N remote workflows, polling N workflow runtimes, publishing N local runs, or fetching N per-step test scripts, issue all N calls in a single message so they fan out in parallel. The only tolerated sequential loop is local Electron execution (one browser, one test at a time). For use case creation specifically, use the native batch form of muggle-remote-use-case-create-from-prompts (all descriptions in one instructions array) instead of parallel calls.
One atomic behavior per test case — every test case verifies exactly one user-observable behavior. Never bundle signup/login/navigation/bootstrap/teardown into a test case body. Ordering and dependencies are Muggle Test's service responsibility, not the skill's.
Never consolidate the generator's output — if muggle-remote-test-case-generate-from-prompt returns N micro-tests, accept all N; never merge them into fewer test cases, even if "the plan" says 4 UC / 4 TC.
Never skip the generate→review cycle — always present generated test cases to the user before calling muggle-remote-test-case-create, even when you're confident. "I'll skip the review and create directly" is always wrong.
Never silently drop test cases — log failures and continue, then report them
Never guess the URL — always ask the user for localhost or preview URL
Always publish before opening browser — the dashboard needs the published data to show results
Delegate PR posting to muggle-pr-visual-walkthrough — never inline the walkthrough markdown or call gh pr comment directly from this skill; ask the user and hand off
Can be invoked at any state — if the user already has a project or use cases set up, skip to the relevant step rather than re-doing everything

Agent Dispatch

When used in a multi-agent team (e.g., muggle-ai-teams), this skill is available through the acceptance-tester agent at plugin/agents/acceptance-tester.md. Orchestrators can dispatch it via Agent() instead of invoking this skill directly. The agent wraps this skill and four others (muggle-test-import, muggle-preferences, muggle-repair, muggle-status) and returns structured test results with blocking issues and suggested fixes for coding agents to act on.

muggle-test

More from this repository

Muggle Test — Change-Driven E2E Acceptance Router

UX Guidelines — Minimize Typing

Test Case Design: One Atomic Behavior Per Test Case

Preferences

Step 1: Confirm Scope of Work (Always First)

Mode A: Local Test Generation (default for PRs)

Mode B: Remote Test Generation

Confirming (gated by defaultExecutionMode)

Step 2: Detect Local Changes (gated by autoDetectChanges)

Analysis (when scan is enabled)

Step 3: Authenticate

Step 4: Select Project (gated by autoSelectProject)

Logic

Step 5: Select Use Case (Best-Effort Shortlist)

5a: List existing use cases

5b: Best-effort match against the change summary

5c: Present the shortlist for confirmation

5d: If user picks "Pick a different use case"

5e: If user chooses "Create new use case"

Step 6: Select Test Case (Best-Effort Shortlist)

6a: List existing test cases

6b: Best-effort match against the change summary

6c: Present the shortlist for confirmation

6d: If user chooses "Generate new test case"

6e: Confirm final selection

6f: Classify execution mode per test case (replay vs regen)

Step 7A: Execute — Local Mode

Local environment readiness

Pre-flight question — Local URL (gated by autoSelectLocalHost)

Pre-flight visibility (gated by showElectronBrowser)

Fetch test case details (in parallel)

Run the dev loop

Collect results

Publish each run to cloud (gated by autoPublishLocalResults)

Publish logic (when publishing is enabled)

Report summary

Step 7B: Execute — Remote Mode

Ask for target URL

Fetch test case details (in parallel)

Trigger remote workflows (in parallel)

Monitor and report (in parallel)

Step 7C: Route failures through the failure-mode handler

Step 8: Open Results in Browser

Mode A (Local) — open each published viewUrl

Mode B (Remote) — open the project runs page

Step 9: Offer to Post Visual Walkthrough to PR

Step 9.5: Offer to watch the PR for review follow-ups

Step 10: Offer feedback on failures

Tool Reference

Guardrails

Agent Dispatch

Muggle Test — Change-Driven E2E Acceptance Router

UX Guidelines — Minimize Typing

Test Case Design: One Atomic Behavior Per Test Case

Preferences

Step 1: Confirm Scope of Work (Always First)

Mode A: Local Test Generation (default for PRs)

Mode B: Remote Test Generation

Confirming (gated by defaultExecutionMode)

Step 2: Detect Local Changes (gated by autoDetectChanges)

Analysis (when scan is enabled)

Step 3: Authenticate

Step 4: Select Project (gated by autoSelectProject)

Logic

Step 5: Select Use Case (Best-Effort Shortlist)

5a: List existing use cases

5b: Best-effort match against the change summary

5c: Present the shortlist for confirmation

5d: If user picks "Pick a different use case"

5e: If user chooses "Create new use case"

Step 6: Select Test Case (Best-Effort Shortlist)

6a: List existing test cases

6b: Best-effort match against the change summary

6c: Present the shortlist for confirmation

6d: If user chooses "Generate new test case"

6e: Confirm final selection

6f: Classify execution mode per test case (replay vs regen)

Step 7A: Execute — Local Mode

Confirming (gated by `defaultExecutionMode`)

Step 2: Detect Local Changes (gated by `autoDetectChanges`)

Step 4: Select Project (gated by `autoSelectProject`)

Pre-flight question — Local URL (gated by `autoSelectLocalHost`)

Pre-flight visibility (gated by `showElectronBrowser`)

Publish each run to cloud (gated by `autoPublishLocalResults`)

Confirming (gated by `defaultExecutionMode`)

Step 2: Detect Local Changes (gated by `autoDetectChanges`)

Step 4: Select Project (gated by `autoSelectProject`)

Pre-flight question — Local URL (gated by `autoSelectLocalHost`)

Pre-flight visibility (gated by `showElectronBrowser`)

Publish each run to cloud (gated by `autoPublishLocalResults`)