desktop-app-testing

Stars: 1

Forks: 0

Updated: June 6, 2026 at 11:14

Use when the user wants to live-test a built Windows desktop application (.exe) end-to-end inside the agent-os Windows 11 VM and get a works/doesn't-work + UI/UX report — launching the real binary and driving it, not writing test code. Triggers: "test my Windows app", "QA this .exe", "run my desktop build on agent-os and tell me what breaks", "click through the app", "review the desktop app's UX", "does my exe work", "test the built binary in the Windows VM". Gets the .exe into the VM, launches it, enumerates controls from the UI Automation tree, drives every feature (click/type by element), watches for crashes/hangs/error dialogs, captures screenshots + control-tree dumps, and emits a structured report. Drives the agent-os VM via the Windows-MCP gateway. Sibling of web-app-testing and android-app-testing (shared report format). Does NOT fire for: building/coding a desktop app, the user's personal Windows on steamy (use nircmd), or general agent-os VM driving (use agent-os).

Source: jmagar/lab

Live, end-to-end testing of a built Windows .exe inside the agent-os Windows 11 VM: transfer the build in, launch it, drive every feature, watch for crashes/hangs/error dialogs, review UI/UX, and emit a structured works/doesn't-work report. Companion to web-app-testing and android-app-testing — all three share one report format (references/report-format.md).

Drives the agent-os VM (container agent-os-win11, dockur/windows on dookie) through the agent-os_windows-mcp upstream on the Lab gateway. Builds on the agent-os skill (which is the general VM driver) but adds the testing harness: build transfer, feature enumeration, failure taxonomy, evidence pipeline, and the report.

When to use vs. neighbors

This skill — a test pass + report over a desktop app's features and UX.
agent-os — general driving of the Windows VM (install software, run PowerShell, one-off tasks).
nircmd — the user's PERSONAL Windows on steamy, not the sandbox. Never target steamy here.

⚠️ Destructive-action gate — read first

Every drive action (PowerShell, Click, Type, App, Process, …) is destructive=true and gated by the gateway. An authenticated admin (lab/lab:admin scope) passes — this was fixed in lab commit e87940c0. If drive calls return confirmation_required: "...destructive=true...", the gateway predates the fix; rebuild + redeploy (see references/windows-mcp-calls.md). Read-only Screenshot/Snapshot are never gated, so a UI/UX review works even if the gate isn't lifted.
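
The gate check described above can be sketched as a small helper. This is only an illustration: it assumes the reply from one cheap drive call (for example Process {mode:"list"}) has already been decoded into a mapping, and that a gated call carries a confirmation_required key as the text describes. The function name and the response shape are hypothetical, not part of the Windows-MCP API.

```python
from typing import Any, Mapping


def gate_status(response: Mapping[str, Any]) -> str:
    """Classify the decoded reply of one cheap drive call.

    Assumption: a gated call contains a "confirmation_required" key;
    anything else is treated as real data, meaning the gate is open.
    """
    if "confirmation_required" in response:
        # Gateway predates the fix: rebuild + redeploy before driving.
        return "gated"
    return "open"


if __name__ == "__main__":
    # Hypothetical replies, shaped only for this sketch.
    print(gate_status({"confirmation_required": "...destructive=true..."}))
    print(gate_status({"processes": []}))
```

When the result is "gated", a read-only Screenshot/Snapshot review is still possible, since those calls are never gated.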

Fallback — Windows-MCP not reachable

If agent-os_windows-mcp isn't a connected upstream (the Lab gateway is in code-mode and its execute interface isn't exposed to the session, or the server simply isn't wired in), you can still do the whole pass over plain ssh agent-os: launch + screenshot the native GUI via a /it scheduled task in the interactive console session (session 0 over SSH has no window station — a GUI there crashes with os error 1459 and screenshots come back blank), and iterate the frontend in a browser without rebuilding. Full recipe: references/ssh-fallback-capture.md.

Prerequisites

The agent-os VM running on dookie (the skill's preflight starts it if absent).
The Lab gateway reachable with an execute-capable scope (for drive actions).
The built .exe/installer on this host (or a URL the guest can fetch).

Workflow

1. Preflight.
   - VM up? ssh dookie 'docker ps --format "{{.Names}}" | grep agent-os-win11'. If absent: ssh dookie 'cd /home/jmagar/compose/windows && docker compose up -d' (boots existing install, ~5 min cold; Windows-MCP auto-starts via an in-guest scheduled task).
   - MCP ready? Call Screenshot {}; an image back means ready. (Do NOT TCP-probe :8765; it gives a false negative.)
   - Drive gate? Do one cheap destructive call (Process {mode:"list"}). If it returns data, the gate is open; if it returns confirmation_required, fix the gateway first.
   - Create run dir ~/.agents/docs/sessions/<app>-desktop-test/run_<id>/.
2. Transfer the build into the VM (see references/windows-mcp-calls.md): HTTP-pull via PowerShell Invoke-WebRequest (verified reachable) or SCP to the agent-os guest (scp ... agent-os: / host forward tootie:2222). Unblock-File the copied binary; pre-create a firewall allow rule if it binds a port.
3. Launch. Prefer PowerShell {command:"Start-Process 'C:\\...\\app.exe'; ...return PID"} (Start-menu App {name} is unreliable for arbitrary binaries). Confirm a PID came back.
4. Wait for ready, map controls. WaitFor {condition:"active_window", window_name:"<title>"} (short timeout in a retry loop), then Snapshot {} to enumerate menus, buttons, tabs, and fields from the UI Automation tree. Build the feature checklist (merge with any user spec). Write plan.md.
5. Exercise each feature with the act-by-label loop: Snapshot → pick the target element's integer label → Click {label} / Type {label, text} → Snapshot again (the UI changed, so ids are stale) → repeat. Screenshot between steps for evidence (this doesn't invalidate ids). Use Shortcut for keyboard ops. Type/Click REQUIRE loc or label; there is no implicit-focus typing.
6. Detect failures after each action:
   - Crash/exit: Process {mode:"list", name} shows the PID gone; check Get-WinEvent -LogName Application for Error/Critical. → FAIL.
   - Hang: WaitFor times out / Snapshot shows "(Not Responding)". → FAIL.
   - Error dialog: Snapshot surfaces dialog text; screenshot it. → FAIL/PARTIAL.
   - Wrong output / no feedback: expected UI change didn't happen. → PARTIAL/FAIL.
   - Can't reach: needs creds/data/license the run lacks. → BLOCKED.
7. Reset between independent features: Process {mode:"kill", name} then relaunch, so input/mode state doesn't leak across tests.
8. UX/a11y pass: score the report-format rubric from snapshots + screenshots. Interactive elements with no accessible name in the UIA tree = accessibility findings.
9. Write the report → report.md + result.json in the run dir, per references/report-format.md. Save screenshots to evidence/ and index them.
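
The failure taxonomy in the "Detect failures" step can be sketched as a pure function over what was observed after an action. This is a minimal sketch under stated assumptions: the Observation fields, the category names, and the order in which the checks are applied are all hypothetical; only the category-to-verdict mapping comes from the list above. Where the skill allows two verdicts, both are returned and the tester still has to choose.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Observation:
    """What was seen after one action (field names are hypothetical)."""
    pid_alive: bool = True                       # Process list still shows the PID
    wait_timed_out: bool = False                 # WaitFor gave up
    not_responding: bool = False                 # Snapshot shows "(Not Responding)"
    dialog_text: Optional[str] = None            # error dialog text, if any
    ui_changed: bool = True                      # expected UI change happened
    missing_prerequisite: Optional[str] = None   # creds/data/license the run lacks


def classify(obs: Observation) -> Tuple[str, Tuple[str, ...]]:
    """Return (category, allowed verdicts); the check order is an assumption."""
    if not obs.pid_alive:
        return "crash", ("FAIL",)
    if obs.wait_timed_out or obs.not_responding:
        return "hang", ("FAIL",)
    if obs.dialog_text:
        return "error_dialog", ("FAIL", "PARTIAL")
    if obs.missing_prerequisite:
        return "cant_reach", ("BLOCKED",)
    if not obs.ui_changed:
        return "no_feedback", ("PARTIAL", "FAIL")
    return "none", ()  # no failure detected for this action


if __name__ == "__main__":
    print(classify(Observation(pid_alive=False)))
    print(classify(Observation(dialog_text="Unhandled exception")))
```

Keeping the classifier free of tool calls means the same function can be fed from either the Windows-MCP path or the SSH fallback.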

Gotchas (live-validated)

Prefer PowerShell Start-Process over App {name} to launch an arbitrary .exe.
Re-Snapshot after every UI change — label ids are valid only against the latest Snapshot.
Snapshot output overflows the Code Mode envelope (~24KB) — filter/slice the tree text in the sandbox before returning; don't dump the whole tree.
Custom-rendered apps (GPUI, some Electron/canvas) expose little to UI Automation → Snapshot is sparse; fall back to Screenshot + coordinate clicks + SendKeys, and flag reduced confidence.
MCP calls can time out ~120s — chunk long operations; use short WaitFor in a retry loop.
Security prompts stall unattended runs — Unblock-File, SEE_MASK_NOZONECHECKS=1, pre-create firewall rules before expecting hands-off captures.
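
The advice to filter or slice the Snapshot tree before returning it can be sketched as follows. This assumes the tree arrives as plain text with one element per line; that line format, the slice_tree name, and the exact byte budget are assumptions for illustration, with the budget derived only from the ~24KB figure above.

```python
from typing import Iterable, List

ENVELOPE_BYTES = 24_000  # rough budget based on the ~24KB envelope figure


def slice_tree(tree_text: str, keywords: Iterable[str],
               budget: int = ENVELOPE_BYTES) -> str:
    """Keep only lines mentioning a keyword, stopping before the byte budget.

    An empty keyword list keeps every line, so only the budget applies.
    """
    wanted = [k.lower() for k in keywords]
    kept: List[str] = []
    used = 0
    for line in tree_text.splitlines():
        if wanted and not any(k in line.lower() for k in wanted):
            continue
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > budget:
            break  # stop rather than overflow the envelope
        kept.append(line)
        used += size
    return "\n".join(kept)


if __name__ == "__main__":
    # Hypothetical tree text, shaped only for this sketch.
    sample = "1 Button Save\n2 Text Title\n3 Button Cancel"
    print(slice_tree(sample, ["button"]))
```

Filtering by control type first (buttons, menu items, edit fields) usually keeps the labels needed for the next Click or Type while dropping the bulk of the tree.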

References

references/windows-mcp-calls.md — verified tool names, params, call patterns, the destructive gate, build-transfer recipes, evidence capture.
references/ssh-fallback-capture.md — SSH-only launch + native capture via a schtasks /it interactive-session task when Windows-MCP isn't connected, plus an in-process-vite + Edge browser dev-loop for iterating Tauri/web frontends against a real backend.
references/report-format.md — shared cross-platform report spec, run-dir layout, verdicts.
