| name | macos-native-app-qa |
| description | Use when the task requires end-to-end verification of a native macOS app and browser-only tooling is not enough. This skill drives apps with AppleScript, System Events, or CGEvent (via Python Quartz or Swift), verifies behavior through logs, test hooks, files, or SQLite state, captures screenshots as evidence, AND actively maintains test coverage by analyzing code changes and writing new tests. Trigger for Tauri, Electron, AppKit, or other macOS desktop QA, launch flows, menu and dialog interaction, settings changes, packaged-app smoke tests, or any request to prove a macOS app works end to end. |
| allowed-tools | Read, Write, Edit, Glob, Grep, Agent, Bash(osascript, screencapture, sqlite3, rg, grep, cp, pgrep, mkdir, sleep, test, shasum, cksum, sed, head, wc, sdef, find, ls, awk, date, pwd, touch, tr, rm, basename, python3, swift, npx, cargo, npm, cat, tail, stat, kill, nohup) |
macOS Native App QA
Use this skill for native macOS application testing when the proof has to come from more than a screenshot. The core loop is:
analyze code changes -> update/add tests -> run tests -> drive the app -> assert against logs/hooks/db/files -> capture screenshots -> record verdict
If the task is primarily browser DOM automation, use $playwright or $playwright-interactive instead. Use this skill when native window management, menu interaction, system dialogs, packaged apps, or Accessibility-driven UI scripting matter.
What this skill is for
This skill is optimized for:
- analyzing code changes to detect new features, modified behavior, or removed functionality
- writing and updating tests (unit, integration, e2e) to match current code — never skip this step
- launching and focusing native macOS apps
- clicking menus, buttons, dialogs, sheets, and toolbars through AppleScript, System Events, or CGEvent
- verifying settings changes through logs, plist files, SQLite, or other on-disk state
- verifying flows in Tauri/Electron apps where browser tooling cannot prove native behavior
- collecting screenshots plus non-visual evidence for end-to-end proof
This skill is not optimized for:
- pure browser/webview DOM flows with stable selectors
- pixel-perfect visual regression testing
- OCR-heavy desktop automation
- cross-platform desktop automation
For the current open-source landscape and when to prefer external tools, read references/landscape.md.
Preconditions
Before using the workflow:
- Confirm this is macOS.
- Confirm the app can be launched locally.
- Confirm Terminal or Codex has Accessibility permission for System Events UI scripting.
- Confirm screenshot capture is allowed if visual evidence is required.
- Identify at least one authoritative assertion source:
  - app logs
  - test hook / debug endpoint / exported state
  - SQLite database
  - config or plist file
  - output file created by the app
Quick checks:
command -v osascript >/dev/null
command -v screencapture >/dev/null
osascript -e 'tell application "System Events" to get name of first process whose frontmost is true'
Live desktop contention
This skill drives the user's real macOS desktop. System Events, AppleScript
activation, AXUIElement focus, CGEvent keyboard input, and coordinate clicks can
all steal focus from the user's current app or send input to the wrong target if
the user is typing or moving the mouse at the same time.
Default operating rule:
- Run non-interactive verification first:
  - code/test updates
  - unit/integration/e2e runs
  - logs, SQLite, plist, filesystem, and debug-hook assertions
  - passive evidence capture that does not require input routing
- Only start live desktop driving after the non-interactive work is done.
- Before any live desktop driving, explicitly warn the user that keyboard and
mouse should stay idle for a short window.
- If the user is actively using the machine and cannot pause, stop before the
interactive phase and report the remaining native proof as pending.
Do not assume the user has stopped using the Mac just because native QA was
requested. Treat shared-desktop coordination as part of the workflow.
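One way to make the idle-desktop rule checkable is to read how long the keyboard and mouse have been quiet before starting the interactive phase. The sketch below is an illustration, not part of the bundled scripts: it assumes PyObjC's Quartz bindings are installed, and the function names and the 5-second threshold are hypothetical.

```python
from typing import Callable


def seconds_since_last_input() -> float:
    """Return seconds since the last keyboard or mouse event (macOS only).

    Assumes PyObjC's Quartz bindings are available. The import is lazy so
    the decision logic below can be exercised without them.
    """
    import Quartz

    return Quartz.CGEventSourceSecondsSinceLastEventType(
        Quartz.kCGEventSourceStateCombinedSessionState,
        Quartz.kCGAnyInputEventType,
    )


def safe_to_drive(
    idle_reader: Callable[[], float] = seconds_since_last_input,
    min_idle_seconds: float = 5.0,  # hypothetical threshold, tune per team
) -> bool:
    """True when the user appears idle long enough to start live UI driving."""
    return idle_reader() >= min_idle_seconds
```

An idle reading is only a hint. It does not replace warning the user before the interactive phase begins.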
Permission fallback paths
Accessibility / System Events blocked:
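Do not stop entirely. Try CGEvent for keyboard input (shortcuts, typing) and AXUIElement via Swift for reading UI state. These paths bypass System Events, so they help most when only the System Events scripting (Automation) permission is denied. On recent macOS releases, event posting and AXUIElement reads may themselves require the driving process to be trusted for Accessibility, so confirm the effect of every keystroke through a non-visual assertion. Without working Accessibility you cannot click arbitrary UI elements, but you can still drive flows via keyboard shortcuts and menu accelerators when the app is already focused. State the limitation clearly in the verdict.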
Screen Recording blocked:
screencapture will fail or produce a blank image. Use CGDisplayCreateImage via Python Quartz as an alternative (it may succeed where screencapture does not depending on TCC state). If all screenshot methods fail, record the limitation in missing.txt and rely on non-visual assertions. A valid verdict is still possible as "PASS with caveats" if the non-visual assertion passed.
Closed Verification Contract
Never mark a native-app scenario as verified from UI interaction alone.
Each verified scenario needs all applicable parts:
- Action: the app was driven through a real user path:
  - app AppleScript dictionary when available
  - otherwise System Events
  - CGEvent keyboard shortcuts or AXUIElement when System Events is blocked
  - coordinate-based fallbacks only when necessary
- Assertion: a non-visual source confirms the expected behavior:
  - log line
  - SQLite row or query result
  - config or state file change
  - explicit test hook output
  - process or filesystem side effect
- Evidence: capture screenshot evidence for the visible state, plus saved assertion artifacts.
The pass condition is:
user action path completed AND authoritative assertion passed AND evidence was saved
Screenshots support the claim. They do not prove hidden state by themselves.
For a deeper proof model and assertion patterns, read references/assertion-model.md.
Workflow
0. Analyze code and update tests (ALWAYS run this step)
This step is mandatory on every invocation. Code changes constantly — tests must keep up.
0a. Detect what changed
Use the Agent tool to scan for code changes since the last QA run. Compare source file timestamps against the last evidence directory:
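LAST_QA=$(ls -td output/native-app-qa/*/ 2>/dev/null | head -1)
# On a first run LAST_QA is empty: drop the -newer filter so every source file is listed.
find src/ src-tauri/src/ \( -name '*.ts' -o -name '*.tsx' -o -name '*.rs' \) \
  ${LAST_QA:+-newer "$LAST_QA"} 2>/dev/null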
If no files changed, still proceed to step 0b to check for coverage gaps.
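The same timestamp comparison can be sketched in Python when the result needs to feed later steps. This is an illustrative version, not a bundled script: the function names are hypothetical, and it treats a missing previous run as "everything changed".

```python
from pathlib import Path
from typing import Iterable, List, Optional

SOURCE_SUFFIXES = {".ts", ".tsx", ".rs"}


def latest_run_dir(qa_root: Path) -> Optional[Path]:
    """Return the most recently modified run directory, or None on a first run."""
    runs = [p for p in qa_root.glob("*") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def changed_sources(roots: Iterable[Path], since: Optional[Path]) -> List[Path]:
    """List source files newer than `since`; with no previous run, list them all."""
    cutoff = since.stat().st_mtime if since is not None else float("-inf")
    found = []
    for root in roots:
        if not root.is_dir():
            continue  # tolerate projects without src-tauri/src/
        for path in root.rglob("*"):
            if (
                path.is_file()
                and path.suffix in SOURCE_SUFFIXES
                and path.stat().st_mtime > cutoff
            ):
                found.append(path)
    return sorted(found)
```

A call such as `changed_sources([Path("src"), Path("src-tauri/src")], latest_run_dir(Path("output/native-app-qa")))` mirrors the shell check above. An empty result still means step 0b has to run.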
0b. Analyze code for untested features
Use the Agent tool (subagent_type: Explore) to:
- List all source modules and components — build a feature inventory from the actual code
- List all test files — map which features have tests
- Identify coverage gaps — find features, components, functions, IPC commands, or store actions that lack tests
- Check changed files — for any file that changed, verify its test file covers the new/modified behavior
Key patterns to check:
- Every Tauri IPC command in commands.rs should have corresponding test coverage
- Every Zustand store action should have a test
- Every React component with user interaction should have a test
- Every Rust module with public functions should have tests
- New keyboard shortcuts or menu items need Workspace test coverage
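A first pass over the "every component should have a test" pattern can be automated before handing the deeper analysis to the Explore agent. The sketch below assumes a sibling-file convention (`Foo.tsx` next to `Foo.test.tsx`); that convention and the function name are assumptions, so adapt them to the layout the project actually uses.

```python
from pathlib import Path
from typing import List


def untested_components(src_root: Path) -> List[Path]:
    """Return .tsx components with no sibling `<name>.test.tsx` file.

    The sibling-file naming convention is an assumption made for this sketch.
    """
    gaps = []
    for path in sorted(src_root.rglob("*.tsx")):
        if path.name.endswith(".test.tsx"):
            continue  # skip the test files themselves
        expected_test = path.with_name(path.stem + ".test.tsx")
        if not expected_test.exists():
            gaps.append(path)
    return gaps
```

A test file that exists is not proof of coverage. This only flags the obvious gaps; changed behavior inside a file that already has tests still needs a manual read.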
0c. Write or update tests
Do not skip this. For every gap found in 0b:
- Read the source file to understand the feature behavior
- Read existing test files to understand patterns and conventions used in the project
- Write new tests or update existing ones following the same patterns:
  - Frontend: Vitest + @testing-library/react + jsdom (see existing *.test.tsx files)
  - Rust: #[cfg(test)] modules or separate test binaries (see src-tauri/tests/)
- Run the tests to confirm they pass:
npx vitest run
cargo test
- Check for warnings in both build and test output:
cargo build 2>&1 | grep -i warning
npx tsc --noEmit 2>&1
If any test fails, fix it before proceeding. If any warning appears, fix it.
0d. Verify zero warnings
Before moving to native QA, confirm:
- cargo build produces zero warnings
- cargo test produces zero warnings and all tests pass
- npx vitest run produces zero failures
- npx tsc --noEmit produces zero errors
Only proceed to step 1 when all tests pass and all warnings are resolved.
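The zero-warnings gate can be sketched as a small runner that treats any warning line as a failure, even when the command exits 0. The helper names and the `GATES` list are illustrative; the substring match deliberately mirrors the `grep -i warning` check above and shares its limitation of flagging any line that merely contains the word.

```python
import subprocess
from typing import List, Sequence, Tuple

# The four commands named in this step.
GATES = [
    ["cargo", "build"],
    ["cargo", "test"],
    ["npx", "vitest", "run"],
    ["npx", "tsc", "--noEmit"],
]


def warning_lines(output: str) -> List[str]:
    """Return output lines that mention a warning (case-insensitive)."""
    return [line for line in output.splitlines() if "warning" in line.lower()]


def run_gate(command: Sequence[str]) -> Tuple[bool, List[str]]:
    """Run one check; it passes only on exit code 0 with no warning lines."""
    proc = subprocess.run(command, capture_output=True, text=True)
    warnings = warning_lines(proc.stdout + proc.stderr)
    return proc.returncode == 0 and not warnings, warnings
```

Running `all(run_gate(cmd)[0] for cmd in GATES)` from the project root gives a single go/no-go answer before step 1.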
1. Build the verification inventory
Before touching the app, write a compact checklist:
- user-visible claim
- exact UI action path
- assertion source
- evidence to capture
Use a table like:
| Scenario | UI action | Assertion source | Evidence |
|---|---|---|---|
| Open settings | Menu click | app log or state file | screenshot + copied log |
| Toggle provider path | checkbox/button | SQLite row or config file | screenshot + query output |
Do not test vague goals like "looks fine". Convert them into observable checks.
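If the inventory is kept as data rather than prose, blank fields become easy to catch before the app is touched. The following is a minimal sketch; the `Scenario` class and `incomplete_fields` helper are hypothetical names that mirror the four checklist items above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    claim: str             # user-visible claim
    ui_action: str         # exact UI action path
    assertion_source: str  # log, hook, DB, or file that proves it
    evidence: str          # what gets captured


def incomplete_fields(scenario: Scenario) -> List[str]:
    """Name the checklist fields left blank, so vague scenarios are caught early."""
    return [name for name, value in vars(scenario).items() if not value.strip()]
```

A scenario with a blank assertion source is not ready to run: it can only ever produce a screenshot, never a verified result.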
2. Choose the control surface
Use the least brittle control surface that still proves the behavior.
- Prefer the app's own AppleScript dictionary if it has one.
Check:
sdef /Applications/MyApp.app 2>/dev/null | head -n 40
- If there is no useful script dictionary, use System Events.
- If System Events is blocked, try CGEvent for keyboard shortcuts and typing (via Python Quartz or Swift). CGEvent posts key-down/key-up events to the frontmost app without going through System Events, but recent macOS releases may still drop posted events from a process that lacks Accessibility trust, so confirm the effect with a non-visual assertion. For reading UI element state, use AXUIElement via Swift where the driving process is trusted for Accessibility.
- If the UI is canvas/WebGL-heavy and hard to read through Accessibility, still use AppleScript, System Events, or CGEvent for interaction, but rely on logs, DB state, or test hooks for the truth.
- Only fall back to coordinate or OCR tools when direct scripting, Accessibility, and CGEvent are not enough. See references/landscape.md.
3. Create an evidence directory
Use a dedicated output folder per run:
OUT="output/native-app-qa/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$OUT"
Capture baseline evidence before making changes when useful.
4. Drive the app
Only begin this step after steps 0-3 are complete.
If the step requires live desktop input routing, first tell the user that native
UI driving is about to begin and ask them to avoid keyboard/mouse interaction
until the step finishes. Keep this pause window as short as possible.
Use scripts/run-applescript.sh for one-liners or here-doc scripts.
Activate an app:
.agents/skills/macos-native-app-qa/scripts/run-applescript.sh -e 'tell application "Kartix" to activate'
Use System Events for menu and button flows:
.agents/skills/macos-native-app-qa/scripts/run-applescript.sh <<'APPLESCRIPT'
tell application "Kartix" to activate
tell application "System Events"
tell process "Kartix"
click menu item "Settings..." of menu "Kartix" of menu bar 1
end tell
end tell
APPLESCRIPT
Prefer stable menu names, button labels, and sheet titles over brittle UI indexing. If you must use indexes, note that the flow is fragile.
When System Events is blocked, use CGEvent as a fallback for keyboard input:
Python Quartz (Cmd+comma to open Settings):
python3 -c "
import Quartz
src = Quartz.CGEventSourceCreate(Quartz.kCGEventSourceStateHIDSystemState)
# Virtual key code 43 is the comma key on ANSI layouts.
down = Quartz.CGEventCreateKeyboardEvent(src, 43, True)
up = Quartz.CGEventCreateKeyboardEvent(src, 43, False)
# Hold Command on both events so the app sees Cmd+comma.
Quartz.CGEventSetFlags(down, Quartz.kCGEventFlagMaskCommand)
Quartz.CGEventSetFlags(up, Quartz.kCGEventFlagMaskCommand)
# Post key-down, then key-up, at the HID event tap.
Quartz.CGEventPost(Quartz.kCGHIDEventTap, down)
Quartz.CGEventPost(Quartz.kCGHIDEventTap, up)
"
Swift (Cmd+Q to quit):
swift -e '
import Cocoa
let src = CGEventSource(stateID: .hidSystemState)
let down = CGEvent(keyboardEventSource: src, virtualKey: 12, keyDown: true)!
let up = CGEvent(keyboardEventSource: src, virtualKey: 12, keyDown: false)!
down.flags = .maskCommand; up.flags = .maskCommand
down.post(tap: .cghidEventTap); up.post(tap: .cghidEventTap)
'
CGEvent keyboard input combined with non-visual assertion (logs, DB, files) is a valid action+assertion pattern.
5. Assert against authoritative state
Use backend truth for hidden or important state.
Examples:
.agents/skills/macos-native-app-qa/scripts/log-assert.sh "$HOME/Library/Logs/kartix/app.log" "settings window opened"
.agents/skills/macos-native-app-qa/scripts/sqlite-assert.sh \
"$HOME/Library/Application Support/kartix/kartix.db" \
"select count(*) from app_state;" \
--min 1
sqlite-assert.sh is intentionally strict:
- the database file must already exist
- --equals, --min, and --max require exactly one result row
- queries run in read-only mode

Files:
test -f "$HOME/Library/Application Support/kartix/kartix.toml"
When the app under test has debug hooks, prefer them over scraping UI text.
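The strictness rules listed for sqlite-assert.sh can be illustrated with Python's standard `sqlite3` module. This is a sketch of the same idea, not the script's actual implementation: the function name and keyword arguments are hypothetical stand-ins for `--equals`, `--min`, and `--max`.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def sqlite_assert(
    db_path: Path,
    query: str,
    *,
    equals: Optional[float] = None,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> float:
    """Run `query` read-only and check its single numeric result."""
    if not db_path.is_file():
        raise FileNotFoundError(f"database does not exist: {db_path}")
    # mode=ro keeps the assertion from creating or modifying the database.
    conn = sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)
    try:
        rows = conn.execute(query).fetchall()
    finally:
        conn.close()
    if len(rows) != 1:
        raise AssertionError(f"expected exactly one result row, got {len(rows)}")
    value = rows[0][0]
    if not isinstance(value, (int, float)):
        raise AssertionError(f"expected a numeric result, got {value!r}")
    if equals is not None and value != equals:
        raise AssertionError(f"expected {equals}, got {value}")
    if minimum is not None and value < minimum:
        raise AssertionError(f"expected at least {minimum}, got {value}")
    if maximum is not None and value > maximum:
        raise AssertionError(f"expected at most {maximum}, got {value}")
    return value
```

Opening the database through a `file:` URI with `mode=ro` means a mistyped query cannot alter the state being asserted, and a missing file fails loudly instead of being created empty.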
6. Capture evidence
Use scripts/capture-evidence.sh after each important state:
.agents/skills/macos-native-app-qa/scripts/capture-evidence.sh \
--app "Kartix" \
--out "$OUT" \
--log "$HOME/Library/Logs/kartix/app.log" \
--db "$HOME/Library/Application Support/kartix/kartix.db"
This captures:
- a screenshot
- basic metadata
- matching process info
- copies or backups of supplied logs and databases
If screenshot capture is blocked by permissions or the environment, the helper records that in missing.txt instead of discarding the rest of the evidence. Report that scenario as pass with caveats (when the non-visual assertion passed) or as blocked, never as a clean pass.
The evidence helper also records the permission surface it actually observed:
- accessibility_status=ok|blocked
- screen_recording_status=ok|blocked|skipped
- screenshot_skipped=1 when --skip-screenshot is intentional
- explicit activation and frontmost-app mismatches when the target app could not be focused
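These status values can be turned into verdict caveats mechanically. The sketch below assumes the helper writes them as plain `key=value` lines; where those lines are stored is not specified here, so the parser takes the text directly. Both function names are hypothetical.

```python
from typing import Dict, List


def parse_status(text: str) -> Dict[str, str]:
    """Parse `key=value` lines, ignoring anything that is not in that shape."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            status[key] = value
    return status


def evidence_caveats(status: Dict[str, str]) -> List[str]:
    """List the degraded permission surfaces that a verdict has to mention."""
    caveats = []
    if status.get("accessibility_status") == "blocked":
        caveats.append("Accessibility blocked")
    if status.get("screen_recording_status") == "blocked":
        caveats.append("Screen Recording blocked")
    if status.get("screenshot_skipped") == "1":
        caveats.append("screenshot intentionally skipped")
    return caveats
```

An empty caveat list is a precondition for a clean pass. Anything else belongs in the verdict text.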
Important evidence hygiene rules:
- the screenshot helper captures a full-screen image, not a guaranteed app-only crop
- when --app is provided, it activates that app first and records whether it actually became frontmost
- raw log and DB copies should be used only for isolated test data or explicitly non-sensitive artifacts
- prefer targeted assertions and exported debug state over copying whole production logs or databases
7. Record the verdict
For each scenario, report:
- pass -- action completed, assertion passed, all evidence captured.
- pass with caveats -- action completed and non-visual assertion passed, but some evidence was blocked by permissions (e.g., screenshot blocked by Screen Recording, System Events blocked so CGEvent was used instead). State what was limited.
- fail -- action or assertion did not produce the expected result.
- blocked -- permissions, missing app, or missing paths prevented the test from running at all.
A valid pass (including "pass with caveats") must say what action was performed, what assertion passed, and where the evidence was saved. A "pass with caveats" must also state which evidence or control surface was degraded and why.
The verdict report MUST also include:
- Tests added/updated: list any new or modified test files with test count changes
- Coverage gaps remaining: list any features still lacking test coverage
- Warnings status: confirm zero compiler/linter warnings, or list any remaining
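The four verdicts reduce to a short decision rule. The sketch below encodes it with hypothetical names; `degraded` stands for the list of blocked evidence or substituted control surfaces that a "pass with caveats" has to spell out.

```python
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    PASS = "pass"
    PASS_WITH_CAVEATS = "pass with caveats"
    FAIL = "fail"
    BLOCKED = "blocked"


def classify(
    could_run: bool,
    action_ok: bool,
    assertion_ok: bool,
    degraded: Sequence[str] = (),
) -> Verdict:
    """Map one scenario's outcome onto the verdicts defined in step 7."""
    if not could_run:
        return Verdict.BLOCKED  # permissions, missing app, or missing paths
    if not (action_ok and assertion_ok):
        return Verdict.FAIL
    # The action and the non-visual assertion both succeeded.
    return Verdict.PASS_WITH_CAVEATS if degraded else Verdict.PASS
```

A screenshot alone never appears in this rule: without `assertion_ok` the best possible outcome is `fail`.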
Recommended usage patterns
Launch and smoke test a packaged app
- launch app
- confirm window appears and app becomes frontmost
- assert log startup markers or DB initialization
- capture screenshot
Verify a settings change
- open settings through menu or shortcut
- toggle control
- assert config or DB state changed
- capture screenshot of settings UI
Verify state persistence across relaunch
- create state in app
- assert DB or file state exists
- quit and relaunch
- verify restored state in UI and persistent store
Verify agent or provider detection in a desktop app
- open provider or agent settings UI
- run detection or refresh flow
- assert log lines, DB rows, or config updates
- capture screenshot of visible badges or status
Verify terminal-style apps with canvas output
- use AppleScript or System Events only for window, tab, menu, and input routing
- do not claim terminal text correctness from canvas pixels alone
- assert via logs, PTY transcripts, SQLite session state, or test hooks instead
Tooling bundled with this skill
scripts/run-applescript.sh
Thin wrapper around osascript for inline or piped AppleScript.
scripts/capture-evidence.sh
Creates a run artifact folder with screenshot, logs, DB backups, and metadata.
scripts/sqlite-assert.sh
Runs a query and enforces a simple expected result.
scripts/log-assert.sh
Checks that a log file contains an expected string or regex and counts total matches.
scripts/self-test.sh
Runs a local smoke test for permissions, assertions, evidence capture, and global resolution prerequisites.
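To show the kind of check scripts/log-assert.sh performs, here is a minimal Python sketch of a log assertion. It is not the script's implementation: the function name and the `regex` switch are hypothetical, and it counts matching lines rather than individual matches.

```python
import re
from pathlib import Path


def log_assert(log_path: Path, pattern: str, *, regex: bool = False) -> int:
    """Return the number of matching lines; raise if the log has none."""
    if not log_path.is_file():
        raise FileNotFoundError(f"log file does not exist: {log_path}")
    # Plain strings are escaped so punctuation is matched literally.
    needle = re.compile(pattern if regex else re.escape(pattern))
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        count = sum(1 for line in handle if needle.search(line))
    if count == 0:
        raise AssertionError(f"no line in {log_path} matches {pattern!r}")
    return count
```

Returning the count makes it possible to assert that a marker appeared exactly once, for example a single "settings window opened" line after one menu click.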
References
Open only what you need:
references/landscape.md
Current open-source AppleScript, MCP, and native desktop automation tooling.
references/assertion-model.md
How to turn native UI flows into trustworthy end-to-end proofs.
references/use-cases.md
Detailed scenario catalog for native macOS app QA.
Guardrails
- NEVER skip step 0 (code analysis and test maintenance). Even if no files changed, check for coverage gaps. The codebase evolves constantly and tests must keep up.
- NEVER report "no changes, skipping" without first analyzing test coverage gaps. A file may not have changed, but its tests may be missing or incomplete.
- Do not claim "end-to-end verified" if you only drove the UI but never asserted state.
- Do not claim terminal or canvas text correctness from screenshots alone.
- Do not silently ignore Accessibility or Screen Recording permission failures.
- Prefer targeted assertions and scoped artifacts over copying full logs or databases.
- When a native app has both web content and native chrome, use this skill for the native parts and $playwright or $playwright-interactive for the browser parts.
- When writing tests, follow existing project conventions exactly — read existing test files first.
- Always run cargo build, cargo test, npx vitest run, and npx tsc --noEmit and confirm zero warnings/errors before recording a verdict.