macos-native-app-qa

Use when the task requires end-to-end verification of a native macOS app and browser-only tooling is not enough. This skill drives apps with AppleScript, System Events, or CGEvent (via Python Quartz or Swift), verifies behavior through logs, test hooks, files, or SQLite state, captures screenshots as evidence, AND actively maintains test coverage by analyzing code changes and writing new tests. Trigger for Tauri, Electron, AppKit, or other macOS desktop QA, launch flows, menu and dialog interaction, settings changes, packaged-app smoke tests, or any request to prove a macOS app works end to end.

Updated March 11, 2026 at 16:57

Source: FutureAtoms/Kartix (by FutureAtoms)

SKILL.md

macOS Native App QA

Use this skill for native macOS application testing when the proof has to come from more than a screenshot. The core loop is:

analyze code changes -> update/add tests -> run tests -> drive the app -> assert against logs/hooks/db/files -> capture screenshots -> record verdict

If the task is primarily browser DOM automation, use $playwright or $playwright-interactive instead. Use this skill when native window management, menu interaction, system dialogs, packaged apps, or Accessibility-driven UI scripting matter.

What this skill is for

This skill is optimized for:

analyzing code changes to detect new features, modified behavior, or removed functionality
writing and updating tests (unit, integration, e2e) to match current code — never skip this step
launching and focusing native macOS apps
clicking menus, buttons, dialogs, sheets, and toolbars through AppleScript, System Events, or CGEvent
verifying settings changes through logs, plist files, SQLite, or other on-disk state
verifying flows in Tauri/Electron apps where browser tooling cannot prove native behavior
collecting screenshots plus non-visual evidence for end-to-end proof

This skill is not optimized for:

pure browser/webview DOM flows with stable selectors
pixel-perfect visual regression testing
OCR-heavy desktop automation
cross-platform desktop automation

For the current open-source landscape and when to prefer external tools, read references/landscape.md.

Preconditions

Before using the workflow:

Confirm this is macOS.
Confirm the app can be launched locally.
Confirm Terminal or Codex has Accessibility permission for System Events UI scripting, and Automation permission to control System Events (macOS prompts for this on first use).
Confirm screenshot capture is allowed if visual evidence is required.
Identify at least one authoritative assertion source:
- app logs
- test hook / debug endpoint / exported state
- SQLite database
- config or plist file
- output file created by the app

Quick checks:

command -v osascript >/dev/null
command -v screencapture >/dev/null
osascript -e 'tell application "System Events" to get name of first process whose frontmost is true'
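The same quick checks can be run from Python, for example as the first step of a larger harness. This is a minimal sketch. `preflight` and its result keys are hypothetical names. It also looks for `sqlite3`, which the SQLite assertions rely on. It only checks the OS and PATH, and it does not probe Accessibility or Screen Recording consent.

```python
import platform
import shutil


def preflight() -> dict:
    """Report whether the basic macOS QA prerequisites are present.

    This only checks the OS and PATH lookups. Accessibility and Screen
    Recording permissions still have to be probed by running the real tools.
    """
    return {
        "is_macos": platform.system() == "Darwin",
        "osascript": shutil.which("osascript") is not None,
        "screencapture": shutil.which("screencapture") is not None,
        "sqlite3": shutil.which("sqlite3") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```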

Live desktop contention

This skill drives the user's real macOS desktop. System Events, AppleScript activation, AXUIElement focus, CGEvent keyboard input, and coordinate clicks can all steal focus from the user's current app or send input to the wrong target if the user is typing or moving the mouse at the same time.

Default operating rule:

Run non-interactive verification first:
- code/test updates
- unit/integration/e2e runs
- logs, SQLite, plist, filesystem, and debug-hook assertions
- passive evidence capture that does not require input routing
Only start live desktop driving after the non-interactive work is done.
Before any live desktop driving, explicitly warn the user that keyboard and mouse should stay idle for a short window.
If the user is actively using the machine and cannot pause, stop before the interactive phase and report the remaining native proof as pending.

Do not assume the user has stopped using the Mac just because native QA was requested. Treat shared-desktop coordination as part of the workflow.

Permission fallback paths

Accessibility / System Events blocked: Do not stop entirely. Fall back to CGEvent for keyboard input (shortcuts, typing) and to AXUIElement via Swift for reading UI state. CGEvent keyboard events may still reach the app when it is already focused, even if System Events scripting is blocked. Recent macOS releases can also gate synthetic input behind a privacy check, so confirm delivery through a non-visual assertion instead of assuming it. Without Accessibility you cannot click arbitrary UI elements, but you can still drive flows with keyboard shortcuts and menu accelerators. State the limitation clearly in the verdict.

Screen Recording blocked: screencapture will fail or produce a blank image. As an alternative, use CGDisplayCreateImage via Python Quartz. Depending on TCC state, it may succeed where screencapture does not. If all screenshot methods fail, record the limitation in missing.txt and rely on non-visual assertions. A "pass with caveats" verdict is still possible if the non-visual assertion passed.

Closed Verification Contract

Never mark a native-app scenario as verified from UI interaction alone.

Each verified scenario needs all applicable parts:

Action: the app was driven through a real user path:
- the app's AppleScript dictionary when available
- otherwise System Events
- CGEvent keyboard shortcuts or AXUIElement when Accessibility blocks System Events
- coordinate-based fallbacks only when necessary
Assertion: a non-visual source confirms the expected behavior:
- log line
- SQLite row or query result
- config or state file change
- explicit test hook output
- process or filesystem side effect
Evidence: screenshot evidence for the visible state, plus saved assertion artifacts.

The pass condition is:

user action path completed AND authoritative assertion passed AND evidence was saved

Screenshots support the claim. They do not prove hidden state by themselves.

For a deeper proof model and assertion patterns, read references/assertion-model.md.

Workflow

0. Analyze code and update tests (ALWAYS run this step)

This step is mandatory on every invocation. Code changes constantly — tests must keep up.

0a. Detect what changed

Use the Agent tool to scan for code changes since the last QA run. Compare source file timestamps against the last evidence directory:

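# Find the most recent QA evidence directory (empty if QA has never run)
LAST_QA=$(ls -td output/native-app-qa/*/ 2>/dev/null | head -1)

# List source files changed since then; with no previous run, every file counts as changed
if [ -n "$LAST_QA" ]; then
  find src/ src-tauri/src/ \( -name '*.ts' -o -name '*.tsx' -o -name '*.rs' \) \
    -newer "$LAST_QA" 2>/dev/null
else
  find src/ src-tauri/src/ \( -name '*.ts' -o -name '*.tsx' -o -name '*.rs' \) 2>/dev/null
fi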

If no files changed, still proceed to step 0b to check for coverage gaps.
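The timestamp comparison above can also be written in Python, which makes the "no previous run" case explicit. This sketch assumes the same `output/native-app-qa/<timestamp>/` layout and source extensions as the shell snippet. `changed_since_last_qa` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

SOURCE_SUFFIXES = {".ts", ".tsx", ".rs"}


def last_qa_dir(qa_root: Path) -> Optional[Path]:
    """Return the most recently modified evidence directory, or None if there is none."""
    if not qa_root.is_dir():
        return None
    runs = [p for p in qa_root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def changed_since_last_qa(source_roots: list[Path], qa_root: Path) -> list[Path]:
    """List source files newer than the last QA run; with no previous run, list them all."""
    last = last_qa_dir(qa_root)
    cutoff = last.stat().st_mtime if last is not None else float("-inf")
    changed = []
    for root in source_roots:
        if not root.is_dir():
            continue  # e.g. a frontend-only checkout has no src-tauri/src
        for path in root.rglob("*"):
            if path.suffix in SOURCE_SUFFIXES and path.is_file() and path.stat().st_mtime > cutoff:
                changed.append(path)
    return sorted(changed)


if __name__ == "__main__":
    for p in changed_since_last_qa([Path("src"), Path("src-tauri/src")], Path("output/native-app-qa")):
        print(p)
```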

0b. Analyze code for untested features

Use the Agent tool (subagent_type: Explore) to:

List all source modules and components — build a feature inventory from the actual code
List all test files — map which features have tests
Identify coverage gaps — find features, components, functions, IPC commands, or store actions that lack tests
Check changed files — for any file that changed, verify its test file covers the new/modified behavior

Key patterns to check:

Every Tauri IPC command in commands.rs should have corresponding test coverage
Every Zustand store action should have a test
Every React component with user interaction should have a test
Every Rust module with public functions should have tests
New keyboard shortcuts or menu items need Workspace test coverage
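Part of the "map which features have tests" step can be done mechanically before the Explore agent looks deeper. The sketch below assumes a Vitest-style convention in which `Foo.tsx` is covered by a sibling `Foo.test.tsx` or `Foo.spec.ts`. That convention and the helper name are assumptions. A matching file name also says nothing about whether the tests actually exercise the new behavior.

```python
from pathlib import Path

FRONTEND_SUFFIXES = (".ts", ".tsx")
TEST_MARKERS = (".test", ".spec")


def untested_frontend_files(src_root: Path) -> list[Path]:
    """Return .ts/.tsx files under src_root with no sibling *.test.* or *.spec.* file."""
    files = [p for p in src_root.rglob("*") if p.is_file() and p.suffix in FRONTEND_SUFFIXES]
    tests = {p for p in files if any(marker in p.name for marker in TEST_MARKERS)}
    gaps = []
    for source in files:
        if source in tests or source.name.endswith(".d.ts"):
            continue  # skip the test files themselves and type declaration files
        candidates = {
            source.with_name(f"{source.stem}{marker}{ext}")
            for marker in TEST_MARKERS
            for ext in FRONTEND_SUFFIXES
        }
        if not candidates & tests:
            gaps.append(source)
    return sorted(gaps)
```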

0c. Write or update tests

Do not skip this. For every gap found in 0b:

Read the source file to understand the feature behavior
Read existing test files to understand patterns and conventions used in the project
Write new tests or update existing ones following the same patterns:
- Frontend: Vitest + @testing-library/react + jsdom (see existing *.test.tsx files)
- Rust: #[cfg(test)] modules or separate test binaries (see src-tauri/tests/)

Run the tests to confirm they pass:

npx vitest run           # frontend
cargo test               # rust

Check for warnings in both build and test output:

cargo build 2>&1 | grep -i warning
cargo test 2>&1 | grep -i warning
npx tsc --noEmit 2>&1

If any test fails, fix it before proceeding. If any warning appears, fix it.

0d. Verify zero warnings

Before moving to native QA, confirm:

cargo build produces zero warnings
cargo test produces zero warnings and all tests pass
npx vitest run produces zero failures
npx tsc --noEmit produces zero errors

Only proceed to step 1 when all tests pass and all warnings are resolved.
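The zero-warnings gate can be automated by scanning captured tool output. This sketch encodes only two conventions. The first is cargo/rustc diagnostics that begin with `warning:`. The second is TypeScript compiler diagnostics that contain `error TS<number>:`. The function name is hypothetical, and when something is flagged you still need to read the full output.

```python
import re

# cargo/rustc print diagnostics such as "warning: unused variable: `x`"
CARGO_WARNING = re.compile(r"^warning:")
# tsc prints diagnostics such as "src/App.tsx(4,7): error TS2322: ..."
TSC_ERROR = re.compile(r"error TS\d+:")


def gate_problems(cargo_output: str, tsc_output: str) -> list[str]:
    """Return offending lines from captured output; an empty list means the gate passes."""
    problems = [line for line in cargo_output.splitlines() if CARGO_WARNING.match(line)]
    problems += [line for line in tsc_output.splitlines() if TSC_ERROR.search(line)]
    return problems
```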

1. Build the verification inventory

Before touching the app, write a compact checklist:

user-visible claim
exact UI action path
assertion source
evidence to capture

Use a table like:

Scenario	UI action	Assertion source	Evidence
Open settings	Menu click	app log or state file	screenshot + copied log
Toggle provider path	checkbox/button	SQLite row or config file	screenshot + query output

Do not test vague goals like "looks fine". Convert them into observable checks.
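If the inventory is kept in code instead of a hand-written table, a small data structure can enforce the rule that every scenario names an observable check. This is a hypothetical sketch, not part of the bundled scripts. The `Scenario` type and its fields simply mirror the table columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    claim: str             # user-visible claim, e.g. "Open settings"
    ui_action: str         # exact UI action path
    assertion_source: str  # non-visual source of truth (log, SQLite, file, hook)
    evidence: str          # what gets saved

    def __post_init__(self) -> None:
        # A scenario without an assertion source could only be checked visually,
        # which the verification contract does not accept as proof.
        for name in ("claim", "ui_action", "assertion_source", "evidence"):
            if not getattr(self, name).strip():
                raise ValueError(f"scenario field {name!r} must not be empty")


def to_tsv(scenarios: list[Scenario]) -> str:
    """Render the inventory as four tab-separated columns with a header row."""
    rows = ["Scenario\tUI action\tAssertion source\tEvidence"]
    rows += [f"{s.claim}\t{s.ui_action}\t{s.assertion_source}\t{s.evidence}" for s in scenarios]
    return "\n".join(rows)
```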

2. Choose the control surface

Use the least brittle control surface that still proves the behavior.

Prefer the app's own AppleScript dictionary if it has one.

Check:

sdef /Applications/MyApp.app 2>/dev/null | head -n 40

If there is no useful script dictionary, use System Events.
If System Events is blocked by Accessibility permissions, use CGEvent for keyboard shortcuts and typing (via Python Quartz or Swift). CGEvent posts key-down/key-up events to the frontmost app and may work where System Events does not. Confirm that the input actually arrived through a non-visual assertion. For reading UI element state, use AXUIElement via Swift.
If the UI is canvas/WebGL-heavy and hard to read through Accessibility, still use AppleScript, System Events, or CGEvent for interaction, but rely on logs, DB state, or test hooks for the truth.
Only fall back to coordinate or OCR tools when direct scripting, Accessibility, and CGEvent are not enough. See references/landscape.md.
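The preference order above amounts to a small decision procedure. The sketch below encodes it with hypothetical capability flags, which you would fill in from the earlier checks (for example `sdef` output, or whether a System Events probe succeeded). It only orders the options and does not detect anything by itself.

```python
def control_surfaces(has_script_dictionary: bool, system_events_ok: bool,
                     cgevent_ok: bool) -> list[str]:
    """Return usable control surfaces, least brittle first."""
    order = []
    if has_script_dictionary:
        order.append("app AppleScript dictionary")
    if system_events_ok:
        order.append("System Events")
    if cgevent_ok:
        # Keyboard shortcuts and typing only; use AXUIElement (Swift) to read state.
        order.append("CGEvent keyboard input")
    # Coordinate clicks and OCR are always the last resort and are marked fragile.
    order.append("coordinate or OCR fallback (fragile)")
    return order
```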

3. Create an evidence directory

Use a dedicated output folder per run:

OUT="output/native-app-qa/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$OUT"

Capture baseline evidence before making changes when useful.

4. Drive the app

Only begin this step after steps 0-3 are complete.

If the step requires live desktop input routing, first tell the user that native UI driving is about to begin and ask them to avoid keyboard/mouse interaction until the step finishes. Keep this pause window as short as possible.

Use scripts/run-applescript.sh for one-liners or here-doc scripts.

Activate an app:

.agents/skills/macos-native-app-qa/scripts/run-applescript.sh -e 'tell application "Kartix" to activate'

Use System Events for menu and button flows:

.agents/skills/macos-native-app-qa/scripts/run-applescript.sh <<'APPLESCRIPT'
tell application "Kartix" to activate
tell application "System Events"
  tell process "Kartix"
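    -- Menus are addressed through their menu bar item. Match the title exactly:
    -- many apps use the ellipsis character "…" rather than three periods.
    click menu item "Settings..." of menu "Kartix" of menu bar item "Kartix" of menu bar 1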
  end tell
end tell
APPLESCRIPT

Prefer stable menu names, button labels, and sheet titles over brittle UI indexing. If you must use indexes, note that the flow is fragile.
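When scenarios are generated programmatically, building menu clicks from names keeps the scripts readable and safely quoted. This is a minimal sketch that assumes Python drives `osascript` directly instead of going through the bundled `run-applescript.sh`. The helper names are hypothetical. It also assumes the application name and the process name are the same. It uses the standard System Events path, in which a menu belongs to its menu bar item.

```python
import subprocess


def applescript_string(text: str) -> str:
    """Quote text as an AppleScript string literal (escape backslashes, then quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def menu_click_script(process: str, menu: str, item: str) -> str:
    """Build a System Events script that clicks `item` in the menu under menu bar item `menu`."""
    p, m, i = (applescript_string(s) for s in (process, menu, item))
    return "\n".join([
        f"tell application {p} to activate",  # assumes app name == process name
        'tell application "System Events"',
        f"  tell process {p}",
        f"    click menu item {i} of menu {m} of menu bar item {m} of menu bar 1",
        "  end tell",
        "end tell",
    ])


def run_osascript(script: str) -> str:
    """Feed the script to osascript on stdin (macOS only) and return its stdout."""
    result = subprocess.run(["osascript"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(menu_click_script("Kartix", "Kartix", "Settings…"))
```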

When System Events is blocked, use CGEvent as a fallback for keyboard input:

Python Quartz (Cmd+comma to open Settings):

python3 -c "
import time
import Quartz  # from the pyobjc-framework-Quartz package
src = Quartz.CGEventSourceCreate(Quartz.kCGEventSourceStateHIDSystemState)
# virtual key code 43 = comma (kVK_ANSI_Comma)
down = Quartz.CGEventCreateKeyboardEvent(src, 43, True)
up   = Quartz.CGEventCreateKeyboardEvent(src, 43, False)
Quartz.CGEventSetFlags(down, Quartz.kCGEventFlagMaskCommand)
Quartz.CGEventSetFlags(up,   Quartz.kCGEventFlagMaskCommand)
Quartz.CGEventPost(Quartz.kCGHIDEventTap, down)
time.sleep(0.05)  # short gap so the app sees a distinct key press
Quartz.CGEventPost(Quartz.kCGHIDEventTap, up)
"

Swift (Cmd+Q to quit):

swift -e '
import Cocoa
let src = CGEventSource(stateID: .hidSystemState)
let down = CGEvent(keyboardEventSource: src, virtualKey: 12, keyDown: true)!
let up   = CGEvent(keyboardEventSource: src, virtualKey: 12, keyDown: false)!
down.flags = .maskCommand; up.flags = .maskCommand
down.post(tap: .cghidEventTap); up.post(tap: .cghidEventTap)
'

CGEvent keyboard input combined with non-visual assertion (logs, DB, files) is a valid action+assertion pattern.

5. Assert against authoritative state

Use backend truth for hidden or important state.

Examples:

Logs:

.agents/skills/macos-native-app-qa/scripts/log-assert.sh "$HOME/Library/Logs/kartix/app.log" "settings window opened"
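The bundled `log-assert.sh` checks a log for a string or regex. A Python version of the same idea might look like the sketch below. The function name, the `regex` and `min_count` parameters, and the choice to raise `AssertionError` are assumptions, not the script's real interface. This version counts matching lines.

```python
import re
from pathlib import Path


def log_assert(path: Path, pattern: str, regex: bool = False, min_count: int = 1) -> int:
    """Count log lines matching `pattern` and fail if there are fewer than `min_count`."""
    if not path.is_file():
        raise AssertionError(f"log file not found: {path}")
    matcher = re.compile(pattern) if regex else None
    count = 0
    with path.open(encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if (matcher.search(line) if matcher else pattern in line):
                count += 1
    if count < min_count:
        raise AssertionError(
            f"expected >= {min_count} matching line(s) for {pattern!r} in {path}, found {count}")
    return count
```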

SQLite:

.agents/skills/macos-native-app-qa/scripts/sqlite-assert.sh \
  "$HOME/Library/Application Support/kartix/kartix.db" \
  "select count(*) from app_state;" \
  --min 1

sqlite-assert.sh is intentionally strict:

the database file must already exist
--equals, --min, and --max require exactly one result row
queries run in read-only mode
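These strictness rules map directly onto Python's `sqlite3` module. The sketch below illustrates the rules; it is not the bundled script, and the function name and keyword arguments are hypothetical. When a bound is given, it also requires a single column so there is one scalar to compare. Opening the database through a `mode=ro` URI makes SQLite reject writes. The explicit existence check keeps a missing file from being silently created as an empty database.

```python
import sqlite3
from pathlib import Path


def sqlite_assert(db: Path, query: str, *, equals=None, min_value=None, max_value=None):
    """Run `query` read-only; when a bound is given, check its single scalar result."""
    if not db.is_file():
        # Refuse to run against a missing file instead of creating an empty database.
        raise AssertionError(f"database does not exist: {db}")
    # mode=ro rejects writes; as_uri() percent-encodes spaces such as "Application Support".
    conn = sqlite3.connect(f"{db.resolve().as_uri()}?mode=ro", uri=True)
    try:
        rows = conn.execute(query).fetchall()
    finally:
        conn.close()
    if equals is None and min_value is None and max_value is None:
        return rows  # no bound requested: just report what the query returned
    if len(rows) != 1 or len(rows[0]) != 1:
        raise AssertionError(f"expected exactly one row with one column, got {rows!r}")
    value = rows[0][0]
    if equals is not None and value != equals:
        raise AssertionError(f"expected {equals!r}, got {value!r}")
    if min_value is not None and value < min_value:
        raise AssertionError(f"expected >= {min_value!r}, got {value!r}")
    if max_value is not None and value > max_value:
        raise AssertionError(f"expected <= {max_value!r}, got {value!r}")
    return value
```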

Files:

test -f "$HOME/Library/Application Support/kartix/kartix.toml"

When the app under test has debug hooks, prefer them over scraping UI text.

6. Capture evidence

Use scripts/capture-evidence.sh after each important state:

.agents/skills/macos-native-app-qa/scripts/capture-evidence.sh \
  --app "Kartix" \
  --out "$OUT" \
  --log "$HOME/Library/Logs/kartix/app.log" \
  --db "$HOME/Library/Application Support/kartix/kartix.db"

This captures:

a screenshot
basic metadata
matching process info
copies or backups of supplied logs and databases

If screenshot capture is blocked by permissions or the environment, the helper records that in missing.txt instead of discarding the rest of the evidence. Treat that scenario as blocked or evidence-missing, not a clean pass.

The evidence helper also records the permission surface it actually observed:

accessibility_status=ok|blocked
screen_recording_status=ok|blocked|skipped
screenshot_skipped=1 when --skip-screenshot is intentional
explicit activation and frontmost-app mismatches when the target app could not be focused
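The recorded statuses are what separate a clean pass from a "pass with caveats". The sketch below assumes the helper's metadata can be read as `key=value` lines. The exact file name and layout are an assumption, so check the actual output of `capture-evidence.sh`. Under that assumption, a small parser can flag degraded evidence automatically. The function names are hypothetical.

```python
def parse_metadata(text: str) -> dict[str, str]:
    """Parse simple key=value lines, ignoring blanks and lines without '='."""
    meta = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def evidence_caveats(meta: dict[str, str]) -> list[str]:
    """List reasons the evidence is degraded; an empty list means nothing was blocked."""
    caveats = []
    if meta.get("accessibility_status") != "ok":
        caveats.append(f"accessibility_status={meta.get('accessibility_status', 'unknown')}")
    if meta.get("screenshot_skipped") == "1":
        caveats.append("screenshot intentionally skipped")
    elif meta.get("screen_recording_status") != "ok":
        caveats.append(f"screen_recording_status={meta.get('screen_recording_status', 'unknown')}")
    return caveats
```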

Important evidence hygiene rules:

the screenshot helper captures a full-screen image, not a guaranteed app-only crop
when --app is provided, it activates that app first and records whether it actually became frontmost
raw log and DB copies should be used only for isolated test data or explicitly non-sensitive artifacts
prefer targeted assertions and exported debug state over copying whole production logs or databases

7. Record the verdict

For each scenario, report:

pass -- action completed, assertion passed, all evidence captured.
pass with caveats -- action completed and non-visual assertion passed, but some evidence was blocked by permissions (e.g., screenshot blocked by Screen Recording, System Events blocked so CGEvent was used instead). State what was limited.
fail -- action or assertion did not produce the expected result.
blocked -- permissions, missing app, or missing paths prevented the test from running at all.

A valid pass (including "pass with caveats") must say what action was performed, what assertion passed, and where the evidence was saved. A "pass with caveats" must also state which evidence or control surface was degraded and why.
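The four outcomes can be encoded as a small function so that every scenario is classified the same way. This is a sketch with hypothetical parameter names. `caveats` stands for whatever degraded the run, such as a blocked screenshot or a CGEvent fallback used instead of System Events.

```python
def verdict(ran: bool, action_ok: bool, assertion_ok: bool, caveats: list[str]) -> str:
    """Classify one scenario as pass, pass with caveats, fail, or blocked."""
    if not ran:
        # Permissions, a missing app, or missing paths prevented the test from running.
        return "blocked"
    if not (action_ok and assertion_ok):
        return "fail"
    # A passing scenario with degraded evidence or a degraded control surface is not clean.
    return "pass with caveats" if caveats else "pass"


def verdict_line(scenario: str, result: str, caveats: list[str]) -> str:
    """Format one report line; a caveated pass must say what was limited."""
    if result == "pass with caveats":
        return f"{scenario}: {result} ({'; '.join(caveats)})"
    return f"{scenario}: {result}"
```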

The verdict report MUST also include:

Tests added/updated: list any new or modified test files with test count changes
Coverage gaps remaining: list any features still lacking test coverage
Warnings status: confirm zero compiler/linter warnings, or list any remaining

Recommended usage patterns

Launch and smoke test a packaged app

launch app
confirm window appears and app becomes frontmost
assert log startup markers or DB initialization
capture screenshot
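Startup markers are usually written a moment after launch, so the smoke test should poll instead of checking once. This is a minimal sketch that assumes the app appends to a plain-text log. The marker text, timeout, and helper name are hypothetical. Record the log's size before launching (for example `offset = log.stat().st_size`, then `open -a Kartix`) so that markers left over from an earlier run are ignored.

```python
import time
from pathlib import Path


def wait_for_marker(log: Path, marker: str, start_offset: int = 0,
                    timeout: float = 15.0, interval: float = 0.25) -> bool:
    """Poll `log` until `marker` appears after byte `start_offset`, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if log.is_file():
            with log.open("rb") as fh:
                fh.seek(start_offset)  # skip content written before this launch
                if marker.encode() in fh.read():
                    return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```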

Verify a settings change

open settings through menu or shortcut
toggle control
assert config or DB state changed
capture screenshot of settings UI
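One inexpensive way to show that a settings toggle reached disk is to hash the config file before and after the UI action. The sketch below assumes a file-backed setting such as the `kartix.toml` path used earlier, and the helper names are hypothetical. A changed digest only shows that something was written, so pair it with a targeted check of the specific key or DB row.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest of a file, or None if it does not exist yet."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_changed(before: Optional[str], path: Path) -> str:
    """Fail unless `path` exists now and its digest differs from `before`."""
    after = file_digest(path)
    if after is None:
        raise AssertionError(f"{path} does not exist after the settings change")
    if after == before:
        raise AssertionError(f"{path} is unchanged; no new settings were written")
    return after
```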

Verify state persistence across relaunch

create state in app
assert DB or file state exists
quit and relaunch
verify restored state in UI and persistent store
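To check persistence, take the same read-only snapshot of the persistent store before quitting and again after relaunching, then compare the two. This is a minimal sketch assuming SQLite. The query is whatever identifies the state you created; the table and column names in the usage comment are placeholders. The comparison covers only the store, so the UI half of the check still needs its own evidence.

```python
import sqlite3
from pathlib import Path


def snapshot(db: Path, query: str) -> list[tuple]:
    """Return all rows of `query`, opening the database read-only so the check cannot mutate it."""
    conn = sqlite3.connect(f"{db.resolve().as_uri()}?mode=ro", uri=True)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def assert_persisted(before: list[tuple], after: list[tuple]) -> None:
    """Fail if no state existed before quitting, or if it differs after relaunch."""
    if not before:
        raise AssertionError("no state was created before quitting; nothing to compare")
    if before != after:
        raise AssertionError(f"state changed across relaunch: {before!r} -> {after!r}")


# Hypothetical usage (table and column names are placeholders):
#   q = "select id, title from sessions order by id"
#   before = snapshot(db, q)
#   ...quit and relaunch the app...
#   assert_persisted(before, snapshot(db, q))
```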

Verify agent or provider detection in a desktop app

open provider or agent settings UI
run detection or refresh flow
assert log lines, DB rows, or config updates
capture screenshot of visible badges or status

Verify terminal-style apps with canvas output

use AppleScript or System Events only for window, tab, menu, and input routing
do not claim terminal text correctness from canvas pixels alone
assert via logs, PTY transcripts, SQLite session state, or test hooks instead

Tooling bundled with this skill

scripts/run-applescript.sh: thin wrapper around osascript for inline or piped AppleScript.
scripts/capture-evidence.sh: creates a run artifact folder with screenshot, logs, DB backups, and metadata.
scripts/sqlite-assert.sh: runs a query and enforces a simple expected result.
scripts/log-assert.sh: checks that a log file contains an expected string or regex and counts total matches.
scripts/self-test.sh: runs a local smoke test for permissions, assertions, evidence capture, and global resolution prerequisites.

References

Open only what you need:

references/landscape.md: current open-source AppleScript, MCP, and native desktop automation tooling.
references/assertion-model.md: how to turn native UI flows into trustworthy end-to-end proofs.
references/use-cases.md: detailed scenario catalog for native macOS app QA.

Guardrails

NEVER skip step 0 (code analysis and test maintenance). Even if no files changed, check for coverage gaps. The codebase evolves constantly and tests must keep up.
NEVER report "no changes, skipping" without first analyzing test coverage gaps. A file may not have changed, but its tests may be missing or incomplete.
Do not claim "end-to-end verified" if you only drove the UI but never asserted state.
Do not claim terminal or canvas text correctness from screenshots alone.
Do not silently ignore Accessibility or Screen Recording permission failures.
Prefer targeted assertions and scoped artifacts over copying full logs or databases.
When a native app has both web content and native chrome, use this skill for the native parts and $playwright or $playwright-interactive for the browser parts.
When writing tests, follow existing project conventions exactly — read existing test files first.
Always run cargo build, cargo test, npx vitest run, and npx tsc --noEmit and confirm zero warnings/errors before recording a verdict.

name	macos-native-app-qa
description	Use when the task requires end-to-end verification of a native macOS app and browser-only tooling is not enough. This skill drives apps with AppleScript, System Events, or CGEvent (via Python Quartz or Swift), verifies behavior through logs, test hooks, files, or SQLite state, captures screenshots as evidence, AND actively maintains test coverage by analyzing code changes and writing new tests. Trigger for Tauri, Electron, AppKit, or other macOS desktop QA, launch flows, menu and dialog interaction, settings changes, packaged-app smoke tests, or any request to prove a macOS app works end to end.
allowed-tools	Read, Write, Edit, Glob, Grep, Agent, Bash(osascript, screencapture, sqlite3, rg, grep, cp, pgrep, mkdir, sleep, test, shasum, cksum, sed, head, wc, sdef, find, ls, awk, date, pwd, touch, tr, rm, basename, python3, swift, npx, cargo, npm, cat, tail, stat, kill, nohup)
