| name | gui-automation |
| description | Use when you need to visually interact with a GUI — test buttons, fill forms, verify visual layouts, fuzz web pages, automate user flows, take screenshots, or perform end-to-end QA on any application. Works on cloud VMs, Docker containers, local machines, and sandboxes. Install: pip install cua. |
GUI Automation
CUA gives you eyes and hands on a real computer: see the screen, move the
mouse, click, type, drag, and manage windows — like a human at the keyboard.
Use this skill for visual interaction that can't be done via shell or API.
Setup
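cua --version                              # confirm the CLI is installed
cua do switch cloud my-vm                  # connect to a cloud VM, or
cua do switch docker my-container          # connect to a Docker container, or
cua do-host-consent && cua do switch host  # consent first, then connect to the local machine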
ANTHROPIC_API_KEY is optional. With it, cua do snapshot returns an
AI-annotated screen with element coordinates. Without it, use screenshot
and read the image yourself.
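The choice between the two capture commands can be wrapped in a small shell helper. This is a minimal sketch, not part of cua: `look` is a hypothetical function name, and it assumes the key is visible as the `ANTHROPIC_API_KEY` variable of the calling shell.

```shell
# look [instructions...]: capture the current screen.
# Hypothetical helper. It calls `cua do snapshot` when ANTHROPIC_API_KEY is
# set and falls back to `cua do screenshot` otherwise.
look() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    if [ "$#" -gt 0 ]; then
      # Pass any arguments as the single optional instructions string.
      cua do snapshot "$*"
    else
      cua do snapshot
    fi
  else
    # No key: plain screenshot, read the image yourself.
    cua do screenshot
  fi
}
```

Call `look` with no arguments for a plain capture, or `look "find the Submit button"` to pass instructions when the key is available.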
Workflow
Look → Act → Verify — repeat until done, then share:
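cua do screenshot      # look: find the target and its coordinates
cua do click 450 280   # act
cua do screenshot      # verify the result
cua trajectory share   # share the recorded session when done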
Re-screenshot after every UI change — coordinates go stale when the screen changes.
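One way to make the re-screenshot rule hard to forget is to pair every action with a capture. The sketch below assumes nothing beyond the commands shown above; `do_and_check` is a hypothetical name, not a cua command.

```shell
# do_and_check ACTION [ARGS...]: run one `cua do` action, then take a fresh
# screenshot so the next coordinates are read from the updated screen.
# Hypothetical helper built only from `cua do <action>` and `cua do screenshot`.
do_and_check() {
  cua do "$@" || return 1   # act; stop here if the action itself failed
  cua do screenshot         # verify
}
```

For example, `do_and_check click 450 280` clicks and immediately captures the result. The first Look step still happens before the call, since the coordinates have to come from somewhere.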
Scenarios
Click a button
cua do screenshot
cua do click 450 280
cua do screenshot
Fill a form
cua do screenshot
cua do click 400 200 && cua do type "Jane Doe"
cua do key tab && cua do type "jane@example.com"
cua do key tab && cua do type "SecureP@ss123"
cua do click 400 500
cua do screenshot
File upload dialog
cua do click 350 400
cua do type "/home/user/report.pdf"
cua do key enter
cua do screenshot
Zoom in for precision clicks (host or small targets)
When clicking small or dense UI elements — especially on the host machine —
zoom into the target window first. Coordinates become window-relative and
screenshots show only that window, giving you higher effective resolution.
cua do zoom "Google Chrome"
cua do screenshot
cua do click 112 44
cua do screenshot
cua do unzoom
cua do screenshot
Use zoom any time click accuracy is uncertain. Run unzoom before switching
windows or when you need to see the full desktop again.
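The unzoom-before-switching rule can be folded into a single step. This is a sketch under one stated assumption: the source does not say how unzoom behaves when nothing is zoomed, so the helper ignores its exit status. `rezoom` is a hypothetical name.

```shell
# rezoom APP: leave the current zoom, zoom to APP, then take a fresh
# screenshot so coordinates are relative to the new window.
rezoom() {
  # Assumption: unzoom may report failure when nothing is zoomed,
  # so its status is deliberately ignored.
  cua do unzoom || true
  cua do zoom "$1" || return 1
  cua do screenshot
}
```

For example, `rezoom "Google Chrome"` moves the zoom to Chrome regardless of which window was zoomed before.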
Drag and drop
cua do window ls
cua do drag 150 300 650 400
cua do screenshot
Fuzz a form
cua do screenshot
cua do click 400 200
cua do type "<script>alert(1)</script>"
cua do key tab && cua do type "'; DROP TABLE users; --"
cua do key tab && cua do type "AAAAAAAAAAAAAAAAAAAAAAA"
cua do click 400 500
cua do screenshot
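The same sequence can be driven from a list of payloads. The helper below is a sketch: `fuzz_fields` is a hypothetical name, and it assumes, as in the scenario above, that Tab moves focus to the next field and that the coordinates came from a recent screenshot.

```shell
# fuzz_fields FIRST_X FIRST_Y SUBMIT_X SUBMIT_Y PAYLOAD...
# Click the first field, type one payload per field (Tab between fields),
# click submit, then screenshot to see how the page reacted.
fuzz_fields() {
  local first_x="$1" first_y="$2" submit_x="$3" submit_y="$4"
  shift 4
  local is_first=1 payload

  cua do screenshot || return 1
  cua do click "$first_x" "$first_y" || return 1   # focus the first field
  for payload in "$@"; do
    if [ "$is_first" -eq 0 ]; then
      cua do key tab || return 1                    # advance to the next field
    fi
    is_first=0
    cua do type "$payload" || return 1
  done
  cua do click "$submit_x" "$submit_y" || return 1  # submit the form
  cua do screenshot                                 # verify the result
}
```

The scenario above corresponds to `fuzz_fields 400 200 400 500 "<script>alert(1)</script>" "'; DROP TABLE users; --" "AAAAAAAAAAAAAAAAAAAAAAA"`.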
Trajectory
Every action is auto-recorded to ~/.cua/trajectories/{machine}/{session}/.
cua trajectory share
cua trajectory ls
cua trajectory export
cua do --no-record click 100 200
Tell the user: "Here is the trajectory of my session: {url}"
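Because every action is recorded and the trajectory may be shared, it can make sense to keep sensitive input out of it. The sketch below assumes that --no-record applies to type the same way it applies to click in the example above; `type_unrecorded` is a hypothetical name.

```shell
# type_unrecorded TEXT: type TEXT without adding the action to the trajectory.
# Assumption: `cua do --no-record` accepts `type` just as it accepts `click`.
type_unrecorded() {
  cua do --no-record type "$1"
}
```

For example, `type_unrecorded "$FORM_PASSWORD"` types the value of a shell variable (hypothetical here) without recording the action.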
Quick Reference
| Action | Command |
|---|---|
| Connect to target | cua do switch <provider> [name] |
| Screenshot | cua do screenshot |
| AI-annotated screen | cua do snapshot ["instructions"] |
| Click | cua do click <x> <y> [left\|right\|middle] |
| Double-click | cua do dclick <x> <y> |
| Type text | cua do type "text" |
| Press key | cua do key <key> |
| Hotkey | cua do hotkey <combo> (e.g. ctrl+c) |
| Scroll | cua do scroll <direction> [amount] |
| Drag | cua do drag <x1> <y1> <x2> <y2> |
| Move cursor | cua do move <x> <y> |
| Shell command | cua do shell "command" |
| Open URL/file | cua do open <url\|path> |
| List windows | cua do window ls [app] |
| Focus window | cua do window focus <id> |
| Zoom to window | cua do zoom "App Name" |
| Unzoom | cua do unzoom |
| Share trajectory | cua trajectory share |
Providers
| Provider | Example |
|---|---|
| cloud | cua do switch cloud my-vm |
| cloudv2 | cua do switch cloudv2 my-vm |
| docker | cua do switch docker my-container |
| lume | cua do switch lume my-vm |
| lumier | cua do switch lumier my-vm |
| winsandbox | cua do switch winsandbox |
| host | cua do switch host |
See references/command-reference.md for full argument syntax.