| name | agent-computer-use |
| description | REQUIRED for any task that involves operating a desktop application — opening apps, clicking buttons, typing into fields, pressing keys, scrolling, dragging, reading what's on screen, moving or resizing windows, or verifying state after an action. Always use the `agent-cu` CLI commands (open, snapshot, click, type, key, find, scroll, drag, batch, wait-for) instead of falling back to `open -a`, AppleScript, `osascript`, `xdotool`, `System Events`, or any other shell workaround — those can't read state back, don't verify, and are fragile across app updates. agent-cu is the canonical computer-use tool for controlling any macOS / Linux / Windows / Electron app via accessibility APIs. Trigger on prompts like "open Music and play X", "search for Y in Maps", "fill out this form", "compute in Calculator", "send a Slack message", "drag this file", "read what's in the current window", or anything where a human would click/type/look at a desktop app. |
| license | MIT |
| metadata | {"author":"kortix-ai","version":"0.1.2","homepage":"https://github.com/kortix-ai/agent-computer-use"} |
agent-computer-use — Computer use Skill
You have access to agent-cu, a CLI tool that controls desktop applications. You can click buttons, type text, read screens, scroll, drag files, move windows — all from the terminal.
First-run setup (user action)
The first time you run an agent-cu command, Claude Code prompts the user to approve it. The user has three options, in order of preference:
-
Easiest — run agent-cu setup in a terminal. Interactive wizard that writes Bash(agent-cu *) to the right settings file. One-time. If the user seems frustrated by repeated approval prompts, suggest this.
If you (the agent) are running it yourself and can't answer prompts, use the non-interactive flags:
agent-cu setup --yes → unsupervised + global (recommended default)
agent-cu setup --mode <unsupervised|supervised> --scope <global|project> for specific combos
-
Quickest — pick "Yes, and don't ask again for: agent-cu *" in the approval prompt. Covers only that specific subcommand pattern; you'll get prompted again for other subcommands.
-
Manual — add "Bash(agent-cu *)" to ~/.claude/settings.json under permissions.allow.
Do not attempt to modify the user's settings file yourself — direct them to agent-cu setup instead.
How to think
Think like a human sitting at the computer. Before you act, ask yourself: what would I see on screen? What would I click? What would I type?
A human:
- Looks at the screen (snapshot)
- Finds what they need (identify refs)
- Does one action (click, type, key)
- Checks what changed (re-snapshot)
You must do the same. Never skip steps. Never assume the UI didn't change after an action.
Core loop
snapshot → identify → act → verify
agent-cu snapshot -a Music -i -c
agent-cu click @e5
agent-cu snapshot -a Music -i -c
Every action changes the UI. Your previous refs are now stale. Always re-snapshot.
Opening apps
Always wait for the app to be ready before doing anything:
agent-cu open Safari --wait
agent-cu snapshot -a Safari -i -c
Never interact with an app you haven't opened and snapshotted first.
Finding elements
Step 1: Snapshot with -i -c (interactive + compact):
agent-cu snapshot -a Calculator -i -c
This shows only clickable/typeable elements with refs like @e1, @e5, @e12.
Step 2: Read the output. Find the element you need by its name, role, or id.
Step 3: Use the ref. Refs are the fastest and most reliable way to target elements.
If elements are missing, increase depth:
agent-cu snapshot -a Safari -i -c -d 8
Clicking
For buttons, links, menu items — use click:
agent-cu click @e5
agent-cu click @e5 --count 2
click tries AXPress first (background, no focus steal). Only falls back to mouse simulation for double-click or right-click.
For elements with stable IDs (won't change between snapshots):
agent-cu click 'id="play"' -a Music
agent-cu click 'id~="track-123"' -a Music
Typing
With a target element (preferred — uses AXSetValue, headless):
agent-cu type "hello world" -s @e3
Into the focused field (keyboard simulation, needs app focus):
agent-cu type "hello world" -a Safari
Always prefer -s @ref when you have a ref. It's more reliable.
Key presses
agent-cu key Return -a Calculator
agent-cu key cmd+k -a Slack
agent-cu key cmd+a -a TextEdit
agent-cu key Escape -a Safari
Scrolling
agent-cu scroll down -a Music
agent-cu scroll down --amount 10 -a Music
agent-cu scroll-to @e42
Scroll needs the app to be focused. Use scroll-to for headless.
Reading content
agent-cu text -a Calculator
agent-cu get-value @e5
agent-cu get-value 'id="title"' -a Music
Use get-value on specific elements instead of text on large apps.
Window management
agent-cu move-window -a Notes --x 100 --y 100
agent-cu resize-window -a Notes --width 800 --height 600
agent-cu windows -a Finder
These are instant and headless — use AXSetPosition/AXSetSize.
Drag and drop
Drag needs the app to be focused and two visible, non-overlapping areas.
Think like a human: you need to see both the source and destination.
agent-cu move-window -a Finder --x 0 --y 25
agent-cu resize-window -a Finder --width 720 --height 475
agent-cu snapshot -a Finder -i -c -d 8
agent-cu get-value @e32
agent-cu drag @e32 @e50 -a Finder
agent-cu drag --from-x 300 --from-y 55 --to-x 1000 --to-y 200 -a Finder
Selector syntax
Refs (always prefer these)
@e1, @e2, @e3
DSL
'role=button name="Submit"'
'name="Login"'
'id="AllClear"'
'id~="track-123"'
'name~="Clear"'
'button "Submit"'
'"Login"'
'role=button index=2'
'css=".my-button"'
Chains
'id=sidebar >> role=button index=0'
'name="Form" >> button "Submit"'
Electron apps (CDP)
Electron apps (Slack, Cursor, VS Code, Postman, Discord) are automatically detected. agent-cu relaunches them with CDP support on first use.
Everything works headless — no window activation, no mouse, no focus steal:
agent-cu snapshot -a Slack -i -c
agent-cu click @e5
agent-cu key cmd+k -a Slack
agent-cu type "hello" -a Slack
agent-cu scroll down -a Slack
agent-cu text -a Slack
Typing in Electron apps: insertText goes to the focused element. If you need to type into a specific input:
agent-cu snapshot -a Slack -i -c
agent-cu click @e18
agent-cu key cmd+a -a Slack
agent-cu key backspace -a Slack
agent-cu type "your text" -a Slack
Verification
Never assume an action worked. Verify by checking a state-bearing attribute, not just by looking at the tree again.
The id vs name distinction (critical)
Many apps give a button a fixed id (the slot) and a changing name (the current label).
Music's transport button is the canonical example:
id is always "play" — it identifies the button as "the transport button", even when currently playing.
name flips between "play" and "pause" depending on playback state.
To detect state, read name, not id:
agent-cu find 'id="play"' -a Music --compact
The same pattern appears in many apps: bookmark/unbookmark, mute/unmute, expand/collapse, follow/unfollow. When you want to confirm a toggle worked, always read the element's current name after the action.
Inline verification with --expect
agent-cu click @e5 --expect 'name="Dashboard"'
Reading values
agent-cu get-value @e3
agent-cu find 'id="play"' -a Music --compact
agent-cu snapshot -a Safari -i -c
Idempotent typing
agent-cu ensure-text @e3 "hello"
Reading dynamic computed values (e.g., Calculator result)
Some apps don't surface the result as a normal value on a labeled element — it's hidden in a staticText node. Use tree and walk for any node with a value:
agent-cu tree -a Calculator -d 8 --compact | python3 -c "
import json, sys
d = json.load(sys.stdin)
def walk(n):
if n.get('value'): print(n.get('role'), '=', repr(n['value']))
for c in n.get('children', []): walk(c)
walk(d)
"
Locale gotcha: numbers are locale-formatted. Indian locale shows 7^8 = 57,64,801, international shows 5,764,801. They're the same value. Before comparing, strip commas and spaces.
Waiting
When UI takes time to load:
agent-cu wait-for 'name="Dashboard"'
agent-cu wait-for 'role=button' --timeout 15
sleep 2
Batch operations
Chain multiple commands to avoid per-command startup:
echo '[["click","@e5"],["key","Return","-a","Music"]]' | agent-cu batch --bail
Real-world patterns
Search and play a song in Music (verified flow)
agent-cu open Music --wait
agent-cu snapshot -a Music -i -c
agent-cu click @e1 -a Music
agent-cu type "Espresso Sabrina Carpenter" -s 'role=textField' -a Music --submit
sleep 2
agent-cu snapshot -a Music -c | grep -i "espresso" | head -5
agent-cu click 'id="Music.shelfItem.TopSearchLockup[id=top-search-section-top-1744253558,parentId=top-search-section-top]"' -a Music --count 2
sleep 2
agent-cu snapshot -a Music -c | grep "track-lockup" | head -3
agent-cu click @e52 -a Music
agent-cu key Return -a Music
sleep 1
agent-cu find 'id="play"' -a Music --compact
agent-cu click 'id="play"' -a Music
Key lessons from this flow:
- Refs (
@e53) can go stale between snapshots separated by major UI changes. Prefer full id="..." for cross-snapshot targeting.
type -s 'role=textField' --submit combines typing, clearing, and pressing Return reliably.
- Double-click on a search result often opens the item, not plays it. Drill into the detail view, then trigger playback.
- Always verify playback via the
name of the transport button (id="play" is the slot; name holds state).
- If
Return doesn't trigger playback, click the transport play button as a fallback. Have a plan B.
Compute a multi-step calculation in Calculator (verified flow)
agent-cu open Calculator --wait
agent-cu snapshot -a Calculator -i -c
echo '[
["click","id=\"AllClear\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Equals\"","-a","Calculator"]
]' | agent-cu batch
agent-cu tree -a Calculator -d 8 --compact | python3 -c "
import json, sys
d = json.load(sys.stdin)
for n in [d]:
def walk(n):
if n.get('value'): print(n.get('role'), '=', repr(n['value']))
for c in n.get('children', []): walk(c)
walk(n)
"
Key lessons:
- When an app exposes stable
id= on every interactive element (Calculator does), skip snapshotting between clicks — just batch them.
- The result isn't on a labeled element; walk the tree for nodes with a
value.
- Numbers come locale-formatted. Strip commas/spaces before parsing.
Open a DM in Slack and send a message
agent-cu key cmd+k -a Slack
sleep 1
agent-cu snapshot -a Slack -i -c
agent-cu click @e18
agent-cu key cmd+a -a Slack
agent-cu key backspace -a Slack
agent-cu type "Vukasin" -a Slack
sleep 1
agent-cu key Return -a Slack
sleep 2
agent-cu type "hey, check this out" -a Slack
agent-cu key Return -a Slack
Check calendar events
agent-cu open Calendar --wait
agent-cu snapshot -a Calendar -i -c -d 6
agent-cu text -a Calendar
agent-cu click @e3
sleep 1
agent-cu text -a Calendar
Fill a web form
agent-cu open Safari --wait
agent-cu snapshot -a Safari -i -c
agent-cu type "https://example.com/form" -s @e34
agent-cu key Return -a Safari
sleep 3
agent-cu snapshot -a Safari -i -c -d 8
agent-cu type "John Doe" -s @e5
agent-cu type "john@example.com" -s @e6
agent-cu type "Hello world" -s @e7
agent-cu click @e8
agent-cu snapshot -a Safari -i -c
Drag a file between Finder windows
agent-cu move-window -a Finder --x 0 --y 25
agent-cu snapshot -a Finder -i -c -d 8
agent-cu click @e32
agent-cu drag --from-x 300 --from-y 55 --to-x 1000 --to-y 200 -a Finder
Browse App Store
agent-cu open "App Store" --wait
agent-cu snapshot -a "App Store" -i -c -d 10
agent-cu click 'id="AppStore.tabBar.discover"' -a "App Store"
sleep 2
agent-cu scroll down --amount 5 -a "App Store"
agent-cu snapshot -a "App Store" -i -c -d 10
agent-cu text -a "App Store"
Rules
- Always snapshot before acting. You cannot interact with what you cannot see.
- Always re-snapshot after acting. The UI changed. Your refs are stale.
- Refs for the next step, IDs for the long haul. Refs (
@e5) are fastest for the immediate next action after a snapshot. Full id="..." selectors survive across snapshots and are the right choice for anything you'll return to after other UI changes.
- Use
-i -c on snapshots. Interactive + compact reduces noise by 10x.
- Prefer
id= over name= when the app provides stable ids. IDs don't change with state; names often do.
- For state detection, read the
name attribute — not id. Many apps keep id as a fixed slot and flip name to reflect current state (play/pause, mute/unmute, bookmark/unbookmark).
- Wait after navigation.
wait-for a known landmark first; fall back to sleep 2-3 if there's none.
- One action, then verify. Don't chain multiple mutations without checking state between them.
- Use
type -s @ref over type -a App. The selector path uses AXSetValue (reliable). The app path uses keyboard simulation (fragile, needs focus).
- Use
scroll-to @ref when you know the element. It's headless. scroll down needs focus.
- Never hardcode app-specific logic. Don't write a helper like
play_song_in_music(). Use the same primitives (snapshot, click, type, key, find) for every app. The scene tree tells you what to do — read it, act on it, verify.
- Have a fallback for every action. If a
click doesn't produce the expected state change, escalate: double-click → scroll-to + click → keyboard shortcut → direct action button (e.g., transport id="play" instead of track-row Return). See the Recovery playbook.
- Batch when ids are stable and numerous. Calculator, forms, keypad flows benefit from piping a JSON array into
agent-cu batch — one process start instead of N.
Recovery playbook
When something doesn't work, the principle is: diagnose the exact failure mode, then apply the matching fallback. Never retry the same command blindly.
Element not found (stale ref)
Symptom: error: element not found matching SelectorChain { ... } on a @ref that came from an earlier snapshot.
Cause: refs resolve to a specific tree path; major UI changes (navigation, scroll, modal open) invalidate them.
Fix: re-snapshot and use the new ref, or switch to a stable id=:
agent-cu snapshot -a Safari -i -c
agent-cu click 'id="the-stable-id"' -a Safari
Prefer id="..." for any target you'll act on across multiple snapshots. Refs are for single-step actions only.
Ambiguous selector
Symptom: error: ambiguous selector: found N elements matching.
Fix: add index= or narrow by role/name:
agent-cu click 'role=button index=0' -a Music
agent-cu click 'role=button name="Save"' -a TextEdit
Click succeeded but nothing happened
Symptom: {"success": true, "message": "clicked at (X,Y)"} but the expected state change (new page, playback, menu) did not occur.
Diagnose in order:
- Was AXPress used or did it fall back to coordinates?
pressed "Name" at (X,Y) = AXPress succeeded (best).
clicked at (X,Y) = coordinate click; the element might have moved or be obscured.
- Is the element's state actually what you think? Read the
name attribute of the state-bearing element (e.g., Music's transport button), not just the tree.
- Does the action require focus? Some apps (especially web UI) need the app to be frontmost.
agent-cu open App --wait activates it.
Fallback ladder:
agent-cu click @e5 --count 2
agent-cu scroll-to @e5
agent-cu click @e5
agent-cu snapshot -a Music -c | grep -E "play|pause|Next"
agent-cu click 'id="play"' -a Music
agent-cu key Return -a Music
agent-cu key space -a Music
Type didn't land
Symptom: text didn't appear in the field.
Causes + fixes:
agent-cu type "hello" -a Safari
agent-cu type "hello" -s @e3
agent-cu click @e3
agent-cu type "hello" -s @e3
agent-cu type "new" -s @e3 --append
agent-cu type "new" -s @e3
Search submitted but results aren't in the tree yet
Symptom: snapshot right after --submit shows the old page.
Fix: wait, then re-snapshot. Prefer polling over sleep:
agent-cu type "query" -s 'role=textField' -a Music --submit
agent-cu wait-for 'name~="Top Results"' --timeout 5
agent-cu snapshot -a Music -i -c
If wait-for doesn't know a stable landmark, fall back to sleep 2.
Snapshot is too shallow — can't find the element
Bump depth:
agent-cu snapshot -a Safari -i -c -d 8
agent-cu snapshot -a Safari -i -c -d 12
Ref points to the wrong thing
Symptom: @e5 clicks something but it's not what you expected.
Cause: the snapshot re-ran silently (you took two) and refs got renumbered.
Fix: always inspect the ref's resolved element before acting on anything important:
agent-cu get-value @e5 -a Safari
Electron app not using CDP
agent-cu auto-detects Electron apps. If CDP isn't working:
agent-cu snapshot -a Slack -i -c -v
First run takes ~5s as the app relaunches with --remote-debugging-port. Every subsequent run uses a cached connection (~15ms).
DSL selector parse error
Symptom: unknown filter key: 'id_contains'.
Fix: the syntax is id~= (tilde-equals) for partial match, not id_contains=:
agent-cu click 'id~="track-"' -a Music
agent-cu click 'name~="Submit"' -a Safari
Output format
All output is JSON:
{"success": true, "message": "pressed \"7\" at (453, 354)"}
{"error": true, "type": "element_not_found", "message": "..."}
{"role": "button", "name": "Submit", "value": null, "position": {"x": 450, "y": 320}}