Run any Skill in Manus with one click

$pwd:

computer-use

Name: Computer Use
Author: kortix-ai

// Control desktop applications on the user's machine using agent-click. Use when you need to click buttons, type text, read screens, scroll, drag files, move/resize windows, open/quit apps, interact with UI elements, or automate desktop workflows. Triggers: 'click on', 'open app', 'type into', 'scroll down', 'drag file', 'take screenshot', 'read the screen', 'interact with UI', 'desktop automation', 'computer use', 'agent-click'. Built on agent-click (https://github.com/kortix-ai/agent-click, https://www.agent-click.dev/) — an open-source computer use CLI by Kortix. Right now only works on macOS.

Run Skill in Manus

$ git log --oneline --stat

stars:19,777

forks:3,410

updated:April 3, 2026 at 18:25

SKILL.md

readonly

related-skills.json

same repository

replicate.md

from "kortix-ai/suna"

Discover, compare, and run AI models using Replicate's API. Use this skill whenever the task involves AI-generated media — images (text-to-image, style transfer, editing, upscaling, background removal), video, audio, or any other ML model output. Requires a REPLICATE_API_TOKEN — ask the user for it if not already set.

2026-04-0519.8k

website-building.md

from "kortix-ai/suna"

Use for distinctive production-grade websites, landing pages, and interactive web experiences with strong design and QA discipline.

2026-04-0419.8k

whisper.md

from "kortix-ai/suna"

Transcribe any audio or video file to text using Whisper (Groq or OpenAI). Use when the agent receives voice messages, audio files, video messages, or any media with speech. Triggers on: 'transcribe', 'what does this say', 'voice message', 'speech to text', 'audio', any file path ending in .ogg .mp3 .mp4 .wav .webm .m4a .flac .oga .oga

2026-04-0419.8k

agent-browser.md

from "kortix-ai/suna"

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

2026-04-0319.8k

presentations.md

from "kortix-ai/suna"

Create, manage, validate, preview, and export HTML presentation slides (1920x1080). Load this skill when you need to build a slide deck, export to PDF/PPTX, or preview slides in a browser.

2026-04-0219.8k

agent-tunnel.md

from "kortix-ai/suna"

Interact with the user's local machine via Agent Tunnel. Use when you need to read/write files on their computer, run shell commands locally, take screenshots, click, type, control the mouse/keyboard, manage windows and apps, read/write the clipboard, or use the accessibility tree to interact with UI elements. Triggers: 'access my machine', 'local file', 'on my computer', 'my desktop', 'take a screenshot of my screen', 'run this on my machine', 'click on', 'open an app', 'accessibility tree'.

2026-04-0219.8k

package.json

"author": "kortix-ai"

"repository": "kortix-ai/suna"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Software DevelopersComputer and Mathematical Occupations15-1252L4

name

computer-use

description

Control desktop applications on the user's machine using agent-click. Use when you need to click buttons, type text, read screens, scroll, drag files, move/resize windows, open/quit apps, interact with UI elements, or automate desktop workflows. Triggers: 'click on', 'open app', 'type into', 'scroll down', 'drag file', 'take screenshot', 'read the screen', 'interact with UI', 'desktop automation', 'computer use', 'agent-click'. Built on agent-click (https://github.com/kortix-ai/agent-click, https://www.agent-click.dev/) — an open-source computer use CLI by Kortix. Right now only works on macOS.

Computer Use — agent-click

Control desktop applications on the user's machine via agent-click, an open-source CLI tool by Kortix for macOS desktop automation. All commands run through Agent Tunnel on the user's local machine.

GitHub: https://github.com/kortix-ai/agent-click
Docs: https://www.agent-click.dev/

Setup

Before using agent-click, verify it's installed on the user's machine. Run these via Agent Tunnel:

TUNNEL=$OPENCODE_CONFIG_DIR/skills/KORTIX-system/agent-tunnel/tunnel.ts

# 1. Check tunnel connection
bun run "$TUNNEL" status

# 2. Check if agent-click is installed
bun run "$TUNNEL" shell '{"command":"agent-click","args":["--version"]}'

# 3. If not installed, install it
bun run "$TUNNEL" shell '{"command":"npm","args":["install","-g","agent-click"],"timeout":60000}'

# 4. Verify installation
bun run "$TUNNEL" shell '{"command":"agent-click","args":["--version"]}'

Always check and install before first use in a session. Do not skip this step.

Running agent-click Commands

All agent-click commands are executed via the tunnel's shell command:

bun run "$TUNNEL" shell '{"command":"agent-click","args":["<subcommand>","<arg1>","<arg2>"]}'

Example — take a snapshot of Safari:

bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Safari","-i","-c"]}'

Core Loop

Think like a human sitting at the computer:

snapshot → identify → act → verify

Look at the screen (snapshot)
Find what you need (identify refs from output)
Do one action (click, type, key)
Check what changed (re-snapshot)

Every action changes the UI. Previous refs become stale. Always re-snapshot after acting.

Commands Reference

Opening Apps

bun run "$TUNNEL" shell '{"command":"agent-click","args":["open","Safari","--wait"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Safari","-i","-c"]}'

Always wait for the app to be ready, then snapshot before interacting.

Snapshots — See the Screen

# Standard snapshot (interactive + compact)
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Music","-i","-c"]}'

# Deeper snapshot if elements are missing
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Safari","-i","-c","-d","8"]}'

Always use -i -c flags. Interactive + compact reduces noise by 10x.

Clicking

# Single click by ref
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e5"]}'

# Double-click (opens files, plays songs)
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e5","--count","2"]}'

# Click by stable ID
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","id=\"play\"","-a","Music"]}'

Typing

# With target element (preferred — uses AXSetValue, headless)
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","hello world","-s","@e3"]}'

# Into focused field (keyboard simulation, needs app focus)
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","hello world","-a","Safari"]}'

Always prefer -s @ref when you have a ref. It's more reliable.

Key Presses

bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","Return","-a","Calculator"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","cmd+k","-a","Slack"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","cmd+a","-a","TextEdit"]}'

Scrolling

bun run "$TUNNEL" shell '{"command":"agent-click","args":["scroll","down","-a","Music"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["scroll","down","--amount","10","-a","Music"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["scroll-to","@e42"]}'

Use scroll-to for headless operation (no focus needed).

Reading Content

bun run "$TUNNEL" shell '{"command":"agent-click","args":["text","-a","Calculator"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["get-value","@e5"]}'

Use get-value on specific elements instead of text on large apps.

Window Management

bun run "$TUNNEL" shell '{"command":"agent-click","args":["move-window","-a","Notes","--x","100","--y","100"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["resize-window","-a","Notes","--width","800","--height","600"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["windows","-a","Finder"]}'

Drag and Drop

Drag needs the app focused and both source and destination visible.

# Set up windows side by side first
bun run "$TUNNEL" shell '{"command":"agent-click","args":["move-window","-a","Finder","--x","0","--y","25"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["resize-window","-a","Finder","--width","720","--height","475"]}'

# Snapshot to find elements
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Finder","-i","-c","-d","8"]}'

# Drag by refs
bun run "$TUNNEL" shell '{"command":"agent-click","args":["drag","@e32","@e50","-a","Finder"]}'

# Or by coordinates
bun run "$TUNNEL" shell '{"command":"agent-click","args":["drag","--from-x","300","--from-y","55","--to-x","1000","--to-y","200","-a","Finder"]}'

Waiting

bun run "$TUNNEL" shell '{"command":"agent-click","args":["wait-for","name=\"Dashboard\""]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["wait-for","role=button","--timeout","15"]}'

Verification

# After clicking — re-snapshot
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Safari","-i","-c"]}'

# After typing — check value
bun run "$TUNNEL" shell '{"command":"agent-click","args":["get-value","@e3"]}'

# Click with inline verification
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e5","--expect","name=\"Done\""]}'

# Idempotent typing
bun run "$TUNNEL" shell '{"command":"agent-click","args":["ensure-text","@e3","hello"]}'

Selector Syntax

Refs (always prefer these)

@e1, @e2, @e3    — from the most recent snapshot

DSL Selectors

role=button name="Submit"     — role + exact name
id="AllClear"                 — exact id (most stable)
id~="track-123"               — id contains (case-insensitive)
name~="Clear"                 — name contains (case-insensitive)
button "Submit"               — shorthand: role name
"Login"                       — shorthand: just name
role=button index=2            — 3rd match (0-based)

Chains

id=sidebar >> role=button index=0    — first button inside sidebar
name="Form" >> button "Submit"       — submit button inside form

Electron Apps (CDP)

Electron apps (Slack, Cursor, VS Code, Discord) are auto-detected. Everything works headless:

bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Slack","-i","-c"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e5"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","hello","-a","Slack"]}'

Typing in Electron goes to the focused element. To type into a specific input:

# Click to focus the input first
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e18"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","cmd+a","-a","Slack"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","backspace","-a","Slack"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","your text","-a","Slack"]}'

Real-World Patterns

Search and Play a Song

bun run "$TUNNEL" shell '{"command":"agent-click","args":["open","Music","--wait"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Music","-i","-c"]}'
# find search — click it
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e1"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Music","-i","-c"]}'
# type search query
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","Kiss of Life","-s","@e31"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","Return","-a","Music"]}'
sleep 3
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Music","-i","-c","-d","8"]}'
# double-click track to play
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","id~=\"604771089\"","-a","Music","--count","2"]}'
sleep 2
bun run "$TUNNEL" shell '{"command":"agent-click","args":["get-value","id=\"title\"","-a","Music"]}'

Send a Slack DM

bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","cmd+k","-a","Slack"]}'
sleep 1
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Slack","-i","-c"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e18"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","cmd+a","-a","Slack"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","backspace","-a","Slack"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","Vukasin","-a","Slack"]}'
sleep 1
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","Return","-a","Slack"]}'
sleep 2
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","hey, check this out","-a","Slack"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","Return","-a","Slack"]}'

Fill a Web Form

bun run "$TUNNEL" shell '{"command":"agent-click","args":["open","Safari","--wait"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Safari","-i","-c"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","https://example.com/form","-s","@e34"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["key","Return","-a","Safari"]}'
sleep 3
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Safari","-i","-c","-d","8"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","John Doe","-s","@e5"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["type","john@example.com","-s","@e6"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["click","@e8"]}'
bun run "$TUNNEL" shell '{"command":"agent-click","args":["snapshot","-a","Safari","-i","-c"]}'

Rules

Always snapshot before acting. You cannot interact with what you cannot see.
Always re-snapshot after acting. The UI changed. Your refs are stale.
Use refs, not selectors. Refs are fast and unambiguous.
Use -i -c on snapshots. Interactive + compact reduces noise by 10x.
Use id= selectors when available. IDs are the most stable across UI changes.
Wait after navigation. sleep 2-3 after opening pages, switching tabs, submitting forms.
Verify after typing. Use get-value to confirm the text was set correctly.
One action at a time. Don't chain multiple actions without checking state between them.
Use type -s @ref over type -a App. Selector uses AXSetValue (reliable). App path uses keyboard simulation (fragile).
Use scroll-to @ref when you know the element. It's headless. scroll down needs focus.

Troubleshooting

Problem	Fix
Element not found	Re-snapshot. Your refs are stale.
Ambiguous selector	Use refs, or add `index=` to the selector.
Click didn't work	Try `--count 2` (double-click). Or `scroll-to` first if offscreen.
Type didn't work	Use `-s @ref` instead of `-a App`.
Electron app not using CDP	First run auto-relaunches with CDP. Takes ~5s.
agent-click not found	Install: `npm install -g agent-click` via tunnel shell.

Output Format

All agent-click output is JSON:

{"success": true, "message": "pressed \"7\" at (453, 354)"}
{"error": true, "type": "element_not_found", "message": "..."}
{"role": "button", "name": "Submit", "value": null, "position": {"x": 450, "y": 320}}

computer-use

More from this repository

Computer Use — agent-click

Setup

Running agent-click Commands

Core Loop

Commands Reference

Opening Apps

Snapshots — See the Screen

Clicking

Typing

Key Presses

Scrolling

Reading Content

Window Management

Drag and Drop

Waiting

Verification

Selector Syntax

Refs (always prefer these)

DSL Selectors

Chains

Electron Apps (CDP)

Real-World Patterns

Search and Play a Song

Send a Slack DM

Fill a Web Form

Rules

Troubleshooting

Output Format

Computer Use — agent-click

Setup

Running agent-click Commands

Core Loop

Commands Reference

Opening Apps

Snapshots — See the Screen

Clicking

Typing

Key Presses

Scrolling

Reading Content

Window Management

Drag and Drop

Waiting

Verification

Selector Syntax

Refs (always prefer these)

DSL Selectors

Chains

Electron Apps (CDP)

Real-World Patterns

Search and Play a Song

Send a Slack DM

Fill a Web Form

Rules

Troubleshooting

Output Format

More from this repository