macos-computer-use
You have a `computer_use` tool that drives the Mac in the **background**.
| Field | Value |
|---|---|
| name | macos-computer-use |
| description | You have a `computer_use` tool that drives the Mac in the **background**. |
| version | 1.0.0 |
| license | MIT |
| author | Synthos |
| metadata | `{"synthos":{"signature":"task_desc: str, params: dict -> result: dict","atom_type":"skill","priority":"P2","related_skills":[]}}` |
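Input: `task: str`, `app: str` (the user's request and context information). Output: `result: dict` (the result of the macOS operation). Governing principle: P2 (mechanical atoms expose their input/output specification).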
You have a `computer_use` tool that drives the Mac in the **background**.
Your actions do NOT move the user's cursor, steal keyboard focus, or switch
Spaces. The user can keep typing in their editor while you click around in
Safari in another Space. This is the opposite of pyautogui-style automation,
which moves the real pointer and types into whichever window has focus.
Everything here works with any tool-capable model — Claude, GPT, Gemini, or an open model running through a local OpenAI-compatible endpoint. There is no Anthropic-native schema to learn.
**Step 1: Capture first.** Almost every task starts with:

`computer_use(action="capture", mode="som", app="Safari")`
It returns a screenshot with numbered overlays on every interactable element, plus an accessibility (AX) tree index like:

- `#1 AXButton 'Back' @ (12, 80, 28, 28) [Safari]`
- `#2 AXTextField 'Address and Search' @ (80, 80, 900, 32) [Safari]`
- `#7 AXLink 'Sign In' @ (900, 420, 80, 24) [Safari]`
- `...`
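Because the AX index is plain text, a small parser is enough to look up an element's index by its label before clicking. The sketch below assumes every entry follows the shape of the sample lines above and that the four numbers are x, y, width, height. The skill does not state either point, and `parse_ax_index` / `find_element` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed entry shape, copied from the sample output:
#   #<index> <AXRole> '<label>' @ (<x>, <y>, <w>, <h>) [<app>]
# The greedy label group tolerates apostrophes inside labels.
_ENTRY = re.compile(
    r"#(?P<index>\d+)\s+(?P<role>AX\w+)\s+'(?P<label>.*)'\s+@\s+"
    r"\((?P<x>-?\d+),\s*(?P<y>-?\d+),\s*(?P<w>\d+),\s*(?P<h>\d+)\)\s+"
    r"\[(?P<app>[^\]]+)\]"
)


@dataclass(frozen=True)
class AXElement:
    index: int
    role: str
    label: str
    frame: tuple[int, int, int, int]  # assumed (x, y, width, height)
    app: str


def parse_ax_index(text: str) -> list[AXElement]:
    """Parse matching lines; anything else (such as the trailing '...') is skipped."""
    elements = []
    for line in text.splitlines():
        m = _ENTRY.search(line)
        if m:
            elements.append(AXElement(
                index=int(m["index"]),
                role=m["role"],
                label=m["label"],
                frame=(int(m["x"]), int(m["y"]), int(m["w"]), int(m["h"])),
                app=m["app"],
            ))
    return elements


def find_element(elements: list[AXElement], label: str,
                 role: Optional[str] = None) -> Optional[int]:
    """Index of the first element whose label matches case-insensitively."""
    for el in elements:
        if el.label.casefold() == label.casefold() and (role is None or el.role == role):
            return el.index
    return None


sample = """#1 AXButton 'Back' @ (12, 80, 28, 28) [Safari]
#2 AXTextField 'Address and Search' @ (80, 80, 900, 32) [Safari]
#7 AXLink 'Sign In' @ (900, 420, 80, 24) [Safari]
..."""

elements = parse_ax_index(sample)
print(find_element(elements, "sign in", role="AXLink"))  # 7
```

With the index in hand, the click is the element-index call from Step 2.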
**Step 2: Click by element index.** This is the single most important habit:

`computer_use(action="click", element=7)`

Clicking by index is much more reliable than clicking by pixel coordinates, for every model. Claude was trained on both; other models are often reliable only with indices.
**Step 3: Verify.** After any state-changing action, re-capture. You can save a round trip by requesting the post-action capture inline:

`computer_use(action="click", element=7, capture_after=True)`
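Step 3 boils down to one rule: element indices are trustworthy only until something changes the UI. A hypothetical agent-side bookkeeping class can enforce that rule before each click. It is not part of `computer_use` or cua-driver, and treating every action except `list_apps` as state-changing is a deliberately conservative assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable


class StaleIndexError(RuntimeError):
    """Raised when an element index does not come from the latest capture."""


# Conservative assumption: every action except these may change the UI,
# including `wait` (the UI can change while you wait).
READ_ONLY_ACTIONS = {"list_apps"}


@dataclass
class IndexTracker:
    """Hypothetical agent-side bookkeeping; not part of computer_use or cua-driver."""
    generation: int = 0
    valid: set[int] = field(default_factory=set)

    def record(self, action: str, indices: Iterable[int] = (),
               capture_after: bool = False) -> None:
        """Update state after a tool call; `indices` are those the returned capture listed."""
        if action == "capture" or capture_after:
            # Fresh capture, standalone or inline via capture_after=True.
            self.generation += 1
            self.valid = set(indices)
        elif action not in READ_ONLY_ACTIONS:
            # The UI may have shifted, so earlier indices can't be trusted.
            self.valid = set()

    def check(self, element: int) -> int:
        """Return element if it belongs to the latest capture, else raise."""
        if element not in self.valid:
            raise StaleIndexError(f"element {element} is stale; re-capture before clicking")
        return element


tracker = IndexTracker()
tracker.record("capture", indices=[1, 2, 7])
tracker.check(7)  # fine: from the latest capture
tracker.record("click", indices=[1, 9], capture_after=True)
tracker.check(9)  # fine: the inline capture refreshed the indices
```

A plain `click` without `capture_after=True` empties the valid set, which forces a re-capture before the next index-based action.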
| `mode` | Returns | Best for |
|---|---|---|
| `som` (default) | Screenshot + numbered overlays + AX index | Vision models; the preferred default |
| `vision` | Plain screenshot | When the SOM overlay interferes with what you want to verify |
| `ax` | AX tree only, no image | Text-only models, or when you don't need to see pixels |
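The capture-mode table reduces to a small decision rule. This hypothetical helper only encodes the table's three rows, and its argument names are illustrative.

```python
def choose_capture_mode(model_sees_images: bool, need_pixels: bool = True,
                        overlay_in_the_way: bool = False) -> str:
    """Return "som", "vision", or "ax" following the capture-mode table (hypothetical helper)."""
    if not model_sees_images or not need_pixels:
        return "ax"  # text-only model, or pixels aren't needed
    if overlay_in_the_way:
        return "vision"  # plain screenshot when SOM overlays obscure what you're verifying
    return "som"  # preferred default for vision models
```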
| Action | Parameters |
|---|---|
| `capture` | `mode=som` / `vision` / `ax`, `app=…` (default: current app) |
| `click` | `element=N` or `coordinate=[x, y]` |
| `double_click` | `element=N` or `coordinate=[x, y]` |
| `right_click` | `element=N` or `coordinate=[x, y]` |
| `middle_click` | `element=N` or `coordinate=[x, y]` |
| `drag` | `from_element=N, to_element=M` (or `from_coordinate` / `to_coordinate`) |
| `scroll` | `direction=up` / `down` / `left` / `right`, `amount=3` (scroll ticks) |
| `type` | `text="…"` |
| `key` | `keys="cmd+s"`, `"return"`, `"escape"`, `"ctrl+alt+t"` |
| `wait` | `seconds=0.5` |
| `list_apps` | (none) |
| `focus_app` | `app="Safari"`, `raise_window=False` (default: don't raise) |
All actions accept an optional `capture_after=True` to get a follow-up
screenshot in the same tool call.
All actions that target an element accept `modifiers=["cmd", "shift"]` for
keys held down during the action.
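If you assemble tool calls programmatically, you can check the reference's rules before sending: exactly one of `element` or `coordinate`, optional `modifiers` and `capture_after`, and `+`-joined `keys`. The sketch below assumes tool arguments travel as a plain dict. `pointer_call`, `key_call`, and the modifier set are hypothetical. The modifier names are only those that appear in this skill's examples.

```python
from typing import Any, Optional, Sequence

# Parameter names mirror the action reference; the helpers themselves are hypothetical.
POINTER_ACTIONS = {"click", "double_click", "right_click", "middle_click"}
# Modifier names seen in this skill's examples; the full accepted set is an assumption.
KNOWN_MODIFIERS = {"cmd", "ctrl", "alt", "shift"}


def pointer_call(action: str, element: Optional[int] = None,
                 coordinate: Optional[Sequence[int]] = None,
                 modifiers: Sequence[str] = (),
                 capture_after: bool = False) -> dict[str, Any]:
    """Build arguments for a click-style action with exactly one of element / coordinate."""
    if action not in POINTER_ACTIONS:
        raise ValueError(f"not a pointer action: {action}")
    if (element is None) == (coordinate is None):
        raise ValueError("pass exactly one of element or coordinate")
    args: dict[str, Any] = {"action": action}
    if element is not None:
        args["element"] = element
    else:
        x, y = coordinate
        args["coordinate"] = [int(x), int(y)]
    if modifiers:
        # The text documents modifiers for element-targeting actions; allowing
        # them with coordinates too is an assumption.
        unknown = set(modifiers) - KNOWN_MODIFIERS
        if unknown:
            raise ValueError(f"unknown modifiers: {sorted(unknown)}")
        args["modifiers"] = list(modifiers)
    if capture_after:
        args["capture_after"] = True
    return args


def key_call(keys: str, capture_after: bool = False) -> dict[str, Any]:
    """Build arguments for `key`; every '+'-joined part before the last must be a modifier."""
    parts = keys.split("+")
    if any(not p for p in parts):
        raise ValueError(f"malformed key combo: {keys!r}")
    *mods, _key = parts
    unknown = set(mods) - KNOWN_MODIFIERS
    if unknown:
        raise ValueError(f"unknown modifiers: {sorted(unknown)}")
    args: dict[str, Any] = {"action": "key", "keys": keys}
    if capture_after:
        args["capture_after"] = True
    return args


print(pointer_call("click", element=7, capture_after=True))
# {'action': 'click', 'element': 7, 'capture_after': True}
print(key_call("cmd+shift+g"))
# {'action': 'key', 'keys': 'cmd+shift+g'}
```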
- Do not use `raise_window=True` unless the user explicitly asked you to
  bring a window to front. Input routing works without raising.
- Pass `app=` to `capture` (e.g. `app="Safari"`): the result is less noisy,
  has fewer elements, and doesn't leak other windows the user has open.

`type` sends whatever string you give it, respecting the current keyboard layout.
Unicode works.

For shortcuts and special keys, use `key` with `+`-joined names:

- `cmd+s`: save
- `cmd+t`: new tab
- `cmd+w`: close tab
- `return` / `escape` / `tab` / `space`
- `cmd+shift+g`: go to path (Finder)
- Arrow keys: `up`, `down`, `left`, `right`, optionally with modifiers.

For drag and drop, prefer element indices:
`computer_use(action="drag", from_element=3, to_element=17)`

For a rubber-band selection on an empty canvas, use coordinates:

`computer_use(action="drag", from_coordinate=[100, 200], to_coordinate=[400, 500])`
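Before a rubber-band drag, it can help to check which listed elements the band would enclose. This sketch assumes that AX frames are (x, y, width, height) in the same coordinate space as `from_coordinate` / `to_coordinate`. It also assumes the app selects only items wholly inside the band. Apps differ on that, so treat the result as a planning aid. The function names are hypothetical.

```python
from typing import Sequence

Frame = tuple[int, int, int, int]  # assumed (x, y, width, height), as in the AX index


def selection_rect(from_xy: Sequence[int], to_xy: Sequence[int]) -> Frame:
    """Normalize two drag endpoints into (x, y, width, height), whatever the drag direction."""
    (x1, y1), (x2, y2) = from_xy, to_xy
    return min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1)


def fully_inside(rect: Frame, frames: dict[int, Frame]) -> list[int]:
    """Element indices whose frames lie entirely within rect."""
    rx, ry, rw, rh = rect
    return sorted(
        idx for idx, (x, y, w, h) in frames.items()
        if x >= rx and y >= ry and x + w <= rx + rw and y + h <= ry + rh
    )


rect = selection_rect([100, 200], [400, 500])
print(rect)  # (100, 200, 300, 300)
frames = {3: (150, 250, 40, 40), 17: (380, 480, 40, 40), 20: (600, 100, 10, 10)}
print(fully_inside(rect, frames))  # [3]
```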
Scroll the viewport under an element (the most common case):

`computer_use(action="scroll", direction="down", amount=5, element=12)`

Or at a specific point:

`computer_use(action="scroll", direction="down", amount=3, coordinate=[500, 400])`
`list_apps` returns running apps with bundle IDs, PIDs, and window counts.
`focus_app` routes input to an app without raising it. You rarely need to
focus explicitly: passing `app=...` to `capture`, `click`, or `type` targets
that app's frontmost window automatically.
When the user is on a messaging platform (Telegram, Discord, etc.) and you
took a screenshot they should see, save it somewhere durable and put
`MEDIA:/absolute/path.png` in your reply. cua-driver's screenshots are
PNG data; write them out with `write_file` or from the terminal (`base64 -d` to decode).
In the CLI, you can just describe what you see; the screenshot data stays in your conversation context.
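For the messaging-platform case, the save step can be sketched in Python. The sketch assumes the screenshot arrives either as raw PNG bytes or as base64 text (the case `base64 -d` handles). `save_screenshot` is a hypothetical helper, not a cua-driver function.

```python
import base64
import tempfile
from pathlib import Path
from typing import Union

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header every PNG file starts with


def save_screenshot(payload: Union[bytes, str], path: Union[str, Path]) -> str:
    """Write a screenshot to an absolute path and return the MEDIA: line for the reply.

    `payload` may be raw PNG bytes or base64 text.
    """
    data = base64.b64decode(payload, validate=True) if isinstance(payload, str) else bytes(payload)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with a PNG signature")
    target = Path(path)
    if not target.is_absolute():
        raise ValueError("MEDIA: paths must be absolute")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return f"MEDIA:{target}"


# Demo with placeholder bytes (a PNG signature followed by filler, not a real image).
demo_dir = Path(tempfile.mkdtemp())
fake_png = PNG_SIGNATURE + b"filler"
line = save_screenshot(base64.b64encode(fake_png).decode("ascii"), demo_dir / "shot.png")
print(line)
```

The absolute-path check mirrors the `MEDIA:/absolute/path.png` form the skill asks for.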
… `type`. You'll see an error if the guard fires.

To set up the tool, run `hermes tools` and enable Computer Use; the setup installs
cua-driver via its upstream script. It requires macOS plus Accessibility and
Screen Recording permissions.

Element indices refer to the most recent `capture` call. If the UI shifted
(new tab opened, dialog appeared), re-capture before clicking. If a dialog is
in the way, dismiss it (press `escape` or click the close button) before retrying.

If you `type` a shell command that matches the dangerous-pattern block list
(`curl ... | bash`, `sudo rm -rf`, etc.), it is rejected. Break the command up or reconsider.

When not to use `computer_use`:

- Web tasks: use the `browser_*` tools. They drive a real headless Chromium
  and are more reliable than driving the user's GUI browser. Reach for
  `computer_use` specifically when the task needs the user's actual Mac apps
  (native Mail, Messages, Finder, Figma, Logic, games, anything non-web).
- File edits: use `read_file` / `write_file` / `patch`, not `type` into an editor window.
- Shell commands: use `terminal`, not `type` into Terminal.app.
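A local pre-check can anticipate the dangerous-pattern guard before you call `type`. The two patterns below only mirror the examples the skill names (`curl ... | bash`, `sudo rm -rf`). The skill does not show the actual block list, which has more entries ("etc."). `check_typed_text` is hypothetical.

```python
import re

# Illustrative patterns modelled on the two examples in the text; not the real block list.
DANGEROUS_PATTERNS = [
    # A download piped straight into a shell, e.g. `curl -fsSL URL | bash`.
    re.compile(r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"),
    # Recursive forced delete as root, e.g. `sudo rm -rf /path` (also -fr, -Rf, -rfv).
    re.compile(r"\bsudo\s+rm\s+-[a-z]*(?:rf|fr)", re.IGNORECASE),
]


def check_typed_text(text: str) -> None:
    """Raise PermissionError if text matches one of the illustrative patterns."""
    for pattern in DANGEROUS_PATTERNS:
        if pattern.search(text):
            raise PermissionError(f"blocked pattern in typed text: {pattern.pattern}")
```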