一键在 Manus 中运行任何 Skill

desktop-automation

Drive the NixOS + Hyprland desktop programmatically — window focus, keyboard input, screenshots, clipboard, VNC, and Chromium via CDP. Use when a task requires headful UI interaction, visual verification, or browser automation.

在 Manus 中运行

星标1

分支1

更新时间2026年5月29日 20:11

来源

joeledwardson

joeledwardson/dev-setup

打开 GitHub 仓库查看创作者相关仓库

安装命令

下载

在 Manus 中运行

适用职业SOC

软件开发工程师计算机与数学类职业15-1252L4

SKILL.md

readonly

name	desktop-automation
description	Drive the NixOS + Hyprland desktop programmatically — window focus, keyboard input, screenshots, clipboard, VNC, and Chromium via CDP. Use when a task requires headful UI interaction, visual verification, or browser automation.

desktop-automation

What: Tools and patterns for driving the NixOS + Hyprland desktop without human input. Why: Some tasks genuinely need a real UI — testing click flows, verifying visual output, pasting into a running TUI. This is a yolo box; treat the desktop like any other tool.

Session env — set this first if SSH'd in

WAYLAND_DISPLAY and HYPRLAND_INSTANCE_SIGNATURE are unset in SSH sessions. Everything below fails silently without them.

export WAYLAND_DISPLAY="$(ls /run/user/$(id -u)/ | grep -E '^wayland-[0-9]+$' | head -1)"
export HYPRLAND_INSTANCE_SIGNATURE="$(ls /run/user/$(id -u)/hypr/ | head -1)"

The loop — always observe → act → verify

Fire-and-forget doesn't work. Every desktop action follows this cycle:

Observe — hyprctl activewindow -j or grim /tmp/out.png + Read the PNG
Act — send keys, click, navigate
Verify — screenshot again, diff against expectation

If step 3 doesn't show the expected change, re-observe before retrying. Don't blindly retry — the window focus, a modal, or the element position may have changed.

Toolchain

Tool	Use for	Notes
`hyprctl dispatch sendshortcut KEY,class:^(app)$`	Key chord to a specific window — bypasses focus	Preferred over wtype when you know the target class
`wtype`	Free-form text or chords to focused window	`wtype -M ctrl -p v -m ctrl` = Ctrl+V
`ydotool`	Mouse clicks and moves	Needs `ydotoold` running + user in `ydotool` group
`hyprctl clients -j`	List all windows with class, title, geometry	Always locate before acting
`grim -g "X,Y WxH" /tmp/out.png`	Screenshot a region	Pull geometry from `hyprctl clients -j`
`hyprctl dispatch exec "[float; size W H; center 1] cmd"`	Launch floating window centred	Good for browser UI tests

If ydotool fails with permission denied: check id for ydotool group membership. For keys-only, fall back to wtype or sendshortcut instead.

Screenshots — avoid the 2000px context limit

Multiple full-screen screenshots accumulate in context and Claude rejects requests once cumulative dimensions exceed ~2000px.

Default to region grabs: grim -g "X,Y WxH" — pull the rect from hyprctl clients -j
One screenshot per step — if you confirmed the state, don't screenshot again "just to be sure"
For browser UI: use CDP Page.captureScreenshot with a clip region instead of grim

Clipboard

wl-copy --type image/png < file.png   # load image with correct MIME type
wl-paste -l                           # verify what's on clipboard
wl-copy --clear                       # teardown

WAYLAND_DISPLAY must be set (see above).

VNC (wayvnc) — viewing the desktop remotely

wayvnc does not autostart. Check first:

ss -tlnp | grep 5900

If nothing listening, start in a tmux pane:

export WAYLAND_DISPLAY=wayland-1 && export XDG_RUNTIME_DIR=/run/user/$(id -u)
wayvnc -L info 0.0.0.0 5900

Auth gotchas — config at ~/.config/wayvnc/config:

Want no auth: set enable_auth=false
Remmina + wayvnc 0.9+ = "Unknown authentication" error — use TigerVNC (vncviewer host::5900) or switch Remmina to GTK-VNC backend

Chromium via CDP (preferred for web UI)

CDP (Chrome DevTools Protocol) beats wtype + screenshots for browser testing — no focus handling, no pixel coordinates, direct DOM access.

Launch:

brave --remote-debugging-port=9998 --user-data-dir=/tmp/brave-test &
# verify: curl -s http://localhost:9998/json/version

Use a fresh --user-data-dir if a normal instance is already running — otherwise the flag is silently ignored.

Drive (Node 22+ stdlib WebSocket):

const targets = await fetch('http://localhost:9998/json/list').then(r => r.json());
const ws = new WebSocket(targets.find(t => t.type === 'page').webSocketDebuggerUrl);
// send({method, params}) helper — see unattended skill for full pattern
await send('Page.navigate', { url: 'http://localhost:9585' });
const { result } = await send('Runtime.evaluate', { expression: 'document.title' });

Methods you'll actually use:

Page.navigate — go to URL
Runtime.evaluate — run JS, get result back (covers 90% of clicking/scraping)
Page.captureScreenshot with clip — region screenshot without grim

Don't connect to the user's logged-in browser profile.

When not to use this

Don't automate the user's accounts, messaging apps, or anything touching their identity
If 3 observe→act→verify cycles fail on the same step — stop, log, the UI model is wrong
Don't steal focus while the user is actively using the machine

同仓库更多 Skills

同仓库

unattended

joeledwardson/dev-setup

Switch to unattended/solo mode for research and long-running tasks. Push through friction instead of pausing. Only invoked on explicit /unattended — never auto-load.

2026-05-311

documentation

joeledwardson/dev-setup

Writing, structuring, and maintaining project documentation. Auto-loaded when creating or updating docs, mkdocs sites, architecture pages, or any markdown documentation.

2026-05-311

review-docs

joeledwardson/dev-setup

Visual and structural documentation review. Screenshots live rendered pages, passes to Gemini vision for readability/colour/diagram analysis. Separate pass for structure, acronyms, assumed knowledge, and factual accuracy (broken links, stale code refs, image/claim mismatches). Persona is a junior researcher with no domain context.

2026-05-311

review-code

joeledwardson/dev-setup

Junior-dev comprehension check on changed code. Per-file design review (role, assumed knowledge, wheel-reinvention), per-unit logic review (WHAT/WHY/LOGIC/SIMPLER), then a holistic diff review. Does NOT modify code.

2026-05-311

diagram

joeledwardson/dev-setup

Generate visual diagrams (flowcharts, matrices, architecture, org charts, quadrants, mind maps, etc.) with a render-and-review feedback loop. Use when the user asks for a diagram, chart, visualization, flowchart, or "visual". Picks the right tool for the data shape (Mermaid / D2 / raw SVG / Graphviz / quadrantChart) and iterates until the rendered output is actually readable.

2026-05-291

tmux-cowork

joeledwardson/dev-setup

Run long-running or interactive commands in a tmux session the user can watch and interact with. Use this for any command expected to take >30s, dev servers, watch modes, builds, tests, REPLs, migrations, or anything the user might need to Ctrl+C or send input to. Skip for quick one-shot commands (<30s, no prompts).

2026-05-241

name	desktop-automation
description	Drive the NixOS + Hyprland desktop programmatically — window focus, keyboard input, screenshots, clipboard, VNC, and Chromium via CDP. Use when a task requires headful UI interaction, visual verification, or browser automation.

desktop-automation

Session env — set this first if SSH'd in

WAYLAND_DISPLAY and HYPRLAND_INSTANCE_SIGNATURE are unset in SSH sessions. Everything below fails silently without them.

export WAYLAND_DISPLAY="$(ls /run/user/$(id -u)/ | grep -E '^wayland-[0-9]+$' | head -1)"
export HYPRLAND_INSTANCE_SIGNATURE="$(ls /run/user/$(id -u)/hypr/ | head -1)"

The loop — always observe → act → verify

Fire-and-forget doesn't work. Every desktop action follows this cycle:

Observe — hyprctl activewindow -j or grim /tmp/out.png + Read the PNG
Act — send keys, click, navigate
Verify — screenshot again, diff against expectation

If step 3 doesn't show the expected change, re-observe before retrying. Don't blindly retry — the window focus, a modal, or the element position may have changed.

Toolchain

Tool	Use for	Notes
`hyprctl dispatch sendshortcut KEY,class:^(app)$`	Key chord to a specific window — bypasses focus	Preferred over wtype when you know the target class
`wtype`	Free-form text or chords to focused window	`wtype -M ctrl -p v -m ctrl` = Ctrl+V
`ydotool`	Mouse clicks and moves	Needs `ydotoold` running + user in `ydotool` group
`hyprctl clients -j`	List all windows with class, title, geometry	Always locate before acting
`grim -g "X,Y WxH" /tmp/out.png`	Screenshot a region	Pull geometry from `hyprctl clients -j`
`hyprctl dispatch exec "[float; size W H; center 1] cmd"`	Launch floating window centred	Good for browser UI tests

If ydotool fails with permission denied: check id for ydotool group membership. For keys-only, fall back to wtype or sendshortcut instead.

Screenshots — avoid the 2000px context limit

Multiple full-screen screenshots accumulate in context and Claude rejects requests once cumulative dimensions exceed ~2000px.

Default to region grabs: grim -g "X,Y WxH" — pull the rect from hyprctl clients -j
One screenshot per step — if you confirmed the state, don't screenshot again "just to be sure"
For browser UI: use CDP Page.captureScreenshot with a clip region instead of grim

Clipboard

wl-copy --type image/png < file.png   # load image with correct MIME type
wl-paste -l                           # verify what's on clipboard
wl-copy --clear                       # teardown

WAYLAND_DISPLAY must be set (see above).

VNC (wayvnc) — viewing the desktop remotely

wayvnc does not autostart. Check first:

ss -tlnp | grep 5900

If nothing listening, start in a tmux pane:

export WAYLAND_DISPLAY=wayland-1 && export XDG_RUNTIME_DIR=/run/user/$(id -u)
wayvnc -L info 0.0.0.0 5900

Auth gotchas — config at ~/.config/wayvnc/config:

Want no auth: set enable_auth=false
Remmina + wayvnc 0.9+ = "Unknown authentication" error — use TigerVNC (vncviewer host::5900) or switch Remmina to GTK-VNC backend

Chromium via CDP (preferred for web UI)

CDP (Chrome DevTools Protocol) beats wtype + screenshots for browser testing — no focus handling, no pixel coordinates, direct DOM access.

Launch:

brave --remote-debugging-port=9998 --user-data-dir=/tmp/brave-test &
# verify: curl -s http://localhost:9998/json/version

Use a fresh --user-data-dir if a normal instance is already running — otherwise the flag is silently ignored.

Drive (Node 22+ stdlib WebSocket):

const targets = await fetch('http://localhost:9998/json/list').then(r => r.json());
const ws = new WebSocket(targets.find(t => t.type === 'page').webSocketDebuggerUrl);
// send({method, params}) helper — see unattended skill for full pattern
await send('Page.navigate', { url: 'http://localhost:9585' });
const { result } = await send('Runtime.evaluate', { expression: 'document.title' });

Methods you'll actually use:

Page.navigate — go to URL
Runtime.evaluate — run JS, get result back (covers 90% of clicking/scraping)
Page.captureScreenshot with clip — region screenshot without grim

Don't connect to the user's logged-in browser profile.

When not to use this

Don't automate the user's accounts, messaging apps, or anything touching their identity
If 3 observe→act→verify cycles fail on the same step — stop, log, the UI model is wrong
Don't steal focus while the user is actively using the machine