desktop-control
Advanced desktop automation with mouse, keyboard, and screen control
| name | desktop-control |
| description | Advanced desktop automation with mouse, keyboard, and screen control |
| whenToUse | User needs desktop automation: mouse clicks, keyboard input, screenshots, or window management |
A desktop automation skill for OpenClaw. It provides mouse control at exact pixel coordinates, keyboard input, screen capture, window management, and clipboard operations.
First, install required dependencies:
pip install pyautogui pillow opencv-python pygetwindow
from skills.desktop_control import DesktopController
# Initialize controller
dc = DesktopController(failsafe=True)
# Mouse operations
dc.move_mouse(500, 300) # Move to coordinates
dc.click() # Left click at current position
dc.click(100, 200, button="right") # Right click at position
# Keyboard operations
dc.type_text("Hello from OpenClaw!")
dc.hotkey("ctrl", "c") # Copy
dc.press("enter")
# Screen operations
screenshot = dc.screenshot()
position = dc.get_mouse_position()
move_mouse(x, y, duration=0, smooth=True)
Move mouse to absolute screen coordinates.
Parameters:
x (int): X coordinate (pixels from left)
y (int): Y coordinate (pixels from top)
duration (float): Movement time in seconds (0 = instant, 0.5 = smooth)
smooth (bool): Use bezier curve for natural movement
Example:
# Instant movement
dc.move_mouse(1000, 500)
# Smooth 1-second movement
dc.move_mouse(1000, 500, duration=1.0)
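The smooth option is described as following a bezier curve, but the source does not say how that curve is built. The following is a minimal sketch, assuming a quadratic Bezier with one control point pushed sideways from the straight line. The names bezier_path, steps, and bend are hypothetical and not part of the skill's API.

```python
def bezier_path(start, end, steps=20, bend=0.2):
    """Return points along a quadratic Bezier curve from start to end.

    Hypothetical helper (not part of the skill): `bend` shifts the control
    point sideways by a fraction of the straight-line displacement.
    """
    (x0, y0), (x2, y2) = start, end
    dx, dy = x2 - x0, y2 - y0
    # Assumption: control point is the midpoint pushed perpendicular to the line.
    cx = (x0 + x2) / 2 - dy * bend
    cy = (y0 + y2) / 2 + dx * bend
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Standard quadratic Bezier formula evaluated at parameter t.
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((round(x), round(y)))
    return points


path = bezier_path((0, 0), (1000, 500))
```

Each point in path could then be passed to dc.move_mouse(x, y) in turn, with a short pause between steps, to approximate a curved movement.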
move_relative(x_offset, y_offset, duration=0)
Move mouse relative to current position.
Parameters:
x_offset (int): Pixels to move horizontally (positive = right)
y_offset (int): Pixels to move vertically (positive = down)
duration (float): Movement time in seconds
Example:
# Move 100px right, 50px down
dc.move_relative(100, 50, duration=0.3)
click(x=None, y=None, button='left', clicks=1, interval=0.1)
Perform mouse click.
Parameters:
x, y (int, optional): Coordinates to click (None = current position)
button (str): 'left', 'right', 'middle'
clicks (int): Number of clicks (1 = single, 2 = double)
interval (float): Delay between multiple clicks
Example:
# Simple left click
dc.click()
# Double-click at specific position
dc.click(500, 300, clicks=2)
# Right-click
dc.click(button='right')
drag(start_x, start_y, end_x, end_y, duration=0.5, button='left')
Drag and drop operation.
Parameters:
start_x, start_y (int): Starting coordinates
end_x, end_y (int): Ending coordinates
duration (float): Drag duration
button (str): Mouse button to use
Example:
# Drag file from desktop to folder
dc.drag(100, 100, 500, 500, duration=1.0)
scroll(clicks, direction='vertical', x=None, y=None)
Scroll mouse wheel.
Parameters:
clicks (int): Scroll amount (positive = up/left, negative = down/right)
direction (str): 'vertical' or 'horizontal'
x, y (int, optional): Position to scroll at
Example:
# Scroll down 5 clicks
dc.scroll(-5)
# Scroll up 10 clicks
dc.scroll(10)
# Horizontal scroll
dc.scroll(5, direction='horizontal')
get_mouse_position()
Get current mouse coordinates.
Returns: (x, y) tuple
Example:
x, y = dc.get_mouse_position()
print(f"Mouse is at: {x}, {y}")
type_text(text, interval=0, wpm=None)
Type text with configurable speed.
Parameters:
text (str): Text to type
interval (float): Delay between keystrokes (0 = instant)
wpm (int, optional): Words per minute (overrides interval)
Example:
# Instant typing
dc.type_text("Hello World")
# Human-like typing at 60 WPM
dc.type_text("Hello World", wpm=60)
# Slow typing with 0.1s between keys
dc.type_text("Hello World", interval=0.1)
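The wpm parameter is said to override interval, but the source does not state how one is derived from the other. A minimal sketch of one plausible conversion, assuming the common typing-test convention of five characters per word (wpm_to_interval and chars_per_word are hypothetical names):

```python
def wpm_to_interval(wpm, chars_per_word=5):
    """Convert words per minute to a per-keystroke delay in seconds.

    Assumption: one "word" is five characters, as in most typing tests.
    The skill's real conversion may differ.
    """
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return 60.0 / (wpm * chars_per_word)


interval_at_60_wpm = wpm_to_interval(60)  # 0.2 seconds per keystroke
```

Under this assumption, wpm=60 would behave like interval=0.2, which is slower than the interval=0.1 example above.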
press(key, presses=1, interval=0.1)
Press and release a key.
Parameters:
key (str): Key name (see Key Names section)
presses (int): Number of times to press
interval (float): Delay between presses
Example:
# Press Enter
dc.press('enter')
# Press Space 3 times
dc.press('space', presses=3)
# Press Down arrow
dc.press('down')
hotkey(*keys, interval=0.05)
Execute keyboard shortcut.
Parameters:
*keys (str): Keys to press together
interval (float): Delay between key presses
Example:
# Copy (Ctrl+C)
dc.hotkey('ctrl', 'c')
# Paste (Ctrl+V)
dc.hotkey('ctrl', 'v')
# Open Run dialog (Win+R)
dc.hotkey('win', 'r')
# Save (Ctrl+S)
dc.hotkey('ctrl', 's')
# Select All (Ctrl+A)
dc.hotkey('ctrl', 'a')
key_down(key) / key_up(key)
Manually control key state.
Example:
# Hold Shift
dc.key_down('shift')
dc.type_text("hello") # Types "HELLO"
dc.key_up('shift')
# Hold Ctrl and click (for multi-select)
dc.key_down('ctrl')
dc.click(100, 100)
dc.click(200, 100)
dc.key_up('ctrl')
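With manual key state, an exception between key_down and key_up would leave the key held. One way to guard against that is a small context manager that always releases the key. This is a sketch, not part of the skill: held is a hypothetical helper, and FakeController is a stand-in that only records calls so the example runs without a desktop session.

```python
from contextlib import contextmanager


@contextmanager
def held(controller, key):
    """Hold `key` for the duration of the with-block, then release it."""
    controller.key_down(key)
    try:
        yield
    finally:
        # Runs even if the block raises, so the key is never left stuck.
        controller.key_up(key)


class FakeController:
    """Stand-in for DesktopController that records calls instead of acting."""

    def __init__(self):
        self.events = []

    def key_down(self, key):
        self.events.append(("down", key))

    def key_up(self, key):
        self.events.append(("up", key))

    def click(self, x, y):
        self.events.append(("click", x, y))


fake = FakeController()
with held(fake, "ctrl"):
    fake.click(100, 100)
    fake.click(200, 100)
```

With a real controller, the same pattern would read `with held(dc, 'ctrl'):` followed by the clicks.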
screenshot(region=None, filename=None)
Capture screen or region.
Parameters:
region (tuple, optional): (left, top, width, height) for partial capture
filename (str, optional): Path to save image
Returns: PIL Image object
Example:
# Full screen
img = dc.screenshot()
# Save to file
dc.screenshot(filename="screenshot.png")
# Capture specific region
img = dc.screenshot(region=(100, 100, 500, 300))
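A region uses the (left, top, width, height) convention, so a region that extends past the screen edge may need trimming before capture. The source does not say how the skill handles out-of-bounds regions; the sketch below is one way to clamp a region to the size reported by get_screen_size(). clamp_region is a hypothetical helper.

```python
def clamp_region(region, screen_size):
    """Trim a (left, top, width, height) region so it fits on the screen.

    Hypothetical helper; the skill itself may clamp, raise, or pad instead.
    """
    left, top, width, height = region
    screen_w, screen_h = screen_size
    # Work with edges so that trimming one side adjusts the size correctly.
    right = min(left + width, screen_w)
    bottom = min(top + height, screen_h)
    left = max(left, 0)
    top = max(top, 0)
    return (left, top, max(right - left, 0), max(bottom - top, 0))


clamped = clamp_region((1800, 1000, 500, 300), (1920, 1080))
```

The result could then be passed as dc.screenshot(region=clamped).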
get_pixel_color(x, y)
Get color of pixel at coordinates.
Returns: RGB tuple (r, g, b)
Example:
r, g, b = dc.get_pixel_color(500, 300)
print(f"Color at (500, 300): RGB({r}, {g}, {b})")
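When checking a pixel against an expected color, an exact comparison may be too strict if rendering varies slightly between runs. A minimal sketch of a per-channel tolerance check on the returned RGB tuple follows. colors_match and its tolerance default are assumptions for illustration, not part of the skill.

```python
def colors_match(actual, expected, tolerance=10):
    """Return True if every RGB channel differs by at most `tolerance`."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


# Example values standing in for a dc.get_pixel_color(...) result.
sampled = (250, 128, 0)
is_orange = colors_match(sampled, (255, 130, 5))
```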
find_on_screen(image_path, confidence=0.8)
Find image on screen (requires OpenCV).
Parameters:
image_path (str): Path to template image
confidence (float): Match threshold (0-1)
Returns: (x, y, width, height) or None
Example:
# Find button on screen
location = dc.find_on_screen("button.png")
if location:
    x, y, w, h = location
    # Click center of found image
    dc.click(x + w//2, y + h//2)
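Because find_on_screen returns None when nothing matches, a target that appears after a delay needs a retry loop. The sketch below polls any finder callable until it returns a location or a timeout passes. wait_for and center are hypothetical helpers, and the stand-in finder here only simulates two misses followed by a match.

```python
import time


def wait_for(finder, timeout=5.0, poll=0.25):
    """Call `finder` repeatedly until it returns non-None or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        location = finder()
        if location is not None:
            return location
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def center(location):
    """Center point of an (x, y, width, height) box."""
    x, y, w, h = location
    return (x + w // 2, y + h // 2)


# Stand-in finder for demonstration: misses twice, then reports a match.
_results = iter([None, None, (400, 300, 80, 40)])
found = wait_for(lambda: next(_results), timeout=2.0, poll=0.01)
```

With the skill, the finder would be something like `lambda: dc.find_on_screen("button.png")`, and center(found) would give the coordinates to click.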
get_screen_size()
Get screen resolution.
Returns: (width, height) tuple
Example:
width, height = dc.get_screen_size()
print(f"Screen: {width}x{height}")
get_all_windows()
List all open windows.
Returns: List of window titles
Example:
windows = dc.get_all_windows()
for title in windows:
    print(f"Window: {title}")
activate_window(title_substring)
Bring window to front by title.
Parameters:
title_substring (str): Part of window title to match
Example:
# Activate Chrome
dc.activate_window("Chrome")
# Activate VS Code
dc.activate_window("Visual Studio Code")
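A substring can match more than one window, so it may help to check the candidates before activating. The sketch below filters a list of titles such as the one get_all_windows() returns. match_titles is a hypothetical helper, and the case-insensitive comparison is an assumption, since the source does not say whether activate_window is case-sensitive.

```python
def match_titles(titles, title_substring):
    """Return the titles that contain `title_substring`.

    Assumption: matching ignores case; the skill's own rule is not documented.
    """
    needle = title_substring.lower()
    return [title for title in titles if needle in title.lower()]


# Example titles standing in for a dc.get_all_windows() result.
open_windows = [
    "README.md - Visual Studio Code",
    "Inbox - Google Chrome",
    "Calculator",
]
chrome_windows = match_titles(open_windows, "chrome")
```

If the list has exactly one entry, passing that full title to activate_window avoids ambiguity.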
get_active_window()
Get currently focused window.
Returns: Window title (str)
Example:
active = dc.get_active_window()
print(f"Active window: {active}")
copy_to_clipboard(text)
Copy text to clipboard.
Example:
dc.copy_to_clipboard("Hello from OpenClaw!")
get_from_clipboard()
Get text from clipboard.
Returns: str
Example:
text = dc.get_from_clipboard()
print(f"Clipboard: {text}")
'a' through 'z'
'0' through '9'
'f1' through 'f24'
'enter' / 'return'
'esc' / 'escape'
'space' / 'spacebar'
'tab'
'backspace'
'delete' / 'del'
'insert'
'home'
'end'
'pageup' / 'pgup'
'pagedown' / 'pgdn'
'up' / 'down' / 'left' / 'right'
'ctrl' / 'control'
'shift'
'alt'
'win' / 'winleft' / 'winright'
'cmd' / 'command' (Mac)
'capslock'
'numlock'
'scrolllock'
'.' / ',' / '?' / '!' / ';' / ':'
'[' / ']' / '{' / '}'
'(' / ')'
'+' / '-' / '*' / '/' / '='
Move mouse to any corner of the screen to abort all automation.
# Enable failsafe (enabled by default)
dc = DesktopController(failsafe=True)
# Pause all automation for 2 seconds
dc.pause(2.0)
# Check if automation is safe to proceed
if dc.is_safe():
    dc.click(500, 500)
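The failsafe triggers when the mouse reaches a screen corner. How the skill detects this is not shown in the source; the sketch below is one way such a check could be written from a mouse position and a screen size. in_failsafe_corner and margin are hypothetical names.

```python
def in_failsafe_corner(position, screen_size, margin=0):
    """Return True if `position` is within `margin` pixels of a screen corner.

    Hypothetical check; coordinates are assumed to run from 0 to size - 1.
    """
    x, y = position
    width, height = screen_size
    near_x_edge = x <= margin or x >= width - 1 - margin
    near_y_edge = y <= margin or y >= height - 1 - margin
    # A corner means being at both a horizontal and a vertical edge.
    return near_x_edge and near_y_edge


at_corner = in_failsafe_corner((0, 0), (1920, 1080))
```

With a real controller, the inputs would come from dc.get_mouse_position() and dc.get_screen_size().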
Require user confirmation before actions:
dc = DesktopController(require_approval=True)
# This will ask for confirmation
dc.click(500, 500) # Prompt: "Allow click at (500, 500)? [y/n]"
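The approval prompt shown above can be pictured as a gate that runs before each action. The following is a sketch under stated assumptions, not the skill's implementation: approved is a hypothetical function, accepting "y" or "yes" in any case is an assumption, and the ask parameter exists only so the example can run without keyboard input.

```python
def approved(action, x, y, ask=input):
    """Ask for confirmation and return True only on an explicit yes.

    Assumption: "y" or "yes" (any case) approves; anything else declines.
    """
    answer = ask(f"Allow {action} at ({x}, {y})? [y/n] ")
    return answer.strip().lower() in ("y", "yes")


# Stand-in for input() so the example runs non-interactively.
prompts = []

def fake_ask(prompt):
    prompts.append(prompt)
    return "Y "

allowed = approved("click", 500, 500, ask=fake_ask)
```

Defaulting to decline on any unrecognized answer keeps a mistyped response from triggering the action.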
dc = DesktopController()
# Click name field
dc.click(300, 200)
dc.type_text("John Doe", wpm=80)
# Tab to next field
dc.press('tab')
dc.type_text("john@example.com", wpm=80)
# Tab to password
dc.press('tab')
dc.type_text("SecurePassword123", wpm=60)
# Submit form
dc.press('enter')
# Capture specific area
region = (100, 100, 800, 600) # left, top, width, height
img = dc.screenshot(region=region)
# Save with timestamp
import datetime
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
img.save(f"capture_{timestamp}.png")
# Hold Ctrl and click multiple files
dc.key_down('ctrl')
dc.click(100, 200) # First file
dc.click(100, 250) # Second file
dc.click(100, 300) # Third file
dc.key_up('ctrl')
# Copy selected files
dc.hotkey('ctrl', 'c')
import time

# Activate Calculator
dc.activate_window("Calculator")
time.sleep(0.5)
# Type calculation
dc.type_text("5+3=", interval=0.2)
time.sleep(0.5)
# Take screenshot of result
dc.screenshot(filename="calculation_result.png")
# Drag file from source to destination
dc.drag(
    start_x=200, start_y=300,  # File location
    end_x=800, end_y=500,      # Folder location
    duration=1.0               # Smooth 1-second drag
)
duration=0
get_screen_size() to confirm dimensions
interval for reliability
DesktopController(failsafe=False)
Install all:
pip install pyautogui pillow opencv-python pygetwindow
Built for OpenClaw - The ultimate desktop automation companion 🦞