open-interpreter
| name | open-interpreter |
| description | Desktop GUI automation via OpenInterpreter — mouse, keyboard, screenshot, |
Desktop control for Claude Code via OpenInterpreter (62k stars, AGPL-3.0). Mouse, keyboard, screenshot, and OCR primitives backed by pyautogui + pytesseract.
| Mode | LLM | Script | Best For |
|---|---|---|---|
| Library | Claude Code (native) | Individual scripts below | Surgical GUI actions — Claude sees screenshots, reasons, dispatches actions |
| OS subprocess | Claude API (via OI) | oi_os_mode.py | Full autonomous computer use — delegate entire GUI tasks |
| Local agent | Ollama (offline) | oi_os_mode.py --local | Offline computer use, no API costs, privacy-sensitive tasks |
Use Library mode by default. Use OS subprocess to delegate self-contained GUI tasks. Use Local agent when offline or to avoid API costs.
Run once:
~/.claude/skills/open-interpreter/scripts/oi_install.sh
Installs open-interpreter[os] via uv, verifies pyautogui and tesseract, checks macOS permissions.
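macOS permissions (one-time, manual): grant the terminal app Screen Recording (for screenshots) and Accessibility (for mouse and keyboard control) under System Settings > Privacy & Security.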
Verify permissions:
python3 ~/.claude/skills/open-interpreter/scripts/oi_permission_check.py
The core pattern for GUI automation:
1. Take screenshot → oi_screenshot.py
2. Read PNG → Claude Read tool (native vision)
3. Decide action → Claude reasoning
4. Execute action → oi_click.py / oi_type.py
5. Verify → Take another screenshot
6. Loop until done
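The loop above can be sketched in Python as follows. This is only an illustration of the control flow: `run_gui_loop`, `Action`, and the three callables are hypothetical names, not part of the skill. In practice `take_screenshot` would wrap oi_screenshot.py, `execute` would wrap oi_click.py or oi_type.py, and `decide` stands in for Claude's reasoning.

```python
from typing import Callable, Optional

# Hypothetical action shape for this sketch, e.g. {"kind": "click", "x": 450, "y": 300}.
Action = dict


def run_gui_loop(
    take_screenshot: Callable[[], str],
    decide: Callable[[str], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Run screenshot -> decide -> execute until decide() returns None.

    Returns the number of actions executed. Raises RuntimeError if the
    step budget runs out before decide() reports completion.
    """
    for step in range(max_steps):
        path = take_screenshot()   # steps 1-2: capture and hand over the PNG path
        action = decide(path)      # step 3: pick the next action from the screenshot
        if action is None:         # nothing left to do, so the task is complete
            return step
        execute(action)            # step 4: click or type
        # step 5 (verify) is the screenshot taken at the top of the next iteration
    raise RuntimeError(f"not done after {max_steps} steps")
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused loop from clicking indefinitely.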
oi_screenshot.py — Capture screen, return file path with Retina metadata
python3 ~/.claude/skills/open-interpreter/scripts/oi_screenshot.py
python3 ~/.claude/skills/open-interpreter/scripts/oi_screenshot.py --region 0,0,800,600
python3 ~/.claude/skills/open-interpreter/scripts/oi_screenshot.py --active-window
Output (3 lines):
/tmp/oi_screenshot_1708789200.png
SCALE_FACTOR=2
SCREEN_SIZE=1512x982
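A caller that wants the metadata as values rather than text could parse those three lines along these lines. This is a sketch that assumes the output format is exactly as shown above; `ScreenshotInfo` and `parse_screenshot_output` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenshotInfo:
    path: str
    scale_factor: float
    screen_width: int
    screen_height: int


def parse_screenshot_output(stdout: str) -> ScreenshotInfo:
    """Parse the path line plus the KEY=VALUE lines printed by the screenshot script."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError(f"expected 3 lines, got {len(lines)}")
    path = lines[0]
    # The remaining lines are KEY=VALUE pairs (SCALE_FACTOR, SCREEN_SIZE).
    fields = dict(line.split("=", 1) for line in lines[1:])
    width, height = fields["SCREEN_SIZE"].split("x")
    return ScreenshotInfo(path, float(fields["SCALE_FACTOR"]), int(width), int(height))
```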
oi_click.py — Mouse click by coordinates or OCR text
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 450 --y 300
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 900 --y 600 --image-coords
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --text "Submit"
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 450 --y 300 --double
python3 ~/.claude/skills/open-interpreter/scripts/oi_click.py --x 450 --y 300 --right
--image-coords: auto-divides by the Retina scale factor (use when coordinates come from screenshot image pixels)
--text: OCR-based; takes a screenshot, finds the text via pytesseract, and clicks the center of the match
oi_type.py — Keyboard input
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --text "hello world"
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --key enter
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --hotkey command space
python3 ~/.claude/skills/open-interpreter/scripts/oi_type.py --text "search" --method typewrite
--method typewrite: character-by-character (use when clipboard is needed for other purposes)
--hotkey: AppleScript on macOS for reliable modifier key handling
oi_find_text.py — OCR screen reading
python3 ~/.claude/skills/open-interpreter/scripts/oi_find_text.py --text "Submit"
python3 ~/.claude/skills/open-interpreter/scripts/oi_find_text.py --text "Price" --screenshot /tmp/ss.png
Returns JSON array: [{"text": "Submit", "x": 450, "y": 300, "w": 80, "h": 24, "confidence": 95}]
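When OCR returns several candidates, one reasonable way to choose among them is to keep the highest-confidence match above a threshold. The sketch below assumes the JSON shape shown above; `best_match` and the default threshold of 60 are hypothetical, and the source does not say whether `x` and `y` mark the top-left corner or the center of the match, so the function returns the record untouched.

```python
import json
from typing import Optional


def best_match(raw_json: str, min_confidence: float = 60.0) -> Optional[dict]:
    """Return the most confident OCR match, or None if none clears the threshold."""
    matches = json.loads(raw_json)
    # Drop matches below the (assumed) confidence threshold.
    usable = [m for m in matches if m.get("confidence", 0) >= min_confidence]
    if not usable:
        return None
    return max(usable, key=lambda m: m["confidence"])
```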
oi_computer.py — Unified dispatch for all actions
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py screenshot
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py click --x 450 --y 300
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py type --text "hello"
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py find --text "Submit"
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py scroll --clicks 3
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py mouse-position
python3 ~/.claude/skills/open-interpreter/scripts/oi_computer.py screen-size
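To call the dispatcher from Python rather than a shell, the argument list can be assembled programmatically. The helper below is a hypothetical convenience, not part of the skill; it only mirrors the command shapes listed above and does not know which flags the script actually accepts.

```python
import os
from typing import List

# Script path as used in the commands above, with ~ expanded for subprocess use.
DEFAULT_SCRIPT = os.path.expanduser(
    "~/.claude/skills/open-interpreter/scripts/oi_computer.py"
)


def build_command(action: str, script: str = DEFAULT_SCRIPT, **flags: object) -> List[str]:
    """Build an argv list such as [..., "click", "--x", "450", "--y", "300"]."""
    argv = ["python3", script, action]
    for name, value in flags.items():
        # Keyword names map to long flags; underscores become hyphens.
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv
```

The resulting list can be passed to `subprocess.run(build_command("click", x=450, y=300), check=True)`. Passing a list instead of a shell string sidesteps quoting problems when typed text contains spaces.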
macOS Retina displays render at 2x (or 3x) scaling. Screenshot image pixels differ from screen coordinates:
| Metric | Example (14" MBP) |
|---|---|
| Image pixels (screenshot) | 3024 x 1964 |
| Screen coordinates (pyautogui) | 1512 x 982 |
| Scale factor | 2x |
When estimating click targets from a screenshot image, use --image-coords on oi_click.py to auto-divide by the scale factor. The oi_screenshot.py output includes SCALE_FACTOR metadata.
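The conversion itself is a division by the scale factor. A minimal sketch follows; `image_to_screen` is a hypothetical name, and rounding to the nearest integer is an assumption, since the source does not say how oi_click.py handles odd pixel values.

```python
from typing import Tuple


def image_to_screen(x: int, y: int, scale_factor: float) -> Tuple[int, int]:
    """Convert screenshot image pixels to screen coordinates."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return round(x / scale_factor), round(y / scale_factor)
```

With the 2x example from the table, image pixel (900, 600) maps to screen coordinate (450, 300), and the full image size 3024 x 1964 maps to 1512 x 982.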
For self-contained GUI tasks, delegate to OI's full agent loop:
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py "Open Calculator and compute 2+2"
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py --provider anthropic "Change the desktop wallpaper"
OI runs its own screenshot → analyze → act loop using the Claude API. Requires ANTHROPIC_API_KEY.
Run OI with a local vision model via Ollama:
python3 ~/.claude/skills/open-interpreter/scripts/oi_os_mode.py --local "What apps are open?"
Prerequisites:
ollama serve
ollama pull llama3.2-vision
Limitation: Local models use OI's classic code-execution mode, not the screenshot-driven OS Mode (which requires Claude 3.5 Sonnet). Local mode generates and executes code to accomplish GUI tasks rather than using pixel-level screenshot analysis.
python3 scripts/oi_type.py --hotkey command space
sleep 0.5
python3 scripts/oi_type.py --text "Calculator"
sleep 0.3
python3 scripts/oi_type.py --key enter
python3 scripts/oi_screenshot.py > /tmp/ss_meta.txt
python3 scripts/oi_find_text.py --text "Total" --screenshot "$(head -1 /tmp/ss_meta.txt)"
python3 scripts/oi_click.py --text "Save"
python3 scripts/oi_click.py --text "Email"
python3 scripts/oi_type.py --text "user@example.com"
python3 scripts/oi_type.py --key tab
python3 scripts/oi_type.py --text "password123"
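Moving the mouse to a screen corner raises pyautogui.FailSafeException and aborts the action (pyautogui's failsafe, enabled by default).
Example action log line: [oi] click at (450, 300) button=left
| Symptom | Fix |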
|---|---|
| oi_screenshot.py returns black image | Grant Screen Recording permission to terminal app |
| oi_click.py / oi_type.py no effect | Grant Accessibility permission to terminal app |
| OCR finds no text | Verify tesseract: which tesseract && tesseract --version |
| Retina coordinates off by 2x | Use --image-coords flag on oi_click.py |
| oi_find_text.py low confidence | Try larger text, ensure screen is not obstructed |
| OS Mode hangs | Verify ANTHROPIC_API_KEY is set, check OI stderr output |
| Local mode fails | Verify Ollama running (ollama list) and model pulled |
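Two of the checks in the table can be automated before a run. The sketch below is a hypothetical preflight helper, not part of the skill: it covers only the tesseract and ANTHROPIC_API_KEY rows. The macOS permission rows are better handled by oi_permission_check.py.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    need_api_key: bool = True,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if which("tesseract") is None:
        problems.append("tesseract not found on PATH (OCR will find no text)")
    if need_api_key and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set (OS Mode needs it)")
    return problems
```

The `env` and `which` parameters exist so the function can be exercised without touching the real environment; calling `preflight()` with no arguments checks the current machine.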
| File | Contents |
|---|---|
| references/computer-api.md | OI Computer API reference — mouse, keyboard, display, clipboard |
| references/os-mode.md | OS Mode usage, provider configuration, agent loop architecture |
| references/safety-and-permissions.md | macOS permissions guide, safety model, failsafe configuration |