Name: Open Browser
Author: softpudding

stars: 182
forks: 18
updated: April 18, 2026 at 02:21

package.json

"author": "softpudding"

"repository": "softpudding/OpenBrowser"

name: open-browser

description: Drive a real Chrome browser through the local OpenBrowser service for interactive website tasks that require rendered-page inspection, clicking, typing, scrolling, dialog handling, or multi-step navigation. Use when Claude Code needs to open websites, fill forms, scrape JS-rendered content, reproduce browser-only issues, verify a frontend change end-to-end, or complete UI workflows. Prefer direct HTTP/API tools for simple fetches, downloads, or non-visual integrations.

OpenBrowser (Claude flavor)

Use OpenBrowser as a dedicated browser-operation agent when the task depends on a live Chrome session and visual page state. This flavor of the skill is tuned for Claude Code: foreground SSE streaming by default, no message truncation, conversation reuse for follow-up turns, and the noisy system prompt suppressed from the log.

Decide Fast

Use this skill for:

Multi-step UI navigation
Form filling and browser interactions
JS-rendered pages that need a real browser
Visual verification of a frontend change you just made
Browser bug reproduction or manual flow verification

Do not use this skill for:

Simple API calls — use curl or the WebFetch tool
Static downloads — use curl -O
Local file transformations — use the standard editing tools

Preconditions

Before sending a browser task, confirm all of the following:

The OpenBrowser server is reachable at http://127.0.0.1:8765
The Chrome extension is connected
The OpenBrowser UI already has a valid LLM configuration
A browser UUID is available through OPENBROWSER_CHROME_UUID or --chrome-uuid

Run this first:

python3 ~/.claude/skills/open-browser/scripts/check_status.py --chrome-uuid "$OPENBROWSER_CHROME_UUID"

If readiness fails, read references/setup.md or references/troubleshooting.md.
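The first precondition can be approximated without any OpenBrowser-specific API by testing whether something accepts TCP connections on the documented address. The following is a minimal sketch only: `port_is_open` is a hypothetical helper, not part of the skill, and it says nothing about the extension, the LLM configuration, or the UUID. check_status.py remains the documented readiness check.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8765, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only proves that some process is listening,
    not that the listener is a healthy OpenBrowser server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open()` with no arguments probes the default address from the preconditions list. A False result suggests the server needs to be started before check_status.py is worth running.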

Standard Workflow

Run check_status.py.
If the server is down, start it from the repo root with uv run local-chrome-server serve. Launch it through the Bash tool with run_in_background: true so the server does not block the foreground and Claude Code reports natively if it exits.
If the extension, UUID, or API key is missing, pause and ask the user to complete the manual Chrome/UI steps from references/setup.md.
Submit the task with send_task.py in foreground mode. The full SSE stream lands directly in your conversation log — no truncation, no separate file to tail.
Read the live SSE output to monitor actions, observations, and the final assistant message in real time.
For very long tasks, run the same send_task.py invocation through the Bash tool with run_in_background: true rather than the --background flag. Claude Code will notify you on completion; never sleep-poll.
Summarize the final browser outcome, any failures, and the conversation ID when useful so a follow-up turn can reuse it.

Run Tasks

Foreground is the default and almost always the right choice in Claude Code, because the SSE stream becomes part of your conversation context without any extra plumbing:

python3 ~/.claude/skills/open-browser/scripts/send_task.py \
  "Open https://example.com and report the page title" \
  --chrome-uuid "$OPENBROWSER_CHROME_UUID"

What you will see in the log:

[conversation] <id> and [task] <text> once at the start
[system_prompt] suppressed (…) instead of a 12 KB prompt dump
[action] click element_id=A1H style lines as the agent acts
[observation:ok] … after each action
[message:assistant] <full text> for any assistant message — no trailing …, no character cap
[usage] model=… cost_rmb=… tokens=… at the end
[complete] Conversation completed

If you need to inspect the OpenBrowser system prompt for debugging, pass --show-system-prompt to disable suppression.
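If you want to post-process a captured stream, the bracketed tags listed above are easy to split off. The sketch below assumes every event starts with `[tag]` at the beginning of a line and that untagged lines continue the previous event (for example a multi-line assistant message). Both points are assumptions drawn from the sample lines, not a documented format, and `parse_stream` is a hypothetical helper.

```python
import re

# "[tag] rest of line"; the tag may contain a colon, as in "[observation:ok]".
_EVENT = re.compile(r"^\[([^\]\s]+)\]\s*(.*)$")


def parse_stream(lines):
    """Group log lines into (tag, text) events.

    Lines without a leading [tag] are appended to the previous event,
    which is an assumption about how multi-line messages are printed.
    """
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = _EVENT.match(line)
        if match:
            events.append((match.group(1), match.group(2)))
        elif events and line.strip():
            tag, text = events[-1]
            events[-1] = (tag, text + "\n" + line)
    return events


def last_assistant_message(events):
    """Return the text of the final [message:assistant] event, or None."""
    for tag, text in reversed(events):
        if tag == "message:assistant":
            return text
    return None
```

One known limitation: a continuation line that itself begins with a bracketed token would be treated as a new event.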

Long-running tasks

If you expect a task to run for several minutes and don't want to keep the foreground tool busy, prefer Claude Code's native background mode. Invoke the same send_task.py command via the Bash tool with run_in_background: true. Claude Code will notify you when it completes — do not sleep, poll, or tail -f the log.

The legacy --background --output /tmp/openbrowser.log flags still work as a fallback for shell-only environments, but they should not be your default in Claude Code.

Attaching images to a task

OpenBrowser uses multimodal LLMs, so you can send an image (screenshot, UI mockup, reference photo) alongside the text prompt. Use this when the task is "recreate this design", "why does my UI look wrong compared to this screenshot", or "find the element that matches this picture".

Pass --image PATH once per image. Images are read from disk, base64 encoded, and sent as data URIs — no upload endpoint or static server is required. Limit: 10 MB per image, up to 8 images per message.

python3 ~/.claude/skills/open-browser/scripts/send_task.py \
  "Open the local dashboard and tell me which section looks different from this screenshot." \
  --image /tmp/reference.png \
  --chrome-uuid "$OPENBROWSER_CHROME_UUID"
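To make the encoding step concrete, here is a rough sketch of turning a file on disk into a base64 data URI while enforcing the stated limits. It is not the code send_task.py uses. The helper names are hypothetical, the MIME type is guessed from the file extension, and "10 MB" is interpreted as 10 MiB, which is an assumption.

```python
import base64
import mimetypes
from pathlib import Path

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_IMAGES = 8                      # per-message limit stated in the text


def to_data_uri(path: str) -> str:
    """Read one image from disk and return it as a data: URI."""
    data = Path(path).read_bytes()
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} exceeds the per-image size limit")
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"{path} does not look like an image file")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def encode_images(paths):
    """Encode up to MAX_IMAGES files, rejecting longer lists up front."""
    if len(paths) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images per message")
    return [to_data_uri(p) for p in paths]
```

Base64 inflates the payload by roughly a third, so an image close to the size limit produces a noticeably larger request body.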

Typical use cases:

Visual regression check: screenshot the expected UI, send it with "compare this to the current page at http://localhost:3000 and list differences".
Reproducing a bug from a user report: drop in the user's screenshot and ask "navigate to the page shown here and confirm you see the same error".
Asset-matching: send a design mockup and ask the agent to pick the closest live element from the current page.

The conversation history saves an [image attached: name, NkB] marker (not the bytes) so replays don't balloon to megabytes.

Follow-up turns on the same browser session

To send a second instruction to the same conversation (so the agent keeps its prior screenshots and observations), reuse the conversation ID from the previous run:

python3 ~/.claude/skills/open-browser/scripts/send_task.py \
  "Now click the 'Sign in' button you just identified" \
  --chrome-uuid "$OPENBROWSER_CHROME_UUID" \
  --conversation-id 1b32b26a-1a7e-4b6c-9599-139fc6b9c89b

This is the right tool for: stepwise debugging of a web flow, breaking a long task into smaller verifiable chunks, or re-asking the agent to report a value it already saw.
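When the previous run's output has been saved as text, the conversation ID can be recovered from its `[conversation]` line. This is a small sketch assuming the ID is the first whitespace-delimited token after that tag; `find_conversation_id` is a hypothetical helper, not something the skill ships.

```python
import re
from typing import Optional

# Assumes the ID directly follows the tag on the same line.
_CONVERSATION = re.compile(r"^\[conversation\]\s+(\S+)", re.MULTILINE)


def find_conversation_id(log_text: str) -> Optional[str]:
    """Return the first conversation ID found in the log text, or None."""
    match = _CONVERSATION.search(log_text)
    return match.group(1) if match else None
```

The returned value is what you would pass to --conversation-id on the follow-up turn.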

Working Directory

The skill's scripts live at ~/.claude/skills/open-browser/ so they work from any project's current working directory. The OpenBrowser server itself must still be started from the repo root (uv run local-chrome-server serve in ~/git/OpenBrowser).

Use --cwd when the browser task should operate with context from another workspace:

python3 ~/.claude/skills/open-browser/scripts/send_task.py \
  "Open the local app and verify the login flow" \
  --cwd /absolute/path/to/project \
  --chrome-uuid "$OPENBROWSER_CHROME_UUID"
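If you drive send_task.py from another Python script, the options introduced so far can be assembled into an argument list. The sketch below uses only the flags shown in this document (--chrome-uuid, --image, --cwd, --conversation-id). The function name is hypothetical, and the assumption that the script accepts these flags in any order after the task text has not been verified.

```python
import os

SCRIPT = "~/.claude/skills/open-browser/scripts/send_task.py"


def build_send_task_argv(task, chrome_uuid, images=(), cwd=None, conversation_id=None):
    """Build an argv list for send_task.py from the documented flags.

    Hypothetical helper; flag order is an assumption.
    """
    argv = ["python3", os.path.expanduser(SCRIPT), task, "--chrome-uuid", chrome_uuid]
    for image in images:
        # --image is passed once per image, as described above.
        argv += ["--image", image]
    if cwd is not None:
        argv += ["--cwd", cwd]
    if conversation_id is not None:
        argv += ["--conversation-id", conversation_id]
    return argv
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting problems with task text that contains quotes or spaces.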

Verifying frontend changes you just made

A common Claude Code workflow on this repo: edit frontend/index.html, then ask the OpenBrowser agent to load the page and confirm the change visually. Tips that improve the success rate:

Be specific about the route to take and the element to interact with ("click the entry whose id starts with 38322447", not "click any recording").
Ask the agent to scroll the specific pane rather than the document, and to report what it sees afterwards. This is the only reliable way to verify nested-scroll bugs without DevTools.
For pure layout questions (which computed style applies, what the scroll height is), the agent cannot run JavaScript. Fall back to asking the user to check, or rebuild the page locally and re-test.
End the prompt with an explicit "Be concise — 3-5 sentences" so the final assistant message is short enough to read at a glance.

Failure Handling

If the task does not start or fails immediately:

Re-run check_status.py
Verify that the browser UUID is still valid
Inspect the live foreground stream first (it is in your conversation log — no need to open any file)
If you used background mode, inspect /tmp/openbrowser.log or your chosen log file
Read references/troubleshooting.md

If you need lower-level control or want to inspect conversations directly, read references/api_reference.md.

Proxy gotcha

If a global HTTP_PROXY / HTTPS_PROXY is set in the environment, prefix Bash invocations with NO_PROXY="127.0.0.1,localhost" so requests to 127.0.0.1:8765 bypass the proxy. The Python scripts use stdlib urllib and respect the same variables.
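When the scripts are launched from Python rather than from a shell, the same bypass can be applied by adjusting the child process environment. This is a minimal sketch with a hypothetical helper name. It keeps any existing NO_PROXY entries and sets both the upper-case and lower-case variable, since tools differ in which one they read.

```python
import os

LOCAL_HOSTS = ("127.0.0.1", "localhost")


def env_with_local_bypass(base_env=None):
    """Return a copy of the environment whose NO_PROXY covers the local server."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("NO_PROXY") or env.get("no_proxy") or ""
    entries = [item.strip() for item in existing.split(",") if item.strip()]
    for host in LOCAL_HOSTS:
        if host not in entries:
            entries.append(host)
    value = ",".join(entries)
    # Set both spellings so either convention is honored.
    env["NO_PROXY"] = value
    env["no_proxy"] = value
    return env
```

A typical call would be `subprocess.run(argv, env=env_with_local_bypass())`, which leaves HTTP_PROXY and HTTPS_PROXY untouched for any non-local traffic.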

References

references/setup.md: Read when OpenBrowser is not ready yet
references/troubleshooting.md: Read when connectivity, UUID, or task execution fails
references/api_reference.md: Read when scripting against the HTTP API or inspecting conversation state
