---
name: browser
description: Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.
---
Browser Automation
Browser automation that maintains page state across command executions. Write small, focused commands to accomplish tasks incrementally.
Choosing Your Approach
- Local/source-available sites: Read the source code first to write selectors directly
- Unknown page layouts: Use `snapshot` to discover elements, then `select-ref` to interact
- Visual debugging: Take a `screenshot` to see the current page state
Prerequisites
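Check that the server on port 9222 is running before sending any commands:
```bash
# curl exits non-zero (e.g. 7 for "connection refused") when nothing is listening on the port
curl -s -o /dev/null http://localhost:9222/ || echo "SERVER_NOT_RUNNING"
```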
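The same reachability check can be done from Python with only the standard library. This is a minimal sketch: `server_running` is a hypothetical helper, not part of client.py, and, like the curl command, it treats any HTTP response (even an error status) as proof that the server is up.

```python
import urllib.error
import urllib.request


def server_running(url: str = "http://localhost:9222/", timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url` (hypothetical helper)."""
    # Bypass any HTTP(S)_PROXY settings: this is a purely local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a 4xx/5xx status, so it is running.
        return True
    except OSError:
        # Connection refused, unreachable host, or timeout (URLError subclasses OSError).
        return False
```

Calling `server_running()` with no arguments checks the same URL as the curl command.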
Running Commands
All commands use client.py from the skill directory:
```bash
uv run skills/browser/client.py <command> [arguments]
```
⚠️ IMPORTANT: Always use uv run client.py, NOT uv run python client.py. The uv run command automatically handles Python and dependencies from pyproject.toml. Adding python breaks dependency resolution.
Workflow Loop
Follow this pattern for complex tasks:
- Run a command to perform one action
- Observe the output
- Evaluate - did it work? What's the current state?
- Decide - is the task complete or do we need another command?
- Repeat until task is done
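The loop above can be sketched in Python, assuming commands are run from the repository root via `subprocess`. `run_step` and `run_steps` are hypothetical helpers; the "decide" step is reduced to stopping at the first non-zero exit code, which assumes client.py reports failures that way (the source does not say). The runner is injectable so the logic can be exercised without a live browser.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Assumes commands are run from the repository root, as in the examples in this guide.
CLIENT = ["uv", "run", "skills/browser/client.py"]


def run_step(args: Sequence[str], runner: Callable = subprocess.run) -> Tuple[bool, str]:
    """Run one client.py command and return (succeeded, combined output)."""
    result = runner([*CLIENT, *args], capture_output=True, text=True)
    return result.returncode == 0, (result.stdout or "") + (result.stderr or "")


def run_steps(steps: Sequence[Sequence[str]], runner: Callable = subprocess.run) -> List[str]:
    """Run steps one at a time, stopping at the first failure so the page can be inspected."""
    outputs: List[str] = []
    for step in steps:
        ok, output = run_step(step, runner)  # perform one action
        outputs.append(output)               # observe the output
        if not ok:                           # evaluate: did it work?
            break                            # decide: stop and debug instead of continuing
    return outputs
```

In practice an agent decides the next command after reading each output, rather than following a fixed list; the fixed list just keeps the sketch short.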
No TypeScript in Browser Context
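Code passed to `page.evaluate()` runs in the browser, which executes plain JavaScript and does not understand TypeScript. Keep type annotations out of evaluated code; the same applies to JavaScript strings passed to `evaluate` from client.py or a Python script:
```javascript
// Correct: plain JavaScript
const text = await page.evaluate(() => {
  return document.body.innerText;
});

// Wrong: the browser cannot parse the `: HTMLElement` type annotation
const text = await page.evaluate(() => {
  const el: HTMLElement = document.body;
  return el.innerText;
});
```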
Waiting
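```bash
# Wait for the page to finish loading
uv run skills/browser/client.py wait-load main
# Wait for an element matching ".results" to appear
uv run skills/browser/client.py wait-selector main ".results"
# Wait for the URL to match the pattern "**/success"
uv run skills/browser/client.py wait-url main "**/success"
```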
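The idea behind the wait commands can be illustrated with a generic poll-until-true helper. This is a sketch of the concept only: client.py's actual waiting mechanism and timeouts are not described in the source, and `wait_until` with its `timeout`/`interval` parameters is hypothetical.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1) -> None:
    """Poll `condition` until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

Waiting on an explicit condition, as the commands above do, returns as soon as the page is ready instead of sleeping for a fixed, guessed duration.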
Scraping Data
For large datasets, intercept and replay API requests rather than scrolling DOM. See refs/scraping.md for the complete guide covering request capture, schema discovery, and paginated API replay.
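The replay idea in miniature: once a paginated JSON endpoint has been captured, request successive pages until one comes back empty, instead of scrolling the DOM. This is a hedged sketch; the `page` query parameter, the `items` key, and the `fetch_json` callable are hypothetical stand-ins for whatever the captured API actually uses.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode


def replay_paginated(
    base_url: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    max_pages: int = 100,
) -> List[Any]:
    """Collect `items` from successive pages of a captured endpoint (hypothetical schema)."""
    items: List[Any] = []
    for page in range(1, max_pages + 1):
        url = f"{base_url}?{urlencode({'page': page})}"
        batch = fetch_json(url).get("items", [])
        if not batch:  # an empty page is taken to mean there is no more data
            break
        items.extend(batch)
    return items
```

A real `fetch_json` might wrap `urllib.request.urlopen` and `json.load`, reusing headers captured from the original request; `max_pages` guards against endpoints that never return an empty page.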
Inspecting Page State
Screenshots
```bash
# Capture the visible viewport
uv run skills/browser/client.py screenshot main screenshot.png
# Capture the entire scrollable page
uv run skills/browser/client.py screenshot main full.png --full-page
```
ARIA Snapshot (Element Discovery)
Use `snapshot` to discover page elements. It returns a YAML-formatted accessibility tree such as:
```yaml
- banner:
  - link "Hacker News" [ref=e1]
- navigation:
  - link "new" [ref=e2]
- main:
  - heading "Products" [ref=e3] [level=1]
  - list:
    - listitem:
      - link "Article Title" [ref=e4]
      - button "Add to Cart" [ref=e5]
    - listitem:
      - link "Another Article" [ref=e6]
      - button "Add to Cart" [ref=e7] [nth=1]
- contentinfo:
  - textbox [ref=e8]:
    - /placeholder: "Search"
```
Interpreting refs:
- `[ref=eN]` - Element reference used for interaction
- `[nth=N]` - Index among duplicate elements with the same role and name (0-indexed; omitted for the first one)
- `[checked]`, `[disabled]`, `[expanded]` - Element states
- `[level=N]` - Heading level
- `/url:`, `/placeholder:` - Element properties
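To pick a ref programmatically, for example the second "Add to Cart" button, snapshot lines can be parsed by role, name, and `nth`. A minimal sketch, assuming the line format of the example tree (`- role "name" [ref=eN] [nth=N]`); `parse_refs` and `find_ref` are hypothetical helpers, not client.py commands.

```python
import re
from typing import Dict, List, Optional

# Matches lines like:  - button "Add to Cart" [ref=e7] [nth=1]
LINE = re.compile(r'-\s+(?P<role>[\w-]+)(?:\s+"(?P<name>[^"]*)")?(?P<attrs>(?:\s*\[[^\]]*\])*)')
# Matches one bracketed attribute, e.g. [ref=e7] or a bare flag like [checked]
ATTR = re.compile(r"\[(\w+)(?:=([^\]]*))?\]")


def parse_refs(snapshot: str) -> List[Dict]:
    """Return one dict per element line that carries a [ref=...] attribute."""
    elements = []
    for line in snapshot.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # property lines such as "- /placeholder:" do not match
        attrs = dict(ATTR.findall(m.group("attrs")))
        if "ref" not in attrs:
            continue  # structural nodes like "- main:" have no ref
        elements.append({
            "role": m.group("role"),
            "name": m.group("name") or "",
            "ref": attrs["ref"],
            "nth": int(attrs.get("nth") or 0),  # nth is omitted for the first duplicate
        })
    return elements


def find_ref(snapshot: str, role: str, name: str, nth: int = 0) -> Optional[str]:
    """Return the ref of the nth element with the given role and accessible name, if any."""
    for el in parse_refs(snapshot):
        if el["role"] == role and el["name"] == name and el["nth"] == nth:
            return el["ref"]
    return None
```

Against the example tree, `find_ref(tree, "button", "Add to Cart", nth=1)` selects `e7`, the ref used with `select-ref` for the second button.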
Interacting with refs:
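```bash
# Take a snapshot to get current refs
uv run skills/browser/client.py snapshot main
uv run skills/browser/client.py snapshot main -i
# Click the "new" link (e2) and the second "Add to Cart" button (e7) from the example tree
uv run skills/browser/client.py select-ref main e2 click
uv run skills/browser/client.py select-ref main e7 click
# Type into the search textbox (e8)
uv run skills/browser/client.py select-ref main e8 fill "search term"
```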
Error Recovery
Page state persists after failures. Debug with:
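```bash
# See what the page currently looks like
uv run skills/browser/client.py screenshot main debug.png
# Show information about the page
uv run skills/browser/client.py info main
# Print the text content of the page body
uv run skills/browser/client.py text main "body"
```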
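The same recovery step as a hedged Python sketch: run a command and, if it fails, capture the three debug views listed above. `run_with_debug` and the `debug.png` file name are illustrative, and the helper assumes client.py signals failure with a non-zero exit code, which the source does not state.

```python
import subprocess
from typing import Callable, List, Sequence

CLIENT = ["uv", "run", "skills/browser/client.py"]


def run_with_debug(args: Sequence[str], page: str = "main", runner: Callable = subprocess.run) -> List[str]:
    """Run one command; if it fails, also capture a screenshot, page info, and body text."""
    debug_commands = [
        ["screenshot", page, "debug.png"],
        ["info", page],
        ["text", page, "body"],
    ]
    result = runner([*CLIENT, *args], capture_output=True, text=True)
    outputs = [result.stdout + result.stderr]
    if result.returncode != 0:  # assumption: failures are reported via a non-zero exit code
        # Page state persists after failures, so these show the page as it was when the command failed.
        for debug in debug_commands:
            info = runner([*CLIENT, *debug], capture_output=True, text=True)
            outputs.append(info.stdout + info.stderr)
    return outputs
```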
Command Reference
Page Management
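```bash
uv run skills/browser/client.py list                       # List existing pages
uv run skills/browser/client.py create main                # Create a page named "main"
uv run skills/browser/client.py create main "https://..."  # Create a page and open a URL
uv run skills/browser/client.py goto main "https://..."    # Navigate an existing page
uv run skills/browser/client.py close main                 # Close the page
uv run skills/browser/client.py info main                  # Show information about the page
uv run skills/browser/client.py release main               # Hand control back to the user
```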
Element Interaction
```bash
uv run skills/browser/client.py click main "button.submit"                  # Click an element
uv run skills/browser/client.py fill main "input#email" "test@example.com"  # Fill an input field
uv run skills/browser/client.py hover main ".dropdown"                      # Hover over an element
uv run skills/browser/client.py keyboard main "Enter"                       # Press a key
uv run skills/browser/client.py text main "h1"                              # Print an element's text
```
JavaScript Execution
```bash
uv run skills/browser/client.py evaluate main "document.title"                             # Page title
uv run skills/browser/client.py evaluate main "document.querySelectorAll('.item').length"  # Count .item elements
```
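JavaScript expressions often mix single and double quotes, which makes shell quoting error-prone. One option, sketched here as an assumption rather than a documented client.py feature, is to build the argument list in Python and pass it to `subprocess.run` without a shell; `shlex.join` shows the equivalent, correctly quoted command line. `evaluate_argv` is a hypothetical helper.

```python
import shlex
from typing import List

CLIENT = ["uv", "run", "skills/browser/client.py"]


def evaluate_argv(page: str, expression: str) -> List[str]:
    """Build argv for `evaluate`; with no shell in between, quotes in the JS need no escaping."""
    return [*CLIENT, "evaluate", page, expression]


# A JS expression that mixes single and double quotes
expr = """Array.from(document.querySelectorAll('a[href*="docs"]')).map(a => a.href)"""
argv = evaluate_argv("main", expr)

# shlex.join gives a copy-pasteable, correctly quoted shell command;
# subprocess.run(argv, capture_output=True, text=True) would run it with no quoting at all.
print(shlex.join(argv))
```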
Python Script (Advanced)
For complex tasks requiring loops or `page.on()` event handlers, use a heredoc with `BrowserClient`:
```bash
cd skills/browser && uv run python <<'EOF'
from client import BrowserClient

client = BrowserClient()
page = client.get_playwright_page("main")
# Register event handlers before navigating so responses from the initial load are captured
page.on("response", lambda r: print(r.url))
page.goto("https://example.com")
page.click("button")
EOF
```
The page object is a standard Playwright Page.
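A handler passed to `page.on("response", ...)` can be any callable, so state can be collected in an object instead of printed. This stdlib-only sketch relies only on the `url` and `status` attributes of Playwright's `Response`; the `ResponseCollector` class and its substring filter are hypothetical. Register it before `page.goto` so responses triggered by the initial load are not missed.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseCollector:
    """Callable listener for page.on("response", ...) that records matching responses."""
    url_contains: str
    seen: list = field(default_factory=list)

    def __call__(self, response) -> None:
        # Playwright passes a Response object; only .url and .status are used here.
        if self.url_contains in response.url:
            self.seen.append((response.status, response.url))


# Usage inside the heredoc script (requires a live page):
#   collector = ResponseCollector("/api/")
#   page.on("response", collector)
#   page.goto("https://example.com")
#   print(collector.seen)
```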
Important: Release Page After All Operations
⚠️ CRITICAL: Call the `release` command when you finish ALL operations on a page!
When the browser skill operates on a page, the Max UI shows an "Agent Operating" indicator and locks user interaction. Call `release` once your entire operation sequence is complete:
```bash
uv run skills/browser/client.py create main "https://example.com"
uv run skills/browser/client.py wait-load main
uv run skills/browser/client.py snapshot main
uv run skills/browser/client.py click main "button.submit"
uv run skills/browser/client.py wait-load main
uv run skills/browser/client.py release main
```
Why this matters:
- Without `release`, the indicator stays for 30 seconds (timeout fallback)
- User interaction is blocked during operations
- Calling `release` immediately unlocks user control
When to call release:
- After completing a task (e.g., filled a form, extracted data)
- Before reporting results to the user
- NOT after each individual command - only when the entire operation sequence is done
For Python scripts:
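```python
# Run from skills/browser so that `client` is importable
from client import BrowserClient

client = BrowserClient()
try:
    page = client.get_playwright_page("main")
    # ... perform all operations on `page` here ...
finally:
    # Runs even if an operation fails, so the user always regains control
    client.release_page("main")
    client.disconnect()
```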
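The try/finally pattern can be wrapped in a context manager so release is never forgotten. A minimal sketch: `operating_on` is a hypothetical helper, and it calls only the `BrowserClient` methods used in the pattern above (`get_playwright_page`, `release_page`, `disconnect`).

```python
from contextlib import contextmanager


@contextmanager
def operating_on(client, page_name: str = "main"):
    """Yield the Playwright page; always release it and disconnect afterwards."""
    try:
        yield client.get_playwright_page(page_name)
    finally:
        client.release_page(page_name)  # unlock user control right away
        client.disconnect()


# Usage, run from skills/browser:
#   from client import BrowserClient
#   with operating_on(BrowserClient()) as page:
#       page.goto("https://example.com")
#       page.click("button.submit")
```

Wrap the whole operation sequence in one `with` block; opening a new block per command would release after each step, which the guidance above advises against.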