| name | browser-use |
| description | Browser automation CLI for AI agents using Chrome DevTools Protocol. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. Also use for Browser Use Cloud API operations. |
| allowed-tools | Bash(browser:*) |
Browser Automation with browser
Core Workflow
Every browser automation follows this pattern:
- Start:
browser start (launch Chrome) or browser connect <url> (attach to existing)
- Navigate:
browser open <url>
- Inspect:
browser ax-tree or browser screenshot to understand the page
- Interact: Use CSS selectors or x,y coordinates to click, type, etc.
- Verify:
browser exists <sel>, browser assert <expr>, or browser screenshot
browser start
browser open https://example.com/form
browser ax-tree --depth 3
browser click 'input[name="email"]'
browser input 'input[name="email"]' 'user@example.com'
browser input 'input[name="password"]' 'secret123'
browser click 'button[type="submit"]'
browser wait-load
browser get title
Command Chaining
Commands can be chained with && in a single shell invocation. The browser persists between commands (Chrome runs as a separate process), so chaining is safe and efficient.
browser open https://example.com && browser wait-load && browser get title
browser input '#email' 'user@example.com' && browser input '#pass' 'secret' && browser click '#submit'
browser open https://example.com && browser wait-load && browser screenshot page.png
When to chain: Use && when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., ax-tree to discover elements, then interact).
Essential Commands
browser start
browser start --show
browser start -k
browser stop
browser connect <host:port>
browser connect <https://url>
browser connect <index>
browser status
browser open <url>
browser back
browser forward
browser reload
browser reload --hard
browser get url
browser get title
browser get html
browser get html <selector>
browser get text <selector>
browser get attr <selector> <name>
browser get value <selector>
browser get box <selector>
browser get styles <sel> [prop...]
browser click <selector>
browser dblclick <selector>
browser rightclick <selector>
browser input <selector> <text>
browser type <selector> <text>
browser press <key>
browser clear <selector>
browser select <selector> <value>
browser submit <selector>
browser hover <selector>
browser focus <selector>
browser check <selector>
browser uncheck <selector>
browser scrollintoview <selector>
browser click <x> <y>
browser dblclick <x> <y>
browser rightclick <x> <y>
browser input <x> <y> <text>
browser type <x> <y> <text>
browser hover <x> <y>
browser scroll <x> <y> <delta>
browser drag <x1> <y1> <x2> <y2>
browser element-at <x> <y>
browser keyboard type <text>
browser keyboard inserttext <text>
browser file <selector> <path|->
browser download <selector> [file]
browser download <selector> -
browser wait <selector>
browser wait-load
browser wait-stable
browser wait-idle
browser sleep <seconds>
browser screenshot
browser screenshot file.png
browser screenshot ./file.png
browser screenshot --full file.png
browser pdf page.pdf
browser tabs
browser switch <index>
browser new-tab [url]
browser close-tab [index]
browser exists <selector>
browser count <selector>
browser visible <selector>
browser assert <js-expr>
browser assert <js-expr> <expected>
browser ax-tree
browser ax-tree --depth 3
browser ax-tree --json
browser ax-tree --with-coords
browser ax-tree --selectors
browser ax-tree --no-refs
browser ax-find --name "Submit"
browser ax-find --role button
browser ax-node <selector>
browser ax-node <selector> --json
browser cloud login <api-key>
browser cloud logout
browser cloud GET /browsers
browser cloud POST /browsers '{}'
browser cloud PATCH /browsers/<id> '{"action":"stop"}'
browser cloud poll <task-id>
browser cloud --help
Common Patterns
Form Submission
browser start
browser open https://example.com/signup
browser ax-tree --depth 3
browser input 'input[name="name"]' 'Jane Doe'
browser input 'input[name="email"]' 'jane@example.com'
browser select '#country' 'US'
browser click 'button[type="submit"]'
browser wait-load
browser get title
Coordinate-Based Interaction
When CSS selectors are unreliable (e.g., canvas apps, complex SPAs), use coordinates. Take a screenshot first to identify positions:
browser screenshot current.png
browser click 450 320
browser element-at 450 320
browser input 300 200 'hello'
browser scroll 640 360 500
browser drag 100 100 400 400
Data Extraction
browser open https://example.com/products
browser get text 'h1'
browser get attr 'a.product' 'href'
browser eval 'document.querySelectorAll(".price").length'
browser eval 'JSON.stringify(Array.from(document.querySelectorAll(".item")).map(e => e.textContent))'
Ref-Based Interaction (Recommended)
Use ax-tree to inspect the page — interactive elements get short refs you can use directly in commands:
browser open https://example.com
browser ax-tree --depth 4
browser click i2
browser get text i3
Refs are assigned to interactive elements (links, buttons, inputs, etc.) and named content elements (headings, list items). They're refreshed every time ax-tree runs.
Accessibility-Driven Interaction
Use the accessibility tree to understand page structure without relying on implementation details:
browser ax-find --role link
browser ax-find --name "Submit"
browser ax-node 'button#save'
Tab Management
browser new-tab https://site-a.com
browser new-tab https://site-b.com
browser tabs
browser switch 1
browser get title
browser close-tab 0
Assertions and Testing
browser exists 'h1'
browser visible '.modal'
browser count '.list-item'
browser assert 'document.title === "Expected Title"'
browser assert 'document.title' 'Expected Title'
browser assert 'document.querySelectorAll(".item").length > 0'
browser assert 'location.pathname' '/dashboard' -m 'should be on dashboard'
Cloud Browser Automation
browser cloud login "$BROWSER_USE_API_KEY"
browser cloud POST /browsers '{"headless": true}'
browser connect https://<uuid>.cdp0.browser-use.com
browser open https://example.com
browser screenshot page.png
browser get title
browser stop
browser cloud POST /tasks '{"url":"https://example.com","task":"Extract the main heading"}'
browser cloud poll <task-id>
File Upload and Download
browser file 'input[type="file"]' ./document.pdf
cat image.png | browser file 'input[type="file"]' -
browser download 'a.download-link' ./output.pdf
browser download 'img.logo' -
Multiple Browsers
browser start
browser open https://site-a.com
browser status
browser connect localhost:9333
browser status
browser connect 0
browser connect 1
Exit Codes
| Code | Meaning |
|---|
| 0 | Success (also: exists/visible/assert passed) |
| 1 | Check failed (exists/visible/assert returned false, ax-find no matches) |
| 2 | Error (bad arguments, no browser, timeout, network failure) |
JavaScript Evaluation
The eval command auto-wraps expressions in () => { return (expr); }, so you can write concise expressions:
browser eval 'document.title'
browser eval '2 + 2'
browser eval 'document.querySelectorAll("a").length'
browser eval 'location.href'
browser eval 'JSON.stringify({url: location.href, title: document.title})'
Output formatting: strings are printed unquoted, numbers/booleans are raw, objects/arrays are pretty-printed as JSON.
Environment Variables
| Variable | Description | Default |
|---|
BROWSER_HOME | State directory (also controls output dir: $BROWSER_HOME/tmp/) | ~/.browser |
BROWSER_TIMEOUT | Command timeout in seconds | 30 |
BROWSER_CHROME_BIN | Chrome binary path | auto-detect |
BROWSER_USE_API_KEY | Cloud API key | none |
Persistent State
The browser runs as a separate process that survives CLI exit. State is stored in ~/.browser/state.json:
- Browser registry: Multiple browsers can be tracked, one is active
- Active page: Which tab is currently targeted
- Data dir: Chrome user data directory for local browsers
- Output dir: Screenshots, PDFs, and downloads default to
~/.browser/tmp/
Always browser stop when done to clean up. Use browser status to check what's running.