| name | agent-browser |
| description | Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages. |
Browser Automation with agent-browser
Quick start
agent-browser open <url>
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser close
Core workflow
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
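The four steps above can be sketched as one shell function. This is a minimal illustration, not part of agent-browser: the function name and the AB variable are hypothetical, and in practice the ref is chosen only after reading the first snapshot rather than passed in up front.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it);
# it lets the function be pointed at another binary or a stub.
AB="${AB:-agent-browser}"

# click_then_resnapshot URL REF
# Navigate, snapshot interactive elements, click REF, then snapshot again,
# since refs from the first snapshot may be stale after the page changes.
click_then_resnapshot() {
  url=$1
  ref=$2
  "$AB" open "$url" &&
  "$AB" snapshot -i &&
  "$AB" click "$ref" &&
  "$AB" snapshot -i
}
```

Usage: click_then_resnapshot https://example.com @e1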
Commands
Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
Snapshot (page analysis)
agent-browser snapshot
agent-browser snapshot -i
agent-browser snapshot -i -C
agent-browser snapshot -c
agent-browser snapshot -d 3
agent-browser snapshot -s "#main"
agent-browser snapshot -i -c -d 5
The -C flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
Interactions (use @refs from snapshot)
agent-browser click @e1
agent-browser dblclick @e1
agent-browser focus @e1
agent-browser fill @e2 "text"
agent-browser type @e2 "text"
agent-browser keyboard type "text"
agent-browser keyboard inserttext "text"
agent-browser press Enter
agent-browser press Control+a
agent-browser keydown Shift
agent-browser keyup Shift
agent-browser hover @e1
agent-browser check @e1
agent-browser uncheck @e1
agent-browser select @e1 "value"
agent-browser scroll down 500
agent-browser scrollintoview @e1
agent-browser drag @e1 @e2
agent-browser upload @e1 file.pdf
Get information
agent-browser get text @e1
agent-browser get html @e1
agent-browser get value @e1
agent-browser get attr @e1 href
agent-browser get title
agent-browser get url
agent-browser get count ".item"
agent-browser get box @e1
agent-browser get styles @e1
Check state
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
Screenshots & PDF
agent-browser screenshot
agent-browser screenshot path.png
agent-browser screenshot --full
agent-browser screenshot --annotate
agent-browser pdf output.pdf
Annotated screenshots overlay numbered labels [N] on interactive elements. Each label corresponds to ref @eN, so refs work for both visual and text workflows:
agent-browser screenshot --annotate ./page.png
agent-browser click @e2
Video recording
agent-browser record start ./demo.webm
agent-browser click @e1
agent-browser record stop
agent-browser record restart ./take2.webm
Recording creates a fresh context but preserves cookies/storage from your session.
Wait
agent-browser wait @e1
agent-browser wait 2000
agent-browser wait --text "Success"
agent-browser wait --url "**/dashboard"
agent-browser wait --load networkidle
agent-browser wait --fn "window.ready"
Load states: load, domcontentloaded, networkidle
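A wait is usually paired with the action that triggers the change. The sketch below clicks a ref, waits for some text, and then prints the current URL. The function name and the AB variable are hypothetical, and it assumes that wait exits non-zero when the condition is never met, which is not stated here.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# click_and_wait_for_text REF TEXT
# Click REF, block until TEXT appears on the page, then report the URL.
# Assumption: each command returns a non-zero status on failure, so the
# && chain stops at the first step that fails.
click_and_wait_for_text() {
  ref=$1
  text=$2
  "$AB" click "$ref" &&
  "$AB" wait --text "$text" &&
  "$AB" get url
}
```

Usage: click_and_wait_for_text @e3 "Success"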
Mouse control
agent-browser mouse move 100 200
agent-browser mouse down left
agent-browser mouse up left
agent-browser mouse wheel 100
Semantic locators (alternative to refs)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search..." fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" text
Actions: click, fill, type, hover, focus, check, uncheck, text
Options: --name <name> (filter role by accessible name), --exact (require exact text match)
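Semantic locators make it possible to script a page without taking a snapshot first. The sketch below fills a field by its label and clicks a button by its role and accessible name. The function name and the AB variable are hypothetical, and the label "Email" and button name "Submit" are assumptions about the target page.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# submit_email EMAIL
# Fill the field labelled "Email" and click the button named "Submit".
# Both names are placeholders for whatever the real page exposes.
submit_email() {
  email=$1
  "$AB" find label "Email" fill "$email" &&
  "$AB" find role button click --name "Submit"
}
```

Usage: submit_email user@test.com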
Browser settings
agent-browser set viewport 1920 1080
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Key":"v"}'
agent-browser set credentials user pass
agent-browser set media dark
Cookies & Storage
agent-browser cookies
agent-browser cookies set name value
agent-browser cookies clear
agent-browser storage local
agent-browser storage local key
agent-browser storage local set k v
agent-browser storage local clear
agent-browser storage session
Network
agent-browser network route <url>
agent-browser network route <url> --abort
agent-browser network route <url> --body '{}'
agent-browser network unroute [url]
agent-browser network requests
agent-browser network requests --filter api
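The routing commands can be combined to test a page against a canned API response. This sketch opens the page, installs the mock, reloads so the page fetches through it, lists matching requests, and removes the mock. The function name and the AB variable are hypothetical; the order (open, then route, then reload) is a conservative assumption, since whether a route can be registered before any page is open is not stated here.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# reload_with_mock PAGE_URL API_URL BODY
# Serve BODY for requests to API_URL, reload the page so it uses the mock,
# show requests filtered by "api" (the filter value from the list above),
# then remove the route again.
reload_with_mock() {
  page_url=$1
  api_url=$2
  body=$3
  "$AB" open "$page_url" &&
  "$AB" network route "$api_url" --body "$body" &&
  "$AB" reload &&
  "$AB" network requests --filter api &&
  "$AB" network unroute "$api_url"
}
```

Usage: reload_with_mock https://example.com <url> '{}'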
Tabs & Windows
agent-browser tab
agent-browser tab new [url]
agent-browser tab 2
agent-browser tab close
agent-browser window new
Frames
agent-browser frame "#iframe"
agent-browser frame main
Dialogs
agent-browser dialog accept [text]
agent-browser dialog dismiss
Diff (compare snapshots, screenshots, URLs)
agent-browser diff snapshot
agent-browser diff snapshot --baseline before.txt
agent-browser diff snapshot --selector "#main" --compact
agent-browser diff screenshot --baseline before.png
agent-browser diff screenshot --baseline b.png -o d.png
agent-browser diff screenshot --baseline b.png -t 0.2
agent-browser diff url https://v1.com https://v2.com
agent-browser diff url https://v1.com https://v2.com --screenshot
agent-browser diff url https://v1.com https://v2.com --selector "#main"
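A screenshot diff needs a baseline image to compare against. The sketch below records one on the first run and compares against it on later runs. The function name and the AB variable are hypothetical, and it assumes that -o names the file the diff image is written to, as the d.png example suggests.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# visual_check URL BASELINE_PNG DIFF_PNG
# First run: no baseline exists, so capture one and stop.
# Later runs: compare the current page against the stored baseline.
visual_check() {
  url=$1
  baseline=$2
  diff_out=$3
  "$AB" open "$url" || return 1
  if [ ! -f "$baseline" ]; then
    "$AB" screenshot "$baseline"
    return
  fi
  "$AB" diff screenshot --baseline "$baseline" -o "$diff_out"
}
```

Usage: visual_check https://example.com before.png diff.png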
JavaScript
agent-browser eval "document.title"
agent-browser eval -b "base64code"
agent-browser eval --stdin
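Passing JavaScript through the shell runs into quoting problems once the script contains quotes or newlines. The sketch below encodes the script before handing it to eval -b. The function name and the AB variable are hypothetical, and it assumes that -b expects the script as standard base64, as the "base64code" placeholder suggests.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# eval_b64 SCRIPT
# Base64-encode SCRIPT and pass it to eval -b, so the shell never has to
# interpret quotes or special characters inside the JavaScript.
eval_b64() {
  js=$1
  # tr removes the line wrapping that some base64 implementations add.
  encoded=$(printf '%s' "$js" | base64 | tr -d '\n')
  "$AB" eval -b "$encoded"
}
```

Usage: eval_b64 'document.querySelectorAll("a").length'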
Debug & Profiling
agent-browser console
agent-browser console --clear
agent-browser errors
agent-browser errors --clear
agent-browser highlight @e1
agent-browser trace start
agent-browser trace stop trace.zip
agent-browser profiler start
agent-browser profiler stop profile.json
State management
agent-browser state save auth.json
agent-browser state load auth.json
agent-browser state list
agent-browser state show <file>
agent-browser state rename <old> <new>
agent-browser state clear [name]
agent-browser state clear --all
agent-browser state clean --older-than <days>
Setup
agent-browser install
agent-browser install --with-deps
Global Options
| Option | Description |
|---|---|
| --session <name> | Isolated browser session (AGENT_BROWSER_SESSION env) |
| --session-name <name> | Auto-save/restore session state (AGENT_BROWSER_SESSION_NAME env) |
| --profile <path> | Persistent browser profile (AGENT_BROWSER_PROFILE env) |
| --state <path> | Load storage state from JSON file (AGENT_BROWSER_STATE env) |
| --headers <json> | HTTP headers scoped to URL's origin |
| --executable-path <path> | Custom browser binary (AGENT_BROWSER_EXECUTABLE_PATH env) |
| --extension <path> | Load browser extension (repeatable; AGENT_BROWSER_EXTENSIONS env) |
| --args <args> | Browser launch args (AGENT_BROWSER_ARGS env) |
| --user-agent <ua> | Custom User-Agent (AGENT_BROWSER_USER_AGENT env) |
| --proxy <url> | Proxy server (AGENT_BROWSER_PROXY env) |
| --proxy-bypass <hosts> | Hosts to bypass proxy (AGENT_BROWSER_PROXY_BYPASS env) |
| --ignore-https-errors | Ignore HTTPS certificate errors |
| --allow-file-access | Allow file:// URLs to access local files |
| -p, --provider <name> | Cloud browser provider (AGENT_BROWSER_PROVIDER env) |
| --device <name> | iOS device name (AGENT_BROWSER_IOS_DEVICE env) |
| --json | Machine-readable JSON output |
| --full, -f | Full page screenshot |
| --annotate | Annotated screenshot with numbered labels (AGENT_BROWSER_ANNOTATE env) |
| --headed | Show browser window (AGENT_BROWSER_HEADED env) |
| --cdp <port\|wss://url> | Connect via Chrome DevTools Protocol |
| --auto-connect | Auto-discover running Chrome (AGENT_BROWSER_AUTO_CONNECT env) |
| --color-scheme <scheme> | Color scheme: dark, light, no-preference (AGENT_BROWSER_COLOR_SCHEME env) |
| --download-path <path> | Default download directory (AGENT_BROWSER_DOWNLOAD_PATH env) |
| --native | [Experimental] Use native Rust daemon (AGENT_BROWSER_NATIVE env) |
| --config <path> | Custom config file (AGENT_BROWSER_CONFIG env) |
| --debug | Debug output |
Security options
| Option | Description |
|---|---|
| --content-boundaries | Wrap page output in boundary markers (AGENT_BROWSER_CONTENT_BOUNDARIES env) |
| --max-output <chars> | Truncate page output to N characters (AGENT_BROWSER_MAX_OUTPUT env) |
| --allowed-domains <list> | Comma-separated allowed domain patterns (AGENT_BROWSER_ALLOWED_DOMAINS env) |
| --action-policy <path> | Path to action policy JSON file (AGENT_BROWSER_ACTION_POLICY env) |
| --confirm-actions <list> | Action categories requiring confirmation (AGENT_BROWSER_CONFIRM_ACTIONS env) |
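When page content is fed to an agent, several of these options are typically combined. The sketch below restricts navigation to a domain list and bounds the snapshot output. The function name and the AB variable are hypothetical; the flags are repeated on both commands because it is not stated here whether they apply per invocation or only when the browser starts.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# guarded_snapshot URL DOMAINS MAX_CHARS
# Open URL with navigation limited to DOMAINS (comma-separated), then take
# an interactive snapshot wrapped in boundary markers and truncated to
# MAX_CHARS characters.
guarded_snapshot() {
  url=$1
  domains=$2
  max_chars=$3
  "$AB" --allowed-domains "$domains" --content-boundaries \
        --max-output "$max_chars" open "$url" &&
  "$AB" --allowed-domains "$domains" --content-boundaries \
        --max-output "$max_chars" snapshot -i
}
```

Usage: guarded_snapshot https://example.com example.com 5000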
Configuration file
Create agent-browser.json for persistent defaults, so flags do not have to be repeated on every command.
Locations (lowest to highest priority):
- ~/.agent-browser/config.json: user-level defaults
- ./agent-browser.json: project-level overrides
- AGENT_BROWSER_* environment variables
- CLI flags, which override everything
Example agent-browser.json:
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data",
"native": true
}
Example: Form submission
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
Example: Authentication with saved state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
Later sessions can load the saved state instead of logging in again:
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
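The save and load steps above can be folded into one function that logs in only when no saved state exists. This is a sketch under assumptions: the function name, the AB variable, and the APP_USER and APP_PASS variables are hypothetical, and the refs @e1 to @e3 stand in for whatever the login page's snapshot actually reports.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# open_authenticated STATE_FILE LOGIN_URL TARGET_URL
# Reuse STATE_FILE when it exists; otherwise log in once and save it.
# APP_USER and APP_PASS are hypothetical variables set by the caller.
open_authenticated() {
  state_file=$1
  login_url=$2
  target_url=$3
  if [ -f "$state_file" ]; then
    "$AB" state load "$state_file" || return 1
  else
    "$AB" open "$login_url" &&
    "$AB" snapshot -i &&
    "$AB" fill @e1 "$APP_USER" &&
    "$AB" fill @e2 "$APP_PASS" &&
    "$AB" click @e3 &&
    "$AB" wait --url "**/dashboard" &&
    "$AB" state save "$state_file" || return 1
  fi
  "$AB" open "$target_url"
}
```

Usage: open_authenticated auth.json https://app.example.com/login https://app.example.com/dashboard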
Header-based Auth (Skip login flows)
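Headers passed with --headers are scoped to the origin of the URL being opened, so other domains do not receive them:
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'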
agent-browser open other-site.com
agent-browser set headers '{"X-Custom-Header": "value"}'
Authentication Vault
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
Sessions & Persistent Profiles
Sessions (parallel browsers)
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
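Because each session is isolated, two browsers can be driven at the same time from one script. The sketch below starts both opens as background jobs and lists the sessions once both have finished. The function name and the AB variable are hypothetical; the session names are the ones used above.

```shell
#!/bin/sh
# AB is a variable local to this sketch (agent-browser does not read it).
AB="${AB:-agent-browser}"

# open_in_parallel URL_A URL_B
# Open each URL in its own session concurrently, wait for both jobs,
# and list the sessions only if both opens succeeded.
open_in_parallel() {
  "$AB" --session test1 open "$1" &
  pid1=$!
  "$AB" --session test2 open "$2" &
  pid2=$!
  status=0
  wait "$pid1" || status=1
  wait "$pid2" || status=1
  [ "$status" -eq 0 ] || return 1
  "$AB" session list
}
```

Usage: open_in_parallel site-a.com site-b.com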
Session persistence (auto-save/restore)
agent-browser --session-name twitter open twitter.com
Persistent Profiles
A persistent profile keeps cookies, localStorage, IndexedDB, service workers, cache, and login sessions across browser restarts.
agent-browser --profile ~/.myapp-profile open myapp.com
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
JSON output (for parsing)
Add --json for machine-readable output:
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Local files
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
CDP Mode
agent-browser connect 9222
agent-browser --cdp 9222 snapshot
agent-browser --cdp "wss://browser-service.com/cdp?token=..." snapshot
agent-browser --auto-connect snapshot
Cloud providers
BROWSERBASE_API_KEY="key" BROWSERBASE_PROJECT_ID="id" agent-browser -p browserbase open example.com
BROWSER_USE_API_KEY="key" agent-browser -p browseruse open example.com
KERNEL_API_KEY="key" agent-browser -p kernel open example.com
iOS Simulator
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios swipe up
agent-browser -p ios close
Native Mode (Experimental)
Native mode runs a pure Rust daemon that speaks CDP directly, with no Node.js/Playwright required:
agent-browser --native open example.com
Install: bun add -g agent-browser && agent-browser install
Run agent-browser --help for all commands.
Repo: https://github.com/vercel-labs/agent-browser