| name | agent-browser |
| description | Browser automation for AI agents (Linux/macOS/Windows). Use when the user needs to interact with websites, navigate pages, fill forms, click buttons, take screenshots, extract data, test web apps, or automate browser tasks. Triggers include "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data", "test this web app", "login to a site", or any task requiring programmatic web interaction. |
| allowed-tools | ["Bash"] |
| writes-to | .artifacts/browser/ |
| hard-guards | ["Always re-snapshot after navigation or DOM changes","Close browser session when done","Use content boundaries for untrusted pages"] |
Browser Automation with agent-browser
The agent-browser CLI automates Chrome/Chromium via CDP (Chrome DevTools Protocol). Install via npm i -g agent-browser, brew install agent-browser, or cargo install agent-browser.
Installation
Linux (most common)
npm install -g agent-browser      # via npm
cargo install agent-browser       # or via Cargo (pick one)
agent-browser install             # download Chromium
macOS
npm install -g agent-browser      # via npm
brew install agent-browser        # or via Homebrew (pick one)
agent-browser install             # download Chromium
Windows (WSL2 or native)
npm install -g agent-browser
agent-browser install
Verify Installation
agent-browser --version
agent-browser install
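The check above can be wrapped in a small preflight function so that scripts fail early with a readable message. This is a minimal sketch; the function name require_agent_browser is hypothetical and not part of the CLI.

```shell
# Hypothetical helper (not part of agent-browser): fail early if the CLI
# is not on PATH, otherwise print its version.
require_agent_browser() {
  if ! command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser not found on PATH; see the Installation section" >&2
    return 1
  fi
  agent-browser --version
}
```

Call require_agent_browser at the top of a script and stop if it returns a non-zero status.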
Dependencies (Linux)
If Chromium fails to launch, you may need system libraries:
# Debian/Ubuntu
sudo apt install libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2
# Fedora/RHEL
sudo dnf install nss nspr cups-libs libXcomposite libXdamage libXrandr libXScrnSaver alsa-lib
Quick Start
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser screenshot output.png
agent-browser close
Core Workflow
Every browser automation follows this pattern:
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns element refs such as @e1, @e2)
- Interact: use refs to click, fill, or select
- Re-snapshot: after navigation or DOM changes, take a new snapshot to get fresh refs
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
Command Chaining
Chain commands with && when you don't need to read the intermediate output. The chain stops at the first command that fails:
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
agent-browser fill @e1 "text" && agent-browser click @e2
Essential Commands
Navigation
agent-browser open <url>
agent-browser close
agent-browser back
agent-browser forward
Snapshot
agent-browser snapshot -i
agent-browser snapshot -s "#selector"
agent-browser snapshot --json
agent-browser snapshot --json > out.json
Interaction
agent-browser click @e1
agent-browser click @e1 --new-tab
agent-browser fill @e2 "text"
agent-browser type @e2 "text"
agent-browser select @e3 "option"
agent-browser check @e4
agent-browser press Enter
agent-browser keyboard type "text"
agent-browser scroll down 500
Get Information
agent-browser get text @e1
agent-browser get url
agent-browser get title
agent-browser get text body > page.txt
Wait
agent-browser wait @e1
agent-browser wait --load networkidle
agent-browser wait --url "**/page"
agent-browser wait 2000
agent-browser wait --text "Welcome"
Capture
agent-browser screenshot
agent-browser screenshot page.png
agent-browser screenshot --full
agent-browser screenshot --annotate
agent-browser pdf output.pdf
Network
agent-browser network requests
agent-browser network requests --type xhr,fetch
agent-browser network route "**/api/*" --abort
agent-browser network har start
agent-browser network har stop ./capture.har
Device & Viewport
agent-browser set viewport 1920 1080
agent-browser set viewport 1920 1080 2
agent-browser set device "iPhone 14"
agent-browser set media dark
State Persistence
agent-browser state save ./auth.json
agent-browser state load ./auth.json
agent-browser --session myapp open ...
Authentication Patterns
Option 1: Import from Running Browser
agent-browser --auto-connect state save ./auth.json
agent-browser --state ./auth.json open https://app.example.com
Option 2: Auth Vault (Recommended)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com --username user --password-stdin
agent-browser auth login myapp
agent-browser auth list
Option 3: Session Persistence
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close
agent-browser --session-name myapp open https://app.example.com/dashboard
Common Patterns
Form Submission
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
Data Extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text body > page.txt
agent-browser snapshot --json > data.json
Visual Verification (Diff)
agent-browser snapshot -i
agent-browser click @e2
agent-browser diff snapshot
agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
Parallel Sessions
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
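When there are many sites, the per-session commands above can be launched from a loop. The sketch below is illustrative only: open_sessions is a hypothetical helper, the name=url argument format is an assumption, and it assumes that concurrent invocations with different --session names do not interfere with each other.

```shell
# Hypothetical helper: open each URL in its own named session, in parallel.
# Arguments are name=url pairs, e.g. site1=https://site-a.com
open_sessions() {
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first '='
    url=${pair#*=}     # text after the first '='
    agent-browser --session "$name" open "$url" &
  done
  wait                 # block until every background open has finished
}
```

For example, open_sessions site1=https://site-a.com site2=https://site-b.com, followed by a snapshot -i in each session as shown above.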
Connect to Existing Chrome
agent-browser --auto-connect open https://example.com
agent-browser --cdp 9222 snapshot
Security
Content Boundaries (Recommended)
Wrap page-derived output in boundary markers so that untrusted page content can be told apart from the tool's own output:
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
agent-browser snapshot
Domain Allowlist
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
Output Limits
export AGENT_BROWSER_MAX_OUTPUT=50000
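The three settings above can be applied together before visiting an untrusted site. This is a sketch: harden_for is a hypothetical function name, and the 50000 limit is simply the value used in the example above.

```shell
# Hypothetical helper: enable all three guards for one domain and its subdomains.
harden_for() {
  domain=$1
  export AGENT_BROWSER_CONTENT_BOUNDARIES=1
  export AGENT_BROWSER_ALLOWED_DOMAINS="$domain,*.$domain"
  export AGENT_BROWSER_MAX_OUTPUT=50000
}
```

Calling harden_for example.com reproduces the exports shown above.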
Batch Execution
Run several commands in a single invocation by piping a JSON array of argument lists to batch:
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
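The JSON array can also be generated instead of written by hand. The sketch below builds an open-then-screenshot pair for each URL. The function name batch_screenshots_json and the shot-N.png file names are hypothetical, and no JSON escaping is done, so it assumes the URLs contain no double quotes or backslashes.

```shell
# Hypothetical helper: print a batch JSON array that opens each URL and
# takes a screenshot of it. No JSON escaping is performed on the URLs.
batch_screenshots_json() {
  sep=''
  n=0
  printf '['
  for url in "$@"; do
    n=$((n + 1))
    printf '%s\n  ["open", "%s"],\n  ["screenshot", "shot-%d.png"]' "$sep" "$url" "$n"
    sep=','
  done
  printf '\n]\n'
}
```

The output has the same shape as the hand-written array above and can be piped the same way: batch_screenshots_json https://example.com | agent-browser batch --json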
Advanced: JavaScript Evaluation
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href))
EVALEOF
iOS Simulator (macOS)
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios screenshot mobile.png
Requirements: macOS with Xcode and Appium (npm install -g appium && appium driver install xcuitest)
Important: Ref Lifecycle
Refs (@e1, @e2) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading
agent-browser click @e5
agent-browser snapshot -i
agent-browser click @e1
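One way to make the re-snapshot hard to forget is to pair every page-changing action with a fresh snapshot. This is a minimal sketch; act_and_resnapshot is a hypothetical wrapper, and waiting for networkidle before the snapshot is an assumption that suits most pages but not all.

```shell
# Hypothetical wrapper: run one agent-browser command, wait for the page
# to settle, then print a fresh interactive snapshot with new refs.
act_and_resnapshot() {
  agent-browser "$@" &&
    agent-browser wait --load networkidle &&
    agent-browser snapshot -i
}
```

For example, act_and_resnapshot click @e5 performs the click and prints the new refs. The same wrapper works for navigation: act_and_resnapshot open https://example.com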
Session Cleanup
Always close when done:
agent-browser close
For ephemeral environments, auto-shutdown after inactivity:
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
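In scripts, the close call is easy to skip when an earlier step fails. The sketch below runs a command and then closes the browser whether or not the command succeeded, passing the command's exit status through. The name with_browser_cleanup is hypothetical.

```shell
# Hypothetical wrapper: run a command, always close the browser afterwards,
# and return the command's own exit status.
with_browser_cleanup() {
  status=0
  "$@" || status=$?
  agent-browser close
  return "$status"
}
```

For example, with_browser_cleanup ./my-automation.sh, where my-automation.sh stands for any script of agent-browser commands. This does not cover the script being killed by a signal, which is where the idle timeout above still helps.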
Reference Commands
agent-browser console
agent-browser errors
agent-browser inspect
agent-browser record start demo.webm
agent-browser record stop
Output Artifacts
Save screenshots and snapshots to .artifacts/browser/ for organized output:
mkdir -p .artifacts/browser/screenshots
agent-browser screenshot .artifacts/browser/screenshots/page.png
agent-browser snapshot --json > .artifacts/browser/snapshot.json
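To keep repeated runs from overwriting each other, file names can carry a timestamp. This is a sketch: artifact_path is a hypothetical helper, and the name-timestamp.ext naming scheme is an assumption, not a convention of the tool.

```shell
# Hypothetical helper: create .artifacts/browser/<kind>/ if needed and
# print a timestamped file path inside it.
artifact_path() {
  kind=$1
  name=$2
  ext=$3
  dir=".artifacts/browser/$kind"
  mkdir -p "$dir"
  printf '%s/%s-%s.%s\n' "$dir" "$name" "$(date +%Y%m%dT%H%M%S)" "$ext"
}
```

For example: agent-browser screenshot "$(artifact_path screenshots page png)"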