| Field | Value |
| --- | --- |
| name | agent-browser |
| description | Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", or any task requiring web interaction. |
Browser Automation with agent-browser
agent-browser is a CLI tool for browser automation. It runs a persistent
browser daemon and provides commands to navigate, snapshot, and interact
with web pages.
Prerequisites
```shell
npm install -g agent-browser
agent-browser install   # download Chromium
```
Core Workflow
Every browser automation follows this pattern:
- Navigate: `agent-browser open <url>`
- Snapshot: `agent-browser snapshot -i` (returns element refs such as `@e1`, `@e2`)
- Interact: use the refs to click, fill, or select
- Re-snapshot: after navigation or DOM changes, take a new snapshot to get fresh refs
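For example, filling and submitting a login form:
```shell
agent-browser open https://example.com/form
agent-browser snapshot -i
# Illustrative refs: @e1 = email field, @e2 = password field, @e3 = submit button
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i   # the page changed, so get fresh refs
```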
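Picking refs by eye works interactively; a script can instead look a ref up by role and accessible name in the snapshot text. The sketch below assumes snapshot lines shaped like `- button "Submit" [ref=e3]`, which should be checked against the output of your installed version. `find_ref` is a hypothetical helper, not part of agent-browser.

```shell
#!/bin/sh
# find_ref ROLE NAME
# Hypothetical helper (not part of agent-browser): reads `snapshot -i` text on
# stdin and prints the first ref whose role and quoted name match, as "@eN".
# ASSUMPTION: snapshot lines look like:  - textbox "Email" [ref=e2]
find_ref() {
  awk -v role="$1" -v name="$2" '
    $2 == role && index($0, "\"" name "\"") && match($0, /\[ref=e[0-9]+]/) {
      # Skip the 5 characters of "[ref=" and drop the closing "]".
      print "@" substr($0, RSTART + 5, RLENGTH - 6)
      exit
    }'
}

# Example:
#   ref=$(agent-browser snapshot -i | find_ref textbox "Email")
#   agent-browser fill "$ref" "user@example.com"
```

The name is matched with its surrounding quotes, so looking up `"Submit"` does not also match a button named `"Submit order"`.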
Essential Commands
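```shell
# Navigation
agent-browser open <url>               # navigate to a URL
agent-browser close                    # close the browser

# Snapshot
agent-browser snapshot -i              # interactive elements with refs
agent-browser snapshot -i -C           # also include cursor-interactive elements (e.g., clickable divs)

# Interaction (use refs from the latest snapshot)
agent-browser click @e1
agent-browser fill @e2 "text"          # clear the field, then enter text
agent-browser type @e2 "text"          # type without clearing
agent-browser select @e1 "option"      # choose a dropdown option
agent-browser check @e1                # tick a checkbox
agent-browser press Enter              # press a key
agent-browser scroll down 500          # scroll down 500 pixels

# Get information
agent-browser get text @e1
agent-browser get url
agent-browser get title

# Wait
agent-browser wait @e1                 # until the element appears
agent-browser wait --load networkidle  # until network activity settles
agent-browser wait --url "**/page"     # until the URL matches the glob pattern
agent-browser wait 2000                # fixed delay in milliseconds

# Capture
agent-browser screenshot               # capture the visible viewport
agent-browser screenshot --full        # capture the full page
agent-browser pdf output.pdf           # save the page as a PDF
```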
Common Patterns
Form Submission
```shell
agent-browser open https://example.com/signup
agent-browser snapshot -i
# Illustrative refs; use the ones your snapshot returns
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
```
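When a form has many text fields, the fill steps can be driven from ref/value pairs. This is a sketch: `fill_fields` and the `AB` variable are hypothetical names. `AB` defaults to `agent-browser` and can be set to `echo` to print the commands instead of running them. The refs must still come from a fresh snapshot.

```shell
#!/bin/sh
# AB: hypothetical hook naming the CLI to run; set AB=echo for a dry run.
: "${AB:=agent-browser}"

# fill_fields REF VALUE [REF VALUE ...]
# Fills each ref with its value, stopping at the first failed command.
# Returns non-zero if a ref is left without a value.
fill_fields() {
  while [ "$#" -ge 2 ]; do
    "$AB" fill "$1" "$2" || return 1
    shift 2
  done
  [ "$#" -eq 0 ]
}

# Example, using refs from a snapshot of the signup form:
#   fill_fields @e1 "Jane Doe" @e2 "jane@example.com"
#   agent-browser select @e3 "California"
```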
Data Extraction
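```shell
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5               # text of one element
agent-browser get text body > page.txt   # all page text, saved to a file
agent-browser snapshot -i --json         # snapshot as JSON for programmatic parsing
```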
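To collect several values at once, a loop over refs can write one tab-separated line per element. `extract_texts` and the `AB` hook are hypothetical names for this sketch; `AB` defaults to `agent-browser` and can be set to `echo` for a dry run.

```shell
#!/bin/sh
# AB: hypothetical hook naming the CLI to run; set AB=echo for a dry run.
: "${AB:=agent-browser}"

# extract_texts REF...
# Hypothetical helper: prints "REF<TAB>text" for each ref, one line per ref.
# Text containing newlines or tabs will break this simple line format.
extract_texts() {
  for ref in "$@"; do
    text=$("$AB" get text "$ref") || return 1
    printf '%s\t%s\n' "$ref" "$text"
  done
}

# Example: extract_texts @e5 @e6 > products.tsv
```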
Authentication with State Persistence
```shell
# Log in once and save the authenticated state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Later: restore the saved state and go straight to an authenticated page
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
```
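A script can reuse the saved state on later runs and log in only when no state file exists yet. In this sketch `with_saved_state`, `STATE_FILE`, and the `AB` hook are hypothetical names, and the login steps are passed in as a shell function because their refs depend on the actual page. The state file holds session data such as cookies, so keep it out of version control.

```shell
#!/bin/sh
# AB: hypothetical hook naming the CLI to run; set AB=echo for a dry run.
: "${AB:=agent-browser}"
# STATE_FILE: hypothetical setting for where auth state is kept.
: "${STATE_FILE:=auth.json}"

# with_saved_state LOGIN_FUNCTION
# Loads STATE_FILE if it exists; otherwise runs LOGIN_FUNCTION and, only if it
# succeeds, saves the resulting browser state for next time.
with_saved_state() {
  if [ -f "$STATE_FILE" ]; then
    "$AB" state load "$STATE_FILE"
  else
    "$1" && "$AB" state save "$STATE_FILE"
  fi
}

# Example login function (refs are illustrative; take them from a snapshot):
#   do_login() {
#     agent-browser open https://app.example.com/login &&
#     agent-browser fill @e1 "$USERNAME" &&
#     agent-browser fill @e2 "$PASSWORD" &&
#     agent-browser click @e3 &&
#     agent-browser wait --url "**/dashboard"
#   }
#   with_saved_state do_login && agent-browser open https://app.example.com/dashboard
```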
Command Chaining
Commands can be chained with `&&` when you don't need intermediate output; each command runs only if the previous one succeeded:
```shell
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```
Run commands separately when you need to read output first (e.g., snapshot
to discover refs, then interact using those refs).
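Both cases can appear in one script: chain the steps whose output you don't need, then capture the snapshot so its refs can be read before the next interaction. `open_and_snapshot` and the `AB` hook are hypothetical names for this sketch.

```shell
#!/bin/sh
: "${AB:=agent-browser}"   # hypothetical hook; set AB=echo for a dry run

# open_and_snapshot URL
# Chains navigation and waiting with &&, so the snapshot is only taken if
# every earlier step succeeded; prints the snapshot text.
open_and_snapshot() {
  "$AB" open "$1" &&
    "$AB" wait --load networkidle &&
    "$AB" snapshot -i
}

# Example: capture the snapshot, read its refs, then interact separately.
#   snapshot=$(open_and_snapshot https://example.com) || exit 1
#   printf '%s\n' "$snapshot"
```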
Ref Lifecycle (Important)
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always
re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
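```shell
agent-browser click @e5      # navigates to a new page
agent-browser snapshot -i    # refs from the old page are now invalid
agent-browser click @e1      # use a ref from the new snapshot
```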
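One way to make the re-snapshot step hard to forget is to route page-changing commands through a wrapper that always prints a fresh snapshot afterwards. `act` and the `AB` hook are hypothetical names; the `wait --load networkidle` step is included on the assumption that waiting for the network to settle is harmless when only a modal or dropdown changes.

```shell
#!/bin/sh
: "${AB:=agent-browser}"   # hypothetical hook; set AB=echo for a dry run

# act COMMAND [ARGS...]
# Runs one page-changing agent-browser command (click, press, select, ...),
# waits for the network to settle, then prints a fresh interactive snapshot.
# Refs from any earlier snapshot should be treated as stale afterwards.
act() {
  "$AB" "$@" &&
    "$AB" wait --load networkidle &&
    "$AB" snapshot -i
}

# Example:
#   act click @e5     # prints new refs; use only these from now on
```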
Session Management
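Use `--session` to run isolated browser sessions side by side:
```shell
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
# Close each named session by name when done
agent-browser --session site1 close
agent-browser --session site2 close
```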
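Because each named session runs independently, a script that opens several can close them all on exit, even after an error. `close_sessions` and the `AB` hook are hypothetical names; `trap ... EXIT` is standard shell.

```shell
#!/bin/sh
: "${AB:=agent-browser}"   # hypothetical hook; set AB=echo for a dry run

# close_sessions NAME...
# Closes each named session, continuing past failures so one missing
# session does not leave the others running; returns 1 if any close failed.
close_sessions() {
  status=0
  for name in "$@"; do
    "$AB" --session "$name" close || status=1
  done
  return "$status"
}

# Example: close both sessions however the script exits.
#   trap 'close_sessions site1 site2' EXIT
#   agent-browser --session site1 open https://site-a.com
#   agent-browser --session site2 open https://site-b.com
```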
Security
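All security features are opt-in and are enabled through environment variables:
```shell
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"  # restrict navigation to these domains
export AGENT_BROWSER_CONTENT_BOUNDARIES=1   # wrap page output in boundary markers
export AGENT_BROWSER_MAX_OUTPUT=50000       # cap the size of page output
```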
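Because these settings are opt-in, an unattended script may want to fail fast when they are absent. `require_security_env` is a hypothetical helper; it only checks that the variables are set and non-empty, not that their values are sensible.

```shell
#!/bin/sh
# require_security_env
# Hypothetical preflight check: fails, naming the first missing variable,
# unless all three opt-in security settings are set and non-empty.
require_security_env() {
  for var in AGENT_BROWSER_ALLOWED_DOMAINS \
             AGENT_BROWSER_CONTENT_BOUNDARIES \
             AGENT_BROWSER_MAX_OUTPUT; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing $var" >&2
      return 1
    fi
  done
}

# Example: require_security_env || exit 1
```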