| name | agent-browser |
| description | Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages. |
| allowed-tools | Bash(agent-browser:*) |
| metadata | {"version":"1.0.0","tags":"browser, automation, testing"} |
Browser Automation with agent-browser
Contract
Inputs:
- Target URL or existing browser context
- Browser task: inspect, click, fill, screenshot, test, record, or extract
- Optional selectors, element refs, viewport, credentials, or file paths
Outputs:
- Page state, extracted values, screenshots, PDFs, recordings, or test observations
- List of actions performed
- Blockers such as missing elements, auth walls, or network failures
Creates/Modifies:
- Local screenshot, PDF, or video files only when a path is provided
- Browser cookies, local storage, form fields, and page state during interaction
External Side Effects:
- Navigates websites and may submit forms or trigger app actions
- May upload files or change data in the target application if directed
Confirmation Required:
- Before submitting forms that create, update, purchase, publish, send, or delete data
- Before entering secrets or credentials into non-local sites
- Before interacting with production admin or billing surfaces
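One way to enforce the confirmation rules above in a script is to gate the risky command behind an explicit prompt. This is a minimal sketch: `confirm_then` is a hypothetical helper, not part of agent-browser, and it assumes a bash shell reading the answer from standard input.

```shell
# Hypothetical helper: run the given command only after an explicit "y"/"yes".
# Anything else (including an empty reply) skips the command and returns 1.
confirm_then() {
  local prompt="$1"; shift
  local reply
  # The prompt is only displayed when input comes from a terminal.
  read -r -p "$prompt [y/N] " reply
  case "$reply" in
    y|Y|yes|YES) "$@" ;;
    *) echo "Skipped: $*" >&2; return 1 ;;
  esac
}
```

For example, `confirm_then "Submit the order form?" agent-browser click @e5` would only click after the user agrees.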
Delegates To:
- critique for design review
- audit for technical quality checks
- qa-reviewer for final verification of generated work
Quick start
agent-browser open <url>
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser close
Core workflow
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
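The workflow above can be sketched as two small bash helpers. The function names and the `AGENT_BROWSER_BIN` variable are assumptions made for illustration (the variable only allows swapping in a stub for dry runs); the agent-browser commands themselves are the ones listed in this document.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs, not an agent-browser feature.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Steps 1-2: open the page and list interactive elements with their refs.
inspect_page() {
  local url="$1"
  ab open "$url" || return 1
  ab snapshot -i
}

# Steps 3-4: run one interaction, then re-snapshot so later refs
# reflect the page as it is after the interaction.
act_and_resnapshot() {
  ab "$@" || return 1
  ab snapshot -i
}
```

A session would then look like `inspect_page https://example.com`, read the refs from the output, and continue with `act_and_resnapshot click @e1`.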
Commands
Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
Snapshot (page analysis)
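agent-browser snapshot  # full accessibility tree
agent-browser snapshot -i  # interactive elements only
agent-browser snapshot -c  # compact output
agent-browser snapshot -d 3  # limit tree depth to 3
agent-browser snapshot -s "#main"  # scope to a CSS selector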
Interactions (use @refs from snapshot)
agent-browser click @e1
agent-browser dblclick @e1
agent-browser focus @e1
agent-browser fill @e2 "text"  # clear the field, then enter text
agent-browser type @e2 "text"  # type without clearing first
agent-browser press Enter
agent-browser press Control+a
agent-browser keydown Shift
agent-browser keyup Shift
agent-browser hover @e1
agent-browser check @e1
agent-browser uncheck @e1
agent-browser select @e1 "value"
agent-browser scroll down 500
agent-browser scrollintoview @e1
agent-browser drag @e1 @e2
agent-browser upload @e1 file.pdf
Get information
agent-browser get text @e1
agent-browser get html @e1
agent-browser get value @e1
agent-browser get attr @e1 href
agent-browser get title
agent-browser get url
agent-browser get count ".item"
agent-browser get box @e1
Check state
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
Screenshots & PDF
agent-browser screenshot
agent-browser screenshot path.png
agent-browser screenshot --full
agent-browser pdf output.pdf
Video recording
agent-browser record start ./demo.webm
agent-browser click @e1
agent-browser record stop
agent-browser record restart ./take2.webm
Wait
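agent-browser wait @e1  # wait for an element
agent-browser wait 2000  # wait 2000 milliseconds
agent-browser wait --text "Success"  # wait for text to appear
agent-browser wait --url "**/dashboard"  # wait for a URL pattern
agent-browser wait --load networkidle  # wait for network idle
agent-browser wait --fn "window.ready"  # wait for a JavaScript condition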
Mouse control
agent-browser mouse move 100 200
agent-browser mouse down left
agent-browser mouse up left
agent-browser mouse wheel 100
Semantic locators (alternative to refs)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Browser settings
agent-browser set viewport 1920 1080
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Key":"v"}'
agent-browser set credentials user pass
agent-browser set media dark
Cookies & Storage
agent-browser cookies
agent-browser cookies set name value
agent-browser cookies clear
agent-browser storage local
agent-browser storage local key
agent-browser storage local set k v
agent-browser storage local clear
Network
agent-browser network route <url>
agent-browser network route <url> --abort
agent-browser network route <url> --body '{}'
agent-browser network unroute [url]
agent-browser network requests
agent-browser network requests --filter api
Tabs & Windows
agent-browser tab
agent-browser tab new [url]
agent-browser tab 2
agent-browser tab close
agent-browser window new
Frames
agent-browser frame "#iframe"
agent-browser frame main
Dialogs
agent-browser dialog accept [text]
agent-browser dialog dismiss
JavaScript
agent-browser eval "document.title"
Sessions (parallel browsers)
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
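As a rough sketch of running sessions side by side, the helper below opens each URL in its own named session as a background job and waits for all of them. `open_in_sessions`, the `testN` naming scheme, and `AGENT_BROWSER_BIN` are illustrative assumptions; only `--session <name> open <url>` comes from the commands above.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs, not an agent-browser feature.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Open every URL argument in its own session (test1, test2, ...), in parallel.
open_in_sessions() {
  local i=1 url
  for url in "$@"; do
    ab --session "test$i" open "$url" &
    i=$((i + 1))
  done
  wait  # block until every background open has finished
}
```

Afterwards, `agent-browser session list` should show the sessions, and each one can be driven separately by passing the same `--session` name.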
JSON output (for parsing)
Add --json for machine-readable output:
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Debugging
agent-browser open example.com --headed
agent-browser console
agent-browser console --clear
agent-browser errors
agent-browser errors --clear
agent-browser highlight @e1
agent-browser trace start
agent-browser trace stop trace.zip
agent-browser --cdp 9222 snapshot
QA Testing Examples
Form Validation Testing
agent-browser open https://app.example.com/signup
agent-browser snapshot -i
agent-browser click @e5  # submit the empty form first
agent-browser snapshot -i  # re-snapshot to pick up validation messages
agent-browser fill @e1 "invalid-email"
agent-browser click @e5
agent-browser get text @e1
agent-browser fill @e1 "valid@email.com"
agent-browser fill @e2 "ValidPass123!"
agent-browser click @e5
agent-browser wait --url "**/dashboard"
Visual Regression Testing
agent-browser open https://app.example.com
agent-browser set viewport 1920 1080
agent-browser screenshot desktop.png --full
agent-browser set device "iPhone 14"
agent-browser reload
agent-browser screenshot mobile.png --full
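When more than two sizes are needed, the same capture steps can be looped. The sketch below assumes sizes are passed as `WIDTHxHEIGHT` strings and that a reload after each viewport change is wanted, mirroring the reload after `set device` above. `capture_viewports` and `AGENT_BROWSER_BIN` are hypothetical names.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs, not an agent-browser feature.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Usage: capture_viewports <url> <out_dir> <WIDTHxHEIGHT>...
capture_viewports() {
  local url="$1" out_dir="$2"; shift 2
  local size width height
  ab open "$url" || return 1
  for size in "$@"; do
    width="${size%x*}"    # text before the "x"
    height="${size#*x}"   # text after the "x"
    ab set viewport "$width" "$height" || return 1
    ab reload || return 1  # assumption: re-render at the new size
    ab screenshot "$out_dir/${width}x${height}.png" --full || return 1
  done
}
```

For example, `capture_viewports https://app.example.com shots 1920x1080 375x812` would write one full-page screenshot per size into `shots/`, assuming that directory exists.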
Authentication Flow Testing
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "testuser"
agent-browser fill @e2 "testpass"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json  # save the logged-in state for reuse
agent-browser state load auth.json  # in a later run, restore it instead of logging in again
agent-browser open https://app.example.com/protected
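A script that reuses the saved state might first check whether the state file exists and fall back to the login page otherwise. This is only a sketch: `ensure_logged_in`, its return codes, and `AGENT_BROWSER_BIN` are hypothetical, and it assumes `state load` can be called the same way as in the example above.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs, not an agent-browser feature.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Usage: ensure_logged_in <state_file> <login_url>
# Returns 0 if saved state was loaded, 2 if a manual login is still needed.
ensure_logged_in() {
  local state_file="$1" login_url="$2"
  if [ -f "$state_file" ]; then
    ab state load "$state_file"
  else
    ab open "$login_url" || return 1
    ab snapshot -i || return 1
    echo "No saved state: log in, then run 'state save $state_file'" >&2
    return 2
  fi
}
```

After a successful `ensure_logged_in auth.json https://app.example.com/login`, the protected page can be opened as shown above.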
Multi-Step Checkout Testing
agent-browser open https://shop.example.com/cart
agent-browser snapshot -i
agent-browser click @e3
agent-browser wait @e1
agent-browser snapshot -i
agent-browser fill @e1 "123 Test St"
agent-browser fill @e2 "Test City"
agent-browser select @e3 "CA"
agent-browser fill @e4 "90210"
agent-browser click @e5
agent-browser wait --text "Payment"
agent-browser snapshot -i
agent-browser fill @e1 "4111111111111111"
agent-browser fill @e2 "12/28"
agent-browser fill @e3 "123"
agent-browser click @e4
agent-browser wait --text "Order confirmed"
agent-browser screenshot order-confirmation.png
API Response Mocking
agent-browser network route "**/api/data" --body '{"items":[]}'
agent-browser open https://app.example.com
agent-browser snapshot -i
agent-browser network route "**/analytics/**" --abort
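Mocks stay registered until they are removed, so a test may want to clean up after itself. The sketch below wraps a command between `network route ... --body` and `network unroute`, both taken from the Network section; `with_mocked_route` and `AGENT_BROWSER_BIN` are hypothetical names.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs, not an agent-browser feature.
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Usage: with_mocked_route <url_pattern> <json_body> <command...>
# Registers the mock, runs the command, then removes the mock again.
with_mocked_route() {
  local pattern="$1" body="$2"; shift 2
  ab network route "$pattern" --body "$body" || return 1
  "$@"
  local status=$?              # keep the wrapped command's exit status
  ab network unroute "$pattern"
  return "$status"
}
```

For example, `with_mocked_route "**/api/data" '{"items":[]}' ab open https://app.example.com` would load the app against an empty item list and then drop the mock.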
See references/commands.md for full command documentation.