| name | agent-browser |
| description | CLI-based browser automation with persistent page state using ref-based element interaction. Use when users ask to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages. |
Agent Browser — CLI Browser Automation
Command-line browser automation tool with persistent page state and ref-based element interaction.
Quick Start
agent-browser open <url>
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser close
Core Workflow
- Navigate:
agent-browser open <url>
- Snapshot:
agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
Commands
Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
Snapshot (Page Analysis)
agent-browser snapshot
agent-browser snapshot -i
agent-browser snapshot -c
agent-browser snapshot -d 3
agent-browser snapshot -s "#main"
Interactions (Use @refs from Snapshot)
agent-browser click @e1
agent-browser dblclick @e1
agent-browser focus @e1
agent-browser fill @e2 "text"
agent-browser type @e2 "text"
agent-browser press Enter
agent-browser press Control+a
agent-browser keydown Shift
agent-browser keyup Shift
agent-browser hover @e1
agent-browser check @e1
agent-browser uncheck @e1
agent-browser select @e1 "value"
agent-browser scroll down 500
agent-browser scrollintoview @e1
agent-browser drag @e1 @e2
agent-browser upload @e1 file.pdf
Get Information
agent-browser get text @e1
agent-browser get html @e1
agent-browser get value @e1
agent-browser get attr @e1 href
agent-browser get title
agent-browser get url
agent-browser get count ".item"
agent-browser get box @e1
Check State
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
Screenshots & PDF
agent-browser screenshot
agent-browser screenshot path.png
agent-browser screenshot --full
agent-browser pdf output.pdf
Video Recording
agent-browser record start ./demo.webm
agent-browser click @e1
agent-browser record stop
agent-browser record restart ./take2.webm
Wait
agent-browser wait @e1
agent-browser wait 2000
agent-browser wait --text "Success"
agent-browser wait --url "**/dashboard"
agent-browser wait --load networkidle
agent-browser wait --fn "window.ready"
Mouse Control
agent-browser mouse move 100 200
agent-browser mouse down left
agent-browser mouse up left
agent-browser mouse wheel 100
Semantic Locators (Alternative to Refs)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Browser Settings
agent-browser set viewport 1920 1080
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Key":"v"}'
agent-browser set credentials user pass
agent-browser set media dark
Cookies & Storage
agent-browser cookies
agent-browser cookies set name value
agent-browser cookies clear
agent-browser storage local
agent-browser storage local key
agent-browser storage local set k v
agent-browser storage local clear
agent-browser storage session
Network
agent-browser network route <url>
agent-browser network route <url> --abort
agent-browser network route <url> --body '{}'
agent-browser network unroute [url]
agent-browser network requests
agent-browser network requests --filter api
Tabs & Windows
agent-browser tab
agent-browser tab new [url]
agent-browser tab 2
agent-browser tab close
agent-browser window new
Frames & Dialogs
agent-browser frame "#iframe"
agent-browser frame main
agent-browser dialog accept [text]
agent-browser dialog dismiss
JavaScript
agent-browser eval "document.title"
Global Options
| Option | Description |
|---|
--session <name> | Isolated browser session |
--profile <path> | Persistent browser profile |
--headers <json> | HTTP headers scoped to URL's origin |
--json | Machine-readable JSON output |
--headed | Show browser window (not headless) |
--debug | Debug output |
Example: Form Submission
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
Example: Authentication with Saved State
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
Sessions & Persistent Profiles
Sessions (Parallel Browsers)
agent-browser --session test1 open site-a.com
agent-browser --session test2 open site-b.com
agent-browser session list
Persistent Profiles
Persists cookies, localStorage, IndexedDB, service workers across browser restarts.
agent-browser --profile ~/.myapp-profile open myapp.com
Debugging
agent-browser open example.com --headed
agent-browser console
agent-browser errors
agent-browser record start ./debug.webm
agent-browser record stop
agent-browser highlight @e1
agent-browser trace start
agent-browser trace stop trace.zip
Installation
npm install -g agent-browser
npx playwright install chromium
agent-browser open https://example.com --headed
Run agent-browser --help for all commands. Repo: https://github.com/vercel-labs/agent-browser
Inspired by: oh-my-opencode agent-browser skill