| name | agent-browser |
| description | Browser automation CLI for AI agents — navigate pages, fill forms, click, screenshot, scrape data, and test web apps. Use whenever the user wants to drive a website programmatically, automate any browser task, log into a site, or run a browser in the cloud (Kernel, Browserbase, Browserless). Also covers human-in-the-loop "live view" takeover, where the agent drives and a person steps in from their phone for logins, CAPTCHAs, or MFA, then the agent resumes. |
| allowed-tools | Bash(agent-browser:*) |
Browser Automation with agent-browser
Core Workflow
Authentication Policy:
- User-Centric Automation: When a task uses the user's actual accounts (e.g., checking email, managing a personal task list), ALWAYS use the session name default (e.g., --session default) so that session state persists across interactions.
- Application Testing: When testing a specific application or feature (e.g., verifying a signup flow on a dev server), use a dedicated session name tied to that app (e.g., --session test-myapp) to avoid polluting the default user profile.
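The policy above can be captured in a small helper so that scripts pick the session name consistently. This is a minimal sketch: session_for is a hypothetical function name, not part of agent-browser, and the test-<app> naming only follows the example given in the policy.

```shell
# Hypothetical helper (not an agent-browser command): map a task type to a
# session name following the Authentication Policy.
session_for() {
  case "$1" in
    user)
      # Tasks on the user's real accounts share the persistent default session.
      printf '%s\n' "default" ;;
    test)
      # Application testing gets a dedicated session named after the app.
      [ -n "${2:-}" ] || { echo "session_for test: app name required" >&2; return 1; }
      printf 'test-%s\n' "$2" ;;
    *)
      echo "usage: session_for user | session_for test <app>" >&2
      return 1 ;;
  esac
}

# Usage (assumes agent-browser is installed; URLs are placeholders):
#   agent-browser --session "$(session_for user)" open https://mail.example.com
#   agent-browser --session "$(session_for test myapp)" open http://localhost:3000/signup
```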
Every browser automation follows this pattern:
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
- Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
Essential Commands
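agent-browser open <url>              # Navigate to a URL
agent-browser close                   # Close the browser
agent-browser snapshot -i             # Interactive elements with refs
agent-browser snapshot -i -C          # Also include cursor-interactive elements (e.g., divs with onclick)
agent-browser snapshot -s "#selector" # Scope the snapshot to a CSS selector
agent-browser click @e1               # Click an element
agent-browser fill @e2 "text"         # Clear the field, then type text
agent-browser type @e2 "text"         # Type text without clearing
agent-browser select @e1 "option"     # Select a dropdown option
agent-browser check @e1               # Check a checkbox
agent-browser press Enter             # Press a key
agent-browser scroll down 500         # Scroll the page
agent-browser get text @e1            # Get an element's text
agent-browser get url                 # Get the current URL
agent-browser get title               # Get the page title
agent-browser wait @e1                # Wait for an element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page"    # Wait for a URL pattern
agent-browser wait 2000               # Wait 2000 milliseconds
agent-browser screenshot              # Take a screenshot
agent-browser screenshot --full       # Full-page screenshot
agent-browser pdf output.pdf          # Save the page as a PDF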
Common Patterns
Form Submission
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
Authentication with State Persistence
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
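agent-browser state save auth.json    # After logging in once, save the session state
agent-browser state load auth.json    # In a later run, restore it instead of logging in again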
agent-browser open https://app.example.com/dashboard
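A script that runs repeatedly may want to reuse the saved state when it exists and fall back to the login flow only when it does not. The sketch below assumes the state save / state load commands behave as shown above. restore_auth is a hypothetical helper name, and the file-exists check is a convention of this sketch, not something agent-browser enforces.

```shell
# Hypothetical helper (not an agent-browser command): load saved state if the
# file exists and is non-empty; otherwise report that a login is still needed.
restore_auth() {
  state_file="${1:-auth.json}"
  if [ -s "$state_file" ]; then
    agent-browser state load "$state_file"
  else
    echo "no saved state at $state_file; log in, then run: agent-browser state save $state_file" >&2
    return 1
  fi
}

# Usage (assumes agent-browser is installed):
#   if restore_auth auth.json; then
#     agent-browser open https://app.example.com/dashboard
#   else
#     agent-browser open https://app.example.com/login   # then run the login steps
#   fi
```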
Session Persistence
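agent-browser --session-name myapp open https://app.example.com/login      # Named session: state is saved and restored automatically
agent-browser close                                                        # State is saved when the browser closes
agent-browser --session-name myapp open https://app.example.com/dashboard  # Next run: saved state is loaded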
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7
Data Extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text body > page.txt
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Parallel Sessions
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
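The commands above run one after another. To snapshot several sites concurrently, each session can be driven from its own background job. This is a sketch under assumptions: snapshot_all and the siteN.snapshot.txt output names are hypothetical, and it relies only on the --session, open, and snapshot -i usage shown above.

```shell
# Hypothetical helper (not an agent-browser command): open each URL in its own
# named session in parallel and write each snapshot to <session>.snapshot.txt.
snapshot_all() {
  i=0
  pids=""
  for url in "$@"; do
    i=$((i + 1))
    session="site$i"
    (
      # Each subshell talks to its own session, so the jobs do not share refs.
      agent-browser --session "$session" open "$url" >/dev/null &&
        agent-browser --session "$session" snapshot -i > "$session.snapshot.txt"
    ) &
    pids="$pids $!"
  done
  # Wait for every job and report failure if any of them failed.
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Usage (assumes agent-browser is installed):
#   snapshot_all https://site-a.com https://site-b.com
```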
Connect to Existing Chrome
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot
agent-browser --cdp 9222 snapshot
Visual Browser (Debugging)
agent-browser --headed open https://example.com
agent-browser highlight @e1
agent-browser record start demo.webm
Local Files (PDFs, HTML)
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
iOS Simulator (Mobile Safari)
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
Cloud Browsers & Human Takeover (Kernel, Browserbase, …)
Run the browser in the cloud instead of locally, and optionally hand control to a
human via a live-view URL (open on any device, including a phone) — the
"agent drives, human steps in for logins/CAPTCHAs/MFA, agent resumes" pattern.
Select a provider with -p <name> or AGENT_BROWSER_PROVIDER
(kernel, browserbase, browserless, browseruse, agentcore). Everything else in
this skill (snapshot -i, @refs, click/fill, screenshot, eval) works unchanged
once a provider is set.
export AGENT_BROWSER_PROVIDER=kernel KERNEL_HEADLESS=false KERNEL_TIMEOUT_SECONDS=1800
./templates/kernel-takeover.sh https://example.com
agent-browser snapshot -i && agent-browser click @e1
Critical gotcha: agent-browser launches Kernel headless by default, and headless
sessions have no live view. For human takeover, set KERNEL_HEADLESS=false before
the first open (you can't flip an existing session). Full setup, the SDK-free way to
build the takeover URL, persistent-login profiles, and cleanup (cloud browsers keep
billing after close --all until you delete them) are in
references/cloud-providers.md.
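Because the headless setting cannot be changed on an existing session, a takeover script can check the environment before the first open. The guard below is a minimal sketch: require_live_view is a hypothetical name, and it only inspects the two environment variables named above. It does not query Kernel or agent-browser.

```shell
# Hypothetical guard (not an agent-browser command): refuse to continue when
# the Kernel provider is selected but live view would be unavailable.
require_live_view() {
  if [ "${AGENT_BROWSER_PROVIDER:-}" = "kernel" ] &&
     [ "${KERNEL_HEADLESS:-}" != "false" ]; then
    # Unset counts as headless, since that is the default described above.
    echo "set KERNEL_HEADLESS=false before the first open to get a live view" >&2
    return 1
  fi
}

# Usage (assumes agent-browser is installed):
#   require_live_view && agent-browser open https://example.com
```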
Ref Lifecycle (Important)
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
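agent-browser click @e5       # Triggers navigation or a DOM change
agent-browser snapshot -i     # Re-snapshot: the old refs are no longer valid
agent-browser click @e1       # Use a ref from the new snapshot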
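Bundling the click with the re-snapshot makes stale refs harder to use by accident. The sketch below combines three commands already shown in this document. click_and_resnapshot is a hypothetical helper name, and waiting for networkidle is an assumption that suits full page loads better than small in-page updates.

```shell
# Hypothetical helper (not an agent-browser command): click a ref, wait for
# the page to settle, then print a fresh snapshot with new refs.
click_and_resnapshot() {
  ref="$1"
  agent-browser click "$ref" || return 1
  agent-browser wait --load networkidle || return 1
  agent-browser snapshot -i
}

# Usage (assumes agent-browser is installed):
#   click_and_resnapshot @e5    # read the new refs from this output before the next action
```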
Semantic Locators (Alternative to Refs)
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
JavaScript Evaluation (eval)
Use eval to run JavaScript in the browser context. Shell quoting can corrupt complex expressions; use --stdin or -b to avoid this.
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
Why this matters: When the shell processes your command, inner double quotes, ! characters (history expansion), backticks, and $() can all corrupt the JavaScript before it reaches agent-browser. The --stdin and -b flags bypass shell interpretation entirely.
Rules of thumb:
- Single-line, no nested quotes -> regular eval 'expression' with single quotes is fine
- Nested quotes, arrow functions, template literals, or multiline -> use eval --stdin <<'EVALEOF'
- Programmatic/generated scripts -> use eval -b with base64
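For the base64 route, a small encoder keeps generated scripts on a single line. This is a sketch with a hypothetical helper name, js_b64. Stripping newlines is a defensive choice: GNU base64 wraps its output at 76 characters by default, and this document does not say whether eval -b accepts wrapped input.

```shell
# Hypothetical helper (not an agent-browser command): encode a JavaScript
# snippet as single-line base64 suitable for `agent-browser eval -b`.
js_b64() {
  # printf avoids the portability quirks of `echo -n`; tr removes the line
  # breaks that GNU base64 inserts into long output.
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Usage (assumes agent-browser is installed):
#   agent-browser eval -b "$(js_b64 'document.title')"
```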
Deep-Dive Documentation
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
| references/cloud-providers.md | Cloud browsers (Kernel/Browserbase), live-view human takeover, profiles, cleanup |
Ready-to-Use Templates
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output