| name | agent-browser |
| description | Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", "ブラウザで確認", "Chromeで開いて", "スクリーンショット", verifying UI, or any task requiring programmatic web interaction. Also use in plan mode when planning browser-based verification to apply correct data extraction strategies and avoid common anti-patterns. |
| allowed-tools | Bash(npx agent-browser:*), Bash(agent-browser:*) |
Browser Automation with agent-browser
The CLI uses Chrome/Chromium via CDP directly. Install via npm i -g agent-browser, brew install agent-browser, or cargo install agent-browser. Run agent-browser install to download Chrome. Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. Run agent-browser upgrade to update to the latest version.
Default Flags
This environment runs agent-browser in state-import + headless mode by default. Pass two flags on every invocation:
--session "claude-$PPID" — use a daemon isolated to this Claude session. $PPID in a Bash subshell points at the Claude main process, so all calls within one Claude session (main + any subagents) share the same daemon. Different Claude sessions (separate terminals) get different $PPID values and therefore separate daemons, so they cannot interfere with each other.
--state "$HOME/.agent-browser-state/main.json" — load the plaintext state file produced by ab-state-refresh. Required on the first call of the session (whichever subcommand starts the daemon). Subsequent calls within the same session can omit it because the daemon already has the state loaded.
agent-browser --session "claude-$PPID" --state "$HOME/.agent-browser-state/main.json" tab new <url>
agent-browser --session "claude-$PPID" snapshot -i
agent-browser --session "claude-$PPID" click @e1
agent-browser --session "claude-$PPID" record start <path>
agent-browser --session "claude-$PPID" screenshot <path>
The browser is independent and headless; the user's live Chrome window is never touched. No --auto-connect flag is needed in normal commands.
Why these flags
agent-browser CLI 0.26 hardcodes an env var name for state auto-load that, if exported, makes the daemon navigate to origins[0] on the first command and reject any subsequent --state flag with ⚠ --state ignored: daemon already running. To stay clear of that behavior, never export AGENT_BROWSER_STATE in this environment, and pass --state explicitly on the call that may start the daemon.
The host's AGENT_BROWSER_STATE_PATH env var is intentionally not referenced from these flags. Past iterations relied on it, but it does not propagate reliably from the user's zsh through Claude's shell-snapshot mechanism into Bash subshells. Using a literal path eliminates that uncertainty.
Cleanup
Daemon idle timeout is disabled by default (AGENT_BROWSER_IDLE_TIMEOUT_MS), so each Claude session's daemon stays up until you close it. Close at the end of the workflow:
agent-browser --session "claude-$PPID" close
agent-browser doctor --fix
For richer session management options (named sessions for parallel scraping, state persistence patterns) see references/session-management.md.
Initial setup (one-time)
Before the first agent-browser call, the user must populate the state file by running the host's ab-state-refresh zsh function. This:
- Creates
~/.agent-browser-state/ with mode 700.
- Connects to the user's running Chrome via the CDP WebSocket discovered from
DevToolsActivePort (one-shot) and saves cookies + localStorage to the state file with mode 600.
Important: agent-browser state save only captures localStorage for the focused tab plus its iframes (single Page Target constraint of CDP attach). Cookies are browser-context-wide, but localStorage is origin-scoped. Focus the target app's tab before running ab-state-refresh, pass URLs as arguments — ab-state-refresh URL1 URL2 ... — or pick interactively from open tabs with ab-state-refresh -i when multiple origins are needed (see Step 1 of authentication.md).
If agent-browser --session "claude-$PPID" --state "$HOME/.agent-browser-state/main.json" <cmd> fails with No such file or directory: .../main.json, the state file has not been created yet. Stop and tell the user: Run \ab-state-refresh` to import auth state from your Chrome.`
Sharing one daemon across main + subagents
Within a single Claude session, the main agent and any subagents (Agent tool / Task tool) all see the same $PPID (the Claude main process), so they share one claude-$PPID daemon. This is intended for qa-planner Mode C: Lead and QA Tester both interact with the same authenticated browser. The implication is that neither side should close until the workflow is complete — close tears down the daemon for everyone.
Tab-Safe Navigation
agent-browser open <url> navigates the active tab of the headless instance — but since the headless instance is not the user's live Chrome, "active tab" simply means the daemon's current tab. To keep multiple URLs in the same daemon session without overwriting:
agent-browser tab new <url>
agent-browser tab new <url> --label work
Tabs use stable string ids (t1, t2, ...). Pass --label <name> at creation time to reference the tab by a memorable name.
Escape hatch: --auto-connect
--auto-connect (CDP attach to the user's live Chrome) is retained for the rare cases where state-import + headless cannot reproduce the target behavior:
- Real-time interaction with the user's logged-in browser (e.g., observing in-flight OAuth popups they trigger manually).
- Sites whose state lives in IndexedDB / Service Worker / Web SQL that the cookie+localStorage state file does not capture.
- The
state save operation itself (used internally by ab-state-refresh).
When using --auto-connect, expect to share the user's window — coordinate with them to avoid collisions. Prefer state-import for everything else.
Core Workflow
Every browser automation follows this pattern:
- Navigate:
agent-browser tab new <url> (always opens in a new tab to avoid overwriting existing tabs)
- Wait (if needed):
agent-browser wait 2000 or agent-browser wait <selector> for SPAs. Note: open already waits for the load event; additional waiting is only needed for async content
- Snapshot:
agent-browser snapshot -i (get element refs like @e1, @e2)
- Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser tab new https://example.com/form
agent-browser wait 2000
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i
Command Chaining
Prefer batch over && chaining for 2+ sequential commands. See Batch Execution section below. && chaining still works but batch is more efficient.
Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
agent-browser open https://example.com && agent-browser snapshot -i
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
agent-browser open https://example.com && agent-browser screenshot
When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
Batch Execution
ALWAYS use batch when running 2+ commands in sequence. Batch executes commands in order, so dependent commands (like navigate then screenshot) work correctly. Each quoted argument is a separate command.
agent-browser batch "open https://example.com" "snapshot -i"
agent-browser batch "open https://example.com" "snapshot -i" "screenshot"
agent-browser batch "click @e1" "wait 1000" "screenshot"
agent-browser batch --bail "open https://example.com" "click @e1" "screenshot"
Only use a single command (not batch) when you need to read the output before deciding the next command. For example, you must run snapshot -i as a single command when you need to read the refs to decide what to click. After reading the snapshot, batch the remaining steps.
Stdin mode is also supported for programmatic use:
echo '[["open","https://example.com"],["screenshot"]]' | agent-browser batch --json
agent-browser batch --bail < commands.json
Efficiency Strategies
These patterns minimize tool calls and token usage.
Use --urls to avoid re-navigation. When you need to visit links from a page, use snapshot -i --urls to get all href URLs upfront. Then open each URL directly instead of clicking refs and navigating back.
Snapshot once, act many times. Never re-snapshot the same page. Extract all needed info (refs, URLs, text) from a single snapshot, then batch the remaining actions.
Multi-page workflow (e.g. "visit N sites and screenshot each"):
agent-browser batch "open https://news.ycombinator.com" "snapshot -i --urls"
agent-browser batch "open https://github.com/example/repo" "screenshot"
agent-browser batch "open https://example.com/article" "screenshot"
agent-browser batch "open https://other.com/page" "screenshot"
This approach uses 4 tool calls instead of 14+. Never go back to the listing page between visits.
Handling Authentication
This environment uses state-import as the default authentication strategy (see Default Flags). The host's ab-state-refresh function captures the user's Chrome auth state once into a plaintext JSON file at ~/.agent-browser-state/main.json (mode 600), and each command loads it via --state "$HOME/.agent-browser-state/main.json" on the first call of the Claude session.
Other authentication options remain available for special cases:
Option 1: Persistent Chrome profile (when state file isn't enough)
If a target site depends on IndexedDB / Service Worker (which the state file does not capture), use a persistent profile:
agent-browser --profile ~/.myapp open https://app.example.com/login
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
Option 2: Chrome profile reuse (zero setup)
agent-browser profiles
agent-browser --profile Default open https://gmail.com
Option 3: Persistent profile (for recurring tasks)
agent-browser --profile ~/.myapp open https://app.example.com/login
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
Option 4: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close
agent-browser --session-name myapp open https://app.example.com/dashboard
Option 5: Auth vault (credentials stored encrypted, login by name)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
auth login navigates with load and then waits for login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.
Option 6: State file (manual save/load)
agent-browser state save ./auth.json
agent-browser state load ./auth.json
agent-browser open https://app.example.com/dashboard
See references/authentication.md for OAuth, 2FA, cookie-based auth, and token refresh patterns.
Essential Commands
agent-browser batch "open https://example.com" "snapshot -i"
agent-browser batch "open https://example.com" "screenshot"
agent-browser batch "click @e1" "wait 1000" "screenshot"
agent-browser open <url>
agent-browser close
agent-browser close --all
agent-browser snapshot -i
agent-browser snapshot -i --urls
agent-browser snapshot -s "#selector"
agent-browser click @e1
agent-browser click @e1 --new-tab
agent-browser fill @e2 "text"
agent-browser type @e2 "text"
agent-browser select @e1 "option"
agent-browser check @e1
agent-browser press Enter
agent-browser keyboard type "text"
agent-browser keyboard inserttext "text"
agent-browser scroll down 500
agent-browser scroll down 500 --selector "div.content"
agent-browser get text @e1
agent-browser get url
agent-browser get title
agent-browser get cdp-url
agent-browser wait @e1
agent-browser wait 2000
agent-browser wait --url "**/page"
agent-browser wait --text "Welcome"
agent-browser wait --load networkidle
agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
agent-browser wait "#spinner" --state hidden
agent-browser download @e1 ./file.pdf
agent-browser wait --download ./output.zip
agent-browser --download-path ./downloads open <url>
agent-browser tab list
agent-browser tab new
agent-browser tab new https://example.com
agent-browser tab new https://example.com --label work
agent-browser tab t2
agent-browser tab work
agent-browser tab close
agent-browser tab close t2
agent-browser network requests
agent-browser network requests --type xhr,fetch
agent-browser network requests --method POST
agent-browser network requests --status 2xx
agent-browser network request <requestId>
agent-browser network route "**/api/*" --abort
agent-browser network har start
agent-browser network har stop ./capture.har
agent-browser set viewport 1920 1080
agent-browser set viewport 1920 1080 2
agent-browser set device "iPhone 14"
agent-browser screenshot
agent-browser screenshot --full
agent-browser screenshot --annotate
agent-browser screenshot --screenshot-dir ./shots
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf
agent-browser stream enable
agent-browser stream enable --port 9223
agent-browser stream status
agent-browser stream disable
agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy
agent-browser clipboard paste
agent-browser dialog accept
agent-browser dialog accept "my input"
agent-browser dialog dismiss
agent-browser dialog status
agent-browser diff snapshot
agent-browser diff snapshot --baseline before.txt
agent-browser diff screenshot --baseline before.png
agent-browser diff url <url1> <url2>
agent-browser diff url <url1> <url2> --wait-until networkidle
agent-browser diff url <url1> <url2> --selector "#main"
agent-browser chat "open google.com and search for cats"
agent-browser chat
agent-browser -q chat "summarize this page"
agent-browser -v chat "fill in the login form"
agent-browser --model openai/gpt-4o chat "take a screenshot"
Streaming
Every session automatically starts a WebSocket stream server on an OS-assigned port. Use agent-browser stream status to see the bound port and connection state. Use stream disable to tear it down, and stream enable --port <port> to re-enable on a specific port.
Common Patterns
Site keybind conflicts (mitigation, not guarantee)
agent-browser scroll <direction> <px> sends OS keyboard events (PageDown / arrow keys). Sites that bind single-letter shortcuts to navigation — GitHub (the gd shortcut goes to the Dashboard, gh to Home, gn to Notifications), Gmail (j / k → next/prev message), Linear, Discord — interpret those events as their own shortcuts and silently navigate away mid-action. The CLI cannot tell whether the page navigated; only get url or tab list reveals the drift.
No alternative is 100% safe on these sites. Recommendations in priority order:
- Avoid scrolling. For content capture (summarization, README inspection, repository overview), use
pdf to write the entire scrollable document to a single file. No scroll, no keybind collision, one artifact.
scrollintoview @ref (alias scrollinto). Targets a snapshot ref; appears to be pure-JS but the internals are not documented in the CLI reference. Treat URL drift the same as eval scrollTo if observed.
eval window.scrollTo(x, y). Pure JS, no synthetic keyboard events — but the recording session at ~/.claude/projects/-Users-wadackel-dotfiles/70a2d283-...jsonl (line 296) observed dashboard navigation after eval scrollTo followed by a long wait. Root cause unknown. Verify URL with get url immediately after.
Sandwich verification with batch
agent-browser batch is sequential-only — it cannot branch on intermediate output (see L144 below). Place get url and tab list inside the batch as recordings, then read the batch's combined stdout afterward to decide whether the URL drifted:
SESSION="claude-$PPID"
agent-browser --session "$SESSION" batch \
"tab new https://github.com/owner/repo" \
"wait 3000" \
"get url" \
"tab list" \
"scrollintoview @e1" \
"wait 500" \
"get url" \
"tab list"
get url reports the active tab's URL; tab list reports every tab's URL with stable ids. Use both — get url catches navigation in the active tab, tab list catches cases where a new tab took focus.
Content capture: prefer pdf over scroll+screenshot
For "summarize the page" / "what's in this README" / "give me the repo overview" tasks, generate a PDF of the entire scrollable area. PDF capture:
- captures the full document, not just the viewport
- avoids site-keybind collisions (no scroll required)
- produces a single artifact you can re-read or attach
Single-call form: agent-browser --session "claude-$PPID" pdf /tmp/page.pdf writes the current tab's full document to a PDF.
Recommended sandwich (open + verify URL + capture in one batch):
SESSION="claude-$PPID"
agent-browser --session "$SESSION" batch \
"tab new https://github.com/owner/repo" \
"wait 3000" \
"get url" \
"pdf /tmp/repo-contents.pdf"
Use screenshot only when you specifically need viewport-fit framing or want to capture a particular visual UI state (modal open, hover effect, error overlay). For content-summarization tasks, pdf is strictly more reliable.
Form Submission
agent-browser batch "open https://example.com/signup" "snapshot -i"
agent-browser batch "fill @e1 \"Jane Doe\"" "fill @e2 \"jane@example.com\"" "select @e3 \"California\"" "check @e4" "click @e5" "wait 2000"
Authentication with Auth Vault (Recommended)
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
agent-browser auth list
agent-browser auth show github
agent-browser auth delete github
auth login waits for username/password/submit selectors before interacting, with a timeout tied to the default action timeout.
Authentication with State Persistence
agent-browser batch "open https://app.example.com/login" "snapshot -i"
agent-browser batch "fill @e1 \"$USERNAME\"" "fill @e2 \"$PASSWORD\"" "click @e3" "wait --url **/dashboard" "state save auth.json"
agent-browser batch "state load auth.json" "open https://app.example.com/dashboard"
Session Persistence
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close
agent-browser --session-name myapp open https://app.example.com/dashboard
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7
Working with Iframes
Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
agent-browser batch "open https://example.com/checkout" "snapshot -i"
agent-browser batch "fill @e3 \"4111111111111111\"" "fill @e4 \"12/28\"" "click @e5"
agent-browser batch "frame @e2" "snapshot -i"
agent-browser frame main
Data Extraction
agent-browser batch "open https://example.com/products" "snapshot -i"
agent-browser get text @e5
agent-browser get text body > page.txt
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Dynamic Data Derivation
Never hardcode expected values or input data for verification. Derive them from runtime state.
- If identifiable from conversation context, use that
- Otherwise, extract programmatically from DOM elements, API responses, or app state
- As a last resort, ask the user through the current agent's user-confirmation mechanism; in text-only runtimes, ask the question, end the turn, and wait for the user's next response
SPA Runtime Data Extraction
Use this decision tree to choose the most reliable extraction method:
Need runtime info from a SPA?
|
+-- Can DOM inspection answer it? (img[alt], [aria-label], text content)
| YES --> Use agent-browser snapshot -i or get text @ref (fastest, most reliable)
|
+-- Is data in globals/storage? (window.__NEXT_DATA__, localStorage, cookies)
| YES --> Use agent-browser eval to read it
|
+-- Need API response data?
| YES --> Use agent-browser network requests (requires page reload, see constraints below)
|
+-- None of the above work?
--> Explore React fiber / state store via eval (fragile, last resort only)
network requests Constraints
- Tracking starts on the first call -- requests before that are not captured
- To capture page-load requests, follow this sequence:
agent-browser network requests --clear -- start tracking
- Reload or navigate the page
agent-browser wait 2000
agent-browser network requests -- read results
- If the page is already loaded, prefer DOM inspection over network capture
MutationObserver Logging in SPAs
When collecting DOM change logs across SPA navigation:
- Store logs on
window.__ prefixed variables (e.g., window.__mo, window.__labelLog), not sessionStorage
- Next.js and similar SPAs preserve
window variables across client-side navigation
- Use
__ prefix to avoid variable name collisions
agent-browser eval --stdin <<'EVALEOF'
window.__mo = [];
new MutationObserver(mutations => {
mutations.forEach(m => window.__mo.push({type: m.type, target: m.target.tagName}));
}).observe(document.body, {childList: true, subtree: true});
EVALEOF
Parallel Sessions
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
Connect to Existing Chrome (escape hatch)
--auto-connect is reserved for cases where state-import is insufficient (real-time observation of the user's live browser, IndexedDB-bound sites, the state save operation itself):
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot
agent-browser --cdp 9222 snapshot
Auto-connect discovers Chrome via DevToolsActivePort, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails. Using this attaches to the user's window — coordinate to avoid collisions.
Color Scheme (Dark Mode)
agent-browser --color-scheme dark open https://example.com
AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
agent-browser set media dark
Viewport & Responsive Testing
agent-browser set viewport 1920 1080
agent-browser screenshot desktop.png
agent-browser set viewport 375 812
agent-browser screenshot mobile.png
agent-browser set viewport 1920 1080 2
agent-browser screenshot retina.png
agent-browser set device "iPhone 14"
agent-browser screenshot device.png
The scale parameter (3rd argument) sets window.devicePixelRatio without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.
Visual Browser (Debugging)
agent-browser --session "claude-$PPID" --state "$HOME/.agent-browser-state/main.json" open https://example.com
agent-browser --session "claude-$PPID" highlight @e1
agent-browser --session "claude-$PPID" inspect
agent-browser --session "claude-$PPID" --state "$HOME/.agent-browser-state/main.json" record start demo.webm
agent-browser --session "claude-$PPID" profiler start
agent-browser --session "claude-$PPID" profiler stop trace.json
Use AGENT_BROWSER_HEADED=1 to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.
Local Files (PDFs, HTML)
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
iOS Simulator (Mobile Safari)
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.
Security
All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.
Content Boundaries (Recommended for AI Agents)
Enable --content-boundaries to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
agent-browser snapshot
Domain Allowlist
Restrict navigation to trusted domains. Wildcards like *.example.com also match the bare domain example.com. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on:
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
agent-browser open https://example.com
agent-browser open https://malicious.com
Action Policy
Use a policy file to gate destructive actions:
export AGENT_BROWSER_ACTION_POLICY=./policy.json
Example policy.json:
{ "default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"] }
Auth vault operations (auth login, etc.) bypass action policy but domain allowlist still applies.
Output Limits
Prevent context flooding from large pages:
export AGENT_BROWSER_MAX_OUTPUT=50000
Diffing (Verifying Changes)
Use diff snapshot after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
agent-browser snapshot -i
agent-browser click @e2
agent-browser diff snapshot
For visual regression testing or monitoring:
agent-browser screenshot baseline.png
agent-browser diff screenshot --baseline baseline.png
agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
diff snapshot output uses + for additions and - for removals, similar to git diff. diff screenshot produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
Timeouts and Slow Pages
The default timeout is 25 seconds. This can be overridden with the AGENT_BROWSER_DEFAULT_TIMEOUT environment variable (value in milliseconds).
Important: open already waits for the page load event before returning. In most cases, no additional wait is needed before taking a snapshot or screenshot. Only add an explicit wait when content loads asynchronously after the initial page load.
agent-browser wait "#content"
agent-browser wait @e1
agent-browser wait 2000
agent-browser wait --url "**/dashboard"
agent-browser wait --text "Results loaded"
agent-browser wait --fn "document.querySelectorAll('.item').length > 0"
Avoid wait --load networkidle unless you are certain the site has no persistent network activity. Ad-heavy sites, sites with analytics/tracking, and sites with websockets will cause networkidle to hang indefinitely. Prefer wait 2000 or wait <selector> instead.
Exception for controlled SPAs: For known single-page applications without ads or persistent network activity (e.g., internal corporate apps), wait --load networkidle remains the most reliable approach for ensuring async data loads complete fully.
To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
Handling Native Dialogs
When a page opens a JavaScript dialog (alert(), confirm(), or prompt()), it blocks all other browser commands (snapshot, screenshot, click, etc.) until the dialog is dismissed. By default, alert and beforeunload dialogs are auto-dismissed, but confirm and prompt require explicit handling. Use --no-auto-dialog (or AGENT_BROWSER_NO_AUTO_DIALOG=1) to disable automatic handling.
When a dialog is pending, all command responses include a warning field indicating the dialog type and message. In --json mode this appears as a "warning" key in the response object.
Reactive: Dismiss a dialog that already appeared
agent-browser dialog status
agent-browser dialog accept
agent-browser dialog dismiss
agent-browser dialog accept "input text"
If agent-browser hangs after a click (commands time out), a native dialog is likely blocking. Run dialog accept or dialog dismiss to unblock.
Proactive: Prevent dialogs from blocking (recommended for QA/testing)
Before interacting with pages that may trigger alert() or confirm(), override them via eval to capture messages without blocking:
agent-browser eval --stdin <<'EVALEOF'
window.__dialogMessages = [];
window.alert = function(msg) { window.__dialogMessages.push({type:'alert',msg}); };
window.confirm = function(msg) { window.__dialogMessages.push({type:'confirm',msg}); return true; };
EVALEOF
After interactions, read captured messages:
agent-browser eval 'JSON.stringify(window.__dialogMessages)'
Important: Do NOT call the original alert/confirm from interceptors (e.g., origAlert.call(window, msg)). This re-triggers the native dialog and blocks again.
Inspecting API error responses
agent-browser network requests captures request info but not response bodies. To diagnose API errors (e.g., 4xx/5xx responses), install a fetch interceptor:
agent-browser eval --stdin <<'EVALEOF'
window.__fetchLog = [];
const origFetch = window.fetch;
window.fetch = async function(...args) {
const res = await origFetch.apply(this, args);
const url = typeof args[0] === 'string' ? args[0] : args[0]?.url || '';
if (url.includes('target-api-path')) {
const clone = res.clone();
const body = await clone.text();
window.__fetchLog.push({ url, status: res.status, body: body.substring(0, 500) });
}
return res;
};
EVALEOF
Then read results with agent-browser eval 'JSON.stringify(window.__fetchLog)'.
Session Management and Cleanup
When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
agent-browser session list
Always close your browser session when done to avoid leaked processes:
agent-browser close
agent-browser --session agent1 close
agent-browser close --all
If a previous session was not closed properly, the daemon may still be running. Use agent-browser close to clean it up, or agent-browser close --all to shut down every session at once.
To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
Ref Lifecycle (Important)
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
agent-browser click @e5
agent-browser snapshot -i
agent-browser click @e1
Reconnaissance-Then-Action
Before performing any interaction (click, input, etc.):
- Take a snapshot (
agent-browser snapshot -i) to observe current state
- Identify target refs from the snapshot output
- Execute the action with discovered refs
Do not guess selectors from source code alone -- always verify against rendered state.
After open, wait for content (agent-browser wait 2000 or agent-browser wait <selector>) before inspecting -- SPA client-side transitions may not be complete immediately.
Plan Mode Verification
When investigating browser UI or rendering issues in plan mode:
- Perform actual measurement via agent-browser, not theoretical reasoning
- Capture DOM element sizes, CSS applied state, layout calculations before forming a plan
- Compare against Figma designs or reference pages for visual accuracy
Annotated Screenshots (Vision Mode)
Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. This also caches refs, so you can interact with elements immediately without a separate snapshot.
In native mode, this currently works on the CDP-backed browser path (Chromium/Lightpanda). The Safari/WebDriver backend does not yet support --annotate.
agent-browser screenshot --annotate
agent-browser click @e2
Use annotated screenshots when:
- The page has unlabeled icon buttons or visual-only elements
- You need to verify visual layout or styling
- Canvas or chart elements are present (invisible to text snapshots)
- You need spatial reasoning about element positions
Semantic Locators (Alternative to Refs)
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
JavaScript Evaluation (eval)
Use eval to run JavaScript in the browser context. Shell quoting can corrupt complex expressions -- use --stdin or -b to avoid issues.
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
Why this matters: When the shell processes your command, inner double quotes, ! characters (history expansion), backticks, and $() can all corrupt the JavaScript before it reaches agent-browser. The --stdin and -b flags bypass shell interpretation entirely.
Rules of thumb:
- Single-line, no nested quotes -> regular
eval 'expression' with single quotes is fine
- Nested quotes, arrow functions, template literals, or multiline -> use
eval --stdin <<'EVALEOF'
- Programmatic/generated scripts -> use
eval -b with base64
Configuration File
Create agent-browser.json in the project root for persistent settings. Set $schema so editors like VS Code provide autocomplete and validation against the upstream schema:
{
"$schema": "https://agent-browser.dev/schema.json",
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data"
}
Priority (lowest to highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags. Use --config <path> or AGENT_BROWSER_CONFIG env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., --executable-path -> "executablePath"). Boolean flags accept true/false values (e.g., --headed false overrides config). Extensions from user and project configs are merged, not replaced.
Diagnosing Install Issues
Run agent-browser doctor first whenever a command fails unexpectedly (Unknown command, Failed to connect, version mismatches after agent-browser upgrade, missing Chrome, stale daemon sockets, etc.). It runs a one-shot diagnosis across env, Chrome, daemons, config, providers, network, and a headless launch test.
agent-browser doctor
agent-browser doctor --offline --quick
agent-browser doctor --fix
agent-browser doctor --json
Stale socket / pid / version sidecar files are auto-cleaned on every run. Destructive actions require --fix. Exit code is 0 if all checks pass (warnings OK), 1 otherwise.
Anti-patterns
- Do not hardcode user names, IDs, or expected values -- derive from DOM/API
- Do not assume page-load requests are available without initializing
network requests first
- Do not use
sessionStorage.setItem for MutationObserver logs -- use window.__ variables
- Do not rely solely on "it renders" for visual verification -- compare against reference designs
- Do not guess refs -- always snapshot first and use the returned
@eN references
- Do not call original
alert/confirm from eval interceptors -- it re-triggers the native dialog and blocks agent-browser
- Do not use
wait --load networkidle on ad-heavy or analytics-heavy sites -- it will hang indefinitely; use wait 2000 or wait <selector> instead
- Do not use
scroll on sites with single-key keyboard shortcuts (GitHub, Gmail, Linear, Discord) -- it sends keyboard events that collide with site handlers and silently navigate away; use pdf for content capture, or scrollintoview @ref / eval window.scrollTo() followed by get url verification
- Do not skip URL/tab verification between actions during recording --
get url or tab list inside the same batch is the only way to detect silent navigation before the video is reviewed; the recording itself succeeds even when the page drifts
- Do not assume
eval window.scrollTo() is keybind-safe -- a long wait after eval scrollTo was observed to land on the dashboard in one session; verify URL with get url immediately after every scroll-like action
- Do not retry the same failing strategy more than twice -- when scroll+screenshot misfires, switch strategy (use
pdf for content capture, scrollintoview @ref instead of scroll, or capture the initial viewport only) instead of repeating the same approach
Deep-Dive Documentation
Cloud Providers
Use -p <provider> (or AGENT_BROWSER_PROVIDER) to run against a cloud browser instead of launching a local Chrome instance. Supported providers: agentcore, browserbase, browserless, browseruse, kernel.
AgentCore (AWS Bedrock)
agent-browser -p agentcore open https://example.com
AGENTCORE_PROFILE_ID=my-profile agent-browser -p agentcore open https://example.com
AGENTCORE_REGION=eu-west-1 agent-browser -p agentcore open https://example.com
Set AWS_PROFILE to select a named AWS profile.
Experimental: Native Mode
agent-browser has an experimental native Rust daemon that communicates with Chrome directly via CDP, bypassing Node.js and Playwright entirely. It is opt-in and not recommended for production use yet.
agent-browser --native open example.com
export AGENT_BROWSER_NATIVE=1
agent-browser open example.com
The native daemon supports Chromium and Safari (via WebDriver). Firefox and WebKit are not yet supported. All core commands (navigate, snapshot, click, fill, screenshot, cookies, storage, tabs, eval, etc.) work identically in native mode. Use agent-browser close before switching between native and default mode within the same session.
Browser Engine Selection
Use --engine to choose a local browser engine. The default is chrome.
agent-browser --engine lightpanda open example.com
export AGENT_BROWSER_ENGINE=lightpanda
agent-browser open example.com
agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com
Supported engines:
chrome (default) -- Chrome/Chromium via CDP
lightpanda -- Lightpanda headless browser via CDP (10x faster, 10x less memory than Chrome)
Lightpanda does not support --extension, --profile, --state, or --allow-file-access. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
Observability Dashboard
The dashboard is a standalone background server that shows live browser viewports, command activity, and console output for all sessions.
agent-browser dashboard install
agent-browser dashboard start
agent-browser open example.com
agent-browser dashboard stop
The dashboard runs independently of browser sessions on port 4848 (configurable with --port). All sessions automatically stream to the dashboard. Sessions can also be created from the dashboard UI with local engines or cloud providers.
Dashboard AI Chat
The dashboard has an optional AI chat tab powered by the Vercel AI Gateway. Enable it by setting:
export AI_GATEWAY_API_KEY=gw_your_key_here
export AI_GATEWAY_MODEL=anthropic/claude-sonnet-4.6
export AI_GATEWAY_URL=https://ai-gateway.vercel.sh
The Chat tab is always visible in the dashboard. Set AI_GATEWAY_API_KEY to enable AI responses.
Ready-to-Use Templates
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output