| name | agent-browser |
| description | LOCAL browser automation CLI for AI agents. The ONLY tool that can access localhost, 127.0.0.1, and local network URLs. Use for ALL browser tasks including testing local dev servers, navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, and automation. Triggers include requests to "open a website", "test localhost", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "test my local server", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. |
| allowed-tools | Bash(agent-browser:*) |
Browser Automation with agent-browser
When to Use This Skill (IMPORTANT)
agent-browser is a LOCAL browser — it runs directly on this machine using a real Chromium instance. It can access:
- localhost / 127.0.0.1 (local dev servers, local APIs)
- Local network IPs (192.168.x.x, 10.x.x.x, etc.)
- Any remote URL (https://example.com, etc.)
- Local files (file:///path/to/file.html)
DO NOT use web_remote for localhost testing
The web_remote tool (formerly web_test) is a remote service powered by Cloudflare Browser Rendering. It runs on Cloudflare's servers and CANNOT reach localhost, 127.0.0.1, or any local network address. It will fail silently or error on local URLs.
| Scenario | Use This | NOT This |
|---|---|---|
| Test localhost:3000 | agent-browser | web_remote |
| Test 127.0.0.1:8080 | agent-browser | web_remote |
| Test local network device | agent-browser | web_remote |
| Fill forms, click buttons | agent-browser | web_remote |
| Multi-step browser workflow | agent-browser | web_remote |
| Screenshot a public site (quick, no interaction) | agent-browser or web_remote | — |
| Accessibility audit of public site | agent-browser or web_remote | — |
Rule of thumb: Always prefer agent-browser. Use web_remote only for a quick screenshot or accessibility audit of a public site that needs no interaction.
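The local-versus-remote decision above can be sketched as a small shell check. This is a heuristic only: the function name is_local_url is hypothetical, it matches on the host string rather than resolving DNS, it covers localhost, 127.x, the RFC 1918 IPv4 ranges, and file:// URLs, and it does not handle IPv6 literals.

```shell
# Heuristic sketch (not part of agent-browser): succeed when a URL points at
# this machine or a private IPv4 network, i.e. a target web_remote cannot reach.
is_local_url() {
  url=$1
  case "$url" in
    file://*) return 0 ;;
  esac
  host=${url#*://}   # drop the scheme, if any
  host=${host%%/*}   # drop the path
  host=${host##*@}   # drop userinfo, if any
  host=${host%%:*}   # drop the port
  case "$host" in
    localhost|*.localhost|127.*|10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might guard any use of the remote tool with `if is_local_url "$url"; then ...` and route those URLs to agent-browser. Because the match is textual, a public hostname that merely begins with `10.` or `127.` would also be treated as local.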
Core Workflow
Every browser automation follows this pattern:
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
- Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
Essential Commands
agent-browser open <url>
agent-browser close
agent-browser snapshot -i
agent-browser snapshot -i -C
agent-browser snapshot -s "#selector"
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser type @e2 "text"
agent-browser select @e1 "option"
agent-browser check @e1
agent-browser press Enter
agent-browser scroll down 500
agent-browser get text @e1
agent-browser get url
agent-browser get title
agent-browser wait @e1
agent-browser wait --load networkidle
agent-browser wait --url "**/page"
agent-browser wait 2000
agent-browser screenshot
agent-browser screenshot --full
agent-browser pdf output.pdf
Common Patterns
Form Submission
agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
Authentication with State Persistence
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
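The save-then-reuse pattern above could be wrapped as follows. This is a minimal sketch: the function name load_auth_or_fail and the AGENT_BROWSER_BIN override variable are hypothetical and exist only to make the snippet easy to dry-run. The only agent-browser command used is the state load shown above.

```shell
# Sketch: reuse a saved login state when the file exists; otherwise report
# that the login flow (ending in "state save") has to run first.
# AGENT_BROWSER_BIN is a hypothetical override; it defaults to agent-browser.
load_auth_or_fail() {
  ab=${AGENT_BROWSER_BIN:-agent-browser}
  state_file=${1:-auth.json}
  if [ -f "$state_file" ]; then
    "$ab" state load "$state_file"
  else
    echo "no saved state at $state_file; log in and run 'state save' first" >&2
    return 1
  fi
}
```

A later session would call `load_auth_or_fail auth.json` before opening the dashboard URL and fall back to the full login flow when it returns non-zero.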
Data Extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text body > page.txt
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Parallel Sessions
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i
agent-browser session list
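For more than a couple of sites, the same --session pattern can be driven from a loop. The sketch below is illustrative: open_in_sessions and AGENT_BROWSER_BIN are hypothetical names, the session names site1, site2, and so on are generated for the example, and the commands run one after another rather than concurrently.

```shell
# Sketch: open each URL in its own named session, then snapshot that session.
# AGENT_BROWSER_BIN is a hypothetical override; it defaults to agent-browser.
open_in_sessions() {
  ab=${AGENT_BROWSER_BIN:-agent-browser}
  i=1
  for url in "$@"; do
    "$ab" --session "site$i" open "$url" || return 1
    "$ab" --session "site$i" snapshot -i
    i=$((i + 1))
  done
}
```

Calling `open_in_sessions https://site-a.com https://site-b.com` would issue the same four commands as the example above; refs from one session's snapshot should only be used with that same --session name.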
Visual Browser (Debugging)
agent-browser --headed open https://example.com
agent-browser highlight @e1
agent-browser record start demo.webm
Local Files (PDFs, HTML)
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
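When the file path comes from a variable, a small helper can build the file:// URL. This is a sketch under stated assumptions: open_local_file and AGENT_BROWSER_BIN are hypothetical names, relative paths are resolved against the current directory, and the path is not percent-encoded, so paths containing spaces or other special characters would need extra handling.

```shell
# Sketch: turn a filesystem path into a file:// URL and open it with
# file access enabled. No URL encoding is applied to the path.
# AGENT_BROWSER_BIN is a hypothetical override; it defaults to agent-browser.
open_local_file() {
  ab=${AGENT_BROWSER_BIN:-agent-browser}
  case "$1" in
    /*) abs=$1 ;;         # already absolute
    *)  abs=$PWD/$1 ;;    # resolve relative to the current directory
  esac
  "$ab" --allow-file-access open "file://$abs"
}
```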
iOS Simulator (Mobile Safari)
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
Requirements: macOS with Xcode, Appium (npm install -g appium && appium driver install xcuitest)
Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>", where the UDID is taken from the output of xcrun xctrace list devices.
Ref Lifecycle (Important)
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)
agent-browser click @e5
agent-browser snapshot -i
agent-browser click @e1
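One way to make the re-snapshot step hard to forget is to bundle it with the click. The helper below is a sketch, not part of agent-browser: click_and_resnapshot and AGENT_BROWSER_BIN are hypothetical names, and the networkidle wait is borrowed from the Core Workflow example on the assumption that the click may trigger navigation.

```shell
# Sketch: click a ref, wait for the page to settle, then take a fresh
# snapshot so that subsequent commands use valid refs.
# AGENT_BROWSER_BIN is a hypothetical override; it defaults to agent-browser.
click_and_resnapshot() {
  ab=${AGENT_BROWSER_BIN:-agent-browser}
  "$ab" click "$1" || return 1
  "$ab" wait --load networkidle || return 1
  "$ab" snapshot -i
}
```

After `click_and_resnapshot @e5`, the refs printed by the new snapshot replace the old ones; @e1 in the new snapshot is not necessarily the element that @e1 referred to before the click.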
Semantic Locators (Alternative to Refs)
When refs are unavailable or unreliable, use semantic locators:
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
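The two approaches can be combined: try the ref first and fall back to a text locator. This sketch assumes that agent-browser exits with a non-zero status when a ref cannot be resolved, which the text above does not confirm; click_ref_or_text and AGENT_BROWSER_BIN are hypothetical names.

```shell
# Sketch: click by ref, falling back to a visible-text locator.
# ASSUMPTION: a failed "click" exits non-zero; verify this before relying on it.
# AGENT_BROWSER_BIN is a hypothetical override; it defaults to agent-browser.
click_ref_or_text() {
  ab=${AGENT_BROWSER_BIN:-agent-browser}
  "$ab" click "$1" 2>/dev/null || "$ab" find text "$2" click
}
```

For example, `click_ref_or_text @e3 "Sign In"` would use the ref when it is still valid and the "Sign In" text otherwise.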
Deep-Dive Documentation
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
| references/web-search.md | Web search via DuckDuckGo/Google, extracting search results |
Ready-to-Use Templates
./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output