| name | agent-browser |
| description | Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages. |
Browser Automation with agent-browser
Automates a web browser to perform testing, form filling, screenshots, data extraction, and similar tasks.
Instructions
Workflow: navigate → analyze → interact → verify
Quick start
agent-browser open <url>
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser close
Core workflow
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
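The four steps above can be strung together in a couple of small shell helpers. This is only a sketch: the function names and the AB variable are illustrative and not part of agent-browser. AB defaults to the real CLI and can be pointed at another command (for example echo) for a dry run.

```shell
# AB is a hypothetical override hook; it defaults to the real CLI.
AB="${AB:-agent-browser}"

# Navigate to a URL, then take an interactive snapshot so fresh refs
# (@e1, @e2, ...) are available for the next interaction.
open_and_snapshot() {
    "$AB" open "$1" || return 1
    "$AB" snapshot -i
}

# Click a ref, wait for the page to settle, then re-snapshot because
# refs from the previous snapshot may no longer be valid.
click_and_resnapshot() {
    "$AB" click "$1" || return 1
    "$AB" wait --load networkidle || return 1
    "$AB" snapshot -i
}

# Usage (refs are placeholders; read the real ones from the snapshot):
#   open_and_snapshot https://example.com/form
#   click_and_resnapshot @e3
```

Stopping at the first failed command keeps a later step from acting on refs that were never refreshed.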
Commands
Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
Snapshot (page analysis)
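agent-browser snapshot          # Full accessibility tree
agent-browser snapshot -i       # Interactive elements only (recommended)
agent-browser snapshot -c       # Compact output
agent-browser snapshot -d 3     # Limit tree depth to 3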
Interactions (use @refs from snapshot)
agent-browser click @e1
agent-browser dblclick @e1
agent-browser fill @e2 "text"     # Clear the field, then type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter
agent-browser press Control+a
agent-browser hover @e1
agent-browser check @e1
agent-browser uncheck @e1
agent-browser select @e1 "value"
agent-browser scroll down 500
agent-browser scrollintoview @e1
Get information
agent-browser get text @e1
agent-browser get value @e1
agent-browser get title
agent-browser get url
Screenshots
agent-browser screenshot
agent-browser screenshot path.png
agent-browser screenshot --full
Wait
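agent-browser wait @e1                  # Wait for an element
agent-browser wait 2000                 # Wait a fixed time in milliseconds
agent-browser wait --text "Success"     # Wait for text to appear
agent-browser wait --load networkidle   # Wait for the network to go idle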
Semantic locators (alternative to refs)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
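As an illustration, the locators above could be combined into a login helper that never touches refs. This is a sketch under assumptions: the labels "Email" and "Password", the button name "Sign In", the function name, and the AB override variable are all hypothetical and need to match the page under test.

```shell
# AB is a hypothetical override hook; it defaults to the real CLI.
AB="${AB:-agent-browser}"

# Fill a login form by visible label and submit by button name.
#   $1: email address, $2: password
# The label and button names below are placeholders, not known values.
login_by_label() {
    "$AB" find label "Email" fill "$1" || return 1
    "$AB" find label "Password" fill "$2" || return 1
    "$AB" find role button click --name "Sign In"
}

# Usage:
#   login_by_label user@test.com "$PASSWORD"
```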
Examples
Example 1: Form submission
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
Example 2: Authentication with saved state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# In a later run, restore the saved state instead of logging in again
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
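The save-then-load pattern in this example can be wrapped so that login only happens when no saved state exists. A minimal sketch, assuming the refs @e1 to @e3 match the login form as in the example. The function name and the AB override variable are hypothetical.

```shell
# AB is a hypothetical override hook; it defaults to the real CLI.
AB="${AB:-agent-browser}"

# Load saved auth state when the file exists; otherwise log in and save it.
#   $1: state file, $2: username, $3: password
# The URL, refs, and "**/dashboard" pattern are copied from the example
# above and are placeholders for a real application.
ensure_logged_in() {
    state_file="$1"
    if [ -f "$state_file" ]; then
        "$AB" state load "$state_file"
        return
    fi
    "$AB" open "https://app.example.com/login" || return 1
    "$AB" snapshot -i || return 1
    "$AB" fill @e1 "$2" || return 1
    "$AB" fill @e2 "$3" || return 1
    "$AB" click @e3 || return 1
    "$AB" wait --url "**/dashboard" || return 1
    "$AB" state save "$state_file"
}

# Usage:
#   ensure_logged_in auth.json "$USERNAME" "$PASSWORD"
#   agent-browser open https://app.example.com/dashboard
```

The sketch does not detect an expired saved state. If the dashboard redirects back to the login page, deleting the state file and rerunning is the simplest recovery.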
Example 3: Data extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text @e6
agent-browser screenshot products.png
Advanced Features
Sessions (parallel browsers)
--session is required for concurrent use. Without --session, every invocation shares the default daemon, so concurrent runs interfere with each other.
Session naming rules:
- Running in a worktree → use the worktree's branch name
- Running on main → name the session after the task (e.g., qa-login, data-extract)
agent-browser --session my-feature open site-a.com
agent-browser --session my-feature snapshot -i
agent-browser --session other-task open site-b.com
agent-browser session list
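The naming rules above could be automated along these lines. This is a sketch with loudly stated assumptions: it approximates "running in a worktree" by checking whether the current branch is something other than main or master, and it replaces slashes in branch names with dashes. Neither behavior is prescribed by agent-browser, and the function name is hypothetical.

```shell
# Pick a session name following the naming rules.
#   $1: current branch name (may be empty)
#   $2: task-purpose name to fall back on, e.g. qa-login
# Assumptions: "main", "master", a detached "HEAD", or an empty value
# count as the main checkout; "/" in branch names becomes "-".
session_name() {
    case "$1" in
        ""|main|master|HEAD) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1" | tr '/' '-' ;;
    esac
}

# Usage:
#   branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null)"
#   agent-browser --session "$(session_name "$branch" qa-login)" open site-a.com
```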
JSON output (for parsing)
Add --json for machine-readable output:
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Debugging
agent-browser open example.com --headed
agent-browser console
agent-browser errors
Best Practices
- Specify --session for concurrent use: Without --session, runs share the default session and interfere with each other. Always use a unique session name
- Always snapshot before interacting: Get fresh refs after navigation or DOM changes
- Use interactive snapshot (-i): Reduces noise, focuses on actionable elements
- Wait appropriately: Use wait --load networkidle after actions that trigger navigation
- Save auth state: Reuse login sessions with state save/load
- Take screenshots for verification: Visual confirmation of expected state
- Use semantic locators for stable tests: find role/text/label is more resilient than refs
- Close when finished: An idle daemon shuts down automatically after 15 minutes, but an explicit close is recommended to save resources
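The session and close practices above can be combined in one wrapper that closes the session even when the wrapped command fails. A minimal sketch: the function name and the AB override variable are hypothetical, and it assumes close accepts the same --session prefix as the other commands.

```shell
# AB is a hypothetical override hook; it defaults to the real CLI.
AB="${AB:-agent-browser}"

# Run a command, then close the named session regardless of the outcome.
#   $1: session name, remaining arguments: the command to run
# Returns the wrapped command's exit status, not the status of close.
with_session() {
    ws_session="$1"
    shift
    "$@"
    ws_status=$?
    "$AB" --session "$ws_session" close
    return "$ws_status"
}

# Usage (run_checks is a placeholder for your own script or function):
#   with_session qa-login run_checks
```

The wrapped command still has to pass --session with the same name on its own agent-browser calls. The wrapper only guarantees the cleanup step.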
Technical Details
- Based on Playwright browser automation
- Supports Chromium, Firefox, and WebKit
- Uses accessibility tree for element detection
- Provides element references (@e1, @e2) that stay valid until navigation or a significant DOM change
- Handles common web testing scenarios out-of-the-box