| name | agent-browser |
| description | Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages. |
Browser Automation with agent-browser
Automates a web browser to run tests, fill forms, take screenshots, extract data, and perform similar tasks.
Instructions
Workflow: navigate → analyze → interact → verify
Quick start
agent-browser open <url>
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser close
Core workflow
- Navigate: agent-browser open <url>
- Snapshot: agent-browser snapshot -i (returns elements with refs like @e1, @e2)
- Interact using refs from the snapshot
- Re-snapshot after navigation or significant DOM changes
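The loop above can be sketched as two small shell helpers. This is a minimal sketch, not part of agent-browser: the function names `ab_visit` and `ab_act` and the `AB_BIN` override variable are assumptions introduced here for illustration; only the `open`, `snapshot -i`, and interaction commands come from this document.

```shell
# AB_BIN is a hypothetical override (not an agent-browser setting) so the
# helpers can be pointed at a stub; it defaults to the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Navigate to a URL, then take an interactive snapshot so refs are available.
ab_visit() {
  "$AB_BIN" open "$1" || return 1
  "$AB_BIN" snapshot -i
}

# Run one interaction (for example: click @e1), then re-snapshot,
# because refs may no longer match the page after the action.
ab_act() {
  "$AB_BIN" "$@" || return 1
  "$AB_BIN" snapshot -i
}
```

For example, `ab_visit https://example.com/form` followed by `ab_act click @e3` prints a fresh snapshot after each step, so the next command always uses current refs.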
Commands
Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
Snapshot (page analysis)
agent-browser snapshot         # Full accessibility tree
agent-browser snapshot -i      # Interactive elements only
agent-browser snapshot -c      # Compact output
agent-browser snapshot -d 3    # Limit tree depth to 3
Interactions (use @refs from snapshot)
agent-browser click @e1
agent-browser dblclick @e1
agent-browser fill @e2 "text"    # Clear the field, then type
agent-browser type @e2 "text"    # Type without clearing
agent-browser press Enter
agent-browser press Control+a
agent-browser hover @e1
agent-browser check @e1
agent-browser uncheck @e1
agent-browser select @e1 "value"
agent-browser scroll down 500
agent-browser scrollintoview @e1
Get information
agent-browser get text @e1
agent-browser get value @e1
agent-browser get title
agent-browser get url
Screenshots
agent-browser screenshot
agent-browser screenshot path.png
agent-browser screenshot --full
Wait
agent-browser wait @e1                  # Wait for an element
agent-browser wait 2000                 # Wait 2000 milliseconds
agent-browser wait --text "Success"     # Wait for text to appear
agent-browser wait --load networkidle   # Wait for network idle
Semantic locators (alternative to refs)
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
Examples
Example 1: Form submission
agent-browser open https://example.com/form
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
Example 2: Authentication with saved state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json
# In a later session, restore the saved state instead of logging in again
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
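The save-then-reuse pattern in this example could be wrapped as follows. This is a sketch under stated assumptions: `ab_open_authenticated` and the `AB_BIN` override are hypothetical names, and the helper only relies on the `state load` and `open` commands shown above.

```shell
# AB_BIN is a hypothetical override (not an agent-browser setting);
# it defaults to the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Open a URL with saved auth state when the state file exists.
# Returns 2 when the file is missing, so the caller can run the login
# steps and then call `state save` itself.
ab_open_authenticated() {
  state_file="$1"
  url="$2"
  if [ -f "$state_file" ]; then
    "$AB_BIN" state load "$state_file" || return 1
    "$AB_BIN" open "$url"
  else
    echo "no saved state at $state_file; log in, then run: state save $state_file" >&2
    return 2
  fi
}
```

A caller might run `ab_open_authenticated auth.json https://app.example.com/dashboard` and fall back to the login steps only when the exit status is 2.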
Example 3: Data extraction
agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get text @e6
agent-browser screenshot products.png
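When several refs need to be read, the repeated `get text` calls can be looped. A minimal sketch, assuming the refs come from a snapshot taken just before; `ab_get_texts` and `AB_BIN` are hypothetical names, and the tab-separated output format is a choice made here, not something agent-browser produces.

```shell
# AB_BIN is a hypothetical override (not an agent-browser setting);
# it defaults to the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Print "<ref><TAB><text>" for each ref given as an argument.
# Stops at the first ref whose text cannot be read.
ab_get_texts() {
  for ref in "$@"; do
    text="$("$AB_BIN" get text "$ref")" || return 1
    printf '%s\t%s\n' "$ref" "$text"
  done
}
```

For instance, `ab_get_texts @e5 @e6 > products.tsv` collects both values in one file.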
Advanced Features
Sessions (parallel browsers)
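--session is required when running concurrently. Without it, every run shares the default daemon and the sessions interfere with each other.
Session naming rules:
- Running in a worktree → use the worktree branch name
- Running on main → name the session after the task (e.g. qa-login, data-extract)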
agent-browser --session my-feature open site-a.com
agent-browser --session my-feature snapshot -i
agent-browser --session other-task open site-b.com
agent-browser session list
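One way to avoid forgetting the flag is a thin wrapper that refuses to run without a session name. This is a sketch only: the `ab` function and the `AB_SESSION` and `AB_BIN` variables are hypothetical names introduced here, and the wrapper uses nothing beyond the `--session <name>` form shown above.

```shell
# AB_BIN is a hypothetical override (not an agent-browser setting);
# it defaults to the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Run any agent-browser command inside the session named by AB_SESSION
# (a variable defined by this sketch). Fails when no name is set, so
# concurrent jobs never fall back to the shared default daemon.
ab() {
  if [ -z "${AB_SESSION:-}" ]; then
    echo "AB_SESSION is not set; choose a unique session name" >&2
    return 1
  fi
  "$AB_BIN" --session "$AB_SESSION" "$@"
}
```

In a worktree, the name could be derived with `AB_SESSION="$(git branch --show-current)"`; on main, set it by hand, for example `AB_SESSION=qa-login`, and then call `ab open site-a.com`.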
JSON output (for parsing)
Add --json for machine-readable output:
agent-browser snapshot -i --json
agent-browser get text @e1 --json
Debugging
agent-browser open example.com --headed
agent-browser console
agent-browser errors
Best Practices
- Pass --session for concurrent use: without --session, runs share the default session and interfere with each other. Always use a unique session name
- Always snapshot before interacting: Get fresh refs after navigation or DOM changes
- Use interactive snapshot (-i): Reduces noise, focuses on actionable elements
- Wait appropriately: Use wait --load networkidle after actions that trigger navigation
- Save auth state: Reuse login sessions with state save/load
- Take screenshots for verification: Visual confirmation of expected state
- Use semantic locators for stable tests: find role/text/label is more resilient than refs
- Close when finished: an idle daemon shuts down automatically after 15 minutes, but an explicit close is recommended to free resources sooner
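The explicit-close practice can be enforced with a helper that always closes the browser, even when a step fails. A minimal sketch: `ab_run_then_close` and `AB_BIN` are hypothetical names, and the only agent-browser command used is `close`.

```shell
# AB_BIN is a hypothetical override (not an agent-browser setting);
# it defaults to the real CLI.
AB_BIN="${AB_BIN:-agent-browser}"

# Run the given command (typically a function holding the browser steps),
# then close the browser whether or not it succeeded, and return the
# command's own exit status.
ab_run_then_close() {
  ab_rc=0
  "$@" || ab_rc=$?
  "$AB_BIN" close || true
  return "$ab_rc"
}
```

Wrapping a whole scenario, as in `ab_run_then_close my_login_test` where `my_login_test` is a function of your own, keeps a failed step from leaving a daemon idle until the 15-minute timeout.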
Technical Details
- Based on Playwright browser automation
- Supports Chromium, Firefox, and WebKit
- Uses accessibility tree for element detection
- Provides references (@e1, @e2) that stay stable until navigation or a significant DOM change
- Handles common web testing scenarios out-of-the-box