| name | browser-testing |
| description | Full browser automation via Agent Browser Protocol (ABP). Navigate, click, type, scroll, drag, screenshot, extract text, handle dialogs/downloads/file pickers, manage tabs, control JS execution. Single CLI tool. |
Browser Automation — ABP
Single tool: {baseDir}/browser.js <command> [args] [--flags]
ABP is a Chromium fork with a REST API baked into the engine. Every action is atomic — JS freezes between steps, no race conditions, no manual waits.
How ABP Works (Execution Model)
ABP pauses JavaScript and virtual time between your actions. The page is frozen until the next command.
Each action triggers a 3-phase settle cycle:
- Pre-network wait (150ms) — JS fires handlers from your action
- Network tracking — waits for triggered requests to complete (up to 1s timeout)
- Post-settle (350ms) — DOM stabilizes after network responses
One command = resume → dispatch action → settle → screenshot (if requested) → re-pause.
This means no sleep() hacks and no race conditions for ordinary actions. If a click triggers an API call, ABP waits for it automatically, up to the tracking timeout; for slower calls use network or watch.
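The three phases imply a rough time budget per action. The sketch below does that arithmetic with the default durations listed above. It assumes the phases run strictly one after another and that the tracking phase ends immediately when no request is in flight; the mapping of `--min-wait` to the pre-network phase is also an assumption, and `settleBudget` is a hypothetical helper, not part of ABP.

```javascript
// Default phase durations as listed in the settle cycle above.
const DEFAULTS = {
  preNetworkMs: 150,       // possibly tuned by --min-wait (assumed mapping)
  trackingTimeoutMs: 1000, // --tracking-timeout
  postSettleMs: 350,       // --post-settle
};

// Hypothetical helper: best- and worst-case settle time for one action.
function settleBudget(overrides = {}) {
  const t = { ...DEFAULTS, ...overrides };
  return {
    // No network activity: tracking contributes nothing.
    bestCaseMs: t.preNetworkMs + t.postSettleMs,
    // Requests never finish: tracking runs to its timeout.
    worstCaseMs: t.preNetworkMs + t.trackingTimeoutMs + t.postSettleMs,
  };
}

console.log(settleBudget()); // { bestCaseMs: 500, worstCaseMs: 1500 }
console.log(settleBudget({ trackingTimeoutMs: 3000 }).worstCaseMs); // 3500
```

Under these assumptions a default action settles in roughly 0.5 to 1.5 seconds, which is why a request slower than the tracking timeout needs network or watch.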
Setup
{baseDir}/browser.js start
{baseDir}/browser.js port
Port Management
Each project gets its own ABP instance automatically — no config needed.
The port is derived deterministically from the git root path and always falls in the range 9222–19221.
Same project = same port. Different project = different port.
Override: --port 8222 (any command) or ABP_PORT=8222 env var.
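To make "derived deterministically" concrete, here is a minimal sketch of one way a path can be mapped into 9222–19221. The hash function (FNV-1a) and the helper name `portForProject` are assumptions chosen for illustration; browser.js may hash differently and produce different numbers. Only the range and the ABP_PORT override come from the text above.

```javascript
const PORT_MIN = 9222;
const PORT_SPAN = 10000; // 9222..19221 inclusive

// 32-bit FNV-1a. Illustrative choice only, not necessarily what browser.js uses.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical helper: env override first, otherwise a stable hash of the path.
function portForProject(gitRoot, env = {}) {
  if (env.ABP_PORT) return Number(env.ABP_PORT);
  return PORT_MIN + (fnv1a(gitRoot) % PORT_SPAN);
}

const a = portForProject("/home/me/project-a");
const b = portForProject("/home/me/project-a");
console.log(a === b);                                          // true: same path, same port
console.log(portForProject("/any/path", { ABP_PORT: "8222" })); // 8222
```

Any stable hash gives the "same project = same port" property; two different projects can still collide, which is what the explicit override is for.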
Launch Options
All flags below apply to start and are forwarded to ABP:
B={baseDir}/browser.js
$B start --headless
$B start --user-data-dir /tmp/prof
$B start --profile-directory Default
$B start --user-agent "MyBot/1.0"
$B start --zoom 1.5
$B start --verbose
$B start --session-dir ./my-session
$B start --config-file ./abp.json
$B start --disable-pause
$B start --min-wait 500
$B start --tracking-timeout 3000
$B start --post-settle 1000
$B start --chrome-args --disable-gpu,--no-sandbox
Core Commands
B={baseDir}/browser.js
$B nav https://example.com
$B nav https://other.com --new
$B back
$B forward
$B reload
$B click 450 320
$B click 450 320 --right
$B click 450 320 --double
$B click 450 320 --mod CTRL
$B hover 300 200
$B scroll 640 400 --dy 500
$B scroll 640 400 --dy -300
$B scroll 640 400 --dx 200
$B drag 100 200 500 200
$B drag 100 200 500 200 --steps 20
$B type hello world
$B key ENTER
$B key TAB
$B key ESCAPE
$B key a --mod CTRL
$B key c --mod CTRL
$B key ARROWDOWN
$B key BACKSPACE
$B key a --mod CTRL --action down
$B key a --action up
$B slider 400 300 75
$B clear 400 300
$B pick "Select the login button"
$B observe
$B observe "form"
$B observe --shot
$B assert text "Welcome"
$B assert selector "#dashboard"
$B assert url "/dashboard"
$B assert title "Dashboard"
$B watch --text "Done" --timeout 30000
$B watch --selector ".loaded"
$B watch --eval "items.length > 5"
$B watch --text "Done" --shot
$B screenshot
$B fullpage
$B screenshot --markup clickable
$B screenshot --markup typeable
$B screenshot --markup none
$B screenshot --format png
$B text
$B text "h1.title"
$B eval 'document.title'
$B eval '({links: document.querySelectorAll("a").length})'
$B content
$B content https://example.com
$B cookies
Waiting & Network
$B wait 2000
$B network
network is useful after actions that trigger slow API calls — when the 1s default tracking timeout isn't enough. It re-runs the settle cycle without performing any action.
Console & Error Capture
ABP doesn't expose the DevTools console natively, but you can inject a capture hook into the page:
$B console install
$B console drain
$B console clear
Install once after navigation, then drain periodically. Captures console.*, uncaught exceptions, and unhandled promise rejections.
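The install/drain pattern can be sketched as follows. This is not the code that `console install` injects; it is a minimal illustration of the idea, with `installCapture` as a hypothetical name. It wraps the console methods only. In a real page, uncaught exceptions and unhandled rejections would additionally need listeners for the `error` and `unhandledrejection` window events.

```javascript
// Wrap each console method so calls are recorded in a buffer and still forwarded.
function installCapture(target, buffer = []) {
  for (const level of ["log", "info", "warn", "error", "debug"]) {
    const original = target[level];
    target[level] = (...args) => {
      buffer.push({ level, text: args.map(String).join(" "), ts: Date.now() });
      if (typeof original === "function") original.apply(target, args);
    };
  }
  return {
    // Return everything captured so far and empty the buffer.
    drain() {
      return buffer.splice(0, buffer.length);
    },
  };
}

// A stand-in console object so the sketch runs outside a browser.
const fakeConsole = { log() {}, info() {}, warn() {}, error() {}, debug() {} };
const capture = installCapture(fakeConsole);
fakeConsole.warn("slow response", 1200);
console.log(capture.drain().map((e) => `${e.level}: ${e.text}`)); // [ 'warn: slow response 1200' ]
```

Because the wrapper lives in the page's JS context, a navigation discards it, which is why the hook has to be installed again after each nav.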
Deep Browser Access via eval
eval gives full access to the page's JS context — any Web API, any DOM operation:
$B eval 'document.querySelectorAll("form").length'
$B eval 'document.querySelector("#app").__vue__'
$B eval 'document.querySelector("#root")._reactRootContainer'
$B eval 'JSON.stringify(Object.fromEntries(Object.entries(localStorage)))'
$B eval 'sessionStorage.getItem("auth_token")'
$B eval 'navigator.serviceWorker.getRegistrations().then(r => r.map(sw => sw.scope))'
$B eval 'indexedDB.databases().then(dbs => dbs.map(d => d.name))'
$B eval 'new Promise(r => navigator.geolocation.getCurrentPosition(p => r(p.coords)))'
$B eval 'navigator.clipboard.readText()'
$B eval '({ hidden: document.hidden, focused: document.hasFocus(), visibility: document.visibilityState })'
$B eval 'getComputedStyle(document.querySelector(".btn")).backgroundColor'
$B eval 'document.querySelector("main").getAttribute("role")'
Gotcha: eval uses global scope — const/let redeclarations fail on second call. Wrap in IIFE: (() => { ... })()
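The redeclaration failure can be reproduced outside the browser. The sketch below uses a Node.js `vm` context as a stand-in for the page's global scope; that is an analogy, not ABP itself, and `tryEval` is a hypothetical helper. Each call runs as a separate script against the same global, which is the situation the gotcha describes.

```javascript
const vm = require("node:vm");

// One shared context plays the role of the page's global scope.
const page = vm.createContext({});

// Hypothetical helper: run a snippet and report success or the error name.
function tryEval(code) {
  try {
    return { ok: true, value: vm.runInContext(code, page) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const first = tryEval("const total = 1; total");
const second = tryEval("const total = 2; total"); // same name, same global scope
const wrapped = tryEval("(() => { const total = 3; return total; })()");
const wrappedAgain = tryEval("(() => { const total = 4; return total; })()");

console.log(first.ok, second.ok, second.error); // true false SyntaxError
console.log(wrapped.value, wrappedAgain.value); // 3 4
```

The IIFE gives every call its own function scope, so the declarations never reach the shared global.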
Network Analysis
Use performance.getEntriesByType('resource') via eval to audit network requests after page load. Navigate, wait 5-8s for hydration, then eval.
$B eval "
(() => {
  const e = performance.getEntriesByType('resource');
  const apis = e.filter(r => r.name.includes('/api/'));
  return 'total=' + e.length + ' api=' + apis.length + '\\n' +
    apis.map(r => r.name.replace(/https?:\/\/[^/]+/,'').split('?')[0] +
      ' ' + Math.round(r.duration) + 'ms ' + (r.transferSize||0) + 'B').join('\\n');
})()
"
$B eval "
(() => {
  const e = performance.getEntriesByType('resource');
  const c = {};
  for (const r of e) {
    const u = r.name;
    let k = 'other';
    if (u.includes('/api/')) k = 'API';
    else if (u.includes('.js')) k = 'JS';
    else if (u.includes('.css')) k = 'CSS';
    else if (u.match(/\.(png|jpg|webp|svg|gif)/)) k = 'Images';
    else if (u.includes('.woff')) k = 'Fonts';
    if (!c[k]) c[k] = [0, 0];
    c[k][0]++;
    c[k][1] += r.transferSize || 0;
  }
  return JSON.stringify(c);
})()
"
$B eval "
(() => {
  if (window.__abpNet) return 'already installed';
  window.__abpNet = [];
  const origFetch = window.fetch;
  window.fetch = async (...args) => {
    const url = typeof args[0] === 'string' ? args[0] : args[0]?.url || '?';
    const method = args[1]?.method || 'GET';
    const start = Date.now();
    try {
      const r = await origFetch(...args);
      window.__abpNet.push({ url, method, status: r.status, ms: Date.now()-start });
      return r;
    } catch(e) {
      window.__abpNet.push({ url, method, error: e.message, ms: Date.now()-start });
      throw e;
    }
  };
  return 'installed';
})()
"
$B eval "(() => { const n = window.__abpNet || []; window.__abpNet = []; return JSON.stringify(n); })()"
Common gotchas:
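- transferSize=0 usually means the response came from the browser cache or a service worker. Cross-origin resources served without a Timing-Allow-Origin header also report 0, so a CDN-hosted asset (e.g. CF) showing 0 is not proof of a cache hit.
- Resources appear incrementally; wait 5-8s after nav before eval
- The resource timing buffer is finite (250 entries by default in current Chromium, 150 in older browsers), so large pages may truncate. Raise it early with performance.setResourceTimingBufferSize(N).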
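One more pitfall in the by-type summary: `u.includes('.js')` also matches `.json` files and extensions that only appear in a query string. A possible refinement, sketched here with hypothetical helpers `classifyResource` and `summarize`, classifies by the extension of the URL's pathname. The category names follow the summary script above; the extension lists are assumptions.

```javascript
const KINDS = {
  js: "JS", mjs: "JS",
  css: "CSS",
  png: "Images", jpg: "Images", jpeg: "Images", webp: "Images", svg: "Images", gif: "Images",
  woff: "Fonts", woff2: "Fonts",
};

// Classify one resource URL by its pathname, ignoring the query string.
function classifyResource(url) {
  let pathname;
  try {
    pathname = new URL(url).pathname;
  } catch {
    return "other"; // not an absolute URL
  }
  if (pathname.includes("/api/")) return "API";
  const ext = (pathname.match(/\.([a-z0-9]+)$/i) || [])[1]?.toLowerCase();
  return KINDS[ext] || "other";
}

// Aggregate entries shaped like PerformanceResourceTiming ({ name, transferSize }).
function summarize(entries) {
  const out = {};
  for (const { name, transferSize } of entries) {
    const kind = classifyResource(name);
    out[kind] ??= { count: 0, bytes: 0 };
    out[kind].count += 1;
    out[kind].bytes += transferSize || 0;
  }
  return out;
}

console.log(summarize([
  { name: "https://app.test/static/app.js?v=3", transferSize: 1200 },
  { name: "https://app.test/static/data.json", transferSize: 300 },
  { name: "https://app.test/api/users", transferSize: 90 },
]));
```

The two functions have no dependencies, so they could be pasted into an eval IIFE and called on `performance.getEntriesByType('resource')`.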
Tabs
$B tabs
$B tabs new https://google.com
$B tabs activate <id>
$B tabs close <id>
$B tabs info <id>
$B tabs stop <id>
Browser Events
ABP surfaces events that normally require polling — dialogs, file pickers, downloads, select dropdowns, permission prompts. They appear in the output of any action.
$B dialog
$B dialog accept
$B dialog accept "response text"
$B dialog dismiss
$B download
$B download list --state completed
$B download list --limit 5
$B download status <id>
$B download cancel <id>
$B download get <id>
$B download get <id> --max-size 1048576
$B file <chooser_id> /path/to/file.pdf
$B file <chooser_id> file1.jpg file2.jpg
$B file <chooser_id> --cancel
$B file <chooser_id> --save /path/out.pdf
$B select <select_id> 2
$B permission
$B permission grant <id>
$B permission grant <id> --lat 42.36 --lng -71.06
$B permission deny <id>
Event Indicators
When events occur during any action, they're printed automatically:
→ https://new-page.com # Navigation happened
⚠ dialog (confirm): Delete item? # Dialog appeared
📁 file chooser id=fc_1 # File picker opened
⬇ download: report.pdf # Download started
▾ select id=s_1 (5 options) # Native select opened
🔐 permission id=p_1 geolocation # Permission requested
↗ popup: https://popup.com # Popup window
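A script driving browser.js may want these indicators as structured data. The parser below is a sketch that assumes the real output matches the sample lines above exactly (minus the trailing `#` annotations, which are documentation only). The patterns are inferred from those samples, and `parseEvents` is a hypothetical helper; when the format matters, `--json` is the safer source.

```javascript
// One pattern per indicator, inferred from the sample lines above.
const PATTERNS = [
  { type: "navigation", re: /^→\s+(?<url>\S+)/u },
  { type: "dialog", re: /^⚠\uFE0F?\s+dialog \((?<kind>[^)]+)\):\s*(?<message>.*)$/u },
  { type: "file_chooser", re: /^📁\s+file chooser id=(?<id>\S+)/u },
  { type: "download", re: /^⬇\uFE0F?\s+download:\s*(?<filename>.+)$/u },
  { type: "select", re: /^▾\s+select id=(?<id>\S+) \((?<options>\d+) options\)/u },
  { type: "permission", re: /^🔐\s+permission id=(?<id>\S+)\s+(?<name>\S+)/u },
  { type: "popup", re: /^↗\uFE0F?\s+popup:\s*(?<url>\S+)/u },
];

// Scan command output line by line and collect recognized events in order.
function parseEvents(output) {
  const events = [];
  for (const raw of output.split("\n")) {
    const line = raw.trim();
    for (const { type, re } of PATTERNS) {
      const m = re.exec(line);
      if (m) {
        events.push({ type, ...m.groups });
        break;
      }
    }
  }
  return events;
}

const sample = [
  "clicked 450 320",
  "⚠ dialog (confirm): Delete item?",
  "📁 file chooser id=fc_1",
].join("\n");
console.log(parseEvents(sample));
```

The captured ids (`fc_1`, `s_1`, `p_1` in the samples) are what the file, select, and permission commands take as their first argument.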
Execution Control
ABP freezes JS between actions by default. You can control this:
$B execution
$B execution pause
$B execution resume
Session & History
$B session-data
$B status
$B history
$B history current
$B history session <id>
$B history export <id>
$B history actions
$B history action <id>
$B history events
$B history event <id>
$B history clear
npx abp-debug
Advanced
$B batch '[{"type":"mouse_click","x":350,"y":200},{"type":"keyboard_type","text":"hello"},{"type":"keyboard_press","key":"ENTER"}]'
$B shutdown
$B shutdown --timeout 10000
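Hand-writing the JSON argument for batch gets error-prone once text contains quotes. A small builder can generate it. The three action types and their fields are taken from the batch example above; the helper names (`click`, `typeText`, `press`, `batchArg`) are hypothetical, and the quoting assumes a POSIX shell.

```javascript
// Action constructors matching the shapes used in the batch example.
const click = (x, y) => ({ type: "mouse_click", x, y });
const typeText = (text) => ({ type: "keyboard_type", text });
const press = (key) => ({ type: "keyboard_press", key });

// Quote a string for a POSIX shell: wrap in single quotes, escape embedded ones.
function shellQuote(s) {
  return "'" + s.replace(/'/g, `'\\''`) + "'";
}

// Serialize a list of actions into a single shell-safe argument.
function batchArg(actions) {
  return shellQuote(JSON.stringify(actions));
}

const arg = batchArg([click(350, 200), typeText("hello"), press("ENTER")]);
console.log(`$B batch ${arg}`);
```

When spawning browser.js from Node with an argument array (for example `child_process.execFile`), pass `JSON.stringify(actions)` directly and skip the shell quoting.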
Global Flags
| Flag | Description |
|---|---|
| --tab <id> | Target specific tab (default: active) |
| --port <N> | Override port (default: auto per project) |
| --shot | Save screenshot after action (prints path) |
| --markup <types> | Screenshot markup: interactive, clickable, typeable, scrollable, grid, selected, or none |
| --format <fmt> | Screenshot format: webp (default), png, jpeg |
| --json | Output raw API response as JSON |
Speed Rules
Token Cost Per Command
| Command | ~Tokens | Use when |
|---|---|---|
| eval | 30–50 | Check DOM state, extract data, verify actions |
| assert | 20–30 | Pass/fail check (text, selector, URL) |
| text | 200–800 | Read visible page text |
| observe | 150–250 | Structured page snapshot (what's interactive) |
| screenshot | ~1500 | Need to see layout/visuals |
| content | 500–3000 | Full article extraction |
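A quick calculation shows how these costs add up over a flow. The sketch uses the midpoint of each range in the table as a rough per-command cost; the midpoints, the ten-step flow, and the `flowCost` helper are illustrative assumptions, not measurements.

```javascript
// Approximate per-command cost: midpoints of the ranges in the table above.
const COST = { eval: 40, assert: 25, text: 500, observe: 200, screenshot: 1500, content: 1750 };

// Hypothetical helper: total token cost of a sequence of commands.
function flowCost(commands) {
  return commands.reduce((sum, cmd) => sum + COST[cmd], 0);
}

// Ten steps, verifying each one with a screenshot.
const screenshotEveryStep = flowCost(Array(10).fill("screenshot"));

// Same ten steps: one observe up front, then an assert after each later step.
const observeThenAssert = flowCost(["observe", ...Array(9).fill("assert")]);

console.log(screenshotEveryStep); // 15000
console.log(observeThenAssert);   // 425
```

On these assumptions the screenshot-per-step flow costs about 35 times more than observe plus assert.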
What To Use When (Decision Tree)
"Am I on the right page?" → eval 'location.href' or eval 'document.title'
"What can I interact with?" → observe
"Did my action work?" → eval (check DOM/URL state) or assert
"What data is on the page?" → text or eval
"What does it look like?" → screenshot
"I'm lost / complex layout" → screenshot
"Waiting for async result" → watch --text "..." --timeout 30000
Default to eval/text/observe. Escalate to screenshot only when layout matters.
Observe — Structured Page Snapshot (~150 tokens)
Use observe instead of screenshot when you need to know what's on the page but don't need to see it:
$B observe
$B observe "form"
$B observe ".modal"
$B observe --shot
$B observe --json
Returns: URL, title, visible interactive elements (inputs with names/values/placeholders, buttons with text/hrefs), headings, and error messages. Only visible elements — hidden inputs and off-screen elements are filtered out. Enough to decide next action without vision.
Assert — Pass/Fail Verification (~20 tokens)
After actions, verify without vision:
$B assert text "Welcome back"
$B assert selector "#dashboard"
$B assert url "/dashboard"
$B assert title "Dashboard"
Returns ✓ PASS or ✗ FAIL: <context>. The happy path costs only a few tokens.
Watch — Wait For Async UI (~20 tokens)
Don't write polling loops. One command, one result:
$B watch --text "Payment complete" --timeout 30000
$B watch --selector ".loaded" --timeout 5000
$B watch --url "/success" --timeout 10000
$B watch --eval "document.querySelectorAll('.item').length >= 10" --timeout 5000
Resumes JS, polls internally, returns when matched or timed out.
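For readers who want the semantics spelled out, "polls internally, returns when matched or timed out" behaves roughly like the loop below. This is a generic sketch, not ABP's implementation: the `watchFor` name, the 100ms polling interval, and the result shape are all assumptions.

```javascript
// Poll `check` until it returns a truthy value or the timeout elapses.
async function watchFor(check, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    if (await check()) return { matched: true };
    if (Date.now() >= deadline) return { matched: false };
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Usage: the condition becomes true on the third poll.
let polls = 0;
watchFor(() => ++polls >= 3, { timeout: 1000, interval: 10 })
  .then((result) => console.log(result, "after", polls, "polls"));
```

The condition is always checked at least once, so something that is already true returns immediately even with a zero timeout.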
Core Speed Rules
- Start ABP first: browser.js start
- Don't screenshot every step: Only screenshot when you need to see layout/visuals.
- Use observe as default awareness: After nav or complex actions, observe gives page state in ~150 tokens vs ~1500 for a screenshot.
- Use assert for verification: After form submission, assert text "Success", not screenshot + read.
- Observe the URL after search: Most SPAs encode filters in URL params. Copy it, modify it, nav directly next time.
- Extract data via eval, not vision: One JS query extracts 10 results faster than scrolling + screenshotting.
- Batch related inputs: Click + type + Enter = one batch call instead of three.
- Use text for simple data: text is faster than eval for plain text extraction.
- Use watch instead of polling loops: When waiting for loading/async results.
- Use network for slow pages: After nav to an SPA, network waits for all pending XHR/fetch to complete.
- Use pick for ambiguity: When coordinates are unclear, let the user click.
Anti-Pattern
click → screenshot → read image → decide → click → screenshot → ...
(each step: ~3s for screenshot + LLM vision round-trip)
Fast Patterns
Blind execution (known flow):
$B nav https://app.com
$B click 450 300
$B type user@example.com
$B key TAB
$B type mysecretpassword
$B key ENTER
$B assert text "Dashboard"
Explore then act (unknown page):
$B nav https://app.com
$B observe
$B click 450 300
$B assert url "/profile"
Data extraction (scraping):
$B nav https://shop.com/products
$B eval '([...document.querySelectorAll(".product")].map(e => ({name: e.querySelector("h2").textContent, price: e.querySelector(".price").textContent})))'
Multi-page parallel (compare/verify across pages):
$B nav https://app.com/page1 --new
$B nav https://app.com/page2 --new
$B nav https://app.com/page3 --new
Async wait (payment, loading):
$B click 500 400
$B watch --text "Payment confirmed" --timeout 30000
$B screenshot