agent-browser-cli
// Use agent-browser-cli to perceive and control the supervised Chromium browser inside the sandbox, interact with pages, capture screenshots/PDFs, inspect cookies/CDP/network/console state, and troubleshoot only when needed.
| name | agent-browser-cli |
| description | Use agent-browser-cli to perceive and control the supervised Chromium browser inside the sandbox, interact with pages, capture screenshots/PDFs, inspect cookies/CDP/network/console state, and troubleshoot only when needed. |
Use agent-browser-cli when a task requires browser perception, browser control, page interaction, screenshots, PDFs, cookies, Chrome DevTools Protocol, network logs, console logs, or browser troubleshooting.
agent-browser-cli controls the sandbox's supervised Chromium through a Rust daemon and the bundled Chrome extension bridge. The browser is launched by supervisord with profile data under /data and the extension loaded from /opt/agent-browser/chrome-extension. It is not Selenium or Playwright.
Follow these rules strictly:
- Use scan for page content.
- Use snapshot for actionable element references.
- Use exec, JSON commands, or CDP only as escape hatches.
- Run snapshot after any page structure or content change before reusing @e references.
- Do not guess when an @e reference is ambiguous.
For ordinary browser tasks, run the target command directly. These commands auto-start the daemon when needed:
agent-browser-cli tabs
agent-browser-cli open https://example.com
agent-browser-cli scan --text-only
agent-browser-cli snapshot --limit 200
agent-browser-cli exec --tab <tabId> 'return document.title'
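Do not run status, doctor, or logs first unless a target command has failed, an explicit browser, extension, or connectivity error has appeared, or the user has asked for troubleshooting.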
If troubleshooting is justified, use this order:
agent-browser-cli status
agent-browser-cli doctor
agent-browser-cli logs --tail 100
daemon_not_running and running=false do not by themselves justify stopping browser work. Prefer a target command that can auto-start the daemon.
doctor only checks state. It does not auto-start the daemon or change configuration.
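A minimal shell sketch of the target-first rule above. The abc_run name and the AGENT_BROWSER_CLI override variable are hypothetical and exist only for illustration; the diagnostic order is the one listed above.

```shell
# Hypothetical wrapper: run the target command first and diagnose only on failure.
# AGENT_BROWSER_CLI is an assumed override for the binary name, not a documented setting.
abc_run() {
  cli="${AGENT_BROWSER_CLI:-agent-browser-cli}"
  if "$cli" "$@"; then
    return 0
  fi
  # The target command failed, so troubleshooting is now justified, in this order.
  "$cli" status
  "$cli" doctor
  "$cli" logs --tail 100
  return 1
}
```

With this in place, abc_run open https://example.com behaves like the plain command on success and prints diagnostics only when the command itself fails.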
Choose the smallest sufficient command:
| scan | Read page text, headings, lists, visible content, and general page state. |
| snapshot | Find clickable/fillable elements and create @e references. |
| exec | Run JavaScript when high-level commands are insufficient. |
| JSON/CDP | Handle cookies, cross-tab work, low-level browser control, or unsupported cases. |
Preferred workflow:
- Run scan or scan --text-only to understand content.
- If a stable CSS selector is known, use click or fill with that selector.
- Otherwise, run snapshot and act on @e references.
- If the page changed, run snapshot again before using @e.
- Use exec, JSON, or CDP only when high-level commands cannot express the action reliably.
Common commands:
agent-browser-cli tabs
agent-browser-cli tabtree
agent-browser-cli tabtree --full
agent-browser-cli tabtree --profile work
agent-browser-cli tabtree --tab <tabId>
agent-browser-cli lookup tab <tabId>
agent-browser-cli lookup browser <browser_id>
agent-browser-cli lookup profile work
agent-browser-cli tabs --profile work
agent-browser-cli open https://example.com
agent-browser-cli open --profile work https://example.com
agent-browser-cli open --window https://example.com
agent-browser-cli open --window --focus https://example.com
agent-browser-cli scan --tabs-only
agent-browser-cli scan --profile work --tab <tabId> --text-only
agent-browser-cli snapshot --limit 200
agent-browser-cli snapshot --offset 200 --limit 200
agent-browser-cli snapshot --details
agent-browser-cli click 'button[type=submit]'
agent-browser-cli click '@e1'
agent-browser-cli fill '@e2' 'hello'
agent-browser-cli fill '@e2' --clear
agent-browser-cli fill '@e2' ' world' --append
agent-browser-cli send-keys --target '@e2' 'Enter'
agent-browser-cli mouse-click '@e3'
agent-browser-cli close --tab <tabId>
agent-browser-cli exec --tab <tabId> 'return document.title'
After open, continue against result.opened_tab_id or result.opened_session_key when the command output provides them.
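The preferred workflow can be sketched as one shell function. The helper names and the AGENT_BROWSER_CLI override are hypothetical, and the sketch assumes snapshot accepts --tab like the other high-level actions.

```shell
# Hypothetical helpers; AGENT_BROWSER_CLI is an assumed override for the binary name.
abc() { "${AGENT_BROWSER_CLI:-agent-browser-cli}" "$@"; }

# $1 = tab id, $2 = CSS selector to click
read_click_resnapshot() {
  abc scan --tab "$1" --text-only || return 1   # understand the content first
  abc click "$2" --tab "$1" || return 1         # a stable selector needs no snapshot
  abc snapshot --tab "$1" --limit 200           # the page may have changed: refresh @e references
}
```

A call such as read_click_resnapshot <tabId> 'button[type=submit]' reads, acts, and leaves fresh @e references for the next step.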
High-level actions support --tab <tabId>.
When multiple Chrome profiles or browser instances are present, commands may also support:
--profile <profile_id-or-label>
--browser <browser_id>
tabs output includes:
browser_id
profile_id
profile_label
tab_id
session_key
Use tabtree to inspect browser -> profile -> tabs. It supports --tab, --profile, and --browser filters. Filtered output still preserves parent nodes.
Use tabtree --full when compact output omits full URLs or session_key.
Use lookup commands when identity is unclear:
agent-browser-cli lookup tab <tabId>
agent-browser-cli lookup browser <browser_id>
agent-browser-cli lookup profile <profile_id-or-label>
Profile labels:
agent-browser-cli profile-label set work --profile <profile_id>
agent-browser-cli profile-label clear --profile <profile_id>
Profile label constraints:
- Labels may use -, _, and .
- --profile or --browser.
@e references:
- Associated with a session_key.
- Created by snapshot.
- Written as @e1, @e2, etc.
- Refreshed by running snapshot again.
If --tab alone is ambiguous, add --profile or --browser. Do not guess.
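One way to make the disambiguation explicit is to build the target flags in a single place. target_flags is a hypothetical helper, and it assumes that ids and labels contain no spaces.

```shell
# Hypothetical helper: build target flags, adding --profile/--browser only when given.
# $1 = tab id, $2 = optional profile id or label, $3 = optional browser id
target_flags() {
  flags="--tab $1"
  if [ -n "${2:-}" ]; then flags="$flags --profile $2"; fi
  if [ -n "${3:-}" ]; then flags="$flags --browser $3"; fi
  printf '%s\n' "$flags"
}
```

Usage, with the expansion left unquoted on purpose so it splits into separate flags: agent-browser-cli scan $(target_flags <tabId> work) --text-only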
Keep waiting and monitoring separate.
Use --wait-js for slow page changes:
agent-browser-cli click '@e1' --wait-js 'return document.body.innerText.includes("Done")' --wait-timeout 10
Use --monitor only when before/after page diff information is useful:
agent-browser-cli click '@e1' --wait-js 'return document.body.innerText.includes("Done")' --wait-timeout 10 --monitor
--monitor is disabled by default.
Do not use fixed setTimeout sleeps inside scripts when --wait-js can express the condition.
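When the text to wait for comes from a variable, quoting the --wait-js condition by hand is error-prone. The sketch below builds the condition string; wait_js_for_text is a hypothetical helper that handles backslashes and double quotes but not newlines.

```shell
# Hypothetical helper: emit a --wait-js condition that waits for visible text.
wait_js_for_text() {
  # Escape backslashes and double quotes so the text is a valid JS string literal.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf 'return document.body.innerText.includes("%s")\n' "$escaped"
}
```

Usage: agent-browser-cli click '@e1' --wait-js "$(wait_js_for_text Done)" --wait-timeout 10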
The extension does not permanently rewrite business-page alert, confirm, or prompt.
During a CLI page command, native dialogs may be temporarily suppressed and then restored. If native dialog behavior appears wrong, ask the user to reload the Chrome extension and the page before deeper troubleshooting.
Fixed API port:
127.0.0.1:18767
Default extension WebSocket port:
127.0.0.1:18765
Configuration file:
~/.agent-browser-cli/config.json
Changing the extension port affects both the Chrome extension and the daemon connection. Before running this command, explain the impact and get explicit user confirmation:
agent-browser-cli set-extension-port <port>
This writes to ~/.agent-browser-cli/config.json. If the daemon is running, it restarts.
Use network and console commands only when the task requires request or log inspection.
The extension must continuously listen to CDP events for these features. After extension modification or upgrade, require the user to reload the Chrome extension before relying on network or console capture.
Network commands:
agent-browser-cli network start --tab <tabId>
agent-browser-cli network list --tab <tabId>
agent-browser-cli network list --tab <tabId> --filter api
agent-browser-cli network detail <requestId> --tab <tabId>
agent-browser-cli network clear --tab <tabId>
agent-browser-cli network stop --tab <tabId>
Console commands:
agent-browser-cli console start --tab <tabId>
agent-browser-cli console list --tab <tabId>
agent-browser-cli console list --tab <tabId> --level error
agent-browser-cli console clear --tab <tabId>
agent-browser-cli console stop --tab <tabId>
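Because network stop also clears the request cache, a capture has to list requests before it stops. A minimal sketch of that ordering follows; the helper names and the AGENT_BROWSER_CLI override are hypothetical.

```shell
# Hypothetical helpers; AGENT_BROWSER_CLI is an assumed override for the binary name.
abc() { "${AGENT_BROWSER_CLI:-agent-browser-cli}" "$@"; }

# $1 = tab id, remaining arguments = the command to observe
with_network_capture() {
  tab="$1"; shift
  abc network start --tab "$tab" || return 1
  "$@"                                        # the action under observation
  abc network list --tab "$tab" --filter api  # read results while the cache still exists
  abc network stop --tab "$tab"               # stop listening; this clears the cache
}
```

Usage: with_network_capture <tabId> agent-browser-cli click '@e1' --tab <tabId>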
Network and console constraints:
- network detail may truncate large response bodies and mark base64Encoded.
- network clear clears request cache.
- network stop stops listening and clears request cache.
- console clear clears log cache.
- console stop stops listening and clears log cache.
- agent-browser-cli stop and daemon idle exit also clear daemon-side snapshot and @e caches and notify the extension to clear network and console debug caches.
When opening new tabs for separate work streams, use Chrome native tab groups through --session or --group-title.
Use a new window only when isolation is useful:
agent-browser-cli open https://example.com --session research
agent-browser-cli open https://example.com --group-title "TaskA"
agent-browser-cli open --window https://example.com
agent-browser-cli open --window --focus https://example.com
agent-browser-cli open --profile work https://example.com
Rules:
- --session and --group-title both set the tab group title.
- If both are provided, --group-title wins.
- --window does not focus the new window unless --focus is also provided.
- --window --group-title and --window --session add the first tab in the new window to the group.
Always let the CLI write screenshots and PDFs to files. Do not paste base64 into the conversation.
Screenshots:
agent-browser-cli screenshot --out /tmp/page.png
agent-browser-cli screenshot --full-page --out /tmp/full.png
agent-browser-cli screenshot --target '@e1' --out /tmp/button.png
agent-browser-cli screenshot --selector 'button[type=submit]' --format jpeg --quality 70 --out /tmp/button.jpg
Screenshot rules:
- --full-page captures the full page.
- Capture a single element with --target or the compatible alias --selector.
- The target can be an @e reference or a CSS selector.
- Without --out, files are written under /tmp/agent-browser-cli-screenshots/.
PDFs:
agent-browser-cli save-pdf --out /tmp/page.pdf
agent-browser-cli save-pdf --paper a4 --landscape --scale 0.9 --out /tmp/page.pdf
PDF rules:
- Defaults are paper=a4, scale=1.0, and print-background=true.
- Use --no-print-background only when backgrounds should be disabled.
- Without --out, files are written under /tmp/agent-browser-cli-pdfs/.
If high-level capture commands fail, use exec and CDP only if the script writes the artifact to disk. Still do not paste base64.
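To keep artifacts on disk and report only their paths, an explicit --out path can be generated per capture. artifact_path, ARTIFACT_DIR, and its default directory are assumptions made for this sketch, not CLI defaults.

```shell
# Hypothetical helper: build a timestamped output path for --out.
# $1 = label, $2 = file extension
artifact_path() {
  dir="${ARTIFACT_DIR:-/tmp/agent-artifacts}"
  mkdir -p "$dir" || return 1
  printf '%s/%s-%s.%s\n' "$dir" "$1" "$(date +%Y%m%d-%H%M%S)" "$2"
}
```

Usage: agent-browser-cli screenshot --full-page --out "$(artifact_path page png)", after which the path can be reported instead of the bytes.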
Use inline exec only for small scripts:
agent-browser-cli exec --tab <tabId> 'return document.title'
For complex JavaScript, write a temporary script file and run:
agent-browser-cli exec --tab <tabId> --file /tmp/script.js
Execution rules:
- Prefer --wait-js over fixed delays.
- When a script uses await, explicitly return the final value or the result may be null.
Example:
agent-browser-cli exec --tab <tabId> 'document.querySelector("button").click()' --wait-js 'return document.body.innerText.includes("Done")' --wait-timeout 3
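For longer scripts, the --file form can be wrapped so the temporary file is always cleaned up. The helper names, the temporary file template, and the AGENT_BROWSER_CLI override are hypothetical; the embedded script illustrates the explicit return after await.

```shell
# Hypothetical helpers; AGENT_BROWSER_CLI is an assumed override for the binary name.
abc() { "${AGENT_BROWSER_CLI:-agent-browser-cli}" "$@"; }

# $1 = tab id
run_script_file() {
  script=$(mktemp /tmp/abc-script.XXXXXX) || return 1
  cat > "$script" <<'EOF'
// Wait for fonts to settle, then return the value explicitly.
// Without the final return, a script that uses await may yield null.
await document.fonts.ready;
return document.title;
EOF
  abc exec --tab "$1" --file "$script"
  rc=$?
  rm -f "$script"   # remove the temporary script whether or not exec succeeded
  return "$rc"
}
```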
Use JSON commands for cookies, cross-tab work, CDP, extension management, or browser content permissions when high-level commands are insufficient.
Examples:
agent-browser-cli exec '{"cmd":"tabs"}'
agent-browser-cli exec '{"cmd":"cookies"}'
agent-browser-cli exec '{"cmd":"cdp","tabId":303987837,"method":"Page.captureScreenshot","params":{"format":"png"}}'
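Hand-written JSON commands are easy to misquote. The sketch below assembles a cdp command in the shape shown above; cdp_json is a hypothetical helper, and Page.reload with ignoreCache is used only as a CDP method that returns no large payload.

```shell
# Hypothetical helper: print a JSON cdp command for agent-browser-cli exec.
# $1 = numeric tab id, $2 = CDP method, $3 = params as a JSON object (optional)
cdp_json() {
  params="${3:-}"
  if [ -z "$params" ]; then params='{}'; fi
  # %d rejects a non-numeric tab id instead of emitting broken JSON.
  printf '{"cmd":"cdp","tabId":%d,"method":"%s","params":%s}\n' "$1" "$2" "$params"
}
```

Usage: agent-browser-cli exec "$(cdp_json <tabId> Page.reload '{"ignoreCache":true}')"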
CDP interaction rules:
- Send mouse events in order: mouseMoved, mousePressed, mouseReleased.
- Send mouseMoved(0,0) when needed.
- Use CDP interaction only when high-level commands, @e, or DOM JavaScript are inadequate.
Prefer the browser DataTransfer API for file upload simulation. Do not prefer CDP DOM.setFileInputFiles unless the page blocks normal DOM assignment.
const input = document.querySelector('input[type=file]');
const file = new File(['content'], 'demo.txt', { type: 'text/plain' });
const dt = new DataTransfer();
dt.items.add(file);
input.files = dt.files;
input.dispatchEvent(new Event('input', { bubbles: true }));
input.dispatchEvent(new Event('change', { bubbles: true }));
return input.files.length;
Enter troubleshooting only after a failed target command, an explicit browser/extension/connectivity error, or a direct user request.
Allowed lightweight troubleshooting commands:
agent-browser-cli status
agent-browser-cli doctor
agent-browser-cli logs --tail 100
agent-browser-cli restart
agent-browser-cli stop
agent-browser-cli tabs
Interpret status carefully:
- ok=true means the status command executed successfully.
- Judge the actual state from healthy and summary.
- daemon_not_running before a target command is usually normal.
Typical status:
{ "summary": "daemon_not_running", "running": false }
During normal browser work, first run a target command such as tabs, open, exec, or scan so the daemon can auto-start.
Use restart only when auto-start fails or explicit troubleshooting is requested:
agent-browser-cli restart
agent-browser-cli status
Typical status:
{ "summary": "extension_not_connected", "connection": { "extension_connected": false } }
Handle in this order:
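- Confirm the /opt/agent-browser/chrome-extension extension is loaded.
- Compare the extension port with the configured_port shown by doctor.
- Reload the extension in chrome://extensions.
- Check the result again:
agent-browser-cli status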
Typical status:
{ "summary": "no_active_tabs", "connection": { "active_tabs": 0 } }
Open or ask the user to open a normal http or https page. Do not rely on about:blank, chrome:// pages, or extension pages.
Typical status:
{ "summary": "port_mismatch" }
First try:
agent-browser-cli restart
agent-browser-cli doctor
If changing the extension port is necessary, explain that it writes ~/.agent-browser-cli/config.json, affects the Chrome extension and daemon connection, and may restart the daemon. Get explicit confirmation before:
agent-browser-cli set-extension-port <port>
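The four status cases above can be summarized as a lookup from the summary field to the next step. next_step is a hypothetical helper, and the fallback to logs for unknown summaries is an assumption of this sketch.

```shell
# Hypothetical helper: map a status summary to the next step described above.
next_step() {
  case "$1" in
    daemon_not_running)      echo "run a target command (tabs, open, exec, scan) so the daemon auto-starts" ;;
    extension_not_connected) echo "check that the extension is loaded and its port matches doctor" ;;
    no_active_tabs)          echo "open a normal http or https page" ;;
    port_mismatch)           echo "run restart, then doctor; change the port only with user confirmation" ;;
    *)                       echo "read logs --tail 100" ;;   # assumed fallback
  esac
}
```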
Daemon log file:
~/.agent-browser-cli/logs/daemon.log
Read logs through the CLI. By default, logs shows the most recent 100 lines:
agent-browser-cli logs
agent-browser-cli logs --tail 200
logs outputs plain text. It does not output JSON and does not support --follow.
agent-browser-cli restart
agent-browser-cli stop
restart reloads configuration and starts the daemon.
stop stops only the daemon.
When using this skill in an agent response: