| name | firefox-browser |
| description | Control the user's Firefox browser with their logins and cookies intact. Use when you need to browse websites as the user, interact with authenticated pages, fill forms, click buttons, take screenshots, or get page content. (user) |
| allowed-tools | Bash, Read, Write |
Firefox Browser Agent Bridge
Control the user's actual Firefox browser session via WebSocket. This uses their real browser with existing logins and cookies - not a headless browser.
Quick Start
nohup firefox &>/dev/null &
browser ping
browser listTabs '{}'
browser newSession '{"url": "https://example.com"}'
browser getContent '{"format": "annotated"}'
Client Usage
browser <action> '<json_params>'
browser --timing <action> '<json_params>'
browser --record /path/to/dir <action> '<json_params>'
Actions Reference
Session & Tab Management
| Action | Description | Key Params |
|---|
listTabs | List all open tabs across windows | - |
newSession | Create new tab to work in | url (optional), sandbox (private window) |
setActiveTab | Switch which tab agent works on | tabId, focus |
getActiveTab | Get current tab info | - |
Navigation & Page Info
| Action | Description | Key Params |
|---|
navigate | Go to URL in current tab | url, wait, newTab |
getContent | Get page content | format: annotated, text, html |
getInteractables | List clickable elements and inputs | selector (optional scope) |
screenshot | Capture visible area as PNG | filename (optional) |
reload | Reload current tab | - |
Interaction
| Action | Description | Key Params |
|---|
click | Click element | selector, text, or x/y coords |
type | Type into focused/selected input | selector, text, submit, clear |
fillForm | Fill form fields (inputs, textareas, selects) | fields[] array with selector/value |
scroll | Scroll the page or an element | y/x, selector, position |
waitFor | Wait for element/text | selector, text, timeout |
Rich Text Editors & File Operations
| Action | Description | Key Params |
|---|
uploadFile | Upload file to <input type="file"> | selector, path (local file path) |
dropFile | Drag-and-drop file onto element | selector, path (local file path) |
evaluate | Run JavaScript in page context | script, pageWorld (bool) |
uploadFile - Upload Files to Input Elements
browser uploadFile '{"selector": "#fileInput", "path": "/home/user/doc.pdf"}'
The CLI reads the file, base64-encodes it, and sends it via fillForm internally. Works with any <input type="file"> including hidden ones.
dropFile - Drag-and-Drop Files
browser dropFile '{"selector": "#dropZone", "path": "/home/user/image.png"}'
Simulates a native file drop. Creates a DataTransfer with the file and dispatches dragenter → dragover → drop events in the page world. Works with drop zones, contenteditable editors, and any element with drop handlers.
evaluate - Run JavaScript on the Page
browser evaluate '{"script": "return document.title"}'
browser evaluate '{"script": "return window.someAppState", "pageWorld": true}'
Use pageWorld: true when you need to interact with the page's own JavaScript context (React state, app globals, etc). Default runs in content script context.
fillForm - The Right Way to Fill Forms
IMPORTANT: There is no fill command. Use fillForm with a fields array:
browser fillForm '{"fields": [{"selector": "#email", "value": "test@example.com"}]}'
browser fillForm '{"fields": [
{"selector": "#name", "value": "John Doe"},
{"selector": "#email", "value": "john@example.com"},
{"selector": "#subject", "value": "support"},
{"selector": "#message", "value": "Hello world"}
]}'
Works with: <input>, <textarea>, <select>, checkboxes, radio buttons, contenteditable elements, rich text editors (Draft.js, Lexical, TinyMCE, ProseMirror).
For rich text editors, fillForm automatically detects the editor type and uses the appropriate insertion method (execCommand, InputEvent, or direct DOM manipulation in page world).
Control Flow
| Action | Description | Key Params |
|---|
fork | Duplicate tab into multiple paths | paths[] with name + commands |
killFork | Close a fork | fork (name) |
listForks | List active forks | - |
tryUntil | Try alternatives until one succeeds | alternatives[], timeout |
parallel | Run commands on multiple URLs | branches[] with url + commands |
batch | Run multiple commands in sequence | commands[] |
Authentication & Vault
| Action | Description | Key Params |
|---|
autoLogin | Auto-fill credentials from Bitwarden vault and optionally submit | domain, submit (default false) |
vaultStatus | Check vault lock state and credential count | - |
vaultSync | Re-sync vault from Bitwarden server via API key | - |
getAuthContext | Detect login pages, available accounts | - |
requestAuth | Request user approval for auth | reason |
Recommended Workflow
1. Start by Inspecting Available Tabs
browser listTabs '{}'
Returns:
{
"activeTabId": 123,
"windows": [
{
"windowId": 1,
"focused": true,
"tabs": [
{"tabId": 123, "url": "https://...", "title": "...", "active": true}
]
}
],
"totalTabs": 5
}
2. Start Fresh or Pick Existing Tab
browser newSession '{"url": "https://amazon.com"}'
browser newSession '{"url": "https://example.com", "sandbox": true}'
browser setActiveTab '{"tabId": 456}'
3. Read Page with Annotated Format (Recommended)
browser getContent '{"format": "annotated"}'
Returns content with interactive elements marked inline:
Product Name Here
$4.99
[button: "Add to cart" | selector: #add-btn]
[input:text: "search" | value: "" | selector: #search-box]
[link: "View details" | href: /product/123 | selector: a.details-link]
This shows what's clickable and where it is in context.
4. Handle Login Pages
If you land on a login page or get redirected to one, use autoLogin to fill credentials automatically from the Bitwarden vault:
browser autoLogin '{"domain": "github.com", "submit": true}'
This looks up the domain in the vault, fills the username/password, and optionally submits. You don't need to find or interact with form fields manually. After login, use getContent to verify you're logged in.
5. Interact Using Selectors
browser click '{"selector": "#add-btn"}'
browser click '{"text": "Add to cart"}'
browser type '{"selector": "#search-box", "text": "query", "submit": true}'
Fork: Speculative Parallel Execution
When you're not sure which path is right, fork the tab and try both:
browser fork '{
"paths": [
{
"name": "google-auth",
"commands": [{"action": "click", "params": {"text": "Sign in with Google"}}]
},
{
"name": "email-auth",
"commands": [{"action": "click", "params": {"text": "Sign in with Email"}}]
}
]
}'
Returns:
{
"forked": true,
"sourceTabId": 123,
"forks": [
{"name": "google-auth", "tabId": 456, "url": "...", "commandResults": [...]},
{"name": "email-auth", "tabId": 789, "url": "...", "commandResults": [...]}
]
}
Work on specific fork:
browser getContent '{"format": "annotated", "fork": "google-auth"}'
browser click '{"text": "Continue", "fork": "google-auth"}'
Kill the wrong path:
browser killFork '{"fork": "email-auth"}'
TryUntil: Handle Uncertain UI
When the exact button varies (cookie banners, A/B tests):
browser tryUntil '{
"alternatives": [
{"action": "click", "params": {"selector": "#accept-cookies"}},
{"action": "click", "params": {"text": "Accept All"}},
{"action": "click", "params": {"selector": ".cookie-dismiss"}}
],
"timeout": 3000
}'
Tries each until one succeeds.
Parallel: Multiple URLs at Once
Compare prices across sites:
browser parallel '{
"branches": [
{"url": "https://amazon.com/product", "commands": [{"action": "getContent", "params": {"format": "text"}}]},
{"url": "https://walmart.com/product", "commands": [{"action": "getContent", "params": {"format": "text"}}]}
]
}'
Authentication (Autonomous Login)
The bridge integrates with a Bitwarden vault (via bronzewarden) for fully autonomous credential fill. No human interaction needed.
Initial Setup
1. Install bronzewarden (local Bitwarden vault client):
cd ~/projects/bronzewarden
cargo install --path .
2. Log in to Bitwarden:
bronzewarden login -e you@example.com
export BW_CLIENTID="user.xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
export BW_CLIENTSECRET="your_client_secret"
bronzewarden login --apikey
3. Sync the vault:
bronzewarden sync
This downloads and caches your encrypted vault locally at ~/.config/bronzewarden/vault.json.
4. Set up passwordless unlock (so the native host can unlock the vault automatically):
bronzewarden setup-fingerprint
This prompts for your master password one last time, derives the decryption key, and stores it in ~/.config/bronzewarden/protected_key.json. After this, the native host unlocks the vault at startup without any password prompt.
Alternative: password file unlock (if you don't want to use setup-fingerprint):
echo "your_master_password" > ~/.config/bronzewarden/master_password
chmod 600 ~/.config/bronzewarden/master_password
export BW_PASSWORD_FILE=~/.config/bronzewarden/master_password
Add the BW_PASSWORD_FILE export to native-host-wrapper.sh so it's available when Firefox launches the host.
5. Set up API key for vault sync (needed for browser vaultSync):
echo "user.xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" > ~/.config/bronzewarden/client_id
echo "your_client_secret" > ~/.config/bronzewarden/client_secret
chmod 600 ~/.config/bronzewarden/client_id ~/.config/bronzewarden/client_secret
Or set environment variables BW_CLIENT_ID and BW_CLIENT_SECRET in native-host-wrapper.sh.
6. Verify it works:
browser ping
browser vaultStatus '{}'
Fingerprint Verification (Optional Security)
By default, autoLogin requires fingerprint verification before filling credentials. This is controlled by:
export FAB_AUTOLOGIN_REQUIRE_FINGERPRINT=true
export FAB_AUTOLOGIN_REQUIRE_FINGERPRINT=false
When enabled, a desktop notification prompts you to touch the fingerprint sensor before credentials are filled. Requires fprintd to be installed and a fingerprint enrolled.
Auto-Login Flow
browser navigate '{"url": "https://github.com"}'
browser autoLogin '{"domain": "github.com", "submit": false}'
browser autoLogin '{"domain": "github.com", "submit": true}'
Vault Management
browser vaultStatus '{}'
browser vaultSync '{}'
How It Works
- Credentials are stored in Bitwarden and decrypted locally by the native host
- The
autoLogin action sends credentials directly to the extension via the native messaging channel (never over WebSocket)
- Vault is auto-unlocked at host startup using a protected key (from
setup-fingerprint) or a password file (BW_PASSWORD_FILE)
- Credential lookup priority:
BW_PASSWORD env var → BW_PASSWORD_FILE → protected key → gnome-keyring (deprecated)
- API credential lookup priority:
BW_CLIENT_ID/BW_CLIENT_SECRET env vars → ~/.config/bronzewarden/client_id/client_secret files
Legacy Auth Detection
browser getAuthContext '{}'
Evaluate: Run JavaScript and Get Results
Execute arbitrary JavaScript in the page context and get the result back:
browser evaluate '{"script": "return document.title"}'
browser evaluate '{"script": "return document.querySelectorAll(\"input\").length"}'
browser evaluate '{"script": "return document.querySelector(\"#email\").value"}'
browser evaluate '{"script": "return Array.from(document.querySelectorAll(\"input:checked\")).map(el => el.value)"}'
Note: Use return to get a value back. The script runs in page context with full DOM access.
Scroll: Navigate Long Pages
Scroll the page by pixels, to elements, or to positions:
browser scroll '{"y": 500}'
browser scroll '{"y": -300}'
browser scroll '{"selector": "#section-5"}'
browser scroll '{"position": "top"}'
browser scroll '{"position": "bottom"}'
browser scroll '{"y": 500, "behavior": "smooth"}'
browser scroll '{"scrollTo": {"x": 0, "y": 1000}}'
Form State in Annotated Content
The getContent annotated format now shows form element states:
browser getContent '{"format": "annotated"}'
Output includes checked/selected states:
[input:radio: "Option A" | checked: true | selector: #opt-a]
[input:radio: "Option B" | checked: false | selector: #opt-b]
[input:checkbox: "Remember me" | checked: true | selector: #remember]
[select: "Country" | selected: "United States" | selector: #country]
[input:text: "Email" | value: "user@example.com" | selector: #email]
This is useful for verifying form state without screenshots.
Isolated Sessions (for Parallel Execution)
When running multiple tasks in parallel, use tabId to avoid conflicts:
browser newSession '{"url": "https://example.com"}'
browser navigate '{"url": "https://example.com/page", "tabId": 15}'
browser getContent '{"format": "annotated", "tabId": 15}'
browser click '{"selector": "#btn", "tabId": 15}'
browser type '{"selector": "#input", "text": "hello", "tabId": 15}'
This lets multiple agents work in parallel without stepping on each other.
Tips
- Start with
listTabs to see what's open
- Use
newSession for a clean start
- Use
autoLogin when you hit a login page - don't try to fill login forms manually, just browser autoLogin '{"domain": "example.com", "submit": true}'
- Use
tabId for parallel/isolated execution
- Use
annotated format - shows content + clickable elements together
- Use selectors from annotated output - more reliable than text matching
- Fork when uncertain - try multiple paths, kill the wrong ones
- Never use
sleep commands - browser commands are synchronous and wait for completion. Use waitFor action if you need to wait for specific elements or text to appear
- Use
uploadFile for file inputs - reads local files and uploads automatically
- Use
dropFile for drop zones - simulates native drag-and-drop
- Use
evaluate for custom JS - with pageWorld: true for page-context access
Troubleshooting
- Firefox not running? Start it:
nohup firefox &>/dev/null &
- Check connection:
browser ping
- Connection refused? The extension may need to be reloaded in
about:debugging
- Element not found? Use
browser getContent '{"format": "annotated"}' to see what's on the page
- Rich text editor not filling?
fillForm handles Draft.js, Lexical, TinyMCE, ProseMirror automatically
- File drop not working?
dropFile runs in page world to bypass Firefox security restrictions