| name | firefox-browser |
| description | Control the user's Firefox browser with their logins and cookies intact. Use when you need to browse websites as the user, interact with authenticated pages, fill forms, click buttons, take screenshots, or get page content. (user) |
| allowed-tools | Bash, Read, Write |
Firefox Browser Agent Bridge
Control the user's actual Firefox browser session via WebSocket. This uses their real browser, with existing logins and cookies, not a headless browser.
Quick Start
nohup firefox &>/dev/null &
browser ping
browser listTabs '{}'
browser newSession '{"url": "https://example.com"}'
browser getContent '{"format": "annotated"}'
Client Usage
browser <action> '<json_params>'
Actions Reference
Session & Tab Management
| Action | Description | Key Params |
|---|---|---|
| listTabs | List all open tabs across windows | - |
| newSession | Create new tab to work in | url (optional) |
| setActiveTab | Switch which tab agent works on | tabId, focus |
| getActiveTab | Get current tab info | - |
Navigation & Page Info
| Action | Description | Key Params |
|---|---|---|
| navigate | Go to URL in current tab | url, wait, newTab |
| getContent | Get page content | format: annotated, text, html |
| getInteractables | List clickable elements and inputs | selector (optional scope) |
| screenshot | Capture visible area as PNG | filename (optional) |
Interaction
| Action | Description | Key Params |
|---|---|---|
| click | Click element | selector, text, or x/y coords |
| type | Type into input | selector, text, submit, clear |
| fillForm | Fill multiple fields | fields[] with selector/value pairs |
| waitFor | Wait for element/text | selector, text, timeout |
| scroll | Scroll the page | y/x, selector, position |
| evaluate | Execute JavaScript and return result | script |
Control Flow
| Action | Description | Key Params |
|---|---|---|
| fork | Duplicate tab into multiple paths | paths[] with name + commands |
| killFork | Close a fork | fork (name) |
| listForks | List active forks | - |
| tryUntil | Try alternatives until one succeeds | alternatives[], timeout |
| parallel | Run commands on multiple URLs | branches[] with url + commands |
Cross-Origin Iframe Support
Many login pages (Apple, Google, Microsoft) embed their sign-in forms inside cross-origin iframes. Standard click/type/getContent only target the main frame by default. Use these to work inside iframes:
| Action | Description | Key Params |
|---|---|---|
| listFrames | List all frames in current tab with their URLs, inputs, and clickable elements | - |
| Any action | Target a specific frame | Add "frameId": N to params |
| Any action | Try all frames | Add "allFrames": true to params |
Workflow for login iframes (Apple, Google, etc.):
browser listFrames '{}'
browser getContent '{"frameId": 3, "format": "annotated"}'
browser click '{"selector": "#account_name_text_field", "frameId": 3}'
browser type '{"selector": "#account_name_text_field", "text": "user@example.com", "frameId": 3}'
browser click '{"selector": "#account_name_text_field", "allFrames": true}'
browser type '{"selector": "#account_name_text_field", "text": "user@example.com", "allFrames": true}'
Authentication & Vault
| Action | Description | Key Params |
|---|---|---|
| autoLogin | Auto-fill credentials from Bitwarden vault and optionally submit | domain, submit (default false) |
| vaultStatus | Check vault lock state and credential count | - |
| vaultSync | Re-sync vault from Bitwarden server via API key | - |
| getAuthContext | Detect login pages, available accounts | - |
| requestAuth | Request user approval for auth | reason |
Rich Text Editors (ProseMirror, Lexical, Slate, etc.)
The type and fillForm actions automatically handle rich text editors (ProseMirror, Lexical/Reddit, Draft.js, Tiptap, Slate, CKEditor, Quill). They use document.execCommand("insertText") in the page world, which works with any contenteditable-based editor.
browser type '{"selector": "div[contenteditable=true]", "text": "Hello world!"}'
browser type '{"selector": ".ProseMirror", "text": "New content", "clear": true}'
browser fillForm '{"fields": [
  {"selector": "#title", "value": "My Title"},
  {"selector": "#body .ProseMirror", "value": "Article body text"}
]}'
No special handling is needed: use type or fillForm as normal. Both fall back to textContent assignment if execCommand isn't available.
1. Start by Inspecting Available Tabs
browser listTabs '{}'
Returns:
{
  "activeTabId": 123,
  "windows": [
    {
      "windowId": 1,
      "focused": true,
      "tabs": [
        {"tabId": 123, "url": "https://...", "title": "...", "active": true}
      ]
    }
  ],
  "totalTabs": 5
}
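Later steps often need the activeTabId from this response. The following is a minimal sketch, assuming the browser client prints the JSON above to stdout (this document does not say so explicitly). active_tab_id is a hypothetical helper, not part of the bridge, and it pattern-matches the field rather than parsing JSON properly:

```shell
# Hypothetical helper (not part of the bridge): read listTabs JSON on stdin
# and print the numeric activeTabId. This is a pattern match, not a JSON parser.
active_tab_id() {
  sed -n 's/.*"activeTabId"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Under that assumption, browser listTabs '{}' | active_tab_id would print 123 for the response shown above. A real JSON tool is a safer choice if one is available.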
2. Start Fresh or Pick Existing Tab
browser newSession '{"url": "https://amazon.com"}'
browser setActiveTab '{"tabId": 456}'
3. Read Page with Annotated Format (Recommended)
browser getContent '{"format": "annotated"}'
Returns content with interactive elements marked inline:
Product Name Here
$4.99
[button: "Add to cart" | selector: #add-btn]
[input:text: "search" | value: "" | selector: #search-box]
[link: "View details" | href: /product/123 | selector: a.details-link]
This shows what's clickable and where it is in context.
4. Interact Using Selectors
browser click '{"selector": "#add-btn"}'
browser click '{"text": "Add to cart"}'
browser type '{"selector": "#search-box", "text": "query", "submit": true}'
Fork: Speculative Parallel Execution
When you're not sure which path is right, fork the tab and try both:
browser fork '{
  "paths": [
    {
      "name": "google-auth",
      "commands": [{"action": "click", "params": {"text": "Sign in with Google"}}]
    },
    {
      "name": "email-auth",
      "commands": [{"action": "click", "params": {"text": "Sign in with Email"}}]
    }
  ]
}'
Returns:
{
  "forked": true,
  "sourceTabId": 123,
  "forks": [
    {"name": "google-auth", "tabId": 456, "url": "...", "commandResults": [...]},
    {"name": "email-auth", "tabId": 789, "url": "...", "commandResults": [...]}
  ]
}
Work on specific fork:
browser getContent '{"format": "annotated", "fork": "google-auth"}'
browser click '{"text": "Continue", "fork": "google-auth"}'
Kill the wrong path:
browser killFork '{"fork": "email-auth"}'
TryUntil: Handle Uncertain UI
When the exact button varies (cookie banners, A/B tests):
browser tryUntil '{
  "alternatives": [
    {"action": "click", "params": {"selector": "#accept-cookies"}},
    {"action": "click", "params": {"text": "Accept All"}},
    {"action": "click", "params": {"selector": ".cookie-dismiss"}}
  ],
  "timeout": 3000
}'
Tries each until one succeeds.
Parallel: Multiple URLs at Once
Compare prices across sites:
browser parallel '{
  "branches": [
    {"url": "https://amazon.com/product", "commands": [{"action": "getContent", "params": {"format": "text"}}]},
    {"url": "https://walmart.com/product", "commands": [{"action": "getContent", "params": {"format": "text"}}]}
  ]
}'
Authentication (Autonomous Login)
The bridge integrates with a Bitwarden vault (via bronzewarden) for fully autonomous credential fill. No human interaction needed.
Auto-Login Flow
browser navigate '{"url": "https://github.com"}'
browser autoLogin '{"domain": "github.com", "submit": false}'
browser autoLogin '{"domain": "github.com", "submit": true}'
Vault Management
browser vaultStatus '{}'
browser vaultSync '{}'
How It Works
- Credentials are stored in Bitwarden and decrypted locally by the native host
- The autoLogin action sends credentials directly to the extension via the native messaging channel (never over WebSocket)
- Vault is auto-unlocked at host startup using a master password from the system keyring
Legacy Auth Detection
browser getAuthContext '{}'
Evaluate: Run JavaScript and Get Results
Execute arbitrary JavaScript in the page context and get the result back:
browser evaluate '{"script": "return document.title"}'
browser evaluate '{"script": "return document.querySelectorAll(\"input\").length"}'
browser evaluate '{"script": "return document.querySelector(\"#email\").value"}'
browser evaluate '{"script": "return Array.from(document.querySelectorAll(\"input:checked\")).map(el => el.value)"}'
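The examples above escape each inner double quote by hand, which gets error-prone for longer scripts. Here is a minimal sketch of building the params string instead. json_escape and evaluate_params are hypothetical helpers, not part of the bridge, and they handle only backslashes and double quotes (multi-line scripts and control characters are outside this sketch):

```shell
# Hypothetical helper: escape a single-line string for use inside a JSON
# string literal. Only backslashes and double quotes are handled here.
json_escape() {
  # Backslashes are escaped first so the quote escapes are not doubled.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the params object for the evaluate action
# from raw, single-line JavaScript.
evaluate_params() {
  printf '{"script": "%s"}' "$(json_escape "$1")"
}
```

With these defined, browser evaluate "$(evaluate_params 'return document.querySelector("#email").value')" sends the same params as the hand-escaped command above.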
Page World Evaluation
By default, evaluate runs in the content script's isolated world. To access page-level JavaScript variables (e.g., framework state, global objects set by the page), use pageWorld: true:
browser evaluate '{"script": "return window.__NEXT_DATA__", "pageWorld": true}'
browser evaluate '{"script": "return Object.keys(window.__prosemirrorViews || {})", "pageWorld": true}'
browser evaluate '{"script": "return window.myApp.getState()", "pageWorld": true}'
Note: Use return to get a value back. The script runs with full DOM access. Use pageWorld: true when you need to access variables set by the page's own JavaScript.
Scroll: Navigate Long Pages
Scroll the page by pixels, to elements, or to positions:
browser scroll '{"y": 500}'
browser scroll '{"y": -300}'
browser scroll '{"selector": "#section-5"}'
browser scroll '{"position": "top"}'
browser scroll '{"position": "bottom"}'
browser scroll '{"y": 500, "behavior": "smooth"}'
browser scroll '{"scrollTo": {"x": 0, "y": 1000}}'
Form State in Annotated Content
The annotated format of getContent also shows form element states:
browser getContent '{"format": "annotated"}'
Output includes checked/selected states:
[input:radio: "Option A" | checked: true | selector: #opt-a]
[input:radio: "Option B" | checked: false | selector: #opt-b]
[input:checkbox: "Remember me" | checked: true | selector: #remember]
[select: "Country" | selected: "United States" | selector: #country]
[input:text: "Email" | value: "user@example.com" | selector: #email]
This is useful for verifying form state without screenshots.
Isolated Sessions (for Parallel Execution)
When running multiple tasks in parallel, use tabId to avoid conflicts:
browser newSession '{"url": "https://example.com"}'
browser navigate '{"url": "https://example.com/page", "tabId": 15}'
browser getContent '{"format": "annotated", "tabId": 15}'
browser click '{"selector": "#btn", "tabId": 15}'
browser type '{"selector": "#input", "text": "hello", "tabId": 15}'
This lets multiple agents work in parallel without stepping on each other.
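Repeating tabId in every command is easy to forget. A small wrapper can add it, sketched below under two assumptions: that params is always a JSON object literal starting with {, and that the action in question accepts tabId (shown above only for navigate, getContent, click, and type). with_tab is a hypothetical name, not part of the bridge:

```shell
# Hypothetical wrapper: run an action with "tabId" merged into its params.
# Assumes params is a JSON object literal whose first character is "{".
# Usage: with_tab <tabId> <action> [json_params]
with_tab() {
  _tab=$1
  _action=$2
  _params=$3
  [ -n "$_params" ] || _params='{}'
  case $_params in
    '{}') _params="{\"tabId\": $_tab}" ;;
    '{'*) _params="{\"tabId\": $_tab, ${_params#?}" ;;   # drop the leading "{" and prepend tabId
    *) echo "with_tab: params must be a JSON object" >&2; return 1 ;;
  esac
  browser "$_action" "$_params"
}
```

For example, with_tab 15 click '{"selector": "#btn"}' runs browser click '{"tabId": 15, "selector": "#btn"}', matching the explicit form shown above.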
Tips
- Start with listTabs to see what's open
- Use newSession for a clean start
- Use tabId for parallel/isolated execution
- Use annotated format: it shows content and clickable elements together
- Use selectors from annotated output: they are more reliable than text matching
- Fork when uncertain: try multiple paths, kill the wrong ones
Troubleshooting
- Firefox not running? Start it: nohup firefox &>/dev/null &
- Check connection: browser ping
- Connection refused? The extension may need to be reloaded in about:debugging
- Element not found? Use browser getContent '{"format": "annotated"}' to see what's on the page
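Right after Firefox starts, the bridge may not accept connections yet. The first two checks above can be combined into a retry loop, sketched here on the assumption that browser ping exits with status 0 on success (the exit codes are not documented here). wait_for_bridge is a hypothetical helper, not part of the bridge:

```shell
# Hypothetical helper: retry "browser ping" until it succeeds or the attempts
# run out. Assumes the client exits with status 0 on a successful ping.
# Usage: wait_for_bridge [attempts] [delay_seconds]
wait_for_bridge() {
  _attempts=${1:-10}
  _delay=${2:-1}
  _i=0
  while [ "$_i" -lt "$_attempts" ]; do
    if browser ping >/dev/null 2>&1; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  return 1
}
```

A typical use is to start Firefox in the background, then run wait_for_bridge 30 1 && browser listTabs '{}' so that the first real command only runs once the ping has succeeded.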