browser-automation
// Use for complex Agent Zero browser automation, including multi-tab browsing, screenshots, forms, uploads, raw pointer/keyboard actions, host-vs-container browser mode, and visual verification workflows.
| name | browser-automation |
| description | Use for complex Agent Zero browser automation, including multi-tab browsing, screenshots, forms, uploads, raw pointer/keyboard actions, host-vs-container browser mode, and visual verification workflows. |
Use the browser tool for rendered pages, forms, logins, downloads, JavaScript-heavy sites, screenshots, and visual inspection. Prefer search_engine or document_query for plain text research.
- open creates a browser tab and returns a browser_id.
- content returns readable markdown plus typed refs like [link 3], [button 6], [input text 8].
- Pass those refs to click, type, submit, scroll, etc.
- Use navigate on an existing browser_id for serial browsing.

The same tool may run in Docker container mode or A0 CLI host-browser mode, depending on project/plugin settings.
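The open, content, act loop described above can be sketched as a sequence of tool-call payloads. This is a minimal sketch, not the tool's confirmed schema: the envelope (tool_name, tool_args, action, browser_id) follows the multi example in this document, while the url, ref, and text argument names, the example URLs, the browser_id value of 1, and the browser_call helper are assumptions made for illustration.

```python
import json


def browser_call(action, **args):
    """Build a browser tool call in the envelope shape used by the multi example."""
    return {"tool_name": "browser", "tool_args": {"action": action, **args}}


# Assumptions: "url", "ref", and "text" are hypothetical argument names,
# and browser_id 1 stands in for whatever id the open call actually returns.
steps = [
    browser_call("open", url="https://example.com/login"),     # returns a browser_id
    browser_call("content", browser_id=1),                     # markdown plus typed refs
    browser_call("type", browser_id=1, ref=8, text="alice"),   # e.g. [input text 8]
    browser_call("click", browser_id=1, ref=6),                # e.g. [button 6]
    browser_call("navigate", browser_id=1, url="https://example.com/account"),  # reuse the tab
]

for step in steps:
    print(json.dumps(step))
```

Reusing one browser_id with navigate keeps the session in a single tab, which matches the serial-browsing advice above; open is only needed when a separate tab is wanted.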
In host mode, page content and screenshots may be blocked by host-content policy when remote models are active.
Screenshots are explicit only; the browser does not automatically load images into model context.
1. Call browser with action: "screenshot".
2. Call vision_load with the returned vision_load.tool_args.paths value.

Screenshot args include quality, full_page, and an optional path. Without path, the screenshot is an ephemeral ref consumed by vision_load; with path, PNG is used when path ends with .png, otherwise JPEG is used.
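The two-step screenshot flow and the path rule can be sketched as follows. This is a hedged illustration only: saved_format and vision_load_call are hypothetical helpers, the mock result merely mirrors the vision_load.tool_args.paths wording with a placeholder value, and the case-sensitive suffix check is an assumption, since the text does not say how .PNG is treated.

```python
def saved_format(path=None):
    """Apply the stated rule: no path -> ephemeral ref; '.png' suffix -> PNG; else JPEG."""
    if path is None:
        return None  # ephemeral ref, meant to be consumed by vision_load
    # Assumption: the suffix check is case-sensitive, as the rule is written.
    return "png" if path.endswith(".png") else "jpeg"


def vision_load_call(screenshot_result):
    """Build the follow-up vision_load call from a screenshot result."""
    paths = screenshot_result["vision_load"]["tool_args"]["paths"]
    return {"tool_name": "vision_load", "tool_args": {"paths": paths}}


# Mock result: the nesting follows the vision_load.tool_args.paths wording;
# the path value is a placeholder, not real tool output.
mock_result = {"vision_load": {"tool_args": {"paths": ["<ephemeral-ref>"]}}}

print(saved_format())                   # None
print(saved_format("shots/home.png"))   # png
print(saved_format("shots/home.jpg"))   # jpeg
print(vision_load_call(mock_result))
```

Because screenshots are explicit only, nothing reaches model context until the second call is made with the returned paths.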
- select_option works for native selects and detectable ARIA listbox/combobox controls.
- set_checked works for checkbox, radio, switch, and toggle-like refs.
- upload_file works for file input refs or associated labels; verify the file exists in the active browser environment.
- See also browser-form-workflows.

- hover, double_click, right_click, and drag accept refs or viewport coordinates.
- key_chord presses keys in order and releases them in reverse.
- clipboard actions are copy, cut, or paste.
- set_viewport resizes the page viewport.

- list shows open tabs; pass include_content: true sparingly.
- set_active deliberately changes focus.

multi is only a browser action, never a top-level tool. Use:
{
  "tool_name": "browser",
  "tool_args": {
    "action": "multi",
    "calls": [
      {"action": "content", "browser_id": 1},
      {"action": "screenshot", "browser_id": 2}
    ]
  }
}
Use browser action multi for parallel reads across tabs. Avoid mutating the same tab twice in one batch unless serial order is intended.
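One way to enforce the batching advice above is to validate a batch before sending it. The sketch below is an assumption-laden illustration rather than part of the tool: multi_call, READ_ACTIONS, and allow_serial are hypothetical names, and treating content, screenshot, and list as the only read-only actions is a guess based on the actions named in this document.

```python
# Assumption: these actions only read state; every other action is treated as mutating.
READ_ACTIONS = {"content", "screenshot", "list"}


def multi_call(calls, allow_serial=False):
    """Wrap calls in a browser multi batch, rejecting repeated mutations of one tab.

    Pass allow_serial=True when mutating the same tab in order is intended.
    """
    mutated = set()
    for call in calls:
        if call["action"] in READ_ACTIONS:
            continue
        tab = call.get("browser_id")
        if tab in mutated and not allow_serial:
            raise ValueError(f"tab {tab} is mutated more than once in this batch")
        mutated.add(tab)
    return {"tool_name": "browser", "tool_args": {"action": "multi", "calls": calls}}


# Parallel reads across two tabs, as in the example above.
batch = multi_call([
    {"action": "content", "browser_id": 1},
    {"action": "screenshot", "browser_id": 2},
])
print(batch["tool_args"]["action"])  # multi
```

The helper always emits multi inside tool_args of a browser call, so it cannot be mistaken for a top-level tool.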