一键导入
browse
Browser automation via agent-browser CLI. Navigate, snapshot accessibility tree, interact by element ref.
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
菜单
Browser automation via agent-browser CLI. Navigate, snapshot accessibility tree, interact by element ref.
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
基于 SOC 职业分类
Fetch standardized financial statements (income, balance sheet, cash flow, earnings, company overview) for a ticker from Alpha Vantage. Returns combined annual+quarterly JSON for analysis.
Aim the ChatterBot head to find and center a target in view — the user, a person, an object, an animal. Runs a closed visual loop (capture, judge where the target is, nudge pan/tilt, repeat) until the target is centered, or reports it could not find the target after searching. Use when the user says point at me, look at me, turn to face someone, find the cat, center on the person. For a one-off snapshot without re-aiming use camera-capture; for a manual fixed angle use head-move.
Capture a still photo from the ChatterBot head camera. The captured frame is attached to your own visual input, so you can SEE it and answer questions about what is in view — whether the user is present, whether there is a cat, what the scene looks like. The camera rides the pan/tilt head, so it shows whatever the head is currently aimed at; aim first with head-move if needed. To also show the photo to the user on screen, follow with the display tool (the observation includes a ready <img> URL). Use when the user asks what you see, to take a picture or snapshot, or to check whether something or someone is in view.
Move the ChatterBot head — aim the pan/tilt camera or play an expressive gesture. The bot is a stationary companion head; this points its gaze, it does NOT drive or navigate. Use when the user asks you to look somewhere, turn toward/away, look up/down, re-center, or nod/shake/scan. Angles are degrees 0-180 with 90 centered (pan 0=full right, 180=full left; tilt 0=down, 180=up, mounting-dependent). Give pan and/or tilt for absolute aim, OR a gesture (not both). Returns the confirmed pose once the head settles.
Return one genuine saying of Ramana Maharshi, drawn verbatim from his recorded talks, with source attribution. Use when delivering an authentic Ramana quote with attribution — not a paraphrase or a synthesized reflection. The returned text is a raw quote; add your own brief framing before presenting it.
Generate an original image from a text description, locally (Bonsai-Image 4B, ternary-quantized, on a long-lived studio server). Use when the user wants a picture, illustration, avatar, or face created from a description that does not already exist on the web. For existing photos of real things, prefer image search instead; for simple diagrams or line drawings, prefer authoring inline SVG.
| name | browse |
| type | python |
| description | Browser automation via agent-browser CLI. Navigate, snapshot accessibility tree, interact by element ref. |
| schema_hint | {"action":"string (required): open|snapshot|text|click|type|fill|press|scroll|get|eval|close|batch","url":"string (for open)","selector":"string (CSS selector or @ref from snapshot, for click/type/fill/press/scroll/get)","text":"string (for type/fill)","key":"string (for press, e.g. 'Enter', 'Tab')","expression":"string (for eval — JavaScript expression)","subcommand":"string (for get: text|html|value|title|url)","actions":"list of [command, ...args] arrays (for batch)","out":"$variable"} |
Control a persistent browser session via agent-browser. Navigate pages, read the accessibility tree, and interact with elements by ref.
@e1, @e2, ...){"type":"browse","action":"open","url":"https://example.com","out":"$status"}
{"type":"browse","action":"snapshot","out":"$page"}
Returns interactive elements only (buttons, links, inputs) with element refs (@e1, @e2, ...). Use for finding clickable elements. Supports optional selector to scope.
{"type":"browse","action":"text","out":"$page_text"}
Returns the readable text content of the page (not HTML, not accessibility tree). Use this for content extraction — then apply extract to pull out specific information. Supports optional selector to scope (e.g., "selector": "#mw-content-text" for Wikipedia article body). Prefer text over eval for extracting page content.
{"type":"browse","action":"click","selector":"@e3","out":"$r"}
{"type":"browse","action":"fill","selector":"@e5","text":"search query","out":"$r"}
fill clears the field first; type appends.
{"type":"browse","action":"press","key":"Enter","out":"$r"}
{"type":"browse","action":"scroll","text":"down","out":"$r"}
Directions: up, down, left, right. Optional selector to scroll within an element.
{"type":"browse","action":"get","subcommand":"text","selector":"@e2","out":"$content"}
{"type":"browse","action":"get","subcommand":"title","out":"$title"}
{"type":"browse","action":"get","subcommand":"url","out":"$url"}
Subcommands: text, html, value (require selector), title, url (no selector needed).
{"type":"browse","action":"eval","expression":"document.title","out":"$result"}
{"type":"browse","action":"close","out":"$r"}
{"type":"browse","action":"batch","actions":[["open","https://news.ycombinator.com"],["wait","2000"],["snapshot","-i","-c"]],"out":"$page"}
Each inner array is [command, arg1, arg2, ...]. Commands run sequentially; stops on first error. The last command's output is returned. Use batch for deterministic sequences (open+wait+snapshot, or multiple fills) to conserve planner steps. Do NOT batch when you need to read a snapshot before deciding what to do next.
Success: resource_id for snapshot (Note with accessibility tree), value for everything else.
Failure: reason with error details.
Step 1 — Open and get page text (one code block):
r1 = tool("browse", action="open", url="https://en.wikipedia.org/wiki/Cognitive_architecture", out="$status")
r2 = tool("browse", action="text", out="$page_text")
Step 2 — Extract specific content using the extract tool (separate code block):
r = tool("extract", target="$page_text", instruction="Extract the first paragraph of the article", out="$para")
r2 = tool("create-note", value=get_text("$para"), name="my_result", out="$note")
Step 1 — Open and snapshot to find interactive elements:
r1 = tool("browse", action="open", url="https://example.com/search", out="$status")
r2 = tool("browse", action="snapshot", out="$page")
Step 2 — Read snapshot, interact by ref (separate code block):
page = get_text("$page") # Read accessibility tree to find element refs
r1 = tool("browse", action="fill", selector="@e5", text="search query", out="$r")
r2 = tool("browse", action="click", selector="@e7", out="$r")
r3 = tool("browse", action="text", out="$results")
text + extract, NOT eval with JavaScript. The text action gets readable page text; the extract tool pulls out what you need. No JS required.snapshot only when you need to interact (click, fill, press). It shows interactive elements with refs.@e1, @e2) are only valid until the next snapshot — page changes invalidate them.eval, the tool auto-wraps in an IIFE if you use const/let, so variable redeclaration across steps is safe.close when done if the page won't be needed again.agent-browser CLI installed and on PATH.