browse
Browser automation via agent-browser CLI. Navigate, snapshot accessibility tree, interact by element ref.
| name | browse |
| type | python |
| description | Browser automation via agent-browser CLI. Navigate, snapshot accessibility tree, interact by element ref. |
| schema_hint | {"action":"string (required): open|snapshot|text|click|type|fill|press|scroll|get|eval|close|batch","url":"string (for open)","selector":"string (CSS selector or @ref from snapshot, for click/type/fill/press/scroll/get)","text":"string (for type/fill)","key":"string (for press, e.g. 'Enter', 'Tab')","expression":"string (for eval — JavaScript expression)","subcommand":"string (for get: text|html|value|title|url)","actions":"list of [command, ...args] arrays (for batch)","out":"$variable"} |
Control a persistent browser session via agent-browser. Navigate pages, read the accessibility tree, and interact with elements by ref.
{"type":"browse","action":"open","url":"https://example.com","out":"$status"}
{"type":"browse","action":"snapshot","out":"$page"}
Returns interactive elements only (buttons, links, inputs) with element refs (@e1, @e2, ...). Use for finding clickable elements. Supports optional selector to scope.
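As a minimal sketch of picking refs out of a snapshot, the helper below scans snapshot text for `@eN` tokens. The helper name `find_refs` and the sample snapshot string are hypothetical; only the `@e1`, `@e2` ref format comes from the description above, and the real accessibility-tree layout may differ.

```python
import re

# Matches element refs of the form @e1, @e2, ... as described above.
REF_PATTERN = re.compile(r"@e\d+")


def find_refs(snapshot_text: str) -> list[str]:
    """Return the unique refs in order of first appearance."""
    seen: dict[str, None] = {}
    for ref in REF_PATTERN.findall(snapshot_text):
        seen.setdefault(ref, None)
    return list(seen)


# Hypothetical snapshot text; the real layout is an assumption.
sample = 'link "Home" @e1\nbutton "Search" @e2\ntextbox "Query" @e3\nlink "Home" @e1'
refs = find_refs(sample)
```

A list like this is only a starting point: choosing which ref to click still requires reading the element labels in the snapshot.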
{"type":"browse","action":"text","out":"$page_text"}
Returns the readable text content of the page (not HTML, not accessibility tree). Use this for content extraction — then apply extract to pull out specific information. Supports optional selector to scope (e.g., "selector": "#mw-content-text" for Wikipedia article body). Prefer text over eval for extracting page content.
{"type":"browse","action":"click","selector":"@e3","out":"$r"}
{"type":"browse","action":"fill","selector":"@e5","text":"search query","out":"$r"}
fill clears the field first; type appends.
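The difference between the two actions can be modelled on a plain string. This is only an illustration of the semantics stated above, not the tool's implementation; the function names `fill` and `type_text` are hypothetical.

```python
def fill(current_value: str, text: str) -> str:
    """Model of fill: existing content is cleared, then text is entered."""
    return text


def type_text(current_value: str, text: str) -> str:
    """Model of type: text is appended to whatever is already in the field."""
    return current_value + text


field = "old"
after_fill = fill(field, "search")            # previous content is discarded
after_type = type_text(after_fill, " query")  # new text is added to the end
```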
{"type":"browse","action":"press","key":"Enter","out":"$r"}
{"type":"browse","action":"scroll","text":"down","out":"$r"}
Directions: up, down, left, right. Optional selector to scroll within an element.
{"type":"browse","action":"get","subcommand":"text","selector":"@e2","out":"$content"}
{"type":"browse","action":"get","subcommand":"title","out":"$title"}
{"type":"browse","action":"get","subcommand":"url","out":"$url"}
Subcommands: text, html, value (require selector), title, url (no selector needed).
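The selector rule above could be checked before a request is sent. The sketch below builds a `get` request as a plain dict in the same shape as the JSON examples; `build_get` is a hypothetical helper, not part of the tool.

```python
from typing import Optional

# Subcommands grouped by the selector rule stated above.
NEEDS_SELECTOR = {"text", "html", "value"}
NO_SELECTOR = {"title", "url"}


def build_get(subcommand: str, selector: Optional[str] = None, out: str = "$r") -> dict:
    """Build a browse 'get' request, rejecting combinations the rule forbids."""
    if subcommand not in NEEDS_SELECTOR | NO_SELECTOR:
        raise ValueError(f"unknown get subcommand: {subcommand}")
    if subcommand in NEEDS_SELECTOR and selector is None:
        raise ValueError(f"get {subcommand} requires a selector")
    request = {"type": "browse", "action": "get", "subcommand": subcommand, "out": out}
    # The selector is attached only for subcommands that take one.
    if subcommand in NEEDS_SELECTOR:
        request["selector"] = selector
    return request


title_request = build_get("title", out="$title")
text_request = build_get("text", selector="@e2", out="$content")
```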
{"type":"browse","action":"eval","expression":"document.title","out":"$result"}
{"type":"browse","action":"close","out":"$r"}
{"type":"browse","action":"batch","actions":[["open","https://news.ycombinator.com"],["wait","2000"],["snapshot","-i","-c"]],"out":"$page"}
Each inner array is [command, arg1, arg2, ...]. Commands run sequentially; stops on first error. The last command's output is returned. Use batch for deterministic sequences (open+wait+snapshot, or multiple fills) to conserve planner steps. Do NOT batch when you need to read a snapshot before deciding what to do next.
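The batch semantics described above (sequential execution, stop on first error, last output returned) can be sketched with a stand-in executor. `run_batch` and `fake_execute` are hypothetical names used for illustration; the real tool runs the commands through agent-browser.

```python
from typing import Callable


def run_batch(actions: list[list[str]], execute: Callable[[str, list[str]], str]) -> str:
    """Run commands in order and return the output of the last one.

    An exception raised by execute propagates, so later commands never run.
    """
    if not actions:
        raise ValueError("batch needs at least one command")
    output = ""
    for command, *args in actions:
        output = execute(command, args)
    return output


log: list[str] = []


def fake_execute(command: str, args: list[str]) -> str:
    """Stand-in executor: records each command, fails on the made-up 'fail' command."""
    if command == "fail":
        raise RuntimeError("simulated failure")
    log.append(command)
    return f"{command} ok"


result = run_batch(
    [["open", "https://example.com"], ["wait", "2000"], ["snapshot", "-i", "-c"]],
    fake_execute,
)

# A failing command stops the batch; the trailing snapshot is never executed.
try:
    run_batch([["open", "https://example.com"], ["fail"], ["snapshot"]], fake_execute)
except RuntimeError as err:
    failure = str(err)
```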
Success: resource_id for snapshot (Note with accessibility tree), value for everything else.
Failure: reason with error details.
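A caller might normalize these outcomes with a small helper. The sketch assumes the result arrives as a dict keyed by `resource_id`, `value`, or `reason`; that container shape and the name `unwrap` are assumptions, since the source only names the three fields.

```python
def unwrap(result: dict) -> str:
    """Return the payload of a browse result, or raise if it reports a failure."""
    if "reason" in result:
        raise RuntimeError(f"browse failed: {result['reason']}")
    if "resource_id" in result:
        # Snapshot results point at a Note holding the accessibility tree.
        return result["resource_id"]
    return result["value"]


snapshot_payload = unwrap({"resource_id": "note-1"})  # hypothetical id
title_payload = unwrap({"value": "Example Domain"})

try:
    unwrap({"reason": "timeout"})
except RuntimeError as err:
    error_message = str(err)
```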
Step 1 — Open and get page text (one code block):
r1 = tool("browse", action="open", url="https://en.wikipedia.org/wiki/Cognitive_architecture", out="$status")
r2 = tool("browse", action="text", out="$page_text")
Step 2 — Extract specific content using the extract tool (separate code block):
r = tool("extract", target="$page_text", instruction="Extract the first paragraph of the article", out="$para")
r2 = tool("create-note", value=get_text("$para"), name="my_result", out="$note")
Step 1 — Open and snapshot to find interactive elements:
r1 = tool("browse", action="open", url="https://example.com/search", out="$status")
r2 = tool("browse", action="snapshot", out="$page")
Step 2 — Read snapshot, interact by ref (separate code block):
page = get_text("$page") # Read accessibility tree to find element refs
r1 = tool("browse", action="fill", selector="@e5", text="search query", out="$r")
r2 = tool("browse", action="click", selector="@e7", out="$r")
r3 = tool("browse", action="text", out="$results")
- For content extraction, use text + extract, NOT eval with JavaScript. The text action gets readable page text; the extract tool pulls out what you need. No JS required.
- Use snapshot only when you need to interact (click, fill, press). It shows interactive elements with refs.
- Element refs (@e1, @e2) are only valid until the next snapshot; page changes invalidate them.
- With eval, the tool auto-wraps in an IIFE if you use const/let, so variable redeclaration across steps is safe.
- Call close when done if the page won't be needed again.
- Requires the agent-browser CLI installed and on PATH.
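To illustrate why IIFE wrapping makes redeclaration safe for eval, here is one way such wrapping could look. This is a guess at the idea, not the tool's actual code: `wrap_for_eval` is a hypothetical name, and the real wrapper may handle return values differently.

```python
import re

# Detects block-scoped declarations that would clash if evaluated twice in one page scope.
DECLARATION = re.compile(r"\b(?:const|let)\b")


def wrap_for_eval(expression: str) -> str:
    """Wrap JavaScript in an arrow-function IIFE when it declares const/let variables."""
    if DECLARATION.search(expression):
        # Each call gets a fresh function scope, so 'const t' can appear in every step.
        return "(() => { " + expression + " })()"
    return expression


plain = wrap_for_eval("document.title")
wrapped = wrap_for_eval("const t = document.title; return t;")
```

In this sketch the wrapped code needs an explicit `return` to yield a value, which is a property of the sketch rather than a documented behavior of the tool.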