agent-browser

Name: Agent Browser
Author: hAcKlyc

// Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.


stars: 857

forks: 85

updated: May 25, 2026, 22:07



package.json

"author": "hAcKlyc"

"repository": "hAcKlyc/MyAgents"


name	agent-browser
description	Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
allowed-tools	Bash(agent-browser:), Bash(npm_config_prefix= npm install), Bash(npm install), Bash(npx -y agent-browser*)

Browser Automation with agent-browser

First-time Setup

The CLI is not pre-installed with the app — install it on first use, then download Chromium. Always run this self-check before issuing any agent-browser command in a fresh environment.

Step 1: Verify or install the CLI

# Probe by RUNNING the CLI (not just checking PATH presence). This catches the
# common case of a stale wrapper from a previous app version: it lives at
# ~/.myagents/bin/agent-browser, satisfies `command -v`, but execs a deleted
# bundle path → fails with "cannot find file". `--version` exercises the real
# code path and triggers the install fallback when broken.
agent-browser --version >/dev/null 2>&1 || npm_config_prefix="${MYAGENTS_NPM_GLOBAL_PREFIX:-$HOME/.myagents/npm-global}" npm install -g agent-browser@0.15.1

The install lands in ~/.myagents/npm-global/bin/agent-browser, which sits earlier in PATH than any legacy wrapper. MyAgents exposes MYAGENTS_NPM_GLOBAL_PREFIX for this command-local install instead of setting npm_config_prefix on the whole shell, so nvm-based user shells stay quiet. Subsequent agent-browser … calls find the new binary automatically.

Tell the user once that you're installing the browser tool (a few seconds the first time, instant afterward), then proceed.

If `npm install -g` fails

If the install fails (network blocked, registry unreachable, EACCES on a locked-down system Node), invoke the CLI via npx inline on every command — Bash aliases do not persist across separate tool calls in this environment, so each command must carry the prefix:

# Use the npx prefix on EVERY command. Do not try to alias — it won't survive.
npx -y agent-browser@0.15.1 open https://example.com
npx -y agent-browser@0.15.1 snapshot -i
npx -y agent-browser@0.15.1 click @e1

This is slower (~1s overhead per call) but works without an install step.
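When several agent-browser calls run inside one script, the choice between the installed CLI and the npx fallback can be made by a small wrapper rather than repeated by hand. The sketch below is one way to do that. The function name `ab` is hypothetical, and since shell functions, like aliases, do not survive across separate tool calls, it only helps inside a single shell invocation such as a script file.

```shell
# Hypothetical wrapper: use the installed CLI when it actually runs,
# otherwise fall back to the pinned npx invocation described above.
ab() {
  if agent-browser --version >/dev/null 2>&1; then
    agent-browser "$@"
  else
    npx -y agent-browser@0.15.1 "$@"
  fi
}

# Example use inside the same script:
#   ab open https://example.com
#   ab snapshot -i
```

Each call re-runs the `--version` probe, which costs a little time but keeps the wrapper stateless.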

Step 2: Download Chromium (~160MB, one-time)

agent-browser install
# OR if you're on the npx fallback:
# npx -y agent-browser@0.15.1 install

Inform the user this download may take a minute on slow connections.

Troubleshooting

Symptom	Fix
`agent-browser` runs but shows "cannot find file"	Stale wrapper from a previous app version is shadowing the new install. The new install at `~/.myagents/npm-global/bin/` should win on PATH; if it doesn't, run `which agent-browser` to see which path resolves first, then either remove the stale path or invoke the new binary by its absolute path.
`npm install -g` exits with the registry blocked / network error	Use the `npx` inline fallback above. If `npx` also fails, the user's network is blocking the npm registry — ask them about proxy / VPN.
`agent-browser install` fails to download Chromium	Network issue / GFW. User may need a proxy or VPN. Ask the user.
`Executable doesn't exist` mid-task	Chromium got deleted or the install never finished. Re-run `agent-browser install`.
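
For the stale-wrapper case, `which agent-browser` reports only the first match. To see every copy on PATH in the order the shell would try them, something like the following sketch may help. The helper name `path_candidates` is hypothetical, and the function only walks PATH; it does not check whether each copy actually works.

```shell
# Print every executable file called "$1" on PATH, first match first.
# The body runs in a subshell so the IFS and globbing changes stay local.
path_candidates() (
  name=$1
  IFS=:
  set -f
  for dir in $PATH; do
    dir=${dir:-.}   # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
)

# Example:
#   path_candidates agent-browser
```

If the first line printed is not under `~/.myagents/npm-global/bin/`, that first path is the one shadowing the new install.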

Core Workflow

Every browser automation follows this pattern:

1. Navigate: agent-browser open <url>
2. Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
3. Interact: Use refs to click, fill, select
4. Re-snapshot: After navigation or DOM changes, get fresh refs

agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result

Command Chaining

Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

# Navigate and capture
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png

When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).

Essential Commands

# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -i -C          # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser click @e1 --new-tab     # Click and open in new tab
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser keyboard type "text"    # Type at current focus (no selector)
agent-browser keyboard inserttext "text"  # Insert without key events
agent-browser scroll down 500         # Scroll page
agent-browser scroll down 500 --selector "div.content"  # Scroll within a specific container

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait milliseconds

# Downloads
agent-browser download @e1 ./file.pdf          # Click element to trigger download
agent-browser wait --download ./output.zip     # Wait for any download to complete
agent-browser --download-path ./downloads open <url>  # Set default download directory

# Capture
agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --full       # Full page screenshot
agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf          # Save as PDF

# Diff (compare page states)
agent-browser diff snapshot                          # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt    # Compare current vs saved file
agent-browser diff screenshot --baseline before.png  # Visual pixel diff
agent-browser diff url <url1> <url2>                 # Compare two pages
agent-browser diff url <url1> <url2> --wait-until networkidle  # Custom wait strategy
agent-browser diff url <url1> <url2> --selector "#main"  # Scope to element

Common Patterns

Form Submission

agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle

Authentication with Auth Vault (Recommended)

# Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY)
# Recommended: pipe password via stdin to avoid shell history exposure
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin

# Login using saved profile (LLM never sees password)
agent-browser auth login github

# List/show/delete profiles
agent-browser auth list
agent-browser auth show github
agent-browser auth delete github

Authentication with State Persistence

# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard

Session Persistence

# Auto-save/restore cookies and localStorage across browser restarts
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close  # State auto-saved to ~/.agent-browser/sessions/

# Next time, state is auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard

# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com

# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7
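
Note that `$(openssl rand -hex 32)` produces a different key every time it runs, so re-running that export in a new shell would presumably leave earlier encrypted state unreadable. A minimal sketch of generating the key once and reusing it, assuming it is acceptable to keep the key in a user-only file; the helper name and the file path in the usage comment are hypothetical:

```shell
# Print the key stored in "$1", creating it first if the file is missing or empty.
load_or_create_key() {
  key_file=$1
  if [ ! -s "$key_file" ]; then
    mkdir -p "$(dirname "$key_file")"
    # umask 077 in a subshell: the new file is readable by the owner only
    ( umask 077 && openssl rand -hex 32 > "$key_file" )
  fi
  cat "$key_file"
}

# Example (path is an assumption, not an agent-browser convention):
#   AGENT_BROWSER_ENCRYPTION_KEY=$(load_or_create_key "$HOME/.config/ab-demo/encryption.key")
#   export AGENT_BROWSER_ENCRYPTION_KEY
```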

Data Extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5           # Get specific element text
agent-browser get text body > page.txt  # Get all page text

# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json

Parallel Sessions

agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com

agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i

agent-browser session list

Connect to Existing Chrome

# Auto-discover running Chrome with remote debugging enabled
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot

# Or with explicit CDP port
agent-browser --cdp 9222 snapshot

Color Scheme (Dark Mode)

# Persistent dark mode via flag (applies to all pages and new tabs)
agent-browser --color-scheme dark open https://example.com

# Or via environment variable
AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com

# Or set during session (persists for subsequent commands)
agent-browser set media dark

Visual Browser (Debugging)

agent-browser --headed open https://example.com
agent-browser highlight @e1          # Highlight element
agent-browser record start demo.webm # Record session
agent-browser profiler start         # Start Chrome DevTools profiling
agent-browser profiler stop trace.json # Stop and save profile (path optional)

Local Files (PDFs, HTML)

# Open local files with file:// URLs
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png

iOS Simulator (Mobile Safari)

# List available iOS simulators
agent-browser device list

# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same workflow as desktop - snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1          # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up         # Mobile-specific gesture

# Take screenshot
agent-browser -p ios screenshot mobile.png

# Close session (shuts down simulator)
agent-browser -p ios close

Requirements: macOS with Xcode, Appium (npm_config_prefix="${MYAGENTS_NPM_GLOBAL_PREFIX:-$HOME/.myagents/npm-global}" npm install -g appium && appium driver install xcuitest)

Real devices: Works with physical iOS devices if pre-configured. Use --device "<UDID>" where UDID is from xcrun xctrace list devices.

Security

All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.

Content Boundaries (Recommended for AI Agents)

Enable --content-boundaries to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:

export AGENT_BROWSER_CONTENT_BOUNDARIES=1
agent-browser snapshot
# Output:
# --- AGENT_BROWSER_PAGE_CONTENT nonce=<hex> origin=https://example.com ---
# [accessibility tree]
# --- END_AGENT_BROWSER_PAGE_CONTENT nonce=<hex> ---

Domain Allowlist

Restrict navigation to trusted domains. Wildcards like *.example.com also match the bare domain example.com. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on:

export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
agent-browser open https://example.com        # OK
agent-browser open https://malicious.com       # Blocked
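
The matching rule described above (exact entries match exactly; `*.example.com` matches any subdomain and also the bare `example.com`) can be sketched as a small shell function. This only illustrates the rule as stated here and is not agent-browser's actual implementation; the function name `domain_allowed` is hypothetical.

```shell
# Exit 0 if host "$1" is permitted by the comma-separated allowlist "$2".
# The body runs in a subshell so IFS and globbing changes stay local.
domain_allowed() (
  host=$1
  IFS=,
  set -f   # keep "*.example.com" from being expanded as a filename glob
  for pattern in $2; do
    case $pattern in
      \*.*)
        base=${pattern#\*.}   # "*.example.com" -> "example.com"
        case $host in
          "$base" | *."$base") exit 0 ;;
        esac
        ;;
      "$host") exit 0 ;;
    esac
  done
  exit 1
)

# Example:
#   domain_allowed api.example.com "example.com,*.example.com" && echo allowed
```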

Action Policy

Use a policy file to gate destructive actions:

export AGENT_BROWSER_ACTION_POLICY=./policy.json

Example policy.json:

{"default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"]}

Auth vault operations (auth login, etc.) bypass action policy but domain allowlist still applies.

Output Limits

Prevent context flooding from large pages:

export AGENT_BROWSER_MAX_OUTPUT=50000

Diffing (Verifying Changes)

Use diff snapshot after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.

# Typical workflow: snapshot -> action -> diff
agent-browser snapshot -i          # Take baseline snapshot
agent-browser click @e2            # Perform action
agent-browser diff snapshot        # See what changed (auto-compares to last snapshot)

For visual regression testing or monitoring:

# Save a baseline screenshot, then compare later
agent-browser screenshot baseline.png
# ... time passes or changes are made ...
agent-browser diff screenshot --baseline baseline.png

# Compare staging vs production
agent-browser diff url https://staging.example.com https://prod.example.com --screenshot

diff snapshot output uses + for additions and - for removals, similar to git diff. diff screenshot produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.

Timeouts and Slow Pages

The default Playwright timeout is 25 seconds for local browsers. This can be overridden with the AGENT_BROWSER_DEFAULT_TIMEOUT environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:

# Wait for network activity to settle (best for slow pages)
agent-browser wait --load networkidle

# Wait for a specific element to appear
agent-browser wait "#content"
agent-browser wait @e1

# Wait for a specific URL pattern (useful after redirects)
agent-browser wait --url "**/dashboard"

# Wait for a JavaScript condition
agent-browser wait --fn "document.readyState === 'complete'"

# Wait a fixed duration (milliseconds) as a last resort
agent-browser wait 5000

When dealing with consistently slow websites, use wait --load networkidle after open to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with wait <selector> or wait @ref.
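
Because AGENT_BROWSER_DEFAULT_TIMEOUT is expressed in milliseconds, a value chosen in seconds has to be multiplied by 1000. A small sketch of that arithmetic; the helper name `seconds_to_ms` and the 60-second value are illustrative choices, not defaults:

```shell
# Convert whole seconds to milliseconds.
seconds_to_ms() {
  printf '%s\n' "$(( $1 * 1000 ))"
}

# Raise the default timeout to 60 seconds for commands started from this shell.
AGENT_BROWSER_DEFAULT_TIMEOUT=$(seconds_to_ms 60)
export AGENT_BROWSER_DEFAULT_TIMEOUT
```

Explicit waits remain the better tool for a single slow page; a larger default mainly helps when every page on a site is slow.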

Session Management and Cleanup

When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:

# Each agent gets its own isolated session
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

# Check active sessions
agent-browser session list

Always close your browser session when done to avoid leaked processes:

agent-browser close                    # Close default session
agent-browser --session agent1 close   # Close specific session

If a previous session was not closed properly, the daemon may still be running. Use agent-browser close to clean it up before starting new work.
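
In a script, one way to make sure the close step is not skipped when an earlier command fails is an EXIT trap. The sketch below assumes the whole job runs inside one shell invocation; the helper name `with_session` and the `scrape_site_a` function in the usage comment are hypothetical.

```shell
# Run a command, then close the named session whether or not the command succeeded.
# The subshell scopes the trap, so it fires as soon as the wrapped command finishes.
with_session() {
  session=$1
  shift
  (
    trap 'agent-browser --session "$session" close' EXIT
    "$@"
  )
}

# Example: scrape_site_a is a shell function that issues
# "agent-browser --session agent1 ..." commands.
#   with_session agent1 scrape_site_a
```

The wrapper returns the exit status of the wrapped command, so callers can still detect failures.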

Ref Lifecycle (Important)

Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:

Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)

agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs

Annotated Screenshots (Vision Mode)

Use --annotate to take a screenshot with numbered labels overlaid on interactive elements. Each label [N] maps to ref @eN. This also caches refs, so you can interact with elements immediately without a separate snapshot.

agent-browser screenshot --annotate
# Output includes the image path and a legend:
#   [1] @e1 button "Submit"
#   [2] @e2 link "Home"
#   [3] @e3 textbox "Email"
agent-browser click @e2              # Click using ref from annotated screenshot

Use annotated screenshots when:

The page has unlabeled icon buttons or visual-only elements
You need to verify visual layout or styling
Canvas or chart elements are present (invisible to text snapshots)
You need spatial reasoning about element positions

Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click

JavaScript Evaluation (eval)

Use eval to run JavaScript in the browser context. Shell quoting can corrupt complex expressions -- use --stdin or -b to avoid issues.

# Simple expressions work with regular quoting
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'

# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
  Array.from(document.querySelectorAll("img"))
    .filter(i => !i.alt)
    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF

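# Alternative: base64 encoding (avoids all shell escaping issues)
# printf avoids echo's non-portable -n; tr strips the line wraps some base64 tools add to long output
agent-browser eval -b "$(printf '%s' 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64 | tr -d '\n')"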

Why this matters: When the shell processes your command, inner double quotes, ! characters (history expansion), backticks, and $() can all corrupt the JavaScript before it reaches agent-browser. The --stdin and -b flags bypass shell interpretation entirely.

Rules of thumb:

Single-line, no nested quotes -> regular eval 'expression' with single quotes is fine
Nested quotes, arrow functions, template literals, or multiline -> use eval --stdin <<'EVALEOF'
Programmatic/generated scripts -> use eval -b with base64
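
For the generated-script case, the encoding step can be factored into a helper that reads JavaScript on stdin and prints one line of base64. This is a sketch under the assumption that `-b` expects unwrapped base64; the helper name `js_b64` is hypothetical.

```shell
# Read text on stdin and print it as a single line of base64.
# Some base64 implementations wrap long output; tr removes those line breaks.
js_b64() {
  base64 | tr -d '\n'
}

# Example:
#   js='Array.from(document.querySelectorAll("a")).map(a => a.href)'
#   agent-browser eval -b "$(printf '%s' "$js" | js_b64)"
```

Using `printf '%s'` rather than `echo` keeps a trailing newline and any backslash sequences out of the encoded script.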

Anti-Detection & Configuration

agent-browser uses its built-in defaults out of the box. If a site detects automation, override per-command with CLI flags or persist settings in ~/.agent-browser/config.json.

# Temporary: override UA for one command
agent-browser --user-agent "custom UA" open https://target.com

# Headed mode (visible browser window) for one command
agent-browser --headed open https://target.com

Persistent config: create or edit ~/.agent-browser/config.json to set defaults. args field is split by both comma and newline — avoid args containing commas (e.g. --window-size=W,H will be split incorrectly; use --start-maximized instead). All CLI flags map to camelCase keys (--executable-path → executablePath).
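
The flag-to-key rule can be read as: drop the leading dashes, then capitalize the first letter of every hyphen-separated word after the first. A sketch of that rule as a helper (the name `flag_to_key` is hypothetical); it illustrates the naming convention only and does not check whether a given key is actually supported:

```shell
# Convert a long CLI flag such as --executable-path to its camelCase config key.
flag_to_key() {
  printf '%s\n' "${1#--}" | awk -F- '{
    out = $1
    for (i = 2; i <= NF; i++)
      out = out toupper(substr($i, 1, 1)) substr($i, 2)
    print out
  }'
}

# Example:
#   flag_to_key --executable-path
```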

Config priority (lowest → highest): ~/.agent-browser/config.json < ./agent-browser.json < env vars < CLI flags.
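
Read as a lookup, that ordering means the highest-priority source that sets a value wins. The following sketch models the idea with plain strings, where an empty string stands for "not set"; the function name `resolve_setting` is hypothetical and this is not how agent-browser itself loads configuration.

```shell
# Arguments are given lowest priority first:
#   user config, project config, environment variable, CLI flag.
# The last non-empty argument wins.
resolve_setting() {
  result=
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      result=$candidate
    fi
  done
  printf '%s\n' "$result"
}

# Example: user config says "light", env var says "dark", no project file or flag.
#   resolve_setting "light" "" "dark" ""
```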

Deep-Dive Documentation

Reference	When to Use
references/commands.md	Full command reference with all options
references/snapshot-refs.md	Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md	Parallel sessions, state persistence, concurrent scraping
references/authentication.md	Login flows, OAuth, 2FA handling, state reuse
references/video-recording.md	Recording workflows for debugging and documentation
references/profiling.md	Chrome DevTools profiling for performance analysis
references/proxy-support.md	Proxy configuration, geo-testing, rotating proxies

Ready-to-Use Templates

Template	Description
templates/form-automation.sh	Form filling with validation
templates/authenticated-session.sh	Login once, reuse state
templates/capture-workflow.sh	Content extraction with screenshots

./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output
