agent-browser

Updated June 1, 2026 at 04:06

Source

mshuffett

mshuffett/dotfiles

name	agent-browser
description	Browser automation CLI for AI agents — navigate pages, fill forms, click, screenshot, scrape data, and test web apps. Use whenever the user wants to drive a website programmatically, automate any browser task, log into a site, or run a browser in the cloud (Kernel, Browserbase, Browserless). Also covers human-in-the-loop "live view" takeover, where the agent drives and a person steps in from their phone for logins, CAPTCHAs, or MFA, then the agent resumes.
allowed-tools	Bash(agent-browser:*)

Browser Automation with agent-browser

Core Workflow

Authentication Policy:

User-Centric Automation: When performing tasks with the user's actual accounts (e.g., checking email, managing a personal task list), ALWAYS use the session name default (--session default) so that session state persists across interactions.
Application Testing: When testing a specific application or feature (e.g., verifying a signup flow on a dev server), use a dedicated session name related to that app (e.g., --session test-myapp) to avoid polluting the default user profile.
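The policy above can be baked into two small shell wrappers so the session flag is never forgotten. This is a sketch only: `ab_user` and `ab_test` are hypothetical helper names, not agent-browser commands, and they do nothing beyond prepending the `--session` flag already described.

```shell
# Hypothetical wrappers (not part of agent-browser) that encode the session policy.

# ab_user CMD...: real-account work always runs in the shared "default" session.
ab_user() {
  agent-browser --session default "$@"
}

# ab_test APP CMD...: app testing runs in an isolated "test-APP" session,
# so test logins never touch the default profile.
ab_test() {
  app=$1
  shift
  agent-browser --session "test-$app" "$@"
}
```

For example, `ab_user open https://mail.example.com` works in the shared profile, while `ab_test myapp open http://localhost:3000/signup` stays inside `test-myapp`.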

Every browser automation follows this pattern:

Navigate: agent-browser open <url>
Snapshot: agent-browser snapshot -i (get element refs like @e1, @e2)
Interact: Use refs to click, fill, select
Re-snapshot: After navigation or DOM changes, get fresh refs

agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result

Essential Commands

# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -i -C          # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser scroll down 500         # Scroll page

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait milliseconds

# Capture
agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --full       # Full page screenshot
agent-browser pdf output.pdf          # Save as PDF

Common Patterns

Form Submission

agent-browser open https://example.com/signup
agent-browser snapshot -i
agent-browser fill @e1 "Jane Doe"
agent-browser fill @e2 "jane@example.com"
agent-browser select @e3 "California"
agent-browser check @e4
agent-browser click @e5
agent-browser wait --load networkidle
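For longer forms, the repeated `fill` calls can be driven from a list of ref/value pairs. A minimal sketch, assuming the refs come from a fresh `snapshot -i` of the current page; `fill_refs` is a hypothetical helper, not part of agent-browser.

```shell
# fill_refs (hypothetical helper): read "REF VALUE" lines from stdin and fill
# each field. Everything after the first space is the value, so values may
# contain spaces.
fill_refs() {
  # "|| [ -n "$ref" ]" keeps a final line that lacks a trailing newline.
  while IFS=' ' read -r ref value || [ -n "$ref" ]; do
    [ -n "$ref" ] || continue   # skip blank lines
    # </dev/null stops the command from consuming the loop's remaining input.
    agent-browser fill "$ref" "$value" </dev/null || return 1
  done
}

# Usage (refs from the snapshot above):
#   fill_refs <<'EOF'
#   @e1 Jane Doe
#   @e2 jane@example.com
#   EOF
```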

Authentication with State Persistence

# Login once and save state
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Reuse in future sessions
agent-browser state load auth.json
agent-browser open https://app.example.com/dashboard
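In scripts it helps to try the saved state first and fall back to the login flow only when no state file exists. A sketch; `use_saved_state` is a hypothetical helper. A loaded state can still be expired, so it is worth confirming the landing URL (for example with `agent-browser get url`) after opening the dashboard.

```shell
# use_saved_state FILE (hypothetical helper): load a saved auth state if it
# exists. Returns 1 and loads nothing when FILE is missing, so the caller can
# run the interactive login and then `agent-browser state save FILE`.
use_saved_state() {
  if [ -f "$1" ]; then
    agent-browser state load "$1"
  else
    echo "no saved state at $1; log in, then run: agent-browser state save $1" >&2
    return 1
  fi
}
```

Typical use: `use_saved_state auth.json && agent-browser open https://app.example.com/dashboard`.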

Session Persistence

# Auto-save/restore cookies and localStorage across browser restarts
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close  # State auto-saved to ~/.agent-browser/sessions/

# Next time, state is auto-loaded
agent-browser --session-name myapp open https://app.example.com/dashboard

# Encrypt state at rest
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com

# Manage saved states
agent-browser state list
agent-browser state show myapp-default.json
agent-browser state clear myapp
agent-browser state clean --older-than 7
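Because the example key is generated inline, it has to be stored and reused across shells; a freshly generated key presumably cannot decrypt states saved under the previous one. A minimal guard, assuming the key should look like `openssl rand -hex 32` output (64 hex characters); `require_encryption_key` is a hypothetical helper, and agent-browser itself may accept other key formats.

```shell
# require_encryption_key (hypothetical helper): fail fast unless
# AGENT_BROWSER_ENCRYPTION_KEY looks like `openssl rand -hex 32` output,
# i.e. exactly 64 hex characters.
require_encryption_key() {
  key=${AGENT_BROWSER_ENCRYPTION_KEY:-}
  case $key in
    '' | *[!0-9a-fA-F]*)
      echo "AGENT_BROWSER_ENCRYPTION_KEY is unset or not hex" >&2
      return 1 ;;
  esac
  if [ "${#key}" -ne 64 ]; then
    echo "AGENT_BROWSER_ENCRYPTION_KEY should be 64 hex chars, got ${#key}" >&2
    return 1
  fi
}
```

Typical use: `require_encryption_key && agent-browser --session-name secure open https://app.example.com`.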

Data Extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
agent-browser get text @e5           # Get specific element text
agent-browser get text body > page.txt  # Get all page text

# JSON output for parsing
agent-browser snapshot -i --json
agent-browser get text @e1 --json
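The same `get text body` call can be looped over a list of URLs to save one text file per page. A minimal sketch; `extract_texts` and the `page-N.txt` naming are hypothetical, and it waits for network idle after each `open` as the earlier examples do.

```shell
# extract_texts OUTDIR < urls.txt (hypothetical helper): open each URL
# (one per line) and save its body text to OUTDIR/page-N.txt in input order.
extract_texts() {
  outdir=$1
  mkdir -p "$outdir" || return 1
  n=0
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue
    n=$((n + 1))
    # </dev/null keeps agent-browser from reading the remaining URLs.
    agent-browser open "$url" </dev/null || return 1
    agent-browser wait --load networkidle </dev/null
    agent-browser get text body </dev/null > "$outdir/page-$n.txt" || return 1
  done
}
```

Typical use: `extract_texts ./out < urls.txt`.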

Parallel Sessions

agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com

agent-browser --session site1 snapshot -i
agent-browser --session site2 snapshot -i

agent-browser session list
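With more than a couple of sites, the per-session commands can be generated from `name=url` pairs. A sketch; `snapshot_sessions` and the `NAME.snapshot.txt` output files are hypothetical. It runs the sessions one after another rather than concurrently and leaves them open, so `agent-browser session list` still shows them afterwards.

```shell
# snapshot_sessions NAME=URL... (hypothetical helper): open each URL in its own
# named session and save an interactive snapshot to NAME.snapshot.txt.
# Only the first "=" splits the pair, so URLs may contain "=" in query strings.
snapshot_sessions() {
  for pair in "$@"; do
    name=${pair%%=*}
    url=${pair#*=}
    agent-browser --session "$name" open "$url" || return 1
    agent-browser --session "$name" snapshot -i > "$name.snapshot.txt" || return 1
  done
}
```

Typical use: `snapshot_sessions site1=https://site-a.com site2=https://site-b.com`.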

Connect to Existing Chrome

# Auto-discover running Chrome with remote debugging enabled
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect snapshot

# Or with explicit CDP port
agent-browser --cdp 9222 snapshot

Visual Browser (Debugging)

agent-browser --headed open https://example.com
agent-browser highlight @e1          # Highlight element
agent-browser record start demo.webm # Record session
agent-browser record stop            # Stop and save the recording
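When recording a debugging session, it helps to guarantee that the recording stops even if a step fails. A sketch; `with_recording` is a hypothetical helper, and it assumes `agent-browser record stop` ends the recording begun by `record start`.

```shell
# with_recording FILE CMD... (hypothetical helper): record while CMD runs,
# always stop the recording afterwards, and return CMD's exit status.
with_recording() {
  file=$1
  shift
  agent-browser record start "$file" || return 1
  "$@"
  rc=$?   # not "status", which is read-only in zsh
  agent-browser record stop
  return "$rc"
}
```

For example, `with_recording demo.webm agent-browser click @e1` records just that click; any command or shell function can stand in for it.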

Local Files (PDFs, HTML)

# Open local files with file:// URLs
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
agent-browser screenshot output.png
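`file://` URLs need an absolute path. A minimal sketch for turning a relative path into one; `to_file_url` is a hypothetical helper. It resolves symlinks in the directory part via `pwd -P` and does not percent-encode, so paths containing spaces, `#`, or `%` may need extra escaping.

```shell
# to_file_url PATH (hypothetical helper): print an absolute file:// URL for an
# existing local file. The ( ) body runs in a subshell, so the cd does not
# change the caller's working directory.
to_file_url() (
  [ -e "$1" ] || { echo "no such file: $1" >&2; exit 1; }
  cd "$(dirname "$1")" || exit 1
  dir=$(pwd -P)
  # ${dir%/} avoids a double slash when the file sits in /
  printf 'file://%s/%s\n' "${dir%/}" "$(basename "$1")"
)
```

Typical use: `agent-browser --allow-file-access open "$(to_file_url ./report.pdf)"`.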

iOS Simulator (Mobile Safari)

# List available iOS simulators
agent-browser device list

# Launch Safari on a specific device
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com

# Same workflow as desktop: snapshot, interact, re-snapshot
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1          # Tap (alias for click)
agent-browser -p ios fill @e2 "text"
agent-browser -p ios swipe up         # Mobile-specific gesture

# Take screenshot
agent-browser -p ios screenshot mobile.png

# Close session (shuts down simulator)
agent-browser -p ios close

Requirements: macOS with Xcode, plus Appium and its XCUITest driver (npm install -g appium && appium driver install xcuitest).

Real devices: Physical iOS devices also work if they are pre-configured. Pass --device "<UDID>", taking the UDID from xcrun xctrace list devices.
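Before launching a simulator, it can save time to check these requirements up front. A sketch; `check_ios_prereqs` is a hypothetical helper that only checks for macOS and for the `xcrun` and `appium` commands on `PATH`. It does not verify that the XCUITest driver is installed.

```shell
# check_ios_prereqs (hypothetical helper): report every missing requirement
# on stderr and return 1 if any is missing.
check_ios_prereqs() {
  missing=0
  if [ "$(uname -s)" != Darwin ]; then
    echo "iOS simulator automation requires macOS" >&2
    missing=1
  fi
  for tool in xcrun appium; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use: `check_ios_prereqs && agent-browser device list`.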

Cloud Browsers & Human Takeover (Kernel, Browserbase, …)

Run the browser in the cloud instead of locally, and optionally hand control to a human via a live-view URL (open on any device, including a phone) — the "agent drives, human steps in for logins/CAPTCHAs/MFA, agent resumes" pattern.

Select a provider with -p <name> or AGENT_BROWSER_PROVIDER (kernel, browserbase, browserless, browseruse, agentcore). Everything else in this skill (snapshot -i, @refs, click/fill, screenshot, eval) works unchanged once a provider is set.

# Kernel: a takeover-ready session in one shot (KERNEL_API_KEY auto-loads from ~/.env.zsh)
export AGENT_BROWSER_PROVIDER=kernel KERNEL_HEADLESS=false KERNEL_TIMEOUT_SECONDS=1800
./templates/kernel-takeover.sh https://example.com    # prints the live-view URL
agent-browser snapshot -i && agent-browser click @e1  # drive it like any session

Critical gotcha: agent-browser launches Kernel headless by default, and headless sessions have no live view. For human takeover, set KERNEL_HEADLESS=false before the first open (you can't flip an existing session). Full setup, the SDK-free way to build the takeover URL, persistent-login profiles, and cleanup (cloud browsers keep billing after close --all until you delete them) are in references/cloud-providers.md.
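Since the headless setting cannot be changed after the first `open`, a guard can catch the mistake before the session exists. A sketch; `require_kernel_live_view` is a hypothetical helper. It only inspects the environment variables, so a provider chosen with `-p kernel` on the command line is not detected.

```shell
# require_kernel_live_view (hypothetical helper): refuse to continue when the
# Kernel provider is selected via AGENT_BROWSER_PROVIDER but KERNEL_HEADLESS
# is not "false", because that session would have no live view.
require_kernel_live_view() {
  if [ "${AGENT_BROWSER_PROVIDER:-}" = kernel ] && [ "${KERNEL_HEADLESS:-}" != false ]; then
    echo "set KERNEL_HEADLESS=false before the first open, or there is no live view" >&2
    return 1
  fi
}
```

Typical use: `require_kernel_live_view && ./templates/kernel-takeover.sh https://example.com`.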

Ref Lifecycle (Important)

Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after:

Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)

agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs
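Because every navigation invalidates refs, the click, wait, and re-snapshot sequence can be wrapped into one step. A sketch; `click_and_resnapshot` is a hypothetical helper. Waiting for `networkidle` assumes the page eventually goes quiet, which pages with long-polling or streaming connections may never do.

```shell
# click_and_resnapshot REF (hypothetical helper): click, wait for the network
# to settle, then print a fresh interactive snapshot with new refs.
click_and_resnapshot() {
  agent-browser click "$1" &&
    agent-browser wait --load networkidle &&
    agent-browser snapshot -i
}
```

Typical use: `click_and_resnapshot @e5`, then act on the refs in its output.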

Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
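Semantic locators are looked up when each command runs, so a short flow can skip snapshots entirely. A sketch of a login; `login_by_labels` is a hypothetical helper, and the label texts `Email` and `Password` and the button name `Sign in` are placeholders to adjust for the real page.

```shell
# login_by_labels USER PASS (hypothetical helper): fill a login form by its
# visible labels and submit it. Label and button texts are placeholders.
login_by_labels() {
  agent-browser find label "Email" fill "$1" &&
    agent-browser find label "Password" fill "$2" &&
    agent-browser find role button click --name "Sign in"
}
```

Typical use: `login_by_labels "$USERNAME" "$PASSWORD" && agent-browser wait --url "**/dashboard"`.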

JavaScript Evaluation (eval)

Use eval to run JavaScript in the browser context. Shell quoting can corrupt complex expressions; use --stdin or -b to avoid this.

# Simple expressions work with regular quoting
agent-browser eval 'document.title'
agent-browser eval 'document.querySelectorAll("img").length'

# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin <<'EVALEOF'
JSON.stringify(
  Array.from(document.querySelectorAll("img"))
    .filter(i => !i.alt)
    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF

# Alternative: base64 encoding (avoids all shell escaping issues)
# tr strips the line wrapping GNU base64 adds to output longer than 76 characters
agent-browser eval -b "$(printf '%s' 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64 | tr -d '\n')"

Why this matters: When the shell processes your command, inner double quotes, ! characters (history expansion), backticks, and $() can all corrupt the JavaScript before it reaches agent-browser. The --stdin and -b flags bypass shell interpretation entirely.

Rules of thumb:

Single-line, no nested quotes -> regular eval 'expression' with single quotes is fine
Nested quotes, arrow functions, template literals, or multiline -> use eval --stdin <<'EVALEOF'
Programmatic/generated scripts -> use eval -b with base64
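These rules can be captured in two helpers: one that keeps JavaScript in a file and sends it through `--stdin`, and one that base64-encodes a generated script for `-b`. A sketch; `eval_js_file` and `js_b64` are hypothetical names. `tr -d '\n'` removes the line breaks that GNU `base64` inserts into output longer than 76 characters.

```shell
# eval_js_file FILE (hypothetical helper): send a JavaScript file through
# --stdin, so the shell never parses the script's quotes, backticks, or $().
eval_js_file() {
  agent-browser eval --stdin < "$1"
}

# js_b64 SCRIPT (hypothetical helper): base64-encode JavaScript as a single
# line, suitable for `agent-browser eval -b`.
js_b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

Typical use: `eval_js_file audit.js`, or `agent-browser eval -b "$(js_b64 "$generated_js")"` for a script built by another program.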

Deep-Dive Documentation

Reference	When to Use
references/commands.md	Full command reference with all options
references/snapshot-refs.md	Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md	Parallel sessions, state persistence, concurrent scraping
references/authentication.md	Login flows, OAuth, 2FA handling, state reuse
references/video-recording.md	Recording workflows for debugging and documentation
references/proxy-support.md	Proxy configuration, geo-testing, rotating proxies
references/cloud-providers.md	Cloud browsers (Kernel/Browserbase), live-view human takeover, profiles, cleanup

Ready-to-Use Templates

Template	Description
templates/form-automation.sh	Form filling with validation
templates/authenticated-session.sh	Login once, reuse state
templates/capture-workflow.sh	Content extraction with screenshots
templates/kernel-takeover.sh	Launch a Kernel cloud browser + print a live-view takeover URL

./templates/form-automation.sh https://example.com/form
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output