agent-browser

Stars: 22 · Forks: 2 · Updated: February 8, 2026 at 23:12

Source: tmchow/tmc-marketplace

name	agent-browser
description	Browser automation using Vercel's agent-browser CLI. This skill should be used when the user says "browse a website", "fill a form", "take a screenshot", or "test a web page". Uses ref-based element selection.
user-invocable	true

agent-browser: CLI Browser Automation

agent-browser is Vercel's headless browser CLI for AI agents. It selects elements through refs (@e1, @e2) taken from accessibility snapshots.

Setup Check

command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"

Install if needed

npm install -g agent-browser
agent-browser install  # Downloads Chromium
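
The check and the install can be folded into one idempotent helper. This is a minimal sketch, assuming `npm` is on the PATH; the function name `ensure_agent_browser` is hypothetical and it uses only the commands shown above.

```shell
# ensure_agent_browser: install the CLI only when it is missing.
# Hypothetical helper name; relies on npm being available.
ensure_agent_browser() {
  if command -v agent-browser >/dev/null 2>&1; then
    echo "Installed"
    return 0
  fi
  echo "NOT INSTALLED, installing..." >&2
  npm install -g agent-browser &&
    agent-browser install   # Downloads Chromium
}
```

Calling `ensure_agent_browser` at the top of a script makes later commands safe to run on a fresh machine.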

Core Workflow

The snapshot + ref pattern works well for LLMs because each snapshot assigns short refs that later commands can target directly:

1. Navigate to the URL
2. Snapshot to get interactive elements with refs
3. Interact using refs (@e1, @e2, etc.)
4. Re-snapshot after navigation or DOM changes

# Step 1: Open URL
agent-browser open https://example.com

# Step 2: Get interactive elements with refs
agent-browser snapshot -i

# Step 3: Interact using refs
agent-browser click @e1
agent-browser fill @e2 "search query"

# Step 4: Re-snapshot after changes
agent-browser snapshot -i
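
Because refs come from the snapshot text, a script can look one up by role and name instead of hard-coding `@e1`. The sketch below assumes snapshot entries look like `role "name" [ref=eN]`, as in the output comment of the Login Flow example; the real output format may differ, and `find_ref` is a hypothetical helper.

```shell
# find_ref ROLE NAME: read snapshot text on stdin and print the first
# matching ref as @eN. Assumes entries look like: role "name" [ref=eN]
find_ref() {
  awk -v want="$1 \"$2\" [ref=" '
    {
      pos = index($0, want)              # Plain substring search, no regex
      if (pos) {
        rest = substr($0, pos + length(want))
        sub(/\].*/, "", rest)            # Keep only the ref id before "]"
        print "@" rest
        exit
      }
    }'
}

# Usage:
#   ref=$(agent-browser snapshot -i | find_ref button "Sign in")
#   agent-browser click "$ref"
```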

Key Commands

Navigation

agent-browser open <url>       # Navigate to URL
agent-browser back             # Go back
agent-browser forward          # Go forward
agent-browser reload           # Reload page
agent-browser close            # Close browser

Snapshots (Essential for AI)

agent-browser snapshot              # Full accessibility tree
agent-browser snapshot -i           # Interactive elements only (recommended)
agent-browser snapshot -i --json    # JSON output for parsing
agent-browser snapshot -c           # Compact (remove empty elements)
agent-browser snapshot -d 3         # Limit depth
agent-browser snapshot -s @e5       # Scope to element subtree

Interactions

agent-browser click @e1                    # Click element
agent-browser dblclick @e1                 # Double-click
agent-browser fill @e1 "text"              # Clear and fill input
agent-browser type @e1 "text"              # Type without clearing
agent-browser press Enter                  # Press key
agent-browser hover @e1                    # Hover element
agent-browser check @e1                    # Check checkbox
agent-browser uncheck @e1                  # Uncheck checkbox
agent-browser select @e1 "option"          # Select dropdown option
agent-browser scroll down 500              # Scroll (up/down/left/right)
agent-browser scrollintoview @e1           # Scroll element into view
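
`fill` replaces whatever the input already holds, while `type` adds to it, so `fill` is usually the safer choice before submitting a field. A small sketch of that pattern follows; `submit_field` is a hypothetical helper name.

```shell
# submit_field REF TEXT: replace the field's contents, then press Enter.
# Hypothetical helper built from the fill and press commands above.
submit_field() {
  agent-browser fill "$1" "$2" &&
    agent-browser press Enter
}

# Usage:
#   submit_field @e2 "search query"
```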

Get Information

agent-browser get text @e1          # Get element text
agent-browser get html @e1          # Get element HTML
agent-browser get value @e1         # Get input value
agent-browser get attr @e1 href     # Get attribute (element first, then attribute name)
agent-browser get title             # Get page title
agent-browser get url               # Get current URL
agent-browser get count "button"    # Count matching elements
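
The `get` commands print to stdout, so their results can be captured in shell variables. A minimal sketch, with `page_summary` as a hypothetical helper name:

```shell
# page_summary: print "<title><TAB><url>" for the current page.
# Hypothetical helper; fails if either get command fails.
page_summary() {
  title=$(agent-browser get title) || return 1
  url=$(agent-browser get url) || return 1
  printf '%s\t%s\n' "$title" "$url"
}
```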

Screenshots & PDFs

agent-browser screenshot                      # Viewport screenshot
agent-browser screenshot --full               # Full page
agent-browser screenshot output.png           # Save to file
agent-browser pdf output.pdf                  # Save as PDF
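
When a script takes several screenshots, timestamped file names keep earlier captures from being overwritten. A sketch, where the helper name `screenshot_stamped` and the `shot-` file prefix are hypothetical:

```shell
# screenshot_stamped [DIR]: save a viewport screenshot as
# DIR/shot-YYYYmmdd-HHMMSS.png and print the path.
screenshot_stamped() {
  dir=${1:-.}
  mkdir -p "$dir" || return 1
  file="$dir/shot-$(date +%Y%m%d-%H%M%S).png"
  agent-browser screenshot "$file" && echo "$file"
}
```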

Wait

agent-browser wait @e1              # Wait for element
agent-browser wait 2000             # Wait milliseconds
agent-browser wait --text "text"    # Wait for text to appear
agent-browser wait --url "pattern"  # Wait for URL match
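
Waiting on a condition is generally more reliable than a fixed delay after an action that triggers navigation. The sketch below chains a click, a URL wait, and a fresh snapshot; `click_and_wait` is a hypothetical helper, and the pattern syntax is whatever `wait --url` accepts.

```shell
# click_and_wait REF PATTERN: click, wait until the URL matches PATTERN,
# then re-snapshot so new refs are available.
click_and_wait() {
  agent-browser click "$1" &&
    agent-browser wait --url "$2" &&
    agent-browser snapshot -i
}
```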

Semantic Locators (Alternative to Refs)

agent-browser find role button click --name "Submit"
agent-browser find text "Sign up" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." fill "query"

Sessions & Profiles

Sessions (Parallel Isolated Browsers)

agent-browser --session browser1 open https://site1.com
agent-browser --session browser2 open https://site2.com
agent-browser session list
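
Since sessions are isolated, several pages can be opened concurrently from one script. A sketch under the assumption that session names are arbitrary strings; the names `s1`, `s2`, ... and the helper `open_in_sessions` are hypothetical.

```shell
# open_in_sessions URL...: open each URL in its own session (s1, s2, ...)
# in the background, then list the sessions.
open_in_sessions() {
  n=0
  for url in "$@"; do
    n=$((n + 1))
    agent-browser --session "s$n" open "$url" &
  done
  wait   # Block until every background open has finished
  agent-browser session list
}
```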

Profiles (Persistent State Across Restarts)

# Profiles preserve cookies, localStorage, login sessions
agent-browser --profile ~/.myapp-profile open https://app.example.com

Authentication

Skip UI Login with Headers

agent-browser open https://api.example.com --headers '{"Authorization": "Bearer <token>"}'
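
To keep the token out of the script itself, the headers JSON can be built from a variable. A minimal sketch; `auth_headers` and the `API_TOKEN` environment variable are hypothetical, and the token is assumed to contain no double quotes or backslashes.

```shell
# auth_headers TOKEN: print the JSON object passed to --headers.
# Assumes TOKEN has no double quotes or backslashes.
auth_headers() {
  printf '{"Authorization": "Bearer %s"}' "$1"
}

# Usage (API_TOKEN is a hypothetical environment variable):
#   agent-browser open https://api.example.com --headers "$(auth_headers "$API_TOKEN")"
```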

Save/Load Auth State

# After logging in via UI
agent-browser state save auth-state.json

# Reuse in future sessions
agent-browser state load auth-state.json
agent-browser open https://app.example.com  # Already logged in
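
A script can load saved state only when the file exists and otherwise fall back to a normal open. A sketch, with `open_with_state` as a hypothetical helper name:

```shell
# open_with_state FILE URL: reuse saved auth state when FILE exists,
# otherwise open the URL without it and print a hint.
open_with_state() {
  if [ -f "$1" ]; then
    agent-browser state load "$1" || return 1
  else
    echo "No saved state at $1; log in via the UI, then run: agent-browser state save $1" >&2
  fi
  agent-browser open "$2"
}
```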

Debug Mode

# Run with visible browser window
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
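
Rather than adding `--headed` to every command by hand, a thin wrapper can switch it on from the environment. A sketch; the wrapper name `ab` and the `AB_DEBUG` variable are hypothetical.

```shell
# ab ARGS...: run agent-browser, adding --headed when AB_DEBUG=1.
ab() {
  if [ "${AB_DEBUG:-0}" = 1 ]; then
    agent-browser --headed "$@"
  else
    agent-browser "$@"
  fi
}

# Usage:
#   AB_DEBUG=1; export AB_DEBUG
#   ab open https://example.com
```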

Examples

Login Flow

agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i  # Verify logged in
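
Ref numbers can differ between pages and runs, so a reusable login script may prefer the semantic locators and read credentials from the environment. This is a sketch only: `login`, `APP_EMAIL`, and `APP_PASSWORD` are hypothetical names, and it assumes the fields are labelled "Email" and "Password" with a "Sign in" button, as in the example above.

```shell
# login URL: sign in using label-based locators instead of hard-coded refs.
# APP_EMAIL and APP_PASSWORD are hypothetical environment variables.
login() {
  if [ -z "${APP_EMAIL:-}" ] || [ -z "${APP_PASSWORD:-}" ]; then
    echo "Set APP_EMAIL and APP_PASSWORD first" >&2
    return 1
  fi
  agent-browser open "$1" &&
    agent-browser find label "Email" fill "$APP_EMAIL" &&
    agent-browser find label "Password" fill "$APP_PASSWORD" &&
    agent-browser find role button click --name "Sign in" &&
    agent-browser wait 2000 &&
    agent-browser snapshot -i   # Verify logged in
}
```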

Search and Extract

agent-browser open https://news.ycombinator.com
agent-browser snapshot -i
agent-browser get text @e12  # Get headline text
agent-browser click @e12     # Click to open story

Form Filling

agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4  # Agree to terms
agent-browser click @e5  # Submit
agent-browser screenshot confirmation.png

JSON Output

agent-browser snapshot -i --json

Returns structured data with refs for programmatic parsing.

vs Playwright MCP

Feature	agent-browser (CLI)	Playwright MCP
Interface	Bash commands	MCP tools
Selection	Refs (@e1)	Refs (e1)
Output	Text/JSON	Tool responses
Parallel	Sessions	Tabs
Best for	Quick automation	Tool integration

Use agent-browser when:

- You prefer Bash-based workflows
- You want simpler CLI commands
- You need quick one-off automation

Use Playwright MCP when:

- You need deep MCP tool integration
- You want tool-based responses
- You're building complex automation