| name | browser-automation |
| description | Automate Google Chrome for web tasks — navigate sites, fill forms, click elements, take snapshots, and extract content. Powered by playwright-cli (pw) with a persistent browser profile. Use when asked to interact with a website, fill out a checkout form, scrape content, or perform any browser-based workflow. |
Browser Automation
Automate Google Chrome via playwright-cli (pw). Maintains a persistent Chrome profile at ~/.pw-agent/.playwright-profile/ so logins and cookies survive across sessions.
Prerequisites
Setup
~/.pw-agent/ is the runtime home. It holds the pw launcher script and persistent Chrome state.
First-time setup
-
Create runtime directories:
mkdir -p ~/.pw-agent/.playwright-profile ~/.pw-agent/.playwright-cli ~/.pw-agent/.playwright-mcp
-
Create a launcher script at ~/.pw-agent/pw:
cat > ~/.pw-agent/pw << 'LAUNCHER'
cd "$(dirname "$0")"
NPM_BIN="$(npm root -g 2>/dev/null)/../bin"
[ -d "$NPM_BIN" ] && export PATH="$NPM_BIN:$PATH"
exec playwright-cli "$@"
LAUNCHER
chmod +x ~/.pw-agent/pw
-
Seed the profile by opening a site:
cd ~/.pw-agent && ./pw open https://example.com
Log in to any services you need. Close the browser when done. The profile persists for future sessions.
CRITICAL: Never delete ~/.pw-agent/.playwright-profile/ — it contains saved logins and cookies. Re-setup should only recreate the launcher script and mkdir -p missing directories.
Usage
All commands run from ~/.pw-agent/:
cd ~/.pw-agent && ./pw open https://example.com
./pw snapshot
./pw screenshot
./pw click e42
./pw fill e42 "text"
./pw press Enter
./pw stop
Headed by default — do not pass --headless
The default mode is headed and you should keep it that way. Headed mode lets the user see the browser, intervene on bot checks, and watch the agent — all of which matter most when money is moving.
Never pass --headless for:
- Agent-card purchases or any payment flow
- Logged-in workflows where a session may need to be re-authenticated
- Any task where the user is at the keyboard and would benefit from seeing the browser
--headless is only appropriate for unattended scraping or background inspection where no human is watching, and even then ask the user first.
./pw --headless open https://example.com
./pw snapshot
Pass --headless only on the command that launches the browser (open for session 0, start for numbered sessions). Once launched, the session retains its mode for its lifetime — switching requires ./pw stop and re-opening.
Multi-session support
Run parallel browser sessions with isolated contexts:
./pw 1 start https://site-a.com
./pw 1 snapshot
./pw 1 stop
./pw 2 start https://site-b.com
./pw 2 snapshot
./pw 2 stop
Named sessions for readability:
./pw --session "checkout" open https://store.com
./pw --session "checkout" snapshot
./pw --session "checkout" click e42
./pw --session "checkout" stop
If ./pw N start says "Session already active", pick another number or stop the existing one.
Navigating within a session: Don't use open on an already-running session. Use eval instead:
./pw eval "() => { window.location.href = 'https://example.com/new-page' }"
sleep 3
./pw screenshot
Output artifacts
- Snapshots:
.playwright-cli/page-<timestamp>.yml
- Screenshots:
.playwright-cli/page-<timestamp>.png
- Downloads:
.playwright-mcp/
Screenshots vs Snapshots
- Screenshots capture the visible viewport only
- Use to verify visual state, debug layout, understand what's visible/hidden
- Use to decide what action to take next (click, scroll, type)
- Snapshots capture the entire DOM, not just the visible viewport
Element refs
./pw snapshot produces a YAML accessibility tree with [ref=eNNN] identifiers for interactive elements:
./pw snapshot
./pw fill e42 "query"
./pw click e15
Refs are invalidated after page changes (navigation, AJAX, dynamic content) — re-snapshot to get fresh refs.
Prefer keyboard over clicks for more reliable interaction:
Tab — next element / form field
Enter — select autocomplete result, activate button
Meta+Enter — submit forms, send messages
Escape — close dialogs, cancel operations
PageUp/PageDown — scroll to load more content
Wait for async content:
SNAPSHOT=$(./pw snapshot 2>&1 | grep -oE '\.playwright-cli/[^ )]+\.yml')
grep -iE "results\|loaded" "$SNAPSHOT" -C 5 || echo "not ready yet"
Dropdown <select> elements
fill does NOT work on <select> elements. Use the select command instead:
./pw select <ref> "Option Label"
./pw select <ref> "value"
The snapshot shows <select> elements as combobox roles. If fill errors with "Element is not an <input>", switch to select.
Typing vs filling
fill sets the value programmatically — fast but may not trigger React/JS change handlers on some sites. If fill doesn't work:
- Click the field first:
./pw click <ref>
- Clear it:
./pw press Meta+a then ./pw press Backspace
- Type character by character:
./pw press 3 then ./pw press . then ./pw press 4 etc.
Some numeric inputs strip non-numeric characters (e.g., . in donation amount fields that only accept whole numbers).
Nested iframes
Payment forms (Stripe, Braintree) are often inside nested iframes. Playwright-cli handles this transparently — element refs like f20e21 (with f prefix + nested frame IDs) work across iframe boundaries. Just use the ref from the snapshot as you would any other element.
Stripe iframe refs change when the iframe is recreated (e.g., navigating between wizard steps). Always re-snapshot to get fresh refs after page transitions.
Dismissing popups and modals
Many sites show cookie banners, login prompts, or promotional modals:
SNAPSHOT=$(./pw snapshot 2>&1 | grep -oE '\.playwright-cli/[^ )]+\.yml')
grep -iE "dismiss|close|no.thanks|decline|accept.*cookie" "$SNAPSHOT" | head -5
./pw click e42
JavaScript evaluation
Use heredocs to avoid shell quoting issues:
./pw eval "$(cat <<'EOF'
() => {
var links = document.querySelectorAll('a[href]');
return Array.from(links).map(a => a.href).slice(0, 10);
}
EOF
)"
IMPORTANT: eval expects a function definition, not a value expression:
./pw eval "() => document.title"
./pw eval "document.title"
Scrolling
mousewheel requires positioning the mouse over the scrollable content first:
./pw mousemove 600 400
./pw mousewheel 500 0
mousewheel 500 0 = scroll down 500px
mousewheel -500 0 = scroll up 500px
mousewheel 0 500 = scroll right 500px
Authentication tips
When a site requires login:
- Check if there's an SSO/OAuth option — prefer it over username/password
- If you need credentials, ask the user — never guess passwords
- For sites with saved sessions, the persistent profile may already be authenticated
- After logging in, the session persists in the profile for future use
Bot checks, CAPTCHAs, and human handoff
When you hit a CAPTCHA, reCAPTCHA, 3DS challenge, login wall, "verify you're human" page, or anything else that requires human input, do not retry programmatically. The agent will not solve these reliably, and on payment flows it should not try. Instead, hand the wheel to the user:
- Confirm the browser is headed. If you launched with
--headless, the user cannot interact with the page — ./pw stop and re-open without the flag before continuing.
- Screenshot the current state so the user knows what they are looking at:
./pw screenshot
- Hand off to the user in plain language. Tell them what is on the screen and what to do in the visible Chrome window. Example: "There's a reCAPTCHA on the Gumroad checkout. Switch to the Chrome window I opened, solve it, and reply when you're past it."
- Wait. Do not poll the page or re-snapshot. Wait for the user's reply before doing anything else.
- Resume. Once the user confirms, snapshot again and continue from where you stopped. Element refs may have changed during their interaction — re-grep before clicking.
This pattern works because the persistent profile and headed Chrome mean the user shares the same browser session as the agent. They do not need to restart anything, log in again, or copy URLs — they just interact with the window that's already on their screen.
Debugging checkout issues
When a checkout form behaves unexpectedly, check browser console logs:
cat ~/.pw-agent/.playwright-cli/console-*.log | tail -30
Look for HTTP status codes and error events — they reveal what the UI may not show.
Multi-step checkout wizards
Some checkout widgets use multi-step wizards where the Stripe iframe is destroyed between steps. Card data entered in step N may not be preserved when you reach step N+3.
Workaround strategies:
- Prefer merchants with single-page checkout forms — fewer points of failure
- Look for a direct checkout URL instead of an embedded widget
- If stuck with a multi-step wizard, move through the non-card steps as quickly as possible
Best practices
- Don't stop to narrate intermediate states — keep clicking through auth flows and loading screens until you hit an actual blocker
- Try the action before assuming it won't work
- Busywait with exponential backoff for page loads and async content
- After a browser session, summarize actions in shorthand:
open <url> -> click e42 -> fill e15 "query" -> press Enter
- Run
date at session start to know current date/time for interpreting relative dates