| name | browser |
| description | Browser automation SOP — choose local Playwright (free) vs Browserbase cloud (metered), with session patterns and hand-off protocol. Use when a task requires web browsing. |
| user-invocable | false |
Browser Automation
Operational guide for web browsing tasks. Two browser options are available — choose the right one for the job.
See also: reference.md for detailed API reference, troubleshooting, credentials, and plan limits.
Decision Matrix
Option A: Local Playwright (MCP tools) — FREE, unlimited
The Playwright MCP server provides mcp__playwright__browser_* tools directly in the session.
Use when:
- Reading public web content (docs, articles, search results, prices)
- Simple scraping of public data
- Form fills on sites without bot protection
- Development/testing (previewing pages, checking our apps)
- Quick lookups (weather, reference, public info)
- Any site that doesn't actively block automation
Limitations:
- Detectable as headless browser (no stealth mode)
- No CAPTCHA auto-solving
- No human hand-off capability
- No persistent cookies across sessions
- Blocked by Cloudflare, aggressive anti-bot, headless detection
Pattern:
1. mcp__playwright__browser_navigate → go to URL
2. mcp__playwright__browser_snapshot → get page accessibility tree (preferred over screenshot for actions)
3. mcp__playwright__browser_click/type/etc → interact
4. mcp__playwright__browser_take_screenshot → visual capture when needed
5. mcp__playwright__browser_close → clean up when done
Option B: Browserbase Cloud (sidecar on port 3849) — METERED
Remote Chrome with stealth mode, CAPTCHA solving, human hand-off, and /session/eval for efficient form filling via page.evaluate().
Use when:
- Site requires authentication (human enters credentials via hand-off)
- Anti-bot protection present (Cloudflare, reCAPTCHA, headless detection)
- Banking & financial sites (need stealth + security)
- Utility account management (bill pay, account changes)
- CAPTCHA-gated workflows
- Need persistent cookies across sessions (context persistence)
- Form filling tasks —
/session/eval enables batch-filling all fields in one call
Pattern:
1. POST http://localhost:3849/session/start { "url": "...", "contextName": "sitename" }
2. POST /session/eval — discover fields, batch-fill forms (see Form Filling Protocol)
3. POST /session/navigate, /session/click, /session/type — interact as needed
4. GET /session/screenshot — visual check
5. If blocked → hand off to human (see Hand-Off below)
6. POST /session/stop { "saveContext": true } — clean up, save cookies
Decision Flowchart
Is there an API, CLI tool, or curl option?
YES → Use that (no browser needed)
NO ↓
Is the content public with no bot protection?
YES → Local Playwright (free)
NO ↓
Anti-bot protection, CAPTCHA, or headless detection?
YES → Browserbase
NO ↓
Need human to enter credentials or approve something?
YES → Browserbase (hand-off)
NO ↓
Need to fill forms on a site with bot protection?
YES → Browserbase (/session/eval for efficient batch fill)
NO ↓
Need persistent cookies across sessions?
YES → Browserbase (context persistence)
NO → Local Playwright (free)
Hand-Off Protocol (Browserbase only)
When the assistant hits a blocker (CAPTCHA, login, MFA, payment approval):
- Take screenshot, assess the blocker
- Start hand-off:
POST http://localhost:3847/browser/handoff/start
- Get session info:
GET http://localhost:3849/session/status → extract wrapperPath
- Build the public hand-off URL:
https://yourdomain.com{wrapperPath}
- IMPORTANT: Always use your configured public domain URL, NEVER
localhost — the human is on their phone/laptop, not on this machine
- The daemon proxies
/handoff/* to the sidecar automatically
- Message human on the active channel with:
- Hand-off URL (tappable link to the interactive web UI)
- Screenshot of current page
- Clear description: "I'm stuck on [X] — need you to [Y]"
- Always include the available commands (see below)
- Human interacts via the web UI or channel commands:
type: [text] — types into focused field (NEVER logged — safe for passwords)
screenshot — sends fresh screenshot
done / all yours — completes hand-off (shows confirmation overlay in web UI)
abort — cancels and closes session
- Done confirmation: When the human taps Done in the web UI, a confirmation overlay appears asking them to confirm. This prevents accidental hand-back while still interacting.
- Assistant resumes autonomous navigation after
done is confirmed
Hand-Off Message Template
Always include this in the initial hand-off message to the human:
I need your help with [description of blocker].
Hand-off page: https://yourdomain.com/handoff/page?token=TOKEN_HERE
[Description of what's on screen and what you need them to do]
Available commands:
• type: [text] — type into the focused field
• screenshot — get a fresh screenshot
• done — hand control back to me
• abort — cancel and close the session
Hand-Off Triggers
- CAPTCHA that auto-solve can't handle
- Login requiring credentials the assistant doesn't have
- Multi-factor authentication (phone/email codes, authenticator)
- "Are you human?" challenges
- Payment confirmation (human must explicitly approve)
- Anything the assistant can't figure out from screenshots
Form Filling Protocol (Browserbase)
When filling forms via Browserbase, follow this efficient workflow:
1. Navigate to the Form Page
curl -s -X POST http://localhost:3849/session/start \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com/application", "contextName": "sitename"}'
2. Discover All Form Fields
Use /session/eval to enumerate every field on the page:
curl -s -X POST http://localhost:3849/session/eval \
-H 'Content-Type: application/json' \
-d '{"script": "Array.from(document.querySelectorAll(\"input, select, textarea\")).map(el => ({tag: el.tagName, type: el.type, name: el.name, id: el.id, label: el.labels?.[0]?.textContent?.trim() || null, options: el.tagName===\"SELECT\" ? Array.from(el.options).map(o=>({val:o.value,text:o.text})) : undefined})).filter(f => f.type !== \"hidden\" && f.name !== \"\")"}'
For complex discovery scripts, use Python to avoid escaping issues (see reference.md for the Python eval helper pattern).
3. Batch-Fill All Fields in One Call
Use /session/eval with the React-compatible native setter pattern:
python3 -c "
import json, subprocess
script = '''(() => {
const setter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, 'value').set;
const fields = {'firstName': 'John', 'lastName': 'Smith', 'email': 'john@example.com'};
for (const [id, val] of Object.entries(fields)) {
const el = document.getElementById(id);
if (el) { setter.call(el, val); el.dispatchEvent(new Event('input', {bubbles:true})); el.dispatchEvent(new Event('change', {bubbles:true})); }
}
return 'filled';
})()'''
r = subprocess.run(['curl', '-s', '-X', 'POST', 'http://localhost:3849/session/eval',
'-H', 'Content-Type: application/json', '-d', json.dumps({'script': script})],
capture_output=True, text=True)
print(r.stdout)
"
4. Screenshot to Verify
curl -s http://localhost:3849/session/screenshot > /tmp/form-filled.png
Review the screenshot to confirm all fields are populated correctly.
5. Submit or Hand Off
- If the form is ready: use
/session/click to submit
- If there is a CAPTCHA or login blocker: hand off to human (see Hand-Off Protocol)
- If the human needs to review before submitting: hand off with instructions to review and click Submit
Tips
- Use Python for eval scripts: Shell escaping of JSON containing JavaScript is error-prone. Python's
json.dumps() handles it cleanly.
- Session timeout is 300s by default: For multi-page forms, work quickly. You can set
SESSION_TIMEOUT env var on the sidecar for longer workflows.
- Discover before filling: Never guess field IDs. Always run the discovery script first. Field IDs vary wildly across sites.
- Dropdowns need separate handling: Native
<select> uses .value + change event. Custom dropdowns (React Select, etc.) need click-based interaction via /session/click.
- Masked fields need
/session/type: For SSN, phone, credit card fields with input masks, use /session/type with the field selector instead of eval. Masks rely on keystroke events.
- Government sites: Watch for ASP.NET postbacks (dropdown changes reload the page), aggressive session timeouts, and CAPTCHA on submission. See reference.md for detailed tips.
Security Rules
type: relay text is NEVER logged — passwords, SSNs, account numbers are safe
- Live view URLs only sent via private channel DM — never in group chats or logs
- Financial transactions require explicit human approval
- Session recording is OFF — no video of banking/password screens on Browserbase servers
- Proxy providers block banking domains — hand-off to human is the approach for those
Site Knowledge (Memory-Based)
Per-site learnings (field selectors, navigation flows, quirks, anti-bot patterns) are stored as memories in .kithkit/state/memory/memories/, NOT in this skill. This keeps the skill shareable without leaking site-specific knowledge.
Before Browsing: Look Up What You Know
Before interacting with any site, check memory for prior learnings:
Grep "egov.maryland.gov" path=".kithkit/state/memory/memories/"
Grep "category: website" path=".kithkit/state/memory/memories/"
Grep "tags:.*md-business-express" path=".kithkit/state/memory/memories/"
If you find a matching memory, read it. It may contain field IDs, navigation steps, known quirks, or auth requirements that save you from re-discovering everything.
After Browsing: Capture What You Learned
When you discover something useful about a site, store it immediately using /memory add:
Memory format for website learnings:
---
date: 2026-02-11T21:00:00
category: website
importance: medium
subject: egov.maryland.gov — LLC Filing Form
tags: [maryland, business-express, llc, government, form-filling]
confidence: 0.9
source: observation
---
- URL: https://egov.maryland.gov/BusinessExpress
- Auth: account required (credential-md-business-express-username)
- Anti-bot: minimal (no Cloudflare, no CAPTCHA on forms)
- Browser: Browserbase recommended (account login required)
- Register LLC: Dashboard → Register → Maryland Limited Liability Company → "Use Online Forms"
- 6-step wizard: Business Name → Business Info → Resident Agent → Additional → Confirm → Pay
- `#BusinessSuffix`: select, options: ", LLC" / ", L.L.C." / ", Limited Liability Company"
- `#BusinessName`: input, the LLC name without suffix
- Availability check button: `#btnCheckAvailability`
- Dropdown changes trigger ASP.NET postbacks (full page reload)
- County dropdown populates after state selection — wait for reload
- Session timeout: aggressive, ~15 min idle
What to Capture
| Category | Examples |
|---|
| Access | URL, auth method, anti-bot level, which browser option to use |
| Navigation | How to reach key pages, menu paths, wizard steps |
| Form fields | IDs, names, types, labels, dropdown options, masked fields |
| Quirks | Postbacks, session timeouts, dynamic field loading, JS framework |
| Anti-bot | Cloudflare, reCAPTCHA, headless detection, rate limits |
| Auth flow | Login URL, MFA type, credential keychain keys |
| Failures | What didn't work and why (stale selectors, blocked approaches) |
Staleness & Verification
Site knowledge can go stale (redesigns, updated field IDs). Handle this with verify-on-use:
- Try the stored knowledge — use remembered selectors/flows
- If it fails — re-discover (run field enumeration, take screenshots)
- Update the memory — correct the stale data immediately
- Note the last verified date in the memory content
Don't preemptively invalidate — trust stored knowledge until it actually fails.
Naming Convention
Website memory files follow the standard memory naming:
YYYYMMDD-HHMM-site-slug.md
Examples:
20260211-2100-egov-maryland-llc-filing.md
20260212-1400-chase-bank-login.md
20260213-0900-irs-ein-application.md
Sharing
This skill (SKILL.md, reference.md) can be shared upstream or with peers. Per-site memories stay in each agent's private .kithkit/state/memory/memories/ directory. An agent receiving this skill will build their own site knowledge through use.
Cost Awareness
- Budget: 100 hours/month ($0.12/hr overage)
- Typical task: 2-5 minutes (login, navigate, grab data)
- At 5 min/task: budget supports ~1,200 tasks/month
- Proxy bandwidth: 1 GB/month — avoid large file downloads through Browserbase
- When in doubt: prefer local Playwright for anything that doesn't need stealth/auth
References
- reference.md — Detailed API reference, session lifecycle, troubleshooting, credentials, plan limits