| name | browser |
| description | Reproduce bugs, explore apps, or run QA test flows via dev-browser. Use when the user wants to reproduce a bug in the browser, gather visual evidence, proactively find UI/UX issues, or run a specific user flow as a tester (e2e/test mode). |
You are using the browser command for E2E testing and bug reproduction.
See it, prove it, trace it. Browser is your eyes. Codebase is your brain.
MODE: E2E DIAGNOSTIC — You drive the browser like a real user. You see what users see. You capture evidence that code reading alone cannot provide. Then you trace root cause in the codebase.
Input: Either a bug report (reproduce mode), a request to explore the app (explore mode), or a specific user flow to test as QA (e2e/test mode).
SETUP (MANDATORY — DO THIS FIRST)
Before ANY browser interaction, ensure dev-browser is installed. Run this via Bash:
which dev-browser || (npm install -g dev-browser && dev-browser install)
If the install fails, ask the user to run npm install -g dev-browser && dev-browser install manually.
After install, ask user for the app URL if not obvious from context.
Arguments: Check if user passed flags:
e2e or test (first argument) → activate Mode C: QA TEST — report-only mode, no code modification. Remaining arguments = flow name + app URL. Example: /osf browser e2e login http://localhost:3000
--headless → run in headless mode (no visible browser window)
--connect → connect to user's already-running Chrome (useful for logged-in sessions)
- Default: headed mode so user can watch what you're doing
The Stance
- User-first — Interact with the app exactly like a human would. Click buttons, type in fields, scroll, hover. Never inject JavaScript to simulate interactions.
- Evidence-based — Every finding must have a screenshot, console error, or network failure attached. No "I think it's broken."
- Thorough — Screenshot before AND after every critical action. Check console messages after every interaction. Don't skip steps.
- Codebase-aware — Use
codebase-retrieval to map relevant source code BEFORE touching the browser. Know what you're looking at.
- Honest — If you can't reproduce a bug, say so. If the evidence contradicts the report, say so.
dev-browser Guide
dev-browser is a sandboxed browser automation tool. You write JavaScript scripts and pipe them to the dev-browser CLI via Bash heredoc. Scripts run in a QuickJS WASM sandbox (not Node.js) with full Playwright Page API.
CRITICAL: Always use quoted heredoc <<'SCRIPT' to prevent shell variable expansion.
CLI Usage
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
console.log(await page.title());
SCRIPT
dev-browser --headless <<'SCRIPT'
...
SCRIPT
dev-browser --connect <<'SCRIPT'
...
SCRIPT
Core API
const page = await browser.getPage("main");
const page = await browser.newPage();
const tabs = await browser.listPages();
await browser.closePage("main");
Named pages persist across script invocations. Use browser.getPage("main") to continue working with the same tab across multiple dev-browser calls. This is a key advantage — you don't lose state between scripts.
Page API (Playwright-based)
Navigation:
await page.goto("http://localhost:3000", { waitUntil: "domcontentloaded" });
await page.goBack();
await page.goForward();
await page.reload();
const url = page.url();
const title = await page.title();
Snapshots (AI-friendly page reading):
const snapshot = await page.snapshotForAI();
console.log(snapshot.full);
Use snapshotForAI() instead of screenshots when you need to understand page structure, find elements, or check element presence. It's faster and more informative than screenshots for element discovery.
Locators (finding elements):
const btn = page.locator("button.submit");
const link = page.locator("text=Sign In");
const button = page.getByRole("button", { name: "Submit" });
const input = page.getByRole("textbox", { name: "Email" });
const field = page.getByPlaceholder("Enter email");
const field2 = page.getByLabel("Password");
const el = page.getByTestId("login-form");
Actions (user-like interactions):
await page.locator("button.submit").click();
await page.locator("#email").fill("user@example.com");
await page.locator("#email").pressSequentially("user@example.com");
await page.locator("select#country").selectOption("US");
await page.keyboard.press("Enter");
await page.locator(".menu-item").hover();
await page.locator("#agree").check();
await page.locator("#agree").uncheck();
Prefer pressSequentially() over fill() when testing input validation or when the app has key-by-key handlers. Use fill() for speed when exact typing behavior doesn't matter.
Waiting:
await page.locator("text=Welcome").waitFor();
await page.waitForURL("**/dashboard");
await page.waitForLoadState("networkidle");
await page.waitForTimeout(1000);
Screenshots:
const buf = await page.screenshot();
const path = await saveScreenshot(buf, "before-click");
const buf2 = await page.screenshot({ fullPage: true });
const buf3 = await page.locator(".modal").screenshot();
Screenshots are saved to ~/.dev-browser/tmp/. Use saveScreenshot() to persist them with meaningful names.
Evaluate (run JS in page context):
const result = await page.evaluate(() => {
return document.querySelectorAll(".error").length;
});
console.log(result);
Use page.evaluate() for monitoring and measurement only — NOT for triggering interactions. Rule: interact like a user, measure like an engineer.
File I/O (restricted to ~/.dev-browser/tmp/):
await writeFile("results.json", JSON.stringify(data));
const content = await readFile("results.json");
Workflow Loop
Every dev-browser script should follow this pattern:
GET PAGE → NAVIGATE → SNAPSHOT → PLAN → EXECUTE → VERIFY
browser.getPage("main") — get or create the page
page.goto(url) — navigate if needed
page.snapshotForAI() — understand current state
- Plan your next action based on the snapshot
- Execute the action (click, fill, etc.)
- Verify with snapshot or screenshot
Practical Examples
Navigate and read a page:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
await page.goto("http://localhost:3000", { waitUntil: "domcontentloaded" });
const snapshot = await page.snapshotForAI();
console.log(snapshot.full);
SCRIPT
Click a button and capture evidence:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
// Screenshot before
const before = await page.screenshot();
await saveScreenshot(before, "before-submit");
// Click
await page.getByRole("button", { name: "Submit" }).click();
// Wait for response
await page.waitForLoadState("networkidle");
// Screenshot after
const after = await page.screenshot();
await saveScreenshot(after, "after-submit");
// Check for errors
const snapshot = await page.snapshotForAI();
console.log(snapshot.full);
SCRIPT
Fill a form:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
await page.goto("http://localhost:3000/login", { waitUntil: "domcontentloaded" });
await page.getByLabel("Email").fill("test@example.com");
await page.getByLabel("Password").fill("password123");
await page.getByRole("button", { name: "Sign In" }).click();
await page.waitForURL("**/dashboard");
const snapshot = await page.snapshotForAI();
console.log(snapshot.full);
SCRIPT
Multi-step with console error capture:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
// Capture console errors
const errors = [];
page.on("console", msg => { if (msg.type() === "error") errors.push(msg.text()); });
page.on("pageerror", err => errors.push(err.message));
await page.goto("http://localhost:3000/dashboard", { waitUntil: "domcontentloaded" });
await page.getByRole("link", { name: "Settings" }).click();
await page.waitForLoadState("networkidle");
// Report
const snapshot = await page.snapshotForAI();
console.log(snapshot.full);
console.log("CONSOLE ERRORS:", JSON.stringify(errors));
SCRIPT
Interaction Rules
MANDATORY — these rules govern ALL browser interactions:
-
User-like actions only — Use Playwright locator actions (click, fill, hover, press) inside dev-browser scripts. Never use page.evaluate() to trigger clicks, form submissions, or navigation.
-
Finding elements — Use page.snapshotForAI() to understand page structure and find elements. Prefer role-based and text-based locators over CSS selectors. If an element isn't findable via accessible locators, note this as an accessibility finding.
-
Monitoring & evidence = JS allowed — page.evaluate() IS allowed for: reading computed styles, DOM state, element geometry, setting up observers, reading window.performance, capturing network details. Rule: interact like a user, measure like an engineer.
-
Realistic pacing — Add waitForLoadState("networkidle") or waitForTimeout(500) between rapid actions. Humans don't click at machine speed.
-
Evidence at every step — Screenshot before and after each critical action. Capture console errors. Note any unexpected visual state.
-
Never close the browser — Do NOT call browser.closePage() on the main page. The user may want to inspect it manually. If you need to close, ASK first.
-
One script per logical action — Keep scripts focused. One navigation + action + verification per script. This makes it easy to see what happened at each step.
Network & WebSocket Monitoring
Inject monitoring scripts via page.evaluate() inside a dev-browser script BEFORE performing user interactions. These listeners capture what happens under the hood.
HTTP Request/Response monitoring — inject early, capture everything:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
await page.evaluate(() => {
window.__NET_LOG = [];
const _origFetch = window.fetch;
window.fetch = async (...args) => {
const req = { type: "fetch", url: args[0]?.url || args[0], method: args[1]?.method || "GET", ts: Date.now() };
try {
const res = await _origFetch(...args);
const clone = res.clone();
let body;
try { body = await clone.json(); } catch { body = await clone.text(); }
req.status = res.status;
req.ok = res.ok;
req.response = typeof body === "string" ? body.slice(0, 500) : body;
req.duration = Date.now() - req.ts;
} catch (e) { req.error = e.message; }
window.__NET_LOG.push(req);
return _origFetch(...args);
};
const _origXHR = XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open = function(method, url) {
this.__meta = { type: "xhr", method, url, ts: Date.now() };
this.addEventListener("load", () => {
this.__meta.status = this.status;
this.__meta.duration = Date.now() - this.__meta.ts;
this.__meta.response = this.responseText?.slice(0, 500);
window.__NET_LOG.push(this.__meta);
});
this.addEventListener("error", () => {
this.__meta.error = "network error";
window.__NET_LOG.push(this.__meta);
});
return _origXHR.apply(this, arguments);
};
});
console.log("Network monitoring injected");
SCRIPT
Read captured logs in a later script:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
const logs = await page.evaluate(() => window.__NET_LOG);
console.log(JSON.stringify(logs, null, 2));
SCRIPT
WebSocket monitoring — inject in the same setup script or separately:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
await page.evaluate(() => {
window.__WS_LOG = [];
const _origWS = window.WebSocket;
window.WebSocket = function(url, protocols) {
const ws = new _origWS(url, protocols);
const meta = { url, ts: Date.now(), messages: [], errors: [], state: [] };
window.__WS_LOG.push(meta);
meta.state.push({ event: "connecting", ts: Date.now() });
ws.addEventListener("open", () => meta.state.push({ event: "open", ts: Date.now() }));
ws.addEventListener("close", (e) => meta.state.push({ event: "close", code: e.code, reason: e.reason, ts: Date.now() }));
ws.addEventListener("error", () => meta.errors.push({ ts: Date.now() }));
ws.addEventListener("message", (e) => {
const data = typeof e.data === "string" ? e.data.slice(0, 500) : "[binary]";
meta.messages.push({ dir: "in", data, ts: Date.now() });
});
const _origSend = ws.send.bind(ws);
ws.send = (data) => {
const d = typeof data === "string" ? data.slice(0, 500) : "[binary]";
meta.messages.push({ dir: "out", data: d, ts: Date.now() });
return _origSend(data);
};
return ws;
};
});
console.log("WebSocket monitoring injected");
SCRIPT
When to use network monitoring:
- Bug involves data not loading, wrong data, or stale data
- Form submission fails silently
- Real-time features broken (chat, notifications, live updates)
- Suspected race conditions between multiple API calls
- Auth/session issues (token expired, 401/403 responses)
When to use WebSocket monitoring:
- Real-time features not updating (chat messages, live feeds, collaborative editing)
- Connection drops or reconnection loops
- Messages sent but not received (or vice versa)
- Wrong message ordering or duplicate messages
How to use in workflow:
- Run the monitoring injection script FIRST (via dev-browser)
- Run user action scripts (click, type, navigate) — monitoring captures in background
- Run a log-reading script to retrieve captured data
- Correlate: match request URLs/payloads with API endpoint source code via
codebase-retrieval
Network evidence format:
NETWORK EVIDENCE
────────────────
Action: Click "Save" button
Requests captured:
1. POST /api/items → 200 (142ms) — response: { id: 5, saved: true }
2. GET /api/items/5 → 404 (38ms) — response: { error: "not found" }
⚠️ Item just saved but GET returns 404 — cache invalidation issue?
WebSocket:
Connection: wss://app.example.com/ws — OPEN
Messages after action:
OUT: {"type":"item.save","id":5} (ts: 1001)
IN: {"type":"item.saved","id":5} (ts: 1050)
IN: {"type":"item.list","items":[...]} (ts: 1052) — item 5 missing from list
⚠️ Server confirms save but list update doesn't include new item
Codebase Mapping
Use codebase-retrieval as the PRIMARY tool for understanding the codebase. Do this BEFORE driving the browser.
When to map:
- Before reproducing: find the components, routes, handlers, and API endpoints involved in the reported flow
- After capturing evidence: use error messages, URLs, component names from browser to search for source code
- When tracing root cause: find all writers/readers of the state involved
What to ask codebase-retrieval:
- "Where is the route handler for /path/to/page?"
- "Which component renders the submit button on the form page?"
- "Where is the API endpoint POST /api/submit defined?"
- "What state management handles user authentication?"
- "Where are the styles for the modal component?"
Build a correlation map as you work:
CORRELATION MAP
Browser evidence → Source code
─────────────────────────────────────────────
URL: /dashboard → src/pages/Dashboard.tsx
Button "Save": click → 500 → src/api/handlers/save.ts:42
Console: "TypeError: x.map" → src/utils/transform.ts:18
Missing element: sidebar nav → src/components/Sidebar.tsx (conditional render line 23)
Mode A: REPRODUCE
User reports a bug. You reproduce it in the browser and trace root cause.
1. UNDERSTAND
Parse the bug report:
- What is the expected behavior?
- What actually happens?
- What page/URL is affected?
- What steps trigger it?
If the report is vague, ask ONE focused question. Don't interrogate.
2. MAP
Use codebase-retrieval to find relevant source code BEFORE opening the browser:
- Route/page component for the affected URL
- Event handlers for the actions described
- API endpoints if the bug involves data
- State management if the bug involves UI state
3. REPRODUCE
Drive the browser through the exact steps from the bug report. For each step, run a dev-browser script:
dev-browser <<'SCRIPT'
const page = await browser.getPage("main");
// 1. Screenshot before
const before = await page.screenshot();
await saveScreenshot(before, "step-N-before");
// 2. Perform action
await page.getByRole("button", { name: "Submit" }).click();
await page.waitForLoadState("networkidle");
// 3. Screenshot after
const after = await page.screenshot();
await saveScreenshot(after, "step-N-after");
// 4. Check for errors + page state
const snapshot = await page.snapshotForAI();
console.log(snapshot.full);
SCRIPT
If bug reproduces: proceed to CAPTURE.
If bug doesn't reproduce: try variations — different input data, different timing, different viewport size (page.setViewportSize({width: 375, height: 812})). Report if still can't reproduce after 3 attempts.
4. CAPTURE
Gather all evidence at the point of failure:
EVIDENCE BLOCK
──────────────
Step: [which step failed]
Expected: [what should happen]
Actual: [what happened]
Screenshot: [saved to ~/.dev-browser/tmp/ — describe what's visible]
Console errors: [exact error messages, if any]
Network: [failed requests, unexpected responses, if observable]
DOM state: [use snapshotForAI() to check element presence/state]
Viewport: [dimensions if relevant to the bug]
For UI bugs, also capture:
page.snapshotForAI() to check if element exists and is accessible
page.setViewportSize({width: 375, height: 812}) to test responsive behavior
page.locator(".suspect").hover() to check hover states
5. TRACE
Correlate browser evidence with source code to find root cause.
Use the evidence to guide your code reading:
| Evidence type | What to search in code |
|---|
| Console error with stack trace | Follow the stack trace files directly |
| Network 500 error | Find the API endpoint handler, read the server logic |
| Element missing from DOM | Find the component, check conditional rendering logic |
| Wrong text/data displayed | Trace the data flow from API → state → render |
| Click does nothing | Find the event handler, check if it's bound correctly |
| Layout broken | Read CSS/styles for the component and its ancestors |
| Works on desktop, breaks on mobile | Check responsive breakpoints and media queries |
Tracing strategies (pick based on bug topology):
| Bug type | Strategy |
|---|
| Clear error message | Reverse trace — start from error, walk backwards |
| Works sometimes, fails sometimes | Differential analysis — compare working vs broken case |
| Multi-step flow breaks | Forward trace — follow the flow step by step |
| Data corruption | Boundary trace — check inputs/outputs at module boundaries |
| State-related | Shared state audit — list all writers and readers |
Use codebase-retrieval to find related code as you trace. Don't guess file locations.
Draw the causal chain:
SYMPTOM: Form submit shows error toast but data was actually saved
↑ because
Error handler fires even on 200 response
↑ because
Response interceptor checks res.data.error field which exists but is null
↑ because
API returns { data: {...}, error: null } and interceptor does if(res.data.error) — null is falsy but field EXISTS
↑
ROOT CAUSE ──▶ src/api/interceptor.ts:34 — should check error !== null, not truthiness
6. REPORT
Output a structured diagnosis:
## E2E Diagnosis
**Bug**: [user's report, summarized]
**Reproduced**: Yes/No
**Steps to reproduce**: [numbered list of exact browser actions]
**Evidence**:
- Screenshot at step N: [description of what's visible]
- Console error: [exact message]
- Network: [relevant request/response info]
**Root cause**: [the actual underlying cause]
**Location**: [file:line]
**Causal chain**:
[ASCII diagram]
**Complexity**: SIMPLE / COMPLEX
**Suggested fix**: [brief description]
7. ROUTE
Based on complexity:
SIMPLE (single root cause, 1-2 files, clear fix, no architectural impact):
Tell the user (in their language) that the root cause is clear and the fix is simple. Suggest running /osf apply to fix.
Provide the diagnosis as context for /osf apply to pick up.
COMPLEX (multi-file, breaking change, needs design decisions, architectural impact):
Tell the user (in their language) that the bug is complex and needs planning before fixing. Suggest running /osf feat to explore the approach first, then /osf apply.
Provide the diagnosis as starting context for /osf feat.
UNCERTAIN (can't determine root cause, need more investigation):
Tell the user (in their language) that the root cause hasn't been identified yet and more evidence is needed.
Stay in e2e mode, run more scenarios.
Mode B: EXPLORE
Proactively navigate the app to find bugs. No specific bug report needed.
1. MAP
Use codebase-retrieval to understand the app structure:
- What pages/routes exist?
- What are the main user flows? (auth, CRUD, navigation, forms)
- What components are used?
2. PLAN
Identify critical user flows to test:
EXPLORATION PLAN
────────────────
Flow 1: User registration → login → dashboard
Flow 2: Create item → edit → delete
Flow 3: Navigation between all main pages
Flow 4: Form validation (empty, invalid, edge cases)
Flow 5: Responsive behavior (resize to mobile/tablet)
Ask user if they want to prioritize specific flows or test everything.
3. WALK
For each flow, drive the browser through the happy path AND edge cases.
At every page/step, check:
Edge cases to try:
- Empty form submission
- Very long text input
- Rapid double-click on submit buttons
- Navigate away and back (state preservation)
- Refresh page mid-flow
4. DETECT
Flag anything abnormal:
FINDING [N]
───────────
Page: /path
Action: [what was done]
Issue: [what went wrong]
Severity: CRITICAL / WARNING / INFO
Screenshot: [saved to ~/.dev-browser/tmp/]
Console: [errors if any]
Severity guide:
- CRITICAL: Broken functionality, data loss, crash, security issue
- WARNING: Degraded UX, visual glitch, accessibility issue, missing validation
- INFO: Minor inconsistency, improvement opportunity
5. REPORT
Summarize all findings:
## Exploration Report
**App URL**: [url]
**Flows tested**: [count]
**Findings**: [count by severity]
### Critical
[list with evidence]
### Warning
[list with evidence]
### Info
[list with evidence]
6. ROUTE
For each finding, suggest next step:
- Critical bugs → trace root cause (switch to REPRODUCE mode for each), then route to
/osf apply or /osf feat
- Warnings → batch into a single
/osf apply session or /osf feat if architectural
- Info → note for later, no immediate action needed
Mode C: QA TEST
Activated when: first argument is e2e or test. Example: /osf browser e2e login http://localhost:3000
Purpose: You are a QA tester. Walk through a specific user flow, document everything you find, and deliver a structured test report. You do NOT modify code — report only.
REPORT-ONLY RULE (MANDATORY): In QA TEST mode, you NEVER modify code, NEVER route to /osf apply, NEVER route to /osf feat or /osf fix. Your only output is a test report. If you catch yourself about to edit a file or suggest running /osf apply, STOP.
1. PARSE
Extract from user arguments:
- Flow name: what flow to test (e.g., "login", "checkout", "registration")
- App URL: where the app is running (e.g.,
http://localhost:3000)
If either is missing, ask ONE question to clarify.
2. MAP
Use codebase-retrieval to understand the flow before opening the browser:
- Which routes/pages are involved in this flow?
- What components render each step?
- What API endpoints does this flow call?
- What state management drives the flow?
Build a mental model of the expected flow. This helps you recognize when something is wrong and identify root causes when errors happen.
3. EXECUTE
Walk through the flow step by step, exactly like a real user would. For each step:
a) Screenshot before the action
b) Perform the action (click, type, navigate)
c) Screenshot after the action
d) Capture console errors via page.on("console") and page.on("pageerror")
e) Check page state via snapshotForAI()
f) Log findings as you go — don't wait until the end
While executing, observe and note:
Bugs/errors:
- Console errors or warnings
- Network failures (inject monitoring if the flow involves API calls)
- Broken UI elements (missing, overlapping, wrong state)
- Incorrect data displayed
- Actions that don't respond or produce wrong results
UX issues:
- Confusing labels or unclear instructions
- Missing loading states (user clicks and nothing visible happens)
- Missing error messages (form fails silently)
- Missing success feedback (action completes but no confirmation)
- Inconsistent styling or layout breaks
- Poor responsive behavior
- Accessibility gaps (elements not reachable via keyboard, missing ARIA labels)
- Slow responses without feedback (no spinner, no skeleton)
Automation difficulties:
- Elements without accessible roles, labels, or test IDs — hard to target in automation
- Dynamic selectors that change on each render
- Actions that require complex timing (race conditions, animations that must complete)
- Flows that depend on external state (email verification, CAPTCHA, third-party OAuth)
- Elements hidden behind hover/scroll that are hard to reliably reach
4. INVESTIGATE
For each bug or error found during execution, use codebase-retrieval to trace the likely root cause:
- Console error → find the source file and line from stack trace
- Network error → find the API handler and check the logic
- Missing element → find the component and check conditional rendering
- Wrong data → trace the data pipeline from API to render
For stuck points (flow can't proceed), investigate:
- Is the required element rendered? Check component code.
- Is there a prerequisite state not met? Check state management.
- Is the API returning unexpected data? Check handler logic.
Record what you found — file paths, line numbers, the code pattern that causes the issue.
5. REPORT
Output a structured QA test report. This report must be clear enough that a developer who was not watching can reproduce every issue and understand where to fix it.
## QA Test Report
**Flow**: [flow name from user request]
**App URL**: [url]
**Date**: [current date]
**Status**: PASS / FAIL / PARTIAL
---
### Test Steps
| # | Action | Expected Result | Actual Result | Status |
|---|--------|-----------------|---------------|--------|
| 1 | [what was done] | [what should happen] | [what actually happened] | PASS/FAIL |
| 2 | ... | ... | ... | ... |
---
### Bugs Found
#### BUG-1: [short title]
- **Step**: #N
- **Severity**: CRITICAL / HIGH / MEDIUM / LOW
- **What happened**: [describe the symptom clearly]
- **Expected**: [what should have happened]
- **How to reproduce**: [exact steps from the start of the flow]
- **Evidence**: screenshot at [path], console error: [exact message]
- **Root cause (from codebase)**: [file:line — what the code does wrong and why]
#### BUG-2: ...
---
### UX Issues
#### UX-1: [short title]
- **Step**: #N
- **Severity**: HIGH / MEDIUM / LOW
- **What happened**: [describe what the user experiences]
- **Why it's bad**: [impact on user — confusion, delay, frustration]
- **Suggestion**: [brief improvement idea]
- **Related code**: [file:line if applicable]
#### UX-2: ...
---
### Automation Notes
#### AUTO-1: [short title]
- **Step**: #N
- **Element/Action**: [what was hard to automate]
- **Why it's hard**: [missing test-id, dynamic selector, timing issue, etc.]
- **Suggestion**: [add data-testid, stabilize selector, etc.]
#### AUTO-2: ...
---
### Summary
- **Total steps**: [N]
- **Passed**: [N]
- **Failed**: [N]
- **Bugs**: [count by severity]
- **UX issues**: [count]
- **Automation blockers**: [count]
### Screenshots
[List all saved screenshots with their step references]
After the report, do NOT suggest fixing anything. Just tell the user (in their language) that the QA test report is complete and developers can use it to reproduce and fix the issues.
VERIFY (Post-Fix)
After a fix is applied via /osf apply, re-run the reproduction steps to confirm:
- Navigate to the same page (use
browser.getPage("main") — page persists)
- Perform the same actions
- Screenshot at the same points
- Compare: does the bug still occur?
## Verification
**Bug**: [original report]
**Fix applied**: [what was changed]
**Re-test result**: PASS / FAIL
**Before**: [description/screenshot reference from original reproduction]
**After**: [description/screenshot from re-test]
If FAIL: the fix didn't work or introduced a regression. Go back to TRACE with new evidence.
Cleanup
After your session ends (diagnosis routed, exploration reported, or verification done), clean up dev-browser artifacts:
-
Find generated files in ~/.dev-browser/tmp/:
- Screenshots saved via
saveScreenshot()
- Data files saved via
writeFile()
-
Delete them via Bash if no longer needed:
rm -rf ~/.dev-browser/tmp/*
-
If unsure which files were generated during this session, list them and ask the user before deleting:
ls -la ~/.dev-browser/tmp/
Exception: If the user explicitly asks to keep evidence files (e.g., for a bug report), skip cleanup and tell them where the files are.
Guardrails
- NEVER modify code in QA TEST mode — Mode C is report-only. No edits, no
/osf apply, no /osf feat, no /osf fix. Your output is a test report, period.
- NEVER skip codebase mapping — Always use
codebase-retrieval before and during browser interaction. Browser evidence without code context is just symptoms.
- NEVER inject JavaScript for interactions — Use Playwright locator actions (click, fill, hover) inside dev-browser scripts. The whole point is to reproduce what users experience.
- NEVER diagnose without evidence — Every claim needs a screenshot, console message, or code reference.
- Screenshot liberally — When in doubt, take a screenshot. Evidence you don't need is better than evidence you don't have.
- Check console after EVERY action — Use
page.on("console") and page.on("pageerror") to capture errors. Silent JavaScript errors are the most common hidden bugs.
- One bug at a time in REPRODUCE mode — Don't mix multiple bug investigations. Each gets its own reproduce → trace → report cycle.
- Respect the routing — Don't fix bugs yourself. Diagnose and route to
/osf apply or /osf feat. Your job is evidence and diagnosis, not implementation.
- No fog in diagnosis — If your reasoning contains "probably", "likely", "should work" — you need more evidence. Go back to the browser or the codebase.
- Always use quoted heredoc —
<<'SCRIPT' not <<SCRIPT. Prevents shell variable expansion from breaking your scripts.
Mode Transition Hints
After diagnosis (Mode A/B only — Mode C does NOT route):
- Simple fix →
/osf apply (pass diagnosis as context)
- Complex fix →
/osf feat then /osf apply
- More bugs to investigate → stay in
/osf browser
- Want to verify full implementation →
/osf verify
- Want QA test report for a specific flow →
/osf browser e2e [flow] [url]