# browser-test

Validate a feature works by driving a real browser with Playwright MCP. No test files — just interactive verification.
| Field | Value |
|---|---|
| name | browser-test |
| description | Validate a feature works by driving a real browser with Playwright MCP. No test files — just interactive verification. |
| user-invocable | true |
| argument-hint | [port] [feature description or feature-file-path] |
You are the orchestrator. You do NOT drive the browser yourself. You spawn a focused sub-agent to do the browser work, monitor its progress, and collect results.
Parse `$ARGUMENTS` for:

- **Port** (optional): a number (e.g. `5570`) or `:<port>` format
- **Feature** (optional): a plain description, or a path to a `specs/*.feature` file

If a feature file path is given, read it now and extract the scenarios into a concrete checklist. If a plain description is given, use it directly. If neither is provided, use the default smoke test: app loads, sign in works, dashboard renders after auth.
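Here is a minimal POSIX shell sketch of that parsing. It assumes the raw argument string is available as a shell string and that a port, when present, is the first token. The names `parse_args`, `PORT`, `FEATURE`, `FEATURE_FILE` and `DEFAULT_FEATURE` are hypothetical and are not part of the skill.

```shell
# Hypothetical helper: split the argument string into an optional port and an
# optional feature (description or .feature path).
# Assumption: a port, if present, is the first token ("5570" or ":5570").
DEFAULT_FEATURE="smoke test: app loads, sign in works, dashboard renders after auth"

parse_args() {
  args="$1"
  PORT=""
  FEATURE_FILE=""
  # Strip leading whitespace.
  args="${args#"${args%%[![:space:]]*}"}"
  first="${args%%[[:space:]]*}"
  candidate="${first#:}"
  case "$candidate" in
    ''|*[!0-9]*) ;;  # first token is not a port
    *)
      PORT="$candidate"
      # Drop the port token, then any whitespace that followed it.
      args="${args#"$first"}"
      args="${args#"${args%%[![:space:]]*}"}"
      ;;
  esac
  # Fall back to the default smoke test when no feature was given.
  FEATURE="${args:-$DEFAULT_FEATURE}"
  # A value ending in .feature is treated as a feature file path.
  case "$FEATURE" in
    *.feature) FEATURE_FILE="$FEATURE" ;;
  esac
}
```

For example, `parse_args "5570 specs/features/beta-pill.feature"` leaves `PORT=5570` and `FEATURE_FILE=specs/features/beta-pill.feature`. A lone description such as `beta pill popover` leaves `PORT` empty.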
Resolve the port in this order:

1. Port given in `$ARGUMENTS` → use it.
2. `.dev-port` file in the repo root → source it for `APP_PORT`.
3. No `.dev-port`? → run `scripts/dev-up.sh` and then read the `.dev-port` it creates.

```shell
# .dev-port format (written by dev-up.sh):
APP_PORT=5560
BASE_URL=http://localhost:5560
COMPOSE_PROJECT_NAME=langwatch-abcd1234
```
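The lookup order above can be sketched as a shell function. This is a hypothetical helper. It assumes `scripts/dev-up.sh` writes `.dev-port` in the format shown. The names `resolve_port`, `teardown_if_started` and `STARTED_DEV_ENV` are illustrative. Recording whether this run started the environment is what lets the later teardown stop only an environment this run created.

```shell
# Hypothetical sketch (not a script shipped with the skill).
# STARTED_DEV_ENV remembers whether this run brought the environment up.
resolve_port() {
  arg_port="$1"
  repo_root="${2:-.}"
  STARTED_DEV_ENV=0
  # An explicit port argument wins.
  if [ -n "$arg_port" ]; then
    APP_PORT="$arg_port"
    return 0
  fi
  # No .dev-port yet: start the environment, which is assumed to create it.
  if [ ! -f "$repo_root/.dev-port" ]; then
    "$repo_root/scripts/dev-up.sh" || return 1
    STARTED_DEV_ENV=1
  fi
  [ -f "$repo_root/.dev-port" ] || return 1
  # .dev-port holds KEY=VALUE lines, so it can be sourced directly.
  APP_PORT=""
  . "$repo_root/.dev-port"
  [ -n "$APP_PORT" ]
}

teardown_if_started() {
  repo_root="${1:-.}"
  # Only stop an environment that resolve_port started.
  if [ "$STARTED_DEV_ENV" = 1 ]; then
    "$repo_root/scripts/dev-down.sh"
  fi
}
```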
If a feature file was given, read it and turn each scenario into a numbered verification step. Example:
```
Feature file: specs/features/beta-pill.feature
Scenarios:
1. Navigate to dashboard → verify purple "Beta" badge next to Suites in sidebar
2. Hover over badge → verify popover appears with beta disclaimer text
3. Press Tab to focus badge → verify same popover appears via keyboard
```
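A small sketch for the first half of this step: use awk to pull the scenario titles out of a Gherkin `.feature` file as a starting numbered list. It assumes each scenario begins with a `Scenario:` or `Scenario Outline:` line. Rewriting each title into the "action → verify" form shown above still means reading the scenario's steps. `feature_checklist` is a hypothetical name.

```shell
# Hypothetical helper: print a numbered list of scenario titles from a .feature file.
feature_checklist() {
  awk '
    /^[ \t]*Scenario( Outline)?:/ {
      title = $0
      # Remove the keyword and leading indentation, keep the title text.
      sub(/^[ \t]*Scenario( Outline)?:[ \t]*/, "", title)
      n++
      printf "%d. %s\n", n, title
    }
  ' "$1"
}
```

Usage: `feature_checklist specs/features/beta-pill.feature`.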
Screenshots are saved to:

```
browser-tests/<feature-name>/<YYYY-MM-DD>/screenshots/
```

Derive `<feature-name>` from, in order of preference: feature filename (without extension) > slugified description > branch name suffix.
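The naming rule can be sketched in shell as follows. `slugify`, `feature_name` and `artifact_dir` are hypothetical helpers. "Branch name suffix" is assumed to mean the part after the last `/`, so `feat/beta-pill` becomes `beta-pill`.

```shell
# Hypothetical helpers for the <feature-name> precedence rule.
slugify() {
  # Lowercase, collapse runs of non-alphanumerics to "-", trim edge dashes.
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

feature_name() {
  feature_file="$1"; description="$2"; branch="$3"
  if [ -n "$feature_file" ]; then
    base="$(basename "$feature_file")"
    printf '%s\n' "${base%.*}"          # filename without extension
  elif [ -n "$description" ]; then
    slugify "$description"; echo
  else
    slugify "${branch##*/}"; echo      # part after the last "/"
  fi
}

artifact_dir() {
  # browser-tests/<feature-name>/<YYYY-MM-DD>/screenshots
  printf 'browser-tests/%s/%s/screenshots\n' "$1" "$(date +%Y-%m-%d)"
}
```

Usage: `mkdir -p "$(artifact_dir "$(feature_name "" "" "$(git branch --show-current)")")"`.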
Before verification, decide what data the feature under test requires. Many features need pre-existing data to be meaningful (e.g., a suites page needs at least one suite with runs, a trace viewer needs traces, an evaluations dashboard requires completed evaluations).
Include the seeding instructions in the sub-agent prompt (Step 3) so the sub-agent creates the data before verifying.
Use the Agent tool to spawn a sub-agent. Give it everything it needs in the prompt — port, verification steps, credentials, artifact path. The sub-agent has access to Playwright MCP tools and Bash.
Critical: The sub-agent prompt must include ALL of the following. Do not assume it knows anything — it starts with zero context:
```
You are a browser test agent. Your ONLY job is to drive a browser and verify features.

## Your mission
<paste the numbered verification steps here>

## Data seeding
Before verifying, create the minimal data the feature needs. Follow the checklist below.
Prefer seeding through the UI; use API/SDK only when the checklist explicitly calls for it:
<paste the seeding checklist from Step 2 here — e.g.:>
- Navigate to Suites → click "Create Suite" → fill name "Test Suite" → save
- Open the suite → add a scenario → run it once
- Wait for the run to complete before proceeding to verification
Only create what is listed above. Do not add extra data beyond what is needed.

## Connection
- App URL: http://localhost:<port>
- Browser: Chromium (headless) — use Playwright MCP tools
- Save screenshots to: <absolute artifact path>/screenshots/

## Auth (NextAuth credentials form, NOT Auth0)
- Navigate to the app → redirects to /auth/signin (Email + Password form)
- Email: browser-test@langwatch.ai
- Password: BrowserTest123!
- If "Register new account" needed, register first with same credentials
- Org name if onboarding: Browser Test Org
- After auth: dashboard shows "Hello, Browser" + "Browser Test Org" header

## How to interact
- Use browser_snapshot (accessibility tree) for finding elements — it's faster than screenshots
- Use browser_take_screenshot to capture evidence at each key step
- Use browser_wait_for with generous timeouts (60-120s for first page loads; dev mode is slow)
- Number screenshots sequentially: 01-sign-in.png, 02-dashboard.png, etc.

## Guardrails — READ THESE
- You have a maximum of 40 tool calls (seeding + verification). If you haven't finished, report what you verified and what's left.
- Do NOT debug app issues. If something doesn't work, screenshot it, mark it FAIL, and move on.
- Do NOT modify any files, fix any code, or investigate root causes.
- Do NOT go off-script. Only verify the steps listed above.
- If a step fails, take a screenshot, record FAIL, and continue to the next step.
- When done, return a markdown summary table: | # | Step | Result | Screenshot |
```
When the sub-agent returns, write a report to `browser-tests/<feature-name>/<YYYY-MM-DD>/report.md`:

```markdown
# Browser Test: <feature-name>

**Date:** YYYY-MM-DD
**App:** http://localhost:<port>
**Browser:** Chromium (headless)
**Branch:** <current branch>
**PR:** #<number> (if known)

## Results

| # | Scenario | Result | Screenshot |
|---|----------|--------|------------|
| 1 | <name> | PASS | screenshots/01-xxx.png |

## Failures (if any)

- **Scenario 2:** Expected X but saw Y.

## Notes

<any observations>
```
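Here is a hypothetical sketch of writing the header and results table of that report. It assumes the sub-agent's rows have first been saved one per line as `name|RESULT|screenshot`. `write_report` is an illustrative name. The Failures and Notes sections are still written by hand. The function also prints pass/fail counts for the summary returned to the user.

```shell
# Hypothetical sketch: render report.md from saved result rows.
# Args: feature-name, port, branch, results file, optional PR number.
write_report() {
  feature="$1"; port="$2"; branch="$3"; results_file="$4"; pr="$5"
  day="$(date +%Y-%m-%d)"
  dir="browser-tests/$feature/$day"
  mkdir -p "$dir"
  {
    printf '# Browser Test: %s\n\n' "$feature"
    printf '**Date:** %s\n' "$day"
    printf '**App:** http://localhost:%s\n' "$port"
    printf '**Browser:** Chromium (headless)\n'
    printf '**Branch:** %s\n' "$branch"
    [ -n "$pr" ] && printf '**PR:** #%s\n' "$pr"
    printf '\n## Results\n\n'
    printf '| # | Scenario | Result | Screenshot |\n'
    printf '|---|----------|--------|------------|\n'
    # Number rows in file order: name|RESULT|screenshot -> table row.
    awk -F'|' '{ printf "| %d | %s | %s | %s |\n", NR, $1, $2, $3 }' "$results_file"
  } > "$dir/report.md"
  passed=$(grep -c '|PASS|' "$results_file")
  failed=$(grep -c '|FAIL|' "$results_file")
  printf '%s passed, %s failed: %s\n' "$passed" "$failed" "$dir/report.md"
}
```

Usage: `write_report beta-pill 5560 "$(git branch --show-current)" results.txt`. Add the PR number as a fifth argument when known.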
If this skill started the dev environment (i.e. no `.dev-port` existed before), tear it down: `scripts/dev-down.sh`.

Screenshots are uploaded to img402.dev (free, no auth) instead of being committed to git. This avoids binary bloat in the repo.
Upload each screenshot to img402.dev:

```shell
curl -s -F "image=@browser-tests/<feature>/<date>/screenshots/01-xxx.jpeg" https://img402.dev/api/free
# Returns: {"url":"https://i.img402.dev/abc123.jpg", ...}
```

Collect the returned URLs for each screenshot.
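The upload loop can be sketched as below. It assumes the response shape shown above, a JSON object with a `url` field. To avoid a jq dependency, the field is extracted with sed. `extract_url` and `upload_screenshots` are hypothetical names. The sed match is brittle and assumes `url` appears once in the response.

```shell
# Hypothetical sketch of the upload step.
extract_url() {
  # Print the value of the "url" key; prints nothing if the key is absent.
  printf '%s\n' "$1" | sed -n 's/.*"url":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

upload_screenshots() {
  dir="$1"
  for f in "$dir"/*.png "$dir"/*.jpeg "$dir"/*.jpg; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    resp=$(curl -s -F "image=@$f" https://img402.dev/api/free) || continue
    # One line per screenshot: <filename><TAB><img402 URL>
    printf '%s\t%s\n' "$(basename "$f")" "$(extract_url "$resp")"
  done
}
```

Usage: `upload_screenshots browser-tests/<feature>/<date>/screenshots > urls.tsv`.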
Update the PR description with the results table, using the img402 URLs so images render inline.

Read the current PR body first (`gh pr view --json body`), then append a new section:

```markdown
## Browser Test: <feature-name>

| # | Scenario | Result | Screenshot |
|---|----------|--------|------------|
| 1 | <name> | PASS | ![01-xxx](https://i.img402.dev/abc123.jpg) |
```

Use `gh api repos/langwatch/langwatch/pulls/<number> -X PATCH -f body="..."` to update (not `gh pr edit`).
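Here is a sketch of the read, append, PATCH sequence, assuming an authenticated `gh` CLI run from inside the repository checkout. `append_section` only handles text, so it can be checked without network access. `update_pr_body` is a hypothetical wrapper around the two `gh` commands named above.

```shell
# Hypothetical sketch of the PR body update.
append_section() {
  # $1 = current body, $2 = new section; separate them with a blank line.
  if [ -n "$1" ]; then
    printf '%s\n\n%s\n' "$1" "$2"
  else
    printf '%s\n' "$2"
  fi
}

update_pr_body() {
  pr="$1"; section="$2"
  # Read the existing body so the update appends instead of overwriting.
  current=$(gh pr view "$pr" --json body -q .body) || return 1
  new_body=$(append_section "$current" "$section")
  gh api "repos/langwatch/langwatch/pulls/$pr" -X PATCH -f body="$new_body" >/dev/null
}
```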
Do NOT commit browser-tests/ — it is gitignored. Screenshots are ephemeral local artifacts; the img402 URLs in the PR body are the permanent record.
Return the summary to the user/orchestrator. Include:
Read `HOW_TO.md` in this skill directory before your first run — it has gotchas about Chakra UI, dev mode slowness, and known issues. Include relevant warnings in the sub-agent prompt.