browser-test
// Validate a feature works by driving a real browser with Playwright MCP. No test files — just interactive verification.
| name | browser-test |
| description | Validate a feature works by driving a real browser with Playwright MCP. No test files — just interactive verification. |
| user-invocable | true |
| argument-hint | [port] [feature description or feature-file-path] |
You are the orchestrator. You do NOT drive the browser yourself. You spawn a focused sub-agent to do the browser work, monitor its progress, and collect results.
Parse $ARGUMENTS for:
- Port: a number (e.g., 5570) or :<port> format
- Feature: a plain description or a specs/*.feature file path

If a feature file path is given, read it now and extract the scenarios into a concrete checklist. If a plain description is given, use it directly. If neither is provided, use the default smoke test: app loads, sign in works, dashboard renders after auth.
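The argument handling above could be sketched roughly as follows. This is a minimal POSIX shell sketch, not the skill's actual parser: the function name `parse_args` and the variables `PORT` and `FEATURE` are hypothetical, and it assumes the port, when present, is the first token.

```shell
# Hypothetical helper: split the skill arguments into PORT and FEATURE.
# Assumption: an optional port comes first, as "5570" or ":5570";
# everything after it is the feature description or feature-file path.
parse_args() {
  PORT=""
  FEATURE=""
  candidate="${1#:}"              # strip an optional leading colon
  case "$candidate" in
    ''|*[!0-9]*) ;;               # empty or non-numeric: not a port
    *) PORT="$candidate"; shift ;;
  esac
  FEATURE="$*"
  # Nothing left: fall back to the default smoke test.
  if [ -z "$FEATURE" ]; then
    FEATURE="smoke test: app loads, sign in works, dashboard renders after auth"
  fi
}
```

Calling `parse_args $ARGUMENTS` (unquoted, so the string splits into words) leaves `PORT` empty when no port was given, which is the cue to fall back to the .dev-port lookup.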
Resolve the port in this order:

1. Port given in $ARGUMENTS → use it
2. .dev-port file in the repo root → source it for APP_PORT
3. No .dev-port? → run scripts/dev-up.sh and then read the .dev-port it creates

# .dev-port format (written by dev-up.sh):
APP_PORT=5560
BASE_URL=http://localhost:5560
COMPOSE_PROJECT_NAME=langwatch-abcd1234
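The port lookup order above might look like this in shell. This is a sketch under stated assumptions: `resolve_port` and `STARTED_BY_US` are hypothetical names, it assumes the current directory is the repo root, and it assumes scripts/dev-up.sh has written .dev-port by the time it returns.

```shell
# Hypothetical helper: resolve the app port in the order described above.
# $1 is the port taken from the arguments (may be empty).
# Sets APP_PORT, and STARTED_BY_US=1 if this call had to start the app.
resolve_port() {
  STARTED_BY_US=0
  if [ -n "${1:-}" ]; then
    APP_PORT="$1"                 # an explicit port wins
    return 0
  fi
  if [ ! -f .dev-port ]; then
    scripts/dev-up.sh || return 1 # assumed to write .dev-port before returning
    STARTED_BY_US=1
  fi
  . ./.dev-port                   # defines APP_PORT, BASE_URL, COMPOSE_PROJECT_NAME
  [ -n "${APP_PORT:-}" ]          # fail if the file did not define a port
}
```

Recording whether the environment was started by this run makes the later teardown decision a simple check on `STARTED_BY_US`.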
If a feature file was given, read it and turn each scenario into a numbered verification step. Example:
Feature file: specs/features/beta-pill.feature
Scenarios:
1. Navigate to dashboard → verify purple "Beta" badge next to Suites in sidebar
2. Hover over badge → verify popover appears with beta disclaimer text
3. Press Tab to focus badge → verify same popover appears via keyboard
browser-tests/<feature-name>/<YYYY-MM-DD>/screenshots/
Derive <feature-name> from: feature filename (without extension) > slugified description > branch name suffix.
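One way to apply that precedence is sketched below. The helpers `slugify` and `feature_name` are hypothetical, and reading "branch name suffix" as the part of the branch name after the last slash is an assumption.

```shell
# Hypothetical helper: lowercase the text and collapse every run of
# non-alphanumeric characters into a single hyphen.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# $1 = feature file path (may be empty), $2 = description (may be empty).
feature_name() {
  if [ -n "${1:-}" ]; then
    basename "$1" .feature        # feature filename without extension
  elif [ -n "${2:-}" ]; then
    slugify "$2"                  # slugified description
  else
    # Assumption: the suffix is whatever follows the last "/" in the branch name.
    branch="$(git rev-parse --abbrev-ref HEAD)"
    slugify "${branch##*/}"
  fi
}
```

The artifact directory can then be created with something like `mkdir -p "browser-tests/$(feature_name "$file" "$desc")/$(date +%F)/screenshots"`, where `$file` and `$desc` are placeholders for the parsed arguments.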
Before verification, decide what data the feature under test requires. Many features need pre-existing data to be meaningful (e.g., a suites page needs at least one suite with runs, a trace viewer needs traces, an evaluations dashboard requires completed evaluations).
Include the seeding instructions in the sub-agent prompt (Step 3) so the sub-agent creates the data before verifying.
Use the Agent tool to spawn a sub-agent. Give it everything it needs in the prompt — port, verification steps, credentials, artifact path. The sub-agent has access to Playwright MCP tools and Bash.
Critical: The sub-agent prompt must include ALL of the following. Do not assume it knows anything — it starts with zero context:
You are a browser test agent. Your ONLY job is to drive a browser and verify features.
## Your mission
<paste the numbered verification steps here>
## Data seeding
Before verifying, create the minimal data the feature needs. Follow the checklist below.
Prefer seeding through the UI; use API/SDK only when the checklist explicitly calls for it:
<paste the seeding checklist from Step 2 here — e.g.:>
- Navigate to Suites → click "Create Suite" → fill name "Test Suite" → save
- Open the suite → add a scenario → run it once
- Wait for the run to complete before proceeding to verification
Only create what is listed above. Do not add extra data beyond what is needed.
## Connection
- App URL: http://localhost:<port>
- Browser: Chromium (headless) — use Playwright MCP tools
- Save screenshots to: <absolute artifact path>/screenshots/
## Auth (NextAuth credentials form, NOT Auth0)
- Navigate to the app → redirects to /auth/signin (Email + Password form)
- Email: browser-test@langwatch.ai
- Password: BrowserTest123!
- If "Register new account" needed, register first with same credentials
- Org name if onboarding: Browser Test Org
- After auth: dashboard shows "Hello, Browser" + "Browser Test Org" header
## How to interact
- Use browser_snapshot (accessibility tree) for finding elements — it's faster than screenshots
- Use browser_take_screenshot to capture evidence at each key step
- Use browser_wait_for with generous timeouts (60-120s for first page loads, dev mode is slow)
- Number screenshots sequentially: 01-sign-in.png, 02-dashboard.png, etc.
## Guardrails — READ THESE
- You have a maximum of 40 tool calls (seeding + verification). If you haven't finished, report what you verified and what's left.
- Do NOT debug app issues. If something doesn't work, screenshot it, mark it FAIL, and move on.
- Do NOT modify any files, fix any code, or investigate root causes.
- Do NOT go off-script. Only verify the steps listed above.
- If a step fails, take a screenshot, record FAIL, and continue to the next step.
- When done, return a markdown summary table: | # | Step | Result | Screenshot |
When the sub-agent returns:
Write the results to browser-tests/<feature-name>/<YYYY-MM-DD>/report.md:

# Browser Test: <feature-name>
**Date:** YYYY-MM-DD
**App:** http://localhost:<port>
**Browser:** Chromium (headless)
**Branch:** <current branch>
**PR:** #<number> (if known)
## Results
| # | Scenario | Result | Screenshot |
|---|----------|--------|------------|
| 1 | <name> | PASS | screenshots/01-xxx.png |
## Failures (if any)
- **Scenario 2:** Expected X but saw Y.
## Notes
<any observations>
If you started the dev environment yourself (no .dev-port existed before), tear it down: scripts/dev-down.sh

Screenshots are uploaded to img402.dev (free, no auth) instead of being committed to git. This avoids binary bloat in the repo.
Upload each screenshot to img402.dev:
curl -s -F "image=@browser-tests/<feature>/<date>/screenshots/01-xxx.jpeg" https://img402.dev/api/free
# Returns: {"url":"https://i.img402.dev/abc123.jpg", ...}
Collect the returned URLs for each screenshot.
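A loop over the screenshot directory could collect those URLs along these lines. This is only a sketch: `extract_url` and `upload_screenshots` are hypothetical helpers, and the sed pattern assumes the response is single-line JSON with a plain string "url" field, as in the sample response above. A JSON-aware tool such as jq would be more robust where available.

```shell
# Hypothetical helper: pull the "url" value out of a one-line JSON response.
# Assumption: "url" appears once and its value contains no escaped quotes.
extract_url() {
  sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Upload every file in the given directory and print "<file><TAB><url>" lines.
upload_screenshots() {
  for shot in "$1"/*; do
    [ -f "$shot" ] || continue    # skip if the directory is empty
    url="$(curl -s -F "image=@$shot" https://img402.dev/api/free | extract_url)"
    printf '%s\t%s\n' "$(basename "$shot")" "$url"
  done
}
```

An empty URL column in the output means that upload failed or returned an unexpected response, so it is worth checking before building the PR table.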
Update the PR description with the results table using img402 URLs so images render inline:
Read the current PR body first (gh pr view --json body), then append a new section:
## Browser Test: <feature-name>
| # | Scenario | Result | Screenshot |
|---|----------|--------|------------|
| 1 | <name> | PASS | ![<name>](<img402-url>) |
Use gh api repos/langwatch/langwatch/pulls/<number> -X PATCH -f body="..." to update (not gh pr edit).
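The read-then-append flow might be wrapped like this. It is a sketch, not a definitive implementation: `append_to_pr_body` is a hypothetical name, the section is assumed to be prepared in a file beforehand, and the `--jq .body` flag is used here to get the raw body text instead of JSON.

```shell
# Hypothetical helper: append a markdown section to the existing PR body.
# $1 = PR number, $2 = path to a file holding the new section.
append_to_pr_body() {
  # Read the current body first so existing content is preserved.
  current="$(gh pr view "$1" --json body --jq .body)" || return 1
  new_body="$(printf '%s\n\n%s\n' "$current" "$(cat "$2")")"
  # PATCH through the REST API, as the instructions above prefer over gh pr edit.
  gh api "repos/langwatch/langwatch/pulls/$1" -X PATCH -f body="$new_body" >/dev/null
}
```

Passing the section as a file avoids quoting problems with table pipes and image links on the command line.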
Do NOT commit browser-tests/ — it is gitignored. Screenshots are ephemeral local artifacts; the img402 URLs in the PR body are the permanent record.
Return the summary to the user/orchestrator. Include:
Read HOW_TO.md in this skill directory before your first run: it covers gotchas about Chakra UI, dev mode slowness, and known issues. Include relevant warnings in the sub-agent prompt.