بنقرة واحدة
بنقرة واحدة
| name | debug-e2e |
| description | Debug flaky Playwright E2E test failures from CI |
Use this skill when investigating flaky Playwright E2E test failures from CI.
When asked to survey recent failures rather than debug a specific one, build a table of all Playwright flakes across recent CI runs.
# List recent failed runs
gh run list --limit 80 --json databaseId,headBranch,conclusion,displayTitle,createdAt \
| jq -r '.[] | select(.conclusion == "failure") | "\(.databaseId)\t\(.headBranch)\t\(.displayTitle)\t\(.createdAt)"'
# For each failed run, find which Playwright jobs failed
gh run view <RUN_ID> --json jobs \
| jq -r '.jobs[] | select(.conclusion == "failure") | "\(.name)\t\(.conclusion)"'
# Extract test names and error summaries from failed runs
gh run view <RUN_ID> --log-failed 2>&1 | rg '› .*e2e\.ts.*›' | rg -v 'Retry'
Produce a markdown table with columns: Test, Browser, Run ID, Date. Sort by most recent first. Distinguish real flakes from bugs that were fixed between pushes (a test that fails across all browsers in one run and then disappears was likely a real bug, not a flake).
# View PR checks status
gh pr checks <PR_NUMBER>
# Get failed test logs
gh run view <RUN_ID> --log-failed 2>&1 | head -500
# Search for specific failure patterns
gh run view <RUN_ID> --log-failed 2>&1 | rg -C 30 "FAILED|Error:|Expected|Timed out"
Run the failing test multiple times to reproduce flaky behavior:
# Run specific test 20 times
npm run e2e -- test/e2e/<file>.e2e.ts --repeat-each=20 -g "<test name pattern>"
# Target specific browser if failure is browser-specific
npm run e2e -- test/e2e/<file>.e2e.ts --repeat-each=20 -g "<test name>" --project=<browser>
Start small to get signal quickly, then scale up only if needed.
Always run repeats on a single test or at most one file. Never repeat the whole suite.
Prefer narrowing with -g and --project first, then increase --repeat-each once the fix looks stable.
When tests fail, Playwright saves traces and error context:
# View error context (page snapshot at failure time)
cat test-results/<test-name>-<browser>/error-context.md
# Open trace viewer (interactive)
npx playwright show-trace test-results/<test-name>-<browser>/trace.zip
React Aria NumberInput flakiness: When fill() on a NumberInput doesn't stick, it's often because:
maxValue changing)Fix pattern - NumberInput helper:
import { fillNumberInput } from './utils'
await fillNumberInput(input, 'value')
Form hydration issues: Wait for a field that only renders after mount:
await expect(page.getByRole('radiogroup', { name: 'Block size' })).toBeVisible()
State change timing: When clicking changes form state, wait for visual confirmation:
await page.getByRole('radio', { name: 'Local' }).click()
// Wait for dependent UI to update
await expect(page.getByRole('radiogroup', { name: 'Block size' })).toBeHidden()
If deterministic waits don't work, use sleep() from test/e2e/utils.ts:
import { sleep } from './utils'
await sleep(200) // Use sparingly, prefer deterministic waits
Run at least 30-50 iterations to confirm flakiness is resolved:
npm run e2e -- test/e2e/<file>.e2e.ts --repeat-each=50 -g "<test name>"
A good target is 0 failures out of 100+ runs. Test on all browsers that failed in CI.