e2e-testing
Guide for writing, running, and debugging Playwright E2E tests in eval.
| name | e2e-testing |
| description | Guide for writing, running, and debugging Playwright E2E tests in eval. |
Guide for writing, running, and debugging Playwright E2E tests. The eval platform uses a multi-service architecture (Go API + Next.js + PostgreSQL + Redis + Centrifugo + executor) orchestrated by a shell script for E2E runs.
When a test fails, follow this sequence:
Playwright error messages are descriptive. They tell you exactly what selector failed and why. Start there.
Failed tests generate test-results/<test-name>/error-context.md with a YAML representation of the page structure:
- heading "Dashboard" [level=1] [ref=e10]
- paragraph [ref=e11]: Enter your section code to get started
- textbox "Section Join Code" [active] [ref=e15]
- button "Join Section" [disabled] [ref=e16]
This shows the actual DOM state at failure time — often more useful than screenshots for understanding what elements are rendered and their states.
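Reading the snapshot by eye is usually enough, but the line format is regular enough to scan programmatically. The sketch below is a hypothetical helper (`parseSnapshotLine` and `findDisabled` are not part of the project's fixtures) and assumes every entry looks like the lines shown above: a role, an optional quoted name, bracketed attributes, and optional inline text after a colon.

```typescript
interface SnapshotNode {
  role: string;
  name?: string;   // accessible name in double quotes, if present
  flags: string[]; // bracketed attributes other than ref, e.g. "disabled"
  ref?: string;    // value of [ref=...]
  text?: string;   // inline text after the colon, if present
}

// Parses one snapshot line; returns null for lines that do not match.
function parseSnapshotLine(line: string): SnapshotNode | null {
  const m = line
    .trim()
    .match(/^-\s+(\w+)(?:\s+"([^"]*)")?((?:\s+\[[^\]]+\])*)(?::\s*(.*))?$/);
  if (!m) return null;
  const node: SnapshotNode = { role: m[1], flags: [] };
  if (m[2] !== undefined) node.name = m[2];
  if (m[4]) node.text = m[4];
  for (const attr of m[3].match(/\[[^\]]+\]/g) || []) {
    const body = attr.slice(1, -1); // strip the surrounding brackets
    if (body.indexOf('ref=') === 0) node.ref = body.slice(4);
    else node.flags.push(body);
  }
  return node;
}

// Collects every element that was disabled at failure time.
function findDisabled(snapshot: string): SnapshotNode[] {
  const result: SnapshotNode[] = [];
  for (const line of snapshot.split('\n')) {
    const node = parseSnapshotLine(line);
    if (node && node.flags.indexOf('disabled') !== -1) result.push(node);
  }
  return result;
}

const sample = [
  '- heading "Dashboard" [level=1] [ref=e10]',
  '- paragraph [ref=e11]: Enter your section code to get started',
  '- textbox "Section Join Code" [active] [ref=e15]',
  '- button "Join Section" [disabled] [ref=e16]',
].join('\n');

// blocked holds one node: the "Join Section" button with ref e16.
const blocked = findDisabled(sample);
```

A disabled button in the snapshot often explains a click that timed out: the element was rendered but not yet actionable.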
On failure, Playwright captures:
- Screenshots and video in frontend/test-results/<test-name>/, showing what the page looked like
- Console logs (captured through the logCollector fixture)

Open the HTML report:
cd frontend && npx playwright show-report
The test fixtures log API requests/responses to the browser console. Look for non-200 responses or unexpected error bodies in the console log artifacts.
Common failure patterns:
For interactive debugging:
# Start infrastructure first (if not already running)
./scripts/ensure-test-postgres.sh
# Start API manually
./scripts/ensure-test-api.sh
# Run a single test with browser visible
cd frontend && npx playwright test e2e/your-test.spec.ts --headed
# Or with Playwright Inspector (step-by-step debugging)
cd frontend && npx playwright test e2e/your-test.spec.ts --debug
Note: For headed/debug mode, you need the backend services running separately. The make test-e2e script manages this automatically for CI-style runs, but for interactive debugging you start services manually.
make test-e2e
This orchestrates everything:
# Option 1: Via make (handles all infrastructure)
make test-e2e -- e2e/your-test.spec.ts
# Option 2: Manual (requires infrastructure already running)
cd frontend && API_BASE_URL=http://localhost:$API_PORT npx playwright test e2e/your-test.spec.ts
cd frontend && npx playwright test -g "test name substring"
Every test file follows this pattern:
import { test, expect } from './fixtures/test-fixture';
import { signInAs } from './fixtures/auth';
import { createClass, createSection, /* ... */ } from './fixtures/api-setup';
test.describe('Feature Name', () => {
  test('what it does', async ({ page, browser, testNamespace, setupInstructor, logCollector }) => {
    // 1. API SETUP: create data via HTTP helpers (fast, no UI interaction)
    const instructor = await setupInstructor();
    const cls = await createClass(instructor.token, 'Test Class');
    const section = await createSection(instructor.token, cls.id, 'Test Section');

    // 2. UI INTERACTION: sign in, navigate, interact
    await signInAs(page, instructor.email);
    await page.goto('/instructor');

    // 3. ASSERTIONS: verify expected state
    await expect(page.locator('h2:has-text("Dashboard")')).toBeVisible();
  });
});
Every test gets a unique testNamespace via the fixture. Use it when creating test data to avoid collisions:
const studentExternalId = `student-${testNamespace}`;
const studentEmail = `${studentExternalId}@test.local`;
The namespace is auto-created by the testNamespace fixture. All data created within a namespace is isolated from other tests.
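When a test needs several users, the same naming pattern can be wrapped in a small function. This is a minimal sketch: `namespacedIdentity` and the `TestIdentity` type are hypothetical names, not existing fixtures, and the `@test.local` domain is taken from the example above.

```typescript
interface TestIdentity {
  externalId: string;
  email: string;
}

// Derives a per-test identity from the fixture-provided testNamespace so that
// tests running in parallel never share an external ID or email address.
function namespacedIdentity(testNamespace: string, role: string, index = 0): TestIdentity {
  const suffix = index > 0 ? `-${index}` : '';
  const externalId = `${role}-${testNamespace}${suffix}`;
  return { externalId, email: `${externalId}@test.local` };
}

// Inside a test, testNamespace would come from the fixture; a literal stands in here.
const firstStudent = namespacedIdentity('ns-abc123', 'student');
const secondStudent = namespacedIdentity('ns-abc123', 'student', 1);
```

With index 0 the result matches the hand-written template strings above, so existing tests and the helper stay interchangeable.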
Always create test data via API helpers, not UI interactions. This is faster and more reliable. Use UI interactions only to test the UI flow you're actually verifying.
Available helpers in e2e/fixtures/api-setup.ts:
| Helper | Purpose |
|---|---|
| createNamespace(id, name) | Create isolated namespace |
| createInvitation(email, role, namespaceId) | Invite a user |
| acceptInvitation(invId, token, displayName) | Accept invitation (creates user) |
| createClass(token, name) | Create a class |
| createSection(token, classId, name) | Create a section (returns join_code) |
| createProblem(token, classId, opts) | Create a coding problem |
| startSession(token, sectionId, sectionName) | Start a session (inline problem) |
| startSessionFromProblem(token, sectionId, problemId) | Start session from existing problem |
| registerStudent(joinCode, extId, email, name) | Register student in section |
| getSectionByJoinCode(joinCode) | Look up section/class by join code |
The setupInstructor fixture combines createInvitation + acceptInvitation into one call.
The platform runs in test auth mode (AUTH_MODE=test). Tokens have the format test:<externalId>:<email>.
// Sign in through the UI (for testing the login flow)
import { signInAs, loginAsSystemAdmin } from './fixtures/auth';
await signInAs(page, 'user@test.local');
// Generate a token for API calls (no UI needed)
import { testToken } from './fixtures/api-setup';
const token = testToken('my-external-id', 'my@test.local');
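To make the token format concrete, here is a sketch that builds and parses a string of the documented shape test:<externalId>:<email>. The names `buildTestToken` and `parseTestToken` are hypothetical; the project's real `testToken` lives in e2e/fixtures/api-setup.ts and may do more than this.

```typescript
// Assumption: the token is exactly "test:" + externalId + ":" + email.
function buildTestToken(externalId: string, email: string): string {
  if (externalId.indexOf(':') !== -1) {
    throw new Error('externalId must not contain ":"');
  }
  return `test:${externalId}:${email}`;
}

// Splits a token back into its parts; returns null if the shape is wrong.
function parseTestToken(token: string): { externalId: string; email: string } | null {
  const parts = token.split(':');
  if (parts.length < 3 || parts[0] !== 'test') return null;
  // Everything after the second colon is treated as the email.
  return { externalId: parts[1], email: parts.slice(2).join(':') };
}

const sketchToken = buildTestToken('my-external-id', 'my@test.local');
const parsedToken = parseTestToken(sketchToken);
```

Parsing a token this way can help when an API call returns an auth error and you want to confirm which identity the request actually carried.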
Use separate browser contexts for different users:
test('multi-actor flow', async ({ page, browser, testNamespace, setupInstructor, logCollector }) => {
  const instructor = await setupInstructor();

  // Create a separate browser context for the instructor
  const instructorContext = await browser.newContext();
  const instructorPage = await instructorContext.newPage();
  logCollector.attachPage(instructorPage, 'instructor-page');

  try {
    await signInAs(instructorPage, instructor.email);
    // ... instructor actions ...

    // Default `page` is for the student
    await signInAs(page, studentEmail);
    // ... student actions ...
  } finally {
    await instructorContext.close();
  }
});
Always attach additional pages to the logCollector so their console logs are captured on failure.
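To illustrate why each extra page must be attached, here is a dependency-free sketch of how a collector of this kind could work. It is not the project's implementation: `SketchLogCollector`, `PageLike`, and `ConsoleMessageLike` are hypothetical names, and the interfaces only mirror the `page.on('console', ...)` listener and the `type()` / `text()` methods that Playwright console messages provide.

```typescript
// Structural stand-ins for the parts of a page and console message used here.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
interface PageLike {
  on(event: 'console', listener: (msg: ConsoleMessageLike) => void): unknown;
}

class SketchLogCollector {
  private readonly lines: string[] = [];

  // Subscribes to one page's console output and tags each line with a label,
  // so logs from several actors can be told apart afterwards.
  attachPage(page: PageLike, label: string): void {
    page.on('console', (msg) => {
      this.lines.push(`[${label}] ${msg.type()}: ${msg.text()}`);
    });
  }

  // Returns everything collected so far as one newline-separated string.
  dump(): string {
    return this.lines.join('\n');
  }
}
```

A page that was never attached has no listener, so nothing it logs can appear in the failure artifacts. That is the practical reason for the rule above.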
- expect(...).toBeVisible(), page.waitForURL(), etc. handle retries automatically.
- Use waitForTimeout sparingly: only for debounce windows (code sync has a 500ms debounce) and brief settle times.
- Use test.setTimeout(60000) for multi-actor tests and { timeout: 15000 } for slow assertions (e.g., waiting for code execution results).

In test builds (AUTH_MODE=test), editor instances are exposed on window.__TEST_EDITORS. Use helpers from e2e/fixtures/monaco.ts:
import { waitForMonacoReady, setMonacoValue, getMonacoValue } from './fixtures/monaco';
await waitForMonacoReady(page);
await setMonacoValue(page, 'print("hello")');
const code = await getMonacoValue(page);
await page.waitForTimeout(1000); // debounce sync
Never use page.keyboard.type() for Monaco content. Never use textContent to read Monaco. For multiple editors on one page, pass the index parameter: setMonacoValue(page, code, 1).
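The sketch below shows the kind of logic such helpers could run against the exposed editors. It rests on assumptions: that window.__TEST_EDITORS is an array ordered like the helpers' index parameter, and that each entry offers Monaco's `getValue()` and `setValue()` methods. `setEditorValue` and `getEditorValue` are hypothetical names, not the functions in e2e/fixtures/monaco.ts.

```typescript
// Minimal shape of an editor instance as used by this sketch.
interface EditorLike {
  getValue(): string;
  setValue(value: string): void;
}

// Looks up one editor by index and fails loudly if it is missing.
function pickEditor(editors: EditorLike[] | undefined, index: number): EditorLike {
  const editor = editors ? editors[index] : undefined;
  if (!editor) throw new Error(`No test editor at index ${index}`);
  return editor;
}

// Replaces the whole document through the editor API instead of simulating keys.
function setEditorValue(editors: EditorLike[] | undefined, code: string, index = 0): void {
  pickEditor(editors, index).setValue(code);
}

// Reads the full document from the editor model rather than from the DOM.
function getEditorValue(editors: EditorLike[] | undefined, index = 0): string {
  return pickEditor(editors, index).getValue();
}
```

In a real helper this logic would presumably run in the browser, for example inside `page.evaluate`, with the editors read from `window.__TEST_EDITORS`. Going through the editor API sidesteps editor behaviors such as auto-indentation that can alter typed text, and it returns the whole document rather than only the lines currently rendered.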
| File | Purpose |
|---|---|
| frontend/e2e/fixtures/test-fixture.ts | Extended Playwright test with testNamespace, setupInstructor, logCollector fixtures |
| frontend/e2e/fixtures/api-setup.ts | HTTP helpers for test data setup (namespace, class, section, student, session) |
| frontend/e2e/fixtures/auth.ts | signInAs(), loginAsSystemAdmin(), sidebar navigation helpers |
| frontend/e2e/fixtures/monaco.ts | Monaco editor helpers (waitForMonacoReady, setMonacoValue, getMonacoValue) |
| frontend/playwright.config.ts | Playwright config (Chromium only, parallel, no retries, screenshots + video on failure) |
| scripts/run-e2e-tests.sh | Full-stack E2E orchestrator (postgres, executor, Go API, Next.js, Playwright) |
| scripts/ensure-test-postgres.sh | Ensures test PostgreSQL is running with migrations |
| scripts/ensure-test-api.sh | Builds and starts Go API in test mode on a random port |
From frontend/playwright.config.ts:
- Test directory: frontend/e2e/
- fullyParallel: true, 2 workers (safe due to namespace isolation)
- Use test.setTimeout() for longer tests
- Base URL: http://localhost:3000
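For orientation, the settings described in this guide map onto Playwright configuration options roughly as follows. This is a sketch, not a copy of frontend/playwright.config.ts: the option names are Playwright's, but the exact values for `screenshot` and `video`, the relative `testDir`, and the project entry are assumptions, and the default test timeout is omitted because it is not stated here.

```typescript
// Sketch of a config object; the real file may differ in detail.
const sketchConfig = {
  testDir: './e2e',      // assumed to be relative to frontend/
  fullyParallel: true,   // tests within a file may run in parallel
  workers: 2,            // safe because each test has its own namespace
  retries: 0,            // "no retries"
  use: {
    baseURL: 'http://localhost:3000',
    screenshot: 'only-on-failure', // assumed value for "screenshots on failure"
    video: 'retain-on-failure',    // assumed value for "video on failure"
  },
  // Chromium only
  projects: [{ name: 'chromium', use: { browserName: 'chromium' } }],
};
```

In an actual config file an object like this would be the default export, typically wrapped in `defineConfig` from `@playwright/test`.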