| name | notion-playwright-tests |
| description | Orchestrate the Playwright test pipeline (planner → generator → healer) using the Notion 🧪 Test Runs database as the source of truth. Use this skill whenever the user wants to plan, generate, run, or heal Playwright tests for Insta Checkout, or whenever they reference the Notion test database, e.g. "let's work on the test DB", "run the test pipeline", "plan tests for the checkout page", "generate the queued specs", "heal the failing login tests", "/notion-playwright-tests", "what's pending in the test runs DB". Always trigger when the user mentions Playwright + Notion together, even if they don't say the skill name. The skill is fully interactive: it reads the DB, shows counts grouped by Status, and prompts the user to pick a bucket (Pending → generator / Failed → healer / Queued → run). Defer to this skill for any test-pipeline workflow against this project's Notion 🧪 Test Runs DB. |
Notion Playwright Tests - pipeline orchestrator
This skill is the front door for every Playwright test workflow on Insta Checkout. The 🧪 Test Runs Notion database is the source of truth for which tests exist, their lifecycle state, and what the next action should be. The skill reads the DB, groups by Status, and walks the user through one bucket at a time. The user always chooses what runs; never chain phases without asking.
Why the skill is prompt-driven: the user wants to see the Notion dashboard change live as agents work, and to control which Status transitions happen. Autonomous loops defeat that purpose.
When to use this skill
Trigger whenever the user wants to plan, generate, run, or heal Playwright tests against the existing Notion test DB, or wants to start tests for a new area of the app. Cues include:
- "let's work on the test DB / test runs DB"
- "plan tests for /checkout" or "plan tests for the onboarding flow"
- "generate the pending specs" / "run the queued tests" / "heal the failures"
- explicit /notion-playwright-tests
- the user describes a Playwright workflow that involves the Notion DB
Don't use this skill for one-off Playwright work that has no Notion linkage; for that, dispatch the planner / generator / healer subagents directly.
Notion DB reference
- URL: https://www.notion.so/karimtamer/e09d3265a9a1406b98287a594d774dae
- Data source ID (collection): e383357e-3467-41fd-9eb9-319dbb791689
- Schema (already created - do not re-add): see REFERENCE.md for the full property list
- Status enum: Pending, Queued, Running, Manual today, Passed, Failed, Blocked, Skipped, Retired
- Last result enum: Pass, Fail, Blocked, Not run, Healed (set by the healer when it fixes a previously failing test; auto-graduates back to Pass on the next clean re-run)
- Heal type enum: Source fix (app/source code edited), Test fix (only spec edited), Both - set by the healer alongside Last result = Healed
Use mcp__notion__notion-fetch with the data source URI collection://e383357e-3467-41fd-9eb9-319dbb791689 to query, and mcp__notion__notion-update-page (command: "update_properties") to write.
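For orientation, the rough shape of those two calls, sketched as TypeScript object literals. This is illustrative only: the exact argument schema belongs to the MCP server, and the property names come from the DB schema in REFERENCE.md.

```ts
// Hypothetical argument shapes - adapt to the MCP server's real schema.
const query = {
  // read the data source, then group the returned rows by Status locally
  uri: "collection://e383357e-3467-41fd-9eb9-319dbb791689",
}

const transition = {
  command: "update_properties",
  page_id: "<notion-page-id>", // the row being transitioned
  properties: { Status: "Running" }, // e.g. Pending -> Running
}
```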
Subagents this skill orchestrates
All three live at .claude/agents/:
- playwright-test-planner - explores a URL/feature, produces testing/<area>.plan.md. Does not write Notion.
- playwright-test-generator - drives a real browser through plan steps, writes one spec file per test, sets Notion Status Pending → Running → Queued|Blocked when given a <notion-page-id> in the prompt.
- playwright-test-healer - runs failing specs, debugs, edits, retries, marks test.fixme() if env-blocked. Sets Notion Status Queued|Failed → Running → Passed|Failed|Skipped+Blocked when given a <notion-page-id>. On a successful heal it sets Last result = Healed and Heal type = Source fix | Test fix | Both to flag what was changed.
Generator dispatch - hard rules (READ BEFORE EVERY GENERATOR CALL)
The single most damaging failure mode of this skill is dispatching the generator and getting back a spec that was written from source-code inspection alone (reading LinksPageContent.tsx, en.json, etc.) without ever opening the live page. The resulting spec looks plausible, parses cleanly, even matches the plan body - but its locators reflect what the agent inferred from source, not what the page actually renders. These specs fail noisily on first run, often for the wrong reasons, and cost more healer time than they ever saved.
Treat the generator like a microscope: if it never looked through the lens, the slide doesn't exist.
Before any of these rules can be applied, you must have completed the "Pre-flight check - generator tooling sanity" section below and recorded a generator_tooling: PASS outcome. If the pre-flight came back PARTIAL or FAIL, the user has either chosen a fallback path (planner + manual + healer) or asked to fix the tooling - in either case, the rules below about dispatching the generator do not apply, and you should follow the chosen fallback for every Pending Core test until the pre-flight is re-run and passes.
Non-negotiable rules
These apply to every generator dispatch, especially Core tier:
- Browser observation is mandatory. The generator MUST call generator_setup_page to load the target URL in a real browser before writing any test. It MUST call browser_snapshot (or another DOM-inspection tool) at least once per test to lock in real locators from rendered output. If the agent's report does not mention a snapshot/observation step per test, treat the spec as unreliable.
- No source-only fallback. The generator MUST NOT write specs whose locators are derived solely from reading *.tsx, *.ts, en.json, ar.json, or any other source artefact. Source files are useful for cross-referencing what was observed (e.g. confirming an i18n key matches the rendered string), never as the primary basis for selectors.
- Fail loud, not silent. If browser observation fails (dev server down, sign-in rate-limited, route 500s, page never paints, timeout), the generator MUST mark the affected Notion pages Blocked with a concrete Last failure reason. It MUST NOT silently fall back to source-code generation. A Blocked row is a fine outcome; a vacuously-passing spec is not.
- Per-test, not per-file. A single snapshot at the top of a session is not enough. Each test typically exercises a different state (empty list, error 500, modal open, mobile viewport, etc.). The generator must observe each of those states (via route mocks + reload, viewport resize, etc.) before writing the corresponding test.
- Sequential dispatch unless storage state is captured. Parallel generator dispatches can hammer Firebase with concurrent UI sign-ins for the same fixture user, triggering rate limits. Default to sequential dispatch. Parallelism is only safe once testing/.fixtures/<shape>-auth-state.json and <shape>-firebase-idb.json exist (captured by globalSetup); at that point the generators replay storage state instead of signing in fresh, and 3-4 parallel dispatches are fine.
Required dispatch-prompt language
Every generator call this skill makes MUST include the following block verbatim (or an equivalent that conveys all four points):
```
Browser observation is REQUIRED. Before writing any test, call generator_setup_page to open the target URL in a real browser, then call browser_snapshot to capture the rendered DOM. For each test you write, observe the relevant state in the live app (apply route mocks + reload, change viewport, open the modal, etc.) and base every locator on what you actually see. Do NOT infer selectors from source files alone; source is only acceptable for cross-checking. If observation fails for any reason (server down, sign-in fails, route returns an error, element never appears), mark the affected Notion page Blocked with a one-sentence Last failure reason and move on - do NOT write the spec from source-code inference. Your final report must explicitly list which tests you observed live (and how) so the orchestrator can audit.
```
Auditing the generator's report
When the generator returns, scan its summary before marking todos complete:
- Does the report describe live page interactions per test (clicks, snapshots, viewport changes), or only "after reading Foo.tsx"?
- Does it mention generator_setup_page / browser_snapshot / browser_click use, or just "wrote helpers using Playwright patterns"?
- Did it sign in successfully (the testing/.fixtures/<shape>-auth-state.json files appear after a fresh sign-in run)?
If the report reads like source-code analysis, dispatch the same generator again with a stricter prompt that explicitly forbids source-only writing and requires per-test snapshot evidence. Don't proceed to the next file pretending the spec is sound.
Test data via the insta-checkout-mock-data skill
Many tests only run meaningfully against a specific account state: an approved seller with products, a pending seller mid-onboarding, a rejected seller, a seller with payment links and customers, etc. The user has standing approval to seed the dev DB during this skill's flows (dev-only; the same does NOT apply to staging/prod).
When to invoke insta-checkout-mock-data:
- Before planning (Branch a) - if the area under test clearly requires a particular state (e.g. "Dashboard for approved seller", "Onboarding for pending seller"), seed the fixture first so the planner sees the real screen, not a redirect or empty state.
- Before generator dispatch (Branch b - Pending) - if a Pending test's plan body mentions a logged-in user / specific seller state and no fresh fixture exists for this session, seed one and pass the credentials into the generator prompt.
- Before healer dispatch on Blocked rows whose Last failure reason looks like missing creds / missing fixture (classic example: "Firebase signInWithPassword 400 for test1@test.com - provision a valid creds account"). Seeding a matching fixture often unblocks the row directly.
- Before re-running Queued / Passed specs if the spec hard-codes credentials that have since been wiped or rotated.
How to invoke:
Call Skill with skill: "insta-checkout-mock-data" and args describing the seed (e.g. "approved seller, 3 products, 2 payment links, locale en"). That skill is interactive โ it shows a plan, accepts tweaks, runs only on explicit go-ahead, and prints a credentials block at the end. Capture the email / password / seller id from that output and:
- Pass them into the dispatched agent's prompt (e.g. "Use credentials test_<ts>@test.com / test@123 (approved seller with 3 products) when the spec needs to log in.").
- If the spec file hard-codes credentials, edit the spec to read from a shared fixture (testing/fixtures/<area>.ts) or accept env vars - don't burn a one-shot login into the spec where a future session can't reproduce it (a sketch of such a reader follows).
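A minimal sketch of that shared-fixture reader. The file name, env var names, and JSON layout here are hypothetical, not existing project API:

```ts
// testing/fixtures/checkout-creds.ts - hypothetical reader; adapt names.
import { readFileSync } from "node:fs"

export interface SeededCreds {
  email: string
  password: string
  sellerId: string
}

export function loadCheckoutCreds(): SeededCreds {
  // Prefer env vars (pasted from the mock-data skill's credentials block)...
  const { TEST_SELLER_EMAIL, TEST_SELLER_PASSWORD, TEST_SELLER_ID } = process.env
  if (TEST_SELLER_EMAIL && TEST_SELLER_PASSWORD) {
    return {
      email: TEST_SELLER_EMAIL,
      password: TEST_SELLER_PASSWORD,
      sellerId: TEST_SELLER_ID ?? "",
    }
  }
  // ...then fall back to the gitignored per-session creds JSON.
  return JSON.parse(
    readFileSync("testing/.fixtures/checkout.json", "utf8"),
  ) as SeededCreds
}
```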
Standing rules to follow:
- Always state the test data plan in the same prompt where you'd otherwise jump straight to dispatching an agent โ show the user what will be seeded, then proceed.
- Don't auto-loop mock data → generator → next test without a checkpoint. Surface what was seeded (email + seller id), then continue.
- Reuse fixtures within a session when the same state covers multiple Pending tests (e.g. one approved-seller fixture for every test in the Dashboard suite). Re-seeding per test wastes time and clutters the dev DB.
- Don't seed in production / staging. This skill only operates on the dev DB. If MONGODB_URI in apps/backend/.env points anywhere else, stop and ask.
Authenticated test infrastructure (storage state)
Tests that hit auth-gated routes (anything under /dashboard/*) reuse signed-in sessions captured once in globalSetup. Per-test UI sign-ins were tried and don't work: Firebase rate-limits parallel sign-ins for the same user, and we hit serial run times of >2 minutes.
Multi-fixture model
globalSetup provisions one seller per shape we need (onboarding, dashboard, etc.), currently capped at ~5 fixture types max. Each fixture has its own:
- A testing/scripts/create-<shape>-user.ts script that seeds Firebase + Mongo for that shape (e.g. mid-onboarding pending, approved-with-data, rejected, fully-onboarded-no-payments).
- A creds JSON at testing/.fixtures/<shape>.json (gitignored).
- A captured cookies/localStorage file at testing/.fixtures/<shape>-auth-state.json (gitignored).
- A captured Firebase IDB dump at testing/.fixtures/<shape>-firebase-idb.json (gitignored).
globalSetup runs every fixture script in parallel and signs each seller in to capture both auth artefacts. globalTeardown deletes every seller (via delete-test-seller.ts) regardless of which specs ran. Fixtures are always created and always destroyed, even when the matching test suite isn't selected. The cost is small (~5s parallel) and the benefit is that any spec in the suite can target any seller shape.
What exists today
| File | Role |
|---|---|
| testing/global-setup.ts | Runs every create-*-user.ts script in parallel, signs each in, captures cookies + IDB per shape |
| testing/global-teardown.ts | Deletes every fixture seller via delete-test-seller.ts, removes state files |
| testing/scripts/create-onboarding-user.ts | Pending mid-onboarding seller |
| testing/scripts/create-dashboard-user.ts | Approved seller with products + active payment links + paid+confirmed link_customers (drives KPIs/chart) |
| testing/scripts/delete-test-seller.ts | Generic teardown - works for any seeded seller by uid/email/seller-id |
| testing/fixtures/firebase-auth.ts | setupAuthenticatedPage(page, "onboarding" \| "dashboard") - replays the right IDB dump into the page's origin |
| testing/fixtures/onboarding-user.ts / dashboard-user.ts | loadOnboardingUser() / loadDashboardUser() - typed creds JSON readers |
| playwright.config.ts | Default use.storageState points at the onboarding cookie state. Specs targeting a different fixture must override at the file level (see below). |
| testing/.fixtures/<shape>.json (gitignored) | Per-shape creds + Firebase UID + seller ID |
| testing/.fixtures/<shape>-auth-state.json (gitignored) | Per-shape cookies + localStorage |
| testing/.fixtures/<shape>-firebase-idb.json (gitignored) | Per-shape Firebase IDB rows |
Why state is captured TWO ways (not one)
Both gates exist; we need state for both:
- Cookies - needed for the server-side gate. apps/landing/middleware.ts reads the Firebase __session cookie to decide whether /dashboard/* requests are authenticated. Without it, the middleware redirects /dashboard/home?locale=ar → /onboard, then a client-side router.replace("/dashboard/home") strips query params. Cookies are restored via playwright.config.ts → use.storageState (default) or per-spec test.use({ storageState }).
- Firebase IndexedDB - needed for the client-side gate. The Firebase Web SDK persists the signed-in user in firebaseLocalStorageDb (object store: firebaseLocalStorage), which Playwright's built-in storageState doesn't capture. Without it, auth.currentUser === null even with cookies set, and LocaleSync / OnboardingChecklist / etc. behave as if logged out. IDB is restored per-test via setupAuthenticatedPage(page, fixture).
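For intuition, a sketch of the replay trick setupAuthenticatedPage performs. The real implementation lives in testing/fixtures/firebase-auth.ts; the dump's row shape is an assumption here, though the database and object-store names match what the Firebase Web SDK uses.

```ts
// Sketch only - mirrors the idea, not the project's actual implementation.
import { readFileSync } from "node:fs"
import type { Page } from "@playwright/test"

export async function replayFirebaseIdb(page: Page, shape: string) {
  const rows: Array<{ fbase_key: string; value: unknown }> = JSON.parse(
    readFileSync(`testing/.fixtures/${shape}-firebase-idb.json`, "utf8"),
  )
  // Runs in the page before any app script, so the Firebase SDK finds a
  // signed-in user when it opens firebaseLocalStorageDb.
  await page.addInitScript((dump) => {
    const open = indexedDB.open("firebaseLocalStorageDb", 1)
    open.onupgradeneeded = () => {
      open.result.createObjectStore("firebaseLocalStorage", { keyPath: "fbase_key" })
    }
    open.onsuccess = () => {
      const store = open.result
        .transaction("firebaseLocalStorage", "readwrite")
        .objectStore("firebaseLocalStorage")
      for (const row of dump) store.put(row)
    }
  }, rows)
}
```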
How specs use this
Default (onboarding fixture) - onboarding tests don't need any storageState override:
```ts
import { test, expect } from "@playwright/test"
import { setupAuthenticatedPage } from "../../fixtures/firebase-auth"

test.describe("Onboarding - checklist", () => {
  test.beforeEach(async ({ page }) => {
    await setupAuthenticatedPage(page, "onboarding")
    await page.goto("/dashboard/home")
  })
})
```
Different fixture (e.g. dashboard) - override the cookie file at the top of the spec, plus pass the matching IDB key:
```ts
import { test, expect } from "@playwright/test"
import { setupAuthenticatedPage } from "../../fixtures/firebase-auth"

test.use({ storageState: "./testing/.fixtures/dashboard-auth-state.json" })

test.describe("Dashboard home - KPIs", () => {
  test.beforeEach(async ({ page }) => {
    await setupAuthenticatedPage(page, "dashboard")
    await page.goto("/dashboard/home")
  })
})
```
The storageState and IDB key MUST match - mixing them produces a half-signed-in session where the middleware lets the request through but the client SDK has no current user.
When adding a NEW fixture shape
1. Confirm we don't already have a usable shape. Check the existing create-*-user.ts scripts. If one matches, reuse it.
2. Add a new testing/scripts/create-<shape>-user.ts script. Mirror create-onboarding-user.ts's structure: parse args, init Firebase Admin, create user, insert seller doc, optionally seed children (products, payment_links, link_customers, etc.), write testing/.fixtures/<shape>.json. Use lowercase-hyphen for <shape> (e.g. dashboard, rejected-seller, approved-no-data). A skeleton sketch follows this list.
3. Wire the script into testing/global-setup.ts: add it to the parallel array of fixture scripts to run, then add a sign-in + capture pass that writes <shape>-auth-state.json and <shape>-firebase-idb.json.
4. Add a testing/fixtures/<shape>-user.ts loader that reads the JSON and exports a typed accessor. Keep its shape parallel to onboarding-user.ts.
5. Extend setupAuthenticatedPage to accept the new fixture key and load the corresponding <shape>-firebase-idb.json.
6. Wire teardown: testing/global-teardown.ts already deletes by UID via delete-test-seller.ts; just make sure it iterates all fixture JSON files, not just the onboarding one.
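A skeleton of the script step 2 describes, using a hypothetical rejected-seller shape. The collection name, seller-doc fields, and Admin initialisation are assumptions to adapt, not the project's actual code.

```ts
// testing/scripts/create-rejected-seller-user.ts - hypothetical skeleton.
import { writeFileSync } from "node:fs"
import { initializeApp, applicationDefault } from "firebase-admin/app"
import { getAuth } from "firebase-admin/auth"
import { MongoClient } from "mongodb"

async function main() {
  initializeApp({ credential: applicationDefault() }) // assumes ADC is configured
  const email = `test_${Date.now()}@test.com`
  const password = "test@123"

  // Create the Firebase user for this shape.
  const user = await getAuth().createUser({ email, password })

  // Insert the seller doc whose state defines the shape.
  const mongo = await MongoClient.connect(process.env.MONGODB_URI!)
  const { insertedId } = await mongo.db().collection("sellers").insertOne({
    firebaseUid: user.uid,
    status: "rejected", // the defining trait of this fixture
  })
  await mongo.close()

  // Write the creds JSON that the loader and teardown rely on.
  writeFileSync(
    "testing/.fixtures/rejected-seller.json",
    JSON.stringify({ email, password, uid: user.uid, sellerId: insertedId }, null, 2),
  )
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})
```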
When a spec needs a clean unauthenticated context
(signup, public payment links, login page itself) - override at the file level:
```ts
test.use({ storageState: { cookies: [], origins: [] } })
```
And don't call setupAuthenticatedPage.
When a spec needs to override server data
The i18n-locale spec is the precedent: LocaleSync re-fetches /sellers/me and overrides the URL ?locale=ar param with whatever the seller's contentLocale is. Use page.route() to rewrite the response in beforeEach. Document this in the plan body so the generator/healer keeps the interception.
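A minimal sketch of that interception. route.fetch() lets the real request run, and only the contentLocale field named above is rewritten; the rest of the payload is passed through untouched since its shape isn't specified here.

```ts
import { test } from "@playwright/test"

test.beforeEach(async ({ page }) => {
  await page.route("**/sellers/me", async (route) => {
    const response = await route.fetch() // run the real request first
    const body = await response.json()
    body.contentLocale = "ar" // force the locale LocaleSync will apply
    await route.fulfill({ response, json: body })
  })
})
```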
Known limitation
globalSetup runs every fixture once per Playwright invocation regardless of which specs are selected. Running just sign-in tests still spins up every fixture (~5s parallel + a Firebase user churn each). The proper fix is per-area Playwright projects with dependencies so each area only provisions its own fixture - out of scope for now, but track it if the fixture count exceeds ~5.
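When that fix becomes worth doing, a hedged playwright.config.ts sketch of per-area projects with dependencies; the project and setup-file names are invented for illustration.

```ts
import { defineConfig } from "@playwright/test"

export default defineConfig({
  projects: [
    // One tiny setup project per fixture shape...
    { name: "setup-dashboard", testMatch: /dashboard\.setup\.ts/ },
    // ...and the area's specs depend on it, so running only onboarding
    // tests never provisions the dashboard seller.
    {
      name: "dashboard",
      testMatch: /tests\/dashboard\/.*\.spec\.ts/,
      dependencies: ["setup-dashboard"],
      use: { storageState: "testing/.fixtures/dashboard-auth-state.json" },
    },
  ],
})
```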
Entry flow
The very first interaction in any session asks two questions in order. Don't skip either.
1. Pick a path:
What would you like to do?
(a) Plan tests for a new area of the app
(b) Work on existing tests in the Notion ๐งช Test Runs DB
2. Branching policy:
Run on a new feature branch tagged today? (default: yes - claude/notion-tests-<area>-YYYY-MM-DD)
If yes: git checkout -b claude/notion-tests-<area>-<today> (use date +%Y-%m-%d). Slug <area> to lowercase-hyphen. If no: stay on the current branch.
Then run the pre-flight check below before dispatching into Branch (a) or Branch (b).
Pre-flight check - generator tooling sanity (RUN BEFORE ANY DISPATCH)
The single most expensive failure mode of this skill is dispatching the generator and getting back specs that were silently written from source-code inference because generator_setup_page / browser_snapshot didn't actually work in this project. Past sessions have lost ~2 hours to that. Before any planner or generator dispatch in a new session, this skill runs a tiny dry-run to verify the generator's observation tools work end-to-end. If they don't, we stop and prompt the user; we do not silently continue.
This check runs once per session, cached for the rest of the session unless the user changes branches, restarts servers, or modifies playwright.config.ts / testing/global-setup.ts.
Step 1 โ Static prerequisites
In parallel:
- Confirm the dev server(s) the target needs are up (per "Dev server policy") - start any that aren't.
- Confirm the relevant fixture script(s) exist under testing/scripts/create-*-user.ts.
- Confirm testing/global-setup.ts and testing/.fixtures/ are present.
If any of these fail, fix them or surface to the user - don't proceed to step 2.
Step 2 โ Generator tool dry-run
Dispatch a minimal playwright-test-generator agent that does only observation, no writing, no Notion updates:
```
You are running a tooling-sanity dry-run for the notion-playwright-tests skill.
Do exactly this and nothing else:
1. Call `generator_setup_page` to open `http://localhost:3000` (or the project's primary URL).
2. Call `browser_snapshot` to capture the rendered DOM.
3. Report back, in <100 words: did `generator_setup_page` complete cleanly? Did `browser_snapshot` return a non-empty snapshot? Did anything teardown/wipe project state during setup (e.g. fixture files deleted, global setup ran a teardown)?
Do NOT write any spec file. Do NOT update Notion. Do NOT sign in to anything. Do NOT run `playwright test`. This is purely a "do my browser tools work in this codebase" check.
```
Capture the agent's report.
Step 3 โ Interpret the result and gate
Three outcomes:
- PASS - both tools worked, nothing got wiped. Cache generator_tools_ok = true for this session and proceed to Branch (a) or Branch (b).
- PARTIAL - tools loaded, but side effects were observed (e.g. fixture files deleted, globalSetup ran). Surface to the user verbatim:
The generator's generator_setup_page works, but it triggers project side effects: <list>. That means each generator call may interfere with seeded fixtures. Choose how to proceed:
(1) Investigate and patch the tooling first (recommended for clean specs).
(2) Pre-seed everything and tolerate the churn โ generators still observe live but each call rotates the fixture seller.
(3) Continue without the generator: planner + manual Write + healer for failures.
(4) Stop this session.
- FAIL - generator_setup_page errored, snapshot empty, or the agent reports it can't observe. Surface to the user:
The generator's browser-observation tools are not working in this project right now: <one-line concrete failure>. We cannot dispatch the generator and produce reliable specs until this is fixed. Choose how to proceed:
(1) Investigate and fix the tooling (preferred โ root cause).
(2) Switch to planner-only + manual + healer: planner does the live exploration, this skill writes specs directly with Write based on the plan body, healer validates against the live app on first run. Slower per-test but produces real specs.
(3) Stop this session.
In every case, do NOT silently dispatch the generator and accept source-only specs as the fallback. That's exactly what the new pre-flight check exists to prevent.
Step 4 โ Record the outcome in the session
Whatever path the user picks, write a one-line note in your scratch state for the rest of the session, e.g.:
generator_tooling: FAIL - generator_setup_page tears down testing/.fixtures/. Fallback: planner + manual + healer.
Reference this when each Pending Core test would otherwise be dispatched to the generator. Do not "forget" the failure mid-session.
Branch (a) - new area
1. Ask: "Which area? (a name like Checkout or a URL like /admin/products)"
2. Map the area to a target URL (if the user gave a name, ask for the URL or infer from project conventions: Landing → http://localhost:3000, Checkout → http://localhost:3001, Admin → http://localhost:3002).
3. Confirm dev servers are up - check lsof -ti:3000,3001,3002,4000. Determine which port(s) the target URL needs (see "Dev server policy" below). If a required server is down, start it directly via Bash (e.g. npm run dev:landing &) - do not ask the user to run it.
3a. Decide whether the area needs a seeded fixture. If the URL only renders meaningfully behind a particular seller state (logged-in dashboard, mid-onboarding flow, rejected seller view, payment-link checkout, etc.), pick the right path:
   - A test-managed fixture already exists (testing/scripts/create-*-user.ts) - run it directly and capture the creds for the planner. Then make sure the spec files reference the matching storageState + setupAuthenticatedPage(page, "<shape>") per "Authenticated test infrastructure (storage state)" above.
   - A new fixture shape is needed - surface the shape options to the user, get confirmation, and add a new testing/scripts/create-<shape>-user.ts + wire it into globalSetup / globalTeardown / firebase-auth.ts. See "When adding a NEW fixture shape" above. Run the new script once now to seed creds for the planner.
   - Ad-hoc one-off seller (e.g. demo content, throw-away exploration) - invoke insta-checkout-mock-data instead. That skill is for the dev DB, not for the test-managed fixture set.
   Skip this step only for fully public URLs (landing pages, public payment links).
4. Dispatch playwright-test-planner with the URL and target plan path: "Plan tests for <URL>. Save the plan to testing/<area-slug>.plan.md. Use planner_setup_page and planner_save_plan."
5. When the planner returns: read testing/<area-slug>.plan.md, show the suite list + total test count, ask: "Insert N tests across M suites into Notion?"
6. On yes: bulk-create pages via mcp__notion__notion-create-pages with parent data_source_id: e383357e-3467-41fd-9eb9-319dbb791689. For each test set:
   - Name = "<X.Y> <test title>" (X = suite number, Y = test ordinal)
   - Suite = suite name from the plan
   - Spec file = "testing/tests/<area-slug>/<suite-slug>.spec.ts"
   - Area = best-fit Area enum (Auth, Checkout, Dashboard, Onboarding, Payments, Admin, Landing)
   - Tier = Core for happy paths / critical validation, Helper for edge cases / large viewports / accessibility nuances
   - Source = Agent
   - Status = Pending
   - Last result = Not run
   - Execution = Headless
   - Tags = inferred only when obvious (Smoke for happy paths, Critical path for auth-defining flows, Mobile for narrow viewports, i18n for non-Latin input, Visual for layout/responsive)
   - Page body: spec file path + numbered steps + expectations from the plan
7. After insertion, fall through to Branch (b) with the new Area pre-selected.
Branch (b) - existing tests
1. Ask: "Filter by Area, or look across all?" Default: all.
2. Read pages via mcp__notion__notion-fetch on the data source URI. Group by Status. Show counts:
   - Pending: 12 - plan item exists, no spec yet → run generator
   - Queued: 8 - spec written, never executed → run tests OR healer
   - Failed: 3 - last run failed → run healer
   - Blocked: 2 - env can't satisfy (e.g. missing creds) → seed fixture + heal, or list
   - Skipped: 1 - test.fixme() → list only
   - Passed: 5 - last run passed → re-run for regression
   - Running: 0 - currently in flight (warn if non-zero - stale) → reset?
3. Ask: "Which bucket?" After the user picks, ask scope: "All N of those, or filter by Suite / Tier / Tag?"
4. Test-data prerequisite check. Before dispatching anything, scan the chosen bucket's pages for state requirements (read the page body or Last failure reason). If multiple pages share the same fixture need (e.g. an approved seller with products), invoke insta-checkout-mock-data once and reuse those credentials across the batch. If individual pages need bespoke states (rejected seller, pending onboarding), seed them on demand inside the per-page loop. See "Test data via the insta-checkout-mock-data skill" above.
5. Dispatch the right agent per page, passing the <notion-page-id> in the prompt so the agent can update Status live. When the page needs seeded data, include the credentials and seller id from the mock-data run in the dispatch prompt:
   - Pending - Core tier → dispatch playwright-test-generator. The dispatch prompt MUST include the "Browser observation is REQUIRED" block from "Generator dispatch - hard rules" above. The generator navigates the live app while writing the spec; locators must come from browser_snapshot of the rendered DOM, not from reading source files. Pass the spec file path + page id so it updates Notion Status live (Pending → Running → Queued|Blocked). Check the required dev server is running first (see "Dev server policy"). Group tests for the same suite into the same spec file. Audit the generator's return report - if it doesn't describe per-test live observation, re-dispatch with a stricter prompt before proceeding.
   - Pending - Helper tier → write spec files directly using the Write tool. Before writing, read existing generator-written specs for the same area (e.g. testing/tests/login/happy-path.spec.ts) to infer the real locator patterns (button names, input labels, URL shapes) so the hand-written specs stay consistent with what the generator observed. Group all tests for the same suite into one .spec.ts file matching the Spec file property in Notion. After writing, set Notion Status = Queued and update Spec file for each covered page.
   - Failed → playwright-test-healer. Pass the spec file path + the page id. If Last failure reason looks like a stale fixture (creds 400, missing seller, missing products), seed a fresh fixture via insta-checkout-mock-data first and pass the new credentials in the healer prompt. Healer writes → Running → Passed|Failed|Skipped+Blocked.
   - Queued / Passed (re-run) → run via npx playwright test <spec-file> directly. If the spec depends on credentials that may have rotated, offer to re-seed first. On failure, ask "Dispatch healer?" before going further. On pass, manually update Notion: Status=Passed, Last result=Pass, Last run = today's ISO date, Run count = previous + 1, clear Last failure reason, and clear Heal type (set to null) - a clean re-run graduates Healed back to Pass.
   - Blocked → for each row, read Last failure reason. If it looks like missing/expired test data, offer to seed a matching fixture via insta-checkout-mock-data and re-dispatch the healer; otherwise list with reasons and move on.
   - Skipped → list with reasons. Don't dispatch anything. Ask if the user wants to revisit them manually.
6. After each agent run finishes:
   - Re-fetch the affected page(s) from Notion.
   - Diff the Status before/after and summarise to the user (e.g. "Generated 4/5 specs - page 3.5 marked Blocked because the modal didn't open on the planner's click").
   - Ask whether to commit. See "Commit + push + PR" below.
   - Re-show the bucket counts and re-prompt.
Commit + push + PR
The project rule (CLAUDE.md) is no commits / pushes / PRs without explicit user instruction. This skill respects that: every gate is a prompt.
After each agent run that modified files:
- Run git status and show the diff summary.
- Ask: "Commit?" and propose a focused message:
  - Generator: test(<area>): generate <suite> specs
  - Healer: test(<area>): heal <suite> - <one-line fix summary>
- Only commit on yes; never auto-commit.
After all chosen work in the session is done, ask once:
Push branch + open PR?
If yes:
- git push -u origin <branch> - likely refused by the project guardrail. If denied, surface the error verbatim and ask the user to authorise the push (they can rerun the command themselves or grant a one-off permission).
- After push succeeds: gh pr create --base main --head <branch> with:
  - Title: test(<area>): <one-line session summary>
  - Body: list of what was done (specs generated, tests healed, fixmes applied), plus a link to the Notion view URL filtered to the affected pages.
Note: this repo's gh auth is devinstacheckouteg-web (per gh auth status). Don't switch accounts.
Dev server policy
Standard ports and start commands for this monorepo:
| Port | App | Start command |
|---|---|---|
| 3000 | landing | npm run dev:landing |
| 3001 | checkout | npm run dev:checkout |
| 3002 | admin | npm run dev:admin |
| 4000 | backend | npm run dev:backend |
Detect which server is needed: read playwright.config.ts - baseURL defaults to http://localhost:3000 (landing). If tests target a different app, the BASE_URL env var or webServer config will say so.
Auto-start if not running: check lsof -ti:<port>. If the process is missing, start it via Bash in the background (e.g. npm run dev:landing > /tmp/landing.log 2>&1 &) and wait ~8s for the port to open before proceeding. Do not ask the user to start it - run it directly (project feedback rule).
No speculative healing in this skill
This skill orchestrates; the healer agent heals. When a Queued or Passed run produces failures:
- Update Notion with Status=Failed and a concrete Last failure reason derived from the test output.
- Do NOT edit spec files yourself to "fix" the locator/timing issue. Even if the failure looks obvious from the error message, your fix is a guess - the test ran in a CI-style headless context and you can't see what the page actually rendered. Guesses produce churn (passing tests regressing run-to-run, mismatched expectations).
- Dispatch playwright-test-healer instead. That agent opens a browser, observes the live app, and either fixes the spec (Test fix), fixes the app (Source fix), or determines the test premise was wrong. It marks Last result=Healed + Heal type=… so the audit trail is clean.
- The exceptions where this skill DOES edit specs directly: writing brand-new Helper Pending tests (Branch b - Pending), and architectural plumbing (storage state, route interception when the plan body documents it). Never speculative locator-fixing on tests that were running fine before.
Common locator pitfalls
When writing or reviewing Helper specs, watch for these - they came up in the onboarding suite and will recur:
- getByLabel(/password/i) matches the "Show password" toggle button (which has aria-label="Show password"). Use page.locator("input[type='password']") or a specific id like #signup-password.
- Lucide React no longer emits data-lucide attributes in current versions, so [data-lucide="check-circle"] returns 0 elements. Anchor on a class that the component applies (e.g. svg.text-success for completion icons) or on a parent with stable text.
- Tailwind hover:border-primary/40 matches /border-primary/. When asserting the active variant of a toggle button, anchor on a class that's only present in the active state (e.g. bg-primary/10), not on border-primary alone.
- Multiple <select> elements on the dashboard. The dashboard layout has a locale switcher above the OnboardingChecklist's category select, so page.locator("select") is no longer unique. Disambiguate with .first(), :has(option[value=""]), or a parent scope.
- getByText("Your Business") collides with the dashboard banner copy ("...your business info..."). Scope to the locale-preview wrapper (the phone mockup) and use { exact: true }.
- getByText strict-mode violations against [dir="rtl"] - the <html>, sidebar, and preview wrapper all have dir="rtl" in the Arabic locale. Use .first() or filter by a child text that's unique to the element you want.
- The Arabic InstaPay step label is رابط InstaPay (key dashboard.onboarding.instapayLink), not رابط الدفع (which is the sidebar's "Payment Links" nav - a different namespace).
- LocaleSync overrides ?locale=ar by re-fetching /sellers/me on every auth state change. Tests that exercise an Arabic-locale dashboard need page.route('**/sellers/me', ...) to rewrite contentLocale: "ar" in the response.
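A few of the pitfalls above, condensed into the locator patterns that survive. The selectors echo the list; the [data-phone-mockup] hook on the preview wrapper is hypothetical.

```ts
import { expect, type Page } from "@playwright/test"

// Reference fragment - each locator assumes the page is already on the
// screen the corresponding pitfall describes.
export async function pitfallSafeLocators(page: Page) {
  const password = page.locator("input[type='password']") // not getByLabel(/password/i)
  const doneIcon = page.locator("svg.text-success").first() // not [data-lucide="check-circle"]
  const activeToggle = page.locator("button.bg-primary\\/10") // "/" escaped in the class selector
  const categorySelect = page.locator("select:has(option[value=''])").first() // several <select>s exist
  // Scope exact-text matches to the preview wrapper to dodge banner copy.
  const preview = page.locator("[data-phone-mockup]") // hypothetical wrapper hook
  await expect(preview.getByText("Your Business", { exact: true })).toBeVisible()
  return { password, doneIcon, activeToggle, categorySelect }
}
```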
Two reminders that catch new sessions
- Core vs Helper spec strategy: for Core Pending tests, always dispatch the playwright-test-generator AND embed the "Browser observation is REQUIRED" block from "Generator dispatch - hard rules" in the prompt. A Core spec written without live browser observation is unacceptable - it can pass vacuously, fail for the wrong reasons, and burn healer time. If the generator returns a report that doesn't describe per-test snapshots / clicks / viewport changes, re-dispatch with a stricter prompt before marking it done. For Helper Pending tests, write .spec.ts files directly using the Write tool - but first read existing generator-written specs in the same area to extract real locator patterns (button names, input roles, URL shapes). Group all tests for a suite into one file matching the Spec file property in Notion. After writing, verify with npx playwright test <file> --list and update Notion to Status = Queued.
- Dev server is auto-started: check the port with lsof -ti:<port> before writing or running tests. If it's not running, start it via Bash in the background and wait for it to come up. Never ask the user to start it.
When the user says "do all of it" or similar autonomy bait
Don't. The whole point of this skill is the live Notion dashboard plus user-controlled lifecycle. If they push for autonomous execution, ask once: "I can run all the Pending tests through the generator without prompting between each - that takes ~N minutes total and you'll see Notion update live. Confirm?" Only then proceed in a loop, but still stop on first error and re-prompt.