| name | expect |
| license | MIT |
| compatibility | Claude Code 2.1.183+. Requires agent-browser >= 0.25.0 (Rust-native, no Playwright). |
| description | Diff-aware AI browser testing — analyzes git changes, generates targeted test plans, and executes them via agent-browser (Rust daemon + CDP, ARIA-tree-first). Reads git diff to determine what changed, maps changes to affected pages via route map, generates a test plan scoped to the diff, and runs it with pass/fail reporting. Use when testing UI changes, verifying PRs before merge, running regression checks on changed components, or validating that recent code changes don't break the user-facing experience. |
| argument-hint | [-m <instruction>] [--target unstaged|branch|commit] [--flow <slug>] [-y] |
| context | fork |
| version | 1.1.0 |
| author | OrchestKit |
| tags | ["testing","browser","e2e","diff-aware","regression","visual","accessibility","ai-testing"] |
| user-invocable | true |
| allowed-tools | ["AskUserQuestion","Bash","Read","Write","Edit","Grep","Glob","Agent","TaskCreate","TaskUpdate","TaskList","ToolSearch","WebFetch","Monitor","PushNotification"] |
| skills | ["testing-e2e","chain-patterns","memory"] |
| complexity | high |
| persuasion-type | guidance |
| effort | high |
| model | sonnet |
| metadata | {"category":"testing","milestone":"M99","upstream-package":"agent-browser","upstream-version-tested":"0.29.1"} |
| triggers | {"keywords":["expect","test my changes","browser test","diff test","test what I changed","test the UI","visual regression","check my changes"],"examples":["test my changes before I push","expect — run browser tests on what I changed","test the login flow after my auth refactor","run visual regression on the dashboard"],"anti-triggers":["cover","unit test","generate tests","verify","implement","npm test"]} |
| paths | [".expect/**","**/*.test.{ts,tsx}","agent-browser.json"] |
| invocation_hooks | ["command -v agent-browser >/dev/null 2>&1 || echo 'Warning: agent-browser not installed — run npm install -g agent-browser'"] |
Expect — Diff-Aware AI Browser Testing
Analyze git changes, generate targeted test plans, and execute them via AI-driven browser automation.
Note: If disableSkillShellExecution is enabled (CC 2.1.91), the agent-browser install check won't run. Verify it's installed: npx agent-browser --version.
/ork:expect
/ork:expect -m "test the checkout flow"
/ork:expect --flow login
/ork:expect --target branch
/ork:expect -y
Core principle: Only test what changed. Git diff drives scope — no wasted cycles on unaffected pages.
Argument Resolution
ARGS = "[-m <instruction>] [--target unstaged|branch|commit] [--flow <slug>] [-y]"
import re
raw = ""
INSTRUCTION = None
TARGET = "unstaged"
FLOW = None
SKIP_REVIEW = False
m_match = re.search(r'-m\s+["\']([^"\']+)["\']|-m\s+(\S+)', raw)
if m_match:
INSTRUCTION = m_match.group(1) or m_match.group(2)
t_match = re.search(r'--target\s+(unstaged|branch|commit)', raw)
if t_match:
TARGET = t_match.group(1)
f_match = re.search(r'--flow\s+(\S+)', raw)
if f_match:
FLOW = f_match.group(1)
if '-y' in raw.split():
SKIP_REVIEW = True
STEP 0: MCP Probe + Prerequisite Check
ToolSearch(query="select:mcp__memory__search_nodes")
Bash("command -v agent-browser || npx agent-browser --version")
Bash("agent-browser skills get agent-browser")
CRITICAL: Task Management
TaskCreate(
subject="Expect: test changed code",
description="Diff-aware browser testing pipeline",
activeForm="Running diff-aware browser tests"
)
TaskCreate(subject="Check fingerprint (skip if unchanged)", activeForm="Checking fingerprint")
TaskCreate(subject="Scan git diff and classify changes", activeForm="Scanning diff")
TaskCreate(subject="Map changes to routes/URLs", activeForm="Mapping routes")
TaskCreate(subject="Generate AI test plan", activeForm="Generating test plan")
TaskCreate(subject="Execute tests via agent-browser", activeForm="Executing browser tests")
TaskCreate(subject="Compile test report", activeForm="Compiling report")
TaskUpdate(taskId="3", addBlockedBy=["2"])
TaskUpdate(taskId="4", addBlockedBy=["3"])
TaskUpdate(taskId="5", addBlockedBy=["4"])
TaskUpdate(taskId="6", addBlockedBy=["5"])
TaskUpdate(taskId="7", addBlockedBy=["6"])
task = TaskGet(taskId="2")
TaskUpdate(taskId="2", status="in_progress")
TaskUpdate(taskId="2", status="completed")
Pipeline Overview
Git Diff → Route Map → Fingerprint Check → Test Plan → Execute → Report
| Phase | What | Output | Reference |
|---|
| 1. Fingerprint | SHA-256 hash of changed files | Skip if unchanged since last run | references/fingerprint.md |
| 2. Diff Scan | Parse git diff, classify changes | ChangesFor data (files, components, routes) | references/diff-scanner.md |
| 3. Route Map | Map changed files to affected pages/URLs | Scoped page list | references/route-map.md |
| 4. Test Plan | Generate AI test plan from diff + route map | Markdown test plan with steps | references/test-plan.md |
| 5. Execute | Run test plan via agent-browser | Pass/fail per step, screenshots | references/execution.md |
| 6. Report | Aggregate results, artifacts, exit code | Structured report + artifacts | references/report.md |
Phase 1: Fingerprint Check
Check if the current changes have already been tested:
Read(".expect/fingerprints.json")
Load: Read("${CLAUDE_SKILL_DIR}/references/fingerprint.md")
Phase 2: Diff Scan
Analyze git changes based on --target:
if TARGET == "unstaged":
diff = Bash("git diff")
files = Bash("git diff --name-only")
elif TARGET == "branch":
diff = Bash("git diff main...HEAD")
files = Bash("git diff main...HEAD --name-only")
elif TARGET == "commit":
diff = Bash("git diff HEAD~1")
files = Bash("git diff HEAD~1 --name-only")
Classify each changed file into 3 levels:
- Direct — the file itself changed
- Imported — a file that imports the changed file
- Routed — the page/route that renders the changed component
Load: Read("${CLAUDE_SKILL_DIR}/references/diff-scanner.md")
Phase 3: Route Map
Map changed files to testable URLs using .expect/config.yaml:
base_url: http://localhost:3000
route_map:
"src/components/Header.tsx": ["/", "/about", "/pricing"]
"src/app/auth/**": ["/login", "/signup", "/forgot-password"]
"src/app/dashboard/**": ["/dashboard"]
If no route map exists, infer from Next.js App Router / Pages Router conventions.
Load: Read("${CLAUDE_SKILL_DIR}/references/route-map.md")
Phase 4: Test Plan Generation
Build an AI test plan scoped to the diff, using the scope strategy for the current target:
scope_strategy = get_scope_strategy(TARGET)
prompt = f"""
{scope_strategy}
Changes: {diff_summary}
Affected pages: {affected_urls}
Instruction: {INSTRUCTION or "Test that the changes work correctly"}
Generate a test plan with:
1. Page-level checks (loads, no console errors, correct content)
2. Interaction tests (forms, buttons, navigation affected by the diff)
3. Visual regression (compare ARIA snapshots if saved)
4. Accessibility (axe-core scan on affected pages)
"""
If --flow specified, load saved flow from .expect/flows/{slug}.yaml instead of generating.
If NOT --y, present plan to user via AskUserQuestion for review before executing.
Load: Read("${CLAUDE_SKILL_DIR}/references/test-plan.md")
Phase 5: Execution
agent-browser Quick Primer
Floor is >= 0.25.0; current tested release is 0.29.x (see upstream-version-tested). Commands below hold across this range.
| Area | Command | Notes |
|---|
| Snapshot | agent-browser snapshot -i | ARIA tree w/ @eN refs. -C/--cursor was removed in 0.22 |
| Semantic locator | agent-browser find --role button "Continue" | Stable alternative to @eN refs |
| Interaction | fill @e1 "...", click @e2, press Enter, drag @e1 @e2, upload @e1 file.pdf | All take ARIA refs |
| Waits | wait --load networkidle, wait --text "Success", wait --fn "window.ready" | Event-driven, never sleep-based |
| Network | network route "*analytics*" --abort, network route "https://api/*" --body '{...}' | Intercept + stub |
| State | state save/load auth.json, --session-name <name> | Persist auth across runs |
| Vault | vault store github_pat, vault load github_pat | Encrypted credential store |
| Diff | diff snapshot, diff screenshot --baseline /tmp/x.png | ARIA + pixel diffing |
| Capture | screenshot --annotate, pdf, record start/stop | Evidence artifacts |
| Dashboard | agent-browser dashboard start (0.25+) | Browser-side runtime inspector on :4848 |
Run the test plan
expect_task = Agent(
subagent_type="ork:expect-agent",
prompt=f"""Execute this test plan:
{test_plan}
For each step:
1. Navigate to the URL
2. Execute the test action
3. Take a screenshot on failure
4. Report PASS/FAIL with evidence
""",
run_in_background=True,
model="sonnet",
max_turns=50
)
Monitor(pid=expect_task.agent_id)
if test_plan_duration_estimate > 180:
PushNotification(
message=f"ork:expect complete — {passed}/{total} steps passed on {len(affected_urls)} pages",
status="proactive"
)
Load: Read("${CLAUDE_SKILL_DIR}/references/execution.md")
Phase 6: Report
/ork:expect Report
═══════════════════════════════════════
Target: unstaged (3 files changed)
Pages tested: 4
Duration: 45s
Results:
✓ /login — form renders, submit works
✓ /signup — validation triggers on empty fields
✗ /dashboard — chart component crashes (TypeError)
✓ /settings — preferences save correctly
3 passed, 1 failed
Artifacts:
.expect/reports/2026-03-26T16-30-00.json
.expect/screenshots/dashboard-error.png
Load: Read("${CLAUDE_SKILL_DIR}/references/report.md")
Saved Flows
Reusable test sequences stored in .expect/flows/:
name: Login Flow
steps:
- navigate: /login
- fill: { selector: "#email", value: "test@example.com" }
- fill: { selector: "#password", value: "password123" }
- click: button[type="submit"]
- assert: { url: "/dashboard" }
- assert: { text: "Welcome back" }
Run with: /ork:expect --flow login
Auto-trigger after UI edits (M125 #2)
When the dev stack is live (/ork:dev), saving any .tsx, .jsx, .css, or .scss file (and Next.js route files like app/**/page.tsx, pages/**/*.tsx) emits a nudge to run /ork:expect <route>. The hook (posttool/ui-change-detector) is default-on and:
- skips silently if
/ork:dev hasn't booted (no agent-browser session to attach to);
- enforces a 30-second cooldown per route to prevent spam on rapid saves;
- honors
.claude/state/expect-skip.<sessionId> as a per-session opt-out (write any content);
- honors
ORK_EXPECT_AUTO=0 for an env-level kill switch.
Route resolution: app/dashboard/page.tsx → /dashboard, pages/settings.tsx → /settings, component / global-style edits → / (home as proxy). Route groups like app/(marketing)/pricing/page.tsx strip to /pricing.
ARIA snapshot recording (M125 #6)
After a passing run, the posttool/expect/snapshot-recorder hook persists the captured ARIA tree to .claude/state/expect-snapshots/<route-slug>/<parent-commit>.json. Subsequent /ork:expect <route> --diff runs compare against the most recent prior snapshot for that route — surfaces structural regressions (added/removed buttons, label changes, hierarchy shifts) without needing a baseline screenshot.
For the snapshot recorder to fire, the expect run output must contain RUN_COMPLETED|passed, ROUTE|<route>, and ARIA|<json-summary> tags. The agent-browser-driven flow already emits these.
When NOT to Use
- Unit tests — use
/ork:cover instead
- API-only changes — no browser UI to test
- Generated files — skip build artifacts, lock files
- Docs-only changes — unless you want to verify docs site rendering
Related Skills
agent-browser — Browser automation engine (required dependency)
ork:cover — Test suite generation (unit/integration/e2e)
ork:verify — Grade existing test quality
testing-e2e — Playwright patterns and best practices
References
Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):
| File | Content |
|---|
fingerprint.md | SHA-256 gating logic |
diff-scanner.md | Git diff parsing + 3-level classification |
route-map.md | File-to-URL mapping conventions |
test-plan.md | AI test plan generation prompt templates |
execution.md | agent-browser orchestration patterns |
report.md | Report format + artifact storage |
config-schema.md | .expect/config.yaml full schema |
aria-diffing.md | ARIA snapshot comparison for semantic diffing |
scope-strategy.md | Test depth strategy per target mode |
saved-flows.md | Markdown+YAML flow format, adaptive replay |
rrweb-recording.md | rrweb DOM replay integration |
human-review.md | AskUserQuestion plan review gate |
ci-integration.md | GitHub Actions workflow + pre-push hooks |
research.md | millionco/expect architecture analysis |
Version: 1.0.0 (March 2026) — Initial scaffold, M99 milestone