
expect

Stars: 193

Forks: 15

Updated: June 22, 2026 at 18:56

Diff-aware AI browser testing — analyzes git changes, generates targeted test plans, and executes them via agent-browser (Rust daemon + CDP, ARIA-tree-first). Reads git diff to determine what changed, maps changes to affected pages via route map, generates a test plan scoped to the diff, and runs it with pass/fail reporting. Use when testing UI changes, verifying PRs before merge, running regression checks on changed components, or validating that recent code changes don't break the user-facing experience.


Source: yonatangross/orchestkit (author: yonatangross)



SKILL.md (one of 31 files in this skill)


name	expect
license	MIT
compatibility	Claude Code 2.1.183+. Requires agent-browser >= 0.25.0 (Rust-native, no Playwright).
description	Diff-aware AI browser testing — analyzes git changes, generates targeted test plans, and executes them via agent-browser (Rust daemon + CDP, ARIA-tree-first). Reads git diff to determine what changed, maps changes to affected pages via route map, generates a test plan scoped to the diff, and runs it with pass/fail reporting. Use when testing UI changes, verifying PRs before merge, running regression checks on changed components, or validating that recent code changes don't break the user-facing experience.
argument-hint	[-m <instruction>] [--target unstaged|branch|commit] [--flow <slug>] [-y]
context	fork
version	1.1.0
author	OrchestKit
tags	["testing","browser","e2e","diff-aware","regression","visual","accessibility","ai-testing"]
user-invocable	true
allowed-tools	["AskUserQuestion","Bash","Read","Write","Edit","Grep","Glob","Agent","TaskCreate","TaskUpdate","TaskList","ToolSearch","WebFetch","Monitor","PushNotification"]
skills	["testing-e2e","chain-patterns","memory"]
complexity	high
persuasion-type	guidance
effort	high
model	sonnet
metadata	{"category":"testing","milestone":"M99","upstream-package":"agent-browser","upstream-version-tested":"0.29.1"}
triggers	{"keywords":["expect","test my changes","browser test","diff test","test what I changed","test the UI","visual regression","check my changes"],"examples":["test my changes before I push","expect — run browser tests on what I changed","test the login flow after my auth refactor","run visual regression on the dashboard"],"anti-triggers":["cover","unit test","generate tests","verify","implement","npm test"]}
paths	[".expect/","/*.test.{ts,tsx}","agent-browser.json"]
invocation_hooks	["command -v agent-browser >/dev/null 2>&1 || echo 'Warning: agent-browser not installed — run npm install -g agent-browser'"]

Expect — Diff-Aware AI Browser Testing

Analyze git changes, generate targeted test plans, and execute them via AI-driven browser automation.

Note: If disableSkillShellExecution is enabled (CC 2.1.91+), the invocation_hooks check for agent-browser won't run. Verify the install manually with npx agent-browser --version.

/ork:expect                              # Auto-detect changes, test affected pages
/ork:expect -m "test the checkout flow"  # Specific instruction
/ork:expect --flow login                 # Replay a saved test flow
/ork:expect --target branch              # Test all changes on current branch vs main
/ork:expect -y                           # Skip plan review, run immediately

Core principle: Only test what changed. Git diff drives scope — no wasted cycles on unaffected pages.

Argument Resolution

ARGS = "[-m <instruction>] [--target unstaged|branch|commit] [--flow <slug>] [-y]"

# Parse the full argument string from CC. shlex.split honors quotes, so
# -m "test the user's profile" stays one token, and a "-y" inside the
# instruction text is not mistaken for the skip-review flag.
import shlex

raw = ""  # Full argument string from CC

INSTRUCTION = None
TARGET = "unstaged"  # Default: test unstaged changes
FLOW = None
SKIP_REVIEW = False
FORCE = False        # --force re-runs even when the Phase 1 fingerprint matches

tokens = shlex.split(raw)
i = 0
while i < len(tokens):
    tok = tokens[i]
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    if tok == "-m" and nxt is not None:
        INSTRUCTION = nxt          # -m "instruction"
        i += 1
    elif tok == "--target" and nxt in ("unstaged", "branch", "commit"):
        TARGET = nxt               # --target unstaged|branch|commit
        i += 1
    elif tok == "--flow" and nxt is not None:
        FLOW = nxt                 # --flow <slug>
        i += 1
    elif tok == "-y":
        SKIP_REVIEW = True         # -y: skip plan review
    elif tok == "--force":
        FORCE = True
    i += 1

STEP 0: MCP Probe + Prerequisite Check

# memory is alwaysLoad in .mcp.json (CC 2.1.121+, #1541) — probe below kept as fallback for older CC:
ToolSearch(query="select:mcp__memory__search_nodes")

# Verify agent-browser is available (Rust-native, no Playwright)
Bash("command -v agent-browser || npx agent-browser --version")
# If missing: "Install agent-browser: npm i -g agent-browser"

# Load agent-browser's own self-serving skill/workflow docs (required since 0.25.x)
Bash("agent-browser skills get agent-browser")

CRITICAL: Task Management

# 1. Create main task IMMEDIATELY
TaskCreate(
  subject="Expect: test changed code",
  description="Diff-aware browser testing pipeline",
  activeForm="Running diff-aware browser tests"
)

# 2. Create subtasks for each pipeline phase
TaskCreate(subject="Check fingerprint (skip if unchanged)", activeForm="Checking fingerprint")  # id=2
TaskCreate(subject="Scan git diff and classify changes", activeForm="Scanning diff")            # id=3
TaskCreate(subject="Map changes to routes/URLs", activeForm="Mapping routes")                   # id=4
TaskCreate(subject="Generate AI test plan", activeForm="Generating test plan")                   # id=5
TaskCreate(subject="Execute tests via agent-browser", activeForm="Executing browser tests")     # id=6
TaskCreate(subject="Compile test report", activeForm="Compiling report")                        # id=7

# 3. Set dependencies for sequential phases
TaskUpdate(taskId="3", addBlockedBy=["2"])  # Diff scan needs fingerprint check
TaskUpdate(taskId="4", addBlockedBy=["3"])  # Route map needs diff results
TaskUpdate(taskId="5", addBlockedBy=["4"])  # Test plan needs route map
TaskUpdate(taskId="6", addBlockedBy=["5"])  # Execution needs test plan
TaskUpdate(taskId="7", addBlockedBy=["6"])  # Report needs execution results

# 4. Before starting each task, verify it's unblocked
task = TaskGet(taskId="2")  # Verify blockedBy is empty

# 5. Update status as you progress
TaskUpdate(taskId="2", status="in_progress")  # When starting
TaskUpdate(taskId="2", status="completed")    # When done — repeat for each subtask

Pipeline Overview

Fingerprint Check → Git Diff → Route Map → Test Plan → Execute → Report

Phase	What	Output	Reference
1. Fingerprint	SHA-256 hash of changed files	Skip if unchanged since last run	`references/fingerprint.md`
2. Diff Scan	Parse git diff, classify changes	ChangesFor data (files, components, routes)	`references/diff-scanner.md`
3. Route Map	Map changed files to affected pages/URLs	Scoped page list	`references/route-map.md`
4. Test Plan	Generate AI test plan from diff + route map	Markdown test plan with steps	`references/test-plan.md`
5. Execute	Run test plan via agent-browser	Pass/fail per step, screenshots	`references/execution.md`
6. Report	Aggregate results, artifacts, exit code	Structured report + artifacts	`references/report.md`

Phase 1: Fingerprint Check

Check if the current changes have already been tested:

Read(".expect/fingerprints.json")  # Previous run hashes
# Compare SHA-256 of changed files against stored fingerprints
# If match: "No changes since last test run. Use --force to re-run."
# If no match or --force: continue to Phase 2
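A minimal Python sketch of the gating step, assuming .expect/fingerprints.json stores one combined SHA-256 digest per target mode (the real layout is defined in references/fingerprint.md and may differ). Hashing each path together with its contents means renames and deletions also change the digest, and the digest is recorded only after a run completes so a failed run is not silently skipped next time.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths: list[str], root: Path = Path(".")) -> str:
    """Combined SHA-256 over the sorted changed paths and their current contents."""
    h = hashlib.sha256()
    for rel in sorted(paths):
        h.update(rel.encode("utf-8") + b"\0")
        f = root / rel
        # A deleted file has no bytes; hash a marker so deletions still change the digest.
        h.update(f.read_bytes() if f.is_file() else b"<deleted>")
        h.update(b"\0")
    return h.hexdigest()


def _load(store: Path) -> dict:
    return json.loads(store.read_text()) if store.is_file() else {}


def should_run(paths: list[str], target: str, root: Path = Path("."),
               force: bool = False) -> tuple[bool, str]:
    """Return (run?, digest). Assumed store layout: {"<target>": "<hex digest>"}."""
    current = fingerprint(paths, root)
    saved = _load(root / ".expect" / "fingerprints.json")
    return (force or saved.get(target) != current), current


def record(target: str, digest: str, root: Path = Path(".")) -> None:
    """Persist the digest after a completed run so the next identical run is skipped."""
    store = root / ".expect" / "fingerprints.json"
    saved = _load(store)
    saved[target] = digest
    store.parent.mkdir(parents=True, exist_ok=True)
    store.write_text(json.dumps(saved, indent=2))
```

Call should_run before Phase 2 and record only once Phase 6 has produced a report.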

Load: Read("${CLAUDE_SKILL_DIR}/references/fingerprint.md")

Phase 2: Diff Scan

Analyze git changes based on --target:

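# Note: untracked (new) files never appear in git diff; list them with
# git ls-files --others --exclude-standard if they should be in scope.
if TARGET == "unstaged":
    diff = Bash("git diff")                 # Working tree vs index; staged edits are not included
    files = Bash("git diff --name-only")
elif TARGET == "branch":
    diff = Bash("git diff main...HEAD")     # Commits since the merge-base with main (assumes base branch is main)
    files = Bash("git diff --name-only main...HEAD")
elif TARGET == "commit":
    diff = Bash("git diff HEAD~1 HEAD")     # Last commit only; plain "git diff HEAD~1" would also include uncommitted edits
    files = Bash("git diff --name-only HEAD~1 HEAD")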

Classify each changed file into 3 levels:

Direct — the file itself changed
Imported — a file that imports the changed file
Routed — the page/route that renders the changed component
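One way to compute the three levels, sketched in Python. It assumes the import graph has already been extracted into a dict (the actual parsing is covered in references/diff-scanner.md) and walks importers transitively, since a page usually reaches a changed leaf component through intermediate files. The is_app_router_page predicate is a hypothetical stand-in for the real route detection.

```python
from collections import deque
from typing import Callable


def classify(changed: set[str], imports: dict[str, set[str]],
             is_route: Callable[[str], bool]) -> dict[str, set[str]]:
    """Split affected files into Direct / Imported / Routed levels.

    `imports` maps each file to the set of files it imports.
    """
    # Invert the graph so we can ask "who imports this file?"
    importers: dict[str, set[str]] = {}
    for src, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(src)

    direct = set(changed)
    imported: set[str] = set()
    queue = deque(direct)
    while queue:  # breadth-first walk up the import chain
        for parent in importers.get(queue.popleft(), set()):
            if parent not in direct and parent not in imported:
                imported.add(parent)
                queue.append(parent)

    # Routed: pages among the affected files (what Phase 3 turns into URLs)
    routed = {f for f in direct | imported if is_route(f)}
    return {"direct": direct, "imported": imported, "routed": routed}


def is_app_router_page(path: str) -> bool:
    """Hypothetical predicate: a Next.js App Router page is page.<ext> under app/."""
    parts = path.split("/")
    return "app" in parts[:-1] and parts[-1].split(".")[0] == "page"
```

For a change to src/components/Chart.tsx that is imported by Widget.tsx, which is in turn imported by src/app/dashboard/page.tsx, the dashboard page lands in the Routed level.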

Load: Read("${CLAUDE_SKILL_DIR}/references/diff-scanner.md")

Phase 3: Route Map

Map changed files to testable URLs using .expect/config.yaml:

# .expect/config.yaml
base_url: http://localhost:3000
route_map:
  "src/components/Header.tsx": ["/", "/about", "/pricing"]
  "src/app/auth/**": ["/login", "/signup", "/forgot-password"]
  "src/app/dashboard/**": ["/dashboard"]

If no route map exists, infer from Next.js App Router / Pages Router conventions.
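A Python sketch of the route_map lookup, with the config given as an already-parsed dict (loading the YAML itself would need a parser such as PyYAML). It uses fnmatchcase, where * also matches across /, so "src/app/auth/**" covers nested files; whether the skill uses exactly these glob semantics is an assumption. Files that match no pattern come back separately so they can fall through to convention-based inference.

```python
from fnmatch import fnmatchcase


def routes_for(changed_files: list[str],
               route_map: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Return (sorted URLs to test, files that no route_map pattern matched)."""
    urls: set[str] = set()
    unmatched: list[str] = []
    for path in changed_files:
        hit = False
        for pattern, pages in route_map.items():
            if fnmatchcase(path, pattern):  # exact keys match themselves too
                urls.update(pages)
                hit = True
        if not hit:
            unmatched.append(path)
    return sorted(urls), unmatched


route_map = {  # mirrors the .expect/config.yaml example above
    "src/components/Header.tsx": ["/", "/about", "/pricing"],
    "src/app/auth/**": ["/login", "/signup", "/forgot-password"],
    "src/app/dashboard/**": ["/dashboard"],
}
```

One caveat of fnmatch-style patterns: square brackets are character classes, so a key containing a Next.js dynamic segment such as [id] would need escaping.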

Load: Read("${CLAUDE_SKILL_DIR}/references/route-map.md")

Phase 4: Test Plan Generation

Build an AI test plan scoped to the diff, using the scope strategy for the current target:

scope_strategy = get_scope_strategy(TARGET)  # See references/scope-strategy.md

prompt = f"""
{scope_strategy}

Changes: {diff_summary}
Affected pages: {affected_urls}
Instruction: {INSTRUCTION or "Test that the changes work correctly"}

Generate a test plan with:
1. Page-level checks (loads, no console errors, correct content)
2. Interaction tests (forms, buttons, navigation affected by the diff)
3. Visual regression (compare ARIA snapshots if saved)
4. Accessibility (axe-core scan on affected pages)
"""

If --flow is specified, load the saved flow from .expect/flows/{slug}.yaml instead of generating a plan.

Unless -y is passed, present the plan to the user via AskUserQuestion for review before executing.

Load: Read("${CLAUDE_SKILL_DIR}/references/test-plan.md")

Phase 5: Execution

agent-browser Quick Primer

The minimum supported version is 0.25.0, and the most recently tested release is 0.29.1 (see upstream-version-tested in the frontmatter). The commands below work across this range.

Area	Command	Notes
Snapshot	`agent-browser snapshot -i`	ARIA tree w/ `@eN` refs. `-C`/`--cursor` was removed in 0.22
Semantic locator	`agent-browser find --role button "Continue"`	Stable alternative to `@eN` refs
Interaction	`fill @e1 "..."`, `click @e2`, `press Enter`, `drag @e1 @e2`, `upload @e1 file.pdf`	All take ARIA refs
Waits	`wait --load networkidle`, `wait --text "Success"`, `wait --fn "window.ready"`	Event-driven, never sleep-based
Network	`network route "analytics" --abort`, `network route "https://api/*" --body '{...}'`	Intercept + stub
State	`state save/load auth.json`, `--session-name <name>`	Persist auth across runs
Vault	`vault store github_pat`, `vault load github_pat`	Encrypted credential store
Diff	`diff snapshot`, `diff screenshot --baseline /tmp/x.png`	ARIA + pixel diffing
Capture	`screenshot --annotate`, `pdf`, `record start/stop`	Evidence artifacts
Dashboard	`agent-browser dashboard start` (0.25+)	Browser-side runtime inspector on :4848

Run the test plan

expect_task = Agent(
  subagent_type="ork:expect-agent",
  prompt=f"""Execute this test plan:
  {test_plan}

  For each step:
  1. Navigate to the URL
  2. Execute the test action
  3. Take a screenshot on failure
  4. Report PASS/FAIL with evidence
  """,
  run_in_background=True,
  model="sonnet",
  max_turns=50
)

# Stream agent-browser progress line-by-line instead of polling (CC 2.1.98+)
# Each stdout line from agent-browser arrives as a notification — useful for
# catching a failing step early rather than waiting for the full plan.
# Full pattern: Read("${CLAUDE_PLUGIN_ROOT}/skills/chain-patterns/references/monitor-patterns.md")
Monitor(pid=expect_task.agent_id)

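# For long test plans (>3 min typical), notify on completion. Requires
# Remote Control + "Push when Claude decides" config (CC 2.1.110+).
# Skip silently if the user doesn't have Remote Control enabled.
# test_plan_duration_estimate (seconds), passed, and total come from the
# generated plan and the agent's PASS/FAIL results.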
if test_plan_duration_estimate > 180:
    PushNotification(
        message=f"ork:expect complete — {passed}/{total} steps passed on {len(affected_urls)} pages",
        status="proactive"
    )

Load: Read("${CLAUDE_SKILL_DIR}/references/execution.md")

Phase 6: Report

/ork:expect Report
═══════════════════════════════════════
Target: unstaged (3 files changed)
Pages tested: 4
Duration: 45s

Results:
  ✓ /login — form renders, submit works
  ✓ /signup — validation triggers on empty fields
  ✗ /dashboard — chart component crashes (TypeError)
  ✓ /settings — preferences save correctly

3 passed, 1 failed

Artifacts:
  .expect/reports/2026-03-26T16-30-00.json
  .expect/screenshots/dashboard-error.png
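A sketch of the aggregation step in Python. The JSON shape and the exit-code convention (0 when every step passes, 1 otherwise) are assumptions, not the format defined in references/report.md. The filename reuses the timestamp style of the sample artifact, with hyphens instead of colons because colons are not valid in Windows filenames.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass
class StepResult:
    url: str
    passed: bool
    detail: str


def write_report(results: list[StepResult], target: str,
                 out_dir: Path = Path(".expect/reports"),
                 now: Optional[datetime] = None) -> tuple[Path, int]:
    """Write a JSON report and return (report path, process exit code)."""
    now = now or datetime.now()
    passed = sum(r.passed for r in results)
    failed = len(results) - passed
    report = {
        "target": target,
        "passed": passed,
        "failed": failed,
        "results": [asdict(r) for r in results],
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{now:%Y-%m-%dT%H-%M-%S}.json"
    path.write_text(json.dumps(report, indent=2))
    return path, (0 if failed == 0 else 1)
```

A non-zero exit code is what lets a CI job or pre-push hook fail on a broken page.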

Load: Read("${CLAUDE_SKILL_DIR}/references/report.md")

Saved Flows

Reusable test sequences stored in .expect/flows/:

# .expect/flows/login.yaml
name: Login Flow
steps:
  - navigate: /login
  - fill: { selector: "#email", value: "test@example.com" }
  - fill: { selector: "#password", value: "password123" }
  - click: button[type="submit"]
  - assert: { url: "/dashboard" }
  - assert: { text: "Welcome back" }
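Each step in the flow file is a single-key mapping. A small Python validator, assuming the YAML has already been parsed into plain dicts and lists, and assuming only the four actions used in this example (references/saved-flows.md may allow more):

```python
KNOWN_ACTIONS = {"navigate", "fill", "click", "assert"}  # taken from the example above


def validate_flow(flow: dict) -> list[str]:
    """Return a list of problems; an empty list means the flow looks well-formed."""
    problems: list[str] = []
    if not isinstance(flow.get("name"), str):
        problems.append("missing 'name'")
    steps = flow.get("steps")
    if not isinstance(steps, list) or not steps:
        return problems + ["'steps' must be a non-empty list"]
    for i, step in enumerate(steps, start=1):
        if not isinstance(step, dict) or len(step) != 1:
            problems.append(f"step {i}: expected exactly one action")
            continue
        (action, arg), = step.items()
        if action not in KNOWN_ACTIONS:
            problems.append(f"step {i}: unknown action '{action}'")
        elif action == "fill" and not (isinstance(arg, dict) and {"selector", "value"} <= set(arg)):
            problems.append(f"step {i}: fill needs 'selector' and 'value'")
    return problems
```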

Run with: /ork:expect --flow login

Auto-trigger after UI edits (M125 #2)

When the dev stack is live (/ork:dev), saving any .tsx, .jsx, .css, or .scss file (and Next.js route files like app/**/page.tsx, pages/**/*.tsx) emits a nudge to run /ork:expect <route>. The hook (posttool/ui-change-detector) is default-on and:

skips silently if /ork:dev hasn't booted (no agent-browser session to attach to);
enforces a 30-second cooldown per route to prevent spam on rapid saves;
honors .claude/state/expect-skip.<sessionId> as a per-session opt-out (write any content);
honors ORK_EXPECT_AUTO=0 for an env-level kill switch.
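The gating rules above can be modeled in a few lines. This Python sketch is illustrative only and is not the hook's source; the in-memory last_fired dict, the dev_stack_live flag, and the injected clock are assumptions made so the logic is easy to follow and test.

```python
import os
import time
from pathlib import Path
from typing import Callable, Optional

COOLDOWN_SECONDS = 30


def should_nudge(route: str, session_id: str, dev_stack_live: bool,
                 last_fired: dict[str, float],
                 state_dir: Path = Path(".claude/state"),
                 env: Optional[dict] = None,
                 now: Callable[[], float] = time.monotonic) -> bool:
    """Decide whether a UI edit should emit the /ork:expect <route> nudge."""
    env = os.environ if env is None else env
    if env.get("ORK_EXPECT_AUTO") == "0":                   # env-level kill switch
        return False
    if not dev_stack_live:                                   # /ork:dev not booted: skip silently
        return False
    if (state_dir / f"expect-skip.{session_id}").exists():  # per-session opt-out
        return False
    t = now()
    if t - last_fired.get(route, float("-inf")) < COOLDOWN_SECONDS:  # per-route cooldown
        return False
    last_fired[route] = t
    return True
```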

Route resolution: app/dashboard/page.tsx → /dashboard, pages/settings.tsx → /settings, component / global-style edits → / (home as proxy). Route groups like app/(marketing)/pricing/page.tsx strip to /pricing.
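The resolution rules translate into a short Python function. It is a sketch, not the hook's implementation; the optional src/ prefix, pages/index mapping to /, and treating pages/_app and pages/_document as global (mapped to /) are standard Next.js conventions added here as assumptions. Dynamic segments such as [id] pass through unchanged and would still need a concrete value before navigation.

```python
from pathlib import PurePosixPath

PAGE_EXTS = {".tsx", ".jsx", ".ts", ".js"}


def route_for(path: str) -> str:
    """Map an edited file to the route to test; anything else falls back to '/'."""
    p = PurePosixPath(path)
    parts = list(p.parts)
    if parts and parts[0] == "src":          # Next.js also allows a src/ root
        parts = parts[1:]
    if p.suffix not in PAGE_EXTS or len(parts) < 2:
        return "/"                           # CSS, global styles, stray files: home as proxy
    if parts[0] == "app" and p.stem == "page":
        # Route groups like (marketing) do not appear in the URL
        segs = [s for s in parts[1:-1] if not (s.startswith("(") and s.endswith(")"))]
        return "/" + "/".join(segs)
    if parts[0] == "pages":
        segs = parts[1:-1] + [p.stem]
        if segs[-1] == "index":
            segs = segs[:-1]
        if any(s.startswith("_") for s in segs):  # _app, _document are global
            return "/"
        return "/" + "/".join(segs)
    return "/"                               # shared components
```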

ARIA snapshot recording (M125 #6)

After a passing run, the posttool/expect/snapshot-recorder hook persists the captured ARIA tree to .claude/state/expect-snapshots/<route-slug>/<parent-commit>.json. Subsequent /ork:expect <route> --diff runs compare against the most recent prior snapshot for that route — surfaces structural regressions (added/removed buttons, label changes, hierarchy shifts) without needing a baseline screenshot.
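A Python sketch of a structural comparison, assuming each snapshot can be flattened into (role, accessible name) pairs; the real JSON layout is described in references/aria-diffing.md and is not reproduced here. Comparing the pairs as multisets catches added or removed controls, and a label change shows up as one removal plus one addition. Pure hierarchy moves are not detected by this simplified form.

```python
from collections import Counter


def aria_diff(before: list[tuple[str, str]],
              after: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Compare two flattened ARIA snapshots as multisets of (role, name) pairs."""
    old, new = Counter(before), Counter(after)
    return {
        "added": sorted((new - old).elements()),
        "removed": sorted((old - new).elements()),
    }
```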

For the snapshot recorder to fire, the expect run output must contain RUN_COMPLETED|passed, ROUTE|<route>, and ARIA|<json-summary> tags. The agent-browser-driven flow already emits these.
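A sketch of that precondition check in Python, assuming each tag sits on its own line as TAG|value (the exact line format is an assumption). It returns the route and the parsed ARIA summary only when the run passed and all three tags are present.

```python
import json
from typing import Optional


def parse_run_tags(output: str) -> Optional[tuple[str, dict]]:
    """Return (route, aria_summary) if the output carries all three tags and passed."""
    tags: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("|")  # value keeps any later '|'
        if sep and key in ("RUN_COMPLETED", "ROUTE", "ARIA"):
            tags[key] = value
    if tags.get("RUN_COMPLETED") != "passed" or "ROUTE" not in tags or "ARIA" not in tags:
        return None
    try:
        return tags["ROUTE"], json.loads(tags["ARIA"])
    except json.JSONDecodeError:
        return None  # malformed summary: skip recording rather than store garbage
```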

When NOT to Use

Unit tests — use /ork:cover instead
API-only changes — no browser UI to test
Generated files — skip build artifacts, lock files
Docs-only changes — unless you want to verify docs site rendering

Related Skills

agent-browser — Browser automation engine (required dependency)
ork:cover — Test suite generation (unit/integration/e2e)
ork:verify — Grade existing test quality
testing-e2e — Playwright patterns and best practices

References

Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):

File	Content
`fingerprint.md`	SHA-256 gating logic
`diff-scanner.md`	Git diff parsing + 3-level classification
`route-map.md`	File-to-URL mapping conventions
`test-plan.md`	AI test plan generation prompt templates
`execution.md`	agent-browser orchestration patterns
`report.md`	Report format + artifact storage
`config-schema.md`	.expect/config.yaml full schema
`aria-diffing.md`	ARIA snapshot comparison for semantic diffing
`scope-strategy.md`	Test depth strategy per target mode
`saved-flows.md`	Markdown+YAML flow format, adaptive replay
`rrweb-recording.md`	rrweb DOM replay integration
`human-review.md`	AskUserQuestion plan review gate
`ci-integration.md`	GitHub Actions workflow + pre-push hooks
`research.md`	millionco/expect architecture analysis

Version: 1.0.0 (March 2026) — Initial scaffold, M99 milestone