| name | e2e-testing |
| description | Comprehensive E2E testing skill using Playwright MCP for systematic web application testing. This skill should be used when users need to test web-based systems end-to-end, set up test regimes, run exploratory tests, or analyze test history. Triggers on requests like "test my webapp", "set up E2E tests", "run the tests", "what's been flaky", or when validating web application functionality. The skill observes and reports only - it never fixes issues. Supports three modes - setup (create test regime), run (execute tests), and report (analyze results). |
A comprehensive E2E testing skill using Playwright MCP for systematic testing of any web-based system. The skill observes and reports only; it never fixes issues.
Before using this skill, verify Playwright MCP is available:
Check for `playwright` in the MCP server configuration:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```
If Playwright MCP is unavailable, inform the user and provide setup instructions before proceeding.
This skill operates in three modes. Determine mode from user request:
| User Request | Mode |
|---|---|
| "Set up tests for...", "Create test regime" | Setup |
| "Run the tests", "Test the...", "Execute tests" | Run |
| "Show test results", "What failed?", "What's flaky?" | Report |
If unclear, ask: "Would you like to set up a test regime, run existing tests, or view reports?"
Purpose: Create or update test regime through interactive discovery.
Determine entry point from user context:
| Context | Entry |
|---|---|
| User provides URL | URL Exploration |
| User describes system purpose | Description-Based |
| User points to documentation | Documentation Extraction |
| Combination of above | Combined Flow (recommended) |
Ask the user for any missing information before exploring.
Use Playwright MCP to explore:
1. Navigate to the base URL
2. Capture an accessibility snapshot
3. Identify:
   - Navigation elements (menus, links)
   - Interactive elements (buttons, forms)
   - Key pages and sections
For each discovered element, note what it does and how to locate it (selector or description).
While exploring, actively look for undocumented paths and alternative ways to accomplish key tasks.
Document discoveries as: "Found alternative: [description]"
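For example, working notes from exploration could be captured in a lightweight format like the sketch below; this format is purely illustrative, and only the regime file described later is required:

```yaml
# Illustrative exploration notes - not a required artifact
pages:
  - url: /checkout
    elements:
      - description: "Complete Purchase button"
        selector: "button:Complete Purchase"
      - description: "Guest checkout link"
        selector: "footer > a.guest-checkout"
alternatives:
  - "Found alternative: guest checkout available via footer link"
```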
For each key workflow, create a scenario with:
```yaml
scenario: [Descriptive name]
description: [What this tests]
preconditions:
  - [Required state before test]
blocking: [true/false - does failure prevent other tests?]
steps:
  - action: [navigate/click/type/verify/wait]
    target: [selector or description]
    value: [input value if applicable]
    flexibility:
      type: [exact/contains/ai_judgment]
      criteria: [specific rules or judgment prompt]
success_criteria:
  - [What must be true for pass]
alternatives:
  - [Alternative path if primary fails]
```
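A filled-in scenario might look like this sketch; the selectors, credentials, and flow details below are illustrative placeholders, not values from a real application:

```yaml
scenario: login
description: Verify a registered user can sign in from the home page
preconditions:
  - Test account exists and is active
blocking: true
steps:
  - action: navigate
    target: /login
  - action: type
    target: "#username"                    # hypothetical selector
    value: "test-user@example.com"
  - action: type
    target: "#password"
    value: "placeholder-password"          # placeholder credential
  - action: click
    target: "button:Login"
  - action: verify
    target: "dashboard greeting"
    flexibility:
      type: contains
      criteria: "Welcome"
success_criteria:
  - User lands on the dashboard while authenticated
alternatives:
  - Sign in via the header "Account" menu if the /login route is unavailable
```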
Write regime to tests/e2e/test_regime.yml:
```yaml
# Test Regime: [Application Name]
# Created: [YYYY-MM-DD]
# Last Updated: [YYYY-MM-DD]

metadata:
  application: [Name]
  base_url: [URL]
  description: [Purpose]

global_settings:
  screenshot_every_step: true
  capture_network: true
  capture_console: true
  discovery_cap: 5  # Max new paths to discover per run

blocking_dependencies:
  - scenario: login
    blocks: [profile, settings, checkout]  # These won't run if login fails

scenarios:
  - scenario: [name]
    # ... scenario definition
```
Show the user a summary of the created regime.
Purpose: Execute tests sequentially with full evidence capture.
Principle: No Invalid Skips
A test should only have three outcomes:
| Status | Meaning |
|---|---|
| PASSED | The feature works as specified |
| FAILED | The feature doesn't work or doesn't exist |
| SKIPPED | Only for legitimate environmental reasons (see below) |
A @skip decorator is acceptable only for documented work-in-progress features with a ticket reference.

Invalid skip situations (record these as FAILED):

| Situation | Correct Status | Notes Format |
|---|---|---|
| Feature doesn't exist in UI | FAILED | "Expected [feature] not found. Feature not implemented." |
| Test wasn't executed/completed | FAILED | "Test not executed. [What wasn't verified]." |
| Test would fail | FAILED | That's the point of testing |
| "Didn't get around to it" | FAILED | Incomplete test coverage is a failure |
| Feature works differently than spec | FAILED | "Implementation doesn't match specification: [details]" |
The purpose of a test is to fail when something doesn't work. Marking missing features or unexecuted tests as "skipped" produces artificially inflated pass rates and hides real issues. A test report that only shows green isn't useful if it achieved that by ignoring problems.
When a test cannot find the expected UI element or feature:
Status: FAILED
Notes: "FAILED: Expected 'Add to Cart' button not found. Feature not implemented or selector changed."
When a test is not fully executed:
Status: FAILED
Notes: "FAILED: Test not executed. Checkout flow verification was not completed - stopped at payment step."
When the environment is genuinely unavailable (valid skip):
Status: SKIPPED
Notes: "SKIPPED: Payment gateway sandbox unavailable. Ticket: PAY-123"
Before executing, run pre-flight checks:
1. Verify regime exists: check for tests/e2e/test_regime.yml
2. Load history: check for tests/e2e/test_history.json
3. Verify Playwright MCP: confirm browser automation is available
Every test run starts fresh from Step 1 of Scenario 1. Never skip steps or use cached state.
Execute scenarios in order. For each scenario:
1. Check preconditions
2. Execute each step:
a. Perform action via Playwright MCP
b. Capture screenshot
c. Capture DOM state
d. Capture network activity
e. Capture console logs
f. Evaluate success using flexibility criteria
3. Record result (pass/fail/blocked/skipped)
- PASS: Step completed successfully
- FAIL: Step failed OR element not found OR feature missing
- BLOCKED: Dependent on a failed blocking scenario
- SKIPPED: Only for valid environmental reasons (see Test Status Integrity)
4. If failed: Try alternatives if defined
5. If blocking failure: Stop dependent scenarios
When a step fails, capture full evidence for that step, record the failure with notes, and try any defined alternatives before continuing.
While executing, watch for undocumented paths and alternative ways to complete each workflow. Document any discoveries and explore new paths within the discovery_cap limit.

For each success check, apply the configured flexibility type:
| Type | Evaluation Method |
|---|---|
| `exact` | String/value must match exactly |
| `contains` | Target must contain specified text |
| `ai_judgment` | Use AI reasoning: "Does this accomplish [goal]?" |
For `ai_judgment`, record a confidence level (high, medium, or low) alongside the result.
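As a sketch, the three flexibility types could be configured on verification steps like this (the targets and criteria are illustrative):

```yaml
steps:
  - action: verify
    target: ".order-total"               # illustrative selector
    flexibility:
      type: exact
      criteria: "Text is exactly '$42.00'"
  - action: verify
    target: ".confirmation-banner"
    flexibility:
      type: contains
      criteria: "Thank you"
  - action: verify
    target: "order confirmation page"
    flexibility:
      type: ai_judgment
      criteria: "Does this page show a completed order with a visible order number?"
```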
For each step, capture and store:
```
evidence/
  scenario-name/
    step-01/
      screenshot.png
      dom-snapshot.html
      network-log.json
      console-log.txt
      accessibility-snapshot.yaml
```
After the run completes, compare results to previous runs to identify regressions, persistent failures, and flaky scenarios, then update the history file (tests/e2e/test_history.json):
```json
{
  "runs": [
    {
      "timestamp": "ISO-8601",
      "scenarios": {
        "scenario-name": {
          "result": "pass|fail|blocked|skipped",
          "result_notes": "Details about the result",
          "duration_ms": 1234,
          "steps_completed": 5,
          "confidence": "high|medium|low",
          "discoveries": []
        }
      }
    }
  ],
  "flaky_scenarios": ["scenario-1", "scenario-2"],
  "suggested_variations": [
    {
      "scenario": "login",
      "variation": "Test with special characters in password",
      "reason": "Failed 3/10 runs with complex passwords"
    }
  ]
}
```
Result status rules (see Test Status Integrity):
- `pass`: Feature works as specified
- `fail`: Feature doesn't work, doesn't exist, or the test was incomplete
- `blocked`: Depends on a failed blocking scenario
- `skipped`: ONLY for valid environmental reasons (with ticket reference)

Record suggested test ideas under `suggested_variations` in the history file.

Purpose: Generate actionable reports from test results.
Generate both reports after every run:
Output to tests/e2e/reports/YYYY-MM-DD-HHmmss-report.md:
```markdown
# E2E Test Report: [Application Name]

**Run Date**: YYYY-MM-DD HH:mm:ss
**Duration**: X minutes
**Result**: X passed, Y failed, Z blocked, W skipped

## Summary

| Scenario | Result | Duration | Confidence |
|----------|--------|----------|------------|
| Login | PASS | 2.3s | High |
| Checkout | FAIL | 5.1s | High |

## Failures

### Checkout Flow

**Step Failed**: Step 3 - Click "Complete Purchase"
**Error**: Button not found within timeout
**Evidence**:
- Screenshot: `evidence/checkout/step-03/screenshot.png`
- Expected: Button with text "Complete Purchase"
- Actual: Page showed error message "Session expired"

**Reproduction Steps**:
1. Navigate to https://app.example.com
2. Login with test credentials
3. Add item to cart
4. Click checkout
5. [FAILS HERE] Click "Complete Purchase"

**Suggested Investigation**:
- Session timeout may be too aggressive
- Check if login state persists through checkout flow

## Discoveries

Found 2 undocumented paths:
1. **Alternative checkout**: Guest checkout available via footer link
2. **Quick reorder**: "Buy again" button on order history

## Flaky Areas

Based on history (last 10 runs):
- `search-results`: 7/10 pass rate - timing issue suspected
- `image-upload`: 8/10 pass rate - file size variations

## Suggested New Tests

Based on failures and history:
1. Test session persistence during long checkout
2. Test guest checkout flow (discovered)
3. Add timeout resilience to search tests
```
Output to tests/e2e/reports/YYYY-MM-DD-HHmmss-report.json:
```json
{
  "metadata": {
    "application": "App Name",
    "base_url": "https://...",
    "run_timestamp": "ISO-8601",
    "duration_ms": 123456,
    "regime_version": "hash-of-regime-file"
  },
  "summary": {
    "total": 10,
    "passed": 7,
    "failed": 2,
    "blocked": 1,
    "skipped": 0
  },
  "scenarios": [
    {
      "name": "checkout",
      "result": "fail",
      "duration_ms": 5100,
      "confidence": "high",
      "failed_step": {
        "index": 3,
        "action": "click",
        "target": "button:Complete Purchase",
        "error": "Element not found",
        "evidence_path": "evidence/checkout/step-03/"
      },
      "reproduction": {
        "playwright_commands": [
          "await page.goto('https://app.example.com')",
          "await page.fill('#username', 'test')",
          "await page.click('button:Login')",
          "await page.click('.add-to-cart')",
          "await page.click('button:Checkout')",
          "// FAILED: await page.click('button:Complete Purchase')"
        ]
      },
      "alternatives_tried": [
        {
          "path": "Use keyboard Enter instead of click",
          "result": "fail"
        }
      ]
    }
  ],
  "discoveries": [
    {
      "type": "alternative_path",
      "description": "Guest checkout via footer",
      "location": "footer > a.guest-checkout",
      "tested": true,
      "result": "pass"
    }
  ],
  "history_analysis": {
    "regressions": ["checkout"],
    "persistent_failures": [],
    "flaky": ["search-results", "image-upload"]
  },
  "suggested_actions": [
    {
      "type": "investigate",
      "scenario": "checkout",
      "reason": "New regression - passed in previous 5 runs"
    },
    {
      "type": "add_test",
      "scenario": "guest-checkout",
      "reason": "Discovered undocumented path"
    }
  ]
}
```
After generating reports:
1. Display a summary to the user
2. Highlight actionable items
3. Offer next steps
Before completing any mode, verify that its outputs are complete and written to the expected locations.
Supporting reference files:
- `test-regime-schema.md` - Complete YAML schema for test regime files
- `flexibility-criteria-guide.md` - How to define and evaluate flexible success criteria
- `history-schema.md` - JSON schema for test history tracking

Report templates are embedded in this skill. The machine-readable format is designed for consumption by a future bug-fix skill.