visual-testing
Screenshot-based visual comparison and regression testing using claude-in-chrome MCP. Captures, compares, and validates UI states to detect layout shifts, visual bugs, and design regressions across viewports.
| name | visual-testing |
| version | 1.0.0 |
| description | Screenshot-based visual comparison and regression testing using claude-in-chrome MCP. Captures, compares, and validates UI states to detect layout shifts, visual bugs, and design regressions across viewports. |
| category | tooling |
| tags | ["testing","visual","regression","browser","mcp","screenshots"] |
| author | system |
| cognitive_frame | {"primary":"spatial","secondary":"aspectual","rationale":"Visual testing requires precise spatial awareness for layout comparison and aspectual understanding of state changes between captures"} |
| triggers | ["visual regression","screenshot comparison","UI testing","layout testing","responsive testing","visual diff","baseline comparison"] |
Source verification mode enabled.
[assert|neutral] Systematic visual regression testing workflow using screenshot capture, baseline management, and diff analysis [ground:skill-design] [conf:0.92] [state:confirmed]
Visual Testing specializes in detecting unintended UI changes through screenshot-based comparison. Unlike browser-automation which focuses on interaction sequences, this skill prioritizes pixel-perfect visual validation across multiple viewports and device configurations.
Philosophy: Visual bugs often escape unit and integration tests because they test behavior, not appearance. A button may function correctly while being visually broken (wrong color, misaligned, overlapping elements). Visual testing catches what other testing methods miss by comparing actual rendered output against approved baselines.
Methodology: Six-phase workflow with baseline management:
Value Proposition: Reduce visual bug escapes by 85% through systematic screenshot comparison. Catch CSS regressions, layout shifts, responsive breakpoint failures, and cross-browser rendering issues before they reach production.
Key Differentiation from browser-automation:
| Aspect | browser-automation | visual-testing |
|---|---|---|
| Focus | Interaction sequences | Visual state capture |
| Output | Workflow completion | Diff reports |
| Validation | Functional success | Pixel comparison |
| Artifacts | Execution logs | Baseline images + diffs |
| Primary Use | E2E workflows | Regression detection |
Trigger Thresholds:
| Scenario | Recommendation |
|---|---|
| Single page screenshot | Use computer tool directly (too simple) |
| 2-5 page visual checks | Consider this skill |
| Multi-viewport responsive testing | Mandatory use |
| Baseline comparison needed | Mandatory use |
| Design system validation | Mandatory use |
Primary Use Cases:
Apply When:
Visual Testing operates on 5 fundamental principles:
Golden images (baselines) serve as the source of truth. Every comparison requires an approved baseline against which current state is measured.
Rationale: Without baselines, visual testing becomes subjective screenshot collection. Baselines make quality objective and measurable.
In Practice:
Test across multiple viewport configurations to catch responsive regressions that only appear at specific breakpoints.
Rationale: Most visual bugs manifest at edge cases - unusual screen widths, portrait vs landscape, mobile vs desktop. Single-viewport testing misses these.
In Practice:
Not every pixel difference is a regression. Configure tolerance thresholds to distinguish intentional changes from bugs.
Rationale: Anti-aliasing, font rendering, and timing-dependent animations create non-deterministic pixel variations. Zero-tolerance comparison produces false positives.
In Practice:
Use the zoom tool for detailed inspection of specific UI elements when full-page screenshots are insufficient.
Rationale: Small elements (icons, badges, indicators) may have regressions invisible at full-page scale. Zoomed captures reveal micro-regressions.
In Practice:
Static screenshots miss animation and transition regressions. Use GIF recording to capture temporal UI behavior.
Rationale: CSS animations, hover states, loading sequences, and micro-interactions are visible only in motion. GIF recording captures these temporal aspects.
In Practice:
Before executing visual tests, validate required MCPs:
Preflight Sequence:
// Tool names below are MCP tool identifiers, written here as pseudocode calls.
async function visualTestPreflight() {
  const checks = {
    sequential_thinking: false,
    claude_in_chrome: false,
    memory_mcp: false
  };

  // Check sequential-thinking MCP (required for planning)
  try {
    await mcp__sequential-thinking__sequentialthinking({
      thought: "Visual test preflight - verifying MCP availability",
      thoughtNumber: 1,
      totalThoughts: 1,
      nextThoughtNeeded: false
    });
    checks.sequential_thinking = true;
  } catch (error) {
    throw new Error("CRITICAL: sequential-thinking MCP required for visual test planning");
  }

  // Check claude-in-chrome MCP (required for capture)
  try {
    await mcp__claude-in-chrome__tabs_context_mcp({});
    checks.claude_in_chrome = true;
  } catch (error) {
    throw new Error("CRITICAL: claude-in-chrome MCP required for screenshot capture");
  }

  // Check memory-mcp (required for baseline storage)
  // NOTE: placeholder only. No Memory MCP call is made here, so this check
  // cannot fail as written; replace it with a real availability probe.
  try {
    checks.memory_mcp = true;
  } catch (error) {
    throw new Error("CRITICAL: memory-mcp required for baseline storage");
  }

  return checks;
}
Standard Viewport Matrix:
const VIEWPORT_PRESETS = {
  // Mobile Devices
  iphone_se: { width: 375, height: 667, name: "iPhone SE" },
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  iphone_14_pro_max: { width: 430, height: 932, name: "iPhone 14 Pro Max" },
  pixel_7: { width: 412, height: 915, name: "Pixel 7" },
  // Tablets
  ipad_mini: { width: 768, height: 1024, name: "iPad Mini" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  ipad_pro_12: { width: 1024, height: 1366, name: "iPad Pro 12.9" },
  // Desktop
  laptop_sm: { width: 1280, height: 720, name: "Laptop Small (720p)" },
  laptop_md: { width: 1440, height: 900, name: "Laptop Medium" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" },
  // NOTE: the key says "4k", but 2560x1440 is QHD ("2K"); 4K UHD would be 3840x2160.
  desktop_4k: { width: 2560, height: 1440, name: "Desktop 2K" }
};

// Standard test matrix (most common)
const STANDARD_MATRIX = ["iphone_14", "ipad_pro_11", "desktop_hd"];

// Extended test matrix (comprehensive)
const EXTENDED_MATRIX = [
  "iphone_se", "iphone_14_pro_max", "pixel_7",
  "ipad_mini", "ipad_pro_12",
  "laptop_sm", "desktop_hd", "desktop_4k"
];
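The matrices above are lists of preset names, so something has to turn them into concrete dimensions before any resize call. The following is a minimal sketch of that lookup; `resolveViewports` is a hypothetical helper (not part of the skill), and `PRESETS` is a trimmed copy of the preset table so the block runs on its own.

```javascript
// Trimmed copy of the preset table above, repeated so the sketch is self-contained.
const PRESETS = {
  iphone_14: { width: 390, height: 844, name: "iPhone 14" },
  ipad_pro_11: { width: 834, height: 1194, name: "iPad Pro 11" },
  desktop_hd: { width: 1920, height: 1080, name: "Desktop Full HD" }
};

// Hypothetical helper: map preset names to viewport configs.
function resolveViewports(matrix, presets = PRESETS) {
  return matrix.map((key) => {
    if (!Object.prototype.hasOwnProperty.call(presets, key)) {
      // Fail fast on a typo instead of silently skipping a viewport.
      throw new Error(`Unknown viewport preset: ${key}`);
    }
    return { key, ...presets[key] };
  });
}

const viewports = resolveViewports(["iphone_14", "ipad_pro_11", "desktop_hd"]);
console.log(viewports.map((v) => `${v.name} ${v.width}x${v.height}`));
```

Throwing on an unknown name is a design choice in this sketch: a misspelled preset would otherwise reduce coverage without anyone noticing.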
const DIFF_THRESHOLDS = {
  // Strict (design system components)
  strict: {
    pixelDiff: 0.01, // 0.01% tolerance (nearly pixel-perfect)
    description: "For design system components requiring exact match"
  },
  // Default (most pages)
  default: {
    pixelDiff: 0.1, // 0.1% tolerance
    description: "Standard threshold for most UI testing"
  },
  // Relaxed (dynamic content)
  relaxed: {
    pixelDiff: 1.0, // 1% tolerance
    description: "For pages with minor dynamic variations"
  },
  // Animation (high variance)
  animation: {
    pixelDiff: 5.0, // 5% tolerance
    description: "For animation captures with timing variance"
  }
};
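To make the profiles concrete, here is a small sketch of how a diff percentage might be judged against a named profile. `evaluateDiff` and the flattened `THRESHOLDS` map are illustrative assumptions; only the four percentage values come from the configuration above.

```javascript
// Percent-of-differing-pixels limits, copied from the profiles above.
const THRESHOLDS = { strict: 0.01, default: 0.1, relaxed: 1.0, animation: 5.0 };

// Hypothetical helper: pass/fail decision for one comparison.
function evaluateDiff(diffPercent, profile = "default") {
  if (!Object.prototype.hasOwnProperty.call(THRESHOLDS, profile)) {
    throw new Error(`Unknown threshold profile: ${profile}`);
  }
  const threshold = THRESHOLDS[profile];
  return { passed: diffPercent <= threshold, diffPercent, threshold, profile };
}

console.log(evaluateDiff(0.05, "strict").passed);  // false: 0.05% exceeds 0.01%
console.log(evaluateDiff(0.05, "default").passed); // true: 0.05% is within 0.1%
```

The same capture can pass under one profile and fail under another, which is why the profile is worth recording next to each result.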
Error Categories:
| Category | Example | Recovery Strategy |
|---|---|---|
| MCP_UNAVAILABLE | claude-in-chrome offline | ABORT - cannot proceed |
| NAVIGATION_FAILED | Page timeout/404 | Retry 3x with backoff |
| CAPTURE_FAILED | Screenshot error | Retry with fresh tab |
| BASELINE_MISSING | No golden image | Prompt for baseline creation |
| COMPARISON_FAILED | Diff computation error | Log and skip, flag for review |
| THRESHOLD_EXCEEDED | Visual regression detected | Generate report, flag issue |
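The table asks for three retries with backoff on navigation failure but does not say which schedule to use. The sketch below assumes exponential backoff from a 500 ms base; `retryWithBackoff`, `backoffDelays`, and the injectable `sleep` are hypothetical names chosen for illustration.

```javascript
// Delays for exponential backoff: base, 2*base, 4*base, ...
function backoffDelays(retries, baseMs) {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: one initial attempt plus up to `retries` retries.
// `sleep` is injectable so the logic can be exercised without real waiting.
async function retryWithBackoff(operation, { retries = 3, baseMs = 500, sleep = defaultSleep } = {}) {
  const delays = backoffDelays(retries, baseMs);
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) await sleep(delays[attempt]);
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

console.log(backoffDelays(3, 500)); // [500, 1000, 2000]
```

A navigation call would be passed in as `operation`; once the retries are exhausted the error propagates, so the caller can classify it as `NAVIGATION_FAILED`.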
Purpose: Define visual test scope using sequential-thinking decomposition.
Process:
Input Contract:
inputs:
  target_url: string              # URL to test
  pages: list[string]             # Page paths to capture
  viewport_matrix: list[string]   # Viewport presets to use
  capture_mode: string            # "full_page" | "element" | "both"
  threshold_profile: string       # "strict" | "default" | "relaxed"
  interaction_sequence: list      # Optional: actions before capture
Output Contract:
outputs:
  test_plan:
    pages: list[PagePlan]
    viewports: list[ViewportConfig]
    capture_points: list[CapturePoint]
    threshold: number
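One way to read the contracts above is that planning expands pages and viewports into individual capture points. The sketch below shows that expansion under this assumption; `buildCapturePoints` and the shape of each capture point are hypothetical, since the `CapturePoint` type is not defined in this document.

```javascript
// Hypothetical planner: expand pages x viewports into capture points.
function buildCapturePoints({ target_url, pages = ["/"], viewport_matrix, capture_mode = "full_page" }) {
  const points = [];
  for (const page of pages) {
    for (const viewport of viewport_matrix) {
      points.push({
        url: new URL(page, target_url).href, // resolve the path against the base URL
        page,
        viewport,
        capture_mode
      });
    }
  }
  return points;
}

const points = buildCapturePoints({
  target_url: "https://example.com",
  pages: ["/", "/pricing"],
  viewport_matrix: ["iphone_14", "desktop_hd"]
});
console.log(points.length); // 4 capture points (2 pages x 2 viewports)
```

Capture count grows as pages times viewports, which is worth checking at planning time before choosing the extended matrix.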
Purpose: Navigate to target page and establish correct state for capture.
Process:
Agent: Task("Setup page state", "Acting as browser-specialist: Navigate to URL, wait for full load, execute any required interactions to reach target state", "general-purpose")
Purpose: Capture screenshots across all configured viewports.
Process:
For each viewport in viewport_matrix:
1. Resize window (resize_window)
2. Wait for reflow (wait 500ms)
3. Capture full page (computer screenshot)
4. Capture zoomed regions if configured (computer zoom)
5. Store capture with viewport/page metadata
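The per-viewport steps above can be sketched as a loop. This version assumes a `tools` object that wraps the MCP calls (`resize_window`, `computer` wait, `computer` screenshot); the wrapper method names and the capture record shape are assumptions, and the stub below stands in for a real browser.

```javascript
// Hypothetical capture loop over already-resolved viewport configs.
async function captureAcrossViewports(tools, page, viewports, reflowSeconds = 0.5) {
  const captures = [];
  for (const viewport of viewports) {
    await tools.resizeWindow(viewport.width, viewport.height);
    await tools.wait(reflowSeconds); // let layout settle before capturing
    const image = await tools.screenshot();
    captures.push({
      page,
      viewport: viewport.name,
      image,
      capturedAt: new Date().toISOString()
    });
  }
  return captures;
}

// Stub tools that record calls instead of driving a browser.
const calls = [];
const stubTools = {
  resizeWindow: async (w, h) => { calls.push(`resize ${w}x${h}`); },
  wait: async (s) => { calls.push(`wait ${s}`); },
  screenshot: async () => { calls.push("screenshot"); return "image-id"; }
};

captureAcrossViewports(stubTools, "/", [
  { name: "iPhone 14", width: 390, height: 844 },
  { name: "Desktop Full HD", width: 1920, height: 1080 }
]).then((captures) => console.log(captures.length, calls));
```

Injecting the tools keeps the ordering logic (resize, wait, capture) checkable without a browser session.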
Key Tools:
- resize_window: Set viewport dimensions
- computer (screenshot): Full page capture
- computer (zoom): Element-level detail capture
- gif_creator: For interaction sequences

Purpose: Compare current captures against stored baselines.
Process:
Baseline namespace: visual-testing/baselines/{project}/{page}/{viewport}
Comparison Algorithm:
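// Assumes `current` and `baseline` expose width, height, and getPixel(x, y),
// and that a pixelsMatch(a, b) helper is defined elsewhere.
function compareScreenshots(current, baseline, threshold) {
  // Captures from different viewports cannot be compared pixel by pixel.
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error("Dimension mismatch between capture and baseline");
  }
  const totalPixels = current.width * current.height;
  let diffPixels = 0;
  for (let y = 0; y < current.height; y++) {
    for (let x = 0; x < current.width; x++) {
      if (!pixelsMatch(current.getPixel(x, y), baseline.getPixel(x, y))) {
        diffPixels++;
      }
    }
  }
  const diffPercent = (diffPixels / totalPixels) * 100;
  return {
    passed: diffPercent <= threshold, // threshold is a percentage, e.g. 0.1
    diffPercent,
    diffPixels,
    totalPixels,
    threshold // returned so reports can show the limit that was applied
  };
}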
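The algorithm above relies on `pixelsMatch`, which this document does not define. A minimal sketch follows, assuming images arrive as flat RGBA byte arrays (`{ width, height, data }`) and that a small per-channel tolerance is acceptable for absorbing anti-aliasing noise; both the image shape and the tolerance value of 2 are assumptions.

```javascript
// Hypothetical pixel matcher: channels may differ by up to `channelTolerance`.
function pixelsMatch(a, b, channelTolerance = 2) {
  for (let i = 0; i < 4; i++) {
    if (Math.abs(a[i] - b[i]) > channelTolerance) return false;
  }
  return true;
}

// Percentage of pixels that differ between two same-sized RGBA images.
function diffPercent(current, baseline, channelTolerance = 2) {
  if (current.width !== baseline.width || current.height !== baseline.height) {
    throw new Error("Dimension mismatch: capture and baseline must share a viewport");
  }
  const totalPixels = current.width * current.height;
  let diffPixels = 0;
  for (let p = 0; p < totalPixels; p++) {
    const o = p * 4; // 4 bytes per pixel: R, G, B, A
    const a = current.data.subarray(o, o + 4);
    const b = baseline.data.subarray(o, o + 4);
    if (!pixelsMatch(a, b, channelTolerance)) diffPixels++;
  }
  return (diffPixels / totalPixels) * 100;
}

// 2x2 all-white images; one pixel changes strongly, one by a single step.
const baseline = { width: 2, height: 2, data: new Uint8ClampedArray(16).fill(255) };
const current = { width: 2, height: 2, data: new Uint8ClampedArray(16).fill(255) };
current.data[0] = 0;   // pixel 0: red channel off by 255, counted as different
current.data[4] = 254; // pixel 1: off by 1, within tolerance
console.log(diffPercent(current, baseline)); // 25
```

Per-channel tolerance and the percentage threshold address different noise sources: the first ignores faint rendering jitter within a pixel, the second limits how many pixels may change at all.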
Purpose: Generate comprehensive visual regression report.
Process:
Report Structure:
visual_regression_report:
  timestamp: ISO8601
  project: string
  summary:
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      threshold: number
      baseline_timestamp: ISO8601
      current_capture_id: string
  metadata:
    viewports_tested: list
    threshold_profile: string
    duration_ms: number
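As an illustration of how per-capture results could be folded into this structure, the sketch below builds the summary and failure list. `buildReport` and the input record shape (`baselineFound`, `diffPercent`, `threshold`) are assumptions, as is the rule that a capture without a baseline counts as a new baseline rather than a pass or fail.

```javascript
// Hypothetical summarizer for the report structure above.
function buildReport(project, results, thresholdProfile) {
  const compared = results.filter((r) => r.baselineFound);
  const failures = compared.filter((r) => r.diffPercent > r.threshold);
  return {
    timestamp: new Date().toISOString(),
    project,
    summary: {
      total_captures: results.length,
      passed: compared.length - failures.length,
      failed: failures.length,
      new_baselines: results.length - compared.length
    },
    failures: failures.map((r) => ({
      page: r.page,
      viewport: r.viewport,
      diff_percent: r.diffPercent,
      threshold: r.threshold
    })),
    metadata: {
      viewports_tested: [...new Set(results.map((r) => r.viewport))],
      threshold_profile: thresholdProfile
    }
  };
}

const report = buildReport("acme-web", [
  { page: "/", viewport: "iphone_14", baselineFound: true, diffPercent: 0.02, threshold: 0.1 },
  { page: "/", viewport: "desktop_hd", baselineFound: true, diffPercent: 0.4, threshold: 0.1 },
  { page: "/pricing", viewport: "iphone_14", baselineFound: false }
], "default");
console.log(report.summary);
```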
Purpose: Update baselines when changes are intentional.
Process:
Baseline Storage Schema:
baseline:
  namespace: "visual-testing/baselines/{project}/{page}/{viewport}"
  data:
    image_id: string        # Reference to stored screenshot
    captured_at: ISO8601
    approved_by: string
    threshold_used: number
    viewport: object
    url: string
    version: number
  tags:
    WHO: "visual-testing:1.0.0"
    WHEN: ISO8601
    PROJECT: string
    WHY: "baseline-capture"
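A short sketch of how a namespace key and a versioned baseline record might be produced from this schema. `baselineNamespace` and `nextBaseline` are hypothetical helpers; URL-encoding the page path is an assumption made here so that slashes inside a path do not add extra key segments.

```javascript
// Hypothetical key builder following the namespace pattern above.
function baselineNamespace(project, page, viewport) {
  // Assumption: encode the page path so "/pricing" stays a single segment.
  const pageKey = encodeURIComponent(page);
  return `visual-testing/baselines/${project}/${pageKey}/${viewport}`;
}

// Hypothetical record builder: bumps the version when a baseline already exists.
function nextBaseline(previous, { imageId, approvedBy, threshold, viewport, url }) {
  return {
    image_id: imageId,
    captured_at: new Date().toISOString(),
    approved_by: approvedBy,
    threshold_used: threshold,
    viewport,
    url,
    version: previous ? previous.version + 1 : 1
  };
}

const ns = baselineNamespace("acme-web", "/pricing", "iphone_14");
const capture = {
  imageId: "img-001",
  approvedBy: "reviewer",
  threshold: 0.1,
  viewport: { width: 390, height: 844 },
  url: "https://example.com/pricing"
};
const first = nextBaseline(null, capture);
const second = nextBaseline(first, { ...capture, imageId: "img-002" });
console.log(ns, first.version, second.version);
```

Keeping earlier versions instead of overwriting them is what makes it possible to roll back a baseline that was approved by mistake.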
No patterns recorded yet. This section will be updated through Loop 1.5 reflection.
Different visual testing scenarios require different approaches:
Patterns: "responsive", "breakpoint", "mobile", "tablet", "desktop", "viewport"
Common Characteristics:
Key Focus:
Approach: Use extended viewport matrix, focus on breakpoint edge cases (width +/- 10px from breakpoint)
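The "+/- 10px from breakpoint" approach can be turned into a list of test widths. The sketch below assumes breakpoints at 768px and 1024px purely as example values; `breakpointEdgeWidths` is a hypothetical helper.

```javascript
// Hypothetical helper: widths just below, at, and just above each breakpoint.
function breakpointEdgeWidths(breakpoints, margin = 10) {
  const widths = new Set(); // a Set removes duplicates when ranges overlap
  for (const bp of breakpoints) {
    widths.add(bp - margin);
    widths.add(bp);
    widths.add(bp + margin);
  }
  return [...widths].sort((a, b) => a - b);
}

console.log(breakpointEdgeWidths([768, 1024]));
// [758, 768, 778, 1014, 1024, 1034]
```

Each width can then be paired with a fixed height and passed to `resize_window` alongside the preset viewports.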
Patterns: "component", "button", "card", "form", "modal", "dropdown"
Common Characteristics:
Key Focus:
Approach: Use zoom tool for detailed capture, strict threshold, capture all states via interaction sequence
Patterns: "animation", "transition", "hover", "loading", "skeleton"
Common Characteristics:
Key Focus:
Approach: Use gif_creator for recording, relaxed/animation threshold profile, capture key frames
Patterns: "staging vs production", "before after", "compare", "deploy validation"
Common Characteristics:
Key Focus:
Approach: Capture both states, generate side-by-side diff, use relaxed threshold for content areas
Different stakeholders need different visual test outputs:
Developers: Technical diffs with pixel coordinates, DOM structure comparison, CSS property changes
Designers: Visual overlays, color accuracy reports, spacing measurements, design token compliance
QA Team: Pass/fail summaries, regression counts, trend reports, baseline approval queues
Executives: High-level dashboards, regression trends, release readiness indicators
For pages with dynamic content, configure ignore regions to prevent false positives:
const IGNORE_REGIONS = {
  common: [
    { selector: "[data-testid='timestamp']", reason: "Dynamic timestamp" },
    { selector: ".ad-container", reason: "Third-party ads" },
    { selector: ".live-chat-widget", reason: "Chat widget state varies" }
  ],
  page_specific: {
    "/dashboard": [
      { selector: ".metric-value", reason: "Live metrics" },
      { selector: ".user-avatar", reason: "User-specific content" }
    ]
  }
};
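The configuration separates common regions from page-specific ones, so a test run needs the merged list for the page being captured. A minimal sketch of that merge follows; `ignoreSelectorsFor` is a hypothetical helper and `IGNORE` is a trimmed copy of the configuration above.

```javascript
// Trimmed copy of the ignore-region configuration, for a self-contained sketch.
const IGNORE = {
  common: [
    { selector: "[data-testid='timestamp']", reason: "Dynamic timestamp" },
    { selector: ".ad-container", reason: "Third-party ads" }
  ],
  page_specific: {
    "/dashboard": [{ selector: ".metric-value", reason: "Live metrics" }]
  }
};

// Hypothetical helper: common selectors plus any registered for this page.
function ignoreSelectorsFor(page, config = IGNORE) {
  const specific = config.page_specific[page] ?? [];
  return [...config.common, ...specific].map((region) => region.selector);
}

console.log(ignoreSelectorsFor("/dashboard"));
console.log(ignoreSelectorsFor("/"));
```

The selectors still have to be resolved to on-screen rectangles (for example with the `find` tool) before those areas can be excluded from a pixel comparison.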
For critical visual tests, use LLM Council for consensus:
// When visual diff is borderline (threshold +/- 0.5%)
// Assumes `diff` carries the threshold it was evaluated against, and that a
// geminiAnalyze(current, baseline, prompt) helper is defined elsewhere.
async function multiModelVisualValidation(current, baseline, diff) {
  const prompt = `
    Analyze this visual comparison:
    - Diff percentage: ${diff.diffPercent}%
    - Changed pixels: ${diff.diffPixels}
    - Threshold: ${diff.threshold}%
    Is this change:
    A) Intentional design update (approve new baseline)
    B) Unintentional regression (flag for fix)
    C) Acceptable variation (pass with note)
    Provide reasoning.
  `;
  // Route to Gemini for image analysis capability
  return await geminiAnalyze(current, baseline, prompt);
}
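The borderline rule mentioned above (threshold +/- 0.5%) can be expressed as a small triage step that decides whether a comparison needs review at all. This sketch reads the band as absolute percentage points, which is an assumption; `isBorderline` and `triage` are hypothetical names.

```javascript
// True when the diff lies within `band` percentage points of the threshold.
function isBorderline(diffPercent, threshold, band = 0.5) {
  return Math.abs(diffPercent - threshold) <= band;
}

// Hypothetical triage: clear results skip review, borderline ones are escalated.
function triage(diffPercent, threshold, band = 0.5) {
  if (isBorderline(diffPercent, threshold, band)) return "needs_review";
  return diffPercent <= threshold ? "pass" : "fail";
}

console.log(triage(0.05, 1.0)); // "pass"
console.log(triage(1.3, 1.0));  // "needs_review"
console.log(triage(3.0, 1.0));  // "fail"
```

A fixed 0.5-point band is wide relative to the strict (0.01%) and default (0.1%) profiles, where nearly every passing diff would count as borderline, so scaling the band with the threshold may be preferable there.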
Avoid these common mistakes:
| Anti-Pattern | Problem | Solution |
|---|---|---|
| No wait after resize | Captures before reflow complete | Add 500ms wait after resize_window |
| Ignoring async content | Missing dynamically loaded elements | Wait for network idle or specific selectors |
| Single viewport only | Missing responsive regressions | Use minimum 3 viewports (mobile, tablet, desktop) |
| Capturing during animation | Non-deterministic frames | Wait for animations or use GIF |
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Zero tolerance | False positives from anti-aliasing | Use minimum 0.01% threshold |
| No baseline versioning | Cannot rollback bad baseline | Version baselines with timestamps |
| Comparing different viewports | Invalid diff | Validate viewport match before compare |
| No ignore regions | Dynamic content causes failures | Configure ignore regions for timestamps, ads |
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Skip planning phase | Missing edge cases | ALWAYS use sequential-thinking first |
| No interaction before capture | Missing auth/state-dependent pages | Plan interaction sequences |
| Silent baseline updates | Regressions approved accidentally | Require explicit approval |
| No cleanup | Orphaned tabs accumulate | Close tabs after test completion |
Full Mode (comprehensive):
Quick Mode (smoke test):
For large test suites (20+ pages):
| Decision | Option A | Option B | Guidance |
|---|---|---|---|
| Threshold strictness | Strict (0.01%) | Relaxed (1%) | Strict for design system, relaxed for content-heavy |
| Viewport coverage | Extended (8+) | Standard (3) | Extended for responsive-focused apps |
| Capture mode | Full page | Element zoom | Full page default, zoom for component testing |
| Baseline storage | Local | Memory MCP | Memory MCP for cross-session persistence |
Visual Testing works with other skills in the ecosystem:
| Skill | When to Use First | What It Provides |
|---|---|---|
| intent-analyzer | Always first | Detect visual testing need, extract URLs |
| browser-automation | For complex page states | Navigation + interaction to reach state |
| prompt-architect | For test plan optimization | Structured test specifications |
| Skill | When to Use After | What It Does |
|---|---|---|
| fix-bug | On regression detection | Fix visual bugs identified |
| documenter | For test reports | Generate visual test documentation |
| deployment | Before deploy | Gate deployment on visual test pass |
| Skill | When to Run Together | How They Coordinate |
|---|---|---|
| e2e-test | Same page coverage | Visual captures complement functional tests |
| browser-automation | Page state setup | Automation provides capture-ready state |
| code-review-assistant | CSS changes | Visual test validates review findings |
Required MCPs:
| MCP | Purpose | Tools Used |
|---|---|---|
| sequential-thinking | Test planning | sequentialthinking |
| claude-in-chrome | Screenshot capture | navigate, resize_window, computer (screenshot, zoom), gif_creator, tabs_context_mcp, tabs_create_mcp |
| memory-mcp | Baseline storage | memory_store, vector_search, memory_query |
Tool-Specific Usage:
| Tool | Purpose in Visual Testing |
|---|---|
| tabs_context_mcp | Get/verify browser context before tests |
| tabs_create_mcp | Create clean tab for test isolation |
| resize_window | Set viewport dimensions |
| navigate | Load target URL |
| computer (screenshot) | Capture full page state |
| computer (zoom) | Capture specific region with magnification |
| computer (wait) | Pause for reflow/animation completion |
| gif_creator | Record interaction sequences |
| read_page | Verify page structure before capture |
| find | Locate elements for region capture |
Pattern: skills/tooling/visual-testing/{type}/{project}/{page}/{viewport}
Types:
- baselines/ - Golden images (approved screenshots)
- captures/ - Current test captures
- reports/ - Visual regression reports
- diffs/ - Generated diff visualizations

Store:
Retrieve:
Tagging:
{
  "WHO": "visual-testing:1.0.0",
  "WHEN": "ISO8601_timestamp",
  "PROJECT": "{project_name}",
  "WHY": "visual-regression-testing",
  "page": "{page_path}",
  "viewport": "{viewport_name}",
  "threshold_profile": "{profile}",
  "passed": true
}
visual_test_request:
  required:
    target_url: string              # Base URL to test
  optional:
    pages: list[string]             # Specific paths (default: ["/"])
    viewport_matrix: list[string]   # Preset names (default: STANDARD_MATRIX)
    capture_mode: string            # "full_page" | "element" | "both" (default: "full_page")
    threshold_profile: string       # "strict" | "default" | "relaxed" (default: "default")
    compare_baseline: boolean       # Whether to compare (default: true)
    update_baseline: boolean        # Whether to update on approval (default: false)
    interaction_sequence: list      # Actions before capture
    ignore_regions: list            # Selectors to ignore
visual_test_result:
  summary:
    status: "passed" | "failed" | "new_baselines"
    total_captures: number
    passed: number
    failed: number
    new_baselines: number
    execution_time_ms: number
  captures:
    - page: string
      viewport: string
      capture_id: string
      baseline_id: string | null
      comparison:
        passed: boolean
        diff_percent: number
        threshold: number
  failures:
    - page: string
      viewport: string
      diff_percent: number
      reason: string
  report_id: string   # Memory MCP reference to full report
| Loop | Visual Testing Role |
|---|---|
| Loop 1 | Execute visual tests as part of validation |
| Loop 1.5 | Capture learnings about threshold tuning, false positives |
| Loop 2 | Quality validation of test coverage |
| Loop 3 | Aggregate patterns for threshold optimization |
Visual testing supports evaluation via:
| Signal | Confidence | Learning |
|---|---|---|
| User approves new baseline | HIGH (0.90) | Threshold was appropriate |
| User rejects false positive | HIGH (0.90) | Threshold too strict for context |
| User flags missed regression | HIGH (0.90) | Threshold too relaxed |
| Same page fails repeatedly | MEDIUM (0.75) | Investigate dynamic content issue |
Complexity: Medium (3 viewports, 5 pages)
Task: Validate homepage responsive behavior across mobile, tablet, desktop
Planning Output (sequential-thinking):
Thought 1/6: Need to validate responsive breakpoints for homepage
Thought 2/6: Viewports: iPhone 14 (390px), iPad Pro 11 (834px), Desktop HD (1920px)
Thought 3/6: Capture sections: hero, features, pricing, footer
Thought 4/6: Use default threshold (0.1%) for static content
Thought 5/6: Check baseline existence, compare if present
Thought 6/6: Generate report with pass/fail per viewport
Execution:
// 1. Create test tab
await tabs_create_mcp() // -> tabId: 123
// 2. Navigate to homepage
await navigate({ url: "https://example.com/", tabId: 123 })
// 3. Mobile viewport (iPhone 14)
await resize_window({ width: 390, height: 844, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_mobile.png
// 4. Tablet viewport (iPad Pro 11)
await resize_window({ width: 834, height: 1194, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_tablet.png
// 5. Desktop viewport (Desktop HD)
await resize_window({ width: 1920, height: 1080, tabId: 123 })
await computer({ action: "wait", duration: 0.5, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // -> capture_desktop.png
// 6. Compare each against baseline from Memory MCP
// 7. Generate report
Result: 3/3 viewports passed, no regressions detected
Execution Time: 45 seconds
Complexity: Medium (4 states per button, zoom captures)
Task: Validate primary button visual states (default, hover, active, disabled)
Planning Output:
Thought 1/8: Testing primary button component visual states
Thought 2/8: States to capture: default, hover, active, disabled
Thought 3/8: Use zoom tool for detailed button capture
Thought 4/8: Strict threshold (0.01%) for design system component
Thought 5/8: Capture default state first
Thought 6/8: Use hover action for hover state
Thought 7/8: Use mouse down for active state
Thought 8/8: Navigate to disabled example for disabled state
Execution:
// 1. Navigate to component library
await navigate({ url: "https://storybook.example.com/button", tabId: 123 })
// 2. Find button element
const button = await find({ query: "primary button", tabId: 123 })
// 3. Zoom capture default state
await computer({ action: "zoom", region: [button.x, button.y, button.x + 200, button.y + 50], tabId: 123 })
// 4. Hover state capture
await computer({ action: "hover", coordinate: [button.x + 100, button.y + 25], tabId: 123 })
await computer({ action: "zoom", region: [button.x, button.y, button.x + 200, button.y + 50], tabId: 123 })
// ... continue for active, disabled states
Result: 4/4 states passed strict threshold
Execution Time: 30 seconds
Complexity: High (GIF recording, temporal comparison)
Task: Capture and validate skeleton-to-content loading animation
Planning Output:
Thought 1/6: Need to capture loading animation as GIF
Thought 2/6: Trigger reload to capture full sequence
Thought 3/6: Start GIF recording before reload
Thought 4/6: Wait for content load completion
Thought 5/6: Stop recording and export GIF
Thought 6/6: Use animation threshold (5%) for comparison
Execution:
// 1. Start GIF recording
await gif_creator({ action: "start_recording", tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Initial frame
// 2. Trigger reload
await navigate({ url: "https://example.com/dashboard", tabId: 123 })
// 3. Wait for load sequence
await computer({ action: "wait", duration: 3, tabId: 123 })
await computer({ action: "screenshot", tabId: 123 }) // Final frame
// 4. Stop recording and export
await gif_creator({ action: "stop_recording", tabId: 123 })
await gif_creator({ action: "export", download: true, filename: "loading-animation.gif", tabId: 123 })
Result: Animation captured successfully, 2.3% diff from baseline (within 5% animation threshold)
Execution Time: 15 seconds
| Issue | Cause | Solution |
|---|---|---|
| Screenshots are blank/black | Page not fully loaded | Add wait after navigation, check for lazy loading |
| Diff always fails | Threshold too strict | Increase threshold or configure ignore regions |
| Viewport resize not working | Tab permission issue | Create new tab with tabs_create_mcp |
| GIF not recording | Recording not started | Call gif_creator start_recording before actions |
| Baseline not found | Wrong namespace key | Verify page/viewport in Memory MCP query |
| Zoom captures wrong region | Coordinates shifted | Recalculate region after viewport resize |
Enable verbose output for troubleshooting:
const DEBUG_MODE = true;
if (DEBUG_MODE) {
  console.log("Viewport:", viewport);
  console.log("Page URL:", url);
  console.log("Capture timestamp:", new Date().toISOString());
  console.log("Baseline exists:", baselineExists);
  console.log("Diff result:", diffResult);
}
Visual Testing provides systematic screenshot-based regression detection that complements functional testing. By comparing actual rendered output against approved baselines across multiple viewports, this skill catches UI regressions that unit and integration tests miss.
The key differentiators are:
When integrated with the CI/CD pipeline, visual testing serves as a deployment gate that prevents visual regressions from reaching production. Combined with Memory MCP for persistent baselines, the system maintains consistent quality across releases.
Quality Thresholds:
Failure Indicators:
VISUAL_TESTING_VERILINGUA_VERIX_COMPLIANT