playwright-vlm

VLM-enhanced visual regression review using Azure OpenAI GPT-4o vision. Use when: user asks for VLM review, semantic visual diff, smart noise filter, vision language model, GPT-4o visual comparison, VLM confidence threshold, visual regression adjudication, VLM cost controls, or VLM-enhanced screenshot analysis. Provides a self-contained, cross-project portable VLM integration with templates that can be copied into any Playwright project.

Install command

npx skills add https://github.com/nickhou1983/UI-test-Demo --skill playwright-vlm

Copy and paste this command into Claude Code to install the skill

Source

nickhou1983/UI-test-Demo

Updated: March 20, 2026 at 06:44

Playwright VLM Visual Regression Review

⚠️ Report-Only Policy

This skill does NOT auto-update baselines or modify test files.

VLM review is advisory — it classifies pixel diffs as noise or real regressions
When a breaking regression is detected, the test fails; this skill does NOT fix it
Baseline updates still require explicit tester approval regardless of VLM verdict

Purpose

This skill owns VLM-enhanced visual regression review:

pixel-first, VLM-fallback strategy
Azure OpenAI GPT-4o vision integration
severity classification and judgment semantics
confidence threshold and cost controls
prompt engineering for visual diff analysis
reporter output format (vlm-review-report.json)

It does not own:

core pixel-level screenshot comparison → playwright-visual
Azure Playwright Workspace infrastructure → playwright-azure
CI merge gating and baseline authority policies → ui-test-governance

Self-Contained Templates

This skill ships with ready-to-install templates in the templates/ subdirectory:

templates/
├── utils/
│   ├── vlm-prompts.ts         # System & user prompt for GPT-4o vision
│   ├── vlm-reviewer.ts        # Core review logic + Azure OpenAI client
│   ├── vlm-reporter.ts        # Playwright custom reporter → JSON + ANSI console
│   └── vlm-html-reporter.ts   # Standalone HTML visual report with screenshots
└── fixtures/
    └── visual-test.ts          # Hybrid pixel-first / VLM-fallback fixture

To install in a new project, see the Setup section below.

Prerequisites

A Playwright visual testing project already configured (use playwright-config or playwright-visual first)
Azure OpenAI resource with a GPT-4o (vision-capable) deployment
Either an API key or Azure Entra ID credentials for authentication

Input Contract

Input	Required	Example
VLM enabled	yes	`VLM_REVIEW=true`
Azure OpenAI endpoint	yes	`https://my-resource.openai.azure.com`
Authentication	yes	API key or Entra ID
Deployment name	no (default: `gpt-4o`)	`gpt-4o`
Max API calls per run	no (default: 10)	`VLM_MAX_CALLS=20`
Confidence threshold	no (default: 0.7)	`VLM_CONFIDENCE_THRESHOLD=0.8`

Strategy: Pixel-First, VLM-Fallback

                    Screenshot Assertion
                              │
                   ┌──────────┴──────────┐
                   ▼                     ▼
              Pixel PASS            Pixel FAIL
                   │                     │
                   ▼                     ▼
              Test passes        VLM_REVIEW=true?
                                 │              │
                                NO             YES
                                 │              │
                                 ▼              ▼
                            Test fails  Call Azure OpenAI
                                          GPT-4o Vision
                                                │
                                      ┌─────────┴─────────────┐
                                      ▼                       ▼
                                 confidence              confidence
                                    ≥ 0.7                   < 0.7
                                      │                       │
                        ┌─────────────┼───────────┐           ▼
                        ▼             ▼           ▼      Test fails
                  none/cosmetic     minor     breaking
                        │             │           │
                        ▼             ▼           ▼
                    Pass+note     Pass+warn  Test fails

The native Playwright toHaveScreenshot() always runs first. VLM is only invoked when pixel comparison fails and VLM_REVIEW=true.
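
The gating rule above can be sketched as a small pure function. This is a minimal illustration, not the shipped fixture: `nextStep` and `NextStep` are hypothetical names, and the strict comparison against `'true'` is an assumption that mirrors the reporter configuration shown later in this document.

```typescript
// Hypothetical helper; the real fixture wraps toHaveScreenshot() rather than taking a boolean.
type NextStep = 'pass' | 'fail' | 'review';

function nextStep(pixelPassed: boolean, vlmReviewEnv: string | undefined): NextStep {
  // Pixel comparison passed: the VLM is never consulted.
  if (pixelPassed) return 'pass';
  // Pixel comparison failed: escalate to VLM review only when explicitly opted in.
  return vlmReviewEnv === 'true' ? 'review' : 'fail';
}
```

A caller would invoke the Azure OpenAI review only when the result is `'review'`, which is what keeps green runs free of API calls.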

Environment Variables

Variable	Default	Description
`VLM_REVIEW`	`false`	Enable VLM fallback review (`true` / `false`)
`VLM_MAX_CALLS`	`10`	Maximum VLM API calls per test run (cost control)
`VLM_CONFIDENCE_THRESHOLD`	`0.7`	Minimum confidence to trust VLM verdict
`AZURE_OPENAI_ENDPOINT`	—	Azure OpenAI resource endpoint (required)
`AZURE_OPENAI_API_KEY`	—	API key (if not using Entra ID)
`AZURE_OPENAI_DEPLOYMENT`	`gpt-4o`	Model deployment name
`VLM_REPORT_INLINE`	`false`	Embed base64 images in HTML report (`true` / `false`)
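
As a rough sketch of how these variables and their defaults could be read into one typed object: `loadVlmConfig`, `VlmConfig`, and `numberOr` are hypothetical names, and throwing when the endpoint is missing is an assumption about how "required" might be enforced, not a description of the shipped templates.

```typescript
type Env = Record<string, string | undefined>;

interface VlmConfig {
  enabled: boolean;
  maxCalls: number;
  confidenceThreshold: number;
  endpoint: string | undefined;
  apiKey: string | undefined;
  deployment: string;
  reportInline: boolean;
}

// Fall back to the documented default when the variable is unset, empty, or not a number.
function numberOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === '') return fallback;
  const parsed = Number(raw);
  return isFinite(parsed) ? parsed : fallback;
}

function loadVlmConfig(env: Env): VlmConfig {
  const enabled = env.VLM_REVIEW === 'true';
  // Assumption: fail fast when review is enabled but no endpoint is configured.
  if (enabled && !env.AZURE_OPENAI_ENDPOINT) {
    throw new Error('AZURE_OPENAI_ENDPOINT is required when VLM_REVIEW=true');
  }
  return {
    enabled,
    maxCalls: numberOr(env.VLM_MAX_CALLS, 10),
    confidenceThreshold: numberOr(env.VLM_CONFIDENCE_THRESHOLD, 0.7),
    endpoint: env.AZURE_OPENAI_ENDPOINT,
    apiKey: env.AZURE_OPENAI_API_KEY,
    deployment: env.AZURE_OPENAI_DEPLOYMENT || 'gpt-4o',
    reportInline: env.VLM_REPORT_INLINE === 'true',
  };
}
```

In a Node-based test run this would typically be called once as `loadVlmConfig(process.env)`.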

Severity Levels

Severity	Recommendation	Action	Examples
`none`	`pass`	Test passes + `vlm-override` annotation	Identical screenshots, sub-pixel differences
`cosmetic`	`pass`	Test passes + `vlm-override` annotation	Font anti-aliasing, ±1px shadow, compression artifacts
`minor`	`warn`	Test passes + `vlm-warning` annotation	Spacing <5px, subtle color shifts, minor icon variation
`breaking`	`fail`	Test fails + `vlm-breaking` annotation	Missing elements, text truncation, layout collapse >10px

When confidence is below the threshold, the test fails with a vlm-low-confidence annotation regardless of severity.
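
The table and the low-confidence rule can be combined into one decision function. The sketch below is illustrative only: `decide` and `Decision` are hypothetical names, and treating a confidence exactly equal to the threshold as trusted follows the "≥ 0.7" branch of the strategy diagram.

```typescript
type Severity = 'none' | 'cosmetic' | 'minor' | 'breaking';

interface Decision {
  passed: boolean;
  annotation: 'vlm-override' | 'vlm-warning' | 'vlm-breaking' | 'vlm-low-confidence';
}

function decide(severity: Severity, confidence: number, threshold = 0.7): Decision {
  // Low confidence always fails, regardless of the reported severity.
  if (confidence < threshold) {
    return { passed: false, annotation: 'vlm-low-confidence' };
  }
  switch (severity) {
    case 'none':
    case 'cosmetic':
      return { passed: true, annotation: 'vlm-override' };
    case 'minor':
      return { passed: true, annotation: 'vlm-warning' };
    case 'breaking':
      return { passed: false, annotation: 'vlm-breaking' };
  }
}
```

Checking confidence before severity matters: a `cosmetic` verdict at confidence 0.5 must still fail the test.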

Authentication

Option 1: API Key

AZURE_OPENAI_ENDPOINT=https://my-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o

Option 2: Azure Entra ID (Managed Identity)

AZURE_OPENAI_ENDPOINT=https://my-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o
# No API key — uses DefaultAzureCredential automatically

Entra ID requires the Cognitive Services OpenAI User role on the Azure OpenAI resource.
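
One plausible way to choose between the two options is to key off the presence of `AZURE_OPENAI_API_KEY`. The sketch below assumes exactly that; `selectAuth` and `AuthMode` are hypothetical names and the shipped `vlm-reviewer.ts` may decide differently.

```typescript
type AuthMode =
  | { kind: 'api-key'; apiKey: string }
  | { kind: 'entra-id' };

// Assumption: an API key, when present and non-empty, takes precedence over Entra ID.
function selectAuth(env: Record<string, string | undefined>): AuthMode {
  const apiKey = env.AZURE_OPENAI_API_KEY;
  return apiKey ? { kind: 'api-key', apiKey } : { kind: 'entra-id' };
}
```

In the `entra-id` branch, a client would typically obtain tokens through `DefaultAzureCredential` from `@azure/identity`, which is why no secret needs to be stored in `.env` for Option 2.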

Fixture Usage

Import the hybrid fixture instead of the standard Playwright test:

import { test } from '../fixtures/visual-test';

test('home page', async ({ page, assertScreenshotWithVlm }) => {
  await page.goto('./');
  await assertScreenshotWithVlm({
    name: 'home-main.png',
    target: page,
    screenshotOptions: { animations: 'disabled', maxDiffPixelRatio: 0.01 },
    pageName: 'Home Page',
    route: '/',
    viewport: '1280x720',
  });
});

The assertScreenshotWithVlm fixture:

Runs native toHaveScreenshot() first
On pixel failure + VLM_REVIEW=true, calls GPT-4o vision
Applies confidence threshold and severity classification
Annotates the test result for the VLM reporter

Reporter Configuration

Add the VLM reporter conditionally in playwright.config.ts:

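import { defineConfig, type ReporterDescription } from '@playwright/test';

// The built-in HTML reporter always runs; the VLM reporters are added only when VLM_REVIEW=true.
const reporter: ReporterDescription[] = [['html', { open: 'never' }]];

if (process.env.VLM_REVIEW === 'true') {
  reporter.push(['./tests/utils/vlm-reporter.ts']);
  reporter.push(['./tests/utils/vlm-html-reporter.ts']);
}

export default defineConfig({
  reporter,
  // ...
});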

Report Output

The reporters produce two output files:

vlm-review-report.json — machine-readable JSON with aggregation and pattern detection
vlm-visual-report.html — standalone HTML dashboard with screenshot side-by-side cards
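
To make "pattern detection" concrete, here is a minimal sketch of how a `global-property` pattern like the one in the JSON sample could be derived. The names `detectGlobalProperties` and `PageChange`, and the `minShare` cutoff of 50%, are assumptions for illustration; the source does not state the threshold the reporter actually uses.

```typescript
interface PageChange {
  page: string;
  changedProperties: string[];
}

interface Pattern {
  type: 'global-property';
  label: string;
  affectedCount: number;
  totalCount: number;
  percentage: number;
  details: string;
}

// minShare is a hypothetical cutoff; the real reporter's threshold is not documented here.
function detectGlobalProperties(pages: PageChange[], minShare = 0.5): Pattern[] {
  const counts: Record<string, number> = Object.create(null); // no inherited keys
  for (const page of pages) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const prop of page.changedProperties) {
      if (seen[prop]) continue; // count each property at most once per page
      seen[prop] = true;
      counts[prop] = (counts[prop] ?? 0) + 1;
    }
  }

  const total = pages.length;
  const patterns: Pattern[] = [];
  for (const label of Object.keys(counts)) {
    const affected = counts[label] ?? 0;
    if (total === 0 || affected / total < minShare) continue;
    const percentage = Math.round((affected / total) * 100);
    patterns.push({
      type: 'global-property',
      label,
      affectedCount: affected,
      totalCount: total,
      percentage,
      details: `"${label}" change detected across ${affected}/${total} pages (${percentage}%)`,
    });
  }
  return patterns;
}
```

A property that changes on most pages at once (for example a theme color) usually points to a single global cause rather than many independent regressions, which is the reason for surfacing it separately.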

JSON Report

{
  "generatedAt": "2026-03-19T08:00:00.000Z",
  "testRunStatus": "passed",
  "vlmEnabled": true,
  "vlmCallCount": 3,
  "maxCalls": 10,
  "confidenceThreshold": 0.7,
  "summary": {
    "total": 3,
    "passed_vlm": 2,
    "warned_vlm": 1,
    "failed": 0
  },
  "entries": [
    {
      "testTitle": "home page baseline",
      "testFile": "tests/visual/pages.visual.spec.ts",
      "screenshotName": "home-page.png",
      "pixelDiffFailed": true,
      "vlmSeverity": "cosmetic",
      "vlmDescription": "Subtle font rendering variation in navbar",
      "vlmAreas": ["navbar"],
      "vlmChangedProperties": ["color"],
      "vlmRecommendation": "pass",
      "vlmConfidence": 0.95,
      "action": "passed_vlm",
      "baselinePath": "test-results/.../baseline.png",
      "actualPath": "test-results/.../actual.png",
      "diffPath": "test-results/.../diff.png"
    }
  ],
  "patterns": [
    {
      "type": "global-property",
      "label": "color",
      "affectedCount": 6,
      "totalCount": 8,
      "percentage": 75,
      "details": "\"color\" change detected across 6/8 pages (75%)"
    }
  ],
  "aggregation": {
    "bySeverity": { "cosmetic": 2, "minor": 1 },
    "byArea": { "navbar": 3, "hero": 1 },
    "byProperty": { "color": 6, "typography": 1 },
    "byPage": {}
  }
}
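
The `summary` and `aggregation` blocks can be derived from `entries` alone. The sketch below shows one way to do that for a subset of the fields; `aggregate`, `ReportEntry`, and `Counter` are hypothetical names, and the `warned_vlm` and `failed` action values are inferred from the summary keys because only `passed_vlm` appears in the sample entry. `byPage` is omitted since the sample leaves it empty.

```typescript
type Severity = 'none' | 'cosmetic' | 'minor' | 'breaking';
// Only "passed_vlm" appears in the sample entry; the other two values are inferred.
type Action = 'passed_vlm' | 'warned_vlm' | 'failed';

interface ReportEntry {
  testTitle: string;
  vlmSeverity: Severity;
  vlmAreas: string[];
  vlmChangedProperties: string[];
  action: Action;
}

type Counter = Record<string, number>;

function newCounter(): Counter {
  return Object.create(null); // no inherited keys
}

function bump(counter: Counter, key: string): void {
  counter[key] = (counter[key] ?? 0) + 1;
}

function aggregate(entries: ReportEntry[]) {
  const summary = { total: entries.length, passed_vlm: 0, warned_vlm: 0, failed: 0 };
  const bySeverity = newCounter();
  const byArea = newCounter();
  const byProperty = newCounter();

  for (const entry of entries) {
    summary[entry.action] += 1;
    bump(bySeverity, entry.vlmSeverity);
    for (const area of entry.vlmAreas) bump(byArea, area);
    for (const prop of entry.vlmChangedProperties) bump(byProperty, prop);
  }
  return { summary, aggregation: { bySeverity, byArea, byProperty } };
}
```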

HTML Report

The HTML report (vlm-visual-report.html) features:

Dashboard: severity distribution cards + detected cross-page patterns
Entry cards: baseline / actual screenshots side-by-side with severity badge, confidence progress bar, areas and changedProperties tags
Image modes: external file:// refs by default (smaller HTML); set VLM_REPORT_INLINE=true for base64 inline (portable)

Playwright HTML Reporter Integration

Screenshots are also attached via testInfo.attach(), making them visible inside Playwright's built-in HTML reporter per-test view.

ANSI Terminal Output

The console table output is color-coded:

Breaking → red, Minor → yellow, Cosmetic → green
Cross-page pattern detection summary at the bottom
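
The color mapping can be expressed with standard ANSI SGR escape codes. This is a sketch under assumptions: `colorize` is a hypothetical helper, and leaving `none` uncolored is a guess, since only three severities are listed above.

```typescript
type Severity = 'none' | 'cosmetic' | 'minor' | 'breaking';

const RESET = '\x1b[0m';
const COLORS: Record<Severity, string> = {
  breaking: '\x1b[31m', // red
  minor: '\x1b[33m',    // yellow
  cosmetic: '\x1b[32m', // green
  none: '',             // assumption: no color is documented for "none"
};

function colorize(severity: Severity, text: string): string {
  const color = COLORS[severity];
  // Wrap the text in the color code and reset afterwards so later output is unaffected.
  return color ? `${color}${text}${RESET}` : text;
}
```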


## Prompt Engineering

The system prompt (in `templates/utils/vlm-prompts.ts`) instructs GPT-4o to:

1. Compare BASELINE and ACTUAL screenshots side by side
2. Classify differences into exactly one severity level
3. Return a structured JSON verdict with `severity`, `description`, `areas`, `changedProperties`, `recommendation`, `confidence`
4. Focus on what a human tester would notice at normal viewing distance
5. Ignore OS/browser rendering engine variations

The user prompt includes page context (name, route, viewport, language) to help the model understand what it is reviewing.
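
Because the model is asked for a structured JSON verdict, the caller should validate the response before trusting it. The following is a minimal sketch of such validation: `parseVerdict` and `VlmVerdict` are hypothetical names, the allowed `recommendation` values are taken from the Severity Levels table, and the requirement that `confidence` lie in [0, 1] is an assumption.

```typescript
type Severity = 'none' | 'cosmetic' | 'minor' | 'breaking';
type Recommendation = 'pass' | 'warn' | 'fail';

interface VlmVerdict {
  severity: Severity;
  description: string;
  areas: string[];
  changedProperties: string[];
  recommendation: Recommendation;
  confidence: number;
}

const SEVERITIES: readonly string[] = ['none', 'cosmetic', 'minor', 'breaking'];
const RECOMMENDATIONS: readonly string[] = ['pass', 'warn', 'fail'];

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

function parseVerdict(raw: string): VlmVerdict {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof data !== 'object' || data === null) {
    throw new Error('verdict must be a JSON object');
  }
  const { severity, description, areas, changedProperties, recommendation, confidence } =
    data as Record<string, unknown>;

  if (typeof severity !== 'string' || SEVERITIES.indexOf(severity) === -1) {
    throw new Error('invalid severity');
  }
  if (typeof recommendation !== 'string' || RECOMMENDATIONS.indexOf(recommendation) === -1) {
    throw new Error('invalid recommendation');
  }
  if (typeof description !== 'string') throw new Error('invalid description');
  if (!isStringArray(areas) || !isStringArray(changedProperties)) {
    throw new Error('areas and changedProperties must be string arrays');
  }
  // Assumption: confidence is a probability-like score between 0 and 1.
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    throw new Error('invalid confidence');
  }
  return {
    severity: severity as Severity,
    description,
    areas,
    changedProperties,
    recommendation: recommendation as Recommendation,
    confidence,
  };
}
```

Treating a malformed verdict as an error, rather than as a pass, keeps the review advisory and conservative.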

### Customizing Prompts

To adjust severity thresholds or add domain-specific rules, edit the `VLM_SYSTEM_PROMPT` in `vlm-prompts.ts`. For example, to make the model stricter about spacing:

Replace: "Spacing changes < 5px" under "minor"
With: "Spacing changes < 3px" under "minor" and ">= 3px" under "breaking"


## Execution Commands

| Action | Command |
|--------|---------|
| Visual tests without VLM | `npm run test:visual` |
| Visual tests with VLM | `npm run test:visual:vlm` |
| Azure cloud visual without VLM | `npm run test:azure:visual` |
| Azure cloud visual with VLM | `npm run test:azure:visual:vlm` |

Equivalent manual commands:

```bash
# Local with VLM
VLM_REVIEW=true npx playwright test --project=visual

# Azure with VLM
VLM_REVIEW=true npx playwright test --config=playwright.service.config.ts --project=visual
```

Setup — Installing in a New Project

Follow these steps to add VLM review to any Playwright project:

1. Copy Templates

# Source: .github/skills/playwright-vlm/templates/
# Run from the skill directory and replace <project> with your project root.
# Create the target directories first; cp does not create them.
mkdir -p <project>/tests/utils <project>/tests/fixtures
cp templates/utils/vlm-prompts.ts        <project>/tests/utils/vlm-prompts.ts
cp templates/utils/vlm-reviewer.ts       <project>/tests/utils/vlm-reviewer.ts
cp templates/utils/vlm-reporter.ts       <project>/tests/utils/vlm-reporter.ts
cp templates/utils/vlm-html-reporter.ts  <project>/tests/utils/vlm-html-reporter.ts
cp templates/fixtures/visual-test.ts     <project>/tests/fixtures/visual-test.ts

2. Install npm Dependencies

npm install openai @azure/identity

3. Configure Reporter

Add to playwright.config.ts:

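import type { ReporterDescription } from '@playwright/test';

const reporter: ReporterDescription[] = [['html', { open: 'never' }]];

if (process.env.VLM_REVIEW === 'true') {
  reporter.push(['./tests/utils/vlm-reporter.ts']);
  reporter.push(['./tests/utils/vlm-html-reporter.ts']);
}

// Pass `reporter` to defineConfig({ reporter, ... }).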

4. Set Environment Variables

Create or update .env:

VLM_REVIEW=true
VLM_MAX_CALLS=10
VLM_CONFIDENCE_THRESHOLD=0.7
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o

5. Add npm Scripts (Optional)

{
  "test:visual": "VLM_REVIEW=false playwright test --project=visual",
  "test:visual:vlm": "VLM_REVIEW=true playwright test --project=visual"
}

6. Add to .gitignore

vlm-review-report.json
vlm-visual-report.html

Cost Controls

VLM_MAX_CALLS caps the number of GPT-4o API calls per test run (default: 10)
VLM is only invoked when pixel comparison fails — passing tests never call the API
Each call sends two images (baseline + actual) at detail: high
Token usage is attached to the review result for cost tracking
Use VLM_REVIEW=false (the default) to disable VLM entirely
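
The call cap and token tracking can be sketched as a small budget object. `createCallBudget` and its methods are hypothetical names, and what happens once the budget is spent (here the caller is expected to skip the API call and let the pixel failure stand) is an assumption rather than documented behavior.

```typescript
interface CallBudget {
  tryAcquire(): boolean;
  recordTokens(tokens: number): void;
  snapshot(): { callCount: number; maxCalls: number; totalTokens: number };
}

function createCallBudget(maxCalls: number): CallBudget {
  let callCount = 0;
  let totalTokens = 0;
  return {
    // Reserve one API call; returns false once the cap is reached.
    tryAcquire() {
      if (callCount >= maxCalls) return false;
      callCount += 1;
      return true;
    },
    // Accumulate token usage reported by each review for cost tracking.
    recordTokens(tokens) {
      totalTokens += tokens;
    },
    snapshot() {
      return { callCount, maxCalls, totalTokens };
    },
  };
}
```

Note that Playwright runs tests in worker processes, so an in-memory counter like this one would apply per worker unless the count is shared some other way.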

Governance Boundary

This skill owns:

VLM reviewer implementation and integration patterns
Prompt engineering for visual diff classification
VLM confidence and cost control configuration
VLM reporter output format

This skill delegates:

When/whether to enable VLM in CI pipelines → ui-test-governance
Core pixel-level screenshot strategy → playwright-visual
Azure OpenAI resource provisioning → playwright-azure
PR merge gating based on VLM results → ui-test-governance

Definition Of Done

This skill is complete when:

VLM templates are installed in the target project's tests/ directory
Environment variables are configured with valid Azure OpenAI credentials
The VLM reporter is conditionally loaded in playwright.config.ts
Running VLM_REVIEW=true npx playwright test --project=visual produces vlm-review-report.json and vlm-visual-report.html
Pixel-passing tests do not invoke VLM (zero API cost for green runs)
