vrt-css-fix-loop

Name: Vrt Css Fix Loop
Author: mizchi

// Closed-loop CSS auto-repair. Given a fixture with a known regression (one CSS property or one selector block removed), iterate with a VLM that proposes the missing fix from the diff screenshot, apply it, and re-run until the diff falls below a threshold. Currently scoped to the CSS-challenge fixture set in `src/experiments/css-challenge/`; adapting to an arbitrary repo requires writing a fixture entry. Use when measuring whether a VLM model can recover a known regression, not for production self-healing.

stars:2

forks:0

updated: 19 May 2026, 10:08

related-skills.json

same repository

vrt.md

from "mizchi/vlmkit"

Reference for the `vrt` CLI — Visual Regression Testing combined with accessibility (a11y) semantic verification. Use when running `vrt-test` / `vrt` (Playwright VRT) / `vrt-update` / `vrt compare` / `vrt snapshot`, configuring the fix-loop CSS challenge benchmark, or picking a VLM model for diff analysis.

2026-05-23

vrt-regression-watch.md

from "mizchi/vlmkit"

Run vrt diff in a stateful loop where each run is compared against the previous run's persisted summary, surfacing a `⚠ REGRESSION` banner when the majority of viewports get worse. Designed for periodic CI gates (per-PR or scheduled) where you want "did this change make things worse" as a binary signal, not a one-shot snapshot. Stores summary at `.vrt/last-diff-for-agent.json` by default.

2026-05-19

vrt-visual-diff.md

from "mizchi/vlmkit"

Compare two rendered pages (URL pairs or local HTML) and produce a structured Markdown report tailored for coding agents — pixel diff per viewport, per-section diffRatio, computed-style diff split into universal vs. breakpoint-gated, and worst-viewport screenshot paths. Use when the agent just made a UI change and needs to know whether it altered visible output, where it altered it, and which CSS properties drove the change. Works on a single HTML/URL pair without baseline state.

2026-05-19

vrt-migration-eval.md

from "mizchi/vlmkit"

Evaluate whether a framework / CSS-library / build-system swap (Tailwind → vanilla CSS, reset-css switch, bundler swap, etc.) produced a visually-equivalent result. Three modes — compare (deterministic pixel + CSD), blind (agent runs without baseline reference), subagent (dispatched-agent verification) — let the caller pick the rigour level. Use when the diff is large by construction (the markup was rewritten) and a flat pixel diff would drown the actual regressions in noise.

2026-05-19

vrt-markup-synth.md

from "mizchi/vlmkit"

Five DOM/pixel-based signal tools for markup work — turn a screenshot + HTML scaffold into a converging pixel diff (build component), crop a page screenshot into per-component PNGs (scan component), audit hard-coded values against a design-token scale (check tokens), verify theme parity (check theme — light vs dark render), and stress-test layout under inflated text (stress i18n). All five are pure DOM + Playwright + pixel processing — **no VLM / no API key required**. The agent supplies the markup reasoning; the tool surfaces the signal. Use when you've authored HTML/CSS and want signal back without paying VLM latency or cost.

2026-05-19

agent-validation-loop.md

from "mizchi/vlmkit"

Improve a developer-facing tool (CLI, library, agent harness) by running closed-loop validation with disposable subagents, treating their friction as the deliverable, fixing the friction, and re-running. Use when the user wants to evaluate or harden a tool whose value depends on whether an agent can use it — vrt-like signal loops, agent SDKs, CLI ergonomics, prompt scaffolding, IDE integrations. The loop produces both a stronger tool and a written record (per-run reports + tracked issues) of what was learned and why.

2026-05-16

package.json

"author": "mizchi"

"repository": "mizchi/vlmkit"

vrt-css-fix-loop

fix-loop is the harness behind every VLM benchmark in this repo (docs/reports/2026-05-18-vlm-claude-vs-openrouter-vs-newcomers.md etc.). It takes a fixture, deliberately mutates the CSS (property mode deletes one property; selector mode deletes one selector block), then runs a two-stage AI pipeline to propose a fix from the rendered diff.

The pipeline is VLM → LLM, not VLM alone:

1. Apply current candidate CSS to the variant page.
2. Render baseline + variant; compute pixel diff.
3. Stage 1 (VLM): send the diff overlay to the VLM with a structured "list the changes" prompt; parse a CHANGE list (selector { prop: from → to } rows).
4. Stage 2 (LLM): hand the CHANGE list + current CSS to an LLM (default claude-sonnet-*); the LLM emits the actual CSS edits to apply. The LLM compensates for VLM imprecision: it may rewrite, merge, or discard VLM proposals based on the CSS context.
5. Re-run; stop when diffRatio falls below threshold (= FIXED) or --max-rounds is exhausted.
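
The five steps above can be sketched as a small control loop. This is a minimal TypeScript sketch under stated assumptions, not the harness's actual code: `Stages`, `diff`, `proposeChanges`, `emitFixes`, and `fixLoop` are hypothetical names, and the stages are injected as synchronous functions so the control flow can be read without any model call (real VLM / LLM calls would be asynchronous).

```typescript
// Hypothetical shapes for illustration; the real harness's internals may differ.
type Change = { selector: string; prop: string; from: string; to: string };

interface Stages {
  // Steps 1-2: apply candidate CSS, render baseline + variant, return diffRatio (0..1).
  diff(css: string): number;
  // Step 3, Stage 1 (VLM): diff overlay -> CHANGE list.
  proposeChanges(css: string): Change[];
  // Step 4, Stage 2 (LLM): CHANGE list + current CSS -> edited CSS.
  emitFixes(changes: Change[], css: string): string;
}

interface LoopResult {
  fixed: boolean;
  rounds: number;
  diffRatio: number;
  css: string;
}

function fixLoop(
  stages: Stages,
  initialCss: string,
  threshold = 0.001, // mirrors the documented --threshold default
  maxRounds = 5, // mirrors the documented --max-rounds default
): LoopResult {
  let css = initialCss;
  let diffRatio = stages.diff(css);
  let rounds = 0;
  // Step 5: stop when the diff falls below the threshold or rounds run out.
  while (diffRatio >= threshold && rounds < maxRounds) {
    rounds += 1;
    const changes = stages.proposeChanges(css);
    css = stages.emitFixes(changes, css);
    diffRatio = stages.diff(css);
  }
  return { fixed: diffRatio < threshold, rounds, diffRatio, css };
}
```

Because `fixed` depends only on the final diffRatio, the sketch makes the point of this section visible: nothing in the stop condition inspects whether the CHANGE list was accurate.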

FIXED means "the pipeline converged," not "the VLM understood the diff." It is common for the VLM to propose selectors unrelated to the deleted block while the LLM still emits a working fix. When using this harness to benchmark a VLM model, compare CHANGE-list quality between models and do not read FIXED as a VLM verdict on its own. The banner line `VLM=<id> | LLM=<id>` makes the two stages explicit.

Invocation

fix-loop is run directly from source (not via the vrt CLI):

node --experimental-strip-types src/experiments/css-challenge/fix-loop.ts <flags>

It does not ship in ./dist/vrt.mjs. The source form above is the only supported entry point.

When to use

Evaluating a new VLM model on UI-domain understanding.
Comparing two model providers on the same fixture (controlled benchmark).
Understanding "what kind of fix can a VLM actually propose?" before shipping a self-healing feature.

When NOT to use

Production self-repair on an arbitrary user repo — the harness only knows fixtures registered in css-challenge-fixtures.ts. Adapting to a new repo means writing a fixture entry + goal CSS first.
Bulk regression triage: use vrt-visual-diff for one-shot reads.
Per-PR CI gate: use vrt-regression-watch.

Quickstart

This repo uses direnv; .envrc auto-loads .env.local (where OPENROUTER_API_KEY / ANTHROPIC_API_KEY live). If you're running a shell outside the direnv context, run `set -a; source .env.local; set +a` first.

# Property mode (default): delete one CSS property; pipeline proposes restoration
node --experimental-strip-types src/experiments/css-challenge/fix-loop.ts \
  --fixture page --seed 42

# Selector mode: delete one full selector block (harder)
node --experimental-strip-types src/experiments/css-challenge/fix-loop.ts \
  --fixture page --seed 11 --mode selector --max-rounds 3

# Run with a specific VLM model (any OpenRouter id, `gemini:*`, or `claude:*`).
# The Stage-2 LLM is held constant across VLM swaps so VLM quality
# can be compared apples-to-apples.
VRT_VLM_MODEL="bytedance/ui-tars-1.5-7b" \
  node --experimental-strip-types src/experiments/css-challenge/fix-loop.ts \
  --fixture page --seed 11 --mode selector

Available fixtures

Listed in src/experiments/css-challenge/css-challenge-fixtures.ts. Common entries:

Fixture	Layout	Typical seeds
`page`	README-style article + sidebar	1-99
(others registered in the file)	…	…

The seed maps deterministically to "which property / selector got deleted." Seed 11 in selector mode is the canonical hard case (.readme-body pre losing 6 properties → 4.1% diffRatio) used in VLM benchmarks.
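
To illustrate how a seed can map deterministically to a deleted property or selector block, here is a small sketch. The PRNG (mulberry32), the `Rule` shape, and the `mutate` function are assumptions chosen for illustration; the harness's real seed-to-mutation mapping is not shown in this document and may differ, so seed 11 in this sketch will not necessarily pick `.readme-body pre`.

```typescript
// Hypothetical CSS model: a rule is a selector plus [property, value] pairs.
type Rule = { selector: string; declarations: Array<[string, string]> };
type Mode = "property" | "selector";

// mulberry32: a small deterministic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function mutate(rules: Rule[], seed: number, mode: Mode): { mutated: Rule[]; removed: string } {
  const candidates = rules.filter((r) => r.declarations.length > 0);
  if (candidates.length === 0) throw new Error("no rule with declarations to mutate");
  const rand = mulberry32(seed);
  const pick = (n: number) => Math.floor(rand() * n);
  const target = candidates[pick(candidates.length)];

  if (mode === "selector") {
    // Selector mode: drop the whole block.
    return {
      mutated: rules.filter((r) => r !== target),
      removed: `${target.selector} { ${target.declarations.length} props }`,
    };
  }
  // Property mode: drop a single declaration from the chosen block.
  const declIndex = pick(target.declarations.length);
  const [prop] = target.declarations[declIndex];
  const mutated = rules.map((r) =>
    r === target ? { ...r, declarations: r.declarations.filter((_, j) => j !== declIndex) } : r,
  );
  return { mutated, removed: `${target.selector} { ${prop} }` };
}
```

The same `(rules, seed, mode)` triple always yields the same deletion, which is the property a controlled benchmark needs: two models evaluated on seed 11 face the identical regression.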

VLM model selection

The harness honours VRT_VLM_MODEL. Prefix selects the provider:

Prefix	Provider	Example
(no prefix)	OpenRouter	`bytedance/ui-tars-1.5-7b`
`gemini:`	Google AI	`gemini:gemini-2.5-flash`
`claude:`	Anthropic	`claude:claude-haiku-4-5-20251001`

Current recommendations (from .claude/CLAUDE.md):

Default: bytedance/ui-tars-1.5-7b (UI-domain-trained, ~1.4s single-call, $0/call). Verified FIXED in round 1 on the canonical hard case (seed 11, .readme-body pre {6 props}, 4.1% diff → 0.0%).
Stable / detailed: qwen/qwen3-vl-30b-a3b-instruct (emits hex codes directly).
Baseline fallback: amazon/nova-lite-v1.
High coverage + prose root-cause: claude:claude-haiku-4-5-20251001 (~4.2s single-call, ~$0.002/call; also FIXED in round 1 on seed 11. It works as the Stage-1 VLM despite format divergence because the Stage-2 LLM handles it).

Avoid: meta-llama/llama-4-scout (regressed; verbose), meta-llama/llama-4-maverick (returns "image not available"), google/gemini-2.5-flash-lite (hallucinates uniform deltas).

See docs/reports/2026-05-19-vlm-haiku-vs-uitars.md for the latest 2-way re-bench; docs/reports/2026-05-18-vlm-claude-vs-openrouter-vs-newcomers.md for the earlier 8-way bench.

Flags

Flag	Default	Purpose
`--fixture <name>`	—	Required. Fixture id from `css-challenge-fixtures.ts`
`--seed <int>`	—	Required. Seeds the deterministic mutation
`--mode <property\|selector>`	`property`	Mutation granularity
`--max-rounds <int>`	5	Hard ceiling on iterations
`--threshold <float>`	0.001	diffRatio at which FIXED is declared
`--no-db`	off	Skip writing the benchmark DB row
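
The table above can be read as a small parsing contract. The following is a hedged sketch of that contract only: `parseFlags` and `FixLoopFlags` are hypothetical names, and how fix-loop.ts actually parses its arguments is not shown in this document. The defaults are copied from the table.

```typescript
interface FixLoopFlags {
  fixture: string;
  seed: number;
  mode: "property" | "selector";
  maxRounds: number;
  threshold: number;
  noDb: boolean;
}

// Parses an argv-style array such as ["--fixture", "page", "--seed", "11"].
function parseFlags(argv: string[]): FixLoopFlags {
  const values = new Map<string, string>();
  let noDb = false;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--no-db") {
      noDb = true; // boolean flag, takes no value
      continue;
    }
    if (!arg.startsWith("--")) throw new Error(`unexpected argument: ${arg}`);
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("--")) throw new Error(`missing value for ${arg}`);
    values.set(arg.slice(2), value);
    i += 1; // skip the consumed value
  }

  const fixture = values.get("fixture");
  const seedText = values.get("seed");
  if (fixture === undefined) throw new Error("--fixture is required");
  if (seedText === undefined) throw new Error("--seed is required");
  const seed = Number(seedText);
  if (!Number.isInteger(seed)) throw new Error("--seed must be an integer");

  const mode = values.get("mode") ?? "property";
  if (mode !== "property" && mode !== "selector") throw new Error(`unknown --mode: ${mode}`);

  return {
    fixture,
    seed,
    mode,
    maxRounds: Number(values.get("max-rounds") ?? "5"),
    threshold: Number(values.get("threshold") ?? "0.001"),
    noDb,
  };
}
```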

Environment

Variable	Required when
`VRT_VLM_MODEL`	Optional (a default model is used if unset). Provider is auto-detected from the prefix
`OPENROUTER_API_KEY`	Unprefixed model id
`GEMINI_API_KEY`	`gemini:` prefix
`ANTHROPIC_API_KEY`	`claude:` prefix
`DEBUG_VRT=1`	Optional. Enables verbose VLM round logging
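
The prefix rule and the key requirements above combine into a simple lookup. This is a minimal sketch of that mapping, assuming only what the two tables state; `resolveModel`, `missingKey`, and the `Provider` labels are hypothetical names, not the harness's API.

```typescript
type Provider = "openrouter" | "gemini" | "anthropic";

// API key required per provider, as listed in the Environment table.
const KEY_FOR_PROVIDER: Record<Provider, string> = {
  openrouter: "OPENROUTER_API_KEY",
  gemini: "GEMINI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

// Splits a VRT_VLM_MODEL value into provider + bare model id.
function resolveModel(id: string): { provider: Provider; model: string; requiredKey: string } {
  let provider: Provider = "openrouter"; // no prefix means OpenRouter
  let model = id;
  if (id.startsWith("gemini:")) {
    provider = "gemini";
    model = id.slice("gemini:".length);
  } else if (id.startsWith("claude:")) {
    provider = "anthropic";
    model = id.slice("claude:".length);
  }
  return { provider, model, requiredKey: KEY_FOR_PROVIDER[provider] };
}

// Returns the name of the missing key, or null when the environment is sufficient.
function missingKey(id: string, env: Record<string, string | undefined>): string | null {
  const { requiredKey } = resolveModel(id);
  return env[requiredKey] ? null : requiredKey;
}
```

A pre-flight check like `missingKey` fails fast before any rendering work starts, which matters for batch runs where a missing key would otherwise surface only at the first Stage-1 call.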

Reading the output

Real banner + first round of a real run on seed 11 selector mode:

VLM=bytedance/ui-tars-1.5-7b | LLM=claude-sonnet-4-20250514
Removed block: .readme-body pre { 6 props }

Round 1:
  VLM: 5 changes (3383ms)
    .main         { padding: 16px → 24px }
    .sidebar      { width: 100% → 296px }
    .header-nav   { display: none → flex }
    .header-search{ max-width: none → 320px }
    .tabs         { padding: 0 16px → 0 24px }
  LLM: 6 fixes proposed
  diff: 4.12% → 0.00% (FIXED ✓)

Things to read:

VLM=<id> | LLM=<id> — confirms which two models drove the pipeline.
VLM: N changes (Tms) — VLM stage's wall-clock + the CHANGE list. Per-row format selector { prop: from → to }.
LLM: M fixes proposed — Stage-2 emitted M CSS edits; usually M ≥ N (LLM expands / corrects).
diff: x% → y% — diffRatio before this round's edits vs after.
FIXED ✓ appears when the post-round diffRatio y falls below --threshold (default 0.001, i.e. 0.1% in the percentage form printed here).
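
The per-row format `selector { prop: from → to }` is regular enough to parse with one pattern. The sketch below is an assumption based only on the sample output above; the harness's own parser is not shown in this document, and `parseChangeRows` is a hypothetical name.

```typescript
type Change = { selector: string; prop: string; from: string; to: string };

// Matches rows such as ".tabs { padding: 0 16px → 0 24px }".
// Whitespace before "{" is optional, as in ".header-search{ ... }".
const CHANGE_ROW = /^\s*(.+?)\s*\{\s*([\w-]+)\s*:\s*(.*?)\s*→\s*(.*?)\s*\}\s*$/;

// Extracts CHANGE rows from log text, skipping lines that do not match.
function parseChangeRows(text: string): Change[] {
  const changes: Change[] = [];
  for (const line of text.split("\n")) {
    const m = CHANGE_ROW.exec(line);
    if (m) changes.push({ selector: m[1], prop: m[2], from: m[3], to: m[4] });
  }
  return changes;
}
```

Lines like `VLM: 5 changes (3383ms)` or `Removed block: .readme-body pre { 6 props }` do not match the pattern, so the whole round log can be passed in unfiltered.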

Pipeline divergence warning: if Removed block: names something the VLM never mentions in its CHANGE list, but the pipeline still hits FIXED, the LLM compensated. Treat that run as a win for the pipeline, not for the VLM. To grade the VLM in isolation, score the CHANGE list against the known-removed block (e.g. selector recall, property recall).
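
One way to score the CHANGE list against the known-removed block is sketched below. It is an illustration, not the repo's scoring code: `scoreChangeList` is a hypothetical name, and matching selectors by exact string equality is a simplifying assumption (a real scorer might normalise or partially match selectors).

```typescript
type ProposedChange = { selector: string; prop: string };
type RemovedBlock = { selector: string; props: string[] };

// Scores Stage-1 output in isolation from the Stage-2 LLM.
function scoreChangeList(
  changes: ProposedChange[],
  removed: RemovedBlock,
): { selectorHit: boolean; propertyRecall: number } {
  // Only proposals that target the removed selector can count as property hits.
  const onTarget = changes.filter((c) => c.selector.trim() === removed.selector.trim());
  const proposedProps = new Set(onTarget.map((c) => c.prop));
  const hits = removed.props.filter((p) => proposedProps.has(p)).length;
  return {
    selectorHit: onTarget.length > 0,
    propertyRecall: removed.props.length === 0 ? 0 : hits / removed.props.length,
  };
}
```

Applied to the sample run above, none of the five proposed selectors equals `.readme-body pre`, so the selector is missed and property recall is 0 even though the pipeline reported FIXED.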

A "stalled" run shows diffRatio holding steady across rounds — the VLM's proposals aren't parseable, or the LLM's emitted fixes aren't structurally valid CSS. Set DEBUG_VRT=1 to see both stages' raw output.
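
A stalled run can also be flagged mechanically from the per-round diffRatio values. The sketch below is an assumption for illustration: `isStalled`, the `epsilon` tolerance, and the `window` size are hypothetical and are not flags or functions of the harness.

```typescript
// diffRatios[0] is the diff before any round; diffRatios[i] is the diff after round i.
// A run counts as stalled when each of the last `window` rounds improved by less than `epsilon`.
function isStalled(diffRatios: number[], epsilon = 1e-4, window = 2): boolean {
  if (diffRatios.length < window + 1) return false; // not enough rounds to judge
  for (let i = diffRatios.length - window; i < diffRatios.length; i++) {
    const improvement = diffRatios[i - 1] - diffRatios[i];
    if (improvement >= epsilon) return false;
  }
  return true;
}
```

A check like this only says that the diff stopped moving; the raw stage output from DEBUG_VRT=1 is still needed to tell an unparseable VLM response from invalid CSS emitted by the LLM.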

At the end of the run, a summary table prints one row per round with columns:

Column	Meaning
`Round`	1-indexed iteration number
`Diff`	diffRatio after this round's edits applied
`Changes`	rows in VLM's CHANGE list (Stage 1)
`Fixes`	CSS edits emitted by LLM (Stage 2). Usually `Fixes ≥ Changes`
`Escalated`	`false` if the default LLM tier handled it; `true` if the harness fell back to a higher-capability model. A run with `Escalated=true` cost more — relevant for cross-model benchmarks.

Adapting to a new repo

The harness is fixture-bound — to run on a user repo:

1. Add a fixture entry in css-challenge-fixtures.ts describing the page (HTML + goal CSS + variant CSS template).
2. Confirm that baseline and variant render the same when the seed maps to a no-op (sanity check).
3. Run the loop.

If the new repo is large enough that fixture-style isolation isn't viable, this skill is the wrong tool — use vrt-visual-diff to surface the regression and edit by hand.

Costs (rough)

Based on the 2026-05-18 bench:

bytedance/ui-tars-1.5-7b: ~$0.1e-6 / $0.2e-6 per token (input / output), i.e. about $0.10 / $0.20 per million tokens.
claude:claude-haiku-4-5-*: ~$0.002 / call.

Budget consideration: a 3-round fix-loop on Haiku ≈ $0.006 / run; on ui-tars-1.5-7b ≈ negligible. For batch benchmark runs (>100 calls), prefer the OpenRouter models.
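
The budget arithmetic above is rounds times per-call cost. A minimal sketch, assuming one Stage-1 VLM call per round and ignoring Stage-2 LLM cost (the figures in this section appear to cover the VLM only); `estimateVlmCostUsd` is a hypothetical helper, not part of the harness.

```typescript
// Estimated Stage-1 (VLM) spend in USD for a batch of fix-loop runs.
function estimateVlmCostUsd(runs: number, roundsPerRun: number, costPerCallUsd: number): number {
  return runs * roundsPerRun * costPerCallUsd;
}

// One 3-round run on Haiku at ~$0.002 per call.
const singleRun = estimateVlmCostUsd(1, 3, 0.002);
// A 100-run batch at the same rate.
const batch = estimateVlmCostUsd(100, 3, 0.002);

console.log(`single run: $${singleRun.toFixed(3)}, batch: $${batch.toFixed(2)}`);
```

At these rates a single run costs about $0.006 and a 100-run batch about $0.60, which is why batch benchmarks favour the cheaper OpenRouter models.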
