triaging-visual-review-runs

Name: Triaging Visual Review Runs
Author: PostHog

// Inspects PostHog Visual Review (VR) runs that gate PR merges with screenshot regression checks. Use when the user mentions "visual review", "VR", "snapshot diff", "screenshot test", "storybook regression", "playwright snapshot", asks why a PR is blocked or what changed visually, wants to triage the VR backlog, decide whether a snapshot diff is real vs flaky, or check whether a story has been changing across runs. Also invoke when a PR has a failing `visual-review` status check, when a PR comment mentions "Visual review", or when the user is on a branch with an open VR run.

Updated: May 30, 2026 at 13:37

Repository: PostHog/posthog

Triaging visual review runs

Visual Review is PostHog's screenshot-regression product: CI captures storybook + playwright screenshots, diffs them against committed baseline hashes, and gates the PR until a human approves the visible changes. A PR with visual changes carries a visual-review GitHub status check that stays red until each diffed snapshot is approved or tolerated in the VR UI.

This skill teaches an agent to answer the questions a human reviewer would actually ask by chaining the VR MCP tools, instead of reaching for gh pr view and tab-hopping to the VR web UI. The read tools cover status, scope, history, and triage; two write tools (approve-create and tolerate-create) let the agent act once the user confirms.

When this skill applies

Trigger this skill on any of:

A PR number, branch name, or commit SHA paired with words like visual review, VR, snapshot, screenshot, storybook diff, playwright snapshot, baseline, approve, tolerated, quarantine.
Questions about why a PR is blocked, what visually changed, or whether a diff is real.
"Is my run done?" / "What's left to review?" / "Has this story flaked recently?"
A failing visual-review GitHub check or a PR comment from the posthog-bot mentioning visual review.

When the user asks for the rendered diff image itself, the VR web UI is faster — direct them there. This skill is for everything around the diff: status, scope, history, triage.

Tools

Read tools (safe to call freely):

| Tool | Purpose |
| --- | --- |
| `posthog:visual-review-runs-list` | List runs, filter by `pr_number` / `commit_sha` / `branch` / `review_state`. Start here. |
| `posthog:visual-review-runs-retrieve` | Full detail for a single run (status, summary counts, supersession). |
| `posthog:visual-review-runs-snapshots-list` | Per-snapshot results inside a run: identifier, `result`, diff %, classification, baseline + current artifact URLs. |
| `posthog:visual-review-runs-snapshot-history-list` | A single story's last N runs across master/PRs: the flake check. |
| `posthog:visual-review-runs-counts-retrieve` | Aggregate counts for queue triage (how many runs in `needs_review`, etc.). |
| `posthog:visual-review-runs-tolerated-hashes-list` | Hashes the team has explicitly accepted as "known flake / acceptable variation". |
| `posthog:visual-review-repos-list` | Repos (one per GitHub repo); usually only one matters, useful for filtering. |
| `posthog:visual-review-repos-retrieve` | Repo metadata: baseline file paths, PR-comment configuration. |

Write tools (require explicit user confirmation — these ship the visual change):

| Tool | Purpose |
| --- | --- |
| `posthog:visual-review-runs-approve-create` | Approve `changed` / `new` snapshots in a run. Updates the baseline YAML and (by default) pushes a commit to the PR. |
| `posthog:visual-review-runs-tolerate-create` | Mark a single changed snapshot as a known tolerated alternate. Does NOT change the baseline; use for benign variants. |

Approval call shape:

id (required) — the run UUID. It's the route parameter, so the call fails without it.
approve_all: true — approves every changed and new snapshot in the run. Convenient when you've verified every diff is intended.
snapshots: [{identifier, new_hash}] — explicit list. new_hash is the content_hash of each snapshot's current_artifact. Prefer this when only some diffs are intended.
commit_to_github: true (default) — pushes the baseline-YAML update straight to the PR branch. Set false to record the approval without a commit.
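A minimal sketch of assembling these approval arguments from visual-review-runs-snapshots-list output. It assumes each snapshot record is a dict carrying identifier, result, and current_artifact.content_hash as described above; the helper name build_approval_args is hypothetical, and it only builds the argument dict without calling the tool.

```python
from typing import Any, Optional


def build_approval_args(
    run_id: str,
    snapshots: list[dict[str, Any]],
    intended: Optional[set[str]] = None,
    commit_to_github: bool = True,
) -> dict[str, Any]:
    """Build arguments for posthog:visual-review-runs-approve-create.

    Hypothetical helper: with intended=None it approves everything via
    approve_all; otherwise it lists only the intended identifiers explicitly.
    """
    if not run_id:
        raise ValueError("the run id is required: it is the route parameter")
    args: dict[str, Any] = {"id": run_id, "commit_to_github": commit_to_github}
    if intended is None:
        args["approve_all"] = True
        return args
    # Only changed/new snapshots are approvable
    approvable = [s for s in snapshots if s.get("result") in ("changed", "new")]
    missing = intended - {s["identifier"] for s in approvable}
    if missing:
        raise ValueError(f"not changed/new in this run: {sorted(missing)}")
    args["snapshots"] = [
        # new_hash is the content_hash of the snapshot's current artifact
        {"identifier": s["identifier"], "new_hash": s["current_artifact"]["content_hash"]}
        for s in approvable
        if s["identifier"] in intended
    ]
    return args
```

Passing intended=None is equivalent to approve_all: true, so it only makes sense after every diff in the run has been checked.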

Toleration call shape — both fields are required:

id (required) — the run UUID. It's the route parameter, so the call fails without it.
snapshot_id (required) — the UUID of the individual snapshot to tolerate (from visual-review-runs-snapshots-list). This identifies which snapshot inside the run; it does not replace the run id.
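The toleration arguments can be checked the same way. This sketch assumes each record from visual-review-runs-snapshots-list exposes the snapshot UUID under an id field (the field name is an assumption), and build_tolerate_args is a hypothetical helper that guards against the common mix-up of passing the snapshot id where the run id belongs.

```python
from typing import Any


def build_tolerate_args(run_id: str, snapshot: dict[str, Any]) -> dict[str, str]:
    """Build arguments for posthog:visual-review-runs-tolerate-create (hypothetical helper)."""
    snapshot_id = snapshot.get("id")  # assumed field holding the snapshot UUID
    if not run_id or not snapshot_id:
        raise ValueError("both the run id and the snapshot id are required")
    if snapshot_id == run_id:
        # snapshot_id picks a snapshot inside the run; it never replaces the run id
        raise ValueError("snapshot_id must differ from the run id")
    if snapshot.get("result") != "changed":
        # Matches the tool description: tolerate a single *changed* snapshot
        raise ValueError("this sketch only tolerates changed snapshots")
    return {"id": run_id, "snapshot_id": snapshot_id}
```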

If approval fails with 409 stale_run, the run has been superseded: call visual-review-runs-list { pr_number } and approve the newest run instead. A successful approval often kicks off a fresh CI run, which is normal.
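A sketch of that recovery path, under loud assumptions: call_tool is a hypothetical MCP client callable taking a tool name and an argument dict, StaleRunError is a hypothetical exception the client raises on 409 stale_run, and the list response is assumed to carry runs under a results key with ISO-8601 created_at strings. Rather than replaying the old arguments, it hands back the newest run id so its snapshots can be re-listed first.

```python
from typing import Any, Callable

ToolCall = Callable[[str, dict[str, Any]], dict[str, Any]]


class StaleRunError(Exception):
    """Hypothetical: raised by the client when approve-create returns 409 stale_run."""


def newest_run_id(runs: list[dict[str, Any]]) -> str:
    # ISO-8601 timestamps in the same format and timezone sort correctly as strings
    return max(runs, key=lambda r: r["created_at"])["id"]


def approve_or_find_newer(call_tool: ToolCall, args: dict[str, Any], pr_number: int) -> tuple[str, Any]:
    """Try the approval; on stale_run, report the newest run for the PR instead."""
    try:
        return "approved", call_tool("posthog:visual-review-runs-approve-create", args)
    except StaleRunError:
        listing = call_tool("posthog:visual-review-runs-list", {"pr_number": pr_number})
        # The newer run may contain different snapshots, so the caller should
        # re-list them before approving it rather than replaying these args.
        return "superseded", newest_run_id(listing["results"])
```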

Vocabulary cheat sheet

These appear in tool output and matter for interpretation:

Run review_state: needs_review (open, awaiting human), clean (zero diffs), processing (CI still uploading), stale (a newer run on the same PR has superseded this one — check superseded_by_id).
Run run_type: storybook (component snapshots) or playwright (full-page e2e snapshots).
Snapshot result: unchanged, changed (real diff), new (no baseline yet), removed.
Snapshot classification_reason: tolerated_hash (matches a known-tolerated hash, no action needed), below_threshold (under the noise floor), exact (byte-identical), "" (real diff requiring review).
Snapshot review_state: pending or approved.
Run summary: total / changed / new / removed / unchanged / unresolved / tolerated_matched — unresolved is what's actually blocking review.
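The snapshot vocabulary can be turned into a quick filter for "what still needs a human". This hypothetical helper is only an approximation built from the cheat sheet above, not the server's own definition of unresolved; when the two disagree, trust the run summary counts.

```python
from typing import Any

# classification_reason values that, per the cheat sheet, need no action
AUTO_RESOLVED_REASONS = {"tolerated_hash", "below_threshold", "exact"}


def needs_human_look(snapshot: dict[str, Any]) -> bool:
    """Approximate whether a snapshot still needs a reviewer (hypothetical helper)."""
    if snapshot.get("review_state") == "approved":
        return False
    if snapshot.get("result") not in ("changed", "new"):
        return False
    # "" means a real diff; unknown reasons are treated as needing a look too
    return snapshot.get("classification_reason", "") not in AUTO_RESOLVED_REASONS
```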

Workflows

"What's the VR status of this PR?"

The single most common job. Map a PR number to its run state in two calls.

posthog:visual-review-runs-list { pr_number: <n>, limit: 5 } — sort by created_at desc, take the latest non-stale one.
If the run has summary.changed > 0 or summary.unresolved > 0, drill in: posthog:visual-review-runs-snapshots-list { id: <run_id> } and report the changed snapshots.

Report back: PR number, run UUID, review_state, summary counts, and the _posthogUrl deep link so the user can click straight to the diff viewer.
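The run-selection step can be sketched as two small functions. This assumes each run returned by visual-review-runs-list is a dict with id, created_at (ISO-8601 in one format and timezone), review_state, optional superseded_by_id, and a summary dict; the helper names are hypothetical.

```python
from typing import Any, Optional


def latest_live_run(runs: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Newest run that has not been superseded (hypothetical helper)."""
    live = [
        r for r in runs
        if r.get("review_state") != "stale" and not r.get("superseded_by_id")
    ]
    return max(live, key=lambda r: r["created_at"], default=None)


def should_drill_in(run: dict[str, Any]) -> bool:
    """Drill into snapshots when anything changed or is still unresolved."""
    summary = run.get("summary", {})
    return summary.get("changed", 0) > 0 or summary.get("unresolved", 0) > 0
```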

"Is the diff real or unrelated?"

The most useful judgment a code-aware agent can add. Combine three signals: scope match, flake history, and the actual rendered images. The agent should look at the screenshots — not just describe metadata.

Scope check — git diff master...HEAD --stat (or against the PR's base branch) → list of touched paths. Cross-reference with posthog:visual-review-runs-snapshots-list { id } filtered to result: changed → story identifiers. Stories are namespaced like <area>-<scene>--<story>--<theme>; e.g. scenes-app-settings-user--settings-user-profile--dark maps to frontend/src/scenes/settings/user/.... Use this to translate story id → likely source path.
Visual inspection — for each changed snapshot, the tool result contains current_artifact.download_url and baseline_artifact.download_url. These are pre-signed S3 URLs to PNG files; pull them and look:
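```
# -f makes curl exit non-zero on an HTTP error (e.g. an expired pre-signed URL)
# instead of saving the error response body as a .png; -S still prints the error
curl -sSf -o /tmp/vr-baseline.png "<baseline_artifact.download_url>"
curl -sSf -o /tmp/vr-current.png "<current_artifact.download_url>"
```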
Then Read both files (the Read tool renders images visually) and compare. Things to call out:
- The actual visible delta (text changed, button moved, layout shift, color drift, missing element).
- Whether the change is consistent with the diff_pixel_count and diff_percentage in the metadata (e.g. 54% diff but the images look near-identical → screenshot framing changed, not the UI).
- Whether the baseline and current have different dimensions (width / height fields). Mismatched dimensions usually mean the story rendered to a different viewport or didn't fully render before screenshot — a flake signal, not a regression.
Flake history — run the flake check below for any story that looks suspect.
Verdict — combine all three:
- Scope plausible + visible regression matches the code change → real diff, recommend approval.
- Scope mismatch + dimensions mismatch + frequent prior changes → flake, recommend tolerating the hash.
- Scope plausible + visible regression looks unintended → push a fix; do not approve.
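The scope check and the verdict table can be sketched together. Touched paths would come from something like git diff --name-only master...HEAD. The token-overlap rule, the GENERIC_TOKENS set, and every function name here are hypothetical heuristics built on the <area>-<scene>--<story>--<theme> naming described above; the story-to-path mapping in real repos is not always this regular.

```python
from pathlib import PurePosixPath

# Title tokens too generic to count as evidence (an illustrative choice)
GENERIC_TOKENS = {"app", "scenes", "components", "lib"}


def parse_story_id(identifier: str) -> tuple[str, str, str]:
    """Split '<area>-<scene>--<story>--<theme>' into (title, story, theme)."""
    parts = identifier.split("--")
    parts += [""] * (3 - len(parts))  # tolerate ids without a story or theme
    return parts[0], parts[1], parts[2]


def scope_plausible(identifier: str, touched_paths: list[str], min_overlap: int = 2) -> bool:
    """True if some touched path's directories share enough tokens with the story title."""
    title, _, _ = parse_story_id(identifier)
    tokens = {t for t in title.lower().split("-") if t and t not in GENERIC_TOKENS}
    if not tokens:
        return False  # nothing specific to match on
    needed = min(min_overlap, len(tokens))
    for path in touched_paths:
        dirs = {part.lower() for part in PurePosixPath(path).parent.parts}
        if len(tokens & dirs) >= needed:
            return True
    return False


def verdict(scope_ok: bool, dims_match: bool, frequent_prior_changes: bool, looks_intended: bool) -> str:
    """Combine the three signals following the verdict list above."""
    if not scope_ok and not dims_match and frequent_prior_changes:
        return "flake: recommend tolerating the hash"
    if scope_ok and looks_intended:
        return "real diff: recommend approval"
    if scope_ok:
        return "unintended regression: push a fix, do not approve"
    return "unclear: look at the images and history again"
```

The looks_intended flag still has to come from actually viewing the two PNGs; the code only encodes how the signals combine.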

Always include a one-line description of what you saw in the images — the user uses this to decide whether to trust your verdict without opening the VR UI themselves.
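To back that one-line description with hard numbers, the image dimensions can be read straight from the downloaded files. This sketch relies only on the PNG layout (an 8-byte signature, then the IHDR chunk whose data starts with big-endian width and height), so it needs no imaging library; the helper names are hypothetical, and the file paths would be the /tmp/vr-*.png files from the curl step.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG's leading IHDR chunk."""
    # Layout: signature (8) + chunk length (4) + b"IHDR" (4) + width (4) + height (4)
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG; check the download")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def compare_sizes(baseline_path: str, current_path: str) -> str:
    with open(baseline_path, "rb") as f:
        base = png_size(f.read(24))
    with open(current_path, "rb") as f:
        cur = png_size(f.read(24))
    if base == cur:
        return f"same size {base[0]}x{base[1]}"
    return f"size changed {base[0]}x{base[1]} -> {cur[0]}x{cur[1]} (flake signal)"
```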

Flake check: "Has this story been changing?"

Once you have a suspect snapshot identifier:

posthog:visual-review-runs-snapshot-history-list { id: <snapshot_id> } → returns prior outcomes for the same story.

Verdicts:

Mostly unchanged, with this run's diff as the outlier → likely a real regression caused by this PR.
Frequently changed across unrelated branches and master → flaky story; recommend tolerating the hash (via posthog:visual-review-runs-tolerate-create once the user confirms, or in the VR UI).
A recent removed result or a large jump in dimensions → the baseline is likely stale; recommend re-baselining on master.
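The three verdicts can be sketched as a classifier over the history output. The entry fields (result, branch, width, height), the newest-first ordering with the run under review at index 0, and the 0.3 change rate and 1.5x size jump are all assumptions chosen for illustration, not PostHog's thresholds.

```python
from typing import Any


def classify_history(history: list[dict[str, Any]], flaky_rate: float = 0.3) -> str:
    """Map a story's outcomes to the verdicts above (hypothetical helper)."""
    if len(history) < 2:
        return "not enough history: judge from scope and images"
    prior = history[1:]  # exclude the run under review
    if any(h.get("result") == "removed" for h in prior[:3]):
        return "baseline likely stale: re-baseline on master"
    sizes = [(h["width"], h["height"]) for h in history[:4] if h.get("width") and h.get("height")]
    for dim in (0, 1):  # 0 = width, 1 = height
        values = [size[dim] for size in sizes]
        if len(values) >= 2 and max(values) >= 1.5 * min(values):
            return "baseline likely stale: re-baseline on master"
    changed = [h for h in prior if h.get("result") == "changed"]
    branches = {h.get("branch") for h in changed}
    # "Frequent" and "across unrelated branches": enough changes on 2+ branches
    if len(changed) / len(prior) >= flaky_rate and len(branches) >= 2:
        return "flaky story: recommend tolerating the hash"
    return "outlier: likely a real regression from this PR"
```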

Triaging the queue

When the user is doing housekeeping rather than asking about a specific PR:

posthog:visual-review-runs-counts-retrieve → total queue size.
posthog:visual-review-runs-list { review_state: needs_review, limit: 50 } (paginate if needed).
Group by branch author or run_type to surface clusters (e.g., "12 PRs blocked on the same shared component change" usually means a single underlying root cause to address).
Prefer surfacing runs whose summary.changed > 0 over runs that are only new — new means no baseline yet, which is usually trivial to approve; changed is the real review work.
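A sketch of that grouping and prioritisation over visual-review-runs-list results. It groups by run_type by default; grouping by branch author would need whatever author field the API actually returns, which is not documented here, so pass that field name as key. The triage helper itself is hypothetical.

```python
from collections import defaultdict
from typing import Any


def triage(runs: list[dict[str, Any]], key: str = "run_type") -> dict[str, list[dict[str, Any]]]:
    """Group needs_review runs by `key`, real review work first (hypothetical helper)."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for run in runs:
        if run.get("review_state") == "needs_review":
            groups[str(run.get(key, "unknown"))].append(run)
    for group in groups.values():
        # Runs with changed > 0 first (largest first); new-only runs sink to the end
        group.sort(key=lambda r: -r.get("summary", {}).get("changed", 0))
    # Largest clusters first: many runs in one group often share a root cause
    return dict(sorted(groups.items(), key=lambda kv: -len(kv[1])))
```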

Output expectations

For PR-status questions, lead with the verdict in one line, then 2-4 bullets of supporting context. Always include the _posthogUrl deep link to the run: the user makes the final call and should be one click away from the rendered diff.

For triage / aggregate questions, a short table beats prose. Group by what the user is going to act on.
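For the PR-status shape, a hypothetical formatter shows one way to get a verdict line plus three bullets. It assumes the run dict carries id, run_type, review_state, summary, and _posthogUrl as listed earlier; the verdict text itself comes from the triage work, not from this function.

```python
from typing import Any


def pr_status_report(pr_number: int, run: dict[str, Any], verdict: str) -> str:
    """Render a one-line verdict plus supporting bullets (hypothetical helper)."""
    s = run.get("summary", {})
    counts = ", ".join(
        f"{k} {s.get(k, 0)}" for k in ("changed", "new", "removed", "unresolved", "tolerated_matched")
    )
    lines = [
        f"PR #{pr_number}: {verdict}",
        f"- Run {run['id']} ({run.get('run_type', 'unknown')}), review_state: {run['review_state']}",
        f"- Summary: {counts}",
        f"- Diff viewer: {run['_posthogUrl']}",
    ]
    return "\n".join(lines)
```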

What NOT to do

Do not approve or tolerate without explicit user confirmation. The verdict is yours to recommend; the decision to ship belongs to the user. Once they say "approve those" / "tolerate that", call the tool.
Do not assume the failing GitHub check on a PR is unrelated to VR — if a visual-review check is red on a PR you're working on, that's the trigger to run this skill.
Do not declare a verdict from metadata alone when result: changed. Pull the baseline and current PNGs and look at them; metadata can only say "something changed", not whether the change is intended.
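The confirmation rule can also be enforced mechanically. In this sketch call_tool is a hypothetical MCP client callable and guarded_call a hypothetical wrapper; only the two write-tool names come from the tables above. Read tools pass straight through, while write tools raise unless the user has explicitly confirmed.

```python
from typing import Any, Callable

WRITE_TOOLS = {
    "posthog:visual-review-runs-approve-create",
    "posthog:visual-review-runs-tolerate-create",
}


def guarded_call(
    call_tool: Callable[[str, dict[str, Any]], Any],
    name: str,
    args: dict[str, Any],
    user_confirmed: bool = False,
) -> Any:
    """Refuse to ship a visual change unless the user explicitly confirmed it."""
    if name in WRITE_TOOLS and not user_confirmed:
        raise PermissionError(f"{name} needs explicit user confirmation")
    return call_tool(name, args)
```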
