一键在 Manus 中运行任何 Skill

$pwd:

comfy-qa

Name: Comfy Qa
Author: Comfy-Org

// Generate high-quality narrated QA demo videos for Comfy-Org products. Given a target (PR, issue, feature doc, or product URL), generates a story, Playwright spec, and MP4 video with TTS narration. Enforces user-journey-first authoring: probe → story → videoScript → playwright.

在 Manus 中运行

$ git log --oneline --stat

stars:2

forks:0

updated:2026年4月13日 09:23

SKILL.md

readonly

related-skills.json

同仓库

comfy-qa.md

from "Comfy-Org/Comfy-QA"

Generate high-quality narrated QA demo videos for Comfy-Org products. Takes a PR, issue, feature doc, or product URL as input and produces a story, Playwright spec, and MP4 video with TTS narration and HUD overlay.

2026-05-022

comfy-qa-setup.md

from "Comfy-Org/Comfy-QA"

Set up the comfy-qa skill for a frontend repository. Adds .claude/skills/comfy-qa/SKILL.md, playwright.qa.config.ts, and a starter qa.spec.ts tailored to the repo stack.

2026-04-082

package.json

"author": "Comfy-Org"

"repository": "Comfy-Org/Comfy-QA"

打开 GitHub 仓库查看创作者相关仓库

$ install --global

$ download --local

在 Manus 中运行

$ useful --forSOC

软件质量保证分析师与测试员计算机与数学类职业15-1253L4

一键运行任何 Skill

name	comfy-qa
description	Generate high-quality narrated QA demo videos for Comfy-Org products. Given a target (PR, issue, feature doc, or product URL), generates a story, Playwright spec, and MP4 video with TTS narration. Enforces user-journey-first authoring: probe → story → videoScript → playwright.

comfy-qa

Generate narrated QA demo videos for Comfy-Org products. The full pipeline: probe a live site → write a user-journey story → generate a VIDEO_SCRIPT → translate to Playwright → render MP4 with TTS narration and cursor HUD.

When to use

The user asks for a demo of a Comfy-Org product (or a feature within one)
You're adding a new file to demo/
You're updating an existing demo to cover a new feature
The QA target is a feature, in which case the "story" is "how a real user would actually use this feature"

Workflow (5 phases — do them in order)

1. Probe         → Understand what exists on the live site
2. Story         → Write demo/stories/<product>.story.md (happy paths in plain English)
3. Video Script  → Write VIDEO_SCRIPT constant at top of demo/<product>.spec.ts
4. Review        → Read the video script aloud in your head — does it flow?
5. Playwright    → Translate VIDEO_SCRIPT into createVideoScript() calls below it

Skip no phase. Do not write Playwright code in phase 1–4.

Phase 1 — Probe

Use a throwaway script to discover the page structure. This is the only time you touch a browser before writing code.

cat > /tmp/probe.mjs << 'EOF'
import { chromium } from 'playwright';
const b = await chromium.launch({ headless: true });
const p = await (await b.newContext()).newPage();
await p.goto('https://<target-url>/', { waitUntil: 'domcontentloaded' });
await p.waitForTimeout(3000);
console.log('Title:', await p.title());
console.log('H1:', await p.locator('h1').first().textContent().catch(() => null));
console.log('Visible buttons/links:', await p.$$eval('button, a', els =>
  els.filter(e => e.offsetParent).slice(0, 30).map(e => e.textContent.trim().slice(0,60))));
console.log('Inputs:', await p.$$eval('input', els =>
  els.filter(e => e.offsetParent).map(e => ({type: e.type, ph: e.placeholder}))));
await b.close();
EOF
bun /tmp/probe.mjs

Goal: identify the 5–10 features a real user would actually use, in the order they'd encounter them.

Phase 2 — Story

Write demo/stories/<product>.story.md with the happy path(s) as a plain-English user journey. Treat the user as a real person with a goal, not a feature checklist.

Template:

# <Product Name> — User Story

## Who
A <user persona> (e.g., "a ComfyUI developer looking for an LLM node implementation", "a stakeholder evaluating the platform").

## Goal
What the user wants to accomplish in one sentence.

## Happy path
1. **Land on home page** — The user arrives via a Google search or a link in Discord. They see <hero element> and immediately understand what the site does.
2. **Find the search box** — They notice the prominent search input at the top.
3. **Type their query** — They type "controlnet sdxl" and hit Enter.
4. **Scan results** — Results appear instantly. They see file names, line numbers, and a snippet of matched code.
5. **Click into a result** — They click the most relevant entry to see the full file.
6. ... etc

## Key features to demo
- Search across thousands of repos
- Live result streaming
- Click-through to source
- Repository filter / language filter
- ...

## Implicit narration (what the voiceover should explain)
- WHY the site exists (problem it solves)
- WHO it's for
- HOW it differs from grep/GitHub search
- WHAT the next step is for the viewer

The story is for humans (and future you), not the agent. It captures intent before detail.

User journey constraint (mandatory)

Every story must describe a complete user workflow — not a museum tour. A user journey has:

An entry point — how the user arrives (Google search, Discord link, bookmark)
A concrete goal — "install a ControlNet node" not "explore the page"
Real actions to reach that goal — search, click result, read detail, copy install command
A measurable outcome — the user has accomplished something (copied a command, submitted a form, downloaded a file)

The "Could I follow along?" test

Read your happy path steps aloud. If a viewer could follow the same steps on their own computer and achieve the same outcome, the story passes. If the viewer would just be watching someone hover over things, rewrite it.

BAD: museum tour (DON'T)

1. Land on homepage → hover heading
2. Hover over search bar
3. Hover over nav links
4. Scroll down → hover pagination
5. Hover over login button

GOOD: real workflow (DO)

1. Land on homepage → read the hero
2. Click search bar → type "controlnet" → scan results
3. Click the top result → land on detail page
4. Read install command → click Copy button
5. Scroll to version history → confirm active maintenance
6. Click logo → return to homepage

Phase 3 — Video Script (THE source of truth)

At the top of demo/<product>.spec.ts, define a VIDEO_SCRIPT constant. Every word in narration will be read aloud by Gemini TTS during recording. Write it as a continuous, natural voiceover — not a list of feature labels.

/**
 * VIDEO SCRIPT — every word in `narration` is read aloud verbatim by Gemini TTS.
 *
 * Read this aloud in your head before writing any Playwright code.
 * If it sounds like a robot reading bullet points, rewrite it.
 * If it sounds like a human walking a friend through the site, ship it.
 */
const VIDEO_SCRIPT = [
  {
    kind: "title",
    text: "Comfy Code Search",
    subtitle: "Find any node in seconds",
    durationMs: 2000,
  },
  {
    kind: "segment",
    narration: "Welcome to Comfy Code Search — the fastest way to find code across every ComfyUI custom node ever published.",
    visuals: ["safeMove h1", "safeMove main"],
  },
  {
    kind: "segment",
    narration: "I'm a developer looking for an example of how to load a ControlNet model. Let me click the search box at the top of the page.",
    visuals: ["safeMove input[type=search]"],
  },
  {
    kind: "segment",
    narration: "Now I'm typing 'controlnet loader'. Notice how the results stream in live as I type — no need to hit enter.",
    visuals: ["typeKeys 'controlnet loader'"],
  },
  {
    kind: "segment",
    narration: "Hundreds of matches across different repositories. Each result shows the file name, the matching line, and the surrounding context.",
    visuals: ["wheel +400"],
  },
  // ...more segments
  {
    kind: "outro",
    text: "Comfy Code Search",
    subtitle: "cs.comfy.org",
    durationMs: 2000,
  },
] as const;

Narration writing rules

First person, present tense. "I'm clicking the search box" not "User clicks search box".
Every word of narration must pair with a visual action. If the narrator is talking, the cursor must be moving, hovering, scrolling, or typing. Zero idle frames. A segment where the mouse sits still while the voiceover explains "why this matters" is a failed segment — rewrite it to hover related elements while explaining, or cut it.
Connect segments with transitional phrases. "Let's start by…", "Next…", "While we're here…", "Before we wrap up…", "Finally…". This eliminates the choppy "label-label-label" feel.
Explain WHY, not just WHAT. "Let me search for ControlNet because that's the most common feature people look for" beats "I'm searching for ControlNet."
Sentences read aloud should feel natural at 1× speed. ~140 words per minute is normal speech. A 5-second segment fits ~12 words.
Avoid jargon the target audience wouldn't know unless you immediately explain it.
End with a CTA or summary. The outro should give the viewer somewhere to go next.
If a segment has nothing to show, cut it. Pure commentary segments with no UI to demonstrate should be merged into adjacent segments that do have visual actions. Never write a segment whose visuals hint is just "pause" or "mouse.move center".

`visuals` — light hints only

visuals is a non-executable comment array that documents what should be on screen during this narration. The actual Playwright work happens in phase 5. Keep visuals notes brief — they're just a reminder for the implementer.

Segment classification (mandatory)

Every segment must be one of three types. Annotate each VIDEO_SCRIPT entry with a // TYPE comment.

Type	What happens	`setup` callback	`action` callback	Example
NAVIGATE	Page changes URL	`page.goto()` or nav click	Visual hover on new page	Click node card → detail page loads
INTERACT	User input changes state	Optional	`typeKeys`, `mouse.click`, `keyboard.press`	Type search query, click Copy, select time range
OBSERVE	Read-only viewing	None	`safeMove`, `mouse.wheel`, `mouse.move`	Scroll README, hover metadata

Minimum interaction ratio (non-negotiable)

Rule	Threshold
NAVIGATE + INTERACT segments	≥ 50% of total
OBSERVE-only segments	≤ 30% of total
Consecutive OBSERVE segments	≤ 2 in a row

If your 10-segment script has 7 OBSERVE segments, it's a hover tour. Go back to Phase 2 and redesign the user journey.

Why? Gemini 3.1 Pro scored our existing demos 3.7–6.3/10 because 93.6% of segments were OBSERVE-only.

Example annotated script

const VIDEO_SCRIPT = [
  { kind: "title", ... },
  { kind: "segment", narration: "Welcome to the registry...",              /* OBSERVE */ },
  { kind: "segment", narration: "Let me type controlnet in the search...", /* INTERACT */ },
  { kind: "segment", narration: "Results stream in instantly...",          /* OBSERVE */ },
  { kind: "segment", narration: "I'll click KJNodes to see details...",    /* NAVIGATE */ },
  { kind: "segment", narration: "Here's the install command — Copy...",    /* INTERACT */ },
  { kind: "segment", narration: "Version history shows 37 releases...",    /* OBSERVE */ },
  { kind: "segment", narration: "Clicking the logo takes me home...",      /* NAVIGATE */ },
  { kind: "outro", ... },
] as const;
// Ratio: 2 NAVIGATE + 2 INTERACT + 3 OBSERVE = 57% interactive ✅

Phase 4 — Review the script

Before writing any Playwright code:

Read every narration string aloud in your head. Time yourself with a stopwatch. Each segment should be 4–10 seconds at conversational speed.
Check the flow. Does sentence N+1 follow naturally from sentence N? Or does it jump?
Prune anything redundant. If two consecutive segments say the same thing, merge them.
Verify total length. A good demo is 30–90 seconds. 21 segments × 6 seconds each = 126 seconds (too long). Aim for 6–10 segments unless the product is dense.
Never claim something the demo can't show. If you say "now I'll click the buy button", the visuals had better actually click it.
Count segment types. Tag each segment as NAVIGATE, INTERACT, or OBSERVE. If NAVIGATE + INTERACT < 50%, go back to Phase 2 and add real user actions.
Check for museum-tour streaks. If you have 3+ consecutive OBSERVE segments, merge or replace them with an INTERACT segment.

If the script doesn't pass this review, rewrite it. Do not proceed to Phase 5 with a weak script. A weak script ⇒ a boring video ⇒ wasted Gemini TTS API quota and human review time.

Phase 5 — Playwright implementation

Now translate VIDEO_SCRIPT into actual createVideoScript() chains. The implementation lives below the VIDEO_SCRIPT constant in the same file.

import { test, safeMove } from "../../demo/fixtures/fixture";
import { createVideoScript } from "../../lib/demowright/dist/index.mjs";

const VIDEO_SCRIPT = [/* ...from phase 3... */] as const;

test("comfy code search tour", async ({ page }) => {
  test.setTimeout(5 * 60_000);

  // Pre-navigate (heavy work BEFORE script.render — see hard rules below)
  await page.goto("https://cs.comfy.org/", { waitUntil: "domcontentloaded" });
  await page.waitForTimeout(1500);

  // Translate VIDEO_SCRIPT → createVideoScript builder
  const script = createVideoScript()
    .title(VIDEO_SCRIPT[0].text, {
      subtitle: VIDEO_SCRIPT[0].subtitle,
      durationMs: VIDEO_SCRIPT[0].durationMs,
    })
    .segment(VIDEO_SCRIPT[1].narration, async (pace) => {
      await safeMove(page, "h1");
      await pace();
      await safeMove(page, "main");
      await pace();
    })
    .segment(VIDEO_SCRIPT[2].narration, async (pace) => {
      await safeMove(page, 'input[type="search"]');
      await pace();
    })
    .segment(VIDEO_SCRIPT[3].narration, async (pace) => {
      await typeKeys(page, "controlnet loader", 80);
      await pace();
    })
    // ... etc
    .outro({
      text: VIDEO_SCRIPT[VIDEO_SCRIPT.length - 1].text,
      subtitle: VIDEO_SCRIPT[VIDEO_SCRIPT.length - 1].subtitle,
      durationMs: VIDEO_SCRIPT[VIDEO_SCRIPT.length - 1].durationMs,
    });

  await script.prepare(page);
  await script.render(page, { baseName: "comfy-codesearch", outputDir: ".comfy-qa/videos" });
});

The narration text in .segment(...) calls must reference VIDEO_SCRIPT (e.g., VIDEO_SCRIPT[1].narration) rather than being inlined as string literals. This guarantees the rendered audio matches the script of record.

Coverage scoring (non-negotiable principle)

Never evaluate demo quality as simple "pass/fail". Instead, enumerate every possible operation on the target site and score coverage as a ratio.

How to score

Enumerate all operations — List every CRUD operation, route, UI component, and interactive feature the site exposes. Sources (in priority order):
- OpenAPI spec or TypeDefs (if source code is available)
- YAML checklist (demo/checklists/*.yaml)
- Route/URL enumeration from the live site
- Component/feature counting from demo/stories/*.story.md
Map demo coverage — For each operation, mark whether the demo actually demonstrates it:
- A full CRUD entity = 4 operations (Create, Read, Update, Delete)
- A video showing CRU but not D = 3/4
- A read-only page with 5 sections, demo shows 3 = 3/5
Calculate ratio — demonstrated / total_possible
Report line by line — Never summarize as a single number. Show the full matrix so gaps are visible.

Example

## registry.comfy.org coverage: 14/22 (64%)

| Feature              | C | R | U | D | Notes           |
|----------------------|---|---|---|---|-----------------|
| Search nodes         |   | ✅ |   |   | types + results |
| Node detail page     |   | ✅ |   |   |                 |
| Publish a node       | ❌ | ❌ | ❌ | ❌ | auth required   |
| Filter by OS         |   | ✅ |   |   |                 |
| Sort results         |   | ✅ |   |   |                 |
| Pagination           |   | ✅ |   |   |                 |
| ...                  |   |   |   |   |                 |

This principle applies to all phases: story writing (phase 2 must acknowledge total surface area), video scripting (phase 3 must justify what's omitted), and project status reporting.

Anti-patterns (things that produce bad demos)

These were identified by Gemini 3.1 Pro review of generated videos (avg 3.7–6.3/10). Avoid them.

1. Museum tour — hover everything, touch nothing

// BAD: 50 segments of safeMove, nothing clicked
.segment("Here's the search bar.", async (pace) => {
  await safeMove(page, 'input[placeholder*="Search"]'); await pace();
})
.segment("Here's the login button.", async (pace) => {
  await safeMove(page, 'button:has-text("Login")'); await pace();
})

Fix: Actually USE the search bar. Type a query. Click a result. One INTERACT segment replaces 5 OBSERVE segments.

2. Narrating intent instead of doing it

// BAD: Two segments where one would do
.segment("I'm going to click the search box now.", async (pace) => { ... })
.segment("Now I'm typing controlnet.", async (pace) => { ... })

Fix: One segment: "Let me search for controlnet — the most common query." with typeKeys in the action.

3. "Out of scope" for navigable features

## Out of scope
- Node detail pages (can't navigate inside segments)  ← WRONG

Fix: With setup callbacks, you CAN navigate. Remove artificial scope limits.

4. Commentary without action

// BAD: narrator talks but cursor just sits at center
.segment("The registry is valuable because...", async (pace) => {
  await page.mouse.move(640, 360); await pace();
})

Fix: Merge commentary into an INTERACT segment. Explain WHY while the user is actively doing something.

5. Segments that only wait

// BAD: just waiting for results to appear
.segment("Results appear.", async (pace) => {
  await page.waitForTimeout(1500); await pace();
})

Fix: Merge into the typing segment — results appear as the user types.

Hard rules (non-negotiable)

These are encoded in past commit history and bug reports:

Rule	Why
No `page.goto` inside `.segment()` ACTION callbacks	Heavy nav overruns audio length → silent gaps in final video (demowright #10). Use the `setup` callback instead: `.segment("narration", { setup: async () => { await page.goto(...) }, action: async (pace) => { /* visual only */ } })`
No `page.click()` that triggers navigation inside ACTION callbacks	Same overrun problem. Navigation clicks go in `setup`, visual-only clicks go in `action`.
Use `safeMove(page, sel)` not `moveToEl`	`moveToEl` throws on missing selectors (demowright #2)
Title/outro `durationMs: 2000`	Long enough to read, short enough not to bore
Pre-navigate BEFORE `script.render()`	Wall clock counts toward segment timing once render starts
Always call `script.prepare(page)` first	Pre-fetches ALL TTS in parallel — no API latency mid-test
Only `safeMove` / `mouse.wheel` / `mouse.move` / `hover` / `typeKeys` in segments	These don't call `page.evaluate` which can hang on SSR sites (demowright #3)
No `element.focus()` inside segments	`focus()` calls `page.evaluate()` internally → hangs on many sites. Use `mouse.click(boundingBox)` instead.
For SSR sites that hang anyway (e.g., www.comfy.org)	Block all `<script>` resources via `page.route`, see `comfy-website.spec.ts`
ZERO idle frames — always move the mouse	Every segment must have continuous visual action. Never let the cursor sit still while narrating. If you're explaining something, hover related elements. If there's nothing to show, shorten or cut the segment.

Setup callback patterns (reference)

The setup callback runs BEFORE narration starts — use it for page transitions and heavy operations. The action callback runs DURING narration — visual-only actions here.

Pattern 1: Navigate to a new page

.segment(VIDEO_SCRIPT[N].narration, {
  setup: async () => {
    const link = page.locator('a[href*="/nodes/"]').first();
    const box = await link.boundingBox().catch(() => null);
    if (box) await page.mouse.click(box.x + box.width / 2, box.y + box.height / 2);
    await page.waitForTimeout(3000);
  },
  action: async (pace) => {
    await safeMove(page, "h1"); await pace();
    await safeMove(page, "code, pre"); await pace();
  },
})

Pattern 2: Fill a form and submit

.segment(VIDEO_SCRIPT[N].narration, {
  setup: async () => {
    const input = page.locator("input[placeholder*='URL']").first();
    const box = await input.boundingBox().catch(() => null);
    if (box) await page.mouse.click(box.x + box.width / 2, box.y + box.height / 2);
    await input.fill("https://example.com/image.png");
    const btn = page.locator('button:has-text("Load")').first();
    const btnBox = await btn.boundingBox().catch(() => null);
    if (btnBox) await page.mouse.click(btnBox.x + btnBox.width / 2, btnBox.y + btnBox.height / 2);
    await page.waitForTimeout(3000);
  },
  action: async (pace) => {
    await safeMove(page, "[class*='result']"); await pace();
  },
})

Pattern 3: Same-page button (no setup needed)

// Buttons that don't navigate are safe in action callbacks
.segment(VIDEO_SCRIPT[N].narration, async (pace) => {
  await clickTimeRange(page, "1 Month");
  await page.mouse.move(300, 400); await pace();
  await page.mouse.move(700, 400); await pace();
})

When to use setup vs action

Operation	Where	Why
`page.goto()`	`setup`	Network latency is unpredictable
Click that triggers navigation	`setup`	Page load time varies
`waitForTimeout(>500ms)`	`setup`	Long waits = silent gaps in narration
`typeKeys()` (visual typing)	`action`	User sees typing live
`safeMove()` / `mouse.move()`	`action`	Visual cursor movement
`mouse.click()` on same-page button	`action`	Fast, no navigation
`mouse.wheel()`	`action`	Viewer sees scrolling

Output paths

All generated output goes to .comfy-qa/ (gitignored):

.comfy-qa/02-stories/<product>.story.md   ← Phase 2 output
.comfy-qa/03-spec/<product>.spec.ts       ← Phases 3–5 output
.comfy-qa/01-checklists/<product>.yaml    ← Feature checklist (optional)
.comfy-qa/04-videos/<baseName>.mp4        ← Final video
.comfy-qa/04-videos/tmp/<base>.wav        ← Intermediate audio
.comfy-qa/.tmp/demos/<test>/           ← Playwright artifacts

Shared infrastructure (committed to repo):

demo/fixtures/fixture.ts               ← Playwright fixture (HUD + TTS)
demo/fixtures/mux-reporter.ts          ← Video + audio → MP4 muxer

Reference templates

Spec	Segs	NAVIGATE	INTERACT	OBSERVE	Ratio	Notes
`.comfy-qa/03-spec/download-data.spec.ts`	8	0	5	3	63% ✅	clicks 4 time-range buttons + chart sweep
`.comfy-qa/03-spec/registry-web.spec.ts`	12	3	5	4	67% ✅	search → detail page → back
`.comfy-qa/03-spec/comfy-vibe.spec.ts`	9	2	5	2	78% ✅	sidebar nav + search + filter tabs

Best example: .comfy-qa/03-spec/comfy-vibe.spec.ts — 78% interactive, sidebar click, search, filter, sort, CTA click. Setup pattern example: .comfy-qa/03-spec/registry-web.spec.ts — navigate to detail page and back via setup callback. SSR workaround: .comfy-qa/03-spec/comfy-website.spec.ts — blocks all <script> for Nuxt sites that hang on page.evaluate.

name	comfy-qa
description	Generate high-quality narrated QA demo videos for Comfy-Org products. Given a target (PR, issue, feature doc, or product URL), generates a story, Playwright spec, and MP4 video with TTS narration. Enforces user-journey-first authoring: probe → story → videoScript → playwright.

comfy-qa

When to use

The user asks for a demo of a Comfy-Org product (or a feature within one)
You're adding a new file to demo/
You're updating an existing demo to cover a new feature
The QA target is a feature, in which case the "story" is "how a real user would actually use this feature"

Workflow (5 phases — do them in order)

1. Probe         → Understand what exists on the live site
2. Story         → Write demo/stories/<product>.story.md (happy paths in plain English)
3. Video Script  → Write VIDEO_SCRIPT constant at top of demo/<product>.spec.ts
4. Review        → Read the video script aloud in your head — does it flow?
5. Playwright    → Translate VIDEO_SCRIPT into createVideoScript() calls below it

Skip no phase. Do not write Playwright code in phase 1–4.

Phase 1 — Probe

Use a throwaway script to discover the page structure. This is the only time you touch a browser before writing code.

cat > /tmp/probe.mjs << 'EOF'
import { chromium } from 'playwright';
const b = await chromium.launch({ headless: true });
const p = await (await b.newContext()).newPage();
await p.goto('https://<target-url>/', { waitUntil: 'domcontentloaded' });
await p.waitForTimeout(3000);
console.log('Title:', await p.title());
console.log('H1:', await p.locator('h1').first().textContent().catch(() => null));
console.log('Visible buttons/links:', await p.$$eval('button, a', els =>
  els.filter(e => e.offsetParent).slice(0, 30).map(e => e.textContent.trim().slice(0,60))));
console.log('Inputs:', await p.$$eval('input', els =>
  els.filter(e => e.offsetParent).map(e => ({type: e.type, ph: e.placeholder}))));
await b.close();
EOF
bun /tmp/probe.mjs

Goal: identify the 5–10 features a real user would actually use, in the order they'd encounter them.

Phase 2 — Story

Write demo/stories/<product>.story.md with the happy path(s) as a plain-English user journey. Treat the user as a real person with a goal, not a feature checklist.

Template:

# <Product Name> — User Story

## Who
A <user persona> (e.g., "a ComfyUI developer looking for an LLM node implementation", "a stakeholder evaluating the platform").

## Goal
What the user wants to accomplish in one sentence.

## Happy path
1. **Land on home page** — The user arrives via a Google search or a link in Discord. They see <hero element> and immediately understand what the site does.
2. **Find the search box** — They notice the prominent search input at the top.
3. **Type their query** — They type "controlnet sdxl" and hit Enter.
4. **Scan results** — Results appear instantly. They see file names, line numbers, and a snippet of matched code.
5. **Click into a result** — They click the most relevant entry to see the full file.
6. ... etc

## Key features to demo
- Search across thousands of repos
- Live result streaming
- Click-through to source
- Repository filter / language filter
- ...

## Implicit narration (what the voiceover should explain)
- WHY the site exists (problem it solves)
- WHO it's for
- HOW it differs from grep/GitHub search
- WHAT the next step is for the viewer

The story is for humans (and future you), not the agent. It captures intent before detail.

User journey constraint (mandatory)

Every story must describe a complete user workflow — not a museum tour. A user journey has:

An entry point — how the user arrives (Google search, Discord link, bookmark)
A concrete goal — "install a ControlNet node" not "explore the page"
Real actions to reach that goal — search, click result, read detail, copy install command
A measurable outcome — the user has accomplished something (copied a command, submitted a form, downloaded a file)

The "Could I follow along?" test

BAD: museum tour (DON'T)

1. Land on homepage → hover heading
2. Hover over search bar
3. Hover over nav links
4. Scroll down → hover pagination
5. Hover over login button

GOOD: real workflow (DO)

1. Land on homepage → read the hero
2. Click search bar → type "controlnet" → scan results
3. Click the top result → land on detail page
4. Read install command → click Copy button
5. Scroll to version history → confirm active maintenance
6. Click logo → return to homepage

Phase 3 — Video Script (THE source of truth)

/**
 * VIDEO SCRIPT — every word in `narration` is read aloud verbatim by Gemini TTS.
 *
 * Read this aloud in your head before writing any Playwright code.
 * If it sounds like a robot reading bullet points, rewrite it.
 * If it sounds like a human walking a friend through the site, ship it.
 */
const VIDEO_SCRIPT = [
  {
    kind: "title",
    text: "Comfy Code Search",
    subtitle: "Find any node in seconds",
    durationMs: 2000,
  },
  {
    kind: "segment",
    narration: "Welcome to Comfy Code Search — the fastest way to find code across every ComfyUI custom node ever published.",
    visuals: ["safeMove h1", "safeMove main"],
  },
  {
    kind: "segment",
    narration: "I'm a developer looking for an example of how to load a ControlNet model. Let me click the search box at the top of the page.",
    visuals: ["safeMove input[type=search]"],
  },
  {
    kind: "segment",
    narration: "Now I'm typing 'controlnet loader'. Notice how the results stream in live as I type — no need to hit enter.",
    visuals: ["typeKeys 'controlnet loader'"],
  },
  {
    kind: "segment",
    narration: "Hundreds of matches across different repositories. Each result shows the file name, the matching line, and the surrounding context.",
    visuals: ["wheel +400"],
  },
  // ...more segments
  {
    kind: "outro",
    text: "Comfy Code Search",
    subtitle: "cs.comfy.org",
    durationMs: 2000,
  },
] as const;

Narration writing rules

First person, present tense. "I'm clicking the search box" not "User clicks search box".
Every word of narration must pair with a visual action. If the narrator is talking, the cursor must be moving, hovering, scrolling, or typing. Zero idle frames. A segment where the mouse sits still while the voiceover explains "why this matters" is a failed segment — rewrite it to hover related elements while explaining, or cut it.
Connect segments with transitional phrases. "Let's start by…", "Next…", "While we're here…", "Before we wrap up…", "Finally…". This eliminates the choppy "label-label-label" feel.
Explain WHY, not just WHAT. "Let me search for ControlNet because that's the most common feature people look for" beats "I'm searching for ControlNet."
Sentences read aloud should feel natural at 1× speed. ~140 words per minute is normal speech. A 5-second segment fits ~12 words.
Avoid jargon the target audience wouldn't know unless you immediately explain it.
End with a CTA or summary. The outro should give the viewer somewhere to go next.
If a segment has nothing to show, cut it. Pure commentary segments with no UI to demonstrate should be merged into adjacent segments that do have visual actions. Never write a segment whose visuals hint is just "pause" or "mouse.move center".

`visuals` — light hints only

Segment classification (mandatory)

Every segment must be one of three types. Annotate each VIDEO_SCRIPT entry with a // TYPE comment.

Type	What happens	`setup` callback	`action` callback	Example
NAVIGATE	Page changes URL	`page.goto()` or nav click	Visual hover on new page	Click node card → detail page loads
INTERACT	User input changes state	Optional	`typeKeys`, `mouse.click`, `keyboard.press`	Type search query, click Copy, select time range
OBSERVE	Read-only viewing	None	`safeMove`, `mouse.wheel`, `mouse.move`	Scroll README, hover metadata

Minimum interaction ratio (non-negotiable)

Rule	Threshold
NAVIGATE + INTERACT segments	≥ 50% of total
OBSERVE-only segments	≤ 30% of total
Consecutive OBSERVE segments	≤ 2 in a row

If your 10-segment script has 7 OBSERVE segments, it's a hover tour. Go back to Phase 2 and redesign the user journey.

Why? Gemini 3.1 Pro scored our existing demos 3.7–6.3/10 because 93.6% of segments were OBSERVE-only.

Example annotated script

const VIDEO_SCRIPT = [
  { kind: "title", ... },
  { kind: "segment", narration: "Welcome to the registry...",              /* OBSERVE */ },
  { kind: "segment", narration: "Let me type controlnet in the search...", /* INTERACT */ },
  { kind: "segment", narration: "Results stream in instantly...",          /* OBSERVE */ },
  { kind: "segment", narration: "I'll click KJNodes to see details...",    /* NAVIGATE */ },
  { kind: "segment", narration: "Here's the install command — Copy...",    /* INTERACT */ },
  { kind: "segment", narration: "Version history shows 37 releases...",    /* OBSERVE */ },
  { kind: "segment", narration: "Clicking the logo takes me home...",      /* NAVIGATE */ },
  { kind: "outro", ... },
] as const;
// Ratio: 2 NAVIGATE + 2 INTERACT + 3 OBSERVE = 57% interactive ✅

Phase 4 — Review the script

Before writing any Playwright code:

Read every narration string aloud in your head. Time yourself with a stopwatch. Each segment should be 4–10 seconds at conversational speed.
Check the flow. Does sentence N+1 follow naturally from sentence N? Or does it jump?
Prune anything redundant. If two consecutive segments say the same thing, merge them.
Verify total length. A good demo is 30–90 seconds. 21 segments × 6 seconds each = 126 seconds (too long). Aim for 6–10 segments unless the product is dense.
Never claim something the demo can't show. If you say "now I'll click the buy button", the visuals had better actually click it.
Count segment types. Tag each segment as NAVIGATE, INTERACT, or OBSERVE. If NAVIGATE + INTERACT < 50%, go back to Phase 2 and add real user actions.
Check for museum-tour streaks. If you have 3+ consecutive OBSERVE segments, merge or replace them with an INTERACT segment.

If the script doesn't pass this review, rewrite it. Do not proceed to Phase 5 with a weak script. A weak script ⇒ a boring video ⇒ wasted Gemini TTS API quota and human review time.

Phase 5 — Playwright implementation

Now translate VIDEO_SCRIPT into actual createVideoScript() chains. The implementation lives below the VIDEO_SCRIPT constant in the same file.

import { test, safeMove } from "../../demo/fixtures/fixture";
import { createVideoScript } from "../../lib/demowright/dist/index.mjs";

const VIDEO_SCRIPT = [/* ...from phase 3... */] as const;

test("comfy code search tour", async ({ page }) => {
  test.setTimeout(5 * 60_000);

  // Pre-navigate (heavy work BEFORE script.render — see hard rules below)
  await page.goto("https://cs.comfy.org/", { waitUntil: "domcontentloaded" });
  await page.waitForTimeout(1500);

  // Translate VIDEO_SCRIPT → createVideoScript builder
  const script = createVideoScript()
    .title(VIDEO_SCRIPT[0].text, {
      subtitle: VIDEO_SCRIPT[0].subtitle,
      durationMs: VIDEO_SCRIPT[0].durationMs,
    })
    .segment(VIDEO_SCRIPT[1].narration, async (pace) => {
      await safeMove(page, "h1");
      await pace();
      await safeMove(page, "main");
      await pace();
    })
    .segment(VIDEO_SCRIPT[2].narration, async (pace) => {
      await safeMove(page, 'input[type="search"]');
      await pace();
    })
    .segment(VIDEO_SCRIPT[3].narration, async (pace) => {
      await typeKeys(page, "controlnet loader", 80);
      await pace();
    })
    // ... etc
    .outro({
      text: VIDEO_SCRIPT[VIDEO_SCRIPT.length - 1].text,
      subtitle: VIDEO_SCRIPT[VIDEO_SCRIPT.length - 1].subtitle,
      durationMs: VIDEO_SCRIPT[VIDEO_SCRIPT.length - 1].durationMs,
    });

  await script.prepare(page);
  await script.render(page, { baseName: "comfy-codesearch", outputDir: ".comfy-qa/videos" });
});

Coverage scoring (non-negotiable principle)

Never evaluate demo quality as simple "pass/fail". Instead, enumerate every possible operation on the target site and score coverage as a ratio.

How to score

Enumerate all operations — List every CRUD operation, route, UI component, and interactive feature the site exposes. Sources (in priority order):
- OpenAPI spec or TypeDefs (if source code is available)
- YAML checklist (demo/checklists/*.yaml)
- Route/URL enumeration from the live site
- Component/feature counting from demo/stories/*.story.md
Map demo coverage — For each operation, mark whether the demo actually demonstrates it:
- A full CRUD entity = 4 operations (Create, Read, Update, Delete)
- A video showing CRU but not D = 3/4
- A read-only page with 5 sections, demo shows 3 = 3/5
Calculate ratio — demonstrated / total_possible
Report line by line — Never summarize as a single number. Show the full matrix so gaps are visible.

Example

## registry.comfy.org coverage: 14/22 (64%)

| Feature              | C | R | U | D | Notes           |
|----------------------|---|---|---|---|-----------------|
| Search nodes         |   | ✅ |   |   | types + results |
| Node detail page     |   | ✅ |   |   |                 |
| Publish a node       | ❌ | ❌ | ❌ | ❌ | auth required   |
| Filter by OS         |   | ✅ |   |   |                 |
| Sort results         |   | ✅ |   |   |                 |
| Pagination           |   | ✅ |   |   |                 |
| ...                  |   |   |   |   |                 |

This principle applies to all phases: story writing (phase 2 must acknowledge total surface area), video scripting (phase 3 must justify what's omitted), and project status reporting.

Anti-patterns (things that produce bad demos)

These were identified by Gemini 3.1 Pro review of generated videos (avg 3.7–6.3/10). Avoid them.

1. Museum tour — hover everything, touch nothing

// BAD: 50 segments of safeMove, nothing clicked
.segment("Here's the search bar.", async (pace) => {
  await safeMove(page, 'input[placeholder*="Search"]'); await pace();
})
.segment("Here's the login button.", async (pace) => {
  await safeMove(page, 'button:has-text("Login")'); await pace();
})

Fix: Actually USE the search bar. Type a query. Click a result. One INTERACT segment replaces 5 OBSERVE segments.

2. Narrating intent instead of doing it

// BAD: Two segments where one would do
.segment("I'm going to click the search box now.", async (pace) => { ... })
.segment("Now I'm typing controlnet.", async (pace) => { ... })

Fix: One segment: "Let me search for controlnet — the most common query." with typeKeys in the action.

3. "Out of scope" for navigable features

## Out of scope
- Node detail pages (can't navigate inside segments)  ← WRONG

Fix: With setup callbacks, you CAN navigate. Remove artificial scope limits.

4. Commentary without action

// BAD: narrator talks but cursor just sits at center
.segment("The registry is valuable because...", async (pace) => {
  await page.mouse.move(640, 360); await pace();
})

Fix: Merge commentary into an INTERACT segment. Explain WHY while the user is actively doing something.

5. Segments that only wait

// BAD: just waiting for results to appear
.segment("Results appear.", async (pace) => {
  await page.waitForTimeout(1500); await pace();
})

Fix: Merge into the typing segment — results appear as the user types.

Hard rules (non-negotiable)

These are encoded in past commit history and bug reports:

Rule	Why
No `page.goto` inside `.segment()` ACTION callbacks	Heavy nav overruns audio length → silent gaps in final video (demowright #10). Use the `setup` callback instead: `.segment("narration", { setup: async () => { await page.goto(...) }, action: async (pace) => { /* visual only */ } })`
No `page.click()` that triggers navigation inside ACTION callbacks	Same overrun problem. Navigation clicks go in `setup`, visual-only clicks go in `action`.
Use `safeMove(page, sel)` not `moveToEl`	`moveToEl` throws on missing selectors (demowright #2)
Title/outro `durationMs: 2000`	Long enough to read, short enough not to bore
Pre-navigate BEFORE `script.render()`	Wall clock counts toward segment timing once render starts
Always call `script.prepare(page)` first	Pre-fetches ALL TTS in parallel — no API latency mid-test
Only `safeMove` / `mouse.wheel` / `mouse.move` / `hover` / `typeKeys` in segments	These don't call `page.evaluate` which can hang on SSR sites (demowright #3)
No `element.focus()` inside segments	`focus()` calls `page.evaluate()` internally → hangs on many sites. Use `mouse.click(boundingBox)` instead.
For SSR sites that hang anyway (e.g., www.comfy.org)	Block all `<script>` resources via `page.route`, see `comfy-website.spec.ts`
ZERO idle frames — always move the mouse	Every segment must have continuous visual action. Never let the cursor sit still while narrating. If you're explaining something, hover related elements. If there's nothing to show, shorten or cut the segment.

Setup callback patterns (reference)

The setup callback runs BEFORE narration starts — use it for page transitions and heavy operations. The action callback runs DURING narration — visual-only actions here.

Pattern 1: Navigate to a new page

.segment(VIDEO_SCRIPT[N].narration, {
  setup: async () => {
    const link = page.locator('a[href*="/nodes/"]').first();
    const box = await link.boundingBox().catch(() => null);
    if (box) await page.mouse.click(box.x + box.width / 2, box.y + box.height / 2);
    await page.waitForTimeout(3000);
  },
  action: async (pace) => {
    await safeMove(page, "h1"); await pace();
    await safeMove(page, "code, pre"); await pace();
  },
})

Pattern 2: Fill a form and submit

.segment(VIDEO_SCRIPT[N].narration, {
  setup: async () => {
    const input = page.locator("input[placeholder*='URL']").first();
    const box = await input.boundingBox().catch(() => null);
    if (box) await page.mouse.click(box.x + box.width / 2, box.y + box.height / 2);
    await input.fill("https://example.com/image.png");
    const btn = page.locator('button:has-text("Load")').first();
    const btnBox = await btn.boundingBox().catch(() => null);
    if (btnBox) await page.mouse.click(btnBox.x + btnBox.width / 2, btnBox.y + btnBox.height / 2);
    await page.waitForTimeout(3000);
  },
  action: async (pace) => {
    await safeMove(page, "[class*='result']"); await pace();
  },
})

Pattern 3: Same-page button (no setup needed)

// Buttons that don't navigate are safe in action callbacks
.segment(VIDEO_SCRIPT[N].narration, async (pace) => {
  await clickTimeRange(page, "1 Month");
  await page.mouse.move(300, 400); await pace();
  await page.mouse.move(700, 400); await pace();
})

When to use setup vs action

Operation	Where	Why
`page.goto()`	`setup`	Network latency is unpredictable
Click that triggers navigation	`setup`	Page load time varies
`waitForTimeout(>500ms)`	`setup`	Long waits = silent gaps in narration
`typeKeys()` (visual typing)	`action`	User sees typing live
`safeMove()` / `mouse.move()`	`action`	Visual cursor movement
`mouse.click()` on same-page button	`action`	Fast, no navigation
`mouse.wheel()`	`action`	Viewer sees scrolling

Output paths

All generated output goes to .comfy-qa/ (gitignored):

.comfy-qa/02-stories/<product>.story.md   ← Phase 2 output
.comfy-qa/03-spec/<product>.spec.ts       ← Phases 3–5 output
.comfy-qa/01-checklists/<product>.yaml    ← Feature checklist (optional)
.comfy-qa/04-videos/<baseName>.mp4        ← Final video
.comfy-qa/04-videos/tmp/<base>.wav        ← Intermediate audio
.comfy-qa/.tmp/demos/<test>/           ← Playwright artifacts

Shared infrastructure (committed to repo):

demo/fixtures/fixture.ts               ← Playwright fixture (HUD + TTS)
demo/fixtures/mux-reporter.ts          ← Video + audio → MP4 muxer

Reference templates

Spec	Segs	NAVIGATE	INTERACT	OBSERVE	Ratio	Notes
`.comfy-qa/03-spec/download-data.spec.ts`	8	0	5	3	63% ✅	clicks 4 time-range buttons + chart sweep
`.comfy-qa/03-spec/registry-web.spec.ts`	12	3	5	4	67% ✅	search → detail page → back
`.comfy-qa/03-spec/comfy-vibe.spec.ts`	9	2	5	2	78% ✅	sidebar nav + search + filter tabs

comfy-qa

同仓库更多 Skills

同仓库更多 Skills

comfy-qa

When to use

Workflow (5 phases — do them in order)

Phase 1 — Probe

Phase 2 — Story

User journey constraint (mandatory)

The "Could I follow along?" test

BAD: museum tour (DON'T)

GOOD: real workflow (DO)

Phase 3 — Video Script (THE source of truth)

Narration writing rules

visuals — light hints only

Segment classification (mandatory)

Minimum interaction ratio (non-negotiable)

Example annotated script

Phase 4 — Review the script

Phase 5 — Playwright implementation

Coverage scoring (non-negotiable principle)

How to score

Example

Anti-patterns (things that produce bad demos)

1. Museum tour — hover everything, touch nothing

2. Narrating intent instead of doing it

3. "Out of scope" for navigable features

4. Commentary without action

5. Segments that only wait

Hard rules (non-negotiable)

Setup callback patterns (reference)

Pattern 1: Navigate to a new page

Pattern 2: Fill a form and submit

Pattern 3: Same-page button (no setup needed)

When to use setup vs action

Output paths

Reference templates

comfy-qa

When to use

Workflow (5 phases — do them in order)

Phase 1 — Probe

Phase 2 — Story

User journey constraint (mandatory)

The "Could I follow along?" test

BAD: museum tour (DON'T)

GOOD: real workflow (DO)

Phase 3 — Video Script (THE source of truth)

Narration writing rules

visuals — light hints only

Segment classification (mandatory)

Minimum interaction ratio (non-negotiable)

Example annotated script

Phase 4 — Review the script

Phase 5 — Playwright implementation

Coverage scoring (non-negotiable principle)

How to score

Example

Anti-patterns (things that produce bad demos)

1. Museum tour — hover everything, touch nothing

2. Narrating intent instead of doing it

3. "Out of scope" for navigable features

4. Commentary without action

5. Segments that only wait

Hard rules (non-negotiable)

Setup callback patterns (reference)

Pattern 1: Navigate to a new page

Pattern 2: Fill a form and submit

Pattern 3: Same-page button (no setup needed)

When to use setup vs action

Output paths

Reference templates

`visuals` — light hints only

`visuals` — light hints only