Name: Product Demo
Author: jiweiyuan

Updated: February 21, 2026 at 23:00

Repository: jiweiyuan/ttscli

name	product-demo
description	Create a polished product demo video with motion graphics intro, narrated audio, and terminal recordings. Use when the user asks to build a demo video, product walkthrough, or promotional clip for a CLI tool or software project.

Product Demo Video — End-to-End Skill

Build a professional product demo video combining narrated audio (TTS), motion graphics (Remotion), and terminal recordings (VHS). The final output is a single .mp4 with synced audio.

Architecture Overview

A demo video has three layers, produced and combined in a four-stage pipeline:

1. Narration Audio   →  TTS CLI generates speech from scripts
2. Motion Graphics   →  Remotion renders animated intro/transitions
3. Terminal Demos    →  VHS records scripted terminal sessions
4. Assembly          →  ffmpeg concatenates video + merges audio

Directory structure:

demo/
├── build.sh                  # Master build script (orchestrates everything)
├── build_narration.sh        # Narration pipeline: TTS → scribe → cues
├── narration_script.md       # Narration plan & source file list
├── transcript.md             # Final transcript with timestamps & beat markers
├── narration/                # Per-beat narration scripts (one sentence each)
│   ├── manifest.json         # Beat manifest (id, sequence, role, beatIndex, script)
│   ├── 01_hook.txt           # Act 1 beats (story)
│   ├── 02_stars.txt
│   ├── ...
│   ├── 08_engine.txt         # Act 2 beats (tech)
│   ├── ...
│   ├── 12_voice_cloning.txt  # Act 3 beats (features)
│   ├── ...
│   ├── 18_demo_say.txt       # Act 4 beats (demo)
│   ├── ...
│   └── 23_closing.txt        # Act 5 beats (cta)
├── terminal_voices.tape      # VHS tape: install & setup
├── terminal_speech.tape      # VHS tape: voice cloning & speech
├── terminal_config.tape      # VHS tape: generate, export & workflow
├── ttscli_demo.tape          # VHS tape: full demo (alternative single-take)
├── intro/                    # Remotion project
│   ├── package.json
│   ├── remotion.config.ts
│   ├── tsconfig.json
│   ├── tailwind.config.js
│   ├── public/               # Static assets (audio, images)
│   │   └── ttscli_intro.wav
│   └── src/
│       ├── Root.tsx           # Remotion entry — registers compositions
│       ├── TtsIntro.tsx       # Main composition — scene sequencing
│       ├── design.ts          # Shared palette, fonts, shadows
│       ├── narrationCues.ts   # Auto-generated timing constants from scribe
│       ├── index.ts
│       ├── style.css
│       ├── scenes/            # One component per visual act
│       │   ├── OpenClawStory.tsx     # Act 1: AI Agent story (6 beats)
│       │   ├── HowItWorks.tsx        # Act 2: Engine, backends, install (4 beats)
│       │   ├── FeatureHighlights.tsx  # Act 3: 6 unique feature beats
│       │   ├── LiveDemo.tsx          # Act 4: Persistent terminal (3 beats)
│       │   └── CallToAction.tsx      # Act 5: GitHub CTA + logo lock (3 beats)
│       └── effects/           # Reusable visual effects
│           ├── Backdrop.tsx
│           ├── RhythmOverlay.tsx
│           ├── TerminalChrome.tsx   # Shared terminal window chrome
│           └── Waveform.tsx         # Animated waveform SVG
└── out/                       # Build artifacts (gitignored)
    ├── ttscli_demo.mp4
    ├── intro.mp4
    ├── terminal1.mp4
    ├── terminal2.mp4
    ├── terminal3.mp4
    ├── narration/             # Per-segment WAV files
    │   ├── 01_title.wav
    │   ├── 02_tech.wav
    │   └── ...
    ├── narration_transcripts/ # Scribe JSON outputs per segment
    │   ├── 01_title.json
    │   ├── 02_tech.json
    │   └── ...
    └── narration_timestamps.json  # Combined timeline with all beat markers

Step 1: Write the Narration Script

Plan the story arc first. A good demo narration follows this structure:

| Act | Purpose | Beats | Duration |
|---|---|---:|---:|
| Story / Hook | Grab attention, establish the problem | 5–7 | 12–18s |
| How It Works | Engine, backends, install | 3–5 | 10–14s |
| Feature Highlights | One unique visual per feature | 4–6 | 14–20s |
| Live Demo | Terminal with accumulating commands | 3 | 8–12s |
| CTA | GitHub link + logo lock | 2–3 | 6–10s |

Guidelines

Fast-cut structure: ~2–4 seconds per beat, one sentence each.
Split into separate text files per beat (easier to re-generate individually).
Aim for 60–80 seconds total — attention drops fast.
Each act = one Remotion scene component with internal <Sequence> per beat.

Beat manifest (`narration/manifest.json`)

Define all beats, their ordering, and their sequence grouping:

{
  "fps": 30,
  "segments": [
    { "id": "01_hook",     "sequence": "story",    "role": "beat", "beatIndex": 0, "script": "01_hook.txt" },
    { "id": "02_agents",   "sequence": "story",    "role": "beat", "beatIndex": 1, "script": "02_stars.txt" },
    ...
    { "id": "07_engine",   "sequence": "tech",     "role": "beat", "beatIndex": 0, "script": "08_engine.txt" },
    ...
    { "id": "22_closing",  "sequence": "cta",      "role": "beat", "beatIndex": 2, "script": "23_closing.txt" }
  ]
}

Fields:

id — Unique segment identifier (used as filename for WAV + scribe JSON)
sequence — Groups beats into Remotion scenes (story, tech, features, demo, cta)
role — Always "beat" in the fast-cut architecture
beatIndex — Zero-based index within the sequence (drives internal <Sequence> positioning)
script — Filename of the narration text file in narration/
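
The field rules above can be checked mechanically before any audio is generated. The following is a minimal sketch, not code from the repo: `validateManifest`, `ManifestSegment`, and `BeatManifest` are hypothetical names, and the check assumes beats are listed in playback order so that each sequence's `beatIndex` values count up from zero.

```typescript
// Hypothetical types mirroring the manifest fields described above.
interface ManifestSegment {
  id: string;
  sequence: string;
  role: string;
  beatIndex: number;
  script: string;
}

interface BeatManifest {
  fps: number;
  segments: ManifestSegment[];
}

// Returns a list of human-readable problems; an empty list means the manifest passed.
function validateManifest(manifest: BeatManifest): string[] {
  const problems: string[] = [];
  const seenIds = new Set<string>();
  const nextIndex = new Map<string, number>(); // expected beatIndex per sequence

  for (const seg of manifest.segments) {
    if (seenIds.has(seg.id)) {
      problems.push(`duplicate id: ${seg.id}`);
    }
    seenIds.add(seg.id);

    const expected = nextIndex.get(seg.sequence) ?? 0;
    if (seg.beatIndex !== expected) {
      problems.push(`${seg.id}: beatIndex ${seg.beatIndex}, expected ${expected}`);
    }
    nextIndex.set(seg.sequence, expected + 1);
  }
  return problems;
}
```

Running a check like this before the narration build catches a misnumbered beat early, when fixing it costs nothing.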

Generate audio with TTS CLI

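# Generate per-segment audio (per-segment WAVs live in demo/out/narration/)
mkdir -p demo/out/narration
tts generate "Hey, meet TTS CLI, a text-to-speech tool that runs entirely on your machine." \
  --voice james -o demo/out/narration/01_title.wav

# Or from a file
tts generate --file demo/narration/01_title.txt --voice james -o demo/out/narration/01_title.wav

Concatenate segments with ffmpeg

# Build a concat list. ffmpeg resolves relative entries against the list file's
# directory, so write absolute paths. [0-9]* also matches segments 10 and up,
# which 0* would skip.
for f in demo/out/narration/[0-9]*.wav; do echo "file '$PWD/$f'"; done > demo/out/concat.txt

# Concatenate
ffmpeg -f concat -safe 0 -i demo/out/concat.txt -c copy demo/out/narration.wav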

Extract timestamps with Scribe

After generating audio, use scribe to transcribe each segment and extract precise timestamps. Scribe is a CLI that calls the ElevenLabs transcription API and returns word-level timing data — this is what drives the Remotion animation timeline.

Setup

# Install (Node.js CLI)
npm install -g scribe-cli

# Authenticate with ElevenLabs API key (one-time)
scribe auth

Transcribe individual segments

# Transcribe a single audio segment to JSON (includes duration + word timestamps)
scribe transcribe demo/out/narration/01_title.wav -f json -o demo/out/narration_transcripts/

# Output formats: json, md, txt, srt, all
scribe transcribe demo/out/narration/01_title.wav -f all -o demo/out/narration_transcripts/

# Print to stdout instead of file
scribe transcribe demo/out/narration/01_title.wav -f json --stdout

Scribe CLI options

| Flag | Description |
|---|---|
| `-f, --format <type>` | Output format: `json`, `md`, `txt`, `srt`, `all` (default: `json`) |
| `-o, --output-dir <dir>` | Output directory (default: `.`) |
| `-d, --diarize` | Enable speaker diarization |
| `-s, --speakers <count>` | Speaker count hint (1–32) |
| `-l, --language <code>` | Language code (ISO-639, e.g. `en`, `zh`) |
| `--stdout` | Print to stdout instead of writing file |
| `-q, --quiet` | Suppress progress output |

JSON output structure

Scribe JSON output contains the metadata needed for timeline sync:

{
  "text": "Meet TTS CLI, a fully local text-to-speech toolkit...",
  "metadata": {
    "duration": 15.30,
    "language": "en"
  },
  "words": [
    { "word": "Meet", "start": 0.0, "end": 0.32, "confidence": 0.98 },
    { "word": "TTS", "start": 0.35, "end": 0.72, "confidence": 0.95 },
    ...
  ]
}

Key fields:

metadata.duration — exact segment length in seconds (more accurate than ffprobe for timing)
text — verified transcript (catches TTS mispronunciations)
words[].start / words[].end — word-level timestamps for fine-grained sync
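
For per-word animation, the word timestamps have to be converted from seconds into frame numbers on the composition timeline. The sketch below shows one way to do that; `ScribeWord` copies the fields from the sample output above, while `wordsToFrames` and its `segmentStartFrame` parameter are hypothetical names introduced for illustration.

```typescript
// Shape of one entry in the scribe `words` array (fields taken from the sample above).
interface ScribeWord {
  word: string;
  start: number; // seconds from the start of the segment
  end: number; // seconds from the start of the segment
  confidence: number;
}

interface WordFrames {
  word: string;
  startFrame: number;
  endFrame: number;
}

// Convert word-level timestamps to frame numbers, shifted by the segment's
// start frame so the result is relative to the whole composition.
function wordsToFrames(
  words: ScribeWord[],
  fps: number,
  segmentStartFrame = 0,
): WordFrames[] {
  return words.map((w) => ({
    word: w.word,
    startFrame: segmentStartFrame + Math.round(w.start * fps),
    endFrame: segmentStartFrame + Math.round(w.end * fps),
  }));
}

const wordFrames = wordsToFrames(
  [
    { word: "Meet", start: 0.0, end: 0.32, confidence: 0.98 },
    { word: "TTS", start: 0.35, end: 0.72, confidence: 0.95 },
  ],
  30,
);
console.log(wordFrames);
```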

Batch transcription in the narration pipeline

The demo/build_narration.sh script automates scribe across all segments:

# Transcribe each segment, extract duration + text, accumulate running offset
for id in "${segment_ids[@]}"; do
  tts generate --file "$script_path" --output "$wav_path" --model "$MODEL"

  if [[ "$RUN_SCRIBE" == "1" ]]; then
    scribe transcribe "$wav_path" -f json -o "$TRANS_DIR"
    duration="$(jq -r '.metadata.duration' "$TRANS_DIR/$id.json")"
    text="$(jq -r '.text' "$TRANS_DIR/$id.json")"
  else
    # Fallback: ffprobe for duration, source script for text
    duration="$(ffprobe -v error -show_entries format=duration \
      -of default=nokey=1:noprint_wrappers=1 "$wav_path")"
    text="$(cat "$script_path")"
  fi

  # Compute frame offset: start_frame = running_seconds × fps
  start_frame=$(awk "BEGIN { printf \"%d\", $running_sec * 30 + 0.5 }")
  # ... accumulate into timeline JSON
done

Control with environment variable:

RUN_SCRIBE=1 ./build_narration.sh   # Use scribe (default) — accurate timestamps
RUN_SCRIBE=0 ./build_narration.sh   # Skip scribe — use ffprobe fallback (offline/faster)

From scribe output → beat markers → Remotion cues

The pipeline converts scribe timestamps into three artifacts:

1. Timeline JSON (demo/out/narration_timestamps.json):

{
  "fps": 30,
  "total_seconds": 84.80,
  "total_frames": 2544,
  "segments": [
    {
      "id": "01_title",
      "sequence": "title",
      "text": "Meet TTS CLI...",
      "start_sec": 0.0,
      "end_sec": 15.30,
      "start_frame": 0,
      "end_frame": 459,
      "duration_frames": 459
    },
    ...
  ]
}

2. Transcript markdown (demo/transcript.md):

| Segment | Start | End | Frame | Text |
|---|---:|---:|---:|---|
| 01_title | 0.00s | 15.30s | 0 | Meet TTS CLI... |
| 02_tech | 15.30s | 38.36s | 459 | Under the hood... |

Frame number = start_seconds × 30 (at 30fps).
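
The same arithmetic, applied across all segments, yields the timeline entries shown earlier. This is a minimal sketch of that accumulation rather than the logic of `build_narration.sh` itself; `buildTimeline`, `TimedSegment`, and `TimelineEntry` are hypothetical names, and the output field names follow the timeline JSON sample.

```typescript
interface TimedSegment {
  id: string;
  durationSec: number;
}

interface TimelineEntry {
  id: string;
  start_sec: number;
  end_sec: number;
  start_frame: number;
  end_frame: number;
  duration_frames: number;
}

// Accumulate a running offset in seconds and derive frame numbers from it.
function buildTimeline(segments: TimedSegment[], fps: number): TimelineEntry[] {
  let runningSec = 0;
  return segments.map((seg) => {
    const startSec = runningSec;
    const endSec = startSec + seg.durationSec;
    runningSec = endSec;
    const startFrame = Math.round(startSec * fps);
    const endFrame = Math.round(endSec * fps);
    return {
      id: seg.id,
      start_sec: startSec,
      end_sec: endSec,
      start_frame: startFrame,
      end_frame: endFrame,
      duration_frames: endFrame - startFrame,
    };
  });
}
```

Frames are derived from the cumulative seconds instead of summing per-segment rounded frame counts, which keeps rounding errors from piling up over a long timeline.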

3. Remotion narration cues (demo/intro/src/narrationCues.ts):

// Auto-generated from scribe transcription timestamps
export const narrationCues = {
  fps: 30,
  totalFrames: 1845, // = cta.from + cta.duration
  scenes: {
    story:    { from: 0,    duration: 480, beatDurations: [75, 90, 105, 60, 75, 75] },
    tech:     { from: 480,  duration: 330, beatDurations: [90, 90, 90, 60] },
    features: { from: 810,  duration: 510, beatDurations: [90, 90, 90, 90, 75, 75] },
    demo:     { from: 1320, duration: 285, beatDurations: [105, 90, 90] },
    cta:      { from: 1605, duration: 240, beatDurations: [75, 75, 90] },
  },
} as const;

Each scene has a uniform shape: from (start frame), duration (total frames), and beatDurations[] (per-beat frame counts). The file is auto-generated by build_narration.sh from scribe timestamps.

This file is imported by TtsIntro.tsx for top-level <Sequence> placement, and by each scene component for internal beat <Sequence> positioning.
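
The grouping from beats to scenes can be sketched as follows. This is an illustration under stated assumptions, not the repo's `write_cues_ts()` logic: `buildSceneCues` and `beatOffsets` are hypothetical helpers, and the code assumes all beats of one sequence are adjacent in the manifest.

```typescript
interface Beat {
  sequence: string;
  durationFrames: number;
}

interface SceneCue {
  from: number;
  duration: number;
  beatDurations: number[];
}

// Group consecutive beats by sequence into { from, duration, beatDurations }.
function buildSceneCues(beats: Beat[]): Record<string, SceneCue> {
  const scenes: Record<string, SceneCue> = {};
  let cursor = 0; // start frame of the current beat in the whole composition
  for (const beat of beats) {
    let scene = scenes[beat.sequence];
    if (scene === undefined) {
      scene = { from: cursor, duration: 0, beatDurations: [] };
      scenes[beat.sequence] = scene;
    }
    scene.duration += beat.durationFrames;
    scene.beatDurations.push(beat.durationFrames);
    cursor += beat.durationFrames;
  }
  return scenes;
}

// Start frame of each beat relative to its scene, usable as the `from`
// value of the beat's internal <Sequence>.
function beatOffsets(beatDurations: readonly number[]): number[] {
  const offsets: number[] = [];
  let offset = 0;
  for (const d of beatDurations) {
    offsets.push(offset);
    offset += d;
  }
  return offsets;
}
```

A scene component would call something like `beatOffsets(narrationCues.scenes.story.beatDurations)` once and map over the result to place its beats.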

Why scribe over ffprobe alone?

| Criterion | scribe | ffprobe fallback |
|---|---|---|
| Duration accuracy | From speech model; accounts for silence trimming | File-level; includes trailing silence |
| Verified transcript | Catches TTS errors (mispronunciations, skipped words) | Uses source script (assumes TTS was perfect) |
| Word-level timing | Available; enables per-word animation sync | Not available |
| Offline use | ❌ Requires ElevenLabs API | ✅ Fully offline |
| Speed | ~2–5s per segment (API call) | Instant |

Recommendation: Use scribe for the final build (accurate timing), use ffprobe fallback during rapid iteration.

Single-segment regeneration (iterating on one beat)

TTS output has randomness — the same text produces different results each run. When a segment sounds bad, generate 3 versions, let the user pick, then patch all downstream artifacts.

1. Generate multiple candidates

# Generate 3 versions for comparison (run in parallel)
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v1.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v2.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v3.wav --model 0.6B

Present durations to the user so they can audition and pick.

2. Replace and get new duration

cp demo/out/narration/05_reveal_v2.wav demo/out/narration/05_reveal.wav
ffprobe -v error -show_entries format=duration -of default=nokey=1:noprint_wrappers=1 demo/out/narration/05_reveal.wav

3. Update timestamps JSON (recompute all offsets)

Use jq to patch the single segment's duration and recompute all subsequent offsets:

# SEGMENT_ID and NEW_DUR are shell variables, e.g. SEGMENT_ID=05_reveal and NEW_DUR from the ffprobe call above
jq --arg id "$SEGMENT_ID" --argjson dur "$NEW_DUR" '
  .fps as $fps |
  .segments |= (
    # The timeline stores start_sec/end_sec, so derive duration_sec where it is missing
    map(.duration_sec = (.duration_sec // (.end_sec - .start_sec))) |
    map(if .id == $id then .duration_sec = $dur | .duration_frames = ($dur * $fps | round) else . end) |
    reduce range(length) as $i (
      .;
      if $i == 0 then
        .[$i].start_sec = 0 | .[$i].start_frame = 0 |
        .[$i].end_sec = .[$i].duration_sec | .[$i].end_frame = .[$i].duration_frames
      else
        .[$i].start_sec = .[$i-1].end_sec | .[$i].start_frame = .[$i-1].end_frame |
        .[$i].end_sec = (.[$i].start_sec + .[$i].duration_sec) |
        .[$i].end_frame = (.[$i].start_frame + .[$i].duration_frames)
      end
    )
  ) |
  .total_seconds = .segments[-1].end_sec |
  .total_frames = .segments[-1].end_frame
' demo/out/narration_timestamps.json > tmp.json && mv tmp.json demo/out/narration_timestamps.json

4. Rebuild all downstream artifacts

After patching timestamps JSON, regenerate these three (can run in parallel):

Re-concat audio — rebuild concat.txt from manifest order, ffmpeg -y -f concat, copy to public/ and root
Regenerate narrationCues.ts — rebuild scene blocks from timeline JSON (same logic as write_cues_ts())
Regenerate transcript.md — rebuild markdown table from timeline JSON (same logic as write_transcript_md())

Summary: single-segment patch checklist

Generate 3 candidate WAVs (parallel)
User picks → copy to official filename
ffprobe → get new duration
jq → patch timestamps JSON + recompute offsets
Re-concat audio + copy to public/ and root
Regenerate narrationCues.ts
Regenerate transcript.md

This avoids re-generating all other segments and takes ~10 seconds vs minutes for the full pipeline.
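
The offset recomputation at the heart of this checklist is small enough to express as a pure function. The sketch below is a hypothetical TypeScript counterpart to the jq step, not code from the repo; `patchSegment` and `Span` are illustrative names, and it works in seconds only.

```typescript
interface Span {
  id: string;
  start_sec: number;
  end_sec: number;
}

// Replace one segment's duration and shift every later segment by the
// difference; segments before the patched one keep their timing.
function patchSegment(spans: Span[], id: string, newDurationSec: number): Span[] {
  let shift = 0; // accumulated change in seconds so far
  return spans.map((s) => {
    const start = s.start_sec + shift;
    const oldDuration = s.end_sec - s.start_sec;
    const duration = s.id === id ? newDurationSec : oldDuration;
    shift += duration - oldDuration;
    return { id: s.id, start_sec: start, end_sec: start + duration };
  });
}
```

Frame numbers can then be recomputed from the patched seconds with `Math.round(sec * fps)`.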

Step 2: Build Motion Graphics with Remotion

Project setup

cd demo
npx create-video@latest intro --template blank --tailwind
cd intro && npm install

Design system (`design.ts`)

Define a shared palette, fonts, and shadows so all scenes look consistent:

export const palette = {
  ink: "#111827",
  inkMuted: "#5B6475",
  bg: "#FFF8F5",
  bgPanel: "#FFFFFF",
  accent: "#FF6154",
  cool: "#3B82F6",
  // ...
} as const;

export const fonts = {
  display: "'Avenir Next', sans-serif",
  mono: "'JetBrains Mono', monospace",
} as const;

Scene components

Each scene is a React component using Remotion primitives:

useCurrentFrame() — current frame number (drives all animation)
useVideoConfig() — fps, width, height, duration
spring() — physics-based easing for entrances
interpolate() — map frame ranges to CSS values (opacity, translateY, scale)
<Sequence> — place a component at a specific time range

Pattern for a scene component:

import React from "react";
import { AbsoluteFill, interpolate, spring, useCurrentFrame, useVideoConfig } from "remotion";

export const TitleCard: React.FC = () => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  // Entrance animation
  const enter = spring({ frame: frame - 8, fps, config: { damping: 14, stiffness: 120 } });

  // Fade-out before next scene
  const fadeOut = interpolate(frame, [437, 487], [1, 0], {
    extrapolateLeft: "clamp",
    extrapolateRight: "clamp",
  });

  return (
    <AbsoluteFill style={{ opacity: fadeOut }}>
      <div style={{
        opacity: enter,
        transform: `translateY(${interpolate(enter, [0, 1], [34, 0])}px)`,
        fontSize: 178,
        fontWeight: 800,
      }}>
        TTS CLI
      </div>
    </AbsoluteFill>
  );
};

Audio-synced timeline (`TtsIntro.tsx`)

The main composition sequences scenes using beat markers from the transcript:

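import React from "react";
import { AbsoluteFill, Audio, Sequence, staticFile } from "remotion";
// TitleCard and TechOverview are scene components imported from ./scenes (imports omitted here)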

export const TtsIntro: React.FC = () => (
  <AbsoluteFill>
    <Audio src={staticFile("ttscli_intro.wav")} />
    <Sequence from={0} durationInFrames={520}>
      <TitleCard />
    </Sequence>
    <Sequence from={487} durationInFrames={723}>
      <TechOverview />
    </Sequence>
    {/* ... more scenes ... */}
  </AbsoluteFill>
);

Narration cues file (`narrationCues.ts`)

Auto-generate this from transcript timestamps so scene timing stays in sync:

export const narrationCues = {
  fps: 30,
  totalFrames: 2544,
  scenes: {
    title:    { from: 0,    duration: 520  },
    tech:     { from: 487,  duration: 723  },
    features: { from: 1177, duration: 891  },
    terminal: { from: 2035, duration: 509  },
  },
};
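
The sample values above are consistent with each scene starting 33 frames before the previous one ends (520 − 33 = 487), which is what produces the cross-fade window. A minimal sketch of that layout rule follows; `layoutWithOverlap` is a hypothetical helper, and a single fixed overlap for all scenes is an assumption.

```typescript
// Place scenes back to back, pulling each one `overlap` frames into the
// previous scene so the outgoing scene can fade while the next one starts.
function layoutWithOverlap(
  durations: number[],
  overlap: number,
): { from: number[]; totalFrames: number } {
  const from: number[] = [];
  let cursor = 0; // end frame of the previously placed scene
  durations.forEach((duration, i) => {
    const start = i === 0 ? 0 : cursor - overlap;
    from.push(start);
    cursor = start + duration;
  });
  return { from, totalFrames: cursor };
}

const layout = layoutWithOverlap([520, 723, 891, 509], 33);
console.log(layout);
```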

Render

cd demo/intro
npx remotion render TtsIntro --output ../out/intro.mp4 --codec h264

Tips for motion graphics

Start simple — animate one element at a time, iterate.
Use spring() for entrances — feels natural, avoids linear motion.
Cross-fades: overlap scenes by ~30 frames (1s) and use interpolate() for fade-out on the outgoing scene.
Feature pills/badges: small animated labels that pop in one by one with staggered delays.
Avoid small decorative icons next to titles — stroke-based SVG icons (bolt, mic, terminal) render poorly at small sizes in video (broken borders, barely visible). If the scene content already illustrates the concept (e.g., progress bars for streaming, waveforms for voice cloning, terminal chrome for scripting), the title text alone is cleaner. Only use icons when they are the primary visual element (e.g., Apple logo + CPU chip for "MLX Native").
Background effects: subtle particle/grid animations add depth without distracting (see Backdrop.tsx).
Consistent resolution: always 1920×1080 @ 30fps across all segments.

Card layout guidelines

When building capability/feature cards in a row:

Fixed-height illustration boxes — when showing multiple cards side-by-side, give the illustration area a fixed height (e.g. 160px) so all cards match visually. Use display: "flex", alignItems: "center" inside to vertically center varied content.
Card sizing — for a 3-card row on 1920px canvas, 380px per card with connector arrows between them works well. Don't go below 320px or text gets cramped.
Centering SVG + text — when stacking an SVG icon above text, use display: "flex", flexDirection: "column", alignItems: "center" on the container instead of textAlign: "center". The latter won't reliably center inline SVG elements.
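
The card-sizing guideline translates into simple positioning arithmetic. The sketch below computes left offsets for a centered row; `cardRowLefts` is a hypothetical helper, and the 60px gap reserved for the connector arrows is an assumed value, not one from the source.

```typescript
// Left edge (in px) of each card in a horizontally centered row.
function cardRowLefts(
  canvasWidth: number,
  cardWidth: number,
  count: number,
  gap: number,
): number[] {
  const rowWidth = count * cardWidth + (count - 1) * gap;
  const startX = (canvasWidth - rowWidth) / 2;
  return Array.from({ length: count }, (_, i) => startX + i * (cardWidth + gap));
}

// Three 380px cards on a 1920px canvas, with an assumed 60px gap for arrows.
const lefts = cardRowLefts(1920, 380, 3, 60);
console.log(lefts);
```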

Themed scenes (e.g. GitHub-style)

When a scene references an external brand or platform, define a local token object for that theme instead of using the global palette. This keeps the scene self-contained and visually distinct.

// GitHub light theme tokens — scoped to one scene
const gh = {
  bg: "#ffffff",
  bgSubtle: "#f6f8fa",
  cardBg: "#ffffff",
  border: "#d0d7de",
  text: "#1f2328",
  textMuted: "#656d76",
  btnBg: "#f6f8fa",
  btnBorder: "#d0d7de",
  starYellow: "#e3b341",
  link: "#0969da",
} as const;

Tips for themed scenes:

Skip the shared <Backdrop> — use a flat backgroundColor matching the platform's style instead.
Reproduce recognizable UI elements — e.g., GitHub repo card with icon, description, star button, language dot, topic tags. These are instantly familiar and more engaging than abstract placeholders.
Animate interactions — e.g., a star button "click" with spring() pop, a counter rolling from N to N+1. Makes the scene feel alive.
Prefer light themes when the overall video uses a light design system. Dark-themed scenes create jarring contrast.

Concrete vs abstract illustrations

Prefer concrete, terminal-style content inside card illustration boxes over abstract graphics:

| Abstract (avoid) | Concrete (prefer) |
|---|---|
| Neural network dots | Agent thinking steps: 🔍 `read codebase...` → 🧠 `analyzing...` → 📋 `plan: 3 steps` |
| Floating particles | Code snippet with syntax highlighting |
| Generic waveform | Terminal pipeline: `$ running...` → `✓ git done` → `✓ test done` |

Concrete illustrations are more readable at video resolution and immediately communicate what the feature does.

Step 3: Record Terminal Demos with VHS

VHS records scripted terminal sessions as video.

Install

brew install charmbracelet/tap/vhs

Write a `.tape` file

Each terminal segment gets its own tape file:

# Terminal Scene: Install & Setup
Output out/terminal1.mp4
Set Width 1920
Set Height 1080
Set Framerate 30
Set FontFamily "Menlo"
Set FontSize 22
Set Theme "Github"
Set Padding 40
Set TypingSpeed 30ms
Set CursorBlink true
Set Shell zsh

Sleep 400ms

Type "curl -fsSL https://example.com/install.sh | bash"
Sleep 150ms
Enter
Sleep 4000ms

Type "mytool --version"
Sleep 150ms
Enter
Sleep 1500ms

Sleep 400ms

Tape file guidelines

| Setting | Recommended Value | Why |
|---|---|---|
| `Width` / `Height` | 1920 × 1080 | Match Remotion resolution |
| `Framerate` | 30 | Match Remotion fps |
| `Theme` | Github (light) or Dracula (dark) | Consistent look |
| `TypingSpeed` | 30ms | Fast enough to not bore, slow enough to read |
| `Sleep` after Enter | 2000–4000ms | Let output render before next command |
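
When several tapes share these settings, generating them from a command list keeps them consistent. The following is a rough sketch: `renderTape` and `TapeStep` are hypothetical names, the emitted directives are only the ones used in the sample tape above, and commands containing double quotes are rejected instead of escaped.

```typescript
interface TapeStep {
  command: string;
  waitMs: number; // Sleep after Enter, so the output can render
}

// Render a .tape file body with the recommended settings and one
// Type / Sleep / Enter / Sleep group per command.
function renderTape(output: string, steps: TapeStep[]): string {
  const lines: string[] = [
    `Output ${output}`,
    "Set Width 1920",
    "Set Height 1080",
    "Set Framerate 30",
    "Set TypingSpeed 30ms",
    "",
    "Sleep 400ms",
  ];
  for (const step of steps) {
    if (step.command.includes('"')) {
      // Quoting rules are out of scope for this sketch.
      throw new Error(`double quotes are not handled: ${step.command}`);
    }
    lines.push("", `Type "${step.command}"`, "Sleep 150ms", "Enter", `Sleep ${step.waitMs}ms`);
  }
  lines.push("", "Sleep 400ms");
  return lines.join("\n") + "\n";
}

const tape = renderTape("out/terminal1.mp4", [{ command: "mytool --version", waitMs: 1500 }]);
console.log(tape);
```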

Record

vhs terminal_voices.tape
vhs terminal_speech.tape
vhs terminal_config.tape

Tips

Target 16 seconds per segment — trim in the assembly step.
Plan 3 terminal segments covering: setup, core features, advanced/workflow.
Keep commands short and readable — avoid long flags when possible.
End each tape with a Sleep 400ms buffer.

Step 4: Assemble the Final Video

The build script (demo/build.sh) orchestrates everything:

Timing plan

Segment        Start   Duration  Frames
Intro (motion) 0:00    28s       840
Label 1        0:28    2s        60      (optional title card)
Terminal 1     0:30    16s       480
Label 2        0:46    2s        60
Terminal 2     0:48    16s       480
Label 3        1:04    2s        60
Terminal 3     1:06    16s       480
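
The start times and frame counts in this plan follow directly from the durations. The sketch below recomputes them at 30fps; `timingPlan` and `formatClock` are hypothetical helpers, and whole-second durations are assumed.

```typescript
interface PlanRow {
  name: string;
  start: string; // m:ss
  durationSec: number;
  frames: number;
}

// Format whole seconds as m:ss.
function formatClock(totalSec: number): string {
  const minutes = Math.floor(totalSec / 60);
  const seconds = totalSec % 60;
  return `${minutes}:${seconds < 10 ? "0" : ""}${seconds}`;
}

// Derive each segment's start time and frame count from its duration.
function timingPlan(parts: Array<[string, number]>, fps: number): PlanRow[] {
  let cursorSec = 0;
  return parts.map(([name, durationSec]) => {
    const row: PlanRow = {
      name,
      start: formatClock(cursorSec),
      durationSec,
      frames: durationSec * fps,
    };
    cursorSec += durationSec;
    return row;
  });
}

const plan = timingPlan(
  [
    ["Intro (motion)", 28],
    ["Label 1", 2],
    ["Terminal 1", 16],
    ["Label 2", 2],
    ["Terminal 2", 16],
    ["Label 3", 2],
    ["Terminal 3", 16],
  ],
  30,
);
console.log(plan);
```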

Trim terminals to exact duration

ffmpeg -y -i out/terminal1.mp4 -t 16 \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
  out/terminal1_trimmed.mp4

Concatenate segments

cat > out/concat_list.txt <<EOF
file 'intro.mp4'
file 'label1.mp4'
file 'terminal1_trimmed.mp4'
file 'label2.mp4'
file 'terminal2_trimmed.mp4'
file 'label3.mp4'
file 'terminal3_trimmed.mp4'
EOF

ffmpeg -y -f concat -safe 0 -i out/concat_list.txt \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
  out/concat.mp4

Merge narration audio

ffmpeg -y -i out/concat.mp4 -i ttscli_intro.wav \
  -c:v copy -c:a aac -b:a 128k -ar 44100 -ac 2 \
  -shortest -movflags +faststart \
  out/ttscli_demo.mp4

Run the full pipeline

cd demo
./build.sh           # Build everything
./build.sh remotion  # Only re-render motion graphics
./build.sh terminals # Only re-record terminal demos
./build.sh merge     # Only re-assemble final video

Quick-Start Checklist

When asked to create a product demo, follow these steps:

[ ] Write narration script — Plan the five acts, ~60–80s total, one sentence per beat. Create narration/manifest.json + per-beat .txt files.
[ ] Generate audio — Use tts generate per segment, concatenate with ffmpeg.
[ ] Transcribe with scribe — Run scribe transcribe on each segment to get accurate durations and verified text. Compute beat markers (frame = seconds × fps).
[ ] Generate timing artifacts — Write narration_timestamps.json, transcript.md, and narrationCues.ts from scribe output. Or run build_narration.sh to automate steps 2–4.
[ ] Create design system — Define palette, fonts, shadows in design.ts.
[ ] Build Remotion scenes — One component per scene, use spring() + interpolate(), sync to audio beat markers from narrationCues.ts.
[ ] Write VHS tapes — One .tape per terminal segment, 1920×1080 @ 30fps, ~16s each.
[ ] Record terminals — vhs <tape>.tape for each.
[ ] Assemble — Trim terminals, concatenate all segments, merge audio with ffmpeg.
[ ] Review & iterate — Watch the full video, adjust timing, re-render individual pieces as needed.

Required Tools

| Tool | Install | Purpose |
|---|---|---|
| `tts` | `pip install tts-cli` | Narration audio generation |
| `node` / `npx` | `brew install node` | Remotion rendering |
| `remotion` | `npx create-video@latest` | Motion graphics |
| `vhs` | `brew install charmbracelet/tap/vhs` | Terminal recording |
| `ffmpeg` | `brew install ffmpeg` | Video/audio processing |
| `scribe` | `npm install -g scribe-cli` + `scribe auth` | Transcription for accurate timestamps (ElevenLabs API) |

Reference Files

Existing demo: demo/ directory in this repo
Video build script: demo/build.sh — renders Remotion, records VHS, assembles final MP4
Narration build script: demo/build_narration.sh — TTS generation → scribe transcription → timestamp extraction → narrationCues.ts
Segment manifest: demo/narration/manifest.json — defines segment order, roles, and script files
Narration scripts: demo/narration/*.txt — one text file per segment
Remotion project: demo/intro/
Remotion timing cues: demo/intro/src/narrationCues.ts (auto-generated from scribe)
VHS tapes: demo/terminal_*.tape
Narration plan: demo/narration_script.md
Transcript & beat markers: demo/transcript.md
Remotion tips: remotion-tip.md

name	product-demo
description	Create a polished product demo video with motion graphics intro, narrated audio, and terminal recordings. Use when the user asks to build a demo video, product walkthrough, or promotional clip for a CLI tool or software project.

Product Demo Video — End-to-End Skill

Build a professional product demo video combining narrated audio (TTS), motion graphics (Remotion), and terminal recordings (VHS). The final output is a single .mp4 with synced audio.

Architecture Overview

A demo video has three layers assembled in a pipeline:

1. Narration Audio   →  TTS CLI generates speech from scripts
2. Motion Graphics   →  Remotion renders animated intro/transitions
3. Terminal Demos    →  VHS records scripted terminal sessions
4. Assembly          →  ffmpeg concatenates video + merges audio

Directory structure:

demo/
├── build.sh                  # Master build script (orchestrates everything)
├── build_narration.sh        # Narration pipeline: TTS → scribe → cues
├── narration_script.md       # Narration plan & source file list
├── transcript.md             # Final transcript with timestamps & beat markers
├── narration/                # Per-beat narration scripts (one sentence each)
│   ├── manifest.json         # Beat manifest (id, sequence, role, beatIndex, script)
│   ├── 01_hook.txt           # Act 1 beats (story)
│   ├── 02_stars.txt
│   ├── ...
│   ├── 08_engine.txt         # Act 2 beats (tech)
│   ├── ...
│   ├── 12_voice_cloning.txt  # Act 3 beats (features)
│   ├── ...
│   ├── 18_demo_say.txt       # Act 4 beats (demo)
│   ├── ...
│   └── 23_closing.txt        # Act 5 beats (cta)
├── terminal_voices.tape      # VHS tape: install & setup
├── terminal_speech.tape      # VHS tape: voice cloning & speech
├── terminal_config.tape      # VHS tape: generate, export & workflow
├── ttscli_demo.tape          # VHS tape: full demo (alternative single-take)
├── intro/                    # Remotion project
│   ├── package.json
│   ├── remotion.config.ts
│   ├── tsconfig.json
│   ├── tailwind.config.js
│   ├── public/               # Static assets (audio, images)
│   │   └── ttscli_intro.wav
│   └── src/
│       ├── Root.tsx           # Remotion entry — registers compositions
│       ├── TtsIntro.tsx       # Main composition — scene sequencing
│       ├── design.ts          # Shared palette, fonts, shadows
│       ├── narrationCues.ts   # Auto-generated timing constants from scribe
│       ├── index.ts
│       ├── style.css
│       ├── scenes/            # One component per visual act
│       │   ├── OpenClawStory.tsx     # Act 1: AI Agent story (6 beats)
│       │   ├── HowItWorks.tsx        # Act 2: Engine, backends, install (4 beats)
│       │   ├── FeatureHighlights.tsx  # Act 3: 6 unique feature beats
│       │   ├── LiveDemo.tsx          # Act 4: Persistent terminal (3 beats)
│       │   └── CallToAction.tsx      # Act 5: GitHub CTA + logo lock (3 beats)
│       └── effects/           # Reusable visual effects
│           ├── Backdrop.tsx
│           ├── RhythmOverlay.tsx
│           ├── TerminalChrome.tsx   # Shared terminal window chrome
│           └── Waveform.tsx         # Animated waveform SVG
└── out/                       # Build artifacts (gitignored)
    ├── ttscli_demo.mp4
    ├── intro.mp4
    ├── terminal1.mp4
    ├── terminal2.mp4
    ├── terminal3.mp4
    ├── narration/             # Per-segment WAV files
    │   ├── 01_title.wav
    │   ├── 02_tech.wav
    │   └── ...
    ├── narration_transcripts/ # Scribe JSON outputs per segment
    │   ├── 01_title.json
    │   ├── 02_tech.json
    │   └── ...
    └── narration_timestamps.json  # Combined timeline with all beat markers

Step 1: Write the Narration Script

Plan the story arc first. A good demo narration follows this structure:

Act	Purpose	Beats	Duration
Story / Hook	Grab attention, establish the problem	5–7	12–18s
How It Works	Engine, backends, install	3–5	10–14s
Feature Highlights	One unique visual per feature	4–6	14–20s
Live Demo	Terminal with accumulating commands	3	8–12s
CTA	GitHub link + logo lock	2–3	6–10s

Guidelines

Fast-cut structure: ~2–4 seconds per beat, one sentence each.
Split into separate text files per beat (easier to re-generate individually).
Aim for 60–80 seconds total — attention drops fast.
Each act = one Remotion scene component with internal <Sequence> per beat.

Beat manifest (`narration/manifest.json`)

Define all beats, their ordering, and their sequence grouping:

{
  "fps": 30,
  "segments": [
    { "id": "01_hook",     "sequence": "story",    "role": "beat", "beatIndex": 0, "script": "01_hook.txt" },
    { "id": "02_agents",   "sequence": "story",    "role": "beat", "beatIndex": 1, "script": "02_stars.txt" },
    ...
    { "id": "07_engine",   "sequence": "tech",     "role": "beat", "beatIndex": 0, "script": "08_engine.txt" },
    ...
    { "id": "22_closing",  "sequence": "cta",      "role": "beat", "beatIndex": 2, "script": "23_closing.txt" }
  ]
}

Fields:

id — Unique segment identifier (used as filename for WAV + scribe JSON)
sequence — Groups beats into Remotion scenes (story, tech, features, demo, cta)
role — Always "beat" in the fast-cut architecture
beatIndex — Zero-based index within the sequence (drives internal <Sequence> positioning)
script — Filename of the narration text file in narration/

Generate audio with TTS CLI

# Generate per-segment audio
tts generate "Hey, meet TTS CLI, a text-to-speech tool that runs entirely on your machine." \
  --voice james -o demo/out/01_title.wav

# Or from a file
tts generate --file demo/narration/01_title.txt --voice james -o demo/out/01_title.wav

Concatenate segments with ffmpeg

# Build a concat list
for f in demo/out/0*.wav; do echo "file '$f'"; done > demo/out/concat.txt

# Concatenate
ffmpeg -f concat -safe 0 -i demo/out/concat.txt -c copy demo/out/narration.wav

Extract timestamps with Scribe

Setup

# Install (Node.js CLI)
npm install -g scribe-cli

# Authenticate with ElevenLabs API key (one-time)
scribe auth

Transcribe individual segments

# Transcribe a single audio segment to JSON (includes duration + word timestamps)
scribe transcribe demo/out/narration/01_title.wav -f json -o demo/out/narration_transcripts/

# Output formats: json, md, txt, srt, all
scribe transcribe demo/out/narration/01_title.wav -f all -o demo/out/narration_transcripts/

# Print to stdout instead of file
scribe transcribe demo/out/narration/01_title.wav -f json --stdout

Scribe CLI options

| Flag | Description |
|---|---|
| `-f, --format <type>` | Output format: `json`, `md`, `txt`, `srt`, `all` (default: `json`) |
| `-o, --output-dir <dir>` | Output directory (default: `.`) |
| `-d, --diarize` | Enable speaker diarization |
| `-s, --speakers <count>` | Speaker count hint (1–32) |
| `-l, --language <code>` | Language code (ISO-639, e.g. `en`, `zh`) |
| `--stdout` | Print to stdout instead of writing file |
| `-q, --quiet` | Suppress progress output |

JSON output structure

Scribe JSON output contains the metadata needed for timeline sync:

{
  "text": "Meet TTS CLI, a fully local text-to-speech toolkit...",
  "metadata": {
    "duration": 15.30,
    "language": "en"
  },
  "words": [
    { "word": "Meet", "start": 0.0, "end": 0.32, "confidence": 0.98 },
    { "word": "TTS", "start": 0.35, "end": 0.72, "confidence": 0.95 },
    ...
  ]
}

Key fields:

metadata.duration — exact segment length in seconds (more accurate than ffprobe for timing)
text — verified transcript (catches TTS mispronunciations)
words[].start / words[].end — word-level timestamps for fine-grained sync
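To make the word-level timing concrete, here is a minimal sketch that looks up the frame at which a given word starts within a segment. The interfaces mirror the JSON sample above; `wordStartFrame` is a hypothetical helper, and rounding down is a choice made here so that a cue never fires after the word has begun.

```typescript
// Types mirroring the scribe JSON sample above.
interface ScribeWord {
  word: string;
  start: number;
  end: number;
  confidence: number;
}

interface ScribeTranscript {
  text: string;
  metadata: { duration: number; language: string };
  words: ScribeWord[];
}

// Hypothetical helper: frame (relative to the segment start) at which the first
// occurrence of `target` begins, or -1 when the word is not in the transcript.
function wordStartFrame(transcript: ScribeTranscript, target: string, fps: number): number {
  const needle = target.toLowerCase();
  for (const w of transcript.words) {
    // Ignore case and punctuation so "CLI," still matches "cli".
    const normalized = w.word.toLowerCase().replace(/[^a-z0-9]/g, "");
    if (normalized === needle) {
      // Round down so the cue lands on or before the spoken word.
      return Math.floor(w.start * fps);
    }
  }
  return -1;
}

const sampleTranscript: ScribeTranscript = {
  text: "Meet TTS CLI",
  metadata: { duration: 1.2, language: "en" },
  words: [
    { word: "Meet", start: 0.0, end: 0.32, confidence: 0.98 },
    { word: "TTS", start: 0.35, end: 0.72, confidence: 0.95 },
  ],
};

console.log(wordStartFrame(sampleTranscript, "tts", 30));
```

A scene component could add this value to the beat's own start frame to trigger an animation on a specific spoken word.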

Batch transcription in the narration pipeline

The demo/build_narration.sh script automates scribe across all segments:

# Generate and transcribe each segment, extract duration + text, accumulate running offset
for id in "${segment_ids[@]}"; do
  tts generate --file "$script_path" --output "$wav_path" --model "$MODEL"

  if [[ "$RUN_SCRIBE" == "1" ]]; then
    scribe transcribe "$wav_path" -f json -o "$TRANS_DIR"
    duration="$(jq -r '.metadata.duration' "$TRANS_DIR/$id.json")"
    text="$(jq -r '.text' "$TRANS_DIR/$id.json")"
  else
    # Fallback: ffprobe for duration, source script for text
    duration="$(ffprobe -v error -show_entries format=duration \
      -of default=nokey=1:noprint_wrappers=1 "$wav_path")"
    text="$(cat "$script_path")"
  fi

  # Compute frame offset: start_frame = round(running_seconds × fps)
  start_frame=$(awk -v sec="$running_sec" -v fps=30 'BEGIN { printf "%d", sec * fps + 0.5 }')
  # ... accumulate into timeline JSON
done

Control with environment variable:

RUN_SCRIBE=1 ./build_narration.sh   # Use scribe (default) — accurate timestamps
RUN_SCRIBE=0 ./build_narration.sh   # Skip scribe — use ffprobe fallback (offline/faster)
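The offset accumulation in that loop can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the script's actual code: `buildTimeline` and the `MeasuredSegment` type are hypothetical, and frames are rounded from absolute seconds so that rounding errors do not add up across segments.

```typescript
// Hypothetical input: one measured duration per segment, in playback order.
interface MeasuredSegment {
  id: string;
  durationSec: number;
}

interface TimedSegment {
  id: string;
  start_sec: number;
  end_sec: number;
  start_frame: number;
  end_frame: number;
  duration_frames: number;
}

interface BuiltTimeline {
  fps: number;
  total_seconds: number;
  total_frames: number;
  segments: TimedSegment[];
}

// Accumulate a running offset in seconds and convert each boundary to frames.
function buildTimeline(measured: MeasuredSegment[], fps: number): BuiltTimeline {
  let runningSec = 0;
  const segments: TimedSegment[] = [];
  for (const m of measured) {
    const startSec = runningSec;
    const endSec = runningSec + m.durationSec;
    const startFrame = Math.round(startSec * fps);
    const endFrame = Math.round(endSec * fps);
    segments.push({
      id: m.id,
      start_sec: startSec,
      end_sec: endSec,
      start_frame: startFrame,
      end_frame: endFrame,
      duration_frames: endFrame - startFrame,
    });
    runningSec = endSec;
  }
  return {
    fps,
    total_seconds: runningSec,
    total_frames: Math.round(runningSec * fps),
    segments,
  };
}

// Illustrative durations only.
const builtExample = buildTimeline(
  [
    { id: "01_title", durationSec: 15.3 },
    { id: "02_tech", durationSec: 23.06 },
  ],
  30,
);
console.log(builtExample.segments.map((s) => [s.id, s.start_frame, s.end_frame]));
```

Because every start frame equals the previous end frame, the segments tile the audio without gaps or overlaps.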

From scribe output → beat markers → Remotion cues

The pipeline converts scribe timestamps into three artifacts:

1. Timeline JSON (demo/out/narration_timestamps.json):

{
  "fps": 30,
  "total_seconds": 84.80,
  "total_frames": 2544,
  "segments": [
    {
      "id": "01_title",
      "sequence": "title",
      "text": "Meet TTS CLI...",
      "start_sec": 0.0,
      "end_sec": 15.30,
      "start_frame": 0,
      "end_frame": 459,
      "duration_frames": 459
    },
    ...
  ]
}

2. Transcript markdown (demo/transcript.md):

| Segment | Start | End | Frame | Text |
|---|---:|---:|---:|---|
| 01_title | 0.00s | 15.30s | 0 | Meet TTS CLI... |
| 02_tech | 15.30s | 38.36s | 459 | Under the hood... |

Frame number = start_seconds × 30 (at 30fps).

3. Remotion narration cues (demo/intro/src/narrationCues.ts):

// Auto-generated from scribe transcription timestamps
export const narrationCues = {
  fps: 30,
  totalFrames: 2250,
  scenes: {
    story:    { from: 0,    duration: 480, beatDurations: [75, 90, 105, 60, 75, 75] },
    tech:     { from: 480,  duration: 330, beatDurations: [90, 90, 90, 60] },
    features: { from: 810,  duration: 510, beatDurations: [90, 90, 90, 90, 75, 75] },
    demo:     { from: 1320, duration: 285, beatDurations: [105, 90, 90] },
    cta:      { from: 1605, duration: 240, beatDurations: [75, 75, 90] },
  },
} as const;

Each scene has the same shape: from (start frame), duration (total frames), and beatDurations[] (per-beat frame counts). The file is auto-generated by build_narration.sh from scribe timestamps.

This file is imported by TtsIntro.tsx for top-level <Sequence> placement, and by each scene component for internal beat <Sequence> positioning.
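A minimal sketch of how the scene blocks could be derived from timeline segments, and how a scene component might turn beatDurations into internal offsets. Both `buildScenes` and `beatOffsets` are hypothetical names; the sketch assumes segments arrive in playback order and that the beats of each sequence are contiguous.

```typescript
interface CueSegment {
  id: string;
  sequence: string;
  start_frame: number;
  duration_frames: number;
}

interface SceneCue {
  from: number;
  duration: number;
  beatDurations: number[];
}

// Fold timeline segments into one cue per sequence.
function buildScenes(segments: CueSegment[]): Record<string, SceneCue> {
  const scenes: Record<string, SceneCue> = {};
  for (const seg of segments) {
    const scene = scenes[seg.sequence];
    if (scene === undefined) {
      // First beat of a sequence fixes where the scene starts.
      scenes[seg.sequence] = {
        from: seg.start_frame,
        duration: seg.duration_frames,
        beatDurations: [seg.duration_frames],
      };
    } else {
      scene.duration += seg.duration_frames;
      scene.beatDurations.push(seg.duration_frames);
    }
  }
  return scenes;
}

// Start frame of each beat relative to its scene: a running sum of beatDurations.
function beatOffsets(beatDurations: number[]): number[] {
  const offsets: number[] = [];
  let elapsed = 0;
  for (const d of beatDurations) {
    offsets.push(elapsed);
    elapsed += d;
  }
  return offsets;
}

// The story scene from the cues above.
console.log(beatOffsets([75, 90, 105, 60, 75, 75]));
```

Each offset can serve as the `from` value of an internal `<Sequence>`, with the matching beatDurations entry as its length.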

Why scribe over ffprobe alone?

| | scribe | ffprobe fallback |
|---|---|---|
| Duration accuracy | From speech model; accounts for silence trimming | File-level; includes trailing silence |
| Verified transcript | Catches TTS errors (mispronunciations, skipped words) | Uses source script (assumes TTS was perfect) |
| Word-level timing | Available; enables per-word animation sync | Not available |
| Offline use | ❌ Requires ElevenLabs API | ✅ Fully offline |
| Speed | ~2-5s per segment (API call) | Instant |

Recommendation: Use scribe for the final build (accurate timing), use ffprobe fallback during rapid iteration.

Single-segment regeneration (iterating on one beat)

TTS output has randomness — the same text produces different results each run. When a segment sounds bad, generate 3 versions, let the user pick, then patch all downstream artifacts.

1. Generate multiple candidates

# Generate 3 versions for comparison (run in parallel)
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v1.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v2.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v3.wav --model 0.6B

Present the candidates and their durations to the user so they can audition each one and pick.

2. Replace and get new duration

cp demo/out/narration/05_reveal_v2.wav demo/out/narration/05_reveal.wav
ffprobe -v error -show_entries format=duration -of default=nokey=1:noprint_wrappers=1 demo/out/narration/05_reveal.wav

3. Update timestamps JSON (recompute all offsets)

Use jq to patch the single segment's duration and recompute all subsequent offsets. Set SEGMENT_ID and NEW_DUR in the shell first. Segments that lack duration_sec get it backfilled from their old bounds so the running sum stays correct:

jq --arg id "$SEGMENT_ID" --argjson dur "$NEW_DUR" '
  .segments |= (
    map(if .id == $id
        then .duration_sec = $dur | .duration_frames = ($dur * 30 | round)
        else .duration_sec = (.duration_sec // (.end_sec - .start_sec)) end) |
    reduce range(length) as $i (
      .;
      if $i == 0 then
        .[$i].start_sec = 0 | .[$i].start_frame = 0 |
        .[$i].end_sec = .[$i].duration_sec | .[$i].end_frame = .[$i].duration_frames
      else
        .[$i].start_sec = .[$i-1].end_sec | .[$i].start_frame = .[$i-1].end_frame |
        .[$i].end_sec = (.[$i].start_sec + .[$i].duration_sec) |
        .[$i].end_frame = (.[$i].start_frame + .[$i].duration_frames)
      end
    )
  ) |
  .total_seconds = .segments[-1].end_sec |
  .total_frames = .segments[-1].end_frame
' demo/out/narration_timestamps.json > tmp.json && mv tmp.json demo/out/narration_timestamps.json
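After a manual patch like this, a quick consistency check can catch a broken timeline before the downstream artifacts are rebuilt. The sketch below is one possible check, not part of the repository: `timelineProblems` is a hypothetical helper that verifies segments are contiguous and that total_frames matches the last end frame.

```typescript
interface CheckedSegment {
  id: string;
  start_frame: number;
  end_frame: number;
  duration_frames: number;
}

interface CheckedTimeline {
  total_frames: number;
  segments: CheckedSegment[];
}

// Return a list of human-readable problems; an empty list means the timeline is consistent.
function timelineProblems(timeline: CheckedTimeline): string[] {
  const problems: string[] = [];
  let expectedStart = 0;
  for (const seg of timeline.segments) {
    if (seg.start_frame !== expectedStart) {
      problems.push(`${seg.id}: starts at ${seg.start_frame}, expected ${expectedStart}`);
    }
    if (seg.end_frame - seg.start_frame !== seg.duration_frames) {
      problems.push(`${seg.id}: duration_frames does not match its start and end frames`);
    }
    expectedStart = seg.end_frame;
  }
  if (timeline.total_frames !== expectedStart) {
    problems.push(`total_frames is ${timeline.total_frames}, expected ${expectedStart}`);
  }
  return problems;
}

// Illustrative timeline with two contiguous segments.
console.log(
  timelineProblems({
    total_frames: 1151,
    segments: [
      { id: "01_title", start_frame: 0, end_frame: 459, duration_frames: 459 },
      { id: "02_tech", start_frame: 459, end_frame: 1151, duration_frames: 692 },
    ],
  }),
);
```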

4. Rebuild all downstream artifacts

After patching timestamps JSON, regenerate these three (can run in parallel):

Re-concat audio — rebuild concat.txt from manifest order, ffmpeg -y -f concat, copy to public/ and root
Regenerate narrationCues.ts — rebuild scene blocks from timeline JSON (same logic as write_cues_ts())
Regenerate transcript.md — rebuild markdown table from timeline JSON (same logic as write_transcript_md())

Summary: single-segment patch checklist

Generate 3 candidate WAVs (parallel)
User picks → copy to official filename
ffprobe → get new duration
jq → patch timestamps JSON + recompute offsets
Re-concat audio + copy to public/ and root
Regenerate narrationCues.ts
Regenerate transcript.md

This avoids re-generating all other segments and takes ~10 seconds vs minutes for the full pipeline.

Step 2: Build Motion Graphics with Remotion

Project setup

cd demo
npx create-video@latest intro --template blank --tailwind
cd intro && npm install

Design system (`design.ts`)

Define a shared palette, fonts, and shadows so all scenes look consistent:

export const palette = {
  ink: "#111827",
  inkMuted: "#5B6475",
  bg: "#FFF8F5",
  bgPanel: "#FFFFFF",
  accent: "#FF6154",
  cool: "#3B82F6",
  // ...
} as const;

export const fonts = {
  display: "'Avenir Next', sans-serif",
  mono: "'JetBrains Mono', monospace",
} as const;

Scene components

Each scene is a React component using Remotion primitives:

useCurrentFrame() — current frame number (drives all animation)
useVideoConfig() — fps, width, height, duration
spring() — physics-based easing for entrances
interpolate() — map frame ranges to CSS values (opacity, translateY, scale)
<Sequence> — place a component at a specific time range

Pattern for a scene component:

import React from "react";
import { AbsoluteFill, interpolate, spring, useCurrentFrame, useVideoConfig } from "remotion";

export const TitleCard: React.FC = () => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  // Entrance animation
  const enter = spring({ frame: frame - 8, fps, config: { damping: 14, stiffness: 120 } });

  // Fade-out before next scene
  const fadeOut = interpolate(frame, [437, 487], [1, 0], {
    extrapolateLeft: "clamp",
    extrapolateRight: "clamp",
  });

  return (
    <AbsoluteFill style={{ opacity: fadeOut }}>
      <div style={{
        opacity: enter,
        transform: `translateY(${interpolate(enter, [0, 1], [34, 0])}px)`,
        fontSize: 178,
        fontWeight: 800,
      }}>
        TTS CLI
      </div>
    </AbsoluteFill>
  );
};

Audio-synced timeline (`TtsIntro.tsx`)

The main composition sequences scenes using beat markers from the transcript:

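import React from "react";
import { AbsoluteFill, Audio, Sequence, staticFile } from "remotion";
// Scene components; these paths assume one file per scene next to this one
import { TitleCard } from "./TitleCard";
import { TechOverview } from "./TechOverview";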

export const TtsIntro: React.FC = () => (
  <AbsoluteFill>
    <Audio src={staticFile("ttscli_intro.wav")} />
    <Sequence from={0} durationInFrames={520}>
      <TitleCard />
    </Sequence>
    <Sequence from={487} durationInFrames={723}>
      <TechOverview />
    </Sequence>
    {/* ... more scenes ... */}
  </AbsoluteFill>
);

Narration cues file (`narrationCues.ts`)

Auto-generate this from transcript timestamps so scene timing stays in sync:

export const narrationCues = {
  fps: 30,
  totalFrames: 2544,
  scenes: {
    title:    { from: 0,    duration: 520  },
    tech:     { from: 487,  duration: 723  },
    features: { from: 1177, duration: 891  },
    terminal: { from: 2035, duration: 509  },
  },
};
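In this example each scene starts before the previous one ends, which leaves room for a cross-fade. The sketch below computes that overlap for consecutive scenes; `sceneOverlaps` is a hypothetical helper and the input simply repeats the values from the cues above.

```typescript
interface SceneWindow {
  from: number;
  duration: number;
}

// Overlap in frames between each scene and the one that follows it.
// A positive value means the two scenes are on screen together for that many frames.
function sceneOverlaps(scenes: SceneWindow[]): number[] {
  const overlaps: number[] = [];
  for (let i = 0; i + 1 < scenes.length; i++) {
    const endOfCurrent = scenes[i].from + scenes[i].duration;
    overlaps.push(endOfCurrent - scenes[i + 1].from);
  }
  return overlaps;
}

// title, tech, features, terminal from the cues above.
const cueWindows: SceneWindow[] = [
  { from: 0, duration: 520 },
  { from: 487, duration: 723 },
  { from: 1177, duration: 891 },
  { from: 2035, duration: 509 },
];

console.log(sceneOverlaps(cueWindows)); // [33, 33, 33]
```

With these values every pair overlaps by 33 frames, and the last scene ends at frame 2544, which matches totalFrames.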

Render

cd demo/intro
npx remotion render TtsIntro --output ../out/intro.mp4 --codec h264

Tips for motion graphics

Start simple — animate one element at a time, iterate.
Use spring() for entrances — feels natural, avoids linear motion.
Cross-fades: overlap scenes by ~30 frames (1s) and use interpolate() for fade-out on the outgoing scene.
Feature pills/badges: small animated labels that pop in one by one with staggered delays.
Avoid small decorative icons next to titles — stroke-based SVG icons (bolt, mic, terminal) render poorly at small sizes in video (broken borders, barely visible). If the scene content already illustrates the concept (e.g., progress bars for streaming, waveforms for voice cloning, terminal chrome for scripting), the title text alone is cleaner. Only use icons when they are the primary visual element (e.g., Apple logo + CPU chip for "MLX Native").
Background effects: subtle particle/grid animations add depth without distracting (see Backdrop.tsx).
Consistent resolution: always 1920×1080 @ 30fps across all segments.
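The cross-fade and staggered-delay tips come down to simple frame arithmetic. The sketch below shows that arithmetic as plain functions so it runs without Remotion installed; in a real scene `interpolate()` with clamped extrapolation would normally do the mapping. The names `clampLerp`, `fadeOutOpacity`, and `staggerDelays` are hypothetical.

```typescript
// Linear mapping from [inMin, inMax] to [outMin, outMax], clamped at both ends.
function clampLerp(x: number, inMin: number, inMax: number, outMin: number, outMax: number): number {
  const t = Math.min(1, Math.max(0, (x - inMin) / (inMax - inMin)));
  return outMin + t * (outMax - outMin);
}

// Opacity of an outgoing scene that fades over its last `overlap` frames.
function fadeOutOpacity(frame: number, sceneDuration: number, overlap: number): number {
  return clampLerp(frame, sceneDuration - overlap, sceneDuration, 1, 0);
}

// Entrance frame for each of `count` pills, spaced `gapFrames` apart.
function staggerDelays(count: number, gapFrames: number, startFrame: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < count; i++) {
    delays.push(startFrame + i * gapFrames);
  }
  return delays;
}

console.log(fadeOutOpacity(505, 520, 30)); // halfway through the fade: 0.5
console.log(staggerDelays(3, 6, 10)); // [10, 16, 22]
```

Each stagger delay can be subtracted from the current frame before it is passed to `spring()`, in the same way the scene component pattern uses `frame - 8`.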

Card layout guidelines

When building capability/feature cards in a row:

Fixed-height illustration boxes — when showing multiple cards side-by-side, give the illustration area a fixed height (e.g. 160px) so all cards match visually. Use display: "flex", alignItems: "center" inside to vertically center varied content.
Card sizing — for a 3-card row on 1920px canvas, 380px per card with connector arrows between them works well. Don't go below 320px or text gets cramped.
Centering SVG + text — when stacking an SVG icon above text, use display: "flex", flexDirection: "column", alignItems: "center" on the container instead of textAlign: "center". The latter won't reliably center inline SVG elements.

Themed scenes (e.g. GitHub-style)

When a scene references an external brand or platform, define a local token object for that theme instead of using the global palette. This keeps the scene self-contained and visually distinct.

// GitHub light theme tokens — scoped to one scene
const gh = {
  bg: "#ffffff",
  bgSubtle: "#f6f8fa",
  cardBg: "#ffffff",
  border: "#d0d7de",
  text: "#1f2328",
  textMuted: "#656d76",
  btnBg: "#f6f8fa",
  btnBorder: "#d0d7de",
  starYellow: "#e3b341",
  link: "#0969da",
} as const;

Tips for themed scenes:

Skip the shared <Backdrop> — use a flat backgroundColor matching the platform's style instead.
Reproduce recognizable UI elements — e.g., GitHub repo card with icon, description, star button, language dot, topic tags. These are instantly familiar and more engaging than abstract placeholders.
Animate interactions — e.g., a star button "click" with spring() pop, a counter rolling from N to N+1. Makes the scene feel alive.
Prefer light themes when the overall video uses a light design system. Dark-themed scenes create jarring contrast.

Concrete vs abstract illustrations

Prefer concrete, terminal-style content inside card illustration boxes over abstract graphics:

| Abstract (avoid) | Concrete (prefer) |
|---|---|
| Neural network dots | Agent thinking steps: 🔍 `read codebase...` → 🧠 `analyzing...` → 📋 `plan: 3 steps` |
| Floating particles | Code snippet with syntax highlighting |
| Generic waveform | Terminal pipeline: `$ running...` → `✓ git done` → `✓ test done` |

Concrete illustrations are more readable at video resolution and immediately communicate what the feature does.

Step 3: Record Terminal Demos with VHS

VHS records scripted terminal sessions as video.

Install

brew install charmbracelet/tap/vhs

Write a `.tape` file

Each terminal segment gets its own tape file:

# Terminal Scene: Install & Setup
Output out/terminal1.mp4
Set Width 1920
Set Height 1080
Set Framerate 30
Set FontFamily "Menlo"
Set FontSize 22
Set Theme "Github"
Set Padding 40
Set TypingSpeed 30ms
Set CursorBlink true
Set Shell zsh

Sleep 400ms

Type "curl -fsSL https://example.com/install.sh | bash"
Sleep 150ms
Enter
Sleep 4000ms

Type "mytool --version"
Sleep 150ms
Enter
Sleep 1500ms

Sleep 400ms

Tape file guidelines

| Setting | Recommended Value | Why |
|---|---|---|
| `Width` / `Height` | 1920 × 1080 | Match Remotion resolution |
| `Framerate` | 30 | Match Remotion fps |
| `Theme` | Github (light) or Dracula (dark) | Consistent look |
| `TypingSpeed` | 30ms | Fast enough to not bore, slow enough to read |
| `Sleep` after Enter | 2000–4000ms | Let output render before next command |
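When several tapes share the same settings, generating them from a list of commands keeps them consistent. The following is a sketch only: `renderTape` and `TapeStep` are hypothetical, the output uses just the tape commands shown in the example above, and commands containing double quotes are rejected rather than escaped.

```typescript
interface TapeStep {
  command: string;
  waitMs: number; // pause after Enter so the output can render
}

// Render a .tape file body with the recommended settings and one block per command.
function renderTape(outputPath: string, steps: TapeStep[]): string {
  const lines: string[] = [
    `Output ${outputPath}`,
    "Set Width 1920",
    "Set Height 1080",
    "Set Framerate 30",
    "Set TypingSpeed 30ms",
    "",
    "Sleep 400ms",
  ];
  for (const step of steps) {
    if (step.command.indexOf('"') !== -1) {
      // Quoting rules are out of scope for this sketch.
      throw new Error("commands with double quotes are not supported here");
    }
    lines.push("", `Type "${step.command}"`, "Sleep 150ms", "Enter", `Sleep ${step.waitMs}ms`);
  }
  // Trailing buffer, as recommended for every tape.
  lines.push("", "Sleep 400ms");
  return lines.join("\n") + "\n";
}

console.log(renderTape("out/terminal1.mp4", [{ command: "mytool --version", waitMs: 2000 }]));
```

The returned string can be written to a `.tape` file and recorded with `vhs` as usual.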

Record

vhs terminal_voices.tape
vhs terminal_speech.tape
vhs terminal_config.tape

Tips

Target 16 seconds per segment — trim in the assembly step.
Plan 3 terminal segments covering: setup, core features, advanced/workflow.
Keep commands short and readable — avoid long flags when possible.
End each tape with a Sleep 400ms buffer.

Step 4: Assemble the Final Video

The build script (demo/build.sh) orchestrates everything:

Timing plan

Segment        Start   Duration  Frames
Intro (motion) 0:00    28s       840
Label 1        0:28    2s        60      (optional title card)
Terminal 1     0:30    16s       480
Label 2        0:46    2s        60
Terminal 2     0:48    16s       480
Label 3        1:04    2s        60
Terminal 3     1:06    16s       480
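The start times and frame counts in this plan follow from the segment durations alone. As a small worked sketch, assuming whole-second durations and 30 fps, the hypothetical `planTimings` helper below reproduces the table from a list of durations.

```typescript
interface PlanPart {
  name: string;
  seconds: number;
}

interface PlanRow {
  name: string;
  start: string; // m:ss
  seconds: number;
  frames: number;
}

// Format whole seconds as m:ss.
function formatClock(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// Each part starts where the previous one ends; frames = seconds × fps.
function planTimings(parts: PlanPart[], fps: number): PlanRow[] {
  const rows: PlanRow[] = [];
  let elapsed = 0;
  for (const part of parts) {
    rows.push({
      name: part.name,
      start: formatClock(elapsed),
      seconds: part.seconds,
      frames: part.seconds * fps,
    });
    elapsed += part.seconds;
  }
  return rows;
}

const plan = planTimings(
  [
    { name: "Intro (motion)", seconds: 28 },
    { name: "Label 1", seconds: 2 },
    { name: "Terminal 1", seconds: 16 },
    { name: "Label 2", seconds: 2 },
    { name: "Terminal 2", seconds: 16 },
    { name: "Label 3", seconds: 2 },
    { name: "Terminal 3", seconds: 16 },
  ],
  30,
);
console.log(plan.map((row) => `${row.name} ${row.start} ${row.frames}`).join("\n"));
```

The seven parts add up to 82 seconds, or 2460 frames at 30 fps.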

Trim terminals to exact duration

ffmpeg -y -i out/terminal1.mp4 -t 16 \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
  out/terminal1_trimmed.mp4

Concatenate segments

cat > out/concat_list.txt <<EOF
file 'intro.mp4'
file 'label1.mp4'
file 'terminal1_trimmed.mp4'
file 'label2.mp4'
file 'terminal2_trimmed.mp4'
file 'label3.mp4'
file 'terminal3_trimmed.mp4'
EOF

ffmpeg -y -f concat -safe 0 -i out/concat_list.txt \
  -c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
  out/concat.mp4

Merge narration audio

ffmpeg -y -i out/concat.mp4 -i ttscli_intro.wav \
  -c:v copy -c:a aac -b:a 128k -ar 44100 -ac 2 \
  -shortest -movflags +faststart \
  out/ttscli_demo.mp4

Run the full pipeline

cd demo
./build.sh           # Build everything
./build.sh remotion  # Only re-render motion graphics
./build.sh terminals # Only re-record terminal demos
./build.sh merge     # Only re-assemble final video

Quick-Start Checklist

When asked to create a product demo, follow these steps:

[ ] Write narration script — Plan 4 scenes, ~60–90s total, one sentence per feature. Create narration/manifest.json + per-segment .txt files.
[ ] Generate audio — Use tts generate per segment, concatenate with ffmpeg.
[ ] Transcribe with scribe — Run scribe transcribe on each segment to get accurate durations and verified text. Compute beat markers (frame = seconds × fps).
[ ] Generate timing artifacts — Write narration_timestamps.json, transcript.md, and narrationCues.ts from scribe output. Or run build_narration.sh to automate steps 2–4.
[ ] Create design system — Define palette, fonts, shadows in design.ts.
[ ] Build Remotion scenes — One component per scene, use spring() + interpolate(), sync to audio beat markers from narrationCues.ts.
[ ] Write VHS tapes — One .tape per terminal segment, 1920×1080 @ 30fps, ~16s each.
[ ] Record terminals — vhs <tape>.tape for each.
[ ] Assemble — Trim terminals, concatenate all segments, merge audio with ffmpeg.
[ ] Review & iterate — Watch the full video, adjust timing, re-render individual pieces as needed.

Required Tools

| Tool | Install | Purpose |
|---|---|---|
| `tts` | `pip install tts-cli` | Narration audio generation |
| `node` / `npx` | `brew install node` | Remotion rendering |
| `remotion` | `npx create-video@latest` | Motion graphics |
| `vhs` | `brew install charmbracelet/tap/vhs` | Terminal recording |
| `ffmpeg` | `brew install ffmpeg` | Video/audio processing |
| `scribe` | `npm install -g scribe-cli` + `scribe auth` | Transcription for accurate timestamps (ElevenLabs API) |

Reference Files

Existing demo: demo/ directory in this repo
Video build script: demo/build.sh — renders Remotion, records VHS, assembles final MP4
Narration build script: demo/build_narration.sh — TTS generation → scribe transcription → timestamp extraction → narrationCues.ts
Segment manifest: demo/narration/manifest.json — defines segment order, roles, and script files
Narration scripts: demo/narration/0*.txt — one text file per segment
Remotion project: demo/intro/
Remotion timing cues: demo/intro/src/narrationCues.ts (auto-generated from scribe)
VHS tapes: demo/terminal_*.tape
Narration plan: demo/narration_script.md
Transcript & beat markers: demo/transcript.md
Remotion tips: remotion-tip.md
