| name | product-demo |
| description | Create a polished product demo video with motion graphics intro, narrated audio, and terminal recordings. Use when the user asks to build a demo video, product walkthrough, or promotional clip for a CLI tool or software project. |
Product Demo Video — End-to-End Skill
Build a professional product demo video combining narrated audio (TTS), motion graphics (Remotion), and terminal recordings (VHS). The final output is a single .mp4 with synced audio.
Architecture Overview
A demo video has three layers assembled in a pipeline:
1. Narration Audio → TTS CLI generates speech from scripts
2. Motion Graphics → Remotion renders animated intro/transitions
3. Terminal Demos → VHS records scripted terminal sessions
4. Assembly → ffmpeg concatenates video + merges audio
Directory structure:
demo/
├── build.sh # Master build script (orchestrates everything)
├── build_narration.sh # Narration pipeline: TTS → scribe → cues
├── narration_script.md # Narration plan & source file list
├── transcript.md # Final transcript with timestamps & beat markers
├── narration/ # Per-beat narration scripts (one sentence each)
│ ├── manifest.json # Beat manifest (id, sequence, role, beatIndex, script)
│ ├── 01_hook.txt # Act 1 beats (story)
│ ├── 02_stars.txt
│ ├── ...
│ ├── 08_engine.txt # Act 2 beats (tech)
│ ├── ...
│ ├── 12_voice_cloning.txt # Act 3 beats (features)
│ ├── ...
│ ├── 18_demo_say.txt # Act 4 beats (demo)
│ ├── ...
│ └── 23_closing.txt # Act 5 beats (cta)
├── terminal_voices.tape # VHS tape: install & setup
├── terminal_speech.tape # VHS tape: voice cloning & speech
├── terminal_config.tape # VHS tape: generate, export & workflow
├── ttscli_demo.tape # VHS tape: full demo (alternative single-take)
├── intro/ # Remotion project
│ ├── package.json
│ ├── remotion.config.ts
│ ├── tsconfig.json
│ ├── tailwind.config.js
│ ├── public/ # Static assets (audio, images)
│ │ └── ttscli_intro.wav
│ └── src/
│ ├── Root.tsx # Remotion entry — registers compositions
│ ├── TtsIntro.tsx # Main composition — scene sequencing
│ ├── design.ts # Shared palette, fonts, shadows
│ ├── narrationCues.ts # Auto-generated timing constants from scribe
│ ├── index.ts
│ ├── style.css
│ ├── scenes/ # One component per visual act
│ │ ├── OpenClawStory.tsx # Act 1: AI Agent story (6 beats)
│ │ ├── HowItWorks.tsx # Act 2: Engine, backends, install (4 beats)
│ │ ├── FeatureHighlights.tsx # Act 3: 6 unique feature beats
│ │ ├── LiveDemo.tsx # Act 4: Persistent terminal (3 beats)
│ │ └── CallToAction.tsx # Act 5: GitHub CTA + logo lock (3 beats)
│ └── effects/ # Reusable visual effects
│ ├── Backdrop.tsx
│ ├── RhythmOverlay.tsx
│ ├── TerminalChrome.tsx # Shared terminal window chrome
│ └── Waveform.tsx # Animated waveform SVG
└── out/ # Build artifacts (gitignored)
├── ttscli_demo.mp4
├── intro.mp4
├── terminal1.mp4
├── terminal2.mp4
├── terminal3.mp4
├── narration/ # Per-segment WAV files
│ ├── 01_title.wav
│ ├── 02_tech.wav
│ └── ...
├── narration_transcripts/ # Scribe JSON outputs per segment
│ ├── 01_title.json
│ ├── 02_tech.json
│ └── ...
└── narration_timestamps.json # Combined timeline with all beat markers
Step 1: Write the Narration Script
Plan the story arc first. A good demo narration follows this structure:
| Act | Purpose | Beats | Duration |
|---|
| Story / Hook | Grab attention, establish the problem | 5–7 | 12–18s |
| How It Works | Engine, backends, install | 3–5 | 10–14s |
| Feature Highlights | One unique visual per feature | 4–6 | 14–20s |
| Live Demo | Terminal with accumulating commands | 3 | 8–12s |
| CTA | GitHub link + logo lock | 2–3 | 6–10s |
Guidelines
- Fast-cut structure: ~2–4 seconds per beat, one sentence each.
- Split into separate text files per beat (easier to re-generate individually).
- Aim for 60–80 seconds total — attention drops fast.
- Each act = one Remotion scene component with internal
<Sequence> per beat.
Beat manifest (narration/manifest.json)
Define all beats, their ordering, and their sequence grouping:
{
"fps": 30,
"segments": [
{ "id": "01_hook", "sequence": "story", "role": "beat", "beatIndex": 0, "script": "01_hook.txt" },
{ "id": "02_agents", "sequence": "story", "role": "beat", "beatIndex": 1, "script": "02_stars.txt" },
...
{ "id": "07_engine", "sequence": "tech", "role": "beat", "beatIndex": 0, "script": "08_engine.txt" },
...
{ "id": "22_closing", "sequence": "cta", "role": "beat", "beatIndex": 2, "script": "23_closing.txt" }
]
}
Fields:
id — Unique segment identifier (used as filename for WAV + scribe JSON)
sequence — Groups beats into Remotion scenes (story, tech, features, demo, cta)
role — Always "beat" in the fast-cut architecture
beatIndex — Zero-based index within the sequence (drives internal <Sequence> positioning)
script — Filename of the narration text file in narration/
Generate audio with TTS CLI
tts generate "Hey, meet TTS CLI, a text-to-speech tool that runs entirely on your machine." \
--voice james -o demo/out/01_title.wav
tts generate --file demo/narration/01_title.txt --voice james -o demo/out/01_title.wav
Concatenate segments with ffmpeg
for f in demo/out/0*.wav; do echo "file '$f'"; done > demo/out/concat.txt
ffmpeg -f concat -safe 0 -i demo/out/concat.txt -c copy demo/out/narration.wav
Extract timestamps with Scribe
After generating audio, use scribe to transcribe each segment and extract precise timestamps. Scribe is a CLI that calls the ElevenLabs transcription API and returns word-level timing data — this is what drives the Remotion animation timeline.
Setup
npm install -g scribe-cli
scribe auth
Transcribe individual segments
scribe transcribe demo/out/narration/01_title.wav -f json -o demo/out/narration_transcripts/
scribe transcribe demo/out/narration/01_title.wav -f all -o demo/out/narration_transcripts/
scribe transcribe demo/out/narration/01_title.wav -f json --stdout
Scribe CLI options
| Flag | Description |
|---|
-f, --format <type> | Output format: json, md, txt, srt, all (default: json) |
-o, --output-dir <dir> | Output directory (default: .) |
-d, --diarize | Enable speaker diarization |
-s, --speakers <count> | Speaker count hint (1–32) |
-l, --language <code> | Language code (ISO-639, e.g. en, zh) |
--stdout | Print to stdout instead of writing file |
-q, --quiet | Suppress progress output |
JSON output structure
Scribe JSON output contains the metadata needed for timeline sync:
{
"text": "Meet TTS CLI, a fully local text-to-speech toolkit...",
"metadata": {
"duration": 15.30,
"language": "en"
},
"words": [
{ "word": "Meet", "start": 0.0, "end": 0.32, "confidence": 0.98 },
{ "word": "TTS", "start": 0.35, "end": 0.72, "confidence": 0.95 },
...
]
}
Key fields:
metadata.duration — exact segment length in seconds (more accurate than ffprobe for timing)
text — verified transcript (catches TTS mispronunciations)
words[].start / words[].end — word-level timestamps for fine-grained sync
Batch transcription in the narration pipeline
The demo/build_narration.sh script automates scribe across all segments:
for id in "${segment_ids[@]}"; do
tts generate --file "$script_path" --output "$wav_path" --model "$MODEL"
if [[ "$RUN_SCRIBE" == "1" ]]; then
scribe transcribe "$wav_path" -f json -o "$TRANS_DIR"
duration="$(jq -r '.metadata.duration' "$TRANS_DIR/$id.json")"
text="$(jq -r '.text' "$TRANS_DIR/$id.json")"
else
duration="$(ffprobe -v error -show_entries format=duration \
-of default=nokey=1:noprint_wrappers=1 "$wav_path")"
text="$(cat "$script_path")"
fi
start_frame=$(awk "BEGIN { printf \"%d\", $running_sec * 30 + 0.5 }")
done
Control with environment variable:
RUN_SCRIBE=1 ./build_narration.sh
RUN_SCRIBE=0 ./build_narration.sh
From scribe output → beat markers → Remotion cues
The pipeline converts scribe timestamps into three artifacts:
1. Timeline JSON (demo/out/narration_timestamps.json):
{
"fps": 30,
"total_seconds": 84.80,
"total_frames": 2544,
"segments": [
{
"id": "01_title",
"sequence": "title",
"text": "Meet TTS CLI...",
"start_sec": 0.0,
"end_sec": 15.30,
"start_frame": 0,
"end_frame": 459,
"duration_frames": 459
},
...
]
}
2. Transcript markdown (demo/transcript.md):
| Segment | Start | End | Frame | Text |
|---|---:|---:|---:|---|
| 01_title | 0.00s | 15.30s | 0 | Meet TTS CLI... |
| 02_tech | 15.30s | 38.36s | 459 | Under the hood... |
Frame number = start_seconds × 30 (at 30fps).
3. Remotion narration cues (demo/intro/src/narrationCues.ts):
export const narrationCues = {
fps: 30,
totalFrames: 2250,
scenes: {
story: { from: 0, duration: 480, beatDurations: [75, 90, 105, 60, 75, 75] },
tech: { from: 480, duration: 330, beatDurations: [90, 90, 90, 60] },
features: { from: 810, duration: 510, beatDurations: [90, 90, 90, 90, 75, 75] },
demo: { from: 1320, duration: 285, beatDurations: [105, 90, 90] },
cta: { from: 1605, duration: 240, beatDurations: [75, 75, 90] },
},
} as const;
Each scene has uniform shape: from (start frame), duration (total frames), beatDurations[] (per-beat frame counts). This is auto-generated by build_narration.sh from scribe timestamps.
This file is imported by TtsIntro.tsx for top-level <Sequence> placement, and by each scene component for internal beat <Sequence> positioning.
Why scribe over ffprobe alone?
| scribe | ffprobe fallback |
|---|
| Duration accuracy | From speech model — accounts for silence trimming | File-level — includes trailing silence |
| Verified transcript | Catches TTS errors (mispronunciations, skipped words) | Uses source script (assumes TTS was perfect) |
| Word-level timing | Available — enables per-word animation sync | Not available |
| Offline use | ❌ Requires ElevenLabs API | ✅ Fully offline |
| Speed | ~2-5s per segment (API call) | Instant |
Recommendation: Use scribe for the final build (accurate timing), use ffprobe fallback during rapid iteration.
Single-segment regeneration (iterating on one beat)
TTS output has randomness — the same text produces different results each run. When a segment sounds bad, generate 3 versions, let the user pick, then patch all downstream artifacts.
1. Generate multiple candidates
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v1.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v2.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v3.wav --model 0.6B
Present durations to the user so they can audition and pick.
2. Replace and get new duration
cp demo/out/narration/05_reveal_v2.wav demo/out/narration/05_reveal.wav
ffprobe -v error -show_entries format=duration -of default=nokey=1:noprint_wrappers=1 demo/out/narration/05_reveal.wav
3. Update timestamps JSON (recompute all offsets)
Use jq to patch the single segment's duration and recompute all subsequent offsets:
jq '
.segments |= (
map(if .id == "SEGMENT_ID" then .duration_sec = NEW_DUR | .duration_frames = (NEW_DUR * 30 | round) else . end) |
reduce range(length) as $i (
.;
if $i == 0 then
.[$i].start_sec = 0 | .[$i].start_frame = 0 |
.[$i].end_sec = .[$i].duration_sec | .[$i].end_frame = .[$i].duration_frames
else
.[$i].start_sec = .[$i-1].end_sec | .[$i].start_frame = .[$i-1].end_frame |
.[$i].end_sec = (.[$i].start_sec + .[$i].duration_sec) |
.[$i].end_frame = (.[$i].start_frame + .[$i].duration_frames)
end
)
) |
.total_seconds = .segments[-1].end_sec |
.total_frames = .segments[-1].end_frame
' demo/out/narration_timestamps.json > tmp.json && mv tmp.json demo/out/narration_timestamps.json
4. Rebuild all downstream artifacts
After patching timestamps JSON, regenerate these three (can run in parallel):
- Re-concat audio — rebuild
concat.txt from manifest order, ffmpeg -y -f concat, copy to public/ and root
- Regenerate
narrationCues.ts — rebuild scene blocks from timeline JSON (same logic as write_cues_ts())
- Regenerate
transcript.md — rebuild markdown table from timeline JSON (same logic as write_transcript_md())
Summary: single-segment patch checklist
- Generate 3 candidate WAVs (parallel)
- User picks → copy to official filename
ffprobe → get new duration
jq → patch timestamps JSON + recompute offsets
- Re-concat audio + copy to
public/ and root
- Regenerate
narrationCues.ts
- Regenerate
transcript.md
This avoids re-generating all other segments and takes ~10 seconds vs minutes for the full pipeline.
Step 2: Build Motion Graphics with Remotion
Project setup
cd demo
npx create-video@latest intro --template blank --tailwind
cd intro && npm install
Design system (design.ts)
Define a shared palette, fonts, and shadows so all scenes look consistent:
export const palette = {
ink: "#111827",
inkMuted: "#5B6475",
bg: "#FFF8F5",
bgPanel: "#FFFFFF",
accent: "#FF6154",
cool: "#3B82F6",
} as const;
export const fonts = {
display: "'Avenir Next', sans-serif",
mono: "'JetBrains Mono', monospace",
} as const;
Scene components
Each scene is a React component using Remotion primitives:
useCurrentFrame() — current frame number (drives all animation)
useVideoConfig() — fps, width, height, duration
spring() — physics-based easing for entrances
interpolate() — map frame ranges to CSS values (opacity, translateY, scale)
<Sequence> — place a component at a specific time range
Pattern for a scene component:
import { AbsoluteFill, interpolate, spring, useCurrentFrame, useVideoConfig } from "remotion";
export const TitleCard: React.FC = () => {
const frame = useCurrentFrame();
const { fps } = useVideoConfig();
const enter = spring({ frame: frame - 8, fps, config: { damping: 14, stiffness: 120 } });
const fadeOut = interpolate(frame, [437, 487], [1, 0], {
extrapolateLeft: "clamp",
extrapolateRight: "clamp",
});
return (
<AbsoluteFill style={{ opacity: fadeOut }}>
<div style={{
opacity: enter,
transform: `translateY(${interpolate(enter, [0, 1], [34, 0])}px)`,
fontSize: 178,
fontWeight: 800,
}}>
TTS CLI
</div>
</AbsoluteFill>
);
};
Audio-synced timeline (TtsIntro.tsx)
The main composition sequences scenes using beat markers from the transcript:
import { AbsoluteFill, Audio, Sequence, staticFile } from "remotion";
export const TtsIntro: React.FC = () => (
<AbsoluteFill>
<Audio src={staticFile("ttscli_intro.wav")} />
<Sequence from={0} durationInFrames={520}>
<TitleCard />
</Sequence>
<Sequence from={487} durationInFrames={723}>
<TechOverview />
</Sequence>
{/* ... more scenes ... */}
</AbsoluteFill>
);
Narration cues file (narrationCues.ts)
Auto-generate this from transcript timestamps so scene timing stays in sync:
export const narrationCues = {
fps: 30,
totalFrames: 2544,
scenes: {
title: { from: 0, duration: 520 },
tech: { from: 487, duration: 723 },
features: { from: 1177, duration: 891 },
terminal: { from: 2035, duration: 509 },
},
};
Render
cd demo/intro
npx remotion render TtsIntro --output ../out/intro.mp4 --codec h264
Tips for motion graphics
- Start simple — animate one element at a time, iterate.
- Use
spring() for entrances — feels natural, avoids linear motion.
- Cross-fades: overlap scenes by ~30 frames (1s) and use
interpolate() for fade-out on the outgoing scene.
- Feature pills/badges: small animated labels that pop in one by one with staggered delays.
- Avoid small decorative icons next to titles — stroke-based SVG icons (bolt, mic, terminal) render poorly at small sizes in video (broken borders, barely visible). If the scene content already illustrates the concept (e.g., progress bars for streaming, waveforms for voice cloning, terminal chrome for scripting), the title text alone is cleaner. Only use icons when they are the primary visual element (e.g., Apple logo + CPU chip for "MLX Native").
- Background effects: subtle particle/grid animations add depth without distracting (see
Backdrop.tsx).
- Consistent resolution: always 1920×1080 @ 30fps across all segments.
Card layout guidelines
When building capability/feature cards in a row:
- Fixed-height illustration boxes — when showing multiple cards side-by-side, give the illustration area a fixed
height (e.g. 160px) so all cards match visually. Use display: "flex", alignItems: "center" inside to vertically center varied content.
- Card sizing — for a 3-card row on 1920px canvas, 380px per card with connector arrows between them works well. Don't go below 320px or text gets cramped.
- Centering SVG + text — when stacking an SVG icon above text, use
display: "flex", flexDirection: "column", alignItems: "center" on the container instead of textAlign: "center". The latter won't reliably center inline SVG elements.
Themed scenes (e.g. GitHub-style)
When a scene references an external brand or platform, define a local token object for that theme instead of using the global palette. This keeps the scene self-contained and visually distinct.
const gh = {
bg: "#ffffff",
bgSubtle: "#f6f8fa",
cardBg: "#ffffff",
border: "#d0d7de",
text: "#1f2328",
textMuted: "#656d76",
btnBg: "#f6f8fa",
btnBorder: "#d0d7de",
starYellow: "#e3b341",
link: "#0969da",
} as const;
Tips for themed scenes:
- Skip the shared
<Backdrop> — use a flat backgroundColor matching the platform's style instead.
- Reproduce recognizable UI elements — e.g., GitHub repo card with icon, description, star button, language dot, topic tags. These are instantly familiar and more engaging than abstract placeholders.
- Animate interactions — e.g., a star button "click" with
spring() pop, a counter rolling from N to N+1. Makes the scene feel alive.
- Prefer light themes when the overall video uses a light design system. Dark-themed scenes create jarring contrast.
Concrete vs abstract illustrations
Prefer concrete, terminal-style content inside card illustration boxes over abstract graphics:
| Abstract (avoid) | Concrete (prefer) |
|---|
| Neural network dots | Agent thinking steps: 🔍 read codebase... → 🧠 analyzing... → 📋 plan: 3 steps |
| Floating particles | Code snippet with syntax highlighting |
| Generic waveform | Terminal pipeline: $ running... → ✓ git done → ✓ test done |
Concrete illustrations are more readable at video resolution and immediately communicate what the feature does.
Step 3: Record Terminal Demos with VHS
VHS records scripted terminal sessions as video.
Install
brew install charmbracelet/tap/vhs
Write a .tape file
Each terminal segment gets its own tape file:
# Terminal Scene: Install & Setup
Output out/terminal1.mp4
Set Width 1920
Set Height 1080
Set Framerate 30
Set FontFamily "Menlo"
Set FontSize 22
Set Theme "Github"
Set Padding 40
Set TypingSpeed 30ms
Set CursorBlink true
Set Shell zsh
Sleep 400ms
Type "curl -fsSL https://example.com/install.sh | bash"
Sleep 150ms
Enter
Sleep 4000ms
Type "mytool --version"
Sleep 150ms
Enter
Sleep 1500ms
Sleep 400ms
Tape file guidelines
| Setting | Recommended Value | Why |
|---|
Width / Height | 1920 × 1080 | Match Remotion resolution |
Framerate | 30 | Match Remotion fps |
Theme | Github (light) or Dracula (dark) | Consistent look |
TypingSpeed | 30ms | Fast enough to not bore, slow enough to read |
Sleep after Enter | 2000–4000ms | Let output render before next command |
Record
vhs terminal_voices.tape
vhs terminal_speech.tape
vhs terminal_config.tape
Tips
- Target 16 seconds per segment — trim in the assembly step.
- Plan 3 terminal segments covering: setup, core features, advanced/workflow.
- Keep commands short and readable — avoid long flags when possible.
- End each tape with a
Sleep 400ms buffer.
Step 4: Assemble the Final Video
The build script (demo/build.sh) orchestrates everything:
Timing plan
Segment Start Duration Frames
Intro (motion) 0:00 28s 840
Label 1 0:28 2s 60 (optional title card)
Terminal 1 0:30 16s 480
Label 2 0:46 2s 60
Terminal 2 0:48 16s 480
Label 3 1:04 2s 60
Terminal 3 1:06 16s 480
Trim terminals to exact duration
ffmpeg -y -i out/terminal1.mp4 -t 16 \
-c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
out/terminal1_trimmed.mp4
Concatenate segments
cat > out/concat_list.txt <<EOF
file 'intro.mp4'
file 'label1.mp4'
file 'terminal1_trimmed.mp4'
file 'label2.mp4'
file 'terminal2_trimmed.mp4'
file 'label3.mp4'
file 'terminal3_trimmed.mp4'
EOF
ffmpeg -y -f concat -safe 0 -i out/concat_list.txt \
-c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
out/concat.mp4
Merge narration audio
ffmpeg -y -i out/concat.mp4 -i ttscli_intro.wav \
-c:v copy -c:a aac -b:a 128k -ar 44100 -ac 2 \
-shortest -movflags +faststart \
out/ttscli_demo.mp4
Run the full pipeline
cd demo
./build.sh
./build.sh remotion
./build.sh terminals
./build.sh merge
Quick-Start Checklist
When asked to create a product demo, follow these steps:
- [ ] Write narration script — Plan 4 scenes, ~60–90s total, one sentence per feature. Create
narration/manifest.json + per-segment .txt files.
- [ ] Generate audio — Use
tts generate per segment, concatenate with ffmpeg.
- [ ] Transcribe with scribe — Run
scribe transcribe on each segment to get accurate durations and verified text. Compute beat markers (frame = seconds × fps).
- [ ] Generate timing artifacts — Write
narration_timestamps.json, transcript.md, and narrationCues.ts from scribe output. Or run build_narration.sh to automate steps 2–4.
- [ ] Create design system — Define palette, fonts, shadows in
design.ts.
- [ ] Build Remotion scenes — One component per scene, use
spring() + interpolate(), sync to audio beat markers from narrationCues.ts.
- [ ] Write VHS tapes — One
.tape per terminal segment, 1920×1080 @ 30fps, ~16s each.
- [ ] Record terminals —
vhs <tape>.tape for each.
- [ ] Assemble — Trim terminals, concatenate all segments, merge audio with ffmpeg.
- [ ] Review & iterate — Watch the full video, adjust timing, re-render individual pieces as needed.
Required Tools
| Tool | Install | Purpose |
|---|
tts | pip install tts-cli | Narration audio generation |
node / npx | brew install node | Remotion rendering |
remotion | npx create-video@latest | Motion graphics |
vhs | brew install charmbracelet/tap/vhs | Terminal recording |
ffmpeg | brew install ffmpeg | Video/audio processing |
scribe | npm install -g scribe-cli + scribe auth | Transcription for accurate timestamps (ElevenLabs API) |
Reference Files
- Existing demo:
demo/ directory in this repo
- Video build script:
demo/build.sh — renders Remotion, records VHS, assembles final MP4
- Narration build script:
demo/build_narration.sh — TTS generation → scribe transcription → timestamp extraction → narrationCues.ts
- Segment manifest:
demo/narration/manifest.json — defines segment order, roles, and script files
- Narration scripts:
demo/narration/0*.txt — one text file per segment
- Remotion project:
demo/intro/
- Remotion timing cues:
demo/intro/src/narrationCues.ts (auto-generated from scribe)
- VHS tapes:
demo/terminal_*.tape
- Narration plan:
demo/narration_script.md
- Transcript & beat markers:
demo/transcript.md
- Remotion tips:
remotion-tip.md