Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

app-demo-agent

Use this skill to turn screen recordings into polished app demo videos with AI voiceover and music. Triggers: "app demo", "demo video from recording", "screen recording demo", "narrate this recording", "add voiceover to screen recording", "make a demo from this recording", "polish this screen recording", "app walkthrough video", "product demo video", "turn this recording into a demo", "screen recording with voiceover", "app showcase video", "demo with narration" Orchestrates: screen analysis, voiceover script, device framing, TTS, background music, and final assembly. NOTE: This skill requires an existing screen recording as input. It does NOT generate screen recordings.

In Manus ausführen

Überblick

Installationsbefehl

npx skills add https://github.com/michaelboeding/skills --skill app-demo-agent

Kopieren Sie diesen Befehl und fügen Sie ihn in Claude Code ein, um den Skill zu installieren

Quelle

michaelboeding/skills

Sterne13

Forks1

Aktualisiert1. April 2026 um 04:53

Datei-Explorer

2 Dateien

SKILL.md

readonly

Mehr aus diesem Repository

gleiches Repository

walkthrough-script-agent

michaelboeding/skills

Use this skill to generate walkthrough video scripts for app features. Triggers: "walkthrough script", "feature walkthrough", "app walkthrough script", "demo script", "video script for app", "feature tour script", "product walkthrough", "screen-by-screen script", "app feature scripts", "walkthrough narration", "tutorial script", "onboarding script", "feature highlight script" Outputs: Timed walkthrough scripts with narration, screen directions, and transitions for each app feature.

2026-04-0913

auto-permissions-review-install

michaelboeding/skills

Install an AI-powered permission review hook that uses Claude Haiku to evaluate tool call safety before execution. Per-session, auto-approves read-only operations, sends ambiguous Bash commands to Haiku, and only reviews edits in accept-edits mode. Use when setting up auto-approve with AI review, reducing permission fatigue, or wanting Claude to review its own tool calls.

2026-04-0613

auto-permissions-review-disable

michaelboeding/skills

Disable the auto permissions review hook for this session. Tool calls return to normal permission prompts. Other terminal sessions are not affected.

2026-04-0613

auto-permissions-review-enable

michaelboeding/skills

Enable the auto permissions review hook for this session. Auto-approves read-only operations, sends ambiguous Bash to Haiku, and reviews edits only in accept-edits mode. Other terminal sessions are not affected.

2026-04-0613

device-framer

michaelboeding/skills

Wrap screen recordings and screenshots in photorealistic iPhone device frames with drop shadow and background. Use this skill whenever the user uploads a screen recording (MP4, MOV, etc.) or screenshot (PNG, JPG, etc.) and wants it placed inside a phone mockup, device frame, or device bezel. Also trigger when the user mentions "device frame", "phone mockup", "iPhone frame", "app demo", "wrap in device", "Screen Studio", "mockup video", "app store screenshot", or wants to make a screen recording or screenshot look polished/professional. Supports 12 iPhone models from iPhone 13 mini to iPhone 17 Pro Max with 44 color variants. Handles both video (ffmpeg) and image (Pillow) inputs automatically.

2026-04-0113

voice-generation

michaelboeding/skills

Use this skill for AI text-to-speech generation. Triggers include: "generate voice", "create audio", "text to speech", "TTS", "read this aloud", "generate narration", "create voiceover", "synthesize speech", "podcast audio", "dialogue audio", "multi-speaker", "audiobook" Supports Google Gemini TTS, ElevenLabs, and OpenAI TTS.

2026-04-0113

Quelle

michaelboeding

michaelboeding/skills

GitHub-Repository öffnen Creator-Repositorys ansehen

Installationsbefehl

Download

In Manus ausführen

Nützlich fürSOC

Film- und VideobearbeiterKunst, Design, Unterhaltung, Sport und Medien27-4032L4

Marktforschungsanalysten und MarketingspezialistenL4

name

app-demo-agent

description

App Demo Producer

Turn raw screen recordings into polished app demo videos with AI-generated voiceover, device frames, and background music.

This is an orchestrator skill that combines:

Screen recording analysis (FFmpeg frame extraction + Claude vision)
Voiceover script generation (Claude)
Device framing (device-framer)
Voice synthesis (Gemini TTS / OpenAI TTS / ElevenLabs)
Background music (Lyria / Suno / Udio)
Final assembly (FFmpeg via media-utils)

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. DO NOT start processing until you have ALL answers.

Use interactive questioning — ask ONE question at a time, wait for the response, then ask the next.

Question Flow

⚠️ Use the AskUserQuestion tool for each question below.

Q1: Screen Recording

"I'll turn that into a polished demo! First — where's the screen recording?

(provide the file path, e.g., ~/Desktop/recording.mp4)"

Wait for response. Verify the file exists.

Q2: What does the app do?

"Give me a quick overview of what this app does and what the recording shows.

(e.g., 'It's a task management app. The recording shows creating a task, setting a due date, and marking it complete.')"

Wait for response. This context dramatically improves the voiceover script.

Q3: Device Frame

"Should I wrap it in a device frame?

Yes — iPhone 16 Pro (default)

Yes, specific device — (I'll show you options)

No — Keep as-is"

Wait for response. If they want a specific device, run --list-devices and let them pick.

Q3b: Device Color (if device frame selected)

"What color for the [device name]?

(list the available colors for the selected device)"

Wait for response. Show the actual color options from the device registry.

Q3c: Background Color (if device frame selected)

"What background color for the video?

Dark (#0a0a0a) — cinematic, great for social media

White (#ffffff) — clean, good for presentations/websites

Transparent — for compositing (note: output will be WebM, not MP4)

Custom — specify a hex color"

Wait for response. If transparent, set output format to WebM with alpha. If custom, validate the hex color.

Q4: Voiceover

"How should we handle the voiceover?

Generate — I'll analyze the recording and write a script that narrates what's on screen

You provide — Give me the script text

None — No voiceover"

Wait for response.

Q5: Voice Style (if voiceover enabled)

"What kind of voice?

Tone:

Professional / polished

Friendly / conversational

Energetic / excited

Calm / reassuring

Or describe your own tone

Gender preference:

Male

Female

No preference"

Wait for response.

Q6: Voice Selection (if voiceover enabled)

Based on the user's tone and gender preference, recommend a specific voice and let them confirm or pick another:

"Based on your preferences, I'd recommend [voice name] ([provider]). Here are some options:

Voice Provider Tone
Kore Gemini Friendly, clear, female
Charon Gemini Professional, authoritative, male
Puck Gemini Upbeat, energetic, male
Aoede Gemini Breezy, warm, female
nova OpenAI Friendly, natural, female
onyx OpenAI Deep, professional, male

Want to go with [recommendation], or pick a different one?"

Wait for response.

Q7: Background Music

"Want background music? If so, what vibe?

Modern / minimal — clean, tech-forward

Upbeat / positive — energetic, fun

Corporate / professional — polished, confident

Ambient / calm — soft, relaxed

Custom — describe the vibe you want

No music — voiceover only (or silent)"

Wait for response.

Q8: Output

"Where should I save the final video?

(default: same directory as the input, with _demo suffix)"

Wait for response.

Quick Reference

Question	Determines
Screen Recording	Input file
App Overview	Context for voiceover script
Device Frame	Whether to use device-framer and which device
Device Color	Color variant for the selected device
Background Color	Background behind the device frame
Voiceover	Generate script vs user-provided vs none
Voice Style	Tone and gender preference
Voice Selection	Specific TTS voice and provider
Background Music	Whether to generate music and what style
Output	Where to save final video

Step 2: Analyze the Screen Recording

Extract keyframes from the recording. Use scene detection to capture frames at actual screen transitions rather than fixed intervals — this gives you frames at the moments that matter.

python3 ${CLAUDE_PLUGIN_ROOT}/skills/app-demo-agent/scripts/extract_frames.py \
  INPUT_VIDEO \
  -o ~/demo_project/frames/ \
  --scene-detect \
  --timestamps

If scene detection doesn't find enough transitions (e.g., scrolling content), fall back to fixed intervals:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/app-demo-agent/scripts/extract_frames.py \
  INPUT_VIDEO \
  -o ~/demo_project/frames/ \
  --interval 2 \
  --max-frames 20 \
  --timestamps

Options:

Option	Default	Description
`--scene-detect`	off	Detect actual screen transitions instead of fixed intervals
`--scene-threshold`	`0.3`	Scene sensitivity 0.0-1.0, lower = more sensitive
`--interval`	`2`	Seconds between frame captures (fixed interval mode)
`--max-frames`	`20`	Maximum number of frames to extract
`-o, --output`	`./frames/`	Output directory for frames
`--timestamps`	off	Also output a `timestamps.json` mapping

This creates numbered PNG screenshots: frame_001.png, frame_002.png, etc. The timestamps.json maps each frame to its exact timestamp in the video.

Now read each frame image using the Read tool to understand the screen flow. Build a mental timeline:

0s  - App opens, home screen visible
2s  - User taps "New Task" button
4s  - Task creation form appears
6s  - User types task name
8s  - User sets due date
10s - User taps "Save"
12s - Task appears in list with checkmark

Use the user's app description (Q2) combined with what you see in the frames to understand the full flow.

Step 3: Write the Voiceover Script

⚠️ This is the most important step. The voiceover must match what's happening on screen at every moment. A generic script that doesn't align with the visuals will feel disconnected and unprofessional.

First, get the exact video duration:

ffprobe -v quiet -show_entries format=duration -of csv=p=0 INPUT_VIDEO

Then write a timed script that maps narration to the frame timeline you built in Step 2. Every line of the script should correspond to what's visible on screen at that moment.

Script Rules

Content must match the screen — if the user is tapping a button at 4s, the narration at 4s should describe that action. Never narrate something that isn't visible.
Fill the full video duration — the voiceover should naturally span the entire recording. Not too short (dead silence at the end), not too long (audio gets cut off). Aim for the TTS output to be within 2-3 seconds of the video length.
Flow with transitions — when the screen transitions between views, use that moment for a brief pause or transitional phrase. Don't talk over a screen change.
Be natural and conversational — not robotic marketing speak.
Highlight key features — call out what makes the app special as those features appear on screen.

Write the script as a timed outline first

Map each line to the timestamp where it should be spoken:

[0-3s]   "Meet TaskFlow — the simplest way to stay on top of your day."
[3-6s]   "Tap the plus button to create a new task."
[6-10s]  "Give it a name... set a due date... and you're done."
[10-14s] "Your tasks show up right on your home screen, organized by priority."
[14-17s] "One tap to mark it complete."
[17-20s] "That's it — no clutter, no complexity. TaskFlow."

Then convert to a single block for TTS

Remove the timestamps and join into flowing text. Use punctuation to control pacing:

Meet TaskFlow — the simplest way to stay on top of your day.
Tap the plus button to create a new task.
Give it a name... set a due date... and you're done.
Your tasks show up right on your home screen, organized by priority.
One tap to mark it complete.
That's it — no clutter, no complexity. TaskFlow.

Pacing guide

Video Duration	Target Word Count	Pace
15s	30-40 words	Fast, punchy
30s	60-80 words	Standard
45s	90-120 words	Comfortable
60s	120-150 words	Relaxed
90s+	~150 words/min	Natural conversational

Tips:

Short sentences work better for TTS pacing
Periods create natural pauses (~0.5s)
Ellipsis (...) creates longer pauses (~1s)
Em dashes (—) create brief pauses (~0.3s)
Front-load the hook — first sentence matters most
End with the app name or a clear CTA

Present the timed script to the user for approval before generating audio. Show both the timed outline and the final text block. Ask if they want changes to wording, pacing, or emphasis.

Step 4: Frame the Video (Optional)

If the user wants a device frame, wrap the recording:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/device-framer/scripts/frame_video.py \
  INPUT_VIDEO \
  -o ~/demo_project/framed.mp4 \
  -d iphone-16-pro --color natural-titanium \
  --bg '#0a0a0a' --scale 0.5 --padding 100

Recommended combos for demos:

Vibe	Device	Color	Background
Premium dark	iphone-16-pro	natural-titanium	`#0a0a0a`
Modern vibrant	iphone-17-pro	cosmic-orange	`#1a1a2e`
Clean light	iphone-air	cloud-white	`#f0f4f8`
Bold	iphone-16	ultramarine	`#0a0a0a`
Classic	iphone-16-pro-max	black-titanium	`#111111`

⚠️ Use --scale 0.5 for most demos — native resolution is very large.

The framed video becomes the input for the rest of the pipeline.

Step 5: Generate Voiceover Audio

Use one of the voice generation scripts:

Gemini TTS (recommended — free, high quality):

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Your voiceover script here..." \
  --voice Orus \
  --style "Clear, confident, app demo narration" \
  -o ~/demo_project/voiceover.wav

OpenAI TTS:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/openai_tts.py \
  --text "Your voiceover script here..." \
  --voice nova \
  -o ~/demo_project/voiceover.mp3

ElevenLabs:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/elevenlabs.py \
  --text "Your voiceover script here..." \
  --voice rachel \
  -o ~/demo_project/voiceover.mp3

⚠️ You can also use --text-file instead of --text to read the script from a file:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text-file ~/demo_project/script.md \
  --voice Kore \
  --style "Friendly, clear, app demo narration" \
  -o ~/demo_project/voiceover.wav

Duration Check (CRITICAL)

After generating the voiceover, immediately check if it matches the video duration:

# Get voiceover duration
VO_DUR=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 ~/demo_project/voiceover.wav)

# Get video duration
VID_DUR=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 INPUT_VIDEO)

echo "Voiceover: ${VO_DUR}s, Video: ${VID_DUR}s"

If there's a mismatch greater than 3 seconds, fix it before proceeding:

Situation	Fix
Voiceover too long	Shorten the script — remove filler words, tighten phrasing, cut a line. Regenerate.
Voiceover slightly long (< 5s over)	Extend the video with a freeze frame at the end: `ffmpeg -i video.mp4 -vf "tpad=stop_mode=clone:stop_duration=5" -c:a copy extended.mp4`
Voiceover too short	Add more descriptive lines, slow the pace with pauses (ellipsis), or add a closing line. Regenerate.
Voiceover slightly short (< 3s under)	Acceptable — the video will have a brief silent outro which can feel natural.

Always prefer adjusting the script over stretching the video. The script should be written to fit the video, not the other way around.

Voice Recommendations for Demos

Tone	Gemini Voice	OpenAI Voice	ElevenLabs Voice
Default / Professional	Orus (recommended)	onyx	josh
Friendly	Kore	nova	rachel
Energetic	Puck	echo	domi
Calm	Aoede	shimmer	bella
Authoritative	Charon	alloy	adam

Default voice: Orus (Gemini TTS) — firm, clear, professional. Works well for most app demos.

Step 6: Generate Background Music (Optional)

If the user wants music:

⚠️ Always match the video duration exactly. Get the video duration first, then pass it to the music generator:

# Get exact video duration
DURATION=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 INPUT_VIDEO | cut -d. -f1)

python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "modern, minimal, tech product demo, clean, upbeat, positive" \
  --duration $DURATION \
  -o ~/demo_project/music.wav

Music prompts for demos:

App Type	Music Prompt
Productivity	"minimal, modern, clean, focused, light electronic"
Social/Fun	"upbeat, playful, positive, acoustic, indie"
Finance/Business	"professional, confident, modern, ambient, corporate"
Health/Wellness	"calm, organic, warm, gentle, ambient"
Gaming	"energetic, electronic, dynamic, exciting, bass"
Creative tool	"inspiring, flowing, ambient, creative, modern"

Step 7: Mix Audio

If you have both voiceover and music, mix them:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice ~/demo_project/voiceover.wav \
  --music ~/demo_project/music.wav \
  -o ~/demo_project/final_audio.mp3 \
  --music-volume 0.2 \
  --fade-in 1.0 \
  --fade-out 2.0

Recommended settings for demos:

--music-volume 0.15 to 0.25 — music should be subtle under narration
--fade-in 1.0 — gentle music intro
--fade-out 2.0 — clean ending

If voiceover only (no music), skip this step and use the voiceover file directly.

Step 8: Merge Audio with Video

Strip existing audio from the video (if any) and merge the new audio:

# Strip existing audio first
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_strip_audio.py \
  -i ~/demo_project/framed.mp4 \
  -o ~/demo_project/silent_video.mp4

# Merge new audio
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_audio_merge.py \
  --video ~/demo_project/silent_video.mp4 \
  --audio ~/demo_project/final_audio.mp3 \
  -o ~/demo_project/output/app_demo_final.mp4

If the video has no audio to strip, merge directly:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_audio_merge.py \
  --video ~/demo_project/framed.mp4 \
  --audio ~/demo_project/final_audio.mp3 \
  -o ~/demo_project/output/app_demo_final.mp4

Step 9: Deliver

Present the final video to the user:

Tell them where the output file is
Mention the duration and file size
Offer adjustments:
- "Want me to change the voiceover script?"
- "Different voice or music style?"
- "Different device frame or background color?"
- "Adjust the music volume?"

Project Structure

demo_project/
├── frames/                   # Extracted keyframes for analysis
│   ├── frame_001.png
│   ├── frame_002.png
│   └── timestamps.json       # Frame timestamp mapping
├── framed.mp4                # Device-framed video (if applicable)
├── voiceover.wav             # Generated voiceover
├── music.wav                 # Generated background music
├── final_audio.mp3           # Mixed voiceover + music
├── silent_video.mp4          # Video with audio stripped
├── script.md                 # Voiceover script (for reference)
└── output/
    └── app_demo_final.mp4    # Final deliverable

Full Pipeline (Quick Reference)

For quick reference, here's the full pipeline in order:

SKILL="${CLAUDE_PLUGIN_ROOT}/skills"
PROJECT=~/demo_project
INPUT=recording.mp4

# 1. Extract frames for analysis
python3 $SKILL/app-demo-agent/scripts/extract_frames.py $INPUT -o $PROJECT/frames/ --timestamps

# 2. [Claude analyzes frames + writes voiceover script]

# 3. Frame the video (optional)
python3 $SKILL/device-framer/scripts/frame_video.py $INPUT -o $PROJECT/framed.mp4 \
  -d iphone-16-pro --color natural-titanium --bg '#0a0a0a' --scale 0.5

# 4. Generate voiceover
python3 $SKILL/voice-generation/scripts/gemini_tts.py \
  --text-file $PROJECT/script.md --voice Orus \
  --style "Clear, confident, app demo narration" -o $PROJECT/voiceover.wav

# 5. Generate music (optional)
python3 $SKILL/music-generation/scripts/lyria.py \
  --prompt "minimal, modern, tech demo" --duration 30 -o $PROJECT/music.wav

# 6. Mix audio
python3 $SKILL/media-utils/scripts/audio_mix.py \
  --voice $PROJECT/voiceover.wav --music $PROJECT/music.wav \
  -o $PROJECT/final_audio.mp3 --music-volume 0.2

# 7. Strip + merge
python3 $SKILL/media-utils/scripts/video_strip_audio.py -i $PROJECT/framed.mp4 -o $PROJECT/silent.mp4
python3 $SKILL/media-utils/scripts/video_audio_merge.py \
  --video $PROJECT/silent.mp4 --audio $PROJECT/final_audio.mp3 \
  -o $PROJECT/output/app_demo_final.mp4

What You Can Create

Input	Output
Raw screen recording	Polished demo with voiceover
App walkthrough capture	Narrated product tour
Feature demo recording	Marketing-ready feature highlight
Tutorial screen capture	Professional tutorial with narration
Bug reproduction video	Annotated bug report video
Prototype recording	Investor demo with voiceover

Prerequisites

FFmpeg — brew install ffmpeg (required for all media processing)
Pillow — pip install Pillow (required if using device framing)
Voice generation — at least one of:
- GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT (Gemini TTS — recommended)
- OPENAI_API_KEY (OpenAI TTS)
- ELEVENLABS_API_KEY (ElevenLabs)
Music generation (optional) — GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT for Lyria

Voice & Music Pairing Guide

Demo Style	Voice (Gemini)	Music Prompt
General / default	Orus (firm)	"modern, clean, minimal, professional"
SaaS product tour	Orus (firm)	"modern, clean, minimal, professional"
Mobile app showcase	Puck (upbeat)	"upbeat, positive, mobile, fresh"
Enterprise demo	Charon (authoritative)	"corporate, confident, ambient, subtle"
Creative tool	Aoede (breezy)	"inspiring, creative, flowing, warm"
Developer tool	Orus (firm)	"tech, minimal, electronic, focused"
Consumer app	Kore (friendly)	"playful, modern, light, accessible"

Limitations

Requires existing screen recording — this skill does not capture screens
Frame analysis depends on image clarity — blurry recordings produce weaker scripts
Video duration limits TTS — very long recordings (5+ min) may need the script split into segments
Device framer adds processing time — especially for long videos
Music generation — Lyria clips max at ~30s, may need to loop for longer videos

Error Handling

Error	Solution
"FFmpeg not found"	Install: `brew install ffmpeg`
"No frames extracted"	Check input video is valid, try lower `--interval`
"GOOGLE_API_KEY not set"	Set up API key per main README
"Pillow not installed"	Run: `pip install Pillow`
Audio/video duration mismatch	Voiceover script may be too long/short — adjust and regenerate
Device framer fails	Check frame PNG files exist in device-framer/frames/

Example Prompts

Simple:

"Here's a screen recording of my app — turn it into a polished demo video"

With details:

"I have a screen recording at ~/Desktop/myapp.mov. It's a fitness app showing the workout tracking feature. Put it in an iPhone 16 Pro frame, add a friendly voiceover, and some upbeat background music."

Minimal:

"Add voiceover narration to this screen recording: ~/recordings/demo.mp4"

Full control:

"Take ~/Desktop/recording.mp4, frame it in iPhone 17 Pro cosmic orange on dark background, generate a professional voiceover script, use Gemini TTS with Charon voice, add minimal electronic background music, save to ~/Desktop/final_demo.mp4"

Voice	Provider	Tone
Kore	Gemini	Friendly, clear, female
Charon	Gemini	Professional, authoritative, male
Puck	Gemini	Upbeat, energetic, male
Aoede	Gemini	Breezy, warm, female
nova	OpenAI	Friendly, natural, female
onyx	OpenAI	Deep, professional, male