manim-video-with-tts-ark-plan
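[Ark Agent Plan edition] End-to-end pipeline for Manim math/algorithm explainer videos with Chinese narration from Volcengine TTS (sharing authentication with Seedream/Seedance). Plan → TTS → Code → Render → Stitch → Deliver. Use for: Manim animation with Chinese voiceover, audio-synced explainer videos, and 3Blue1Brown-style teaching videos.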
| Field | Value |
|---|---|
| name | manim-video-with-tts-ark-plan |
| description | [Ark Agent Plan edition] End-to-end pipeline for Manim math/algorithm explainer videos with Chinese narration from Volcengine TTS (sharing authentication with Seedream/Seedance). Plan → TTS → Code → Render → Stitch → Deliver. Use for: Manim animation with Chinese voiceover, audio-synced explainer videos, and 3Blue1Brown-style teaching videos. |
| version | 1.0.0 |
| metadata | {"hermes":{"tags":["manim","animation","video","volcengine","tts","seedream","seedance","math","ark-plan"],"related_skills":["epub2podcast","feishu-seedance-video-pipeline","vocabulary-video-pipeline"]}} |
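Use this skill when the user requests:
- Manim animations with Chinese voiceover
- Explainer videos with synchronized audio and visuals
- 3Blue1Brown-style teaching videos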
PLAN → TTS → CODE → DRAFT → PRODUCTION → STITCH → MUX → DELIVER
| Step | Tool | Output |
|---|---|---|
| 1. Plan | Markdown | plan.md — narrative arc, scene breakdown, voiceover script, color palette |
| 2. TTS | Volcengine TTS | audio/s01.mp3 ~ sNN.mp3 + duration manifest |
| 3. Code | Manim CE | script.py — one class per scene, run_times synced to audio |
| 4. Draft | manim -ql | 480p15 preview clips for rhythm verification |
| 5. Production | manim -qh | 1080p60 final scene clips |
| 6. Stitch | ffmpeg concat | video_pro.mp4 — seamless scene assembly |
| 7. Mux | ffmpeg | final.mp4 — video + narration mixed |
| 8. Deliver | lark-cli + web | Feishu Drive link + HTTPS direct link |
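- … `VOLCENGINE_ACCESS_TOKEN`, no extra configuration needed)
- `/var/www/hermes.aigc.green/media/` (web-served directory used in the Deliver step)

Write `plan.md` before any code. Include:
- Narrative arc
- Scene breakdown
- Voiceover script
- Color palette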
Rule: Scene count = audio segment count. Each scene gets exactly one TTS clip.
Generate one MP3 per scene using Volcengine TTS:
```bash
# '第N段旁白' = the Chinese narration text for scene N
volc-tts --text '第N段旁白' --voice female --output ./audio/s0N.mp3
```
Then measure exact durations:
```bash
for f in audio/*.mp3; do
  dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$f")
  echo "$f: ${dur}s"
done
```
Record these durations — they are the master clock for all Manim timing.
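The pipeline table calls for a duration manifest next to the MP3s, but its format is not specified. A minimal Python sketch, assuming a JSON list of `{clip, start, duration}` entries (the file name `audio/durations.json`, the field names, and the helpers `probe_duration`/`build_manifest` are all hypothetical), reusing the same `ffprobe` invocation as the loop above:

```python
import json
import subprocess
from pathlib import Path


def probe_duration(path: Path) -> float:
    """Duration of a media file in seconds, measured with ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "csv=p=0", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def build_manifest(durations: dict[str, float]) -> list[dict]:
    """Order clips by name (s01, s02, ...) and record each clip's start offset."""
    manifest, start = [], 0.0
    for name in sorted(durations):  # zero-padded names sort correctly
        manifest.append({"clip": name,
                         "start": round(start, 3),
                         "duration": round(durations[name], 3)})
        start += durations[name]
    return manifest


if __name__ == "__main__":
    clips = sorted(Path("audio").glob("*.mp3"))
    if clips:  # only run inside a project that already has audio/*.mp3
        manifest = build_manifest({p.name: probe_duration(p) for p in clips})
        Path("audio/durations.json").write_text(json.dumps(manifest, indent=2))
```

The `start` offsets show where each scene begins in the concatenated narration, which is handy when checking the final mux against the plan.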
Since Manim does not natively support Volcengine TTS, use the manual sync approach:
```python
# In each Scene.construct(), the sum of:
#   all run_time values + all self.wait() values
# must equal the corresponding audio clip duration.
```
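The sync rule can be turned into a small helper that returns the last `self.wait()` a scene needs to fill its clip. This is a sketch: `closing_wait` is a hypothetical name, and it assumes you list each scene's `run_time` values and intermediate waits by hand:

```python
def closing_wait(audio_duration: float, run_times: list[float],
                 waits: list[float]) -> float:
    """Final wait needed so that sum(run_times) + sum(waits) equals the clip.

    Raises ValueError when the animations already overrun the audio.
    """
    remaining = audio_duration - (sum(run_times) + sum(waits))
    if remaining < -1e-9:  # small epsilon absorbs float rounding
        raise ValueError(f"scene overruns its audio by {-remaining:.3f}s")
    return round(max(remaining, 0.0), 3)  # 3 decimals avoids float noise
```

For an 8.4 s clip with animations of 1.5 s, 2.0 s and 2.0 s plus one 0.5 s pause, `closing_wait(8.4, [1.5, 2.0, 2.0], [0.5])` gives 2.4 s; call `self.wait()` with it only when the result is positive. Raising on overrun surfaces a too-long animation before any rendering time is spent.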
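To avoid re-animating axes and curves in every scene (which wastes precious audio time), recreate the carried-over objects instantly with `self.add()` and animate only the new elements: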
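```python
from manim import *  # Scene, Axes, ReplacementTransform, ...

BG = "#0A0A0F"  # background from the shared color palette


class S3_NextScene(Scene):
    def construct(self):
        self.camera.background_color = BG
        axes = Axes(...)
        sin_curve = axes.plot(...)
        prev_curve = axes.plot(...)  # last curve shown in Scene 2
        next_curve = axes.plot(...)  # new curve introduced in this scene
        # Carry over from previous scene: instant, costs 0s
        self.add(axes, sin_curve, prev_curve)
        # Only animate NEW elements
        self.play(ReplacementTransform(prev_curve, next_curve), run_time=2.0)
        self.wait(...)
        # ...
```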
Critical: The last frame of Scene N must match the first frame of Scene N+1 for seamless ffmpeg concatenation. If Scene N fades out an element, Scene N+1 must not self.add() it.
For a scene with audio duration D:
D = Σ(run_times) + Σ(waits)
Recommended allocation:
Color palette:

```python
BG = "#0A0A0F"  # deep void black
SIN_COLOR = "#FF4D4D"  # coral red: target curve
APPROX_COLOR = "#00F5FF"  # electric cyan: approximation
TEXT_COLOR = "#FFFFFF"  # pure white
ACCENT = "#FFD93D"  # golden yellow
GRID_COLOR = "#2A2A3A"  # dim purple-grey
MONO = "FreeMono"  # or "DejaVu Sans Mono"
```
```bash
manim -ql script.py Scene1 Scene2 Scene3 ...
```
Verify each clip duration matches audio:
```bash
ffprobe -v error -show_entries format=duration -of csv=p=0 media/videos/script/480p15/Scene1.mp4
```
If mismatch > 0.1s, adjust run_time or self.wait() in code and re-render.
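To check every scene at once rather than one `ffprobe` call at a time, the audio and draft-clip durations can be compared in scene order. A sketch, assuming both lists were already measured with `ffprobe` and are ordered Scene1, Scene2, and so on; `find_mismatches` is a hypothetical helper that also enforces the one-clip-per-scene rule:

```python
def find_mismatches(audio: list[tuple[str, float]],
                    video: list[tuple[str, float]],
                    tolerance: float = 0.1) -> list[str]:
    """Pair the i-th audio clip with the i-th scene clip and report drift."""
    # Scene count must equal audio segment count (one TTS clip per scene).
    if len(audio) != len(video):
        return [f"scene count {len(video)} != audio clip count {len(audio)}"]
    problems = []
    for (a_name, a_dur), (v_name, v_dur) in zip(audio, video):
        diff = v_dur - a_dur  # positive: video runs longer than narration
        if abs(diff) > tolerance:
            problems.append(f"{v_name} vs {a_name}: {diff:+.3f}s")
    return problems
```

An empty list means every scene is within tolerance; each reported line gives the sign of the drift, so you know whether to shorten or lengthen the `run_time`/`self.wait()` values.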
```bash
manim -qh script.py Scene1 Scene2 Scene3 ...
```
```bash
# Video concat
cat > concat_video.txt << 'EOF'
file 'media/videos/script/1080p60/Scene1.mp4'
file 'media/videos/script/1080p60/Scene2.mp4'
...
EOF
ffmpeg -y -f concat -safe 0 -i concat_video.txt -c copy video_pro.mp4

# Audio concat
cat > concat_audio.txt << 'EOF'
file 'audio/s01.mp3'
file 'audio/s02.mp3'
...
EOF
ffmpeg -y -f concat -safe 0 -i concat_audio.txt -c copy audio_full.mp3

# Mux
ffmpeg -y -i video_pro.mp4 -i audio_full.mp3 -c:v copy -c:a aac -b:a 192k final.mp4
```
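Writing the concat lists by hand gets error-prone past a few scenes, and a shell glob sorts `Scene10.mp4` before `Scene2.mp4`. A Python sketch that generates the lists in numeric order, assuming the `SceneN` class names and `media/videos/script/1080p60/` layout used above (`concat_list` and `scene_clips` are hypothetical helpers):

```python
def concat_list(paths: list[str]) -> str:
    """Build an ffmpeg concat-demuxer list: one `file '<path>'` line per clip."""
    lines = []
    for p in paths:
        # A literal ' inside a quoted path: close quote, escaped quote, reopen.
        quoted = p.replace("'", "'\\''")
        lines.append(f"file '{quoted}'")
    return "\n".join(lines) + "\n"


def scene_clips(n: int, quality_dir: str = "1080p60") -> list[str]:
    """Scene1..SceneN in numeric order (a lexical sort puts Scene10 first)."""
    return [f"media/videos/script/{quality_dir}/Scene{i}.mp4"
            for i in range(1, n + 1)]
```

For example, `Path("concat_video.txt").write_text(concat_list(scene_clips(6)))` (with `from pathlib import Path`) writes a six-scene video list; the audio list follows the same pattern with `f"audio/s{i:02d}.mp3"` paths.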
```bash
cp final.mp4 /var/www/hermes.aigc.green/media/<filename>.mp4
# URL: https://hermes.aigc.green/media/<filename>.mp4
```
```bash
cd <project-dir>
lark-cli drive +upload --file ./final.mp4 --name "<Title>.mp4"
lark-cli drive metas batch_query --data '{"request_docs":[{"doc_token":"<TOKEN>","doc_type":"file"}],"with_url":true}'
```
Extract the `url` field from the response to get the Feishu direct link.
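The exact JSON shape printed by `lark-cli drive metas batch_query` is not shown here, so rather than hard-coding a path into the response, this sketch searches the parsed JSON for the first string-valued `url` key. `find_url` is a hypothetical helper; pass it `json.loads(...)` of the command's stdout:

```python
from typing import Any, Optional


def find_url(payload: Any) -> Optional[str]:
    """Depth-first search for the first string-valued "url" key in parsed JSON."""
    if isinstance(payload, dict):
        if isinstance(payload.get("url"), str):
            return payload["url"]
        children = list(payload.values())
    elif isinstance(payload, list):
        children = payload
    else:
        return None  # scalars cannot contain a url key
    for child in children:
        found = find_url(child)
        if found is not None:
            return found
    return None
```

Searching instead of indexing keeps the step working if the CLI wraps the API response differently; the trade-off is that it returns only the first `url` when several documents are queried at once.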
| Stage | Resolution | FPS | Typical Speed |
|---|---|---|---|
| Draft (-ql) | 854×480 | 15 | 5–15s/scene |
| Production (-qh) | 1920×1080 | 60 | 30–120s/scene |
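When adapting a technical explainer, distinguish two independent dimensions:
- Audience age (adult technical vs. kids/students)
- Language and locale (English vs. China-localized with bilingual annotation)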
These require separate rewrites. Do not try to make one version serve both purposes.
When adapting for younger audiences (elementary/middle school), do not tweak—rewrite entirely. The kids version requires new plan, new script, new TTS, and new code.
| Aspect | Technical (Adult) | Kids/Student |
|---|---|---|
| Narrative hook | "泰勒展开" / "Taylor Expansion" | "计算器是怎么算出来的?" |
| Central metaphor | Mathematical approximation | Building blocks (积木) |
| Character naming | "目标曲线" / "逼近曲线" | 小红 & 小蓝 (give curves personalities) |
| Pace | Tight, efficient | +20-30% duration, longer pauses between concepts |
| Language | Technical terms (derivative, order, convergence) | Shape, slope, bend, "pretend", "looks like" |
| Voice | Neutral | Female voice (perceived as more approachable) |
| Colors | High-contrast electric | Softer, friendlier palette |
Kids-Optimized Color Palette:
```python
BG = "#0D1117"  # deep void, close to the adult BG (#0A0A0F)
SIN_COLOR = "#FF6B6B"  # coral red: warmer, less aggressive
APPROX_COLOR = "#4ECDC4"  # mint cyan: softer than electric
TEXT_COLOR = "#F0F6FC"  # off-white, easier on the eyes
ACCENT = "#FFE66D"  # golden yellow: magic/block highlights
GRID_COLOR = "#30363D"  # dim grey
MONO = "FreeMono"
```
Kids Script Guidelines:
Timing Adjustments for Kids:
```python
# Adult version
self.wait(0.3)  # between phrases

# Kids version
self.wait(0.5)  # longer breathing room
```
Add suspense pauses before reveals: self.wait(1.0) before "Here's the secret..."
When the audience is Chinese students, apply the Bilingual Annotation Principle:
Every English mathematical term must appear with its Chinese equivalent on first use.
This applies to both narration and on-screen text.
| Rule | Example | Bad | Good |
|---|---|---|---|
| Narration: Chinese first, then English in parentheses | sine | "sine" | "正弦(sine)" |
| Narration: Read formulas in Chinese, not symbols | y=x | "y equals x" | "y 等于 x(y=x)" |
| Narration: Describe operations in Chinese | x³/6 | "x cubed over six" | "x 的立方除以六" |
| On-screen labels: 100% Chinese | title | "Math Magic" | "数学的魔法" |
| On-screen labels: Chinese with English sub-label | formula label | "good fit!" | "很像!" |
| Math formulas: Keep standard notation on screen | y=x | Remove formula | Keep y=x on screen, but narration says "y 等于 x" |
Critical: The mathematical notation (MathTex) on screen should never be removed or translated. The formula y = x, x - x³/6, etc. must remain in standard mathematical notation. Only the narration and text labels are localized.
| English Label | 中文标签 | Notes |
|---|---|---|
| Math Magic | 数学的魔法 | Title |
| How does a calculator know sine? | 计算器是怎么算出正弦的? | Hook question |
| 小红 | 小红 | Character name (keep simple) |
| Let's pretend with a straight line! | 用直线来假装! | Subtitle |
| Add a magic block! | 加一块魔法积木! | Subtitle |
| good fit! | 很像! | Region label |
| much better! | 更像了! | Region label |
| Another block! | 再加一块! | Subtitle |
| Here's the secret... | 秘密在这里... | Subtitle |
| shape, slope, bend | 形状、斜率、弯曲度 | Concept text |
| Infinite simple blocks | 无限个简单积木 | Closing text |
| building something complex & beautiful | 搭出复杂又美丽的东西 | Closing text |
In practice, producing a China-localized kids version requires three iterations:
| Version | Language | Labels | Audience |
|---|---|---|---|
| V1: Adult Technical | English terms, English labels | English | Global adult |
| V2: Generic Kids | English terms, mixed labels | English/Chinese | International kids |
| V3: China-localized Kids | Chinese + bilingual annotations | 100% Chinese | Chinese students |
Workflow for V3:
```bash
# 1. Create entirely new plan (do not reuse V1/V2 plan)
cat > plan-kids-cn.md << 'EOF'
## Audience: Chinese Elementary/Early Middle School
## Principle: Bilingual annotation (Chinese first, English in parens)
## Labels: 100% Chinese
## Voice: Female Chinese TTS
EOF

# 2. Write Chinese narration with bilingual annotations
# Example: "你有没有想过,计算器是怎么算出正弦(sine)的?"

# 3. Generate TTS with female voice
volc-tts --text '...' --voice female --output ./audio-kids-cn/s01.mp3

# 4. Create new script file with Chinese labels
# script-kids-cn.py: all Text() objects use Chinese strings

# 5. Render, stitch, deliver separately
```
| Scene | V1 Adult (EN) | V2 Kids (Mixed) | V3 China Kids (CN+EN) |
|---|---|---|---|
| S1 | "sin x Maclaurin expansion" | "How does a calculator know sine?" | "你有没有想过,计算器是怎么算出**正弦(sine)**的?" |
| S3 | "First order gives tangent y=x" | "y=x fools the eye!" | "y 等于 x(y=x),居然在中心骗过了眼睛!" |
| S4 | "Introduce cubic correction -x³/6" | "Add a block: x³/6" | "数学家加了一块积木:减去 x 的立方除以六" |
| S6 | "Match derivatives at zero" | "Shape and slope match" | "形状、斜率、弯曲程度,都和小红一模一样!" |
Chinese TTS cannot naturally read mathematical symbols. Never write:
- `volc-tts --text 'x³/6'`: TTS may read out stray characters or fail.
- `volc-tts --text 'y=x'`: TTS may read it as "y dengyu x" (awkward).

Always write the spoken Chinese description instead:
- `volc-tts --text '减去 x 的立方除以六'` ("minus x cubed divided by six"): natural Chinese.
- `volc-tts --text 'y 等于 x'` ("y equals x"): natural Chinese.

The screen still shows `x - x³/6` and `y=x` via MathTex; the narration describes them in Chinese. This dual-channel approach (visual: standard notation, audio: Chinese description) is the core of bilingual annotation.
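The rule above can be checked before any TTS call with a small lint over the narration text. A sketch: `unspeakable_symbols` is a hypothetical helper, and the character set is an assumption to extend as new misreadings show up:

```python
import re

# Symbols assumed to be misread by Chinese TTS; includes full-width "=".
_MATH_CHARS = re.compile(r"[=+\-*/^²³⁴√∑∫≈≤≥<>=]")


def unspeakable_symbols(narration: str) -> list[str]:
    """Math symbols in TTS input text that should be spelled out in Chinese."""
    return sorted(set(_MATH_CHARS.findall(narration)))
```

Run it on the exact string passed to `volc-tts --text`. It also flags parenthesized annotations such as `(y=x)`, so decide per script whether those are meant to be spoken or only shown on screen.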
```bash
# 1. Save adult plan as baseline
cp plan.md plan-adult.md

# 2. Create entirely new plan
cat > plan-kids.md << 'EOF'
# Math Magic: How Calculators Know Sine
## Audience: Elementary/Early Middle School
## Metaphor: Building Blocks
## Characters: 小红 (target), 小蓝 (building)
EOF

# 3. Generate new TTS with female voice
volc-tts --text '...' --voice female --output ./audio-kids/s01.mp3

# 4. Create new script file (do not reuse adult script)
# script-kids.py with adjusted timings and softer colors

# 5. Render, stitch, deliver separately from adult version
```
- Monospace fonts: use FreeMono or DejaVu Sans Mono. Menlo is macOS-only.
- `SurroundingRectangle` requires a Mobject, not a coordinate. Use `Rectangle(...).move_to(point)` instead.
- Draft renders (`-ql`, 15 fps) can drift up to ~0.07 s per animation, about one frame. Production (`-qh`) at 60 fps reduces this to ~0.017 s.
- Scene continuity: when Scene N+1 `self.add()`s carried-over objects, ensure the last frame of N matches the first frame of N+1. Otherwise a "flash" occurs at the stitch point.
- … `y_range` wide enough to show the "shrinking" effect visually, or clip `x_range` to safe bounds.
- Each scene's `run_time` + `wait` sum matches its audio duration within 0.1 s.
- … `-shortest` truncation
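The drift figures above equal one frame period at each frame rate. As a rough worst-case bound, assuming each of a scene's $k$ `play()`/`wait()` calls is off by at most one frame at frame rate $f$ and the errors do not cancel (an assumption, not a measured Manim behavior):

$$
\Delta_{\text{call}} \lesssim \frac{1}{f}, \qquad
\frac{1}{15} \approx 0.067\ \text{s}, \qquad
\frac{1}{60} \approx 0.017\ \text{s}, \qquad
\Delta_{\text{scene}} \lesssim \frac{k}{f}
$$

With $k = 5$ calls at 15 fps the bound is $5/15 \approx 0.33$ s, so a draft-stage mismatch somewhat above 0.1 s may come from frame quantization rather than the script; re-measuring the `-qh` clips separates the two.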