Run any Skill in Manus with one click

$pwd:

create-news-video

Name: Create News Video
Author: hoquanghai

// Tạo video tin tức ngắn 9:16 (~60s) từ URL bài báo hoặc file .txt tiếng Việt. Trigger khi user yêu cầu tạo video tin tức, làm short news, làm bản tin video, render tin thành video, làm TikTok tin tức. Output: video.mp4 + voice.mp3 + script.txt cho CapCut.

Run Skill in Manus

$ git log --oneline --stat

stars:255

forks:149

updated:April 29, 2026 at 12:43

SKILL.md

readonly

package.json

"author": "hoquanghai"

"repository": "hoquanghai/Auto-Create-Video"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

News Analysts, Reporters, and JournalistsArts, Design, Entertainment, Sports, and Media Occupations27-3023L4

Run any Skill with one click

name	create-news-video
description	Tạo video tin tức ngắn 9:16 (~60s) từ URL bài báo hoặc file .txt tiếng Việt. Trigger khi user yêu cầu tạo video tin tức, làm short news, làm bản tin video, render tin thành video, làm TikTok tin tức. Output: video.mp4 + voice.mp3 + script.txt cho CapCut.

Create News Video Skill

Generate a Vietnamese 9:16 motion-graphic news video from a URL or .txt file.

Input

Single argument: a news article URL (starts with http:// or https://) OR a path to a .txt file.

Workflow (MUST follow these steps in order)

Step 1: Detect input type

Starts with http:// or https:// → URL mode
Otherwise → file mode

Step 2: Fetch content

URL mode:

Use WebFetch with prompt:

Trích xuất từ trang này:
- title (string): tiêu đề bài báo
- content (string): nội dung chính, ~500-1500 từ
- ogImage (string|null): URL ảnh og:image (meta og:image hoặc ảnh đầu bài)
- domain (string): domain của URL (vd "vnexpress.net")
Trả về JSON với 4 field trên.

If WebFetch fails (paywall, JS-rendered, 4xx) → tell user to save content to a .txt file and pass that instead. Stop.

File mode:

Use Read to read the .txt file
Title = first non-empty line (strip whitespace, max 80 chars)
Content = remaining lines joined
ogImage = null
domain = "local"

Step 3: Create slug + output directory

slug = lowercase ASCII (strip Vietnamese diacritics, đ→d), replace non-alphanumeric with -, trim dashes, max 40 chars
timestamp = current local time as YYYYMMDD-HHmm
outputDir = output/<slug>-<timestamp>/
Use Bash: mkdir -p <outputDir>

Step 4: Generate script.json

Following the schema in docs/superpowers/specs/2026-04-29-auto-news-video-design.md Section 4. Key rules:

Script content (Vietnamese):

Total voiceText: ~150–200 words → ~55–65s spoken at speed 1.0
Number of scenes: 5–8 (1 hook + 3–6 body + 1 outro)
Each scene voiceText is 1-3 short sentences, văn nói (spoken style, not formal)
No emoji, no markdown in voiceText

⚠️ CRITICAL: Vietnamese TTS Phonetic Rules

The voiceText field is read aloud by LucyLab/ElevenLabs Vietnamese TTS. Numbers and symbols are read literally — if you write "5.5", TTS may say "năm rưỡi" (five and a half — WRONG for version numbers). Always spell out numbers in Vietnamese phonetic form in voiceText. The templateData fields (visual text on screen) can keep the original "5.5" / "82.7%" formatting.

Mandatory rules for voiceText:

Number form	WRONG (TTS misreads)	RIGHT (spell out in Vietnamese)
Decimal version	`GPT 5.5` → "năm rưỡi" ❌	`GPT năm chấm năm` ✅
Decimal stat	`82.7%`	`tám mươi hai phẩy bảy phần trăm`
Version	`iPhone 17`	`iPhone mười bảy` (or `iPhone 17` works for whole numbers)
Version with point	`iOS 18.2`	`iOS mười tám chấm hai`
Tech spec	`200MP`	`hai trăm megapixel`
Battery	`5000mAh`	`năm nghìn miliampe giờ`
Tokens	`1M tokens` / `1000000 tokens`	`một triệu token`
Price VND	`21 triệu đồng`	`hai mươi mốt triệu đồng`
Price USD	`$5`	`năm đô la` (or `năm đô`)
Multiplier	`2x`	`gấp đôi` (more natural than "hai lần")
Year	`2026`	`hai nghìn không trăm hai mươi sáu` (or just `năm 2026` reads OK)
Percentage with decimal	`30%`	`ba mươi phần trăm`
Time	`60 giây`	`sáu mươi giây`
Frequency	`5G`	`năm gờ` (be careful — TTS often says "năm-gờ")

Notation choices:

For decimal point use chấm (more spoken/natural) or phẩy (formal). Both work; pick consistent.
For comma separator, use phẩy (e.g. "1,000" → "một nghìn")
For ratio "3:1" → say ba trên một or ba so với một

English brand names — keep as-is, TTS handles them OK:

Apple, Google, OpenAI, Microsoft, TikTok, YouTube ✅

English acronyms — write phonetically if TTS misreads:

AI → usually OK, sometimes write ây ai for clarity
API → write ây pi ai if matter
GPT → usually OK; if not, write gí pi tí
iOS → write ai ô ét if matter

Symbols to AVOID in voiceText:

→ & % $ # + = (TTS may say literal name or skip)
! ? at end of sentence is OK — they create natural intonation
Emoji: NEVER (TTS pronounces or skips inconsistently)
URLs: NEVER (TTS reads dot/slash literally)

End each voiceText sentence with . or ? for natural pause/intonation.

Examples — full scene:

WRONG (will sound bad):

{ "voiceText": "GPT 5.5 đạt 82.7% trên Terminal-Bench, vượt GPT 5.4 (75.1%)." }

→ TTS reads: "GPT năm rưỡi đạt tám mươi hai chấm bảy phần trăm trên Terminal-Bench..."

RIGHT (natural):

{ "voiceText": "GPT năm chấm năm đạt tám mươi hai phẩy bảy phần trăm trên Terminal Bench, vượt phiên bản năm chấm bốn ở mức bảy mươi lăm phẩy một." }

Note: templateData (text on screen) CAN use original formatting — the visual is separate from spoken:

{
  "voiceText": "GPT năm chấm năm đạt tám mươi hai phẩy bảy phần trăm.",
  "templateData": {
    "template": "stat-hero",
    "value": "82.7%",                 ← Visual: keep readable formatting
    "label": "Terminal-Bench"
  }
}

Hook (most important — gets first 3 seconds of viewer attention):

Must contain a claim, statistic, or curious question
NEVER generic ("Hôm nay chúng ta sẽ nói về..." is wrong)
ALWAYS include at least 1 effect: flash-white-3f or particle-burst

Visual rules:

For image scenes: background.src = "$source.image" (literal — CLI substitutes)
Vary kenBurns across scenes (don't use zoom-in for every scene)
Vary text animation (don't use slide-up for every line)
Each line ≤ 25 characters
Each scene 1-3 lines

Outro (always fixed format):

{
  "id": "outro",
  "type": "outro",
  "voiceText": "Theo dõi Công nghệ 24h để xem bản tin mới mỗi ngày.",
  "visual": {
    "background": { "type": "gradient", "preset": "outro-purple" },
    "text": {
      "position": "center",
      "style": "outro-card",
      "lines": [
        { "content": "Xem bản tin mới mỗi ngày", "emphasis": "primary", "animation": "fade-in" },
        { "content": "Công nghệ 24h",            "emphasis": "channel", "animation": "scale-pop" },
        { "content": "Nguồn: <DOMAIN>",          "emphasis": "muted",   "animation": "fade-in-late" }
      ]
    }
  }
}

Replace <DOMAIN> with the actual domain string. Note: outro line 1 is shortened to fit 25-char schema rule (full CTA "Theo dõi để xem bản tin mới mỗi ngày" is 36 chars).

Step 5: Self-validate before writing

Check:

Total word count ~150-200
Every line.content ≤ 25 chars
5-8 scenes total
scenes[0].type === "hook"
last scene type === "outro"
All enum values valid (see spec Section 4.2)

If invalid, fix yourself silently. Up to 2 self-correction passes. After that, write anyway — the CLI's Zod validation will produce a precise error message that the user can act on.

Step 6: Write script.json

Use the Write tool (not Bash) to write the validated JSON to <outputDir>/script.json.

Step 7: Run the pipeline

Use Bash, foreground (not background), stream output:

npm run pipeline -- <outputDir>/script.json

If exit code != 0:

Report the error message clearly
Tell user the output dir path so they can inspect intermediate files

Step 8: Report success

If successful, report to user with markdown links:

✓ Video:  [video.mp4](output/<slug>-<timestamp>/video.mp4)
✓ Audio:  [voice.mp3](output/<slug>-<timestamp>/voice.mp3) — for CapCut
✓ Script: [script.txt](output/<slug>-<timestamp>/script.txt) — for CapCut auto-caption
Tổng thời lượng: XX.Xs

Examples

Example 1: URL with image (vnexpress)

User: /create-news-video https://vnexpress.net/iphone-17-200mp

Generated script.json (excerpt):

{
  "version": "1.0",
  "metadata": {
    "title": "Apple ra mắt iPhone 17 với camera 200MP",
    "source": {
      "url": "https://vnexpress.net/iphone-17-200mp",
      "domain": "vnexpress.net",
      "image": "https://i1-vnexpress.vnecdn.net/iphone17.jpg"
    },
    "channel": "Công nghệ 24h"
  },
  "voice": { "provider": "lucylab", "voiceId": "${VIETNAMESE_VOICEID}", "speed": 1.0 },
  "scenes": [
    {
      "id": "hook", "type": "hook",
      "voiceText": "Apple vừa ra mắt iPhone 17 với camera hai trăm megapixel.",
      "visual": {
        "background": { "type": "image", "src": "$source.image", "kenBurns": "zoom-in" },
        "overlay":    { "darkness": 0.4 },
        "text": {
          "position": "center", "style": "hook-large",
          "lines": [
            { "content": "iPhone 17",     "emphasis": "primary", "animation": "scale-pop" },
            { "content": "Camera 200MP!", "emphasis": "accent",  "animation": "slide-up-bounce" }
          ]
        },
        "effects": ["flash-white-3f", "particle-burst"]
      }
    }
    /* ... 3 body scenes + outro ... */
  ]
}

Example 2: .txt file with no image (local)

User: /create-news-video news/agi-update.txt

Generated script.json (excerpt):

{
  "metadata": {
    "title": "OpenAI công bố mô hình mới với khả năng lập luận",
    "source": { "url": "local", "domain": "local", "image": null },
    "channel": "Công nghệ 24h"
  },
  "scenes": [
    {
      "id": "hook", "type": "hook",
      "voiceText": "OpenAI vừa công bố mô hình mới có khả năng lập luận như con người.",
      "visual": {
        "background": { "type": "gradient", "preset": "news-dark" },
        "text": {
          "position": "center", "style": "hook-large",
          "lines": [
            { "content": "Mô hình mới", "emphasis": "primary", "animation": "scale-pop" },
            { "content": "Lập luận!",  "emphasis": "accent",  "animation": "slide-up-bounce" }
          ]
        },
        "effects": ["flash-white-3f"]
      }
    }
    /* ... outro line 3 = "Nguồn: local" ... */
  ]
}

Note: when source has no image, every scene uses background.type = "gradient" (no image fallback at composer level needed).

Sound Effects (SFX)

You almost never need to set the sfx field. The pipeline has a smart 3-tier selector that picks the right SFX for each scene automatically:

If scene.sfx is set → use exactly that (override).
Else, scan voiceText for semantic keywords:
- cảnh báo / rủi ro / nguy hiểm / warning → alert/
- kỷ lục / vượt / xuất sắc / breakthrough / success → success/
- thất bại / sai / lỗi / fail / wrong → fail/
- ra mắt / công bố / lần đầu / launch / unveil → reveal/
- đếm ngược / tích tắc / countdown → countdown/
- hùng vĩ / hoành tráng / cinematic / epic → cinematic/
- hồi hộp / chờ đợi / drumroll / suspense → drumroll/
Else, fall back to template default category:
- hook → transition/ or cinematic/
- comparison → transition/ or emphasis/
- stat-hero → emphasis/ or success/
- feature-list → transition/ or emphasis/
- callout → alert/ or drumroll/
- outro → outro/ or success/

Within a category, the actual file is picked deterministically by hashing the scene id — same script gives same SFX (idempotent), but different scenes in the same video get different files (variety).

This means: in 95% of cases you should OMIT the sfx field entirely. Just write good Vietnamese voiceText with natural keywords (warning, breakthrough, launch, etc.) and the pipeline will pick the right sound.

When to add explicit `sfx` override

Only when you want to FORCE a specific sound that the keyword matcher won't infer:

Scene needs a particular signature sound (e.g., always a gong on important scenes)
Disable SFX for a particular scene: "sfx": { "name": "none" }
Use a specific file: "sfx": { "name": "transition/whoosh-sfx", "volume": 0.4 }

Example (rarely needed):

{
  "id": "body-3",
  "voiceText": "...",
  "templateData": { ... },
  "sfx": { "name": "drumroll/snare-roll", "volume": 0.5, "startOffsetSec": 0.2 }
}

The pipeline auto-mixes a sound effect at each scene start based on the template type:

Template	Default SFX	Sound character
`hook`	`transition/whoosh-soft`	Dramatic entrance
`comparison`	`transition/swoosh`	Side-by-side reveal
`stat-hero`	`emphasis/ding`	Number reveal
`feature-list`	`transition/pop`	Bullet appearance
`callout`	`alert/notification`	Important info
`outro`	`outro/tada`	Ending signature

You usually do NOT need to add a sfx field — defaults work for 95% of cases.

ONLY add an explicit sfx override when content STRONGLY suggests a different mood:

Content cue (in voiceText)	Override
"cảnh báo", "rủi ro", "đáng lo", "nguy hiểm"	`{ "name": "alert/notification", "volume": 0.4 }`
"vượt", "kỷ lục", "xuất sắc", "tăng mạnh" (positive stat)	`{ "name": "emphasis/chime", "volume": 0.35 }`
Want to disable SFX for this scene	`{ "name": "none" }`

Place sfx at the same level as voiceText and templateData:

{
  "id": "body-3",
  "type": "body",
  "voiceText": "Cảnh báo: AI tự chủ có thể đặt ra rủi ro về an ninh mạng.",
  "templateData": { "template": "callout", ... },
  "sfx": { "name": "alert/notification", "volume": 0.4 }
}

Available SFX categories (any <name> subfolder in assets/sfx/<category>/<name>.mp3):

transition/ — whoosh, swoosh, swish, pop, punch, page-flip, slide, riser
emphasis/ — ding, tick, chime, ping, bong, pop, punch
alert/ — notification, alert, alarm, warning
success/ — applepay, achievement, win, xbox, steam, jet-set
fail/ — wrong-answer-buzzer, incorrect, error, dank-meme
outro/ — tada, win31, noooo
reveal/ — magic-fairy, anime-girl, hey-female-voice
drumroll/ — snare, drum-roll, boom
countdown/ — beep, timer
cinematic/ — rise, impact

Browse assets/sfx/<category>/ to see exact filenames. Reference WITHOUT the .mp3 extension. Example:

{ "sfx": { "name": "success/xbox-360-achievement-sound", "volume": 0.4 } }

Edge cases

Situation	Action
URL paywall / JS-rendered → WebFetch returns no content	Tell user: "Không đọc được URL (có thể do paywall hoặc JS). Hãy lưu nội dung vào file .txt rồi gọi lại." Stop.
URL content < 200 words	Warn "Tin gốc ngắn, video có thể không đủ chất liệu", continue anyway
URL content > 2000 words	Summarize to key points, fit ~150-200 words script
File mode + file empty/missing	Error message, don't create output dir
Pipeline fails	Report error message + output dir path; user can re-try `npm run pipeline -- <path>` after fixing

name	create-news-video
description	Tạo video tin tức ngắn 9:16 (~60s) từ URL bài báo hoặc file .txt tiếng Việt. Trigger khi user yêu cầu tạo video tin tức, làm short news, làm bản tin video, render tin thành video, làm TikTok tin tức. Output: video.mp4 + voice.mp3 + script.txt cho CapCut.

Create News Video Skill

Generate a Vietnamese 9:16 motion-graphic news video from a URL or .txt file.

Input

Single argument: a news article URL (starts with http:// or https://) OR a path to a .txt file.

Workflow (MUST follow these steps in order)

Step 1: Detect input type

Starts with http:// or https:// → URL mode
Otherwise → file mode

Step 2: Fetch content

URL mode:

Use WebFetch with prompt:

Trích xuất từ trang này:
- title (string): tiêu đề bài báo
- content (string): nội dung chính, ~500-1500 từ
- ogImage (string|null): URL ảnh og:image (meta og:image hoặc ảnh đầu bài)
- domain (string): domain của URL (vd "vnexpress.net")
Trả về JSON với 4 field trên.

If WebFetch fails (paywall, JS-rendered, 4xx) → tell user to save content to a .txt file and pass that instead. Stop.

File mode:

Use Read to read the .txt file
Title = first non-empty line (strip whitespace, max 80 chars)
Content = remaining lines joined
ogImage = null
domain = "local"

Step 3: Create slug + output directory

slug = lowercase ASCII (strip Vietnamese diacritics, đ→d), replace non-alphanumeric with -, trim dashes, max 40 chars
timestamp = current local time as YYYYMMDD-HHmm
outputDir = output/<slug>-<timestamp>/
Use Bash: mkdir -p <outputDir>

Step 4: Generate script.json

Following the schema in docs/superpowers/specs/2026-04-29-auto-news-video-design.md Section 4. Key rules:

Script content (Vietnamese):

Total voiceText: ~150–200 words → ~55–65s spoken at speed 1.0
Number of scenes: 5–8 (1 hook + 3–6 body + 1 outro)
Each scene voiceText is 1-3 short sentences, văn nói (spoken style, not formal)
No emoji, no markdown in voiceText

⚠️ CRITICAL: Vietnamese TTS Phonetic Rules

Mandatory rules for voiceText:

Number form	WRONG (TTS misreads)	RIGHT (spell out in Vietnamese)
Decimal version	`GPT 5.5` → "năm rưỡi" ❌	`GPT năm chấm năm` ✅
Decimal stat	`82.7%`	`tám mươi hai phẩy bảy phần trăm`
Version	`iPhone 17`	`iPhone mười bảy` (or `iPhone 17` works for whole numbers)
Version with point	`iOS 18.2`	`iOS mười tám chấm hai`
Tech spec	`200MP`	`hai trăm megapixel`
Battery	`5000mAh`	`năm nghìn miliampe giờ`
Tokens	`1M tokens` / `1000000 tokens`	`một triệu token`
Price VND	`21 triệu đồng`	`hai mươi mốt triệu đồng`
Price USD	`$5`	`năm đô la` (or `năm đô`)
Multiplier	`2x`	`gấp đôi` (more natural than "hai lần")
Year	`2026`	`hai nghìn không trăm hai mươi sáu` (or just `năm 2026` reads OK)
Percentage with decimal	`30%`	`ba mươi phần trăm`
Time	`60 giây`	`sáu mươi giây`
Frequency	`5G`	`năm gờ` (be careful — TTS often says "năm-gờ")

Notation choices:

For decimal point use chấm (more spoken/natural) or phẩy (formal). Both work; pick consistent.
For comma separator, use phẩy (e.g. "1,000" → "một nghìn")
For ratio "3:1" → say ba trên một or ba so với một

English brand names — keep as-is, TTS handles them OK:

Apple, Google, OpenAI, Microsoft, TikTok, YouTube ✅

English acronyms — write phonetically if TTS misreads:

AI → usually OK, sometimes write ây ai for clarity
API → write ây pi ai if matter
GPT → usually OK; if not, write gí pi tí
iOS → write ai ô ét if matter

Symbols to AVOID in voiceText:

→ & % $ # + = (TTS may say literal name or skip)
! ? at end of sentence is OK — they create natural intonation
Emoji: NEVER (TTS pronounces or skips inconsistently)
URLs: NEVER (TTS reads dot/slash literally)

End each voiceText sentence with . or ? for natural pause/intonation.

Examples — full scene:

WRONG (will sound bad):

{ "voiceText": "GPT 5.5 đạt 82.7% trên Terminal-Bench, vượt GPT 5.4 (75.1%)." }

→ TTS reads: "GPT năm rưỡi đạt tám mươi hai chấm bảy phần trăm trên Terminal-Bench..."

RIGHT (natural):

{ "voiceText": "GPT năm chấm năm đạt tám mươi hai phẩy bảy phần trăm trên Terminal Bench, vượt phiên bản năm chấm bốn ở mức bảy mươi lăm phẩy một." }

Note: templateData (text on screen) CAN use original formatting — the visual is separate from spoken:

{
  "voiceText": "GPT năm chấm năm đạt tám mươi hai phẩy bảy phần trăm.",
  "templateData": {
    "template": "stat-hero",
    "value": "82.7%",                 ← Visual: keep readable formatting
    "label": "Terminal-Bench"
  }
}

Hook (most important — gets first 3 seconds of viewer attention):

Must contain a claim, statistic, or curious question
NEVER generic ("Hôm nay chúng ta sẽ nói về..." is wrong)
ALWAYS include at least 1 effect: flash-white-3f or particle-burst

Visual rules:

For image scenes: background.src = "$source.image" (literal — CLI substitutes)
Vary kenBurns across scenes (don't use zoom-in for every scene)
Vary text animation (don't use slide-up for every line)
Each line ≤ 25 characters
Each scene 1-3 lines

Outro (always fixed format):

{
  "id": "outro",
  "type": "outro",
  "voiceText": "Theo dõi Công nghệ 24h để xem bản tin mới mỗi ngày.",
  "visual": {
    "background": { "type": "gradient", "preset": "outro-purple" },
    "text": {
      "position": "center",
      "style": "outro-card",
      "lines": [
        { "content": "Xem bản tin mới mỗi ngày", "emphasis": "primary", "animation": "fade-in" },
        { "content": "Công nghệ 24h",            "emphasis": "channel", "animation": "scale-pop" },
        { "content": "Nguồn: <DOMAIN>",          "emphasis": "muted",   "animation": "fade-in-late" }
      ]
    }
  }
}

Replace <DOMAIN> with the actual domain string. Note: outro line 1 is shortened to fit 25-char schema rule (full CTA "Theo dõi để xem bản tin mới mỗi ngày" is 36 chars).

Step 5: Self-validate before writing

Check:

Total word count ~150-200
Every line.content ≤ 25 chars
5-8 scenes total
scenes[0].type === "hook"
last scene type === "outro"
All enum values valid (see spec Section 4.2)

If invalid, fix yourself silently. Up to 2 self-correction passes. After that, write anyway — the CLI's Zod validation will produce a precise error message that the user can act on.

Step 6: Write script.json

Use the Write tool (not Bash) to write the validated JSON to <outputDir>/script.json.

Step 7: Run the pipeline

Use Bash, foreground (not background), stream output:

npm run pipeline -- <outputDir>/script.json

If exit code != 0:

Report the error message clearly
Tell user the output dir path so they can inspect intermediate files

Step 8: Report success

If successful, report to user with markdown links:

✓ Video:  [video.mp4](output/<slug>-<timestamp>/video.mp4)
✓ Audio:  [voice.mp3](output/<slug>-<timestamp>/voice.mp3) — for CapCut
✓ Script: [script.txt](output/<slug>-<timestamp>/script.txt) — for CapCut auto-caption
Tổng thời lượng: XX.Xs

Examples

Example 1: URL with image (vnexpress)

User: /create-news-video https://vnexpress.net/iphone-17-200mp

Generated script.json (excerpt):

{
  "version": "1.0",
  "metadata": {
    "title": "Apple ra mắt iPhone 17 với camera 200MP",
    "source": {
      "url": "https://vnexpress.net/iphone-17-200mp",
      "domain": "vnexpress.net",
      "image": "https://i1-vnexpress.vnecdn.net/iphone17.jpg"
    },
    "channel": "Công nghệ 24h"
  },
  "voice": { "provider": "lucylab", "voiceId": "${VIETNAMESE_VOICEID}", "speed": 1.0 },
  "scenes": [
    {
      "id": "hook", "type": "hook",
      "voiceText": "Apple vừa ra mắt iPhone 17 với camera hai trăm megapixel.",
      "visual": {
        "background": { "type": "image", "src": "$source.image", "kenBurns": "zoom-in" },
        "overlay":    { "darkness": 0.4 },
        "text": {
          "position": "center", "style": "hook-large",
          "lines": [
            { "content": "iPhone 17",     "emphasis": "primary", "animation": "scale-pop" },
            { "content": "Camera 200MP!", "emphasis": "accent",  "animation": "slide-up-bounce" }
          ]
        },
        "effects": ["flash-white-3f", "particle-burst"]
      }
    }
    /* ... 3 body scenes + outro ... */
  ]
}

Example 2: .txt file with no image (local)

User: /create-news-video news/agi-update.txt

Generated script.json (excerpt):

{
  "metadata": {
    "title": "OpenAI công bố mô hình mới với khả năng lập luận",
    "source": { "url": "local", "domain": "local", "image": null },
    "channel": "Công nghệ 24h"
  },
  "scenes": [
    {
      "id": "hook", "type": "hook",
      "voiceText": "OpenAI vừa công bố mô hình mới có khả năng lập luận như con người.",
      "visual": {
        "background": { "type": "gradient", "preset": "news-dark" },
        "text": {
          "position": "center", "style": "hook-large",
          "lines": [
            { "content": "Mô hình mới", "emphasis": "primary", "animation": "scale-pop" },
            { "content": "Lập luận!",  "emphasis": "accent",  "animation": "slide-up-bounce" }
          ]
        },
        "effects": ["flash-white-3f"]
      }
    }
    /* ... outro line 3 = "Nguồn: local" ... */
  ]
}

Note: when source has no image, every scene uses background.type = "gradient" (no image fallback at composer level needed).

Sound Effects (SFX)

You almost never need to set the sfx field. The pipeline has a smart 3-tier selector that picks the right SFX for each scene automatically:

If scene.sfx is set → use exactly that (override).
Else, scan voiceText for semantic keywords:
- cảnh báo / rủi ro / nguy hiểm / warning → alert/
- kỷ lục / vượt / xuất sắc / breakthrough / success → success/
- thất bại / sai / lỗi / fail / wrong → fail/
- ra mắt / công bố / lần đầu / launch / unveil → reveal/
- đếm ngược / tích tắc / countdown → countdown/
- hùng vĩ / hoành tráng / cinematic / epic → cinematic/
- hồi hộp / chờ đợi / drumroll / suspense → drumroll/
Else, fall back to template default category:
- hook → transition/ or cinematic/
- comparison → transition/ or emphasis/
- stat-hero → emphasis/ or success/
- feature-list → transition/ or emphasis/
- callout → alert/ or drumroll/
- outro → outro/ or success/

When to add explicit `sfx` override

Only when you want to FORCE a specific sound that the keyword matcher won't infer:

Scene needs a particular signature sound (e.g., always a gong on important scenes)
Disable SFX for a particular scene: "sfx": { "name": "none" }
Use a specific file: "sfx": { "name": "transition/whoosh-sfx", "volume": 0.4 }

Example (rarely needed):

{
  "id": "body-3",
  "voiceText": "...",
  "templateData": { ... },
  "sfx": { "name": "drumroll/snare-roll", "volume": 0.5, "startOffsetSec": 0.2 }
}

The pipeline auto-mixes a sound effect at each scene start based on the template type:

Template	Default SFX	Sound character
`hook`	`transition/whoosh-soft`	Dramatic entrance
`comparison`	`transition/swoosh`	Side-by-side reveal
`stat-hero`	`emphasis/ding`	Number reveal
`feature-list`	`transition/pop`	Bullet appearance
`callout`	`alert/notification`	Important info
`outro`	`outro/tada`	Ending signature

You usually do NOT need to add a sfx field — defaults work for 95% of cases.

ONLY add an explicit sfx override when content STRONGLY suggests a different mood:

Content cue (in voiceText)	Override
"cảnh báo", "rủi ro", "đáng lo", "nguy hiểm"	`{ "name": "alert/notification", "volume": 0.4 }`
"vượt", "kỷ lục", "xuất sắc", "tăng mạnh" (positive stat)	`{ "name": "emphasis/chime", "volume": 0.35 }`
Want to disable SFX for this scene	`{ "name": "none" }`

Place sfx at the same level as voiceText and templateData:

{
  "id": "body-3",
  "type": "body",
  "voiceText": "Cảnh báo: AI tự chủ có thể đặt ra rủi ro về an ninh mạng.",
  "templateData": { "template": "callout", ... },
  "sfx": { "name": "alert/notification", "volume": 0.4 }
}

Available SFX categories (any <name> subfolder in assets/sfx/<category>/<name>.mp3):

transition/ — whoosh, swoosh, swish, pop, punch, page-flip, slide, riser
emphasis/ — ding, tick, chime, ping, bong, pop, punch
alert/ — notification, alert, alarm, warning
success/ — applepay, achievement, win, xbox, steam, jet-set
fail/ — wrong-answer-buzzer, incorrect, error, dank-meme
outro/ — tada, win31, noooo
reveal/ — magic-fairy, anime-girl, hey-female-voice
drumroll/ — snare, drum-roll, boom
countdown/ — beep, timer
cinematic/ — rise, impact

Browse assets/sfx/<category>/ to see exact filenames. Reference WITHOUT the .mp3 extension. Example:

{ "sfx": { "name": "success/xbox-360-achievement-sound", "volume": 0.4 } }

Edge cases

Situation	Action
URL paywall / JS-rendered → WebFetch returns no content	Tell user: "Không đọc được URL (có thể do paywall hoặc JS). Hãy lưu nội dung vào file .txt rồi gọi lại." Stop.
URL content < 200 words	Warn "Tin gốc ngắn, video có thể không đủ chất liệu", continue anyway
URL content > 2000 words	Summarize to key points, fit ~150-200 words script
File mode + file empty/missing	Error message, don't create output dir
Pipeline fails	Report error message + output dir path; user can re-try `npm run pipeline -- <path>` after fixing

create-news-video

Create News Video Skill

Input

Workflow (MUST follow these steps in order)

Step 1: Detect input type

Step 2: Fetch content

Step 3: Create slug + output directory

Step 4: Generate script.json

⚠️ CRITICAL: Vietnamese TTS Phonetic Rules

Step 5: Self-validate before writing

Step 6: Write script.json

Step 7: Run the pipeline

Step 8: Report success

Examples

Example 1: URL with image (vnexpress)

Example 2: .txt file with no image (local)

Sound Effects (SFX)

When to add explicit sfx override

Edge cases

Create News Video Skill

Input

Workflow (MUST follow these steps in order)

Step 1: Detect input type

Step 2: Fetch content

Step 3: Create slug + output directory

Step 4: Generate script.json

⚠️ CRITICAL: Vietnamese TTS Phonetic Rules

Step 5: Self-validate before writing

Step 6: Write script.json

Step 7: Run the pipeline

Step 8: Report success

Examples

Example 1: URL with image (vnexpress)

Example 2: .txt file with no image (local)

Sound Effects (SFX)

When to add explicit sfx override

Edge cases

When to add explicit `sfx` override

When to add explicit `sfx` override