doubao-tts

Name: Doubao Tts
Author: calesthio

Generate Mandarin and multilingual narration with Volcengine Doubao Speech 2.0. Use when creating Chinese voiceovers, when the user prefers Doubao/Volcengine/火山引擎/豆包 TTS, or when narration needs character-level timestamp metadata for subtitles.

stars:4,137

forks:836

updated: May 4, 2026, 13:40

SKILL.md

name	doubao-tts
description	Generate Mandarin and multilingual narration with Volcengine Doubao Speech 2.0. Use when creating Chinese voiceovers, when the user prefers Doubao/Volcengine/火山引擎/豆包 TTS, or when narration needs character-level timestamp metadata for subtitles.

Doubao TTS

Requires DOUBAO_SPEECH_API_KEY in .env. Set DOUBAO_SPEECH_VOICE_TYPE for the default voice, or pass voice_id to the tool.

Current API

Use the new-console API key flow:

X-Api-Key: ${DOUBAO_SPEECH_API_KEY}
X-Api-Resource-Id: seed-tts-2.0

Do not use X-Api-App-Id and X-Api-Access-Key with a new-console API Key. If the API returns "load grant: requested grant not found", the key type or auth header is probably wrong.
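
A minimal sketch of assembling the two headers above, assuming the key is already available in the process environment (for example, loaded from .env). `build_auth_headers` is a hypothetical helper name used only for illustration; it is not part of OpenMontage.

```python
import os


def build_auth_headers(api_key=None, resource_id="seed-tts-2.0"):
    """Return the new-console auth headers.

    Hypothetical helper for illustration; falls back to the
    DOUBAO_SPEECH_API_KEY environment variable when no key is passed.
    """
    key = api_key if api_key is not None else os.environ.get("DOUBAO_SPEECH_API_KEY", "")
    if not key:
        raise ValueError("DOUBAO_SPEECH_API_KEY is not set")
    # Only the new-console headers are sent; the App-Id / Access-Key pair is left out.
    return {
        "X-Api-Key": key,
        "X-Api-Resource-Id": resource_id,
    }
```

Failing early on an empty key keeps a misconfigured .env from surfacing later as an opaque grant error.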

For long-form video narration, prefer the async endpoints (submit a job, then query its result):

POST https://openspeech.bytedance.com/api/v3/tts/submit
POST https://openspeech.bytedance.com/api/v3/tts/query

The query response returns audio_url plus sentences[].words[] timing metadata, which can be used to build subtitles.
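
The submit-then-query pattern can be sketched as a generic polling loop. The request and response shapes are not specified here, so this sketch assumes nothing about them: the query call and the completion check are caller-supplied callables (`query_fn` and `is_done` are hypothetical names), and the interval and timeout values are arbitrary defaults.

```python
import time


def poll_until_done(query_fn, is_done, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Call query_fn() repeatedly until is_done(response) is true.

    query_fn and is_done are hypothetical, caller-supplied callables:
    query_fn would POST to the query endpoint and return its JSON,
    is_done decides whether that JSON represents a finished task.
    """
    waited = 0.0
    while True:
        response = query_fn()
        if is_done(response):
            return response
        if waited >= timeout_s:
            raise TimeoutError("TTS task did not finish within %.0f s" % timeout_s)
        sleep(interval_s)
        waited += interval_s
```

One plausible completion check is the presence of audio_url in the response, for example `is_done=lambda r: "audio_url" in r`.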

OpenMontage Usage

Generate with the TTS selector:

from tools.audio.tts_selector import TTSSelector

result = TTSSelector().execute({
    "preferred_provider": "doubao",
    "text": "如果 AI 真的会改变未来，普通人到底该怎么参与？",
    "voice_id": "zh_female_vv_uranus_bigtts",
    "output_path": "projects/my-video/assets/audio/narration.mp3",
    "speech_rate": 0,
    "enable_timestamp": True,
})

Or call the provider directly:

from tools.audio.doubao_tts import DoubaoTTS

result = DoubaoTTS().execute({
    "text": "短样本试听文本。",
    "voice_id": "zh_female_vv_uranus_bigtts",
    "output_path": "projects/my-video/assets/audio/doubao_sample.mp3",
})

The provider writes:

output_path: downloaded audio file
metadata_path: full query response JSON, defaulting to <output_path>.json
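
A small sketch of reading the saved metadata back, assuming the default path is formed by appending ".json" to output_path (one reading of <output_path>.json; the provider may instead replace the extension). `load_query_metadata` is a hypothetical helper, not an OpenMontage function.

```python
import json
from pathlib import Path


def load_query_metadata(output_path, metadata_path=None):
    """Load the saved query response JSON for an audio file.

    Assumption: when metadata_path is not given, the file lives at
    output_path with ".json" appended (e.g. narration.mp3.json).
    """
    path = Path(metadata_path) if metadata_path else Path(str(output_path) + ".json")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```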

Recommended Workflow

Generate a 10-15 second sample before a full paid narration.
Ask the user to approve voice naturalness, accent, and speed.
Generate the full narration only after approval.
Keep the query JSON. It is the source of truth for subtitle timing.
Build captions from sentences[].words[], not from estimated text length.
Group captions by Chinese semantic phrases before applying timestamps. Do not split only by fixed character count; it can break phrases like "在不押单个公司的情况下" or "可能会被慢慢稀释" and hurt comprehension.
Let the video duration follow the approved voice rhythm unless the user explicitly asks to match a prior runtime.
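
The caption steps above can be sketched as follows. The field names inside each sentences[].words[] entry are not given here, so this sketch assumes the caller has already flattened them into (text, start, end) tuples. The phrase list is assumed to come from a manual or model-assisted semantic split. `captions_from_phrases` is a hypothetical helper, and the timings in the usage example are made up for illustration.

```python
def captions_from_phrases(words, phrases):
    """Attach word-level timings to semantic caption phrases.

    words:   list of (text, start, end) tuples flattened from
             sentences[].words[] (flattening is left to the caller).
    phrases: caption texts in order; joined together they must contain
             exactly the same characters as the words.
    """
    # Map every character position to the index of the word it came from.
    char_to_word = []
    for index, (text, _start, _end) in enumerate(words):
        char_to_word.extend([index] * len(text))
    if "".join(phrases) != "".join(text for text, _s, _e in words):
        raise ValueError("phrases must cover exactly the same characters as words")
    captions = []
    pos = 0
    for phrase in phrases:
        if not phrase:
            continue
        first = words[char_to_word[pos]]
        last = words[char_to_word[pos + len(phrase) - 1]]
        # A caption starts with its first word and ends with its last word.
        captions.append({"text": phrase, "start": first[1], "end": last[2]})
        pos += len(phrase)
    return captions


# Illustrative timings only; real values come from the query JSON.
words = [
    ("在", 0.0, 0.2), ("不", 0.2, 0.4), ("押", 0.4, 0.6), ("单个", 0.6, 1.0),
    ("公司", 1.0, 1.4), ("的", 1.4, 1.5), ("情况下", 1.5, 2.1),
    ("可能", 2.3, 2.7), ("会被", 2.7, 3.1), ("慢慢", 3.1, 3.6), ("稀释", 3.6, 4.2),
]
phrases = ["在不押单个公司的情况下", "可能会被慢慢稀释"]
captions = captions_from_phrases(words, phrases)
```

Because timings are taken from the first and last word each phrase covers, the phrase boundaries can be chosen for meaning without any re-estimation of duration.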

Parameters

voice_id: Doubao speaker / voice type. Defaults to DOUBAO_SPEECH_VOICE_TYPE.
resource_id: use seed-tts-2.0 for Doubao Speech 2.0 voices.
speech_rate: 0 is normal, 100 is 2x, -50 is 0.5x.
sample_rate: default 24000.
enable_timestamp: default true.
return_usage: default true, requests usage metadata when available.

Do not pass additions.explicit_language by default. Some endpoint/key combinations reject zh-cn with the error "unsupported additions explicit language zh-cn".

For calm Mandarin explainers, start with speech_rate: 0. If the result is too long for the approved format, make a short comparison sample with speech_rate: 25 or 50 before regenerating the full narration. Do not speed up only to match a previous provider's duration if the user prefers Doubao's natural pace.
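
To judge whether speech_rate: 25 or 50 is likely to fit a target length before paying for a comparison sample, a rough estimate can be sketched as below. It assumes the rate scales linearly, which agrees with the three documented points (0 is 1x, 100 is 2x, -50 is 0.5x) but is not confirmed for values in between. Both function names are hypothetical.

```python
def rate_to_multiplier(speech_rate):
    """Map speech_rate to a speed multiplier.

    Assumption: linear scaling through the documented points
    0 -> 1x, 100 -> 2x, -50 -> 0.5x.
    """
    if not -50 <= speech_rate <= 100:
        raise ValueError("speech_rate outside the range covered by the documented points")
    return 1.0 + speech_rate / 100.0


def estimate_duration(base_seconds, speech_rate):
    """Estimate narration length at speech_rate from a speech_rate=0 duration."""
    return base_seconds / rate_to_multiplier(speech_rate)
```

Under this assumption, a 60-second narration at speech_rate 0 would run about 48 seconds at 25 and about 40 seconds at 50. The estimate is only a guide; the approved sample remains the reference.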

Troubleshooting

load grant: requested grant not found: wrong key type or wrong auth header. Use X-Api-Key for new-console API Keys.
speaker permission denied: voice id is wrong or not authorized for the selected resource.
quota exceeded: quota, lifetime characters, or concurrency exceeded.
Missing timestamps: verify enable_timestamp: true, keep the query JSON, and confirm the selected endpoint returned sentences.
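
The error-to-cause mapping above can be sketched as a simple lookup for use in logs or tool output. It assumes the error text appears as a substring of the message the API returns; `hint_for_error` and `HINTS` are hypothetical names.

```python
HINTS = {
    "requested grant not found": "Wrong key type or auth header; use X-Api-Key for new-console API Keys.",
    "speaker permission denied": "Voice id is wrong or not authorized for the selected resource.",
    "quota exceeded": "Quota, lifetime characters, or concurrency exceeded.",
}


def hint_for_error(message):
    """Return a troubleshooting hint for a known error message, else None."""
    lowered = message.lower()
    for needle, hint in HINTS.items():
        if needle in lowered:
            return hint
    return None
```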

Safety

Never print or write the API key to logs, metadata, patches, or project artifacts. .env.example should contain only empty variable names.
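
One way to apply this rule is to scrub the key from any string before it is logged or saved. This is a minimal sketch; `redact_secret` is a hypothetical helper, not an OpenMontage function.

```python
def redact_secret(text, secret, placeholder="***"):
    """Replace every occurrence of secret in text before logging or saving it."""
    if not secret:
        # Nothing to redact; avoid replacing the empty string.
        return text
    return text.replace(secret, placeholder)
```

For example, `redact_secret("X-Api-Key: abc123", "abc123")` yields `"X-Api-Key: ***"`.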

related-skills.json

Same repository

canvas-procedural-animation.md

from "calesthio/OpenMontage"

Use p5.js/canvas for local procedural character effects: particles, weather, squash/stretch, walk cycles, and environmental motion.

2026-04-28 | 4.1k

character-animation-qa.md

from "calesthio/OpenMontage"

Review local character animation with schema checks, Playwright browser previews, frame sampling, and FFmpeg/ffprobe final output checks.

2026-04-28 | 4.1k

character-rigging.md

from "calesthio/OpenMontage"

Build data-driven 2D character rigs for local animation: parts, pivots, layers, constraints, views, and reusable rig packages.

2026-04-28 | 4.1k

pose-library-design.md

from "calesthio/OpenMontage"

Design reusable 2D character pose libraries, action cycles, and expression states for data-driven animation.

2026-04-28 | 4.1k

svg-character-animation.md

from "calesthio/OpenMontage"

Animate SVG character rigs with GSAP, CSS transforms, Remotion frame control, and HyperFrames-compatible browser previews.

2026-04-28 | 4.1k

seedance-2-0.md

from "calesthio/OpenMontage"

Generate cinematic clips with ByteDance Seedance 2.0 — the preferred premium video model in OpenMontage when a paid gateway is configured. Use when: (1) producing trailers, teasers, hype edits, or premium cinematic clips, (2) needing native synchronized audio (speech, SFX, ambience) in a single pass, (3) needing multi-shot cuts inside one generation, (4) needing director-level camera control, (5) needing lip-sync from quoted dialogue in the prompt, (6) needing reference-conditioned generation with up to 9 images + 3 video clips + 3 audio clips, (7) wanting consistent character identity across shots. Accessible via fal.ai (`seedance_video` tool), HeyGen (Video Agent / Avatar Shots), Replicate, Runway (Enterprise, non-US), Freepik, BytePlus ModelArk, Higgsfield, Pollo, and other aggregators.

2026-04-24 | 4.1k

package.json

"author": "calesthio"

"repository": "calesthio/OpenMontage"
