subtitle-craft
// Create, translate, repair, and burn subtitles through the Subtitle Craft plugin.
| name | subtitle-craft |
| description | Create, translate, repair, and burn subtitles through the Subtitle Craft plugin. |
| risk_class | mutating_scoped |
Use this skill when the user wants to create, translate, repair, or burn subtitles.
Keywords (zh): 字幕, 听写, 转录, 翻译字幕, 字幕修复, 字幕烧制, 双语字幕, 角色识别, 说话人分离
Keywords (en): subtitle, caption, transcribe, translate srt, burn subtitle, diarization, speaker
| Tool | Purpose |
|---|---|
| subtitle_craft_create | Create a subtitle task (mode = auto_subtitle / translate / repair / burn) |
| subtitle_craft_status | Inspect a single task's status, pipeline step, error_kind |
| subtitle_craft_list | List recent tasks (default 10) |
| subtitle_craft_cancel | Cooperative cancel of a running task |
v1.0 explicitly does NOT declare any subtitle_craft_handoff_* tool; cross-plugin dispatch is deferred to v2.0 with schema reservation only.
subtitle_craft_create
{
"mode": "auto_subtitle | translate | repair | burn",
"source_path": "/abs/path/to/video.mp4", // for auto_subtitle / burn
"srt_path": "/abs/path/to/input.srt", // for translate / repair / burn
"source_lang": "", // empty = auto-detect (Paraformer language_hints)
"target_lang": "en", // for translate
"translation_model": "qwen-mt-flash", // qwen-mt-flash | qwen-mt-plus | qwen-mt-lite
"diarization_enabled": false, // auto_subtitle only
"speaker_count": 0, // 0 = auto
"character_identify_enabled": false, // requires diarization_enabled=true
"disfluency_removal_enabled": false, // remove um/uh/嗯/啊
"bilingual": false, // translate only — keep both lines
"subtitle_style": "default", // default | cinema | youtube | tiktok | tv | <custom_id>
"burn_engine": "ass", // ass | html
"burn_mode": "hard" // hard (in-stream) | soft (mp4 sidecar)
}
subtitle_craft_status
{ "task_id": "abc123def456" }
subtitle_craft_cancel
{ "task_id": "abc123def456" }
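The comments in the subtitle_craft_create schema imply a few constraints: which path each mode needs, and that character_identify_enabled depends on diarization_enabled. A minimal sketch of a client-side pre-check follows. The helper name build_create_payload is hypothetical and not part of the plugin, and it covers only the four v1.0 modes.

```python
from typing import Any

# Path fields each mode needs, taken from the comments in the schema above.
REQUIRED_PATHS = {
    "auto_subtitle": ("source_path",),
    "translate": ("srt_path",),
    "repair": ("srt_path",),
    "burn": ("source_path", "srt_path"),
}


def build_create_payload(mode: str, **options: Any) -> dict[str, Any]:
    """Hypothetical helper: check the documented constraints, return the payload."""
    if mode not in REQUIRED_PATHS:
        raise ValueError(f"unknown mode: {mode!r}")
    missing = [key for key in REQUIRED_PATHS[mode] if not options.get(key)]
    if missing:
        raise ValueError(f"mode {mode!r} requires: {', '.join(missing)}")
    # Character identification only works on top of diarization.
    if options.get("character_identify_enabled") and not options.get("diarization_enabled"):
        raise ValueError("character_identify_enabled requires diarization_enabled=true")
    return {"mode": mode, **options}
```

Rejecting an invalid combination before calling the tool avoids creating a task that would presumably fail during an early pipeline step.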
{
"id": "abc123def456",
"mode": "auto_subtitle",
"status": "pending|running|succeeded|failed|canceled",
"pipeline_step": "setup_environment|estimate_cost|prepare_assets|asr_or_load|identify_characters|translate_or_repair|render_output|burn_or_finalize",
"progress": 0.42,
"source_path": "...",
"output_srt_path": "...",
"output_vtt_path": "...",
"output_video_path": "...",
"error_kind": "network|timeout|auth|quota|moderation|dependency|format|duration|unknown",
"error_message": "...",
"error_hints": "..."
}
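Since the task object exposes a status field with three terminal values (succeeded, failed, canceled), a caller will typically poll subtitle_craft_status until one of them appears. The sketch below assumes the caller supplies a get_status callable that wraps the tool call; that callable, the polling interval, and the poll limit are illustrative assumptions, not plugin settings.

```python
import time
from typing import Any, Callable

# Statuses after which the task object no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_task(
    get_status: Callable[[str], dict[str, Any]],
    task_id: str,
    interval_sec: float = 2.0,
    max_polls: int = 300,
) -> dict[str, Any]:
    """Poll until the task reaches a terminal status, then return the task object."""
    for _ in range(max_polls):
        task = get_status(task_id)
        if task["status"] in TERMINAL_STATUSES:
            return task
        time.sleep(interval_sec)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

On a failed result, the caller can then read error_kind, error_message, and error_hints from the returned object.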
| User says | Choose mode | Notes |
|---|---|---|
| "给这段视频加字幕" / "transcribe video" | auto_subtitle | If the user mentions multiple people, also turn on diarization_enabled |
| "把字幕翻译成英文" / "translate srt to english" | translate | Default translation_model=qwen-mt-flash (best value) |
| "字幕重叠了" / "时间轴乱" / "fix overlap" | repair | No API cost — pure local rewrite |
| "烧制" / "硬字幕" / "burn subtitle" | burn | Default burn_engine=ass (FFmpeg native) — switch to html only when the user wants custom CSS |
| Mentions "区分谁在说话" (who is speaking) / "speaker" | auto_subtitle + diarization_enabled=true | Add character_identify_enabled=true if the user wants character names instead of SPEAKER_xx |
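The routing table above can be approximated with a simple keyword check. This is only a sketch: the function name choose_mode is hypothetical, the keywords come from the table and the keyword list, and real requests will usually need more context than substring matching provides.

```python
def choose_mode(request: str) -> dict[str, object]:
    """Map a user request to a partial subtitle_craft_create payload."""
    text = request.lower()
    if any(k in text for k in ("翻译", "translate")):
        return {"mode": "translate", "translation_model": "qwen-mt-flash"}
    if any(k in text for k in ("重叠", "时间轴", "修复", "fix overlap")):
        return {"mode": "repair"}
    if any(k in text for k in ("烧制", "硬字幕", "burn")):
        return {"mode": "burn", "burn_engine": "ass"}
    # Everything else is treated as a transcription request.
    payload: dict[str, object] = {"mode": "auto_subtitle"}
    if any(k in text for k in ("区分谁在说话", "speaker")):
        payload["diarization_enabled"] = True
    return payload
```

The returned dict still needs source_path or srt_path before it can be sent.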
Transcribing a 30-minute YouTube video and translating the subtitles into English usually costs ¥0.50–¥0.80 in total.
The error_kind field always falls into one of these 9 values, identical
to clip-sense's taxonomy:
| error_kind | What to tell the user |
|---|---|
| network | Suggest checking VPN / DNS, then retry |
| timeout | Increase Paraformer timeout in Settings, then retry |
| auth | API key wrong / 4xx → open Settings |
| quota | Bailian balance / quota exhausted |
| moderation | Content flagged → ask the user to trim sensitive parts |
| dependency | FFmpeg / Playwright missing; HTML burn auto-falls back to ASS |
| format | Bad SRT encoding (must be UTF-8) / unsupported video container |
| duration | File >2 GB or audio >12 h; ask user to split first |
| unknown | Surface the raw error_message and link to logs |
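One way to apply the table is a lookup keyed on error_kind that falls back to the raw error_message for unknown or unrecognized kinds. The function name user_message and the exact wording of each hint are illustrative assumptions; only the nine kinds and the advice behind them come from the table.

```python
from typing import Any

# One user-facing hint per documented error_kind (wording is illustrative).
USER_HINTS = {
    "network": "Check your VPN / DNS, then retry.",
    "timeout": "Increase the Paraformer timeout in Settings, then retry.",
    "auth": "The API key was rejected. Open Settings and check it.",
    "quota": "The Bailian balance or quota is exhausted.",
    "moderation": "The content was flagged. Trim the sensitive parts and retry.",
    "dependency": "FFmpeg or Playwright is missing. HTML burn falls back to ASS.",
    "format": "The SRT must be UTF-8 and the video container must be supported.",
    "duration": "The file exceeds 2 GB or 12 h of audio. Split it first.",
}


def user_message(task: dict[str, Any]) -> str:
    """Return the hint for a failed task; unknown kinds surface the raw message."""
    kind = task.get("error_kind") or "unknown"
    if kind in USER_HINTS:
        return USER_HINTS[kind]
    return f"Unexpected error: {task.get('error_message', '')} (see the logs)"
```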
The plugin emits a single event, named task_update, with this payload:
{
"task_id": "...",
"status": "...",
"mode": "...",
"pipeline_step": "...",
"progress": 0.42,
"error_kind": "...",
"error_message": "...",
"error_hints": "..."
}
Subscribe via onEvent("task_update", handler) — the plugin host fans
this out to the iframe via bridge:event.
v1.0 ships without:
- /handoff/* routes → no cross-plugin dispatch surface
- subtitle_craft_handoff_* tools → only the 4 above
- assets_bus.write / tasks.origin_* fills → schema is reserved but always NULL in v1.0

When the user asks "can it pipe into clip-sense?" answer with: "v1.0
exports SRT/VTT files; v2.0 will add a one-click Handoff. For now you
can manually re-upload the SRT into clip-sense's burn_subtitle mode."
// Pure transcription, Chinese, no diarization
{ "mode": "auto_subtitle", "source_path": "/x.mp4", "source_lang": "zh" }
// Transcription + diarization + character ID (the only place the toggle
// is active — embedded inside auto_subtitle, not a standalone mode)
{ "mode": "auto_subtitle", "source_path": "/x.mp4",
"diarization_enabled": true, "character_identify_enabled": true,
"speaker_count": 3 }
// Translate an existing SRT to English, bilingual output
{ "mode": "translate", "srt_path": "/x.srt", "target_lang": "en",
"bilingual": true, "translation_model": "qwen-mt-flash" }
// Repair a glitchy SRT (no API cost)
{ "mode": "repair", "srt_path": "/x.srt" }
// Burn an existing SRT into video using ffmpeg ASS (recommended)
{ "mode": "burn", "source_path": "/x.mp4", "srt_path": "/x.srt",
"burn_engine": "ass", "burn_mode": "hard", "subtitle_style": "youtube" }
// AI Hook Picker (v1.1) — Qwen-Plus selects an opening hook
{ "mode": "hook_picker", "srt_path": "/x.srt",
"instruction": "find the strongest opening line",
"target_duration_sec": 12, "hook_model": "qwen-plus" }
hook_picker (added in v1.1)
A 5th processing mode that runs an existing SRT through Qwen-Plus to
pick the strongest opening "hook" dialogue for short-form video. The
algorithm is ported 1:1 from CutClaw's
Screenwriter_scene_short.py and lives in the dedicated
subtitle_hook_picker.py module. The prompt (SELECT_HOOK_DIALOGUE_PROMPT),
the fuzzy-match threshold (0.55), and the 3-window-with-2-retries loop
are red lines: do NOT change them without re-validating against
CutClaw output.
Request fields (CreateTaskBody)
| field | type | default | meaning |
|---|---|---|---|
| srt_path | str | "" | Required when from_task_id is empty |
| from_task_id | str | "" | Reuse another task's output_srt_path |
| instruction | str | "" | Free-form direction for the AI |
| main_character | str | "" | Constrain selection to this speaker |
| target_duration_sec | float | 12.0 | Hook length target (6–30) |
| prompt_window_mode | str | "tail_then_head" | or "random_window" |
| random_window_attempts | int | 3 | 1–5; cost scales linearly |
| hook_model | str | "qwen-plus" | Or qwen-plus-2025-09-11 / qwen-max |
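The defaults and ranges in the field table can be checked before a hook_picker task is submitted. The sketch below is a hypothetical client-side validator, not the plugin's CreateTaskBody model; it assumes the documented ranges are inclusive.

```python
from typing import Any

# Defaults copied from the field table above.
HOOK_DEFAULTS: dict[str, Any] = {
    "srt_path": "",
    "from_task_id": "",
    "instruction": "",
    "main_character": "",
    "target_duration_sec": 12.0,
    "prompt_window_mode": "tail_then_head",
    "random_window_attempts": 3,
    "hook_model": "qwen-plus",
}
WINDOW_MODES = ("tail_then_head", "random_window")


def validate_hook_request(body: dict[str, Any]) -> dict[str, Any]:
    """Fill in defaults and enforce the documented ranges (assumed inclusive)."""
    merged = {**HOOK_DEFAULTS, **body}
    if not merged["srt_path"] and not merged["from_task_id"]:
        raise ValueError("srt_path is required when from_task_id is empty")
    if not 6 <= merged["target_duration_sec"] <= 30:
        raise ValueError("target_duration_sec must be between 6 and 30")
    if merged["prompt_window_mode"] not in WINDOW_MODES:
        raise ValueError(f"prompt_window_mode must be one of {WINDOW_MODES}")
    if not 1 <= merged["random_window_attempts"] <= 5:
        raise ValueError("random_window_attempts must be between 1 and 5")
    return {"mode": "hook_picker", **merged}
```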
- task_dir/hook.srt: single-cue SRT for the chosen window.
- task_dir/hook.json: { hook: {...}, telemetry: {...} } payload
  (lines, timed_lines, source_start/_end, duration_seconds,
  selected_window, selected_attempt, reason).
- GET /tasks/{task_id} enriches succeeded hook tasks with hook +
  hook_telemetry so the UI right-pane HookResultPanel renders
  without an extra fetch.
- GET /library/hooks aggregates every succeeded hook task
  ({items: [...], total: N}).

hook_picker declares skip_steps = ("prepare_assets", "identify_characters", "translate_or_repair", "burn_or_finalize").
ASR is NOT skipped; instead _step_asr_or_load short-circuits to
_load_srt_input (the SRT must contain ≥5 cues; fewer raises
PipelineError(kind="format")).
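The five-cue minimum can be illustrated with a small check that counts SRT timing lines. This is a sketch under assumptions: the real _load_srt_input is not shown in this document, the PipelineError class below is a stand-in that models only the kind attribute, and counting timing lines is just one plausible way to count cues.

```python
import re

MIN_CUES = 5  # minimum stated above for hook_picker input

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}",
    re.MULTILINE,
)


class PipelineError(Exception):
    """Stand-in for the plugin's PipelineError; only `kind` is modeled here."""

    def __init__(self, message: str, kind: str) -> None:
        super().__init__(message)
        self.kind = kind


def require_min_cues(srt_text: str) -> int:
    """Return the cue count, or raise a format error below the minimum."""
    cue_count = len(TIMING_LINE.findall(srt_text))
    if cue_count < MIN_CUES:
        raise PipelineError(
            f"SRT has {cue_count} cues, need at least {MIN_CUES}", kind="format"
        )
    return cue_count
```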
Server-side estimate_cost bills qwen-plus (¥0.005/round) ×
(2 + random_window_attempts) rounds; a typical run costs < ¥0.01.
The UI surfaces this estimate in the right-pane oa-preview-card
before the user clicks Start.
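The estimate formula can be worked through directly. The sketch below uses Decimal to avoid floating-point rounding; the function name and the range check are assumptions, while the per-round price and the (2 + random_window_attempts) round count come from the text above.

```python
from decimal import Decimal

PRICE_PER_ROUND_CNY = Decimal("0.005")  # qwen-plus price per round, as stated above
FIXED_ROUNDS = 2  # the constant term in (2 + random_window_attempts)


def estimate_hook_cost(random_window_attempts: int = 3) -> Decimal:
    """Upper-bound cost in CNY for one hook_picker run."""
    if not 1 <= random_window_attempts <= 5:
        raise ValueError("random_window_attempts must be between 1 and 5")
    return PRICE_PER_ROUND_CNY * (FIXED_ROUNDS + random_window_attempts)
```

With the default of 3 attempts this gives 5 × ¥0.005 = ¥0.025, and the allowed range of 1 to 5 attempts spans ¥0.015 to ¥0.035. That is somewhat above the "< ¥0.01" typical-run figure, so both numbers are best treated as rough estimates.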