voice-conversion-studio
Convert a local source recording into an authorized target voice. Use when the user asks for voice conversion, voice changer, 换声, 变声, 音色转换, or converting existing narration to another approved voice.
| name | voice-conversion-studio |
| description | Convert a local source recording into an authorized target voice. Use when the user asks for voice conversion, voice changer, 换声, 变声, 音色转换, or converting existing narration to another approved voice. |
| triggers | ["voice conversion","voice convert","voice changer","音色转换","换声","变声"] |
| provenance | {"origin":"opensquilla-original","license":"Apache-2.0","maintained_by":"OpenSquilla"} |
| metadata | {"opensquilla":{"risk":"high","capabilities":["network-read","filesystem-write"],"requires_tools":["voice_convert","audio_provider_capabilities"]}} |
Converts an existing local recording into a target voice using the configured
audio provider. OpenRouter can assist with planning or file naming, but the
conversion itself must use voice_convert.
Before calling tools, extract these fields from the user request:
OpenRouter can help summarize or translate instructions, but it is not an audio provider and cannot authorize voice identity use.
1. Call audio_provider_capabilities if conversion availability is uncertain.
2. Call voice_convert with source_audio, voice, optional output_path, and any supported provider controls.

When source quality, accent transfer, or target voice fit is uncertain, convert a short sample before processing a full recording. Recommend re-recording or cleaning the source if the preview contains room echo, background music, strong dialect mismatch, or heavy code-switching.
For multilingual conversion, avoid using a target voice that does not naturally support the target language. A short preview is the fastest way to catch odd accent transfer before spending quota on the whole asset.
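The preview step can be sketched in Python. This is a minimal illustration, not part of the skill: it assumes the source is an uncompressed PCM WAV file, and the helper names write_preview_clip and build_convert_args are hypothetical. Only the argument names source_audio, voice, and output_path come from the text above; how the arguments are actually passed to voice_convert depends on the host runtime.

```python
import wave
from typing import Optional


def write_preview_clip(source_path: str, preview_path: str, seconds: float = 10.0) -> int:
    """Copy the first `seconds` of a PCM WAV file to preview_path.

    Returns the number of audio frames written. Hypothetical helper.
    """
    with wave.open(source_path, "rb") as src:
        params = src.getparams()
        # Never request more frames than the source actually holds.
        frame_count = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(frame_count)
    with wave.open(preview_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is corrected on close
        dst.writeframes(frames)
    return frame_count


def build_convert_args(source_audio: str, voice: str,
                       output_path: Optional[str] = None) -> dict:
    """Assemble arguments for a voice_convert call (hypothetical helper).

    output_path is optional, so it is only included when provided.
    """
    args = {"source_audio": source_audio, "voice": voice}
    if output_path is not None:
        args["output_path"] = output_path
    return args
```

A preview run would first call write_preview_clip on the full recording, then pass the clip path as source_audio in build_convert_args, so that only a few seconds of audio count against quota before the full asset is converted.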
- If voice_convert returns status=ok, return the playable artifact/path first, then target voice, mime type, and rights summary.
- If consent_required, ask for source and target consent metadata instead of attempting a different voice identity.
- If not_available, quote the note and distinguish provider setup, feature gating, key/quota limits, file format, and source path issues.

For voice conversion, first identify the target language and locale. The source recording and the target voice should both suit the accent expected for that locale.
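The status handling described above might be sketched as follows. The status values ok, consent_required, and not_available come from the text. The result field names (path, voice, mime_type, rights, note) and the function describe_result are assumptions made for illustration, since the actual response shape of voice_convert is not given here.

```python
def describe_result(result: dict) -> str:
    """Turn a voice_convert result into a user-facing summary.

    Field names are assumed for illustration, not confirmed by the skill.
    """
    status = result.get("status")
    if status == "ok":
        # Playable artifact first, then target voice, mime type, rights summary.
        parts = [
            result.get("path", ""),
            result.get("voice", ""),
            result.get("mime_type", ""),
            result.get("rights", ""),
        ]
        return " | ".join(part for part in parts if part)
    if status == "consent_required":
        # Ask for consent metadata rather than switching to another voice identity.
        return ("Consent metadata is required for both the source recording "
                "and the target voice.")
    if status == "not_available":
        # Quote the provider note verbatim so the cause can be diagnosed.
        return f"Conversion unavailable: {result.get('note', '')}"
    return f"Unexpected status: {status!r}"
```

Keeping each status in its own branch means a consent_required result can never fall through to a retry with a different voice.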
Return: