voice-conversion-studio

Convert a local source recording into an authorized target voice. Use when the user asks for voice conversion, voice changer, 换声, 变声, 音色转换, or converting existing narration to another approved voice.

Updated: June 2, 2026 at 21:32

Source: opensquilla/opensquilla

SKILL.md

name	voice-conversion-studio
description	Convert a local source recording into an authorized target voice. Use when the user asks for voice conversion, voice changer, 换声, 变声, 音色转换, or converting existing narration to another approved voice.
triggers	["voice conversion","voice convert","voice changer","音色转换","换声","变声"]
provenance	{"origin":"opensquilla-original","license":"Apache-2.0","maintained_by":"OpenSquilla"}
metadata	{"opensquilla":{"risk":"high","capabilities":["network-read","filesystem-write"],"requires_tools":["voice_convert","audio_provider_capabilities"]}}

voice-conversion-studio

Converts an existing local recording into a target voice using the configured audio provider. OpenRouter can assist with planning or file naming, but the conversion itself must use voice_convert.

Request triage

Before calling tools, extract these fields from the user request:

source audio path and whether it is local, intentional, and user-provided
source rights: speaker consent and recording copyright
target voice: provider-licensed voice, cloned voice ID, or user-provided voice ID
target language, target locale, desired accent, emotion, pace, and output format
output expectation: quick conversion sample, final asset, or multiple takes
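The triage fields above can be held in a small record so that nothing is missed before a tool call. This is a minimal sketch, assuming Python; the class `ConversionRequest`, its field names, and the `missing_fields` helper are hypothetical and not part of the skill definition. It only reports which fields still need to be asked for; it does not verify any rights claim.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversionRequest:
    """Hypothetical record of the triage fields; names are illustrative only."""
    source_path: Optional[str] = None          # local, intentional, user-provided file
    source_is_local: bool = False
    source_speaker_consent: bool = False       # speaker authorized use of the recording
    source_copyright_cleared: bool = False     # user holds rights to the recording
    target_voice: Optional[str] = None         # provider-licensed, cloned, or user-provided voice ID
    target_locale: Optional[str] = None        # e.g. "en-GB" or "zh-CN"
    accent: Optional[str] = None
    emotion: Optional[str] = None
    pace: Optional[str] = None
    output_format: Optional[str] = None
    expectation: str = "sample"                # "sample", "final", or "takes"

    def missing_fields(self) -> list[str]:
        """Return the fields that must be clarified before calling any tool."""
        missing = []
        if not self.source_path or not self.source_is_local:
            missing.append("source_path")
        if not self.source_speaker_consent:
            missing.append("source_speaker_consent")
        if not self.source_copyright_cleared:
            missing.append("source_copyright_cleared")
        if not self.target_voice:
            missing.append("target_voice")
        if not self.target_locale:
            missing.append("target_locale")
        return missing
```

For example, a request that only names a local file, `ConversionRequest(source_path="narration.wav", source_is_local=True)`, still needs both source rights answers, a target voice, and a target locale. Optional style fields (accent, emotion, pace, format) are deliberately not treated as blocking.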

OpenRouter can help summarize or translate instructions, but it is not an audio provider and cannot authorize voice identity use.

Required workflow

Check that the source file is local and was intentionally provided by the user.
Confirm rights for both sides:
- source recording copyright and speaker authorization
- target voice consent or provider-licensed voice
Refuse public figure or copyrighted character imitation.
If it is unclear whether the provider supports conversion, call audio_provider_capabilities first.
Call voice_convert with source_audio, voice, optional output_path, and any supported provider controls.
Return the result as a playable audio artifact when the surface supports it.
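The ordering of the required workflow can be sketched as a single guard-then-call function. This is an illustration under stated assumptions, not the skill's implementation: the real voice_convert and audio_provider_capabilities tools are injected as plain callables whose Python signatures are assumed (keyword arguments in, a dict with a `status` key out, and a capabilities dict with a hypothetical `voice_convert` flag). The rights and public-figure answers come from asking the user; the code cannot detect them itself. Returning a playable artifact is left to the caller.

```python
import os
from typing import Any, Callable, Optional


def run_conversion(
    source_audio: str,
    voice: str,
    rights_confirmed: bool,
    imitates_public_figure: bool,
    convert: Callable[..., dict],
    capabilities: Optional[Callable[[], dict]] = None,
    output_path: Optional[str] = None,
    **controls: Any,
) -> dict:
    """Apply the workflow checks in order; tool callables are assumed stand-ins."""
    # Step 1: the source must be an existing local file.
    if not os.path.isfile(source_audio):
        return {"status": "blocked", "reason": "source file is not a local file"}
    # Step 2: rights for both the source recording and the target voice.
    if not rights_confirmed:
        return {"status": "blocked", "reason": "source and target rights not confirmed"}
    # Step 3: refuse public figure or copyrighted character imitation.
    if imitates_public_figure:
        return {"status": "refused", "reason": "public figure or character imitation"}
    # Step 4: probe availability only when a capabilities callable is supplied.
    if capabilities is not None and not capabilities().get("voice_convert", False):
        return {"status": "not_available", "note": "provider reports no voice conversion"}
    # Step 5: call the conversion tool with source_audio, voice, and optional extras.
    args = {"source_audio": source_audio, "voice": voice, **controls}
    if output_path is not None:
        args["output_path"] = output_path
    return convert(**args)
```

Injecting the tools keeps the ordering testable with fakes; every early return happens before any quota is spent on the provider.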

Preview-first

When source quality, accent transfer, or target voice fit is uncertain, convert a short sample before processing a full recording. Recommend re-recording or cleaning the source if the preview contains room echo, background music, strong dialect mismatch, or heavy code-switching.

For multilingual conversion, avoid using a target voice that does not naturally support the target language. A short preview is the fastest way to catch odd accent transfer before spending quota on the whole asset.
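The skill does not say how the short sample is produced. One option, assuming the source is an uncompressed PCM WAV file, is to copy its first few seconds with Python's standard `wave` module; compressed formats would need decoding first. The function name `cut_preview` and the 10-second default are illustrative choices.

```python
import wave


def cut_preview(source_path: str, preview_path: str, seconds: float = 10.0) -> float:
    """Copy the first `seconds` of a PCM WAV file into a separate preview file.

    Returns the preview duration in seconds. Only uncompressed WAV is handled.
    """
    with wave.open(source_path, "rb") as src:
        params = src.getparams()
        frames_wanted = int(seconds * params.framerate)
        frames = src.readframes(min(frames_wanted, params.nframes))
    with wave.open(preview_path, "wb") as dst:
        dst.setparams(params)      # header frame count is patched when frames are written
        dst.writeframes(frames)
    n_written = len(frames) // (params.sampwidth * params.nchannels)
    return n_written / params.framerate
```

The preview file, not the full recording, is then passed as the source for the first conversion; if the result sounds right, the full file is converted with the same voice and controls.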

Tool-result handling

If voice_convert returns status=ok, return the playable artifact/path first, then target voice, mime type, and rights summary.
If it returns consent_required, ask for source and target consent metadata instead of attempting a different voice identity.
If it returns not_available, quote the returned note and identify which cause applies: provider setup, feature gating, key/quota limits, file format, or source path.
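The three result cases map naturally onto a small dispatch function. This is a sketch only: the skill names the statuses (ok, consent_required, not_available) but not the other result fields, so `output_path`, `voice`, `mime_type`, `rights_summary`, and `note` are assumed keys, and the keyword table used to classify a not_available note is a rough heuristic rather than anything the provider guarantees.

```python
# Assumed keywords for each not_available cause; checked in this order.
CAUSES = {
    "provider setup": ("not configured", "no provider", "setup"),
    "feature gating": ("not enabled", "gated", "unsupported feature"),
    "key/quota limits": ("api key", "quota", "rate limit", "401", "429"),
    "file format": ("format", "codec", "mime"),
    "source path": ("not found", "no such file", "path"),
}


def classify_unavailable(note: str) -> str:
    """Guess which cause a not_available note describes."""
    lowered = note.lower()
    for cause, keywords in CAUSES.items():
        if any(k in lowered for k in keywords):
            return cause
    return "unclassified"


def summarize_result(result: dict) -> str:
    """Turn a voice_convert result into a reply (result keys are assumptions)."""
    status = result.get("status")
    if status == "ok":
        # Playable artifact or path first, then voice, MIME type, and rights summary.
        return "\n".join([
            f"audio: {result.get('output_path', '<missing path>')}",
            f"target voice: {result.get('voice', 'unknown')}",
            f"mime type: {result.get('mime_type', 'unknown')}",
            f"rights: {result.get('rights_summary', 'not recorded')}",
        ])
    if status == "consent_required":
        # Ask for consent metadata; never retry with a different voice identity.
        return "Please provide consent metadata for the source speaker and the target voice."
    if status == "not_available":
        note = result.get("note", "")
        return f'Conversion unavailable: "{note}" ({classify_unavailable(note)})'
    return f"Unexpected status: {status!r}"
```

Quoting the note verbatim alongside the guessed cause keeps the user informed even when the heuristic falls back to "unclassified".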

Rights and copyright guard

Authorization (授权) is required for both the source speaker and the target voice.
Copyright (版权): do not convert songs, movie lines, podcasts, audiobooks, lectures, interviews, or game/animation dialogue unless the user states that they hold the rights.
Public figure policy: do not convert a recording to sound like a public figure, celebrity, actor, singer, politician, influencer, or fictional character.
If the user asks for a risky identity target, offer a non-identifying target: "mature calm Mandarin narrator", "bright young commercial voice", etc.
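The guard rules can be expressed as one decision function. This is a minimal sketch under the assumption that every flag is filled in from the user's own answers; the code cannot recognize a public figure or verify consent by itself. `RightsCheck`, `rights_decision`, and the content-type strings are hypothetical names; the protected categories and the example alternatives are taken from the rules above.

```python
from dataclasses import dataclass

# Copyright-sensitive content types, taken from the guard list above.
PROTECTED_CONTENT = {
    "song", "movie line", "podcast", "audiobook",
    "lecture", "interview", "game dialogue", "animation dialogue",
}

# Non-identifying targets to offer instead of a risky identity.
SAFE_ALTERNATIVES = ["mature calm Mandarin narrator", "bright young commercial voice"]


@dataclass
class RightsCheck:
    source_speaker_authorized: bool
    target_voice_authorized: bool
    content_type: str                              # e.g. "own narration", "podcast"
    user_claims_content_rights: bool
    target_is_real_or_fictional_identity: bool     # public figure, celebrity, character


def rights_decision(check: RightsCheck) -> tuple[str, list[str]]:
    """Return ("proceed" | "ask" | "refuse", details)."""
    if check.target_is_real_or_fictional_identity:
        # Refusal always comes with non-identifying alternatives.
        return "refuse", list(SAFE_ALTERNATIVES)
    missing = []
    if not check.source_speaker_authorized:
        missing.append("source speaker authorization")
    if not check.target_voice_authorized:
        missing.append("target voice authorization")
    if check.content_type in PROTECTED_CONTENT and not check.user_claims_content_rights:
        missing.append(f"rights to the {check.content_type}")
    return ("ask", missing) if missing else ("proceed", [])
```

The identity check runs first on purpose: a public-figure target is refused even if every consent flag is set.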

Locale and accent quality notes

For voice conversion, first identify the target language and locale. Both the source recording and the target voice should suit the accent expected for that locale.

Neutral Chinese narration: prefer a clean Mandarin (普通话) source and target voice.
English: preserve requested locale such as en-US, en-GB, en-AU, en-IN, or en-SG.
Japanese/Korean/French/German/Spanish/etc.: prefer source/target voices that naturally support that language.
Strong dialect, background music, reverberation, and heavy code-switching can cause odd accent transfer. Recommend re-recording a short, dry sample before converting a whole script.
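These locale notes can be turned into a pre-flight warning list. The sketch assumes the target voice's supported languages are available as BCP 47 tags (for example from provider metadata; the skill does not say whether audio_provider_capabilities exposes them) and that source issues have already been noted from a listen or a preview. The helper names `primary_language` and `locale_warnings` are hypothetical.

```python
def primary_language(tag: str) -> str:
    """Primary language subtag of a BCP 47 tag, e.g. 'en-GB' -> 'en'."""
    return tag.replace("_", "-").split("-")[0].lower()


def locale_warnings(target_locale: str, voice_languages: list[str],
                    source_issues: set[str]) -> list[str]:
    """Flag the accent-transfer risks from the notes above before a full conversion."""
    warnings = []
    wanted = primary_language(target_locale)
    supported = {primary_language(t) for t in voice_languages}
    exact = {t.replace("_", "-").lower() for t in voice_languages}
    if wanted not in supported:
        warnings.append(f"target voice does not list {wanted!r}; preview or pick another voice")
    elif "-" in target_locale and target_locale.replace("_", "-").lower() not in exact:
        # Language matches but the regional variant (e.g. en-AU) does not.
        warnings.append(f"voice supports {wanted!r} but not locale {target_locale!r}; accent may drift")
    risky = source_issues & {"strong dialect", "background music", "reverberation", "code-switching"}
    if risky:
        warnings.append("re-record a short, dry sample first: " + ", ".join(sorted(risky)))
    return warnings
```

An empty list does not guarantee a good result; it only means none of the listed risks were detected, so a preview is still worthwhile for multilingual jobs.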

Output contract

Return:

provider
target voice
output path
mime type
playable audio artifact status
rights/consent summary
target language / locale assumption
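The output contract fits a fixed-order record. This is a sketch only; the field names, the display labels, and the "no (path only)" wording for a missing artifact are illustrative assumptions, not a format the skill prescribes.

```python
from dataclasses import asdict, dataclass


@dataclass
class ConversionReport:
    """The seven fields of the output contract; names are illustrative."""
    provider: str
    target_voice: str
    output_path: str
    mime_type: str
    playable_artifact: bool        # whether the surface rendered a playable artifact
    rights_summary: str
    locale_assumption: str

    def render(self) -> str:
        """One 'Label: value' line per field, in contract order."""
        labels = {
            "provider": "Provider",
            "target_voice": "Target voice",
            "output_path": "Output path",
            "mime_type": "MIME type",
            "playable_artifact": "Playable audio artifact",
            "rights_summary": "Rights/consent",
            "locale_assumption": "Target language/locale",
        }
        lines = []
        for key, value in asdict(self).items():
            if isinstance(value, bool):
                value = "yes" if value else "no (path only)"
            lines.append(f"{labels[key]}: {value}")
        return "\n".join(lines)
```

Because every field is required by the constructor, a report cannot be produced with, say, the rights summary or locale assumption silently omitted.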
