voice-clone-lab

Create and register cloned voices for later TTS only when the speaker has explicit consent. Use when the user asks for voice clone, clone voice, 克隆音色, 复刻声音, or wants a reusable voice_id.

Stars: 3,700

Forks: 288

Updated: June 2, 2026 at 21:32

Source

opensquilla

opensquilla/opensquilla

Useful for (SOC): Software Developers, Computer and Mathematical Occupations, 15-1252, L4

SKILL.md

name	voice-clone-lab
description	Create and register cloned voices for later TTS only when the speaker has explicit consent. Use when the user asks for voice clone, clone voice, 克隆音色, 复刻声音, or wants a reusable voice_id.
triggers	["voice clone","clone voice","克隆音色","复刻声音","声音克隆"]
provenance	{"origin":"opensquilla-original","license":"Apache-2.0","maintained_by":"OpenSquilla"}
metadata	{"opensquilla":{"risk":"high","capabilities":["network-read","filesystem-write"],"requires_tools":["voice_clone","audio_provider_capabilities"]}}

voice-clone-lab

Creates a reusable provider voice from a local sample. OpenRouter may help summarize the request or produce labels, but cloning must use the direct audio provider through voice_clone.

Request triage

Before calling tools, extract these fields from the user request:

sample path and whether the file is local, intentional, and user-provided
speaker identity class: self, employee/team member, private person, public figure, fictional character, or unknown
consent metadata: speaker, consent, sample source, permitted use, requested by, retention expectation, and whether commercial use is allowed
target use: TTS narration, IVR, dubbing, training content, or internal demo
target language, target locale, and desired locale-appropriate accent

OpenRouter can summarize consent text or label a voice, but it is not an audio provider and cannot replace explicit consent.
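
The triage fields above could be held in a small record before any tool is called. The following is a minimal sketch, not part of the skill: the class name `CloneRequest`, its field names, and the `missing_fields` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Identity classes named in the triage list above.
IDENTITY_CLASSES = (
    "self",
    "employee/team member",
    "private person",
    "public figure",
    "fictional character",
    "unknown",
)


@dataclass
class CloneRequest:
    """Hypothetical triage record; the field names are illustrative."""

    sample_path: str
    sample_is_local: bool
    sample_is_user_provided: bool
    identity_class: str = "unknown"
    consent_metadata: dict = field(default_factory=dict)
    target_use: Optional[str] = None
    target_language: Optional[str] = None
    target_locale: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the triage fields that still need an answer from the user."""
        missing = []
        if self.identity_class not in IDENTITY_CLASSES or self.identity_class == "unknown":
            missing.append("identity_class")
        if not self.consent_metadata:
            missing.append("consent_metadata")
        for name in ("target_use", "target_language", "target_locale"):
            if getattr(self, name) is None:
                missing.append(name)
        return missing
```

A request with only a sample path reports every other field as missing, which gives one place to decide what to ask the user before going further.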

Consent-first workflow

Confirm the sample audio path is local and intentionally provided.
Require consent_metadata before calling voice_clone.
Include at minimum:
- speaker
- consent: true
- sample_source
- permitted_use
- requested_by
Reject or stop when consent is missing, vague, or contradicted by the request.
Call audio_provider_capabilities if cloning availability is uncertain.
Call voice_clone with the sample, name, description, and consent metadata.
Return the created voice ID and the allowed usage summary.
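
The minimum consent check in this workflow can be sketched as a plain validation step that runs before `voice_clone`. The function name `consent_problems` and the assumption that `consent_metadata` arrives as a dictionary are hypothetical; the required keys are the five listed above. Code like this only catches missing or non-explicit values. Judging whether consent is vague or contradicted by the request still needs a reading of the request itself.

```python
from typing import List

# Text fields the workflow requires in addition to consent: true.
REQUIRED_TEXT_KEYS = ("speaker", "sample_source", "permitted_use", "requested_by")


def consent_problems(consent_metadata: dict) -> List[str]:
    """List reasons the metadata fails the minimum check (empty list means pass)."""
    problems = []
    for key in REQUIRED_TEXT_KEYS:
        value = consent_metadata.get(key)
        # A missing key, a non-string value, or a blank string all count as missing.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing {key}")
    # Only the boolean True is accepted; "yes", 1, or similar are treated as not explicit.
    if consent_metadata.get("consent") is not True:
        problems.append("consent is not explicitly true")
    return problems
```

If the returned list is non-empty, the workflow stops and asks for the missing items instead of calling the cloning tool.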

Tool-result handling

If voice_clone returns status=ok, return the voice ID first, then the consent summary, intended locale/accent, and any sample-quality warning.
If it returns consent_required, do not proceed with a workaround. Ask for the missing consent metadata in one concise question.
If the provider returns not_available, quote the provider's note and say which cause applies: a disabled provider, key or quota limits, feature gating, or a sample format issue.
Never suggest scraping, downloading, or extracting third-party voice samples as a fallback.
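
The status handling above can be sketched as a small dispatcher. The result shape is an assumption: the source only names the statuses `ok`, `consent_required`, and `not_available`, so the dictionary keys `status`, `voice_id`, and `note`, and the function name `plan_response`, are hypothetical.

```python
from typing import Any, Dict

# Causes the skill asks to distinguish when cloning is not available.
NOT_AVAILABLE_CAUSES = [
    "disabled provider",
    "key/quota limits",
    "feature gating",
    "sample format issues",
]


def plan_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map an assumed voice_clone result dict to the next step."""
    status = result.get("status")
    if status == "ok":
        # The voice ID is reported first, then the remaining items in this order.
        return {
            "action": "report",
            "order": ["voice_id", "consent_summary", "locale_accent", "sample_quality_warning"],
            "voice_id": result.get("voice_id"),
        }
    if status == "consent_required":
        # No workaround: ask one concise question for the missing metadata.
        return {
            "action": "ask_user",
            "question": "Which consent metadata is missing: speaker, consent, "
                        "sample_source, permitted_use, or requested_by?",
        }
    if status == "not_available":
        return {
            "action": "explain",
            "quoted_note": result.get("note", ""),
            "causes_to_distinguish": list(NOT_AVAILABLE_CAUSES),
        }
    # Anything else is treated as a stop condition, never as a cue to find other samples.
    return {"action": "stop", "reason": f"unrecognized status: {status!r}"}
```

None of the branches suggests an alternative way to obtain audio, which matches the rule against scraping or extracting third-party samples.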

Rights and copyright guard

Authorization (授权) is mandatory. The speaker must own or control the voice sample and agree to cloning for this use.
Copyright / 版权: do not use copyrighted recordings, film/TV/game clips, music stems, interviews, or scraped audio unless the user states they have rights.
Public figure policy: do not clone or imitate a public figure, celebrity, politician, influencer, actor, singer, or fictional character voice.
Do not help bypass provider safety checks or watermark/disclosure duties.
Store only the returned provider voice ID and consent summary in ordinary output; do not duplicate raw sample audio.
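
As a rough illustration of the guard above, the two hard stops (blocked identity classes and unlicensed third-party media) can be expressed as a gate that runs before the consent check. The function name `rights_gate` and its boolean inputs are hypothetical; how a caller decides that a sample is third-party media is left outside this sketch.

```python
from typing import Tuple

# Identity classes the public figure policy rules out entirely.
BLOCKED_IDENTITY_CLASSES = {"public figure", "fictional character"}


def rights_gate(
    identity_class: str,
    sample_is_third_party_media: bool,
    user_states_rights: bool,
) -> Tuple[bool, str]:
    """Return (allowed, reason) for the rights and copyright guard."""
    if identity_class in BLOCKED_IDENTITY_CLASSES:
        # Stated rights do not lift this block.
        return False, "public figure and fictional character voices are not cloned"
    if sample_is_third_party_media and not user_states_rights:
        return False, "the sample appears to be copyrighted or scraped audio and no rights were stated"
    return True, "no rights objection found; consent metadata is still required"
```

Passing the gate is not approval to clone. It only means the request may continue to the consent check.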

Locale and accent quality notes

Ask which target language and locale the cloned voice will be used for. A clone works best when the sample matches the desired locale-appropriate accent.

Chinese neutral narration: use clean Mandarin (普通话) sample audio.
American English: use clean en-US sample audio.
British English: use clean en-GB sample audio.
Japanese/Korean/French/German/Spanish/etc.: use samples spoken in that target language, not an English sample repurposed cross-lingually.
Strong dialect, code-switching, room echo, music, or singing can produce odd accent transfer in later TTS. Recommend 30-90 seconds of dry speech when possible.
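
The quality notes above can be turned into a simple pre-flight check that produces warnings for the user. This is a sketch under assumptions: the function name `sample_warnings` is hypothetical, locales are assumed to be tags such as `en-US`, and the duration and noise flag are assumed to be supplied by the caller, since the source does not say how they are measured.

```python
from typing import List


def sample_warnings(
    sample_locale: str,
    target_locale: str,
    duration_seconds: float,
    has_noise_or_music: bool = False,
) -> List[str]:
    """Collect warnings about a sample before cloning; an empty list means none."""
    warnings = []
    sample_lang = sample_locale.split("-")[0].lower()
    target_lang = target_locale.split("-")[0].lower()
    if sample_lang != target_lang:
        # Cross-lingual reuse, e.g. an English sample for Japanese TTS.
        warnings.append("sample language differs from target language")
    elif sample_locale.lower() != target_locale.lower():
        # Same language, different regional accent, e.g. en-GB vs en-US.
        warnings.append("sample accent differs from target locale")
    if not 30 <= duration_seconds <= 90:
        warnings.append("sample length is outside the recommended 30-90 seconds")
    if has_noise_or_music:
        warnings.append("echo, music, or singing may cause odd accent transfer")
    return warnings
```

For example, `sample_warnings("en-GB", "en-US", 45.0)` yields a single accent warning, while a matching 60-second dry sample yields none.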

Output contract

Return:

provider
voice ID
voice name
consent summary
allowed use
target language / locale assumption
warning if the source sample quality may harm target-language accent quality
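
One way to keep the output contract consistent is a small report type with a renderer. This is a minimal sketch: the class name `CloneReport`, the field names, and the plain-text layout are hypothetical, and the example values used with it are placeholders, not real provider output. The voice ID is placed first, following the tool-result handling rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneReport:
    """Hypothetical container for the items in the output contract."""

    provider: str
    voice_id: str
    voice_name: str
    consent_summary: str
    allowed_use: str
    target_locale_assumption: str
    sample_quality_warning: Optional[str] = None


def render_report(report: CloneReport) -> str:
    """Render the report as plain text, voice ID first."""
    lines = [
        f"voice ID: {report.voice_id}",
        f"provider: {report.provider}",
        f"voice name: {report.voice_name}",
        f"consent summary: {report.consent_summary}",
        f"allowed use: {report.allowed_use}",
        f"target language / locale assumption: {report.target_locale_assumption}",
    ]
    # The warning line appears only when there is something to warn about.
    if report.sample_quality_warning:
        lines.append(f"warning: {report.sample_quality_warning}")
    return "\n".join(lines)
```

The report holds only the voice ID and summaries, never the raw sample audio, in line with the storage rule in the rights guard.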
