
speech-use

Stars: 123

Forks: 10

Updated: February 6, 2026, 21:44

Generate (TTS), Transcribe (STT), and Clone voices using Google's GenAI and Cloud Speech SDKs. Supports Gemini-TTS, Chirp 3, and Instant Custom Voice.


Source

cnemri

cnemri/google-genai-skills


Speech Use

Use this skill to perform Text-to-Speech (TTS), Speech-to-Text (STT), and Voice Cloning operations.

This skill uses portable Python scripts managed by uv.

Prerequisites

Environment Variables:
- GOOGLE_API_KEY (for TTS via Gemini)
- GOOGLE_CLOUD_PROJECT (required for STT and Voice Cloning)
- GOOGLE_APPLICATION_CREDENTIALS (recommended for STT and Voice Cloning)

APIs Enabled:
- Text-to-Speech API (texttospeech.googleapis.com)
- Speech-to-Text API (speech.googleapis.com)
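
A quick pre-flight check can catch a missing variable before any script runs. The sketch below is not part of the skill: the function name `missing_env` and the grouping of variables by feature are assumptions made for illustration, and treating GOOGLE_API_KEY as mandatory for TTS is an inference from the list above.

```python
import os

# Variable names come from the Prerequisites list. Grouping them by feature,
# and treating GOOGLE_API_KEY as required for TTS, are assumptions.
REQUIRED_ENV = {
    "tts": ["GOOGLE_API_KEY"],
    "stt": ["GOOGLE_CLOUD_PROJECT"],
    "voice-cloning": ["GOOGLE_CLOUD_PROJECT"],
}
RECOMMENDED_ENV = {
    "stt": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "voice-cloning": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env(feature, env=None):
    """Return (missing_required, missing_recommended) for one feature."""
    env = os.environ if env is None else env
    required = [n for n in REQUIRED_ENV.get(feature, []) if not env.get(n)]
    recommended = [n for n in RECOMMENDED_ENV.get(feature, []) if not env.get(n)]
    return required, recommended


# Report what is missing for each feature in the current environment.
for feature in REQUIRED_ENV:
    required, recommended = missing_env(feature)
    print(f"{feature}: missing required={required}, recommended={recommended}")
```

An empty "required" list for a feature suggests its script has the variables it needs; it does not confirm that the APIs are enabled on the project.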

Usage

1. Generate Speech (TTS)

Generate audio from text using Gemini-TTS.

Standard Voice:

uv run skills/speech-use/scripts/generate_speech.py "Hello world, this is a test." --voice Puck --output hello.wav

Custom Voice (Cloned):

uv run skills/speech-use/scripts/generate_speech.py "This is my custom voice speaking." --voice-cloning-key "YOUR_KEY_HERE" --output custom.wav
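
When the TTS script is called from other Python code, it can help to build the command as an argument list rather than a shell string. The following is a minimal sketch using only the flags shown above; the helper name `build_tts_command` is hypothetical, and the rule that a prebuilt voice and a cloning key cannot be combined is an assumption, not documented behavior.

```python
import shlex

# Script path as it appears in the usage examples.
SCRIPT = "skills/speech-use/scripts/generate_speech.py"


def build_tts_command(text, output, voice=None, voice_cloning_key=None, model=None):
    """Build the argv list for generate_speech.py (hypothetical helper)."""
    # Assumption: --voice and --voice-cloning-key are alternatives.
    if voice and voice_cloning_key:
        raise ValueError("pass either voice or voice_cloning_key, not both")
    cmd = ["uv", "run", SCRIPT, text]
    if voice:
        cmd += ["--voice", voice]
    if voice_cloning_key:
        cmd += ["--voice-cloning-key", voice_cloning_key]
    if model:
        cmd += ["--model", model]
    cmd += ["--output", output]
    return cmd


cmd = build_tts_command("Hello world, this is a test.", "hello.wav", voice="Puck")
print(shlex.join(cmd))  # shell-quoted form, for logging only
```

The list can be passed to `subprocess.run(cmd, check=True)` as is, which avoids quoting problems when the text contains spaces or quotation marks.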

2. Create Custom Voice (Voice Cloning)

Generate a voiceCloningKey from a reference audio file and a consent file.

Requirements:

- reference.wav: 10-30 seconds of clear speech (the voice to clone).
- consent.wav: The speaker saying: "I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model."

uv run skills/speech-use/scripts/create_custom_voice.py --reference-audio reference.wav --consent-audio consent.wav

Save the output key to use with generate_speech.py.
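
The 10-30 second requirement for reference.wav can be checked locally before calling create_custom_voice.py. This is a rough sketch with Python's standard `wave` module, which reads uncompressed PCM WAV files only; the function names are hypothetical, and whether the script enforces these bounds itself is not stated here.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path, min_seconds=10.0, max_seconds=30.0):
    """Check the reference-audio length against the 10-30 s range stated above."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

Calling `is_valid_reference("reference.wav")` before the cloning step gives an early warning about a clip that is too short or too long. It says nothing about audio clarity, which still has to be judged by ear.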

3. Transcribe Audio (STT)

Transcribe audio files using Chirp 3.

uv run skills/speech-use/scripts/transcribe_audio.py audio.wav --language en-US --output transcript.txt

Options

generate_speech.py

- --voice: Prebuilt voice (e.g., Kore, Puck, Fenrir, Aoede).
- --voice-cloning-key: Key from create_custom_voice.py.
- --model: Default gemini-2.5-flash-preview-tts.

transcribe_audio.py

- --model: Default chirp_3.
- --language: Default auto.
- --location: Cloud region (default us).
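
To make the documented defaults concrete, here is one way the transcribe_audio.py options might be declared with `argparse`. This is a hypothetical reconstruction, not the script's actual source: the positional name `audio` and the `None` default for `--output` are assumptions, while the three defaults come from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented transcribe_audio.py options."""
    parser = argparse.ArgumentParser(prog="transcribe_audio.py")
    parser.add_argument("audio", help="path to the audio file to transcribe")
    parser.add_argument("--model", default="chirp_3")
    parser.add_argument("--language", default="auto")
    parser.add_argument("--location", default="us", help="Cloud region")
    # Assumption: --output is optional; its default is not documented.
    parser.add_argument("--output", default=None, help="transcript path")
    return parser


# Only --language is overridden; the other options fall back to their defaults.
args = build_parser().parse_args(["audio.wav", "--language", "en-US"])
print(args.model, args.language, args.location)  # prints: chirp_3 en-US us
```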

References

Before running scripts, review the reference guides for available voices and options.

Voices Guide - 30+ voice options with styles (Puck, Kore, Fenrir, Aoede, etc.)

