| name | speech-build |
| description | Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization). |
Speech Skill (TTS & STT)
Use this skill to implement audio generation and transcription workflows: speech generation with the google-genai SDK and transcription with the google-cloud-speech SDK.
Quick Start Setup
from google import genai
from google.genai import types
client = genai.Client()
Reference Materials
Common Workflows
1. Generate Speech (Gemini-TTS)
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Hello, world!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)
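The call above returns audio bytes rather than a playable file. A minimal sketch of saving them follows, assuming the model returns raw 16-bit mono PCM at 24 kHz in the first part's `inline_data.data`. The helper name `save_pcm_as_wav` and the default parameters are illustrative, not part of either SDK, and the format should be checked against the current Gemini-TTS documentation.

```python
import wave


def save_pcm_as_wav(path, pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container (hypothetical helper).

    The defaults assume 24 kHz, mono, 16-bit samples; adjust them if the
    model's output format differs.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


# Usage with the `response` from the generate_content call
# (assumes the audio is in the first part of the first candidate):
# pcm = response.candidates[0].content.parts[0].inline_data.data
# save_pcm_as_wav("hello.wav", pcm)
```

Keeping the file-writing step separate from the API call makes it possible to test it with synthetic bytes, without network access.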
2. Transcribe Audio (Chirp 3)
from google.cloud import speech_v2
# Assumes Application Default Credentials are configured.
speech_client = speech_v2.SpeechClient()
response = speech_client.recognize(...)
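Once `recognize` returns, the transcript is usually spread across several results. The sketch below shows one way to join them, assuming each result exposes an `alternatives` list whose first entry carries a `transcript` string. The helper name `join_transcripts` is hypothetical, and results without alternatives are skipped.

```python
def join_transcripts(response):
    """Concatenate the top transcript of each result (hypothetical helper).

    Assumes `response.results` is iterable and that each result has an
    `alternatives` sequence ordered from most to least likely.
    """
    pieces = []
    for result in response.results:
        if not result.alternatives:
            continue  # some results may carry no recognized text
        text = result.alternatives[0].transcript.strip()
        if text:
            pieces.append(text)
    return " ".join(pieces)


# Usage with the `response` from speech_client.recognize(...):
# print(join_transcripts(response))
```

The function relies only on attribute access, so it can be exercised with stand-in objects before wiring it to a live client.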