text-to-speech

Stars: 64

Forks: 12

Updated: March 13, 2026 at 14:07

Convert text to natural speech using Sarvam AI's Bulbul v3 model. Handles audio generation, voiceovers, and voice interfaces for 11 Indian languages with 30+ voices. Supports REST, HTTP streaming, WebSocket, and pronunciation dictionaries. Use when generating spoken audio from text.


Source

sarvamai

sarvamai/skills


SKILL.md


name	text-to-speech
description	Convert text to natural speech using Sarvam AI's Bulbul v3 model. Handles audio generation, voiceovers, and voice interfaces for 11 Indian languages with 30+ voices. Supports REST, HTTP streaming, WebSocket, and pronunciation dictionaries. Use when generating spoken audio from text.
license	Apache-2.0
metadata	{"author":"sarvam-ai","version":"3.0"}

Text-to-Speech — Bulbul

[!IMPORTANT] Auth: api-subscription-key header — NOT Authorization: Bearer. Base URL: https://api.sarvam.ai/v1
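For callers that skip the SDK, the auth rule above can be sketched as a plain HTTP request builder. This is a minimal sketch, not the official client: the helper name `build_request`, the `/text-to-speech` path, and the JSON field names (borrowed from the SDK keyword arguments shown in the Quick Start) are assumptions to check against the API reference. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.sarvam.ai/v1"  # base URL as stated in the note above


def build_request(api_key, path, payload):
    """Build (but do not send) a POST request that authenticates with
    the api-subscription-key header instead of Authorization: Bearer."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,  # NOT "Authorization: Bearer ..."
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "/text-to-speech" is an assumed route used for illustration only.
req = build_request(
    "YOUR_SARVAM_API_KEY",
    "/text-to-speech",
    {
        "text": "Hello from Sarvam AI",
        "target_language_code": "en-IN",
        "model": "bulbul:v3",
        "speaker": "shubh",
    },
)
```

Passing `req` to `urllib.request.urlopen` would send it. Note that `urllib` normalizes header capitalization, which is harmless because HTTP header names are case-insensitive.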

Model

bulbul:v3 — 11 languages, 30+ voices (default: shubh), REST/HTTP stream/WebSocket.

Quick Start (Python)

from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh"
)
save(response, "output.wav")

# HTTP Stream (lower latency, binary audio)
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3"
):
    chunks.append(chunk)
audio = b"".join(chunks)
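If `sarvamai.play.save()` is not convenient, the REST response can be decoded by hand. Per the Gotchas section, the audio arrives base64-encoded in `response.audios[0]`. The sketch below uses a stand-in object in place of a real response so it runs offline; the helper name `decode_first_audio` is hypothetical.

```python
import base64
from types import SimpleNamespace


def decode_first_audio(response):
    """Return the raw audio bytes from the first base64 entry in response.audios."""
    return base64.b64decode(response.audios[0])


# Stand-in for a real REST response (assumption: only the .audios list matters here).
fake_response = SimpleNamespace(
    audios=[base64.b64encode(b"RIFF-demo-bytes").decode("ascii")]
)
audio_bytes = decode_first_audio(fake_response)

# With a real response, the bytes could then be written out, for example:
# pathlib.Path("output.wav").write_bytes(audio_bytes)
```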

Quick Start (JavaScript/TypeScript)

import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
    text: "नमस्ते, आप कैसे हैं?",
    target_language_code: "hi-IN",
    model: "bulbul:v3",
    speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
    text: "Hello from Sarvam AI",
    target_language_code: "en-IN",
    speaker: "shubh",
    model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);

WebSocket Streaming

import asyncio
from sarvamai import AsyncSarvamAI

async def tts_stream():
    client = AsyncSarvamAI()
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        await ws.flush()
        async for message in ws:
            pass  # base64 audio chunks

asyncio.run(tts_stream())
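The loop above leaves chunk handling open. One way to fill it in is to decode each base64 chunk as it arrives and accumulate the bytes. The exact message shape is SDK-specific and not shown here, so this sketch takes a caller-supplied `get_b64` function (hypothetical) that pulls the base64 string out of a message, and it is exercised against a fake stream rather than a live connection.

```python
import asyncio
import base64


async def collect_audio(messages, get_b64):
    """Decode base64 audio chunks from an async message stream and join them."""
    buf = bytearray()
    async for message in messages:
        b64 = get_b64(message)
        if b64:  # skip messages that carry no audio
            buf.extend(base64.b64decode(b64))
    return bytes(buf)


async def fake_stream():
    """Stand-in for the WebSocket connection: two audio chunks, then a non-audio event."""
    for part in (b"abc", b"def"):
        yield {"audio": base64.b64encode(part).decode("ascii")}
    yield {"event": "final"}


def demo():
    return asyncio.run(collect_audio(fake_stream(), lambda m: m.get("audio")))
```

In `tts_stream()`, `ws` would take the place of `fake_stream()`, with `get_b64` adapted to whatever field the SDK's audio messages expose.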

Character Limits

| Method | Max text |
| --- | --- |
| REST (`convert`) | 2,500 chars |
| HTTP Stream (`convert_stream`) | 3,500 chars |
| WebSocket | 2,500 chars/msg |
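Longer inputs have to be split before they are sent. A minimal sketch, assuming the limits count Python string characters (the docs may count differently) and that splitting at sentence boundaries is acceptable; `split_text` is a hypothetical helper, not part of the SDK.

```python
import re

REST_LIMIT = 2500  # REST `convert` limit from the table above


def split_text(text, limit=REST_LIMIT):
    """Split text into pieces of at most `limit` characters,
    preferring sentence boundaries (., !, ?, and the Devanagari danda)."""
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned piece can then go through its own `convert` call, or be sent as one WebSocket message.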

Gotchas

| Gotcha | Detail |
| --- | --- |
| JS method name | `client.textToSpeech.convert({...})` and `.convertStream({...})` are camelCase. The stream call returns a `BinaryResponse` with `.stream()`, `.bytes()`, and `.blob()`. |
| `pitch`/`loudness` rejected | The SDK accepts these parameters, but the API returns HTTP 400 for v3. Only `pace` (0.5–2.0) works. |
| v2 voices incompatible | `anushka`, `abhilash`, `arya`, etc. don't work with v3. Use `shubh` (default). |
| Sample rate >24kHz | 32kHz, 44.1kHz, and 48kHz are available only via REST, not streaming. |
| REST response | Base64-encoded audio is in `response.audios[0]`. Use `sarvamai.play.save()` or `base64.b64decode()`. |
| Pronunciation dictionary | The `dict_id` param teaches custom word pronunciations. Create a dictionary via `client.pronunciation_dictionary.create(file=f)`. |
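Several of these gotchas can be caught before a request is sent. The sketch below is a hypothetical pre-flight check, not part of the SDK. The v2 voice set contains only the three names listed above (the table's "etc." means it is incomplete), and the `speech_sample_rate` parameter name is an assumption.

```python
# Only the v2 voices named in the table; the real list is longer.
V2_ONLY_VOICES = {"anushka", "abhilash", "arya"}
STREAMING_MAX_SAMPLE_RATE = 24000  # Hz; higher rates are REST-only per the table


def check_v3_params(params, streaming=False):
    """Return a list of human-readable problems for a bulbul:v3 request."""
    problems = []
    for name in ("pitch", "loudness"):
        if params.get(name) is not None:
            problems.append(f"{name} is rejected by the API for v3 (HTTP 400)")
    pace = params.get("pace")
    if pace is not None and not 0.5 <= pace <= 2.0:
        problems.append("pace must be between 0.5 and 2.0")
    if params.get("speaker") in V2_ONLY_VOICES:
        problems.append("speaker is a v2 voice and does not work with v3")
    rate = params.get("speech_sample_rate")  # assumed parameter name
    if streaming and rate is not None and rate > STREAMING_MAX_SAMPLE_RATE:
        problems.append("sample rates above 24kHz are only available via REST")
    return problems
```

An empty list means none of the listed gotchas apply; it does not guarantee the API will accept the request.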

Full Docs

Fetch voice catalog, streaming protocol, pronunciation dictionary CRUD, and codec options from:

https://docs.sarvam.ai/llms.txt — comprehensive docs index
TTS Overview
Voice Catalog
HTTP Stream
Pronunciation Dictionary
Rate Limits

More from this repository

chat (sarvamai/skills, updated 2026-03-13, 64 stars)

Chat completions using Sarvam AI LLMs (Sarvam-105B, Sarvam-30B). Handles AI chat, text generation, reasoning, coding, and multilingual conversations in Indian languages. OpenAI-compatible API. Use when building chatbots, Q&A systems, agents, or any LLM feature targeting Indian users.

speech-to-text (sarvamai/skills, updated 2026-03-13, 64 stars)

Transcribe audio to text using Sarvam AI's Saaras model. Handles speech recognition, transcription, and voice interfaces for 23 Indian languages. Supports 5 output modes, auto language detection, WebSocket streaming, and batch diarization. Use when converting speech to text or building voice-enabled apps.

translate (sarvamai/skills, updated 2026-03-13, 64 stars)

Translate text between English and Indian languages using Sarvam AI (Sarvam-Translate, Mayura). Handles content translation and app localization across 22+ languages with mode control, script options, and numeral formats. Use when translating or localizing content for Indian users.

voice-agents (sarvamai/skills, updated 2026-03-13, 64 stars)

Build conversational voice agents using Sarvam AI with LiveKit or Pipecat. Handles voice assistants, phone bots, IVR, and real-time conversational AI for Indian languages. Integrates Sarvam STT (Saaras v3), TTS (Bulbul v3), and LLM (Sarvam-30B) with low-latency streaming. Use when creating voice-enabled applications or real-time speech pipelines.
