voice-agents

Stars: 64

Forks: 12

Updated: March 13, 2026, 14:07

Build conversational voice agents using Sarvam AI with LiveKit or Pipecat. Handles voice assistants, phone bots, IVR, and real-time conversational AI for Indian languages. Integrates Sarvam STT (Saaras v3), TTS (Bulbul v3), and LLM (Sarvam-30B) with low-latency streaming. Use when creating voice-enabled applications or real-time speech pipelines.


Source: sarvamai/skills


Voice Agents — Sarvam AI

[!IMPORTANT] Auth: send the key in the `api-subscription-key` header, not `Authorization: Bearer`. Env var: `SARVAM_API_KEY`
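As a minimal sketch of the auth note, the helper below builds request headers for a direct HTTP call from the `SARVAM_API_KEY` environment variable. The function name `sarvam_headers` is hypothetical and not part of any Sarvam SDK; only the header name and the env var come from the note.

```python
import os


def sarvam_headers(env=None):
    """Build HTTP headers for a direct Sarvam API call (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("SARVAM_API_KEY")
    if not key:
        raise RuntimeError("SARVAM_API_KEY is not set")
    # The key goes in api-subscription-key, not in an Authorization: Bearer header.
    return {"api-subscription-key": key}
```

Passing a plain dict as `env` keeps the helper easy to exercise without touching the real environment.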

LiveKit Quick Start

pip install livekit-agents livekit-plugins-sarvam livekit-plugins-silero

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import sarvam, silero

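class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(
            # Agent requires instructions, which act as the system prompt.
            instructions="You are a helpful voice assistant.",
            vad=silero.VAD.load(),
            stt=sarvam.STT(model="saaras:v3"),
            llm=sarvam.LLM(model="sarvam-30b"),
            tts=sarvam.TTS(model="bulbul:v3", voice="shubh"),
        )

    async def on_enter(self):
        # on_enter takes no arguments; the active session is available as self.session.
        await self.session.say("नमस्ते! मैं आपकी कैसे मदद कर सकती हूं?")

async def entrypoint(ctx: JobContext):
    # Connect to the room, then start a session with the agent attached.
    await ctx.connect()
    session = AgentSession()
    await session.start(agent=VoiceAssistant(), room=ctx.room)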

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
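Before starting the worker it can help to fail fast on missing configuration. The sketch below assumes the worker needs the usual LiveKit variables `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` alongside `SARVAM_API_KEY`. The LiveKit names are not stated in this guide, and `missing_env` is a hypothetical helper.

```python
import os

# Assumed configuration: the LIVEKIT_* names are not stated in this guide.
REQUIRED_VARS = (
    "SARVAM_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

A non-empty result can be turned into an early exit with a readable message before `cli.run_app` is called.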

Pipecat Quick Start

pip install "pipecat-ai[sarvam,silero,daily]"

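# Import paths and class names have changed across Pipecat releases; verify them against the installed version.
from pipecat.pipeline import Pipeline
from pipecat.services.sarvam import SarvamSTT, SarvamTTS, SarvamLLM
from pipecat.vad.silero import SileroVAD
from pipecat.transports.local import LocalAudioTransport

transport = LocalAudioTransport()
# Processors run in list order: microphone audio -> VAD -> STT -> LLM -> TTS -> speaker output.
pipeline = Pipeline([
    transport.input(), SileroVAD(),
    SarvamSTT(model="saaras:v3"),
    SarvamLLM(model="sarvam-30b", system_prompt="You are a helpful voice assistant."),
    SarvamTTS(model="bulbul:v3", voice="shubh"),
    transport.output()
])
# To run it, wrap the pipeline in a PipelineTask and pass that to a PipelineRunner.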

JavaScript/TypeScript Note

The LiveKit and Pipecat quick starts above are Python. For JS/TS voice pipelines, call the individual SDK methods directly:

import { SarvamAIClient } from "sarvamai";
const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// STT: client.speechToText.transcribe({...})
// TTS: client.textToSpeech.convertStream({...})  // returns BinaryResponse
// LLM: client.chat.completions({...})

Gotchas

| Gotcha | Detail |
| --- | --- |
| Use `sarvam-30b` | Best latency for voice. Use `sarvam-105b` only when reasoning quality matters more than speed. |
| `max_tokens` budget | Sarvam models reason internally before answering, so a low `max_tokens` can be used up before any answer is produced and `content` comes back as `None`. Omit it or set 500 or more. |
| TTS pitch/loudness | Not supported on Bulbul v3; the API returns 400. Only `pace` works. |
| STT WebSocket codecs | Only `wav`/`pcm`. MP3, AAC, and OGG are not accepted for streaming. |
| HTTP Stream for TTS | `convert_stream` returns binary audio directly (no base64), which suits pipelines better. |
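Several of these gotchas can be caught before a request is sent. The following is a sketch only: the helper names and the dict-based request shape are illustrative assumptions, not the Sarvam SDK's API. The 500-token floor, the unsupported TTS keys, and the codec list are taken from the table.

```python
MIN_MAX_TOKENS = 500  # floor suggested by the table
STREAMING_CODECS = {"wav", "pcm"}  # codecs the table lists for STT WebSocket
UNSUPPORTED_TTS_KEYS = {"pitch", "loudness"}  # rejected by Bulbul v3 per the table


def clean_chat_params(params):
    """Drop a max_tokens value below the floor so the answer is not cut off."""
    cleaned = dict(params)
    max_tokens = cleaned.get("max_tokens")
    if max_tokens is not None and max_tokens < MIN_MAX_TOKENS:
        del cleaned["max_tokens"]  # omitting it is one of the two options in the table
    return cleaned


def clean_tts_params(params):
    """Remove pitch/loudness for Bulbul v3; other keys such as pace pass through."""
    if params.get("model") != "bulbul:v3":
        return dict(params)
    return {k: v for k, v in params.items() if k not in UNSUPPORTED_TTS_KEYS}


def check_streaming_codec(codec):
    """Raise ValueError for codecs the streaming STT endpoint does not accept."""
    normalized = codec.lower()
    if normalized not in STREAMING_CODECS:
        raise ValueError(f"Unsupported streaming codec: {codec!r}; use wav or pcm")
    return normalized
```

Dropping a low `max_tokens` rather than raising it is a design choice here. Raising it to 500 would be the other option the table allows.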

Full Docs

Fetch framework integration guides, environment setup, and advanced patterns from:

https://docs.sarvam.ai/llms.txt — comprehensive docs index
LiveKit Guide
Pipecat Guide
Rate Limits

More Skills from the Same Repository

chat

sarvamai/skills

Chat completions using Sarvam AI LLMs (Sarvam-105B, Sarvam-30B). Handles AI chat, text generation, reasoning, coding, and multilingual conversations in Indian languages. OpenAI-compatible API. Use when building chatbots, Q&A systems, agents, or any LLM feature targeting Indian users.

Updated: 2026-03-13, Stars: 64

speech-to-text

sarvamai/skills

Transcribe audio to text using Sarvam AI's Saaras model. Handles speech recognition, transcription, and voice interfaces for 23 Indian languages. Supports 5 output modes, auto language detection, WebSocket streaming, and batch diarization. Use when converting speech to text or building voice-enabled apps.

Updated: 2026-03-13, Stars: 64

text-to-speech

sarvamai/skills

Convert text to natural speech using Sarvam AI's Bulbul v3 model. Handles audio generation, voiceovers, and voice interfaces for 11 Indian languages with 30+ voices. Supports REST, HTTP streaming, WebSocket, and pronunciation dictionaries. Use when generating spoken audio from text.

Updated: 2026-03-13, Stars: 64

translate

sarvamai/skills

Translate text between English and Indian languages using Sarvam AI (Sarvam-Translate, Mayura). Handles content translation and app localization across 22+ languages with mode control, script options, and numeral formats. Use when translating or localizing content for Indian users.

Updated: 2026-03-13, Stars: 64

name	voice-agents
description	Build conversational voice agents using Sarvam AI with LiveKit or Pipecat. Handles voice assistants, phone bots, IVR, and real-time conversational AI for Indian languages. Integrates Sarvam STT (Saaras v3), TTS (Bulbul v3), and LLM (Sarvam-30B) with low-latency streaming. Use when creating voice-enabled applications or real-time speech pipelines.
license	Apache-2.0
metadata	{"author":"sarvam-ai","version":"3.0"}
