deepgram-python-text-to-speech

Name: Deepgram Python Text To Speech
Author: deepgram

// Use when writing or reviewing Python code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST (`client.speak.v1.audio.generate`) and streaming WebSocket (`client.speak.v1.connect`). Also covers the in-repo `deepgram.helpers.TextBuilder` for incremental text assembly before synthesis. Use `deepgram-python-voice-agent` when you need full-duplex STT + LLM + TTS with barge-in. Triggers include "TTS", "speak", "synthesize voice", "aura", "text to speech", "speak.v1", "TextBuilder".

updated:April 27, 2026 at 11:03

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-python-sdk"

Using Deepgram Text-to-Speech (Python SDK)

Convert text to audio: one-shot REST download or low-latency streaming synthesis via /v1/speak.

When to use this product

REST (speak.v1.audio.generate) — one-shot synthesis, returns audio bytes. Use for rendered files, pre-generated prompts, anything where you have the full text upfront.
WebSocket (speak.v1.connect) — incremental text input, streaming audio output. Use for low-latency playback while an LLM is still producing tokens.

Use a different skill when:

You need the agent to also listen and converse (full-duplex) → deepgram-python-voice-agent.

Authentication

from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient()  # reads DEEPGRAM_API_KEY

Header: Authorization: Token <api_key> (NOT Bearer).
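
If you ever build the request yourself instead of going through the client, the header scheme above can be sketched as follows. The helper name `auth_header` is hypothetical and not part of the SDK; only the `Token` scheme and the `DEEPGRAM_API_KEY` variable come from this document.

```python
import os


def auth_header(api_key=None):
    """Build the Authorization header using the Token scheme (not Bearer).

    Hypothetical helper for illustration; the SDK client does this for you.
    """
    key = api_key or os.environ.get("DEEPGRAM_API_KEY", "")
    if not key:
        raise ValueError("missing Deepgram API key")
    return {"Authorization": f"Token {key}"}


headers = auth_header("abc123")
```
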

Quick start — REST (one-shot)

audio_iter = client.speak.v1.audio.generate(
    text="Hello, this is a text to speech example.",
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
)

with open("output.raw", "wb") as f:
    for chunk in audio_iter:
        f.write(chunk)

Returns an iterator of bytes (streaming audio response). The response body is audio/*, NOT JSON. Useful response headers: dg-model-name, dg-char-count, dg-request-id.
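
Raw linear16 output has no header, so most players will not open it directly. A minimal sketch of wrapping the chunks in a WAV container with the standard library follows. It assumes the chunks are headerless 16-bit mono PCM; if your response already carries a WAV header, skip this step. `write_wav` is a hypothetical helper, and the silent chunks stand in for a real response iterator.

```python
import os
import tempfile
import wave
from typing import Iterable


def write_wav(chunks: Iterable[bytes], path: str, sample_rate: int = 24000) -> int:
    """Wrap headerless 16-bit mono PCM chunks in a WAV file; return frames written."""
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # assumption: mono output
        wav.setsampwidth(2)            # linear16 means 2 bytes per sample
        wav.setframerate(sample_rate)  # must match the sample_rate you requested
        for chunk in chunks:
            wav.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes // 2


# Stand-in for the iterator returned by the REST call: one second of silence.
fake_chunks = [b"\x00\x00" * 12000, b"\x00\x00" * 12000]
wav_path = os.path.join(tempfile.mkdtemp(), "output.wav")
frames_written = write_wav(fake_chunks, wav_path)
```

With a real response, pass the iterator from the REST call in place of `fake_chunks`.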

Quick start — WebSocket (streaming)

from deepgram.core.events import EventType
from deepgram.speak.v1.types import SpeakV1Text

with client.speak.v1.connect(
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
) as conn:
    def on_message(m):
        if isinstance(m, bytes):
            # audio chunk — write to file or audio output
            ...
        else:
            print(f"event: {getattr(m, 'type', 'Unknown')}")

    conn.on(EventType.OPEN,    lambda _: print("open"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE,   lambda _: print("close"))
    conn.on(EventType.ERROR,   lambda e: print(f"err: {e}"))

    conn.send_text(SpeakV1Text(text="Hello, this is streaming TTS."))
    conn.send_flush()
    conn.send_close()
    conn.start_listening()   # blocks until server closes

In sync mode, start_listening() blocks — send all text + flush + close BEFORE calling it, OR run it in a thread. In async mode, run start_listening() as a task and send concurrently.
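
The "run it in a thread" option can be sketched without a network connection. `FakeConnection` below is a hypothetical stand-in that only mimics the blocking behaviour of `start_listening()`; it is not the SDK class, and its methods take plain strings for brevity.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for a sync connection; NOT the real SDK class."""

    def __init__(self):
        self._inbox = queue.Queue()
        self.received = []

    def send_text(self, text):
        self._inbox.put(("audio", text.encode()))

    def send_flush(self):
        self._inbox.put(("flushed", None))

    def send_close(self):
        self._inbox.put(("close", None))

    def start_listening(self):
        # Blocks until a close message arrives, like the sync listener.
        while True:
            kind, payload = self._inbox.get()
            if kind == "close":
                break
            self.received.append((kind, payload))


conn = FakeConnection()

# Start the blocking listener in the background, then send from the main thread.
listener = threading.Thread(target=conn.start_listening, daemon=True)
listener.start()

conn.send_text("Hello")
conn.send_flush()
conn.send_close()
listener.join(timeout=5)
```

The same shape should apply to a real connection: start the listener thread first, send text, flush, close, then join.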

TextBuilder helper (incremental text assembly)

deepgram.helpers.TextBuilder is a hand-maintained helper (NOT Fern-generated) that assembles text incrementally — useful when streaming LLM tokens into TTS.

from deepgram.helpers import TextBuilder

final_text = (
    TextBuilder()
    .text("Hello,")
    .text(" this is built incrementally.")
    .pronunciation("Deepgram", "ˈdiːpɡɹæm")
    .pause(200)
    .build()
)

The fluent API is .text(...) (append raw text), .pronunciation(word, ipa) (pin pronunciation), .pause(duration_ms) (insert a pause), and .build() (return the final SSML-ish string). There is no .add(...) method.

See examples/22-text-builder-demo.py, examples/23-text-builder-helper.py, examples/24-text-builder-streaming.py.
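
When the text arrives as LLM tokens, you still need to decide when a piece is complete enough to synthesize. One common choice is to cut at sentence boundaries. The sketch below is a hypothetical helper, not part of `deepgram.helpers`, and the regex is a deliberately simple assumption about what ends a sentence.

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield complete sentences as they form."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is followed by a boundary, so it is complete.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Emit whatever is left once the token stream ends.
    if buffer.strip():
        yield buffer.strip()


chunks = list(sentence_chunks(["Hel", "lo. ", "How are", " you?", " Fine"]))
```

Each yielded sentence could then be passed to `TextBuilder().text(...)` or sent over the WebSocket.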

Async equivalents

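import asyncio

from deepgram import AsyncDeepgramClient
from deepgram.speak.v1.types import SpeakV1Text

client = AsyncDeepgramClient()  # reads DEEPGRAM_API_KEY

async def main():
    # REST: iterate the audio chunks asynchronously
    audio_iter = await client.speak.v1.audio.generate(text=..., model="aura-2-asteria-en")
    async for chunk in audio_iter:
        ...

    # WSS: listen in a background task so sends can proceed concurrently
    # (pass encoding, sample_rate, etc. as in the sync example)
    async with client.speak.v1.connect(model="aura-2-asteria-en") as conn:
        listen_task = asyncio.create_task(conn.start_listening())
        await conn.send_text(SpeakV1Text(text="..."))
        await conn.send_flush()
        await conn.send_close()
        await listen_task

asyncio.run(main())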
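
The listen-as-a-task pattern can be tried offline. `FakeAsyncConnection` is a hypothetical stand-in that echoes sent text back as "audio" bytes; it is not the SDK class and only illustrates the ordering of `create_task`, the sends, and the final `await`.

```python
import asyncio


class FakeAsyncConnection:
    """Hypothetical stand-in for an async connection; NOT the real SDK class."""

    def __init__(self):
        self._inbox = asyncio.Queue()
        self.audio = bytearray()

    async def send_text(self, text):
        await self._inbox.put(text.encode())

    async def send_close(self):
        await self._inbox.put(None)  # None marks end of stream

    async def start_listening(self):
        # Runs until the close marker arrives.
        while (chunk := await self._inbox.get()) is not None:
            self.audio.extend(chunk)


async def main():
    conn = FakeAsyncConnection()
    # Start listening first so nothing sent afterwards is missed.
    listen_task = asyncio.create_task(conn.start_listening())
    await conn.send_text("Hello, ")
    await conn.send_text("world.")
    await conn.send_close()
    await listen_task
    return bytes(conn.audio)


result = asyncio.run(main())
```
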

Key parameters

REST & WSS: model (e.g. aura-2-asteria-en), encoding (linear16, mulaw, alaw, opus, flac, mp3, aac), sample_rate, bit_rate, container, callback (REST async), tag, mip_opt_out.

WSS client messages: SpeakV1Text, Flush, Clear, Close.
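
For orientation, these parameters travel as query-string values on `/v1/speak`. The sketch below builds such a URL by hand. The host `api.deepgram.com` and the helper `speak_url` are assumptions for illustration; in normal use the SDK client assembles the request for you.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.deepgram.com/v1/speak"  # assumption: default public host


def speak_url(**params):
    """Build a /v1/speak URL, dropping parameters that were left unset (None)."""
    query = {name: value for name, value in params.items() if value is not None}
    return f"{BASE_URL}?{urlencode(query)}"


url = speak_url(
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
    bit_rate=None,  # unset parameters are omitted from the query string
)
```
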

API reference (layered)

In-repo reference: reference.md — sections "Speak V1 Audio" (REST) and "Speak V1 Connect" (WSS).
OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7: library ID /llmstxt/developers_deepgram_llms_txt.
Product docs:

Gotchas

Token auth, not Bearer.
REST response is audio bytes, not JSON. Iterate the response; don't .json() it.
Flush before close (WSS). send_close() without send_flush() may drop trailing audio.
Sync start_listening() blocks. Queue all messages first, or use async.
SpeakV1Text is required for WSS text input — don't send raw strings.
encoding/sample_rate/container must match your playback path. Mismatches cause silent failure or distortion.
TextBuilder helpers are hand-maintained (listed in .fernignore as permanently frozen). Don't move them under src/deepgram/ auto-generated paths.
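
To see why a sample-rate mismatch distorts playback, note that linear16 audio is 2 bytes per sample, so duration follows directly from the byte count. A minimal sketch, assuming mono audio unless stated otherwise (`pcm_duration_seconds` is a hypothetical helper):

```python
def pcm_duration_seconds(num_bytes, sample_rate, channels=1):
    """Duration of headerless 16-bit PCM audio, given its size in bytes."""
    bytes_per_second = sample_rate * channels * 2  # 2 bytes per linear16 sample
    return num_bytes / bytes_per_second


# 48,000 bytes synthesized at 24 kHz is one second of audio...
actual = pcm_duration_seconds(48_000, 24_000)
# ...but a player configured for 48 kHz would finish it in half the time.
misread = pcm_duration_seconds(48_000, 48_000)
```

Played at twice the intended rate, the audio comes out fast and high-pitched, which is one form the distortion can take.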

Example files in this repo

examples/20-text-to-speech-single.py — REST one-shot
examples/21-text-to-speech-streaming.py — WSS streaming
examples/22-text-builder-demo.py — TextBuilder (no API key)
examples/23-text-builder-helper.py — TextBuilder + REST
examples/24-text-builder-streaming.py — TextBuilder + WSS
tests/wire/test_speak_v1_audio.py — REST wire test
tests/manual/speak/v1/connect/main.py — live WSS test

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
