
deepgram-python-speech-to-text

Name: Deepgram Python Speech To Text
Author: deepgram

// Use when writing or reviewing Python code in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live audio transcription. Covers `client.listen.v1.media.transcribe_url` / `transcribe_file` (REST) and `client.listen.v1.connect` (WebSocket). Use this skill for basic ASR; use `deepgram-python-audio-intelligence` for summarize/sentiment/topics/diarize overlays, `deepgram-python-conversational-stt` for turn-taking v2/Flux, and `deepgram-python-voice-agent` for full-duplex assistants. Triggers include "transcribe", "live transcription", "speech to text", "STT", "listen endpoint", "nova-3", "listen.v1".


stars:436

forks:131

updated:April 27, 2026 at 16:14

Related skills (same repository)

deepgram-python-audio-intelligence.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".

updated: 2026-04-27, stars: 436

deepgram-python-voice-agent.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that builds an interactive voice agent via `agent.deepgram.com/v1/agent/converse`. Covers `client.agent.v1.connect()`, `AgentV1Settings`, `send_settings`, `send_media`, event handling, and function/tool calling. Full-duplex STT + LLM + TTS with barge-in. Use `deepgram-python-text-to-speech` for one-way synthesis, `deepgram-python-speech-to-text` / `deepgram-python-conversational-stt` for transcription only. Triggers include "voice agent", "agent converse", "full duplex", "interactive assistant", "barge-in", "agent.v1", "function calling", "AgentV1Settings".

updated: 2026-04-27, stars: 436

deepgram-python-conversational-stt.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.connect(...)`, Flux models, end-of-turn detection. Use `deepgram-python-speech-to-text` for standard v1 ASR, `deepgram-python-voice-agent` for full-duplex interactive assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", "listen.v2", "flux-general-en", "flux-general-multi".

updated: 2026-04-27, stars: 436

deepgram-python-management-api.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Management APIs - projects, API keys, members, invites, usage, billing, models, and reusable Voice Agent configurations. Covers `client.manage.v1.projects`, project-scoped resources under `client.manage.v1.projects.*` (keys, members, members.invites, usage, billing, models, requests), global `client.manage.v1.models`, think-model discovery at `client.agent.v1.settings.think.models`, and `client.voice_agent.configurations.*`. Use `deepgram-python-voice-agent` when you want to run an agent interactively, this skill to PERSIST/LIST agent configs. Triggers include "management API", "list projects", "API keys", "members", "usage stats", "billing", "list models", "agent configurations", "manage.v1".

updated: 2026-04-27, stars: 436

deepgram-python-text-intelligence.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for sentiment, summarization, topic detection, and intent recognition on text input. Covers `client.read.v1.text.analyze(...)` with body `text` or `url`. Use `deepgram-python-audio-intelligence` when the source is audio instead of text. Triggers include "read API", "text intelligence", "analyze text", "sentiment", "summarize text", "topics", "intents", "read.v1".

updated: 2026-04-27, stars: 436

deepgram-python-text-to-speech.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST (`client.speak.v1.audio.generate`) and streaming WebSocket (`client.speak.v1.connect`). Also covers the in-repo `deepgram.helpers.TextBuilder` for incremental text assembly before synthesis. Use `deepgram-python-voice-agent` when you need full-duplex STT + LLM + TTS with barge-in. Triggers include "TTS", "speak", "synthesize voice", "aura", "text to speech", "speak.v1", "TextBuilder".

updated: 2026-04-27, stars: 436

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-python-sdk"



Using Deepgram Speech-to-Text (Python SDK)

Basic transcription (ASR) for prerecorded audio (REST) or live audio (WebSocket) via /v1/listen.

When to use this product

REST (transcribe_url / transcribe_file) — one-shot transcription of a complete file or URL. Use for batch jobs, captioning pipelines, offline analysis.
WebSocket (listen.v1.connect) — continuous streaming transcription. Use for live captions, real-time microphone input, phone audio.

Use a different skill when:

You want summaries, sentiment, topics, intents, diarization, or redaction on the audio → deepgram-python-audio-intelligence (same endpoint, different params).
You need turn-taking / end-of-turn events → deepgram-python-conversational-stt (v2 / Flux).
You need a full-duplex interactive assistant (STT + LLM + TTS + function calls) → deepgram-python-voice-agent.

Authentication

import os
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient

client = DeepgramClient()  # reads DEEPGRAM_API_KEY from env
# or: DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])

With an API key, the header sent on every request is Authorization: Token <api_key> (NOT Bearer). The Bearer scheme applies only to temporary access tokens; see Gotchas.
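If you ever need to build the header by hand (for example when debugging with a plain HTTP client), the two schemes can be sketched as below. `authorization_header` is a hypothetical helper written for illustration; it is not part of the SDK, which sets the header for you.

```python
from typing import Optional


def authorization_header(api_key: Optional[str] = None,
                         access_token: Optional[str] = None) -> dict:
    """Build an Authorization header for a hand-rolled request.

    Hypothetical helper for illustration only, not an SDK function.
    """
    if access_token:
        # Temporary / access tokens use the Bearer scheme.
        return {"Authorization": f"Bearer {access_token}"}
    if api_key:
        # Long-lived API keys use the Token scheme.
        return {"Authorization": f"Token {api_key}"}
    raise ValueError("provide either api_key or access_token")
```

Passing both arguments prefers the access token here; that ordering is a choice made for this sketch, not documented SDK behaviour.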

Quick start — REST (prerecorded URL)

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    punctuate=True,
)
transcript = response.results.channels[0].alternatives[0].transcript

Quick start — REST (prerecorded file)

with open("audio.wav", "rb") as f:
    audio_bytes = f.read()

response = client.listen.v1.media.transcribe_file(
    request=audio_bytes,
    model="nova-3",
)

request= accepts raw bytes or an iterator of bytes (stream large files chunk-by-chunk). Do NOT pass a file handle.
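For large files, a generator avoids loading everything into memory. The sketch below is one way to produce the iterator of bytes described above; `iter_file_chunks` and the 64 KiB default chunk size are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Iterator, Union


def iter_file_chunks(path: Union[str, Path],
                     chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file's contents as successive byte chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means end of file.
                break
            yield chunk
```

The generator would then be passed in place of the raw bytes, e.g. `transcribe_file(request=iter_file_chunks("audio.wav"), model="nova-3")`.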

Quick start — WebSocket (live streaming with interim results)

Live transcription emits interim (partial) and final results. Pass interim_results=True and switch on is_final to display partial text in real time, then overwrite it with the final transcript when the speaker pauses.

import threading
from deepgram.core.events import EventType
from deepgram.listen.v1.types import (
    ListenV1Results, ListenV1Metadata,
    ListenV1SpeechStarted, ListenV1UtteranceEnd,
)

with client.listen.v1.connect(
    model="nova-3",
    interim_results=True,    # ← emit partial results while user is still speaking
    utterance_end_ms=1000,   # silence (ms) before server emits UtteranceEnd
    vad_events=True,         # SpeechStarted events
    smart_format=True,
) as conn:
    # Mutable container so the on_message closure can update state without `global`
    state = {"last_interim_len": 0}

    def on_message(m):
        if isinstance(m, ListenV1Results) and m.channel and m.channel.alternatives:
            transcript = m.channel.alternatives[0].transcript
            if not transcript:
                return
            if m.is_final:
                # Final segment: overwrite the running interim line, newline if utterance ended
                pad = " " * max(0, state["last_interim_len"] - len(transcript))
                end = "\n" if m.speech_final else ""
                print(f"\r{transcript}{pad}", end=end, flush=True)
                state["last_interim_len"] = 0
            else:
                # Interim: keep overwriting the same console line as the user speaks
                print(f"\r{transcript}", end="", flush=True)
                state["last_interim_len"] = len(transcript)
        elif isinstance(m, ListenV1UtteranceEnd):
            print()  # newline; UtteranceEnd fires after final results when audio goes silent
        elif isinstance(m, ListenV1SpeechStarted):
            pass  # optional: reset UI when a new utterance begins

    conn.on(EventType.OPEN,    lambda _: print("connected"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE,   lambda _: print("\nclosed"))
    conn.on(EventType.ERROR,   lambda e: print(f"\nerr: {e}"))

    # Start receive loop in background so we can send concurrently
    threading.Thread(target=conn.start_listening, daemon=True).start()

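    # `audio_chunks` is an iterable of raw audio bytes that you supply (mic, file reader, ...)
    for chunk in audio_chunks:         # bytes must match the declared encoding/sample_rate
        conn.send_media(chunk)

    conn.send_finalize()               # flush final partial before closing
    conn.send_close_stream()           # then terminate the stream cleanly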
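The example above iterates over `audio_chunks` without defining it. One way to produce it from a buffer of raw PCM is sketched below. `pcm_chunks` is a hypothetical helper, and the defaults (16 kHz, mono, 16-bit samples, 100 ms per chunk) are assumptions; use the values that match your audio.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, channels: int = 1,
               sample_width: int = 2, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM audio."""
    frame_bytes = channels * sample_width            # bytes per sample frame
    frames_per_chunk = sample_rate * chunk_ms // 1000
    bytes_per_chunk = frames_per_chunk * frame_bytes
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms too small for this sample rate")
    for start in range(0, len(pcm), bytes_per_chunk):
        # The last slice may be shorter than bytes_per_chunk.
        yield pcm[start:start + bytes_per_chunk]
```

Slicing on whole frames keeps every chunk aligned to sample boundaries, so a chunk never ends in the middle of a sample.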

Interim vs. final flag semantics

is_final = False — interim hypothesis. Will be revised. Display in a non-committal style (lighter colour, italic) and overwrite when the next message arrives.
is_final = True, speech_final = False — confirmed segment, but the speaker is still talking. Append to the transcript; another final will follow.
is_final = True, speech_final = True — confirmed segment AND the utterance ended (silence detected). Commit the line and start a new one.
from_finalize = True — this final was triggered by your explicit send_finalize() call (vs natural endpointing). Useful to distinguish "I asked for a flush" from "the speaker paused".
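The flag combinations above map onto a small state machine. The sketch below shows one way to accumulate a transcript from them; `TranscriptAssembler` is a hypothetical class that takes the flag values directly, so it runs without a live connection.

```python
class TranscriptAssembler:
    """Accumulate utterances from is_final / speech_final flags."""

    def __init__(self) -> None:
        self.lines = []        # committed utterances
        self._segments = []    # confirmed segments of the utterance in progress
        self.interim = ""      # latest interim hypothesis, may still be revised

    def feed(self, is_final: bool, speech_final: bool, transcript: str) -> None:
        if not is_final:
            # Interim: replace the previous hypothesis, commit nothing.
            self.interim = transcript
            return
        # Final: the interim text is superseded by this confirmed segment.
        self.interim = ""
        if transcript:
            self._segments.append(transcript)
        if speech_final and self._segments:
            # Utterance ended: commit the line and start a new one.
            self.lines.append(" ".join(self._segments))
            self._segments = []
```

In an `on_message` handler you would call `feed(m.is_final, m.speech_final, transcript)` for each results message.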

Call send_finalize() to force the server to emit final results immediately (e.g. when the user clicks "stop"). Call send_close_stream() after send_finalize() to terminate cleanly.

WSS message types live under deepgram.listen.v1.types.

Async equivalents

from deepgram import AsyncDeepgramClient
client = AsyncDeepgramClient()

response = await client.listen.v1.media.transcribe_url(url=..., model="nova-3")

async with client.listen.v1.connect(model="nova-3") as conn:
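    # same .on(...) handlers, then run the receive loop; it blocks until the
    # connection closes, so wrap it in asyncio.create_task(...) to send audio concurrently
    await conn.start_listening()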

Async / deferred result patterns

There are two distinct notions of "async" — don't confuse them.

1. Python `async/await` (sync-style, immediate result)

Methods on AsyncDeepgramClient return an awaitable that resolves to the full response. The result is available as soon as the await completes; there is no polling and no callback. Use this when integrating with FastAPI, aiohttp, or any asyncio app.

import asyncio
from deepgram import AsyncDeepgramClient

client = AsyncDeepgramClient()

async def transcribe(url: str) -> str:
    response = await client.listen.v1.media.transcribe_url(
        url=url,
        model="nova-3",
        smart_format=True,
    )
    # `response` is the FULL transcription — no polling, no callback, just await.
    return response.results.channels[0].alternatives[0].transcript

text = asyncio.run(transcribe("https://dpgr.am/spacewalk.wav"))

2. Deferred via callback URL (webhook, results posted later)

Pass callback="https://your.app/webhook" and the request returns immediately with a request_id. Deepgram processes the audio in the background and POSTs the final result to your webhook URL. There is no polling endpoint — your server must be reachable to receive the result.

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    callback="https://your.app/deepgram-webhook",
    callback_method="POST",         # or "PUT"
    model="nova-3",
    smart_format=True,
)
print(f"Accepted; tracking id: {response.request_id}")
# response is a "listen accepted" — NOT the transcript. Wait for your webhook.

The webhook receives the same JSON body you would have received from a synchronous transcribe_url call. Use this for very long files or when you don't want the request hanging open.
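On the receiving side, the webhook handler only needs to parse the POSTed body. The sketch below is framework-agnostic. `transcript_from_webhook` is a hypothetical helper, and it assumes the JSON keys mirror the attribute path used elsewhere in this document (`results.channels[0].alternatives[0].transcript`).

```python
import json


def transcript_from_webhook(body: bytes) -> str:
    """Extract the first channel's top transcript from a callback body.

    Returns an empty string if the expected keys are missing.
    """
    payload = json.loads(body)
    channels = payload.get("results", {}).get("channels", [])
    if not channels:
        return ""
    alternatives = channels[0].get("alternatives", [])
    if not alternatives:
        return ""
    return alternatives[0].get("transcript", "")
```

Returning an empty string rather than raising keeps the webhook endpoint from failing on unexpected payloads; log such cases if you need to investigate them.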

| Pattern | Returns | When to use |
| --- | --- | --- |
| `client.listen.v1.media.transcribe_url(...)` | full transcription synchronously | files up to ~10 min; HTTP timeout-bound |
| `await AsyncDeepgramClient().listen.v1.media.transcribe_url(...)` | full transcription, non-blocking | inside asyncio apps |
| `transcribe_url(..., callback="https://...")` | `{request_id}` immediately, transcription POSTs to webhook later | very long files; no long-lived HTTP connection |
| `client.listen.v1.connect(...)` (WebSocket) | streaming events as audio is sent | live audio (mic, telephony) |

See examples/12-transcription-prerecorded-callback.py for a working callback example.

Key parameters

model, language, encoding, sample_rate, channels, multichannel, punctuate, smart_format, diarize, endpointing, interim_results, utterance_end_ms, vad_events, keywords, search, redact, numerals, paragraphs, utterances.
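When streaming raw frames taken from a WAV file, `encoding`, `sample_rate`, and `channels` can be read from the file header rather than hard-coded. The sketch below uses the standard-library `wave` module. `listen_params_from_wav` is a hypothetical helper, and it assumes the file holds 16-bit PCM, which is what `linear16` denotes.

```python
import wave


def listen_params_from_wav(path: str) -> dict:
    """Derive encoding/sample_rate/channels from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            # linear16 means 16-bit samples; refuse anything else.
            raise ValueError("expected 16-bit PCM for encoding='linear16'")
        return {
            "encoding": "linear16",
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
        }
```

The returned dict could be unpacked into the connect call, e.g. `client.listen.v1.connect(model="nova-3", **params)`.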

API reference (layered)

In-repo Fern-generated reference: reference.md — sections "Listen V1 Media" (REST) and "Listen V1 Connect" (WSS).
Canonical OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
Canonical AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7 — natural-language queries over the full Deepgram docs corpus. Library ID: /llmstxt/developers_deepgram_llms_txt.
Product docs:
- https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded
- https://developers.deepgram.com/reference/speech-to-text/listen-streaming

Gotchas

Use the right auth scheme for the credential type. API keys use Authorization: Token <api_key>. Temporary / access tokens (from client.auth.v1.tokens.grant() or an equivalent server) use Authorization: Bearer <access_token> — the custom DeepgramClient installs a Bearer override when you pass access_token=... (see src/deepgram/client.py). Sending Bearer <api_key> with a long-lived API key is what fails.
Encoding must match the audio. Declaring encoding="linear16" but sending Opus → garbage output or 400.
Close streams cleanly. Call send_finalize() before exiting the WSS context — otherwise the last partial is dropped.
Keepalive on long WSS sessions. If idle > ~10s, the server closes. Send KeepAlive messages or audio chunks.
Intelligence features are REST-only. summarize, topics, intents, sentiment, detect_language do NOT work over WSS — see deepgram-python-audio-intelligence.
transcribe_file(request=...) takes bytes or an iterator, not a file handle.
nova-3 is the current flagship STT model. Check client.manage.v1.models.list() for the live set.
start_listening() blocks until the connection closes. Run it in a thread (sync client) or as a task (async client) so you can send audio concurrently.
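For the keepalive gotcha, a small background timer is one option. The sketch below is deliberately SDK-agnostic: `KeepAlive` is a hypothetical class, `send` stands for whatever callable emits a KeepAlive message on your connection (the exact SDK method name is not given here), and the 5-second default is an assumption chosen to stay under the ~10 s idle limit.

```python
import threading
from typing import Callable


class KeepAlive:
    """Call `send` every `interval_s` seconds until stopped."""

    def __init__(self, send: Callable[[], None], interval_s: float = 5.0) -> None:
        self._send = send
        self._interval = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._send()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Start it after the connection opens and stop it before sending the final flush, so no keepalive is sent on a closing stream.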

Example files in this repo

examples/10-transcription-prerecorded-url.py
examples/11-transcription-prerecorded-file.py
examples/12-transcription-prerecorded-callback.py
examples/13-transcription-live-websocket.py
tests/wire/test_listen_v1_media.py — wire-level fixtures
tests/manual/listen/v1/connect/main.py — live WSS connection test

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
