| name | deepgram-python-speech-to-text |
| description | Use when writing or reviewing Python code in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live audio transcription. Covers `client.listen.v1.media.transcribe_url` / `transcribe_file` (REST) and `client.listen.v1.connect` (WebSocket). Use this skill for basic ASR; use `deepgram-python-audio-intelligence` for summarize/sentiment/topics/diarize overlays, `deepgram-python-conversational-stt` for turn-taking v2/Flux, and `deepgram-python-voice-agent` for full-duplex assistants. Triggers include "transcribe", "live transcription", "speech to text", "STT", "listen endpoint", "nova-3", "listen.v1". |
Using Deepgram Speech-to-Text (Python SDK)
Basic transcription (ASR) for prerecorded audio (REST) or live audio (WebSocket) via /v1/listen.
When to use this product
- REST (`transcribe_url` / `transcribe_file`): one-shot transcription of a complete file or URL. Use for batch jobs, captioning pipelines, offline analysis.
- WebSocket (`listen.v1.connect`): continuous streaming transcription. Use for live captions, real-time microphone input, phone audio.
Use a different skill when:
- You want summaries, sentiment, topics, intents, diarization, or redaction on the audio → `deepgram-python-audio-intelligence` (same endpoint, different params).
- You need turn-taking / end-of-turn events → `deepgram-python-conversational-stt` (v2 / Flux).
- You need a full-duplex interactive assistant (STT + LLM + TTS + function calls) → `deepgram-python-voice-agent`.
Authentication
from dotenv import load_dotenv
from deepgram import DeepgramClient

load_dotenv()  # load DEEPGRAM_API_KEY from a local .env file, if present

client = DeepgramClient()  # picks up DEEPGRAM_API_KEY from the environment
With an API key, every request carries the header `Authorization: Token <api_key>` (NOT `Bearer`). Temporary access tokens are the exception; see Gotchas.
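The scheme choice can be sketched as a small helper. This is only an illustration of the rule (API key → `Token`, temporary access token → `Bearer`); `build_auth_header` is a hypothetical name, not part of the SDK, and the SDK sets the header for you. Preferring the access token when both are supplied is an assumption of this sketch.

```python
from typing import Dict, Optional


def build_auth_header(api_key: Optional[str] = None,
                      access_token: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for the given credential type.

    Hypothetical helper for illustration only; the SDK builds this itself.
    """
    if access_token:
        # Temporary access tokens use the Bearer scheme.
        # Assumption: the access token wins if both credentials are given.
        return {"Authorization": f"Bearer {access_token}"}
    if api_key:
        # Long-lived API keys use the Token scheme.
        return {"Authorization": f"Token {api_key}"}
    raise ValueError("provide api_key or access_token")
```

A header built this way is mainly useful when calling the REST endpoint without the SDK, for example from a debugging script.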
Quick start — REST (prerecorded URL)
response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    punctuate=True,
)
transcript = response.results.channels[0].alternatives[0].transcript
Quick start — REST (prerecorded file)
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()

response = client.listen.v1.media.transcribe_file(
    request=audio_bytes,
    model="nova-3",
)
request= accepts raw bytes or an iterator of bytes (stream large files chunk-by-chunk). Do NOT pass a file handle.
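For large files, the iterator form avoids loading everything into memory. A minimal sketch of such an iterator follows; `iter_file_chunks` and the 64 KiB chunk size are illustrative choices, not SDK names or documented limits.

```python
from typing import Iterator


def iter_file_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file's contents as successive byte chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Usage sketch (needs the SDK and a configured client, so it is left commented):
# response = client.listen.v1.media.transcribe_file(
#     request=iter_file_chunks("audio.wav"),
#     model="nova-3",
# )
```

The generator opens and closes the file itself, so the caller passes an iterator of bytes rather than a file handle.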
Quick start — WebSocket (live streaming with interim results)
Live transcription emits interim (partial) and final results. Pass `interim_results=True` and branch on `is_final` to display partial text in real time, then overwrite it with the final transcript when the speaker pauses.
import threading

from deepgram.core.events import EventType
from deepgram.listen.v1.types import (
    ListenV1Results, ListenV1Metadata,
    ListenV1SpeechStarted, ListenV1UtteranceEnd,
)

with client.listen.v1.connect(
    model="nova-3",
    interim_results=True,
    utterance_end_ms=1000,
    vad_events=True,
    smart_format=True,
) as conn:
    # Length of the last interim line, so a shorter final can blank out leftovers.
    state = {"last_interim_len": 0}

    def on_message(m):
        if isinstance(m, ListenV1Results) and m.channel and m.channel.alternatives:
            transcript = m.channel.alternatives[0].transcript
            if not transcript:
                return
            if m.is_final:
                # Final: overwrite the interim text; end the line only when the utterance ended.
                pad = " " * max(0, state["last_interim_len"] - len(transcript))
                end = "\n" if m.speech_final else ""
                print(f"\r{transcript}{pad}", end=end, flush=True)
                state["last_interim_len"] = 0
            else:
                # Interim: redraw in place on the same terminal line.
                print(f"\r{transcript}", end="", flush=True)
                state["last_interim_len"] = len(transcript)
        elif isinstance(m, ListenV1UtteranceEnd):
            print()
        elif isinstance(m, ListenV1SpeechStarted):
            pass

    conn.on(EventType.OPEN, lambda _: print("connected"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE, lambda _: print("\nclosed"))
    conn.on(EventType.ERROR, lambda e: print(f"\nerr: {e}"))

    # start_listening() blocks, so receive on a background thread.
    threading.Thread(target=conn.start_listening, daemon=True).start()

    # audio_chunks: an iterable of raw audio bytes that you supply (mic, file, telephony).
    for chunk in audio_chunks:
        conn.send_media(chunk)
    conn.send_finalize()
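The example leaves `audio_chunks` undefined. One way to produce it from raw PCM already in memory is sketched below. It assumes 16-bit PCM and a connection opened with matching `encoding` and `sample_rate` values; `pcm_chunks` and the 100 ms chunk duration are illustrative choices, not SDK names or requirements.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, channels: int = 1,
               sample_width: int = 2, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM bytes into fixed-duration chunks (the last may be shorter)."""
    # Work in whole frames so a chunk never splits a sample.
    frames_per_chunk = sample_rate * chunk_ms // 1000
    step = frames_per_chunk * channels * sample_width
    if step <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]
```

A live source such as a microphone paces itself. When replaying a file this way, sleeping roughly `chunk_ms` between sends approximates real-time delivery.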
Interim vs. final flag semantics
- `is_final = False`: interim hypothesis. Will be revised. Display in a non-committal style (lighter colour, italic) and overwrite when the next message arrives.
- `is_final = True`, `speech_final = False`: confirmed segment, but the speaker is still talking. Append to the transcript; another final will follow.
- `is_final = True`, `speech_final = True`: confirmed segment AND the utterance ended (silence detected). Commit the line and start a new one.
- `from_finalize = True`: this final was triggered by your explicit `send_finalize()` call (vs natural endpointing). Useful to distinguish "I asked for a flush" from "the speaker paused".
Call `send_finalize()` to force the server to emit final results immediately (e.g. when the user clicks "stop"). Call `send_close_stream()` after `send_finalize()` to terminate cleanly.
WSS message types live under deepgram.listen.v1.types.
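The flag semantics above can be sketched as a small accumulator that is independent of the terminal. It takes plain values rather than SDK message objects so it runs on its own; `TranscriptAssembler` is a hypothetical name, and joining segments with a single space is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptAssembler:
    """Accumulate final segments into lines; keep the latest interim separately."""
    lines: List[str] = field(default_factory=list)    # committed utterances
    pending: List[str] = field(default_factory=list)  # finals of the current utterance
    interim: str = ""                                 # latest unconfirmed hypothesis

    def feed(self, transcript: str, is_final: bool, speech_final: bool = False) -> None:
        if not is_final:
            # Interim: replace the previous hypothesis, commit nothing.
            self.interim = transcript
            return
        # Final: the interim it supersedes is discarded.
        self.interim = ""
        if transcript:
            self.pending.append(transcript)
        if speech_final:
            self.commit()

    def commit(self) -> None:
        """Close the current utterance (also callable on an utterance-end event)."""
        if self.pending:
            self.lines.append(" ".join(self.pending))
            self.pending = []

    def display(self) -> str:
        """Current line as it should be shown: confirmed text plus interim."""
        return " ".join(self.pending + ([self.interim] if self.interim else []))
```

In a message handler this would be called as `assembler.feed(transcript, m.is_final, m.speech_final)`, with `display()` driving the UI and `lines` holding the committed transcript.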
Async equivalents
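from deepgram import AsyncDeepgramClient

client = AsyncDeepgramClient()

async def main():
    # REST: same method as the sync client, awaited.
    response = await client.listen.v1.media.transcribe_url(url=..., model="nova-3")
    # WebSocket: same connect call, used as an async context manager.
    async with client.listen.v1.connect(model="nova-3") as conn:
        await conn.start_listening()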
Async / deferred result patterns
There are two distinct notions of "async" — don't confuse them.
1. Python async/await (sync-style, immediate result)
Methods on `AsyncDeepgramClient` return an awaitable that resolves to the full response. The result is delivered when you `await`, not later. Use this when integrating with FastAPI, aiohttp, or any asyncio app.
import asyncio
from deepgram import AsyncDeepgramClient

client = AsyncDeepgramClient()

async def transcribe(url: str) -> str:
    response = await client.listen.v1.media.transcribe_url(
        url=url,
        model="nova-3",
        smart_format=True,
    )
    return response.results.channels[0].alternatives[0].transcript

text = asyncio.run(transcribe("https://dpgr.am/spacewalk.wav"))
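A common reason to use the async client is transcribing several URLs concurrently. The sketch below shows one way to cap the number of requests in flight. It takes the coroutine function as a parameter, so the `transcribe(url)` coroutine from the example above can be passed in. `transcribe_many` and the default limit of 4 are illustrative choices, not SDK names or documented rate limits.

```python
import asyncio
from typing import Awaitable, Callable, List


async def transcribe_many(
    urls: List[str],
    transcribe: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> List[str]:
    """Run `transcribe` over all URLs with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await transcribe(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

Usage would look like `texts = asyncio.run(transcribe_many(urls, transcribe))`. If any call raises, `gather` propagates the first exception; pass `return_exceptions=True` to collect failures instead.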
2. Deferred via callback URL (webhook, results posted later)
Pass callback="https://your.app/webhook" and the request returns immediately with a request_id. Deepgram processes the audio in the background and POSTs the final result to your webhook URL. There is no polling endpoint — your server must be reachable to receive the result.
response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    callback="https://your.app/deepgram-webhook",
    callback_method="POST",
    model="nova-3",
    smart_format=True,
)
print(f"Accepted; tracking id: {response.request_id}")
The webhook receives the same JSON body you would have received from a synchronous transcribe_url call. Use this for very long files or when you don't want the request hanging open.
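On the receiving side, the webhook handler only has to parse that JSON body. A minimal, framework-agnostic sketch follows. `parse_callback_body` is a hypothetical helper, and reading `request_id` from a `metadata` object is an assumption about the payload layout that should be checked against a real callback.

```python
import json
from typing import Optional, Tuple


def parse_callback_body(raw: bytes) -> Tuple[Optional[str], str]:
    """Return (request_id, transcript) from a callback POST body."""
    payload = json.loads(raw)
    # Assumption: request_id sits under "metadata" in the posted JSON.
    request_id = payload.get("metadata", {}).get("request_id")
    # Same path as the synchronous response: results.channels[0].alternatives[0].
    channels = payload.get("results", {}).get("channels", [])
    alternatives = channels[0].get("alternatives", []) if channels else []
    transcript = alternatives[0].get("transcript", "") if alternatives else ""
    return request_id, transcript
```

Matching the returned `request_id` against the one printed when the job was accepted ties each webhook delivery back to its original request.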
| Pattern | Returns | When to use |
|---|---|---|
| `client.listen.v1.media.transcribe_url(...)` | full transcription synchronously | files up to ~10 min; HTTP timeout-bound |
| `await AsyncDeepgramClient().listen.v1.media.transcribe_url(...)` | full transcription, non-blocking | inside asyncio apps |
| `transcribe_url(..., callback="https://...")` | `{request_id}` immediately, transcription POSTs to webhook later | very long files; no long-lived HTTP connection |
| `client.listen.v1.connect(...)` (WebSocket) | streaming events as audio is sent | live audio (mic, telephony) |
See examples/12-transcription-prerecorded-callback.py for a working callback example.
Key parameters
model, language, encoding, sample_rate, channels, multichannel, punctuate, smart_format, diarize, endpointing, interim_results, utterance_end_ms, vad_events, keywords, search, redact, numerals, paragraphs, utterances.
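Three of these parameters (`encoding`, `sample_rate`, `channels`) describe the audio itself and must match what is actually sent. When raw PCM originates from a WAV file, the values can be read from its header with the standard library. This is a sketch under the assumption that 16-bit PCM corresponds to `encoding="linear16"`; `pcm_params_from_wav` is a hypothetical helper, not an SDK function.

```python
import wave
from typing import Dict, Union


def pcm_params_from_wav(path: str) -> Dict[str, Union[str, int]]:
    """Read a WAV header and return matching encoding/sample_rate/channels values."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            # Other sample widths need a different encoding value; not handled here.
            raise ValueError("only 16-bit PCM maps to linear16 in this sketch")
        return {
            "encoding": "linear16",
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
        }
```

The returned dict can be splatted into a call as keyword arguments, for example `client.listen.v1.connect(model="nova-3", **params)`, assuming the SDK accepts these values in the types returned here.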
API reference (layered)
- In-repo Fern-generated reference: `reference.md`, sections "Listen V1 Media" (REST) and "Listen V1 Connect" (WSS).
- Canonical OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
- Canonical AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
- Context7: natural-language queries over the full Deepgram docs corpus. Library ID: `/llmstxt/developers_deepgram_llms_txt`.
- Product docs:
Gotchas
- Use the right auth scheme for the credential type. API keys use `Authorization: Token <api_key>`. Temporary / access tokens (from `client.auth.v1.tokens.grant()` or an equivalent server) use `Authorization: Bearer <access_token>`; the custom `DeepgramClient` installs a Bearer override when you pass `access_token=...` (see `src/deepgram/client.py`). Sending `Bearer <api_key>` with a long-lived API key is what fails.
- Encoding must match the audio. Declaring `encoding="linear16"` but sending Opus → garbage output or 400.
- Close streams cleanly. Call `send_finalize()` before exiting the WSS context; otherwise the last partial is dropped.
- Keepalive on long WSS sessions. If idle > ~10s, the server closes. Send `KeepAlive` messages or audio chunks.
- Intelligence features are REST-only. `summarize`, `topics`, `intents`, `sentiment`, `detect_language` do NOT work over WSS; see `deepgram-python-audio-intelligence`.
- `transcribe_file(request=...)` takes bytes or an iterator, not a file handle.
- `nova-3` is the current flagship STT model. Check `client.manage.v1.models.list()` for the live set.
- Sync `connection.start_listening()` blocks. Run it in a thread (sync) or as a task (async) so you can send audio concurrently.
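The keepalive gotcha can be handled with a small idle timer. The sketch below only decides when a keepalive is due and leaves the actual send to the caller, because this document does not name the SDK method for it. `KeepAliveTimer` and the 5 second interval are illustrative choices, picked to stay well under the ~10s idle limit mentioned above.

```python
import json
import time
from typing import Callable


class KeepAliveTimer:
    """Track idle time on a connection and say when a keepalive is due."""

    def __init__(self, interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self._clock = clock          # injectable so the timer can be tested
        self._last_sent = clock()

    def mark_sent(self) -> None:
        """Call after sending audio or a keepalive."""
        self._last_sent = self._clock()

    def due(self) -> bool:
        """True once nothing has been sent for at least interval_s seconds."""
        return self._clock() - self._last_sent >= self.interval_s


# JSON text frame for a keepalive in the streaming protocol.
KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})
```

In a send loop, call `mark_sent()` after every audio chunk, and when `due()` returns true during a pause, send the keepalive through whatever mechanism your SDK version exposes, then call `mark_sent()` again.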
Example files in this repo
- `examples/10-transcription-prerecorded-url.py`
- `examples/11-transcription-prerecorded-file.py`
- `examples/12-transcription-prerecorded-callback.py`
- `examples/13-transcription-live-websocket.py`
- `tests/wire/test_listen_v1_media.py`: wire-level fixtures
- `tests/manual/listen/v1/connect/main.py`: live WSS connection test
Central product skills
For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
npx skills add deepgram/skills
This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).