| name | deepgram-python-audio-intelligence |
| description | Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio". |
Using Deepgram Audio Intelligence (Python SDK)
Analytics overlays applied to /v1/listen transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint and client methods as STT: enable features via params.
When to use this product
- You have audio (file, URL, or live stream) and want analytics alongside the transcript.
- REST is the primary path: most analytics are REST-only.
Use a different skill when:
- You want a pure transcript with no analytics: use deepgram-python-speech-to-text.
- Your input is already transcribed text: use deepgram-python-text-intelligence (/v1/read).
- You need conversational turn-taking: use deepgram-python-conversational-stt.
- You need a full interactive agent: use deepgram-python-voice-agent.
Feature availability: REST vs WSS
| Feature | REST | WSS |
|---|---|---|
| diarize | yes | yes |
| redact | yes | yes |
| punctuate, smart_format | yes | yes |
| Entity detection | yes | yes |
| summarize | yes | no |
| topics | yes | no |
| intents | yes | no |
| sentiment | yes | no |
| detect_language | yes | no |
| custom_topic / custom_intent | yes | no |
For the WSS-supported subset, the code path is the same as in deepgram-python-speech-to-text.
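The table above can be encoded as a small guard that runs before a streaming connection is opened. This is a minimal sketch, not part of the SDK: split_for_wss and REST_ONLY are hypothetical names, and the set simply mirrors the "no" rows in the table.

```python
# Hypothetical helper (not part of the Deepgram SDK): mirrors the table above.
REST_ONLY = frozenset({
    "summarize", "topics", "intents", "sentiment",
    "detect_language", "custom_topic", "custom_intent",
})


def split_for_wss(params):
    """Return (wss_params, dropped), where dropped lists the REST-only keys."""
    wss_params = {k: v for k, v in params.items() if k not in REST_ONLY}
    dropped = sorted(k for k in params if k in REST_ONLY)
    return wss_params, dropped


wss_params, dropped = split_for_wss(
    {"model": "nova-3", "diarize": True, "summarize": "v2", "topics": True}
)
print(wss_params)  # {'model': 'nova-3', 'diarize': True}
print(dropped)     # ['summarize', 'topics']
```

Logging the dropped keys rather than silently discarding them makes it obvious when a REST-oriented config is reused for streaming.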
Authentication
from dotenv import load_dotenv
load_dotenv()
from deepgram import DeepgramClient
client = DeepgramClient()
Header: Authorization: Token <api_key>.
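For calls made outside the SDK (for example with a raw HTTP client), the header has to be built by hand. A minimal sketch, assuming the key is stored in an environment variable named DEEPGRAM_API_KEY (an assumption here; adjust to your setup). auth_header is a hypothetical helper, not an SDK function.

```python
import os


def auth_header(api_key=None):
    """Build the Authorization header in the 'Token <api_key>' form shown above."""
    # Assumption: the key lives in DEEPGRAM_API_KEY when not passed explicitly.
    key = api_key or os.environ.get("DEEPGRAM_API_KEY")
    if not key:
        raise ValueError("no Deepgram API key provided")
    return {"Authorization": f"Token {key}"}


print(auth_header("abc123"))  # {'Authorization': 'Token abc123'}
```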
Quick start: REST with full analytics
response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    punctuate=True,
    diarize=True,
    summarize="v2",
    topics=True,
    intents=True,
    sentiment=True,
    detect_language=True,
    redact="pii",  # single mode; see "Key parameters" for multi-mode redaction
    language="en-US",
)
r = response.results
print("transcript:", r.channels[0].alternatives[0].transcript)
print("summary:", r.summary)
print("topics:", r.topics)
print("intents:", r.intents)
print("sentiments:", r.sentiments)
print("detected_language:", r.channels[0].detected_language)
for word in r.channels[0].alternatives[0].words or []:
    speaker = getattr(word, "speaker", None)
    if speaker is not None:
        print(f"Speaker {speaker}: {word.word}")
Quick start: REST file
with open("call.wav", "rb") as f:
    audio = f.read()
response = client.listen.v1.media.transcribe_file(
    request=audio,
    model="nova-3",
    diarize=True,
    redact="pii",  # single mode; see "Key parameters" for multi-mode redaction
    summarize="v2",
    topics=True,
)
Quick start: diarization with word-level timings
Enable speaker separation and word-level timestamps in a single request, then iterate the per-word objects to build a speaker-labelled transcript with timing.
response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    diarize=True,
    smart_format=True,
    punctuate=True,
)
words = response.results.channels[0].alternatives[0].words or []
# One line per word: speaker, text, timing, and confidence.
for w in words:
    speaker = getattr(w, "speaker", None)
    text = w.punctuated_word or w.word
    print(f"[speaker {speaker}] {text} ({w.start:.2f}s-{w.end:.2f}s, conf={w.confidence:.2f})")

# Group consecutive words by speaker to get one line per turn.
from itertools import groupby

for speaker, group in groupby(words, key=lambda w: getattr(w, "speaker", None)):
    text = " ".join((w.punctuated_word or w.word) for w in group)
    print(f"Speaker {speaker}: {text}")
Per-word fields available on each entry:
| Field | Type | Description |
|---|---|---|
| word | str | Lowercase token |
| punctuated_word | Optional[str] | Token with smart-formatted casing/punctuation (when smart_format=True) |
| start, end | float | Audio timestamps in seconds |
| confidence | float | Confidence score, 0.0 to 1.0 |
| speaker | Optional[int] | Speaker id (when diarize=True); None if diarization disabled |
| speaker_confidence | Optional[float] | Speaker-id confidence |
For a higher-level breakdown, set utterances=True to get pre-grouped speaker turns at response.results.utterances. Set paragraphs=True for a paragraphs view organised by speaker turn boundaries.
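When only word-level output is available, speaker turns with start and end times can also be derived client-side. The sketch below is illustrative: Word is a stand-in dataclass carrying just the per-word fields listed in the table (it is not the SDK's own type), and speaker_turns is a hypothetical helper.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class Word:
    """Stand-in for the SDK's per-word object; only the fields used here."""
    word: str
    start: float
    end: float
    speaker: Optional[int] = None
    punctuated_word: Optional[str] = None


def speaker_turns(words):
    """Collapse consecutive same-speaker words into (speaker, start, end, text)."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: getattr(w, "speaker", None)):
        group = list(group)
        text = " ".join((w.punctuated_word or w.word) for w in group)
        turns.append((speaker, group[0].start, group[-1].end, text))
    return turns


words = [
    Word("hello", 0.0, 0.4, 0, "Hello"),
    Word("there", 0.5, 0.9, 0, "there."),
    Word("hi", 1.2, 1.5, 1, "Hi."),
]
for speaker, start, end, text in speaker_turns(words):
    print(f"Speaker {speaker} [{start:.2f}s-{end:.2f}s]: {text}")
# Speaker 0 [0.00s-0.90s]: Hello there.
# Speaker 1 [1.20s-1.50s]: Hi.
```

Because groupby only merges adjacent items, a speaker who talks, is interrupted, and talks again produces two separate turns, which is usually what a transcript view wants.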
Quick start: WSS subset (diarize / redact / entities only)
import threading
from deepgram.core.events import EventType
with client.listen.v1.connect(model="nova-3", diarize=True, redact="pii") as conn:
    conn.on(EventType.MESSAGE, lambda m: print(m))
    threading.Thread(target=conn.start_listening, daemon=True).start()
    for chunk in audio_chunks:  # audio_chunks: an iterable of raw audio bytes
        conn.send_media(chunk)
    conn.send_finalize()
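The streaming example leaves audio_chunks undefined. One way to produce it is to slice a bytes buffer into fixed-size pieces. This is a sketch only: iter_chunks is a hypothetical helper, and the 4096-byte default is an arbitrary choice to tune against your encoding and latency needs.

```python
def iter_chunks(audio, chunk_size=4096):
    """Yield successive chunk_size-byte slices of a bytes-like object."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


# Stand-in buffer; in practice this would be the bytes of a real audio file.
audio = bytes(10_000)
chunks = list(iter_chunks(audio))
print([len(c) for c in chunks])  # [4096, 4096, 1808]
```

Setting audio_chunks = iter_chunks(audio) before the loop keeps the rest of the streaming example unchanged.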
Key parameters
summarize, topics, intents, sentiment, detect_language, diarize, redact, custom_topic, custom_topic_mode, custom_intent, custom_intent_mode, detect_entities, plus all the standard STT params (model, language, encoding, sample_rate, ...).
redact is typed as Optional[str] in the current generated SDK (src/deepgram/listen/v1/media/client.py). Pass a single redaction mode such as "pci", "pii", "numbers", or "phi". At the transport level, multi-mode redaction is supported by sending redact as a repeated query parameter. Check src/deepgram/types/listen_v1redact.py for the current type, and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier Union[str, Sequence[str]] override is no longer carried in .fernignore.
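For the raw query-param fallback, the standard library can already expand a list into a repeated parameter. A minimal sketch, assuming the default hosted endpoint at api.deepgram.com; sending the request and attaching the Authorization header are left out.

```python
from urllib.parse import urlencode

# Sequence values are expanded into repeated keys when doseq=True.
params = {"model": "nova-3", "redact": ["pci", "pii"]}
query = urlencode(params, doseq=True)
print(query)  # model=nova-3&redact=pci&redact=pii

# Assumption: the default hosted API host; adjust for self-hosted deployments.
url = f"https://api.deepgram.com/v1/listen?{query}"
```

Without doseq=True, urlencode would serialise the list's repr as a single value, which is not what the endpoint expects.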
API reference (layered)
- In-repo reference: reference.md, sections "Listen V1 Media" (REST params include all analytics flags) and "Listen V1 Connect" (WSS-supported subset).
- OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
- AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
- Context7: library ID /llmstxt/developers_deepgram_llms_txt.
- Product docs:
Gotchas
- summarize on /v1/listen accepts a boolean OR the string "v2". Use "v2" to pin the current summarization model; True also works (maps to the default model). /v1/read differs: it accepts a boolean only. If you need summarization on already-transcribed text, see deepgram-python-text-intelligence.
- Sentiment / topics / intents / summarize / detect_language are REST-only. Don't pass them on WSS; they'll be ignored or rejected.
- English-only for sentiment / topics / intents / summarize.
- Not all models support all overlays. Flux / Base models have restrictions. Stick to nova-3 unless you have a reason.
- Redaction values are pci, pii, phi, numbers, etc., not arbitrary strings.
- custom_topic / custom_intent need a mode ("extended" or "strict").
- Diarization is noisy on short / low-quality audio. Expect speaker churn on <30s clips.
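Two of the gotchas above (the accepted summarize values and the mode required by custom_topic / custom_intent) are easy to check before a request goes out. The following is a hedged sketch: check_params is a hypothetical helper, not an SDK function, and it covers only those two rules.

```python
VALID_CUSTOM_MODES = {"extended", "strict"}


def check_params(params):
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    summarize = params.get("summarize")
    if summarize is not None and not (isinstance(summarize, bool) or summarize == "v2"):
        problems.append('summarize must be a boolean or "v2"')
    for name in ("custom_topic", "custom_intent"):
        mode = params.get(f"{name}_mode")
        if name in params and mode not in VALID_CUSTOM_MODES:
            problems.append(f'{name} needs {name}_mode set to "extended" or "strict"')
    return problems


print(check_params({"summarize": "v2", "custom_topic": "billing"}))
# ['custom_topic needs custom_topic_mode set to "extended" or "strict"']
print(check_params({"summarize": True, "custom_intent": "cancel", "custom_intent_mode": "strict"}))
# []
```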
Example files in this repo
- examples/15-transcription-advanced-options.py: smart_format, punctuate, diarize
- tests/wire/test_listen_v1_media.py: wire test covering intelligence params
Related skills
- deepgram-python-speech-to-text: same endpoint, plain transcription
- deepgram-python-text-intelligence: same analytics, text input
- deepgram-python-conversational-stt: Flux for turn-taking
- deepgram-python-voice-agent: interactive assistants
Central product skills
For cross-language Deepgram product knowledge (the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup), install the central skills:
npx skills add deepgram/skills
This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).