deepgram-python-audio-intelligence

Name: Deepgram Python Audio Intelligence
Author: deepgram

// Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".

stars:436

forks:131

updated:April 27, 2026 at 16:14

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-python-sdk"

Using Deepgram Audio Intelligence (Python SDK)

Analytics overlays applied to /v1/listen transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint / same client methods as STT — enable features via params.

When to use this product

You have audio (file, URL, or live stream) and want analytics alongside the transcript.
REST is the primary path — most analytics are REST-only.

Use a different skill when:

You want a pure transcript with no analytics → deepgram-python-speech-to-text.
Your input is already transcribed text → deepgram-python-text-intelligence (/v1/read).
You need conversational turn-taking → deepgram-python-conversational-stt.
You need a full interactive agent → deepgram-python-voice-agent.

Feature availability: REST vs WSS

| Feature | REST | WSS |
|---|---|---|
| `diarize` | yes | yes |
| `redact` | yes | yes |
| `punctuate`, `smart_format` | yes | yes |
| Entity detection | yes | yes |
| `summarize` | yes | no |
| `topics` | yes | no |
| `intents` | yes | no |
| `sentiment` | yes | no |
| `detect_language` | yes | no |
| `custom_topic` / `custom_intent` | yes | no |

For the WSS-supported subset, the code path is the same as in deepgram-python-speech-to-text.
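One way to keep a shared parameter dict consistent across both transports is to filter it against the table above before opening a WebSocket. The following is a minimal sketch; `REST_ONLY` and `split_for_wss` are hypothetical names, not SDK symbols, and the set simply mirrors the "no" rows of the WSS column.

```python
# Hypothetical helper, not part of the Deepgram SDK.
# REST_ONLY mirrors the rows marked "no" in the WSS column of the table above.
# The *_mode companions are not listed in that table; add them if you pass them.
REST_ONLY = {
    "summarize", "topics", "intents", "sentiment",
    "detect_language", "custom_topic", "custom_intent",
}

def split_for_wss(params: dict) -> tuple:
    """Return (wss_safe, rest_only) views of a /v1/listen parameter dict."""
    wss_safe = {k: v for k, v in params.items() if k not in REST_ONLY}
    rest_only = {k: v for k, v in params.items() if k in REST_ONLY}
    return wss_safe, rest_only

wanted = {"model": "nova-3", "diarize": True, "summarize": "v2", "topics": True}
wss_safe, dropped = split_for_wss(wanted)
print(wss_safe)   # parameters that can be passed when opening the WebSocket
print(dropped)    # parameters that need a REST call instead
```

Returning the dropped parameters rather than silently discarding them makes it easy to log which analytics will need a follow-up REST request.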

Authentication

from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient()

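Header: Authorization: Token <api_key>. DeepgramClient() called with no arguments picks the key up from the environment, which is why load_dotenv() runs first.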
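If you ever need to call the endpoint without the SDK (for example from a debugging script), the header above can be built by hand. This is a sketch only: `auth_header` is a hypothetical helper, and the environment variable name `DEEPGRAM_API_KEY` is an assumption that is not stated in this document.

```python
import os
from typing import Optional

def auth_header(api_key: Optional[str] = None) -> dict:
    """Build the Authorization header in the `Token <api_key>` form.

    Assumption: the key lives in DEEPGRAM_API_KEY when not passed explicitly.
    """
    key = api_key or os.environ.get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("no Deepgram API key provided or found in the environment")
    return {"Authorization": f"Token {key}"}

print(auth_header("example-key"))  # {'Authorization': 'Token example-key'}
```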

Quick start — REST with full analytics

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    punctuate=True,
    diarize=True,              # speaker separation
    summarize="v2",            # "v2" for the current model; True also accepted on /v1/listen
    topics=True,
    intents=True,
    sentiment=True,
    detect_language=True,
    redact=["pci", "pii"],     # a single mode such as "pii" also works; see Key parameters for typing notes
    language="en-US",
)

r = response.results
print("transcript:", r.channels[0].alternatives[0].transcript)
print("summary:",    r.summary)
print("topics:",     r.topics)
print("intents:",    r.intents)
print("sentiments:", r.sentiments)
print("detected_language:", r.channels[0].detected_language)

# Speaker diarization
for word in r.channels[0].alternatives[0].words or []:
    speaker = getattr(word, "speaker", None)
    if speaker is not None:
        print(f"Speaker {speaker}: {word.word}")

Quick start — REST file

with open("call.wav", "rb") as f:
    audio = f.read()

response = client.listen.v1.media.transcribe_file(
    request=audio,
    model="nova-3",
    diarize=True,
    redact=["pii"],
    summarize="v2",
    topics=True,
)

Quick start — diarization with word-level timings

Enable speaker separation and word-level timestamps in a single request, then iterate the per-word objects to build a speaker-labelled transcript with timing.

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    diarize=True,        # tag each word with a speaker id
    smart_format=True,   # punctuated_word for cleaner output
    punctuate=True,
)

words = response.results.channels[0].alternatives[0].words or []

# Per-word: speaker, timestamps, confidence
for w in words:
    speaker = getattr(w, "speaker", None)
    text = w.punctuated_word or w.word
    print(f"[speaker {speaker}] {text}  ({w.start:.2f}s–{w.end:.2f}s, conf={w.confidence:.2f})")

# Group consecutive words by speaker into utterances
from itertools import groupby
for speaker, group in groupby(words, key=lambda w: getattr(w, "speaker", None)):
    text = " ".join((w.punctuated_word or w.word) for w in group)
    print(f"Speaker {speaker}: {text}")

Per-word fields available on each entry:

| Field | Type | Description |
|---|---|---|
| `word` | `str` | Lowercase token |
| `punctuated_word` | `str \| None` | Token with smart-formatted casing/punctuation (when `smart_format=True`) |
| `start`, `end` | `float` | Audio timestamps in seconds |
| `confidence` | `float` | 0.0–1.0 confidence |
| `speaker` | `int \| None` | Speaker id (when `diarize=True`); `None` if diarization disabled |
| `speaker_confidence` | `float \| None` | Speaker-id confidence |

For a higher-level breakdown, set utterances=True to get pre-grouped speaker turns at response.results.utterances. Set paragraphs=True for a paragraphs view organised by speaker turn boundaries.
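When you want speaker turns with start and end times but only have the per-word list, the grouping can be done client-side. The sketch below uses a small `Word` dataclass as a stand-in for the SDK's word objects (it carries only the fields from the table above), and `speaker_turns` is a hypothetical helper name.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional

@dataclass
class Word:
    """Stand-in for the SDK's per-word object; only the fields used here."""
    word: str
    start: float
    end: float
    speaker: Optional[int] = None
    punctuated_word: Optional[str] = None

def speaker_turns(words):
    """Collapse consecutive same-speaker words into (speaker, start, end, text)."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.speaker):
        run = list(group)
        text = " ".join(w.punctuated_word or w.word for w in run)
        turns.append((speaker, run[0].start, run[-1].end, text))
    return turns

sample = [
    Word("hello", 0.0, 0.4, 0, "Hello"),
    Word("there", 0.4, 0.8, 0, "there."),
    Word("hi", 1.0, 1.2, 1, "Hi."),
]
for speaker, start, end, text in speaker_turns(sample):
    print(f"Speaker {speaker} [{start:.2f}s-{end:.2f}s]: {text}")
```

Because `groupby` only merges adjacent items, a speaker who talks twice yields two separate turns, which is usually what a transcript view wants.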

Quick start — WSS subset (diarize / redact / entities only)

import threading
from deepgram.core.events import EventType

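# audio_chunks is not defined in this snippet: supply any iterable of raw audio bytes.
with client.listen.v1.connect(model="nova-3", diarize=True, redact=["pii"]) as conn:
    conn.on(EventType.MESSAGE, lambda m: print(m))   # print every server message
    # Read incoming messages on a background thread while this thread sends audio
    threading.Thread(target=conn.start_listening, daemon=True).start()
    for chunk in audio_chunks:
        conn.send_media(chunk)
    conn.send_finalize()                             # ask the server to flush buffered audio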
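The snippet above leaves `audio_chunks` undefined. One possible source is a generator that reads a binary stream in fixed-size pieces. This is a sketch under assumptions: `iter_audio_chunks` is a hypothetical helper, and the 4096-byte chunk size is an arbitrary choice rather than a documented requirement.

```python
import io
from typing import BinaryIO, Iterator

def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size byte chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Demo with in-memory bytes; in practice, pass a file opened in "rb" mode.
demo = io.BytesIO(b"\x00" * 10_000)
audio_chunks = list(iter_audio_chunks(demo))
print([len(c) for c in audio_chunks])  # [4096, 4096, 1808]
```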

Key parameters

summarize, topics, intents, sentiment, detect_language, diarize, redact, custom_topic, custom_topic_mode, custom_intent, custom_intent_mode, detect_entities, plus all the standard STT params (model, language, encoding, sample_rate, ...).

redact is typed as Optional[str] in the current generated SDK (src/deepgram/listen/v1/media/client.py). Pass a single redaction mode such as "pci", "pii", "numbers", or "phi". Multi-mode redaction at the transport level is supported by sending redact as a repeated query parameter — check src/deepgram/types/listen_v1redact.py for the current type and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier Union[str, Sequence[str]] override is no longer carried in .fernignore.
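The "repeated query parameter" form mentioned above can be illustrated with the standard library alone. This sketch only builds the URL and sends nothing; `listen_url` is a hypothetical helper, and the host `https://api.deepgram.com` is an assumption (only the `/v1/listen` path comes from this document).

```python
from urllib.parse import urlencode

# Assumption: REST host; only the /v1/listen path is taken from this document.
BASE_URL = "https://api.deepgram.com/v1/listen"

def listen_url(**params) -> str:
    """Build a /v1/listen URL, repeating the key for list-valued params."""
    pairs = []
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            # Booleans are lowercased so they read as true/false in the query string.
            pairs.append((key, str(v).lower() if isinstance(v, bool) else str(v)))
    return f"{BASE_URL}?{urlencode(pairs)}"

print(listen_url(model="nova-3", diarize=True, redact=["pci", "pii"]))
# https://api.deepgram.com/v1/listen?model=nova-3&diarize=true&redact=pci&redact=pii
```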

API reference (layered)

In-repo reference: reference.md — "Listen V1 Media" (REST params include all analytics flags), "Listen V1 Connect" (WSS-supported subset).
OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7: library ID /llmstxt/developers_deepgram_llms_txt.
Product docs:

Gotchas

summarize on /v1/listen accepts a boolean OR the string "v2". Use "v2" to pin the current summarization model; True also works (maps to the default model). /v1/read differs: it accepts a boolean only. If you need summarization on already-transcribed text, see deepgram-python-text-intelligence.
Sentiment / topics / intents / summarize / detect_language are REST-only. Don't pass them on WSS — they'll be ignored or rejected.
English-only for sentiment / topics / intents / summarize.
Not all models support all overlays. Flux / Base models have restrictions. Stick to nova-3 unless you have a reason.
Redaction values are pci, pii, phi, numbers, etc. — not arbitrary strings.
custom_topic / custom_intent each need a companion mode: set custom_topic_mode / custom_intent_mode to "extended" or "strict".
Diarization is noisy on short / low-quality audio. Expect speaker churn on <30s clips.
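Several of the gotchas above can be caught before a request is sent. The following pre-flight check is a sketch, not SDK behaviour: `preflight`, `KNOWN_REDACT`, and `MODES` are hypothetical names, and `KNOWN_REDACT` contains only the four values named in this document, so a value it flags may still be valid on the API.

```python
# Hypothetical pre-flight check; none of these names exist in the Deepgram SDK.
KNOWN_REDACT = {"pci", "pii", "phi", "numbers"}  # only the values named in this document
MODES = {"extended", "strict"}

def preflight(params: dict) -> list:
    """Return human-readable problems; an empty list means nothing obvious."""
    problems = []

    summarize = params.get("summarize")
    if summarize is not None and not (isinstance(summarize, bool) or summarize == "v2"):
        problems.append('summarize should be a bool or "v2"')

    redact = params.get("redact")
    if redact is not None:
        modes = [redact] if isinstance(redact, str) else list(redact)
        unknown = [m for m in modes if m not in KNOWN_REDACT]
        if unknown:
            problems.append(f"redact value(s) not in the documented list: {unknown}")

    for name in ("custom_topic", "custom_intent"):
        if name in params and params.get(f"{name}_mode") not in MODES:
            problems.append(f'{name} needs {name}_mode set to "extended" or "strict"')

    return problems

for problem in preflight({"summarize": "v1", "redact": ["pii", "everything"], "custom_topic": "billing"}):
    print(problem)
```

Returning a list instead of raising lets a caller decide whether a finding is a hard error or just a warning.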

Example files in this repo

examples/15-transcription-advanced-options.py — smart_format, punctuate, diarize
tests/wire/test_listen_v1_media.py — wire test covering intelligence params

Related skills

deepgram-python-speech-to-text — same endpoint, plain transcription
deepgram-python-text-intelligence — same analytics, text input
deepgram-python-conversational-stt — Flux for turn-taking
deepgram-python-voice-agent — interactive assistants

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
