deepgram-python-conversational-stt

Name: Deepgram Python Conversational Stt
Author: deepgram

// Use when writing or reviewing Python code in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.connect(...)`, Flux models, end-of-turn detection. Use `deepgram-python-speech-to-text` for standard v1 ASR, `deepgram-python-voice-agent` for full-duplex interactive assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", "listen.v2", "flux-general-en", "flux-general-multi".

stars:436

forks:131

updated:April 27, 2026 at 11:03

SKILL.md

readonly

related-skills.json

same repository

deepgram-python-audio-intelligence.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".

updated: 2026-04-27, stars: 436

deepgram-python-speech-to-text.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live audio transcription. Covers `client.listen.v1.media.transcribe_url` / `transcribe_file` (REST) and `client.listen.v1.connect` (WebSocket). Use this skill for basic ASR; use `deepgram-python-audio-intelligence` for summarize/sentiment/topics/diarize overlays, `deepgram-python-conversational-stt` for turn-taking v2/Flux, and `deepgram-python-voice-agent` for full-duplex assistants. Triggers include "transcribe", "live transcription", "speech to text", "STT", "listen endpoint", "nova-3", "listen.v1".

updated: 2026-04-27, stars: 436

deepgram-python-voice-agent.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that builds an interactive voice agent via `agent.deepgram.com/v1/agent/converse`. Covers `client.agent.v1.connect()`, `AgentV1Settings`, `send_settings`, `send_media`, event handling, and function/tool calling. Full-duplex STT + LLM + TTS with barge-in. Use `deepgram-python-text-to-speech` for one-way synthesis, `deepgram-python-speech-to-text` / `deepgram-python-conversational-stt` for transcription only. Triggers include "voice agent", "agent converse", "full duplex", "interactive assistant", "barge-in", "agent.v1", "function calling", "AgentV1Settings".

updated: 2026-04-27, stars: 436

deepgram-python-management-api.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Management APIs - projects, API keys, members, invites, usage, billing, models, and reusable Voice Agent configurations. Covers `client.manage.v1.projects`, project-scoped resources under `client.manage.v1.projects.*` (keys, members, members.invites, usage, billing, models, requests), global `client.manage.v1.models`, think-model discovery at `client.agent.v1.settings.think.models`, and `client.voice_agent.configurations.*`. Use `deepgram-python-voice-agent` when you want to run an agent interactively, this skill to PERSIST/LIST agent configs. Triggers include "management API", "list projects", "API keys", "members", "usage stats", "billing", "list models", "agent configurations", "manage.v1".

updated: 2026-04-27, stars: 436

deepgram-python-text-intelligence.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for sentiment, summarization, topic detection, and intent recognition on text input. Covers `client.read.v1.text.analyze(...)` with body `text` or `url`. Use `deepgram-python-audio-intelligence` when the source is audio instead of text. Triggers include "read API", "text intelligence", "analyze text", "sentiment", "summarize text", "topics", "intents", "read.v1".

updated: 2026-04-27, stars: 436

deepgram-python-text-to-speech.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST (`client.speak.v1.audio.generate`) and streaming WebSocket (`client.speak.v1.connect`). Also covers the in-repo `deepgram.helpers.TextBuilder` for incremental text assembly before synthesis. Use `deepgram-python-voice-agent` when you need full-duplex STT + LLM + TTS with barge-in. Triggers include "TTS", "speak", "synthesize voice", "aura", "text to speech", "speak.v1", "TextBuilder".

updated: 2026-04-27, stars: 436

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-python-sdk"

Using Deepgram Conversational STT / Flux (Python SDK)

Turn-aware streaming STT at /v2/listen — optimized for conversational audio (end-of-turn detection, eager EOT, barge-in scenarios).

When to use this product

You're building a conversational UI and need explicit turn boundaries.
You want Flux models (optimized for human-to-human or human-to-agent conversation).
You want lower-latency turn signals than v1's utterance_end.

Use a different skill when:

You want general-purpose transcription (captions, batch, non-conversational) → deepgram-python-speech-to-text.
You want a full interactive agent (STT + LLM + TTS) → deepgram-python-voice-agent.
You want analytics (summarize/sentiment) → deepgram-python-audio-intelligence.

Authentication

import os
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])

Header: Authorization: Token <api_key>. WSS only — no REST path on v2.
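
For orientation, the header and route above can be sketched without the SDK. This is a minimal illustration under stated assumptions, not the SDK's internals: the host `wss://api.deepgram.com` and the helper names `auth_headers` and `listen_v2_url` are introduced here for the example only.

```python
from urllib.parse import urlencode

# Assumed host; the text above only names the /v2/listen route.
BASE_URL = "wss://api.deepgram.com"

def auth_headers(api_key: str) -> dict:
    """Build the header described above: Authorization: Token <api_key>."""
    return {"Authorization": f"Token {api_key}"}

def listen_v2_url(**params: str) -> str:
    """Compose a WSS URL for /v2/listen from query parameters (hypothetical helper)."""
    return f"{BASE_URL}/v2/listen?{urlencode(params)}"

url = listen_v2_url(model="flux-general-en", encoding="linear16", sample_rate="16000")
headers = auth_headers("YOUR_API_KEY")
```

In normal use `client.listen.v2.connect(...)` handles both pieces; the sketch only makes visible what travels over the wire.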

Quick start

import threading, time
from pathlib import Path
from deepgram.core.events import EventType
from deepgram.listen.v2.types import (
    ListenV2CloseStream,
    ListenV2Connected,
    ListenV2FatalError,
    ListenV2TurnInfo,
)

with client.listen.v2.connect(
    model="flux-general-en",
    encoding="linear16",
    sample_rate="16000",
) as conn:

    def on_message(m):
        if isinstance(m, ListenV2TurnInfo):
            print(f"turn {m.turn_index} [{m.event}] {m.transcript}")
        elif isinstance(m, dict):                     # untyped fallback
            if m.get("type") == "TurnInfo":
                print(f"turn {m.get('turn_index')} [{m.get('event')}] {m.get('transcript')}")
        else:
            print(f"event: {getattr(m, 'type', type(m).__name__)}")

    conn.on(EventType.OPEN,    lambda _: print("open"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE,   lambda _: print("close"))
    conn.on(EventType.ERROR,   lambda e: print(f"err: {type(e).__name__}: {e}"))

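    def send_audio():
        # mic_chunks() is a placeholder: supply your own iterator of raw audio bytes.
        for chunk in mic_chunks():                     # 80ms recommended
            conn.send_media(chunk)
            time.sleep(0.01)
        # v2 closes with CloseStream, not v1's send_finalize.
        conn.send_close_stream(ListenV2CloseStream(type="CloseStream"))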

    threading.Thread(target=send_audio, daemon=True).start()
    conn.start_listening()
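
The quick start leaves `mic_chunks()` undefined. One possible stand-in, assuming mono linear16 audio already held in memory, is a generator that slices the buffer into roughly 80ms pieces. The name `pcm_chunks` and its defaults are hypothetical and not part of the SDK.

```python
from typing import Iterator

def pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 80,
               bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration chunks of mono linear16 audio (hypothetical helper)."""
    # 16000 samples/s * 80 ms / 1000 * 2 bytes per sample = 2560 bytes per chunk
    chunk_bytes = sample_rate * chunk_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]   # the last chunk may be shorter

one_second_of_silence = bytes(16000 * 2)       # 1 s of 16 kHz, 16-bit silence
chunks = list(pcm_chunks(one_second_of_silence))
```

When replaying stored audio rather than reading a live microphone, sleeping about `chunk_ms / 1000` seconds between sends would approximate real-time pacing.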

Key parameters

| Param | Notes |
| --- | --- |
| `model` | `flux-general-en` (English) or `flux-general-multi` (multilingual). REQUIRED; must be a Flux model |
| `encoding` | `linear16`, `mulaw`, etc. Omit for containerized audio |
| `sample_rate` | String in the SDK signature, e.g. `"16000"` |
| `eager_eot_threshold` | Fire end-of-turn early at this confidence |
| `eot_threshold` | Primary end-of-turn confidence |
| `eot_timeout_ms` | Time-based fallback turn end |
| `keyterm` | Bias for domain keywords |
| `mip_opt_out`, `tag` | Metadata / privacy flags |
| `language_hint` | ONLY for `flux-general-multi` |
| `authorization`, `request_options` | Override auth or request options |

No language parameter on v2 — language is implied by model (flux-general-en) or hinted via language_hint on multi.
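
The parameter rules above lend themselves to a small pre-flight check. The following is a sketch only: `check_connect_params` is a hypothetical helper, not an SDK function, and it knows just the two Flux model names listed here, so it would need updating if other Flux models exist.

```python
FLUX_MODELS = {"flux-general-en", "flux-general-multi"}  # the two models named above

def check_connect_params(params: dict) -> list:
    """Return a list of problems with would-be connect() kwargs (hypothetical helper)."""
    problems = []
    model = params.get("model")
    if model not in FLUX_MODELS:
        problems.append(f"model must be a Flux model, got {model!r}")
    if "language" in params:
        problems.append("v2 has no language parameter")
    if "language_hint" in params and model != "flux-general-multi":
        problems.append("language_hint is only for flux-general-multi")
    if "sample_rate" in params and not isinstance(params["sample_rate"], str):
        problems.append("sample_rate should be a string, e.g. '16000'")
    return problems

ok = check_connect_params(
    {"model": "flux-general-en", "encoding": "linear16", "sample_rate": "16000"}
)
bad = check_connect_params({"model": "nova-3", "language": "en", "sample_rate": 16000})
```

Here `ok` is empty, while `bad` collects one message per violated rule (wrong model, stray `language`, integer `sample_rate`).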

Events (server → client)

ListenV2Connected — connection established
ListenV2ConfigureSuccess / ListenV2ConfigureFailure — mid-session config changes
ListenV2TurnInfo — per-turn transcript + event (Update, EndOfTurn, EagerEndOfTurn, ...) + turn_index
ListenV2FatalError — terminal error

Client messages: ListenV2Media, ListenV2Configure, ListenV2CloseStream.
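
To show how the TurnInfo event kinds might be consumed, here is a minimal sketch that works on plain dicts. The sample payloads are invented for illustration (including the `"Connected"` type string), and treating `EagerEndOfTurn` as still provisional is a design assumption, not documented SDK behavior.

```python
class TurnCollector:
    """Collect final transcripts from TurnInfo messages given as plain dicts."""

    def __init__(self):
        self.finished = {}   # turn_index -> committed transcript
        self.latest = {}     # turn_index -> most recent provisional transcript

    def handle(self, msg: dict) -> None:
        if msg.get("type") != "TurnInfo":
            return                        # ignore non-turn messages
        idx, text = msg.get("turn_index"), msg.get("transcript", "")
        if msg.get("event") == "EndOfTurn":
            self.finished[idx] = text     # turn boundary: commit the transcript
            self.latest.pop(idx, None)
        else:
            self.latest[idx] = text       # Update, EagerEndOfTurn: keep as provisional

collector = TurnCollector()
for m in [
    {"type": "Connected"},
    {"type": "TurnInfo", "event": "Update", "turn_index": 0, "transcript": "hello"},
    {"type": "TurnInfo", "event": "EagerEndOfTurn", "turn_index": 0, "transcript": "hello there"},
    {"type": "TurnInfo", "event": "EndOfTurn", "turn_index": 0, "transcript": "hello there"},
    {"type": "TurnInfo", "event": "Update", "turn_index": 1, "transcript": "how"},
]:
    collector.handle(m)
```

A conversational UI could act speculatively on `EagerEndOfTurn` (for example, start preparing a reply) and only commit once `EndOfTurn` arrives.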

Async equivalent

import asyncio
from deepgram import AsyncDeepgramClient
client = AsyncDeepgramClient()

async def main():
    async with client.listen.v2.connect(model="flux-general-en", encoding="linear16", sample_rate="16000") as conn:
        # same .on(...) handlers, then:
        await conn.start_listening()

asyncio.run(main())
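
On the async side, audio sending would typically become a coroutine rather than a thread. The sketch below uses a fake connection so it runs without the SDK; it assumes the async connection's `send_media` is awaitable, which the text above does not confirm. `FakeConn` and `send_paced` are hypothetical names.

```python
import asyncio

class FakeConn:
    """Stand-in for the async connection; records what would be sent."""
    def __init__(self):
        self.sent = []

    async def send_media(self, chunk: bytes) -> None:
        self.sent.append(chunk)

async def send_paced(conn, chunks, delay_s: float = 0.0) -> int:
    """Send chunks one by one, yielding to the event loop between sends."""
    count = 0
    for chunk in chunks:
        await conn.send_media(chunk)
        await asyncio.sleep(delay_s)   # gives other tasks (e.g. the listener) a turn
        count += 1
    return count

conn = FakeConn()
sent_count = asyncio.run(send_paced(conn, [b"\x00" * 2560] * 3))
```

In a real session the sender would presumably run alongside `start_listening()`, for example via `asyncio.gather(...)` or `asyncio.create_task(...)`.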

API reference (layered)

In-repo reference: reference.md — "Listen V2 Connect".
AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7: library ID /llmstxt/developers_deepgram_llms_txt.
Product docs:

Gotchas

/v2/listen, not /v1/listen. Different route, different client path (listen.v2 vs listen.v1).
Flux models only. nova-3, base, etc. will be rejected. Use flux-general-en or flux-general-multi.
No language parameter. Language is set by model choice. Use language_hint on flux-general-multi.
sample_rate is a STRING in the SDK (e.g. "16000").
Send ~80ms audio chunks for best turn-detection latency.
Close with send_close_stream(ListenV2CloseStream(type="CloseStream")) — not send_finalize (that's v1).
Messages may arrive as typed objects OR raw dicts — the SDK uses a tagged union with construct_type for unknowns. Handle both branches (see socket_client.py patch in .fernignore).
socket_client.py is patched / frozen (see .fernignore → src/deepgram/listen/v2/socket_client.py). Don't overwrite that manual patch during regeneration; treat other listen/v2 files as generated unless the regen workflow says otherwise.
Omit encoding/sample_rate for containerized audio (WAV, OGG, etc.) — the server detects them from the container.
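
The "typed objects OR raw dicts" gotcha can be handled with one accessor instead of two branches. This is a sketch under assumptions: `field` and `describe` are hypothetical helpers, and a `SimpleNamespace` stands in for a typed message such as `ListenV2TurnInfo`, which is assumed to expose a `type` attribute.

```python
from types import SimpleNamespace

def field(msg, name, default=None):
    """Read a field from either a raw dict or a typed message object."""
    if isinstance(msg, dict):
        return msg.get(name, default)
    return getattr(msg, name, default)

def describe(msg) -> str:
    """Format a message the same way regardless of how it arrived."""
    if field(msg, "type") == "TurnInfo":
        return (f"turn {field(msg, 'turn_index')} "
                f"[{field(msg, 'event')}] {field(msg, 'transcript')}")
    return f"event: {field(msg, 'type', type(msg).__name__)}"

# Stand-in for a typed message; real typed objects come from the SDK.
typed = SimpleNamespace(type="TurnInfo", turn_index=2, event="EndOfTurn", transcript="ok")
raw = {"type": "TurnInfo", "turn_index": 2, "event": "EndOfTurn", "transcript": "ok"}
```

Both `describe(typed)` and `describe(raw)` yield the same string, so downstream code does not need to care which form the SDK delivered.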

Example files in this repo

examples/14-transcription-live-websocket-v2.py
tests/manual/listen/v2/connect/main.py

Related skills

deepgram-python-speech-to-text — v1 general-purpose STT (REST + WSS)
deepgram-python-voice-agent — full interactive assistant

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
