deepgram-python-voice-agent

Name: Deepgram Python Voice Agent
Author: deepgram

// Use when writing or reviewing Python code in this repo that builds an interactive voice agent via `agent.deepgram.com/v1/agent/converse`. Covers `client.agent.v1.connect()`, `AgentV1Settings`, `send_settings`, `send_media`, event handling, and function/tool calling. Full-duplex STT + LLM + TTS with barge-in. Use `deepgram-python-text-to-speech` for one-way synthesis, `deepgram-python-speech-to-text` / `deepgram-python-conversational-stt` for transcription only. Triggers include "voice agent", "agent converse", "full duplex", "interactive assistant", "barge-in", "agent.v1", "function calling", "AgentV1Settings".

stars:436

forks:131

updated:April 27, 2026 at 16:14

SKILL.md

readonly

related-skills.json

same repository

deepgram-python-audio-intelligence.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".

2026-04-27, 436 stars

deepgram-python-speech-to-text.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live audio transcription. Covers `client.listen.v1.media.transcribe_url` / `transcribe_file` (REST) and `client.listen.v1.connect` (WebSocket). Use this skill for basic ASR; use `deepgram-python-audio-intelligence` for summarize/sentiment/topics/diarize overlays, `deepgram-python-conversational-stt` for turn-taking v2/Flux, and `deepgram-python-voice-agent` for full-duplex assistants. Triggers include "transcribe", "live transcription", "speech to text", "STT", "listen endpoint", "nova-3", "listen.v1".

2026-04-27, 436 stars

deepgram-python-conversational-stt.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.connect(...)`, Flux models, end-of-turn detection. Use `deepgram-python-speech-to-text` for standard v1 ASR, `deepgram-python-voice-agent` for full-duplex interactive assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", "listen.v2", "flux-general-en", "flux-general-multi".

2026-04-27, 436 stars

deepgram-python-management-api.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Management APIs - projects, API keys, members, invites, usage, billing, models, and reusable Voice Agent configurations. Covers `client.manage.v1.projects`, project-scoped resources under `client.manage.v1.projects.*` (keys, members, members.invites, usage, billing, models, requests), global `client.manage.v1.models`, think-model discovery at `client.agent.v1.settings.think.models`, and `client.voice_agent.configurations.*`. Use `deepgram-python-voice-agent` when you want to run an agent interactively, this skill to PERSIST/LIST agent configs. Triggers include "management API", "list projects", "API keys", "members", "usage stats", "billing", "list models", "agent configurations", "manage.v1".

2026-04-27, 436 stars

deepgram-python-text-intelligence.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for sentiment, summarization, topic detection, and intent recognition on text input. Covers `client.read.v1.text.analyze(...)` with body `text` or `url`. Use `deepgram-python-audio-intelligence` when the source is audio instead of text. Triggers include "read API", "text intelligence", "analyze text", "sentiment", "summarize text", "topics", "intents", "read.v1".

2026-04-27, 436 stars

deepgram-python-text-to-speech.md

from "deepgram/deepgram-python-sdk"

Use when writing or reviewing Python code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST (`client.speak.v1.audio.generate`) and streaming WebSocket (`client.speak.v1.connect`). Also covers the in-repo `deepgram.helpers.TextBuilder` for incremental text assembly before synthesis. Use `deepgram-python-voice-agent` when you need full-duplex STT + LLM + TTS with barge-in. Triggers include "TTS", "speak", "synthesize voice", "aura", "text to speech", "speak.v1", "TextBuilder".

2026-04-27, 436 stars

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-python-sdk"

Using Deepgram Voice Agent (Python SDK)

Full-duplex voice agent runtime: STT + LLM (think) + TTS + function calling over a single WebSocket at agent.deepgram.com/v1/agent/converse.

When to use this product

You want an interactive voice assistant: user speaks, agent thinks, agent speaks, interruptions allowed.
You want function / tool calling triggered by the conversation.
You want Deepgram to host the orchestration (vs wiring STT + LLM + TTS yourself).

Use a different skill when:

One-way transcription → deepgram-python-speech-to-text or deepgram-python-conversational-stt.
One-way synthesis → deepgram-python-text-to-speech.
Analytics on finished audio → deepgram-python-audio-intelligence.
Managing reusable agent configs (persisted on the server) → deepgram-python-management-api.

Authentication

from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient()

Header: Authorization: Token <api_key>. Base URL: wss://agent.deepgram.com/v1/agent/converse.
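
As a rough illustration of the header described above, the sketch below builds the Authorization value by hand. `build_auth_header` is a hypothetical helper and not part of the SDK; the Bearer form for temporary access tokens is taken from the Gotchas section of this skill, and letting the access token win when both are supplied is an assumption. In normal use `DeepgramClient` handles this for you.

```python
AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"


def build_auth_header(api_key=None, access_token=None):
    """Return the Authorization header for the given credential type.

    Hypothetical helper for illustration; not part of the Deepgram SDK.
    """
    if access_token:
        # Temporary / access tokens use the Bearer scheme.
        return {"Authorization": f"Bearer {access_token}"}
    if api_key:
        # Long-lived API keys use the Token scheme.
        return {"Authorization": f"Token {api_key}"}
    raise ValueError("provide api_key or access_token")
```

For example, `build_auth_header(api_key="abc")` gives `{"Authorization": "Token abc"}`.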

Quick start

import threading
from deepgram.core.events import EventType
from deepgram.agent.v1.types import (
    AgentV1Settings,
    AgentV1SettingsAgent,
    AgentV1SettingsAgentListen,
    AgentV1SettingsAgentListenProvider_V1,
    AgentV1SettingsAudio,
    AgentV1SettingsAudioInput,
)
from deepgram.types.speak_settings_v1 import SpeakSettingsV1
from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram
from deepgram.types.think_settings_v1 import ThinkSettingsV1
from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi

with client.agent.v1.connect() as agent:
    settings = AgentV1Settings(
        audio=AgentV1SettingsAudio(
            input=AgentV1SettingsAudioInput(encoding="linear16", sample_rate=24000),
        ),
        agent=AgentV1SettingsAgent(
            listen=AgentV1SettingsAgentListen(
                provider=AgentV1SettingsAgentListenProvider_V1(type="deepgram", model="nova-3"),
            ),
            think=ThinkSettingsV1(
                provider=ThinkSettingsV1Provider_OpenAi(
                    type="open_ai", model="gpt-4o-mini", temperature=0.7,
                ),
                prompt="You are a helpful assistant. Keep replies brief.",
            ),
            speak=SpeakSettingsV1(
                provider=SpeakSettingsV1Provider_Deepgram(type="deepgram", model="aura-2-asteria-en"),
            ),
        ),
    )

    agent.send_settings(settings)   # MUST be first message after connect

    def on_message(m):
        if isinstance(m, bytes):
            # agent speech audio — play or append to output buffer
            return
        t = getattr(m, "type", "Unknown")
        if t == "ConversationText":
            print(f"[{getattr(m, 'role', '?')}] {getattr(m, 'content', '')}")
        elif t == "UserStartedSpeaking":  print(">> user speaking")
        elif t == "AgentThinking":        print(">> agent thinking")
        elif t == "AgentStartedSpeaking": print(">> agent speaking")
        elif t == "AgentAudioDone":       print(">> agent done")
        # handle_tool_call is your own function; it is not defined in this snippet
        elif t == "FunctionCallRequest":  handle_tool_call(m)

    agent.on(EventType.OPEN,    lambda _: print("open"))
    agent.on(EventType.MESSAGE, on_message)
    agent.on(EventType.CLOSE,   lambda _: print("close"))
    agent.on(EventType.ERROR,   lambda e: print(f"err: {e}"))

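    def send_audio():
        # mic_chunks() is your own generator of raw audio bytes in the
        # encoding and sample rate declared in the Settings above
        for chunk in mic_chunks():
            agent.send_media(chunk)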

    threading.Thread(target=send_audio, daemon=True).start()
    agent.start_listening()   # blocks
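
The quick start calls `mic_chunks()` without defining it. A minimal stand-in, assuming mono linear16 audio at the 24000 Hz declared in the Settings and an arbitrary 20 ms frame length (neither the frame length nor the mono layout comes from the source), might look like this. It yields silence; a real application would read from a microphone instead.

```python
import time

SAMPLE_RATE = 24000      # should match AgentV1SettingsAudioInput.sample_rate
BYTES_PER_SAMPLE = 2     # linear16 = 16-bit signed PCM (mono assumed)
FRAME_MS = 20            # assumed frame length; not mandated by the source


def frame_bytes(sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Number of bytes in one mono linear16 frame."""
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def mic_chunks(total_ms=200, realtime=False):
    """Yield silent linear16 frames as a stand-in for microphone capture."""
    silence = bytes(frame_bytes())
    for _ in range(total_ms // FRAME_MS):
        if realtime:
            time.sleep(FRAME_MS / 1000)  # pace frames like a live source
        yield silence
```

With these values each frame is 960 bytes (480 samples of 2 bytes each).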

Event types (server → client)

Welcome — connection acknowledged
SettingsApplied — your Settings accepted
ConversationText — text of a turn (with role: user or assistant)
UserStartedSpeaking — VAD detected user
AgentThinking — LLM is working
FunctionCallRequest — tool/function call initiated by the model
AgentStartedSpeaking — TTS starting
Binary frames — audio chunks
AgentAudioDone — TTS finished for this turn
Warning, Error
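
One way to keep the handler readable as the number of event types grows is a dispatch table keyed on the `type` string, rather than an if/elif chain. The sketch below is an assumption about how you might structure application code, not an SDK feature; `HANDLERS`, `dispatch`, and `seen` are hypothetical names, and the messages are simulated with `SimpleNamespace` so the example runs without a connection.

```python
from types import SimpleNamespace

seen = []  # collected log lines, so the sketch is easy to inspect

HANDLERS = {
    "ConversationText": lambda m: seen.append(f"[{m.role}] {m.content}"),
    "UserStartedSpeaking": lambda m: seen.append("user speaking"),
    "AgentThinking": lambda m: seen.append("agent thinking"),
    "AgentStartedSpeaking": lambda m: seen.append("agent speaking"),
    "AgentAudioDone": lambda m: seen.append("agent done"),
    "Warning": lambda m: seen.append("warning"),
    "Error": lambda m: seen.append("error"),
}


def dispatch(message, audio_out):
    """Route one server message: bytes are audio, everything else is typed."""
    if isinstance(message, (bytes, bytearray)):
        audio_out.extend(message)
        return "audio"
    kind = getattr(message, "type", "Unknown")
    handler = HANDLERS.get(kind)
    if handler is None:
        return "ignored"  # e.g. Welcome, SettingsApplied, or unknown types
    handler(message)
    return kind


audio = bytearray()
dispatch(SimpleNamespace(type="AgentThinking"), audio)
```

In the quick start, `dispatch` could be called from `on_message` with a shared output buffer.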

Client messages

Initial Settings (send first)
Media (binary audio frames in declared encoding/sample_rate)
KeepAlive (on long sessions)
Prompt / think / speak update messages (change mid-session)
User / assistant text injection
Function call response (reply to FunctionCallRequest)
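
The quick start leaves `handle_tool_call` undefined. The application-side half of it can be sketched without the SDK: look up a local Python function by name, parse the arguments, and produce a string to send back as the function call response. Everything here is hypothetical (`TOOLS`, `run_tool`, `get_weather`), and the assumption that a request carries a function name plus JSON-encoded arguments should be checked against reference.md. Sending the result over the socket is left out because the source does not show that call.

```python
import json


def get_weather(city):
    """Hypothetical tool; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_tool(name, arguments_json):
    """Execute a registered tool and return a JSON string result.

    Errors are returned as JSON too, so the model always gets a reply.
    """
    func = TOOLS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return json.dumps(func(**kwargs))
    except (TypeError, ValueError) as exc:
        # ValueError covers malformed JSON; TypeError covers bad arguments.
        return json.dumps({"error": str(exc)})
```

Returning an error payload instead of raising keeps the turn moving, which matters because the agent waits on the response.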

Reusable agent configurations

You can persist the agent block of a Settings message server-side and reuse it by agent_id. client.voice_agent.configurations.create stores a JSON string representing the agent object only (listen / think / speak providers + prompt) — NOT the full AgentV1Settings payload. Do not send top-level Settings fields like audio to that API; those still go in the live Settings message at connect time. The returned agent_id replaces the inline agent object in future Settings messages. Managed via client.voice_agent.configurations.* — see deepgram-python-management-api.
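
To make the "agent block only" rule concrete, the sketch below takes a full Settings payload written as a plain dict and serializes just its `agent` part. The dict keys mirror the fields used in the quick start, but the exact wire format and the parameters of `client.voice_agent.configurations.create` are not shown in this skill, so treat `agent_config_json` as a hypothetical helper.

```python
import json

full_settings = {
    "audio": {"input": {"encoding": "linear16", "sample_rate": 24000}},
    "agent": {
        "listen": {"provider": {"type": "deepgram", "model": "nova-3"}},
        "think": {
            "provider": {"type": "open_ai", "model": "gpt-4o-mini"},
            "prompt": "You are a helpful assistant. Keep replies brief.",
        },
        "speak": {"provider": {"type": "deepgram", "model": "aura-2-asteria-en"}},
    },
}


def agent_config_json(settings):
    """Serialize only the agent block; top-level fields such as audio are dropped."""
    return json.dumps(settings["agent"], sort_keys=True)
```

The resulting string contains `listen`, `think`, and `speak`, and nothing from `audio`.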

Dynamic mid-session adjustment

You can change agent behavior without disconnecting by sending control messages on the live socket. Each method is available on the agent connection object (agent in the quick-start) for both sync and async clients.

from deepgram.agent.v1.types import (
    AgentV1UpdatePrompt,
    AgentV1UpdateSpeak,
    AgentV1UpdateSpeakSpeak,        # type alias accepting SpeakSettingsV1 or list
    AgentV1UpdateThink,
    AgentV1UpdateThinkThink,        # type alias accepting ThinkSettingsV1 or list
    AgentV1InjectAgentMessage,
    AgentV1InjectUserMessage,
    AgentV1KeepAlive,
)
from deepgram.types.speak_settings_v1 import SpeakSettingsV1
from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram
from deepgram.types.think_settings_v1 import ThinkSettingsV1
from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi

# 1. Swap the LLM system prompt mid-conversation (e.g. escalate to a different persona)
agent.send_update_prompt(
    AgentV1UpdatePrompt(prompt="You are now in expert escalation mode. Be precise and concise.")
)
# Server replies with a `PromptUpdated` event when the new prompt is in effect.

# 2. Swap the TTS voice without reconnecting (e.g. switch language or persona)
agent.send_update_speak(
    AgentV1UpdateSpeak(
        speak=SpeakSettingsV1(
            provider=SpeakSettingsV1Provider_Deepgram(
                type="deepgram", model="aura-2-luna-en",
            ),
        ),
    )
)
# Server replies with a `SpeakUpdated` event.

# 3. Swap the LLM provider/model (e.g. cheaper model for follow-ups)
agent.send_update_think(
    AgentV1UpdateThink(
        think=ThinkSettingsV1(
            provider=ThinkSettingsV1Provider_OpenAi(
                type="open_ai", model="gpt-4o-mini", temperature=0.3,
            ),
            prompt="You are a helpful assistant. Keep replies brief.",
        ),
    )
)
# Server replies with a `ThinkUpdated` event.

# 4. Force the agent to say something specific (without waiting for user audio)
agent.send_inject_agent_message(
    AgentV1InjectAgentMessage(message="Quick reminder: your call is being recorded.")
)
# Useful for proactive prompts, status updates, or scripted segues.

# 5. Inject a user message (e.g. text input from a chat sidebar alongside voice)
agent.send_inject_user_message(
    AgentV1InjectUserMessage(content="Schedule a follow-up for next Tuesday at 2pm.")
)
# Server may reply with `InjectionRefused` if the agent is mid-utterance — retry after `AgentAudioDone`.

# 6. Idle-period keep-alive (no payload required; the SDK fills in the type literal)
agent.send_keep_alive(AgentV1KeepAlive())
# Or simply: agent.send_keep_alive()  — the message arg is optional.

Async client equivalents are identical but await-prefixed:

await agent.send_update_prompt(AgentV1UpdatePrompt(prompt="..."))
await agent.send_inject_agent_message(AgentV1InjectAgentMessage(message="..."))
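
The `InjectionRefused` note above suggests retrying after `AgentAudioDone`. A small state holder for that retry could look like the following sketch. `InjectionRetry` is a hypothetical class; it assumes a refusal always refers to the most recently submitted text, and it takes the send function as a parameter so it can be exercised without a live socket (in real code you would wrap `agent.send_inject_user_message`).

```python
from collections import deque


class InjectionRetry:
    """Re-send refused user injections once the agent stops speaking."""

    def __init__(self, send):
        self._send = send          # callable taking the text to inject
        self._last = None          # most recent text sent, awaiting outcome
        self._pending = deque()    # texts refused and waiting for a retry

    def submit(self, text):
        """Send text now and remember it in case the server refuses it."""
        self._last = text
        self._send(text)

    def on_event(self, event_type):
        """Feed every server event type string through this method."""
        if event_type == "InjectionRefused" and self._last is not None:
            self._pending.append(self._last)
            self._last = None
        elif event_type == "AgentAudioDone" and self._pending:
            # Retry one refused text per completed agent utterance.
            self.submit(self._pending.popleft())
```

Retrying one item per `AgentAudioDone` keeps the bookkeeping simple, at the cost of slower draining if several injections were refused.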

Stream lifecycle & recovery

Continuous voice agents need explicit handling for idle periods, stream pauses, and reconnects.

Pause / idle (no audio for several seconds): stop calling send_media, but emit a KeepAlive every ~5 seconds. Without it, the server closes the socket at ~10 seconds of idle.

import threading

stop = threading.Event()

def keepalive_loop():
    while not stop.is_set():
        if stop.wait(5):
            return
        try:
            agent.send_keep_alive()
        except Exception:
            return  # socket closed; outer loop will reconnect

threading.Thread(target=keepalive_loop, daemon=True).start()
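
The loop above sends a KeepAlive every 5 seconds regardless of traffic. If you would rather send one only while the stream is actually idle, a small timer object can track the last outbound activity. This is a sketch under the assumption that KeepAlive is only needed when no media is flowing, which is how the paragraph above reads; `IdleKeepAlive` is a hypothetical name, and the clock is injectable so the logic can be checked without waiting.

```python
import time


class IdleKeepAlive:
    """Decide when a KeepAlive is due, based on the last outbound activity."""

    def __init__(self, interval=5.0, clock=time.monotonic):
        self.interval = interval
        self._clock = clock
        self._last_sent = clock()

    def note_activity(self):
        """Call after every send_media or send_keep_alive."""
        self._last_sent = self._clock()

    def due(self):
        """True when nothing has been sent for at least `interval` seconds."""
        return self._clock() - self._last_sent >= self.interval
```

A keep-alive thread would then poll `due()` about once a second and call `agent.send_keep_alive()` followed by `note_activity()` when it returns True.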

Resume after pause: just call send_media again. No control message is required — the agent picks up VAD on the next chunk.

Reconnect after disconnect (preserve conversation context): Settings cannot be re-sent on the same closed socket; open a new connection and resend the same Settings. To carry conversation history forward, include it in the new Settings.agent.context.messages so the LLM resumes with prior turns:

from deepgram.agent.v1.types import (
    AgentV1SettingsAgentContext,
    AgentV1SettingsAgentContextMessagesItem,
    AgentV1SettingsAgentContextMessagesItemContent,
    AgentV1SettingsAgentContextMessagesItemContentRole,
)

# Build the new Settings with the captured prior turns
context = AgentV1SettingsAgentContext(
    messages=[
        AgentV1SettingsAgentContextMessagesItem(
            content=AgentV1SettingsAgentContextMessagesItemContent(
                role=AgentV1SettingsAgentContextMessagesItemContentRole.USER,
                content="Hi, I'd like to schedule a meeting.",
            ),
        ),
        AgentV1SettingsAgentContextMessagesItem(
            content=AgentV1SettingsAgentContextMessagesItemContent(
                role=AgentV1SettingsAgentContextMessagesItemContentRole.ASSISTANT,
                content="Sure — what day works best?",
            ),
        ),
    ],
)
new_settings = settings.model_copy(update={"agent": settings.agent.model_copy(update={"context": context})})

# Open a fresh connection and replay
with client.agent.v1.connect() as agent2:
    agent2.send_settings(new_settings)
    # ... same handlers + audio loop as before

The server emits a History message on connect when the SDK has captured prior turns; in Python you receive this as an AgentV1History object (wire type literal: "History"). Persist these turns in your application so a reconnect can rebuild context.messages.
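
Persisting turns can be as simple as recording every `ConversationText` message as it arrives. The sketch below stores plain role/content dicts, which you would later map onto the `AgentV1SettingsAgentContextMessagesItem` objects shown above. `TurnLog` and its `max_turns` cap are hypothetical; the source sets no limit on history length.

```python
from types import SimpleNamespace


class TurnLog:
    """Collect ConversationText turns so a reconnect can rebuild context."""

    def __init__(self, max_turns=50):
        self.max_turns = max_turns   # assumed cap; the source sets no limit
        self.turns = []

    def record(self, message):
        """Store a turn if the message is ConversationText; ignore the rest."""
        if getattr(message, "type", None) != "ConversationText":
            return False
        self.turns.append(
            {"role": getattr(message, "role", ""), "content": getattr(message, "content", "")}
        )
        del self.turns[:-self.max_turns]   # keep only the newest turns
        return True


log = TurnLog(max_turns=2)
log.record(SimpleNamespace(type="ConversationText", role="user", content="Hi"))
```

Calling `log.record(m)` from the message handler is enough; non-text events are ignored.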

Detect disconnects: the EventType.CLOSE handler fires before the with block exits. Catch it and trigger your reconnect logic from there. Check EventType.ERROR payloads for cause (network drop vs server-initiated close vs warning).
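
The reconnect logic itself is left to the application. One common shape is a retry loop with exponential backoff, sketched here. The backoff policy is a choice made for this example and does not come from the source; `run_with_reconnect` and `run_session` are hypothetical names, and raising `ConnectionError` on a dropped socket is an assumption about your own wrapper, not about what the SDK raises.

```python
import time


def run_with_reconnect(run_session, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call run_session() until it returns normally or attempts run out.

    run_session is a caller-supplied function that opens a connection, sends
    Settings, and blocks until the socket closes; it should raise
    ConnectionError on an unexpected disconnect.
    """
    for attempt in range(max_attempts):
        try:
            return run_session()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            sleep(base_delay * 2 ** attempt)   # 0.5s, 1s, 2s, ...
```

Inside `run_session` you would rebuild Settings with the saved `context.messages` before each new connection.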

API reference (layered)

In-repo reference: reference.md — "Agent V1 Connect", "Voice Agent Configurations".
AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7: library ID /llmstxt/developers_deepgram_llms_txt.
Product docs:

Gotchas

Pick the right auth scheme for the credential type. API keys use Authorization: Token <api_key>. Temporary / access tokens (created via client.auth.v1.tokens.grant() or an equivalent server) use Authorization: Bearer <access_token>. The custom DeepgramClient in this repo accepts an access_token parameter and installs a Bearer override for all HTTP + WebSocket calls — see src/deepgram/client.py.
Base URL is agent.deepgram.com, not api.deepgram.com.
Send Settings IMMEDIATELY after connect — no audio before settings are applied.
Listen/speak encoding + sample_rate must match both your input audio and your playback path.
Send KeepAlive messages during long idle periods; otherwise the server closes the socket.
Function call responses are synchronous to the turn — reply promptly.
Provider types are tagged unions (ThinkSettingsV1Provider_OpenAi, SpeakSettingsV1Provider_Deepgram, ...). Pick the right union variant; don't pass raw dicts.
socket_client.py is temporarily frozen (see .fernignore → src/deepgram/agent/v1/socket_client.py) and currently carries _sanitize_numeric_types plus the construct_type / broad-catch fixes — needed for unknown WS message shapes. Expected to be unfrozen during a future Fern regen and re-compared.
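
For the encoding and sample_rate gotcha, a quick pre-flight check on a recorded test file can catch mismatches before they turn into garbled audio. The sketch below uses only the standard library `wave` module to compare a WAV header with the values declared in Settings. `wav_matches` is a hypothetical helper, and the mono default is an assumption; the source only specifies linear16 at 24000 Hz.

```python
import io
import wave


def wav_matches(wav_bytes, sample_rate=24000, sample_width=2, channels=1):
    """True if the WAV header agrees with the declared input format.

    sample_width=2 corresponds to linear16 (16-bit PCM).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        actual = (w.getframerate(), w.getsampwidth(), w.getnchannels())
    return actual == (sample_rate, sample_width, channels)
```

Remember that `send_media` expects raw audio frames in the declared encoding, so this check is for validating source material, not a suggestion to send WAV headers over the socket.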

Example files in this repo

examples/30-voice-agent.py
tests/manual/agent/v1/connect/main.py — live connection test

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
