Name: Api
Author: deepgram

// Deepgram API reference for speech-to-text, text-to-speech, voice agents, audio intelligence, and account management. Use whenever building with Deepgram APIs — REST or WebSocket. Covers authentication, all endpoints, query parameters, request/response schemas, and WebSocket message formats. Reference files are organized by domain: listen (STT), speak (TTS), agent (voice agents), read (text/audio intelligence), models, projects, auth, and self-hosted.


stars:8

forks:1

updated:April 24, 2026 at 14:58


related-skills.json

same repository

docs.md

from "deepgram/skills"

Find the right Deepgram documentation for any task. Use whenever someone needs help locating docs, understanding which API to use, or wants to ask questions about Deepgram. Covers all product areas: speech-to-text, text-to-speech, voice agents, audio intelligence, and self-hosted deployments.

2026-04-24, stars: 8

examples.md

from "deepgram/skills"

Find working Deepgram integration examples with third-party platforms and frameworks. Use whenever someone wants to integrate Deepgram with Twilio, LiveKit, LangChain, Vercel AI SDK, Discord, Vonage, Pipecat, Expo, FastAPI, Cloudflare Workers, Slack, Telegram, LlamaIndex, Zoom, Next.js, Nuxt, Django, SvelteKit, NestJS, Spring Boot, CrewAI, Riverside, SignalWire, and more. Examples are full runnable integration demos, not minimal feature snippets.

2026-04-24, stars: 8

recipes.md

from "deepgram/skills"

Find focused, runnable Deepgram recipes for a specific feature × language. Use whenever someone wants a minimal working code snippet for ONE feature (transcribe URL, diarize, smart-format, voice agent connect, etc.) rather than a full starter app. Recipes are under 50 lines, read DEEPGRAM_API_KEY from env, and ship with a runnable example_test. Covers Python, JavaScript, Go, .NET, Java, Rust, and the Deepgram CLI.

2026-04-24, stars: 8

starters.md

from "deepgram/skills"

Clone a ready-to-run Deepgram demo app and start building on top of it. Use whenever someone wants a quick working demo, needs to prototype with Deepgram, or is starting a new project that uses speech-to-text, text-to-speech, voice agents, audio intelligence, or live streaming. Match the user's language, framework, and desired Deepgram feature to the right starter.

2026-04-24, stars: 8

template-skill.md

from "deepgram/skills"

Replace with description of the skill and when to use it.

2026-04-24, stars: 8

setup-mcp.md

from "deepgram/skills"

Set up the Deepgram MCP server for your AI coding tool. Checks whether the Deepgram CLI (dg/deepctl) is installed: if so, uses the local CLI MCP server (dg mcp) for full tool access; otherwise offers the hosted documentation MCP or suggests installing the CLI. Use whenever someone wants to install Deepgram's agentic tools, set up the MCP server, or connect their editor to Deepgram.

2026-03-31, stars: 8

package.json

"author": "deepgram"

"repository": "deepgram/skills"



Deepgram API

Build with Deepgram's speech-to-text, text-to-speech, voice agent, and audio intelligence APIs.

Getting Started

All API requests require authentication via API key or JWT:

API Key: Authorization: Token <API_KEY>
JWT: Authorization: Bearer <JWT>
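
The two header schemes above can be sketched as a small helper. This is a minimal illustration, not an official client: the function name `auth_header` and the `kind` labels are hypothetical, and `DEEPGRAM_API_KEY` is the environment variable name that the recipes skill reads.

```python
import os


def auth_header(credential: str, kind: str = "api_key") -> dict[str, str]:
    """Return the Authorization header for a Deepgram request.

    kind="api_key" uses the "Token" scheme; kind="jwt" uses "Bearer".
    (The kind labels are hypothetical names chosen for this sketch.)
    """
    if not credential:
        raise ValueError("credential must be a non-empty string")
    schemes = {"api_key": "Token", "jwt": "Bearer"}
    if kind not in schemes:
        raise ValueError(f"unknown credential kind: {kind!r}")
    return {"Authorization": f"{schemes[kind]} {credential}"}


if __name__ == "__main__":
    # Falls back to a placeholder so the sketch runs without a real key.
    key = os.environ.get("DEEPGRAM_API_KEY", "YOUR_API_KEY")
    print(auth_header(key))
```

Keeping the scheme lookup in one place makes it harder to send an API key with `Bearer` or a JWT with `Token` by accident.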

Base servers:

REST: https://api.deepgram.com
STT/TTS WebSocket: wss://api.deepgram.com
Voice Agent WebSocket: wss://agent.deepgram.com

How Deepgram's APIs Fit Together

                   ┌──────────────────────────────┐
                   │       api.deepgram.com        │
                   └──────────────────────────────┘
                                │
  ┌──────────────┬──────────────┼──────────────┬──────────────┐
  ▼              ▼              ▼              ▼              ▼
/v1/listen   /v2/listen     /v1/speak      /v1/read    /v1/projects/*
 Nova — ASR   Flux — conv.   TTS            Text AI     Management
REST or WSS   WSS only       REST or WSS    REST only   REST only

                   ┌──────────────────────────────┐
                   │      agent.deepgram.com       │
                   └──────────────────────────────┘
                                │
                                ▼
                   /v1/agent/converse
                   WebSocket only
                   audio ──▶ STT ──▶ LLM ──▶ TTS ──▶ audio
                   (Deepgram orchestrates the full pipeline)

Which API Should I Use?

Audio → text (transcription)?
├─ General-purpose transcription (captions, batch, call logs, live streams with custom turn logic)
│  └─ Nova models via /v1/listen
│     ├─ Pre-recorded file    →  REST  POST https://api.deepgram.com/v1/listen?model=nova-3
│     └─ Live stream          →  WSS   wss://api.deepgram.com/v1/listen?model=nova-3
│
└─ Conversational audio / voice-agent-style turn detection
   └─ Flux models via /v2/listen
      └─ Live stream          →  WSS   wss://api.deepgram.com/v2/listen?model=flux-general-en

Text → audio?
├─ One-shot                   →  REST POST /v1/speak
└─ Low-latency stream         →  WSS  wss://api.deepgram.com/v1/speak

Full conversational voice agent (audio in, audio out)?
└─ WSS wss://agent.deepgram.com/v1/agent/converse
   Deepgram handles STT + your configured LLM + TTS internally

Analyze text for insights?
└─ REST POST /v1/read
   (summaries, sentiment, topics, intents)
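
The decision tree above can be expressed as a lookup table. The sketch below is one possible encoding: the use-case labels (`transcribe`, `conversational`, and so on) and the function name `endpoint_for` are hypothetical, while the hosts, paths, and default models are the ones listed in the tree.

```python
from urllib.parse import urlencode

API_HOST = "api.deepgram.com"
AGENT_HOST = "agent.deepgram.com"

# (use case, transport) -> (scheme, host, path, default model or None)
# The use-case labels are hypothetical names for the branches of the tree.
ROUTES = {
    ("transcribe", "rest"): ("https", API_HOST, "/v1/listen", "nova-3"),
    ("transcribe", "wss"): ("wss", API_HOST, "/v1/listen", "nova-3"),
    ("conversational", "wss"): ("wss", API_HOST, "/v2/listen", "flux-general-en"),
    ("speak", "rest"): ("https", API_HOST, "/v1/speak", None),
    ("speak", "wss"): ("wss", API_HOST, "/v1/speak", None),
    ("agent", "wss"): ("wss", AGENT_HOST, "/v1/agent/converse", None),
    ("read", "rest"): ("https", API_HOST, "/v1/read", None),
}


def endpoint_for(use_case: str, transport: str) -> str:
    """Return the URL for a use case, or raise if that transport is not offered."""
    try:
        scheme, host, path, model = ROUTES[(use_case, transport)]
    except KeyError:
        raise ValueError(f"{use_case!r} is not available over {transport!r}") from None
    query = f"?{urlencode({'model': model})}" if model else ""
    return f"{scheme}://{host}{path}{query}"
```

Combinations the tree does not list, such as Flux over REST, fail immediately instead of producing a URL that the server would reject.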

Speech-to-Text: Nova (`/v1/listen`) vs Flux (`/v2/listen`)

Both model families are actively maintained. They solve different problems, so pick the one that matches your use case.

| | Nova (`/v1/listen`) | Flux (`/v2/listen`) |
|---|---|---|
| Endpoint | `/v1/listen` | `/v2/listen` |
| Available models | `nova-3`, `nova-2`, `nova`, `enhanced`, `base` | `flux-general-en` |
| Best for | General transcription: captions, subtitles, call logs, batch | Conversational audio: voice agents, interactive assistants, turn-taking UIs |
| Output | Continuous transcript stream | Structured turn events + transcripts (built-in turn state machine) |
| Turn detection | Manual (`utterance_end_ms`, VAD events) | Built-in (EOT, eager-EOT, turn_index) |
| Transports | REST + WebSocket | WebSocket only |
| Intelligence overlays | Yes: `summarize`, `sentiment`, `topics`, `intents`, `diarize`, `redact`, etc. | No: smaller focused param set; no `smart_format` / `diarize` / `punctuate` |
| Mid-session reconfig | No (reconnect to change) | Yes (`Configure` message updates EOT thresholds + keyterms live) |

Pick Nova (/v1/listen, model=nova-3) when:

Generating captions, subtitles, or transcripts for recorded media
Running batch transcription over files (REST)
You need analytics overlays (summarize, sentiment, topics, intents, diarize, redact)
You want WebSocket streaming with your own turn-detection logic

Pick Flux (/v2/listen, model=flux-general-en) when:

Building an interactive voice agent or assistant
You want end-of-turn detection handled for you
You need low-latency turn signals and barge-in support
You want to update EOT thresholds or keyterms mid-session without reconnecting

Migrating from Nova 3 to Flux? See the official Nova 3 → Flux migration guide.

API Domains

| Domain | REST | WebSocket | Reference |
|---|---|---|---|
| Listen v1 (STT, Nova models) | `POST /v1/listen` | `wss://api.deepgram.com/v1/listen` | listen.md |
| Listen v2 (STT, Flux conversational) | n/a | `wss://api.deepgram.com/v2/listen` | listen.md |
| Speak (TTS) | `POST /v1/speak` | `wss://api.deepgram.com/v1/speak` | speak.md |
| Voice Agent | `GET /v1/agent/settings/think/models` | `wss://agent.deepgram.com/v1/agent/converse` | agent.md |
| Read (Intelligence) | `POST /v1/read` | n/a | read.md |
| Models | `GET /v1/models` | n/a | models.md |
| Projects | `/v1/projects/*` | n/a | projects.md |
| Auth | `POST /v1/auth/grant` | n/a | auth.md |
| Self-Hosted | `/v1/projects//selfhosted/` | n/a | self-hosted.md |

Common Mistakes to Avoid

All APIs

Feature flags are query params, with two exceptions. For /v1/listen, /v2/listen, and /v1/speak, initial options go on the URL; the request body or WebSocket frames carry only the payload itself (audio for listen, text for speak). The exceptions: /v1/agent/converse takes no URL query params at all (all config goes in the Settings message), and /v2/listen accepts a Configure message after connection to update EOT thresholds and keyterms mid-session. Note also that /v2/listen has a much smaller param set than /v1/listen: flags like smart_format, diarize, and punctuate are not available.
Rate limits are concurrent connections, not total requests. A 429 means too many simultaneous open connections, not too high a request volume. Diarization and other compute-heavy features reduce your concurrency allowance further.
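
To illustrate the query-param rule, here is a minimal sketch of a URL builder that puts options on the URL and rejects the flags named above as unavailable on /v2/listen. The function name `listen_url` is hypothetical, and serializing booleans as lowercase `true`/`false` is an assumption of this sketch.

```python
from urllib.parse import urlencode

# Flags the text above names as unavailable on /v2/listen.
FLUX_UNSUPPORTED = {"smart_format", "diarize", "punctuate"}


def listen_url(version: int, model: str, **flags) -> str:
    """Build a WebSocket URL for /v1/listen or /v2/listen with options as query params."""
    if version not in (1, 2):
        raise ValueError("version must be 1 or 2")
    if version == 2:
        blocked = FLUX_UNSUPPORTED.intersection(flags)
        if blocked:
            raise ValueError(f"not available on /v2/listen: {sorted(blocked)}")
    params = {"model": model}
    for name, value in flags.items():
        # Assumption: booleans are sent as lowercase "true"/"false".
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"wss://api.deepgram.com/v{version}/listen?{urlencode(params)}"
```

Nothing here goes into a request body, which matches the rule that the body is reserved for the payload.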

STT WebSocket (`/v1/listen`)

Send KeepAlive as a text frame, not binary. The connection closes after 10 seconds of no audio. Send {"type":"KeepAlive"} as a text (JSON) frame every 3–5 seconds during silence. A KeepAlive sent as a binary frame is not a silent no-op: it disrupts the audio pipeline and causes transcription delays.
Never send empty byte payloads. Sending a zero-length binary frame to /v1/listen is treated as a close — it terminates the connection. Always check that your audio packet has length before sending.
encoding must match the actual audio format. If encoding=linear16 but you're sending opus, you'll get a DATA-0000 error or garbled output. Omit encoding entirely when sending containerized formats (mp3, wav, ogg) — Deepgram detects them automatically.
Timestamps reset on reconnect. Each new WebSocket connection restarts timestamps at 00:00:00. For real-time apps, maintain a timestamp offset across reconnections or you'll silently corrupt your transcript timeline.
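
Three of the points above (text-frame KeepAlive, no empty payloads, timestamp offsets) can be sketched without any network code. All names below are hypothetical. The offset tracker assumes each transcript result exposes a connection-relative start time and duration in seconds, and it approximates elapsed audio by the latest end time seen, so audio sent after the last result is not counted.

```python
import json


def keepalive_frame() -> str:
    """KeepAlive payload; returned as str so it is sent as a text frame."""
    return json.dumps({"type": "KeepAlive"})


def is_sendable(chunk: bytes) -> bool:
    """Zero-length binary frames close the connection, so filter them out."""
    return len(chunk) > 0


class TimelineOffset:
    """Keeps transcript timestamps increasing across reconnects."""

    def __init__(self) -> None:
        self.offset = 0.0    # seconds covered by earlier connections
        self.last_end = 0.0  # latest end time seen on the current connection

    def absolute(self, start: float, duration: float) -> tuple[float, float]:
        """Map connection-relative (start, duration) to session-absolute (start, end)."""
        self.last_end = max(self.last_end, start + duration)
        return self.offset + start, self.offset + start + duration

    def on_reconnect(self) -> None:
        """Call after opening a new connection, whose timestamps restart at zero."""
        self.offset += self.last_end
        self.last_end = 0.0
```

With most WebSocket clients, passing a `str` sends a text frame and passing `bytes` sends a binary frame, which is why `keepalive_frame` returns a string rather than encoded bytes.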

TTS WebSocket (`/v1/speak`)

Don't send empty text. A Speak message with an empty text field returns a 400 error. Always validate input before sending.
Character rate limiting (DATA-0001) means slow down, not retry. If you hit this, reduce how fast you're submitting text chunks — don't immediately retry or you'll compound the problem.
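
Both TTS points can be handled before anything reaches the socket. The sketch below is illustrative only: the `{"type": "Speak", "text": ...}` shape is assumed from the message name used above, rejecting whitespace-only text is a conservative choice of this sketch, and `chars_per_second` is a hypothetical budget, since no actual limit is stated here.

```python
import json


def speak_message(text: str) -> str:
    """Serialize a Speak message, refusing empty or whitespace-only text."""
    if not text or not text.strip():
        raise ValueError("text must be non-empty")
    return json.dumps({"type": "Speak", "text": text})


def paced_delays(chunks: list[str], chars_per_second: float) -> list[float]:
    """Seconds to wait after each chunk so the average send rate stays within budget."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return [len(chunk) / chars_per_second for chunk in chunks]
```

If a DATA-0001 error still arrives, lowering `chars_per_second` follows the advice above to slow down rather than retry immediately.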

Voice Agent (`/v1/agent/converse`)

Send the Settings message before any audio. The agent ignores everything until it receives and acknowledges the Settings configuration. Message ordering is strictly required.
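
One way to enforce this ordering is a small guard object that refuses audio until the configuration has been acknowledged. This is a sketch under assumptions: the class and method names are hypothetical, the transport is represented by two caller-supplied send functions, and the acknowledgement type `SettingsApplied` is an assumed name that should be checked against agent.md.

```python
import json
from typing import Callable


class AgentSession:
    """Enforces Settings -> acknowledgement -> audio ordering on one connection."""

    ACK_TYPE = "SettingsApplied"  # assumed name of the server acknowledgement

    def __init__(
        self,
        send_text: Callable[[str], None],
        send_bytes: Callable[[bytes], None],
    ) -> None:
        self._send_text = send_text
        self._send_bytes = send_bytes
        self._settings_sent = False
        self.ready = False

    def send_settings(self, settings: dict) -> None:
        """Send the Settings message as a text frame."""
        self._send_text(json.dumps({"type": "Settings", **settings}))
        self._settings_sent = True

    def on_server_message(self, raw: str) -> None:
        """Mark the session ready once the server acknowledges the Settings."""
        if self._settings_sent and json.loads(raw).get("type") == self.ACK_TYPE:
            self.ready = True

    def send_audio(self, chunk: bytes) -> None:
        """Forward audio only after acknowledgement; skip empty chunks."""
        if not self.ready:
            raise RuntimeError("Settings must be sent and acknowledged before audio")
        if chunk:
            self._send_bytes(chunk)
```

Raising instead of silently buffering makes an ordering bug visible during development; a production client might queue the audio instead.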

Flux model

Use /v2/listen and model=flux-general-en. /v1/listen does not support Flux. model=flux alone is not a valid value. Do not include language or encoding params for containerized audio.
Use Configure to update EOT thresholds and keyterms mid-session. Unlike /v1/listen, Flux supports live reconfiguration after connection — no need to reconnect to change turn detection sensitivity or boost new keyterms:
```
{ "type": "Configure", "thresholds": { "eot_threshold": "0.8", "eot_timeout_ms": "3000" }, "keyterms": ["Deepgram"] }
```
The server responds with ConfigureSuccess (echoing back applied values) or ConfigureFailure. Omitted threshold fields keep their current values.
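
A minimal sketch of building a Configure message that includes only the fields being changed, plus a check on the response type. The function names are hypothetical. Threshold values are serialized as strings to mirror the example above; whether bare numbers are also accepted, and whether a message without `thresholds` is valid, is not stated here.

```python
import json
from typing import Optional, Sequence


def configure_message(
    eot_threshold: Optional[float] = None,
    eot_timeout_ms: Optional[int] = None,
    keyterms: Optional[Sequence[str]] = None,
) -> str:
    """Build a Flux Configure message containing only the fields being changed."""
    thresholds = {}
    if eot_threshold is not None:
        thresholds["eot_threshold"] = str(eot_threshold)
    if eot_timeout_ms is not None:
        thresholds["eot_timeout_ms"] = str(eot_timeout_ms)
    message: dict = {"type": "Configure"}
    if thresholds:
        message["thresholds"] = thresholds
    if keyterms is not None:
        message["keyterms"] = list(keyterms)
    return json.dumps(message)


def configure_succeeded(raw: str) -> bool:
    """True for ConfigureSuccess, False for ConfigureFailure, error otherwise."""
    kind = json.loads(raw).get("type")
    if kind == "ConfigureSuccess":
        return True
    if kind == "ConfigureFailure":
        return False
    raise ValueError(f"not a Configure response: {kind!r}")
```

For example, `configure_message(eot_timeout_ms=5000)` leaves `eot_threshold` out of the message, relying on the rule that omitted threshold fields keep their current values.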

Authentication

JWT TTL applies only to the initial handshake. Tokens default to a 30-second TTL. Once the WebSocket connection is established, token expiry does not close it; the token is only needed for the upgrade request.
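
Because the token matters only at connect time, a client needs to check freshness just before each new WebSocket upgrade, not during the session. The sketch below assumes this pattern; the class, the 5-second safety margin, and the caller-supplied `fetch_token` callable (which would request a fresh token, for example via POST /v1/auth/grant) are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class ShortLivedToken:
    value: str
    issued_at: float           # time.monotonic() reading when the token was granted
    ttl_seconds: float = 30.0  # default TTL stated above

    def usable_for_handshake(self, now: float, margin: float = 5.0) -> bool:
        """True if an upgrade request started now should finish before expiry."""
        return now + margin < self.issued_at + self.ttl_seconds


def token_for_connect(current, fetch_token, now=None):
    """Return a token safe to use for a new WebSocket upgrade, refreshing if needed.

    fetch_token is a hypothetical caller-supplied callable returning a
    ShortLivedToken.
    """
    now = time.monotonic() if now is None else now
    if current is None or not current.usable_for_handshake(now):
        return fetch_token()
    return current
```

An already open connection never calls `token_for_connect`, so an expired token does not trigger needless refreshes mid-session.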

SDK-Specific Skills

This api skill covers the product contracts (endpoints, query params, message shapes) that are identical across SDKs. For language-idiomatic code — imports, async patterns, builder APIs, common errors — install the SDK-specific skills. Each Deepgram SDK publishes 7 product skills named deepgram-{lang}-{product} (e.g. deepgram-python-speech-to-text, deepgram-js-voice-agent) plus a maintainer skill deepgram-{lang}-maintaining-sdk. The deepgram-{lang}- prefix avoids collisions when you install skills from multiple SDKs.

# Install all skills from a specific SDK
npx skills add deepgram/deepgram-python-sdk     # Python
npx skills add deepgram/deepgram-js-sdk         # JavaScript / TypeScript
npx skills add deepgram/deepgram-java-sdk       # Java
npx skills add deepgram/deepgram-go-sdk         # Go
npx skills add deepgram/deepgram-rust-sdk       # Rust
npx skills add deepgram/deepgram-swift-sdk      # Swift
npx skills add deepgram/deepgram-kotlin-sdk     # Kotlin
npx skills add deepgram/deepgram-dotnet-sdk     # C# / .NET
npx skills add deepgram/deepgram-browser-sdk    # Browser TypeScript

# Or install a specific product skill from one SDK (note the deepgram-{lang}- prefix)
npx skills add deepgram/deepgram-python-sdk --skill deepgram-python-speech-to-text
npx skills add deepgram/deepgram-js-sdk     --skill deepgram-js-voice-agent

Related Deepgram skills

| Skill | Purpose |
|---|---|
| `recipes` | Minimal runnable snippets per feature per language |
| `examples` | Full integration examples with third-party platforms (Twilio, LiveKit, etc.) |
| `starters` | Runnable starter apps (framework × feature matrix) |
| `docs` | Navigate Deepgram documentation |
| `setup-mcp` | Install the Deepgram MCP server |
