deepgram-js-speech-to-text

Name: Deepgram Js Speech To Text
Author: deepgram

// Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live audio transcription. Covers `client.listen.v1.media.transcribeUrl` / `transcribeFile` (REST) plus `client.listen.v1.createConnection()` / `connect()` (WebSocket). Use `deepgram-js-audio-intelligence` for summarize/sentiment/topics/diarize overlays, `deepgram-js-conversational-stt` for Flux turn-taking on `/v2/listen`, and `deepgram-js-voice-agent` for full-duplex assistants. Triggers include "transcribe", "speech to text", "STT", "listen.v1", "nova-3", "live transcription", and "websocket transcription".

Stars: 260
Forks: 93
Updated: April 27, 2026 at 09:39

Related skills (same repository)

deepgram-js-audio-intelligence.md

from "deepgram/deepgram-js-sdk"

Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, and entity detection. Same endpoint as plain STT, different params. Covers REST via `client.listen.v1.media.transcribeUrl` / `transcribeFile` and the WebSocket-supported subset on `client.listen.v1.createConnection()` / `connect()`. Use `deepgram-js-speech-to-text` for plain transcription and `deepgram-js-text-intelligence` for analytics on already-transcribed text. Triggers include "audio intelligence", "summarize audio", "diarize", "sentiment from audio", "redact PII", and "detect language audio".

deepgram-js-conversational-stt.md

from "deepgram/deepgram-js-sdk"

Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.createConnection()` / `connect()`, Flux models, and turn events like `TurnInfo`. Use `deepgram-js-speech-to-text` for standard v1 ASR and `deepgram-js-voice-agent` for full-duplex assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", and "listen.v2".

deepgram-js-management-api.md

from "deepgram/deepgram-js-sdk"

Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Management APIs for projects, API keys, members, invites, requests, usage, billing, models, and agent think-model discovery. Covers `client.manage.v1.*` plus `client.agent.v1.settings.think.models.list()`. Use `deepgram-js-voice-agent` when you want to run an agent live rather than administer projects or inspect models. Triggers include "management API", "list projects", "API keys", "members", "invites", "usage stats", "billing", "list models", and "manage.v1".

deepgram-js-text-intelligence.md

from "deepgram/deepgram-js-sdk"

Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for sentiment, summarization, topic detection, and intent recognition on text input. Covers `client.read.v1.text.analyze(...)` with `body: { text }` or `body: { url }`. Use `deepgram-js-audio-intelligence` when the source is audio instead of text. Triggers include "read API", "text intelligence", "analyze text", "sentiment", "summarize text", "topics", "intents", and "read.v1".

deepgram-js-text-to-speech.md

from "deepgram/deepgram-js-sdk"

Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST via `client.speak.v1.audio.generate` and streaming WebSocket via `client.speak.v1.createConnection()` / `connect()`. Use `deepgram-js-voice-agent` when you need full-duplex STT + LLM + TTS instead of one-way synthesis. Triggers include "TTS", "text to speech", "speak", "aura", "streaming TTS", and "speak.v1".

deepgram-js-voice-agent.md

from "deepgram/deepgram-js-sdk"

Use when writing or reviewing JavaScript/TypeScript in this repo that builds an interactive voice agent via `agent.deepgram.com/v1/agent/converse`. Covers `client.agent.v1.createConnection()` / `connect()`, `sendSettings`, `sendMedia`, runtime updates, event handling, and function-call responses. Use `deepgram-js-text-to-speech` for one-way synthesis, `deepgram-js-speech-to-text` or `deepgram-js-conversational-stt` for transcription only, and `deepgram-js-management-api` for project/model admin rather than live agent runtime. Triggers include "voice agent", "agent converse", "full duplex", "barge-in", "function calling", and "agent.v1".

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-js-sdk"

Using Deepgram Speech-to-Text (JavaScript / TypeScript SDK)

Basic transcription for prerecorded audio (REST) or live audio (WebSocket) via `/v1/listen`.

When to use this product

- REST (`client.listen.v1.media.transcribeUrl` / `transcribeFile`): one-shot transcription of a finished URL or file. Good for batch jobs, caption generation, and offline processing.
- WebSocket (`client.listen.v1.createConnection()` / `connect()`): continuous streaming transcription. Good for live captions, microphone audio, telephony streams, and browser or Node realtime apps.

Use a different skill when:

- You also want summaries, topics, intents, sentiment, language detection, or redaction guidance on the same `/v1/listen` call → `deepgram-js-audio-intelligence`.
- You need Flux turn-taking and end-of-turn events on `/v2/listen` → `deepgram-js-conversational-stt`.
- You need a full interactive assistant with STT + LLM + TTS over one socket → `deepgram-js-voice-agent`.

Authentication

require("dotenv").config();

const { DeepgramClient } = require("@deepgram/sdk");

const deepgramClient = new DeepgramClient({
  apiKey: process.env.DEEPGRAM_API_KEY,
});

Use the exported `DeepgramClient` from `src/CustomClient.ts`, not `DefaultDeepgramClient`. The wrapper adds the required `Token` auth prefix, session headers, and patched WebSocket behavior.

Quick start — REST (prerecorded URL)

From examples/04-transcription-prerecorded-url.ts:

const data = await deepgramClient.listen.v1.media.transcribeUrl({
  url: "https://dpgr.am/spacewalk.wav",
  model: "nova-3",
  language: "en",
  punctuate: true,
  paragraphs: true,
  utterances: true,
});

console.log(
  "Transcription:",
  data.results?.channels?.[0]?.alternatives?.[0]?.transcript,
);
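The optional chaining in the quick start guards against a missing channel or alternative. As a minimal sketch, the same lookup can be wrapped in a reusable helper. `transcriptsByChannel` and `firstTranscript` are hypothetical names, not part of the SDK, and they assume only the `results.channels[].alternatives[].transcript` shape used above.

```javascript
// Hypothetical helpers (not part of @deepgram/sdk). They return the first
// alternative's transcript per channel, or "" when any level is missing.
function transcriptsByChannel(response) {
  const channels = response?.results?.channels ?? [];
  return channels.map((channel) => channel?.alternatives?.[0]?.transcript ?? "");
}

// Convenience wrapper matching the quick start: channel 0 only.
function firstTranscript(response) {
  return transcriptsByChannel(response)[0] ?? "";
}

// Example with a hand-written object shaped like the response above.
const sample = {
  results: {
    channels: [{ alternatives: [{ transcript: "hello world" }] }],
  },
};
console.log(firstTranscript(sample)); // prints "hello world"
console.log(firstTranscript({}) === ""); // prints true
```

Returning an empty string rather than `undefined` keeps downstream string handling simple; callers that need to tell "no speech" apart from "malformed response" would want a stricter check.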

Quick start — REST (prerecorded file)

From examples/05-transcription-prerecorded-file.ts:

const { createReadStream } = require("fs");

const data = await deepgramClient.listen.v1.media.transcribeFile(
  createReadStream("./examples/spacewalk.wav"),
  {
    model: "nova-3",
    language: "en",
    punctuate: true,
    paragraphs: true,
    utterances: true,
    smart_format: true,
  }
);

`transcribeFile(...)` accepts multiple upload shapes in this SDK: `fs.ReadStream`, `Buffer`, `ReadableStream`, `Blob`, `File`, `ArrayBuffer`, and `Uint8Array` (see `examples/23-file-upload-types.ts`).

Quick start — WebSocket (live streaming)

From examples/07-transcription-live-websocket.ts:

const deepgramConnection = await deepgramClient.listen.v1.createConnection({
  model: "nova-3",
  language: "en",
  punctuate: "true",
  interim_results: "true",
});

deepgramConnection.on("message", (data) => {
  if (data.type === "Results") {
    console.log("Transcript:", data);
  }
});

deepgramConnection.connect();
await deepgramConnection.waitForOpen();

// Swap this for a mic capture (e.g. `node-microphone` / `MediaRecorder`)
// in real apps; the repo examples use `createReadStream` over a sample WAV.
const { createReadStream } = require("node:fs");
const audioStream = createReadStream("samples/spacewalk.wav");

audioStream.on("data", (chunk) => {
  deepgramConnection.sendMedia(chunk);
});

audioStream.on("end", () => {
  deepgramConnection.sendFinalize({ type: "Finalize" });
});

The repo examples use the two-step socket flow: `createConnection()` → register handlers → `connect()` → `waitForOpen()`.
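The quick start logs every `Results` message whole. With `interim_results` enabled, provisional text is later superseded by a finalized version, so an app usually wants to keep the two apart. The following is a sketch only: `createTranscriptCollector` is a hypothetical helper, and the `is_final` and `channel.alternatives[0].transcript` field names are assumptions taken from Deepgram's streaming response format rather than from this document, so verify them against the streaming reference.

```javascript
// Hypothetical helper (not part of @deepgram/sdk). Assumes each "Results"
// message carries `is_final` and `channel.alternatives[0].transcript`.
function createTranscriptCollector() {
  const finals = []; // finalized segments, in arrival order
  let interim = ""; // latest provisional text for the current segment

  return {
    handleMessage(message) {
      if (message?.type !== "Results") return; // ignore Metadata, UtteranceEnd, ...
      const text = message.channel?.alternatives?.[0]?.transcript ?? "";
      if (message.is_final) {
        if (text) finals.push(text);
        interim = "";
      } else {
        interim = text;
      }
    },
    // Finalized text plus whatever is still provisional.
    current() {
      return [...finals, interim].filter(Boolean).join(" ");
    },
  };
}

// Usage with hand-written messages; in an app this would be wired up as
// deepgramConnection.on("message", (data) => collector.handleMessage(data)).
const collector = createTranscriptCollector();
collector.handleMessage({
  type: "Results",
  is_final: false,
  channel: { alternatives: [{ transcript: "hello" }] },
});
collector.handleMessage({
  type: "Results",
  is_final: true,
  channel: { alternatives: [{ transcript: "hello there" }] },
});
collector.handleMessage({ type: "Metadata" });
console.log(collector.current()); // prints "hello there"
```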

Key parameters / API surface

- REST: `model`, `language`, `punctuate`, `smart_format`, `paragraphs`, `utterances`, `multichannel`, `numerals`, `search`, `keyterm`, `keywords`, `encoding`, `sample_rate`, `callback`, `tag`.
- WSS connect args (`src/api/resources/listen/resources/v1/client/Client.ts`): `model` is required; common realtime flags include `language`, `interim_results`, `endpointing`, `utterance_end_ms`, `vad_events`, `encoding`, `sample_rate`, `multichannel`, `punctuate`, `smart_format`.
- WSS client messages (`src/api/resources/listen/resources/v1/client/Socket.ts`): `sendMedia(...)`, `sendFinalize(...)`, `sendCloseStream(...)`, `sendKeepAlive(...)`.
- WSS server events: `Results`, `Metadata`, `UtteranceEnd`, `SpeechStarted`.
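The `encoding` and `sample_rate` arguments describe raw audio that carries no header of its own, so it can help to check chunk sizes against them. The arithmetic below is a sketch under one stated assumption: `linear16` is 16-bit PCM, which means 2 bytes per sample. The function names are illustrative and not part of the SDK.

```javascript
// Assumption: `linear16` is 16-bit signed PCM, i.e. 2 bytes per sample.
const BYTES_PER_SAMPLE_LINEAR16 = 2;

// Bytes needed to hold `durationMs` of linear16 audio.
function linear16ChunkBytes(sampleRate, channels, durationMs) {
  return Math.round(
    (sampleRate * channels * BYTES_PER_SAMPLE_LINEAR16 * durationMs) / 1000
  );
}

// Inverse: how many milliseconds of audio `byteLength` bytes represent.
function linear16DurationMs(byteLength, sampleRate, channels) {
  return (byteLength * 1000) / (sampleRate * channels * BYTES_PER_SAMPLE_LINEAR16);
}

// 100 ms of 16 kHz mono audio:
console.log(linear16ChunkBytes(16000, 1, 100)); // prints 3200

// The same 3200 bytes declared as 8 kHz would be read as twice the duration:
console.log(linear16DurationMs(3200, 8000, 1)); // prints 200
```

The second call illustrates why a mismatched `sample_rate` matters: the bytes are unchanged, but the declared rate changes how much time they are taken to represent.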

API reference (layered)

- In-repo reference: `reference.md` → Listen V1 Media for REST; WSS behavior lives in `src/CustomClient.ts` and `src/api/resources/listen/resources/v1/client/{Client,Socket}.ts`.
- Canonical OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
- Canonical AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
- Context7: library ID `/llmstxt/developers_deepgram_llms_txt`
- Product docs:
  - https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded
  - https://developers.deepgram.com/reference/speech-to-text/listen-streaming

Gotchas

- Use `DeepgramClient`, not `DefaultDeepgramClient`. The custom wrapper adds `Token` auth, session IDs, browser WS auth protocols, and patched sockets.
- Repo examples are two-stage for WSS. `createConnection()` does not open the socket; call `connect()` and usually `waitForOpen()`.
- Finalize before closing v1 streams. `sendFinalize({ type: "Finalize" })` flushes the final partial.
- Keep idle streams alive. Use audio or `sendKeepAlive({ type: "KeepAlive" })` on long pauses.
- Raw audio metadata must match reality. If you send PCM, `encoding` and `sample_rate` must match the bytes.
- Browser auth differs from Node auth. In browsers, the wrapper moves auth/session info into WebSocket subprotocols because custom headers are unavailable.
- Use `/v2/listen` only for Flux. If you need turn-aware conversational STT, switch skills instead of forcing v1.
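The keep-alive advice above can be sketched as a small wrapper that tracks when audio was last sent and emits `KeepAlive` only after a quiet period. `createKeepAlive` is a hypothetical helper, not part of the SDK. The 5-second `idleMs` default is an illustrative value, not a documented threshold, so check the streaming docs for the actual idle timeout. The wrapper relies only on the `sendMedia` and `sendKeepAlive` methods named in this document.

```javascript
// Hypothetical helper (not part of @deepgram/sdk).
// `idleMs` is an illustrative default, not a documented Deepgram value.
function createKeepAlive(connection, { idleMs = 5000, now = Date.now } = {}) {
  let lastSentAt = now();
  let timer = null;

  return {
    // Send audio through this wrapper so idle time is tracked.
    sendMedia(chunk) {
      lastSentAt = now();
      connection.sendMedia(chunk);
    },
    // Sends a KeepAlive only if nothing has been sent for `idleMs`.
    tick() {
      if (now() - lastSentAt < idleMs) return false;
      connection.sendKeepAlive({ type: "KeepAlive" });
      lastSentAt = now();
      return true;
    },
    // Periodically checks for idleness until stop() is called.
    start(checkEveryMs = 1000) {
      timer = setInterval(() => this.tick(), checkEveryMs);
    },
    stop() {
      clearInterval(timer);
      timer = null;
    },
  };
}

// Usage with a fake connection and a fake clock, so no network is needed.
let clock = 0;
const sent = [];
const fakeConnection = {
  sendMedia: (chunk) => sent.push(["media", chunk.length]),
  sendKeepAlive: (msg) => sent.push(["keepalive", msg.type]),
};
const keepAlive = createKeepAlive(fakeConnection, { idleMs: 5000, now: () => clock });
keepAlive.sendMedia(Buffer.alloc(4));
clock = 3000;
console.log(keepAlive.tick()); // prints false (only 3 s idle)
clock = 6000;
console.log(keepAlive.tick()); // prints true (KeepAlive sent)
```

In a real app, `start()` would run after `waitForOpen()` resolves and `stop()` before `sendFinalize({ type: "Finalize" })`, so the timer does not outlive the stream.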

Example files in this repo

- `examples/04-transcription-prerecorded-url.ts`
- `examples/05-transcription-prerecorded-file.ts`
- `examples/06-transcription-prerecorded-callback.ts`
- `examples/07-transcription-live-websocket.ts`
- `examples/08-transcription-captions.ts`
- `examples/23-file-upload-types.ts`
- `examples/27-deepgram-session-header.ts`

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
