$pwd:

deepgram-java-speech-to-text

Name: Deepgram Java Speech To Text
Author: deepgram

// Use when writing or reviewing Java code in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live transcription. Covers `client.listen().v1().media().transcribeUrl` / `transcribeFile` (REST) and `client.listen().v1().v1WebSocket()` (WebSocket). Use `deepgram-java-audio-intelligence` for analytics overlays, `deepgram-java-conversational-stt` for Flux `/v2/listen`, and `deepgram-java-voice-agent` for full-duplex assistants. Triggers include "transcribe", "speech to text", "STT", "listen v1", "nova-3", "live transcription", and "websocket transcription".

$ git log --oneline --stat

stars:4

forks:3

updated:April 24, 2026 at 15:45

SKILL.md

readonly

related-skills.json

same repository

deepgram-java-voice-agent.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that builds an interactive voice agent over `agent.deepgram.com/v1/agent/converse`. Covers `client.agent().v1().v1WebSocket()`, `AgentV1Settings`, `sendSettings`, `sendMedia`, event handlers, provider configuration, and message injection. Use `deepgram-java-text-to-speech` for one-way synthesis or the STT skills for transcription-only flows. Triggers include "voice agent", "agent converse", "full duplex", "barge in", "function call", and "agent websocket".

2026-04-244

deepgram-java-management-api.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Management APIs for projects, project models, API keys, members, invites, usage, and billing. Covers `client.manage().v1().*` plus related think-model discovery under `client.agent().v1().settings().think().models()`. Use `deepgram-java-voice-agent` for live agent conversations instead of admin APIs. Triggers include "management api", "list projects", "api keys", "members", "invites", "usage", "billing", and "models".

2026-04-244

deepgram-java-text-to-speech.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST via `client.speak().v1().audio().generate(...)` and streaming synthesis via `client.speak().v1().v1WebSocket()`. Use `deepgram-java-voice-agent` for full-duplex assistants instead of one-way synthesis. Triggers include "tts", "text to speech", "speak", "aura", "streaming tts", and "speak websocket".

2026-04-244

deepgram-java-audio-intelligence.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that enables Deepgram intelligence overlays on `/v1/listen` audio transcription - diarization, entity detection, sentiment, summarize, topics, intents, language detection, and redaction. Same endpoint as plain STT, but with extra request fields on `ListenV1RequestUrl` or `MediaTranscribeRequestOctetStream`. Use `deepgram-java-speech-to-text` for plain transcripts and `deepgram-java-text-intelligence` for analysis on existing text. Triggers include "audio intelligence", "diarize", "summarize audio", "sentiment from audio", "topic detection", and "redact".

2026-04-244

deepgram-java-conversational-stt.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Conversational STT v2 / Flux over `/v2/listen`. Covers `client.listen().v2().v2WebSocket()`, `V2ConnectOptions`, `onTurnInfo`, and turn-aware close handling. Use `deepgram-java-speech-to-text` for standard v1 transcription and `deepgram-java-voice-agent` for fully interactive assistants. Triggers include "flux", "conversational stt", "listen v2", "turn detection", "end of turn", and "eot".

2026-04-244

deepgram-java-text-intelligence.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for text analysis. Covers `client.read().v1().text().analyze(...)` with `ReadV1Request` or `TextAnalyzeRequest`. Use `deepgram-java-audio-intelligence` when the source is audio instead of text. Triggers include "read api", "text intelligence", "analyze text", "sentiment", "topics", "intents", and "summarize text".

2026-04-244

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-java-sdk"

Using Deepgram Speech-to-Text (Java SDK)

Basic transcription for prerecorded audio over REST or live audio over WebSocket via /v1/listen.

When to use this product

REST (media().transcribeUrl / transcribeFile) — one-shot transcription of a complete recording, supplied as a URL or a byte array.
WebSocket (v1WebSocket()) — live streaming transcription with interim/final results.

Use a different skill when:

You want summaries, sentiment, topics, intents, diarization, redaction, or language detection overlays on the same endpoint → deepgram-java-audio-intelligence.
You need turn-aware conversational streaming on /v2/listen → deepgram-java-conversational-stt.
You need a full interactive assistant with TTS + LLM orchestration → deepgram-java-voice-agent.

Authentication

Gradle

implementation 'com.deepgram:deepgram-java-sdk:0.2.1'

Maven

<dependency>
  <groupId>com.deepgram</groupId>
  <artifactId>deepgram-java-sdk</artifactId>
  <version>0.2.1</version>
</dependency>

import com.deepgram.DeepgramClient;

DeepgramClient client = DeepgramClient.builder()
        .apiKey(System.getenv("DEEPGRAM_API_KEY"))
        .build();

Default API-key auth sends Authorization: Token <apiKey>. Using accessToken(...) instead switches the scheme to Bearer.
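For orientation, the two auth schemes differ only in the prefix of the Authorization header. The sketch below builds (but does not send) a raw /v1/listen request with the JDK's java.net.http, to show what the Token scheme could look like on the wire. It is not how the SDK is implemented. The class name `ListenRequestSketch`, the `https://api.deepgram.com` host, and the `model` / `smart_format` query parameters are assumptions taken from Deepgram's public REST documentation, not from this repo.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the SDK.
final class ListenRequestSketch {

    // API keys use the "Token" scheme.
    static String apiKeyHeader(String apiKey) {
        return "Token " + apiKey;
    }

    // Access tokens use the "Bearer" scheme.
    static String accessTokenHeader(String accessToken) {
        return "Bearer " + accessToken;
    }

    // Builds a prerecorded-transcription request for a hosted audio URL without sending it.
    static HttpRequest transcribeUrlRequest(String apiKey, String audioUrl) {
        // Assumes audioUrl contains no characters that need JSON escaping.
        String body = "{\"url\":\"" + audioUrl + "\"}";
        return HttpRequest.newBuilder(
                        URI.create("https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true"))
                .header("Authorization", apiKeyHeader(apiKey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = transcribeUrlRequest("YOUR_API_KEY", "https://example.com/audio.wav");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```

In normal use the SDK client sets this header for you; the sketch is only useful when debugging auth failures with a proxy or raw HTTP client.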

Quick start — REST (URL)

import com.deepgram.resources.listen.v1.media.requests.ListenV1RequestUrl;
import com.deepgram.resources.listen.v1.media.types.MediaTranscribeRequestModel;
import com.deepgram.resources.listen.v1.media.types.MediaTranscribeResponse;
import com.deepgram.types.ListenV1Response;

ListenV1RequestUrl request = ListenV1RequestUrl.builder()
        .url("https://static.deepgram.com/examples/Bueller-Life-moves-pretty-fast.wav")
        .model(MediaTranscribeRequestModel.NOVA3)
        .smartFormat(true)
        .build();

MediaTranscribeResponse result = client.listen().v1().media().transcribeUrl(request);

result.visit(new MediaTranscribeResponse.Visitor<Void>() {
    @Override
    public Void visit(ListenV1Response response) {
        // Guard channels + alternatives against empty results (matches examples/listen/TranscribeUrl.java).
        String transcript = "";
        java.util.List<?> channels = response.getResults().getChannels();
        if (channels != null && !channels.isEmpty()) {
            java.util.List<?> alternatives = response.getResults()
                    .getChannels().get(0)
                    .getAlternatives().orElse(java.util.Collections.emptyList());
            if (!alternatives.isEmpty()) {
                transcript = response.getResults()
                        .getChannels().get(0)
                        .getAlternatives().orElse(java.util.Collections.emptyList())
                        .get(0)
                        .getTranscript().orElse("");
            }
        }
        System.out.println(transcript);
        return null;
    }

    @Override
    public Void visit(com.deepgram.types.ListenV1AcceptedResponse accepted) {
        System.out.println("Request accepted: " + accepted.getRequestId());
        return null;
    }
});

Quick start — REST (file bytes)

import com.deepgram.resources.listen.v1.media.requests.MediaTranscribeRequestOctetStream;
import com.deepgram.resources.listen.v1.media.types.MediaTranscribeRequestModel;
import com.deepgram.resources.listen.v1.media.types.MediaTranscribeResponse;

byte[] audioBytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("audio.wav"));

MediaTranscribeRequestOctetStream request = MediaTranscribeRequestOctetStream.builder()
        .body(audioBytes)
        .model(MediaTranscribeRequestModel.NOVA3)
        .smartFormat(true)
        .build();

MediaTranscribeResponse result = client.listen().v1().media().transcribeFile(request);

transcribeFile(...) accepts either raw byte[] or a full MediaTranscribeRequestOctetStream request object.

Quick start — WebSocket (live streaming)

import com.deepgram.resources.listen.v1.types.ListenV1CloseStream;
import com.deepgram.resources.listen.v1.types.ListenV1CloseStreamType;
import com.deepgram.resources.listen.v1.websocket.V1ConnectOptions;
import com.deepgram.resources.listen.v1.websocket.V1WebSocketClient;
import com.deepgram.types.ListenV1Model;
import java.util.concurrent.TimeUnit;

V1WebSocketClient wsClient = client.listen().v1().v1WebSocket();

wsClient.onResults(result -> {
    if (result.getChannel() != null
            && result.getChannel().getAlternatives() != null
            && !result.getChannel().getAlternatives().isEmpty()) {
        String transcript = result.getChannel().getAlternatives().get(0).getTranscript();
        boolean isFinal = result.getIsFinal().orElse(false);
        System.out.printf("%s %s%n", isFinal ? "[final]" : "[interim]", transcript);
    }
});

wsClient.connect(V1ConnectOptions.builder().model(ListenV1Model.NOVA3).build())
        .get(10, TimeUnit.SECONDS);

// send raw audio chunks here
// wsClient.sendMedia(okio.ByteString.of(audioChunk));

wsClient.sendCloseStream(ListenV1CloseStream.builder()
        .type(ListenV1CloseStreamType.CLOSE_STREAM)
        .build());
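The sendMedia placeholder above leaves open how a buffer becomes chunks. One possible approach is to slice raw PCM into fixed-duration frames. The sketch assumes 16 kHz, 16-bit, mono linear16 audio and 100 ms frames; both are illustrative assumptions, not values from this repo, and `AudioChunker` is a hypothetical helper rather than an SDK class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the SDK.
final class AudioChunker {

    // Bytes needed for `millis` of raw PCM at the given sample rate, sample width, and channel count.
    static int bytesPerFrame(int sampleRateHz, int bytesPerSample, int channels, int millis) {
        return sampleRateHz * bytesPerSample * channels * millis / 1000;
    }

    // Splits audio into consecutive chunks of at most chunkSize bytes; the last chunk may be shorter.
    static List<byte[]> split(byte[] audio, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int end = Math.min(audio.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(audio, offset, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        int chunkSize = bytesPerFrame(16_000, 2, 1, 100); // 3200 bytes for the assumed format
        List<byte[]> chunks = split(new byte[7_000], chunkSize);
        System.out.println(chunkSize + " bytes per chunk, " + chunks.size() + " chunks");
    }
}
```

Each chunk would then be passed to wsClient.sendMedia(okio.ByteString.of(chunk)), as in the commented line above, before the close-stream message is sent.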

Async equivalents

import com.deepgram.AsyncDeepgramClient;
import java.util.concurrent.CompletableFuture;

AsyncDeepgramClient asyncClient = AsyncDeepgramClient.builder()
        .apiKey(System.getenv("DEEPGRAM_API_KEY"))
        .build();

// `request` is the ListenV1RequestUrl built in the REST (URL) quick start.
CompletableFuture<com.deepgram.resources.listen.v1.media.types.MediaTranscribeResponse> future =
        asyncClient.listen().v1().media().transcribeUrl(request);

The async REST clients return CompletableFuture<T>. WebSocket clients are already asynchronous and also return CompletableFuture<Void> from connect(...) and send methods.
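Because the async clients return CompletableFuture<T>, results can be composed without blocking the calling thread. The sketch below uses a stand-in future instead of a real SDK call so that it runs without network access. `AsyncCompositionSketch`, `fakeTranscribe`, and the 30-second timeout are illustrative assumptions, and orTimeout requires Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not part of the SDK.
final class AsyncCompositionSketch {

    // Stand-in for an async transcription call; completes with a canned transcript.
    static CompletableFuture<String> fakeTranscribe(String transcript) {
        return CompletableFuture.supplyAsync(() -> transcript);
    }

    // Trims the transcript, bounds the wait, and falls back to "" on failure or timeout.
    static CompletableFuture<String> transcriptOrEmpty(CompletableFuture<String> pending) {
        return pending
                .thenApply(String::trim)
                .orTimeout(30, TimeUnit.SECONDS)
                .exceptionally(error -> "");
    }

    public static void main(String[] args) {
        System.out.println(transcriptOrEmpty(fakeTranscribe("  hello world ")).join());
    }
}
```

With the real client, the thenApply step is where the MediaTranscribeResponse visitor from the REST quick start would run.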

Key parameters / API surface

REST request builders: ListenV1RequestUrl.builder() and MediaTranscribeRequestOctetStream.builder()
Common REST params verified in source: model, language, encoding, smartFormat, punctuate, diarize, detectEntities, multichannel, numerals, paragraphs, utterances, keywords, keyterm, replace, search, mipOptOut, tag, callback
REST methods: transcribeUrl(...), transcribeFile(byte[]), transcribeFile(MediaTranscribeRequestOctetStream)
WSS connect options: model, encoding, sampleRate, endpointing, interimResults, vadEvents, utteranceEndMs, diarize, detectEntities, redact, keywords, keyterm, language
WSS send methods: sendMedia(...), sendFinalize(...), sendKeepAlive(...), sendCloseStream(...)
WSS handlers: onResults, onMetadata, onUtteranceEnd, onSpeechStarted, plus generic onConnected, onDisconnected, onError, onMessage
REST responses are a union: ListenV1Response or ListenV1AcceptedResponse, handled via MediaTranscribeResponse.Visitor
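
The union-plus-visitor shape in the last item can be unfamiliar. The following is a simplified, self-contained model of the pattern. `TranscribeResult`, its factory methods, and its visitor method names are hypothetical stand-ins and do not mirror the generated SDK classes.

```java
// Hypothetical model of a two-variant union consumed through a visitor.
final class TranscribeResult {

    interface Visitor<T> {
        T visitTranscript(String transcript); // full transcription result
        T visitAccepted(String requestId);    // acknowledgement carrying only a request id
    }

    private final String transcript; // non-null for the transcript variant
    private final String requestId;  // non-null for the accepted variant

    private TranscribeResult(String transcript, String requestId) {
        this.transcript = transcript;
        this.requestId = requestId;
    }

    static TranscribeResult ofTranscript(String text) {
        return new TranscribeResult(text, null);
    }

    static TranscribeResult ofAccepted(String requestId) {
        return new TranscribeResult(null, requestId);
    }

    // Dispatches to exactly one visitor method, so callers must handle both variants.
    <T> T visit(Visitor<T> visitor) {
        return transcript != null
                ? visitor.visitTranscript(transcript)
                : visitor.visitAccepted(requestId);
    }

    static String describe(TranscribeResult result) {
        return result.visit(new Visitor<String>() {
            @Override
            public String visitTranscript(String transcript) {
                return "transcript: " + transcript;
            }

            @Override
            public String visitAccepted(String requestId) {
                return "accepted: " + requestId;
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(describe(ofTranscript("hello")));
        System.out.println(describe(ofAccepted("req-123")));
    }
}
```

The point of the design is that the compiler forces every caller to supply a branch for each variant, which is why the quick start implements both visit overloads.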

API reference (layered)

In-repo source of truth: generated clients and request/response models under src/main/java/com/deepgram/resources/listen/v1/ plus examples under examples/listen/. This checkout does not include reference.md.
Canonical OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
Canonical AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7: /llmstxt/developers_deepgram_llms_txt
Product docs:
- https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded
- https://developers.deepgram.com/reference/speech-to-text/listen-streaming

Gotchas

API key auth is Token, not Bearer. Bearer only happens when you use accessToken(...).
REST responses are a union. Handle both ListenV1Response and ListenV1AcceptedResponse with the visitor.
transcribeFile(byte[]) needs the whole file in memory as a byte array. Use the request builder only when you need extra params.
The Java REST request currently exposes redact as a single String. Do not assume Python-style list support in this checkout.
WebSocket audio must match declared encoding/sample rate. If you set encoding, the bytes must actually match it.
Live sessions should end explicitly. Use sendFinalize(...) or sendCloseStream(...); otherwise trailing audio can be lost.
WebSocket handlers should be registered before connect(...). The examples do this consistently.
V1WebSocketClient is async already. Wait on connect(...).get(...) before sending audio.

Example files in this repo

examples/listen/TranscribeUrl.java
examples/listen/FileUploadTypes.java
examples/listen/AdvancedOptions.java
examples/listen/LiveStreaming.java
examples/listen/TranscribeCallback.java
examples/listen/Captions.java

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
