deepgram-java-voice-agent

Name: Deepgram Java Voice Agent
Author: deepgram

Use when writing or reviewing Java code in this repo that builds an interactive voice agent over `agent.deepgram.com/v1/agent/converse`. Covers `client.agent().v1().v1WebSocket()`, `AgentV1Settings`, `sendSettings`, `sendMedia`, event handlers, provider configuration, and message injection. Use `deepgram-java-text-to-speech` for one-way synthesis or the STT skills for transcription-only flows. Triggers include "voice agent", "agent converse", "full duplex", "barge in", "function call", and "agent websocket".

stars:4

forks:3

updated:April 24, 2026 at 18:07

SKILL.md

Using Deepgram Voice Agent (Java SDK)

Run a full-duplex voice agent over a single WebSocket: user audio in, agent events + audio out.

When to use this product

You want a live conversational agent.
You need STT + think-provider + TTS orchestration in one session.
You may need message injection, prompt updates, or function-call handling.

Use a different skill when:

You only need transcription → deepgram-java-speech-to-text or deepgram-java-conversational-stt.
You only need speech synthesis → deepgram-java-text-to-speech.
You only need project/admin endpoints → deepgram-java-management-api.

Authentication

import com.deepgram.DeepgramClient;

DeepgramClient client = DeepgramClient.builder()
        .apiKey(System.getenv("DEEPGRAM_API_KEY"))
        .build();

The agent WebSocket connects to the SDK's agent environment URL and sends the same auth headers as the other clients.
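
`System.getenv(...)` returns `null` when the variable is unset, so it can help to fail fast before building the client. The helper below is a minimal sketch and not part of the SDK; the class name `ApiKeys` and the method `require` are hypothetical.

```java
import java.util.Map;

// Hypothetical helper (not part of the Deepgram SDK): validates that an
// environment variable is present before it is handed to the client builder.
final class ApiKeys {
    private ApiKeys() {}

    /** Returns the trimmed value of the named variable, or throws if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

With this in place, the builder call above could take `ApiKeys.require(System.getenv(), "DEEPGRAM_API_KEY")` instead of the raw `System.getenv(...)` result.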

Quick start

import com.deepgram.resources.agent.v1.types.AgentV1Settings;
import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgent;
import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgentThink;
import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgentThinkOneItem;
import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgentThinkOneItemProvider;
import com.deepgram.resources.agent.v1.types.AgentV1SettingsAudio;
import com.deepgram.resources.agent.v1.websocket.V1WebSocketClient;
import com.deepgram.types.OpenAiThinkProvider;
import java.util.List;
import java.util.Map;

// Create the agent WebSocket client and register handlers before connecting.
V1WebSocketClient wsClient = client.agent().v1().v1WebSocket();

// Reply to the Welcome event with the session settings right away.
wsClient.onWelcome(welcome -> {
    // Think provider wrapper; the map carries the provider fields.
    OpenAiThinkProvider openAiProvider = OpenAiThinkProvider.of(Map.of("model", "gpt-4o-mini"));

    AgentV1Settings settings = AgentV1Settings.builder()
            // No audio fields are set explicitly here.
            .audio(AgentV1SettingsAudio.builder().build())
            .agent(AgentV1SettingsAgent.builder()
                    .think(AgentV1SettingsAgentThink.of(List.of(AgentV1SettingsAgentThinkOneItem.builder()
                            .provider(AgentV1SettingsAgentThinkOneItemProvider.of(openAiProvider))
                            .prompt("You are a helpful voice assistant. Keep your responses brief.")
                            .build())))
                    .greeting("Hello! How can I help you today?")
                    .build())
            .build();

    wsClient.sendSettings(settings);
});

// Log conversation turns, speaking events, and the size of each audio message.
wsClient.onConversationText(text -> System.out.printf("[%s] %s%n", text.getRole(), text.getContent()));
wsClient.onAgentStartedSpeaking(event -> System.out.println(">> Agent started speaking"));
wsClient.onAgentV1Audio(audioData -> System.out.printf("Received %d bytes%n", audioData.size()));

// Block until the connection attempt completes, or fail after 10 seconds.
wsClient.connect().get(10, java.util.concurrent.TimeUnit.SECONDS);
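
The quick start sends settings from `onWelcome(...)`, but microphone capture may start producing audio before the session is configured. One way to keep the ordering safe is to hold audio frames until settings are confirmed. The class below is a sketch under assumptions: `SettingsGate` is a hypothetical name, the sink stands in for a call to `sendMedia(...)`, and calling `open()` from the `onSettingsApplied` handler is an assumed wiring rather than something the repo examples show.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical helper (not part of the Deepgram SDK): buffers audio frames
// until the session is configured, then forwards them in arrival order.
final class SettingsGate {
    private final Consumer<byte[]> sink;              // stand-in for sendMedia(...)
    private final Queue<byte[]> pending = new ArrayDeque<>();
    private boolean open = false;

    SettingsGate(Consumer<byte[]> sink) {
        this.sink = sink;
    }

    /** Forwards the frame if the gate is open, otherwise queues it. */
    synchronized void offer(byte[] frame) {
        if (open) {
            sink.accept(frame);
        } else {
            pending.add(frame);
        }
    }

    /** Opens the gate and flushes queued frames in FIFO order. */
    synchronized void open() {
        open = true;
        byte[] frame;
        while ((frame = pending.poll()) != null) {
            sink.accept(frame);
        }
    }

    /** Number of frames still waiting for the gate to open. */
    synchronized int pendingCount() {
        return pending.size();
    }
}
```

The queue here is unbounded, which is acceptable for the short gap between connecting and configuring a session; a long-running capture loop would probably want a cap or a drop policy.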

Message injection / control

The repo also demonstrates injecting user and agent messages into a live session:

wsClient.sendInjectUserMessage(com.deepgram.resources.agent.v1.types.AgentV1InjectUserMessage.builder()
        .content("What is the capital of France?")
        .build());

wsClient.sendInjectAgentMessage(com.deepgram.resources.agent.v1.types.AgentV1InjectAgentMessage.builder()
        .message("By the way, I can also help you with math and science questions!")
        .build());

Key parameters / API surface

Connect path: client.agent().v1().v1WebSocket()
Initial session config: AgentV1Settings
Common send methods: sendSettings, sendMedia, sendUpdatePrompt, sendUpdateSpeak, sendInjectUserMessage, sendInjectAgentMessage, sendFunctionCallResponse, sendKeepAlive
Event handlers: onWelcome, onSettingsApplied, onConversationText, onUserStartedSpeaking, onAgentThinking, onFunctionCallRequest, onAgentStartedSpeaking, onAgentAudioDone, onAgentV1Audio, onInjectionRefused, onPromptUpdated, onSpeakUpdated, onErrorMessage, onWarning
Think-model discovery lives at client.agent().v1().settings().think().models().list()
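
The surface above pairs `onFunctionCallRequest` with `sendFunctionCallResponse`. A common way to structure the application side is a name-to-handler registry, sketched below with plain strings. Everything in it is an assumption for illustration: `FunctionRegistry` is a hypothetical class, and the shape of the SDK's request and response types is not shown here, so arguments and results are modeled as JSON strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of the Deepgram SDK): routes a function name
// to an application handler and always produces a result string.
final class FunctionRegistry {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    /** Registers a handler that maps a JSON argument string to a JSON result string. */
    FunctionRegistry register(String name, Function<String, String> handler) {
        handlers.put(name, handler);
        return this;
    }

    /** Runs the named handler; unknown names and handler failures become error payloads. */
    String dispatch(String name, String argumentsJson) {
        Function<String, String> handler = handlers.get(name);
        if (handler == null) {
            return "{\"error\":\"unknown function\"}";
        }
        try {
            return handler.apply(argumentsJson);
        } catch (RuntimeException e) {
            return "{\"error\":\"handler failed\"}";
        }
    }
}
```

Returning an error payload instead of throwing means the request handler always has something to send back, which keeps the agent from waiting on a response that never arrives.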

API reference (layered)

In-repo source of truth: src/main/java/com/deepgram/resources/agent/v1/ and examples/agent/. No reference.md file is present.
Canonical AsyncAPI: https://developers.deepgram.com/asyncapi.yaml
Context7: /llmstxt/developers_deepgram_llms_txt
Product docs:

Gotchas

The base URL is the agent environment, not the standard API base. The SDK routes this automatically through environment().getAgentURL().
Send settings first. The repo examples wait for onWelcome(...) and immediately call sendSettings(...).
Audio is binary ByteString. Playback/output is your responsibility.
sendMedia(...) is raw audio bytes. Match whatever audio settings you configured.
Use the provider wrapper/union types rather than raw JSON. Constructors like OpenAiThinkProvider.of(...), AnthropicThinkProvider.of(...), GoogleThinkProvider.of(...) package the provider into the think/listen/speak union the SDK expects. The underlying payload is still an Object (so provider-field mistakes won't be caught at compile time), but the wrappers keep routing correct and ensure you pick the right variant of the sealed union.
There is no persisted agent-configuration management client shown in this checkout. This repo exposes live agent runtime plus think-model discovery.
Closing is connection-level. The examples call disconnect(); there is no separate close-message flow like Speak/Listen.
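
Because `sendMedia(...)` takes raw audio bytes, callers usually slice captured audio into small fixed-duration frames that match the configured format. The sketch below assumes 16-bit linear PCM; the sample rates in the usage note are illustrative assumptions, not SDK defaults, and `AudioChunker` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the Deepgram SDK): splits raw PCM audio
// into fixed-size frames suitable for sending one at a time.
final class AudioChunker {
    private AudioChunker() {}

    /** Bytes in one frame of 16-bit linear PCM (2 bytes per sample per channel). */
    static int frameBytes(int sampleRateHz, int channels, int frameMillis) {
        return sampleRateHz * channels * 2 * frameMillis / 1000;
    }

    /** Splits the buffer into frames; the last frame may be shorter than frameBytes. */
    static List<byte[]> split(byte[] pcm, int frameBytes) {
        if (frameBytes <= 0) {
            throw new IllegalArgumentException("frameBytes must be positive");
        }
        List<byte[]> frames = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += frameBytes) {
            int end = Math.min(pcm.length, offset + frameBytes);
            frames.add(Arrays.copyOfRange(pcm, offset, end));
        }
        return frames;
    }
}
```

For example, 20 ms of 16 kHz mono audio is `AudioChunker.frameBytes(16000, 1, 20)`, which is 640 bytes, and a 1500-byte buffer splits into frames of 640, 640, and 220 bytes.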

Example files in this repo

examples/agent/VoiceAgent.java
examples/agent/InjectMessage.java
examples/agent/ProviderCombinations.java
examples/agent/CustomProviders.java

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).

related-skills.json

same repository

deepgram-java-management-api.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Management APIs for projects, project models, API keys, members, invites, usage, and billing. Covers `client.manage().v1().*` plus related think-model discovery under `client.agent().v1().settings().think().models()`. Use `deepgram-java-voice-agent` for live agent conversations instead of admin APIs. Triggers include "management api", "list projects", "api keys", "members", "invites", "usage", "billing", and "models".

2026-04-24, stars:4

deepgram-java-speech-to-text.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Speech-to-Text v1 (`/v1/listen`) for prerecorded or live transcription. Covers `client.listen().v1().media().transcribeUrl` / `transcribeFile` (REST) and `client.listen().v1().v1WebSocket()` (WebSocket). Use `deepgram-java-audio-intelligence` for analytics overlays, `deepgram-java-conversational-stt` for Flux `/v2/listen`, and `deepgram-java-voice-agent` for full-duplex assistants. Triggers include "transcribe", "speech to text", "STT", "listen v1", "nova-3", "live transcription", and "websocket transcription".

2026-04-24, stars:4

deepgram-java-text-to-speech.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST via `client.speak().v1().audio().generate(...)` and streaming synthesis via `client.speak().v1().v1WebSocket()`. Use `deepgram-java-voice-agent` for full-duplex assistants instead of one-way synthesis. Triggers include "tts", "text to speech", "speak", "aura", "streaming tts", and "speak websocket".

2026-04-24, stars:4

deepgram-java-audio-intelligence.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that enables Deepgram intelligence overlays on `/v1/listen` audio transcription - diarization, entity detection, sentiment, summarize, topics, intents, language detection, and redaction. Same endpoint as plain STT, but with extra request fields on `ListenV1RequestUrl` or `MediaTranscribeRequestOctetStream`. Use `deepgram-java-speech-to-text` for plain transcripts and `deepgram-java-text-intelligence` for analysis on existing text. Triggers include "audio intelligence", "diarize", "summarize audio", "sentiment from audio", "topic detection", and "redact".

2026-04-24, stars:4

deepgram-java-conversational-stt.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Conversational STT v2 / Flux over `/v2/listen`. Covers `client.listen().v2().v2WebSocket()`, `V2ConnectOptions`, `onTurnInfo`, and turn-aware close handling. Use `deepgram-java-speech-to-text` for standard v1 transcription and `deepgram-java-voice-agent` for fully interactive assistants. Triggers include "flux", "conversational stt", "listen v2", "turn detection", "end of turn", and "eot".

2026-04-24, stars:4

deepgram-java-text-intelligence.md

from "deepgram/deepgram-java-sdk"

Use when writing or reviewing Java code in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for text analysis. Covers `client.read().v1().text().analyze(...)` with `ReadV1Request` or `TextAnalyzeRequest`. Use `deepgram-java-audio-intelligence` when the source is audio instead of text. Triggers include "read api", "text intelligence", "analyze text", "sentiment", "topics", "intents", and "summarize text".

2026-04-24, stars:4

package.json

"author": "deepgram"

"repository": "deepgram/deepgram-java-sdk"
