Name: Gradium Sdk
Author: championswimmer

updated:April 18, 2026 at 14:55

package.json

"author": "championswimmer"

"repository": "championswimmer/VoiceAIHack-AGSK"

name	gradium-sdk
description	Teaches the agent how to use the Gradium AI real-time speech-to-text WebSocket API from TypeScript/JavaScript. Use when wiring voice transcription directly against the Gradium protocol. For browser use in this project, prefer the `@agsk/lib-gradium` wrapper — this skill documents the underlying protocol.

Gradium AI (raw WebSocket protocol)

Use this skill when implementing a Gradium STT client from scratch, or when debugging one. In this repo the browser-side SDK at libs/lib-gradium/ already implements this — prefer importing it (@agsk/lib-gradium) unless you specifically need to talk to Gradium directly.

Endpoint

wss://us.api.gradium.ai/api/speech/asr

Authentication

Gradium requires an HTTP header on the WebSocket upgrade:

x-api-key: <GRADIUM_API_KEY>

Important: the browser WebSocket API cannot set custom headers; only Node or native clients can. From a browser, you must route through a Node proxy that accepts a local WebSocket connection and opens its own connection to Gradium with the header attached. See libs/lib-gradium/demo/vite.config.ts for a 40-line reference proxy.

This header is the only auth method verified to work: query parameters, WebSocket subprotocols, and setup-message payloads are all rejected with "No authentication provided".
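
For a Node client (or the upstream leg of the proxy), the header can be supplied at connect time. The sketch below is illustrative only: `gradiumConnectOptions` is a hypothetical helper, not part of any Gradium SDK, and the usage comment assumes the `ws` npm package, whose client constructor accepts a `headers` option.

```typescript
// Hypothetical helper: builds the URL and upgrade headers for a Node-side
// connection. The names are illustrative, not part of any Gradium SDK.
const GRADIUM_ASR_URL = 'wss://us.api.gradium.ai/api/speech/asr';

interface GradiumConnectOptions {
  url: string;
  headers: Record<string, string>;
}

function gradiumConnectOptions(apiKey: string): GradiumConnectOptions {
  if (apiKey.trim() === '') {
    // Fail early rather than letting the server answer
    // "No authentication provided".
    throw new Error('Gradium API key is empty');
  }
  return { url: GRADIUM_ASR_URL, headers: { 'x-api-key': apiKey } };
}

// Usage with the `ws` package (Node only; assumed to be installed):
//   import WebSocket from 'ws';
//   const { url, headers } = gradiumConnectOptions(process.env.GRADIUM_API_KEY ?? '');
//   const upstream = new WebSocket(url, { headers });
```

Keeping the key on the Node side also means it never has to be shipped to the browser.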

Audio format

- PCM 16-bit signed little-endian, mono, 24 kHz
- Frames must be exactly 1920 samples (1920 × 2 bytes = 3840 bytes, or 80 ms at 24 kHz)
- Each frame is base64-encoded and wrapped in a JSON message

Do not use MediaRecorder: it emits containerized, compressed audio (typically Opus in WebM), never raw PCM. Use AudioContext + AudioWorklet to capture Float32 samples and convert them to Int16.
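
The Float32-to-Int16 conversion and base64 step can be sketched as follows. This is a minimal illustration under the format constraints above, not the code from libs/lib-gradium; `floatToPcm16le` and `encodeFrame` are hypothetical names.

```typescript
const FRAME_SAMPLES = 1920; // 80 ms at 24 kHz

// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM bytes.
function floatToPcm16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid wrap-around
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    const v = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(i * 2, v, true); // true = little-endian
  }
  return out;
}

// Base64-encode one frame; rejects anything that is not exactly 1920 samples.
function encodeFrame(samples: Float32Array): string {
  if (samples.length !== FRAME_SAMPLES) {
    throw new Error(`expected ${FRAME_SAMPLES} samples, got ${samples.length}`);
  }
  const bytes = floatToPcm16le(samples);
  let bin = '';
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}
```

Writing through a `DataView` with an explicit little-endian flag keeps the byte order correct regardless of the host platform.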

Message shapes

Client → server

Send exactly one setup, then audio frames paced at ~80ms, then optionally end_of_stream:

// 1. setup (first message after connection opens)
{ "type": "setup", "model_name": "default", "input_format": "pcm" }

// 2. audio (field name is `audio`, NOT `data`)
{ "type": "audio", "audio": "<base64 pcm_s16le 24kHz mono, 1920 samples>" }

// 3. optional control
{ "type": "flush", "flush_id": "<client-chosen id>" }
{ "type": "end_of_stream" }

All messages, including audio, MUST be sent as WebSocket text frames, not binary.
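
One way to keep field names such as `audio` from drifting is to type the client messages. The union and builder functions below are a sketch derived from the shapes above; the names are hypothetical, and `input_format` is typed as `'pcm'` only because that is the one value this document shows.

```typescript
// Client → server messages, as documented above.
type ClientMessage =
  | { type: 'setup'; model_name: string; input_format: 'pcm' }
  | { type: 'audio'; audio: string }
  | { type: 'flush'; flush_id: string }
  | { type: 'end_of_stream' };

const setupMessage = (modelName = 'default'): ClientMessage => ({
  type: 'setup',
  model_name: modelName,
  input_format: 'pcm',
});

const audioMessage = (base64Pcm: string): ClientMessage => ({
  type: 'audio',
  audio: base64Pcm,
});

const flushMessage = (flushId: string): ClientMessage => ({
  type: 'flush',
  flush_id: flushId,
});

const endOfStreamMessage = (): ClientMessage => ({ type: 'end_of_stream' });

// Serialize to a string: passing a string to WebSocket.send() produces a
// text frame, whereas an ArrayBuffer or Blob would produce a binary frame.
function serialize(msg: ClientMessage): string {
  return JSON.stringify(msg);
}
```

With this in place a call site reads `ws.send(serialize(audioMessage(frame)))`, and a misspelled field fails at compile time.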

Server → client

// handshake complete — DO NOT send audio before this arrives
{ "type": "ready", "request_id": "...", "model_name": "default",
  "sample_rate": 24000, "frame_size": 1920, "delay_in_frames": 10,
  "text_stream_names": ["asr_0"] }

// partial transcript — text is a token/phrase, concatenate across msgs in a turn
{ "type": "text", "text": "Hello", "start_s": 0.5, "stream_id": null }

// VAD, emitted every 80 ms. vad[2] is the 2.0 s horizon; inactivity_prob > 0.5 there = ~2 s silence horizon crossed
{ "type": "step", "vad": [
    { "horizon_s": 0.5, "inactivity_prob": 0.02 },
    { "horizon_s": 1.0, "inactivity_prob": 0.05 },
    { "horizon_s": 2.0, "inactivity_prob": 0.12 },
    { "horizon_s": 3.0, "inactivity_prob": 0.20 }
  ], "step_idx": 5, "step_duration_s": 0.08, "total_duration_s": 0.4 }

// end of a turn — finalize accumulated text from this turn
{ "type": "end_text", "stop_s": 2.5, "stream_id": null }

{ "type": "flushed", "flush_id": "..." }
{ "type": "error", "message": "...", "code": 1011 }
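
The server messages can likewise be modelled as a discriminated union. This is a sketch inferred from the examples above rather than an official schema: `stream_id` is assumed to be `string | null` (only `null` appears here), and `parseServerMessage` and `silenceProbability` are hypothetical helpers. The VAD lookup matches on `horizon_s` instead of relying on array position.

```typescript
interface VadEntry { horizon_s: number; inactivity_prob: number }

type ServerMessage =
  | { type: 'ready'; request_id: string; model_name: string; sample_rate: number;
      frame_size: number; delay_in_frames: number; text_stream_names: string[] }
  | { type: 'text'; text: string; start_s: number; stream_id: string | null }
  | { type: 'step'; vad: VadEntry[]; step_idx: number;
      step_duration_s: number; total_duration_s: number }
  | { type: 'end_text'; stop_s: number; stream_id: string | null }
  | { type: 'flushed'; flush_id: string }
  | { type: 'error'; message: string; code: number };

const KNOWN_TYPES = new Set(['ready', 'text', 'step', 'end_text', 'flushed', 'error']);

// Parse one text frame. Returns null for malformed JSON or an unknown type.
// Only the `type` tag is validated; the rest of the shape is trusted.
function parseServerMessage(raw: string): ServerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (e) {
    return null;
  }
  if (typeof value !== 'object' || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== 'string' || !KNOWN_TYPES.has(type)) return null;
  return value as ServerMessage;
}

// Inactivity probability at a given horizon, or null if the message is not
// a `step` or carries no entry for that horizon.
function silenceProbability(msg: ServerMessage, horizonS = 2.0): number | null {
  if (msg.type !== 'step') return null;
  const entry = msg.vad.find((v) => v.horizon_s === horizonS);
  return entry ? entry.inactivity_prob : null;
}
```

A caller might treat `silenceProbability(msg) > 0.5` as a hint that the speaker has paused, while still relying on `end_text` for the actual turn boundary.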

Critical sequencing rules

1. Wait for ready before sending any audio. Audio sent before ready may be silently discarded or may cause the server to close the connection abruptly.
2. Pace audio at ~80 ms per frame to match frame_size=1920. Do not send a burst of buffered chunks: Gradium validates frame arrival and responds with "Message validation error" (code 1011) if frames arrive too fast.
3. There is no partial/final flag on text events. Accumulate text events within a turn and treat end_text as the turn boundary that finalizes the utterance.
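
The first two rules can be enforced with a small queue that holds frames until `ready` and releases at most one per 80 ms. The class below is a sketch with hypothetical names; the clock is passed in so the logic stays testable. Live microphone capture is already paced by the audio hardware, so this matters mainly when replaying buffered or file-based audio.

```typescript
const FRAME_INTERVAL_MS = 80;

class PacedFrameQueue {
  private queue: string[] = [];
  private ready = false;
  private lastSentMs = -Infinity;

  // Call when the server's `ready` message arrives.
  markReady(): void {
    this.ready = true;
  }

  // Buffer a base64-encoded frame for later sending.
  enqueue(frame: string): void {
    this.queue.push(frame);
  }

  // Returns the next frame if one is due at `nowMs`, otherwise null.
  // Nothing is released before markReady(), and never faster than one
  // frame per FRAME_INTERVAL_MS.
  poll(nowMs: number): string | null {
    if (!this.ready || this.queue.length === 0) return null;
    if (nowMs - this.lastSentMs < FRAME_INTERVAL_MS) return null;
    this.lastSentMs = nowMs;
    return this.queue.shift() ?? null;
  }
}

// Usage sketch (assumes an open WebSocket `ws`):
//   const q = new PacedFrameQueue();
//   setInterval(() => {
//     const frame = q.poll(performance.now());
//     if (frame !== null) ws.send(JSON.stringify({ type: 'audio', audio: frame }));
//   }, 10);
```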

Minimal browser pipeline (via a proxy)

const WS_URL = 'ws://127.0.0.1:5174'; // local proxy that injects x-api-key

const ws = new WebSocket(WS_URL);
let ready = false;
let accumulated = '';

ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'setup',
    model_name: 'default',
    input_format: 'pcm',
  }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
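    case 'ready':
      ready = true;
      // Surface mic-permission or worklet-load failures instead of leaving
      // an unhandled promise rejection.
      startAudio().catch((err) => console.error('audio setup failed:', err));
      break;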
    case 'text':
      accumulated = accumulated ? `${accumulated} ${msg.text}` : msg.text;
      console.log('partial:', accumulated);
      break;
    case 'end_text':
      console.log('final:', accumulated);
      accumulated = '';
      break;
    case 'step':
      // msg.vad[2].inactivity_prob in [0, 1]
      break;
    case 'error':
      console.error('gradium:', msg.message);
      break;
  }
};

async function startAudio() {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { channelCount: 1, sampleRate: 24000 },
  });
  const ctx = new AudioContext({ sampleRate: 24000 });
  // Load an AudioWorklet that accumulates 1920 Float32 samples, converts
  // to Int16 PCM, and postMessage's the ArrayBuffer back to the main thread.
  // See libs/lib-gradium/src/index.ts for the worklet source.
  await ctx.audioWorklet.addModule('./pcm-worklet.js');
  const node = new AudioWorkletNode(ctx, 'pcm-chunker', {
    processorOptions: { chunkSamples: 1920 },
    numberOfInputs: 1, numberOfOutputs: 0, channelCount: 1,
  });
  node.port.onmessage = (e) => {
    if (ws.readyState !== WebSocket.OPEN) return;
    const bytes = new Uint8Array(e.data);
    let bin = '';
    for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
    ws.send(JSON.stringify({ type: 'audio', audio: btoa(bin) }));
  };
  ctx.createMediaStreamSource(stream).connect(node);
}
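
The pipeline above loads `./pcm-worklet.js` without showing it. The core of such a worklet is the chunking logic, sketched here as a plain class so it can be read and tested outside the audio thread. This is an illustration, not the source in libs/lib-gradium/src/index.ts, and `PcmChunker` is a hypothetical name.

```typescript
// Accumulates Float32 blocks and emits fixed-size Int16 chunks.
class PcmChunker {
  private readonly chunkSamples: number;
  private buffer: Int16Array;
  private filled = 0;

  constructor(chunkSamples = 1920) {
    this.chunkSamples = chunkSamples;
    this.buffer = new Int16Array(chunkSamples);
  }

  // Feed one block of samples; returns zero or more completed chunks.
  push(block: Float32Array): Int16Array[] {
    const done: Int16Array[] = [];
    for (let i = 0; i < block.length; i++) {
      const s = Math.max(-1, Math.min(1, block[i])); // clamp to [-1, 1]
      this.buffer[this.filled++] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
      if (this.filled === this.chunkSamples) {
        done.push(this.buffer);
        // Start a fresh buffer so the emitted chunk can be transferred.
        this.buffer = new Int16Array(this.chunkSamples);
        this.filled = 0;
      }
    }
    return done;
  }
}

// Inside an AudioWorkletProcessor registered as 'pcm-chunker', process()
// could call chunker.push(inputs[0][0]) and, for each returned chunk,
// this.port.postMessage(chunk.buffer, [chunk.buffer]).
```

Worklets typically receive 128-sample blocks, so a 1920-sample chunk completes after exactly 15 blocks.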

Common mistakes (all of these were in this repo's first attempt)

| Mistake | Correct |
| --- | --- |
| URL `wss://st.gradium.ai` | `wss://us.api.gradium.ai/api/speech/asr` |
| Auth in setup message `api_key` | `x-api-key` header (requires proxy from browser) |
| Sample rate 16 kHz | 24 kHz |
| `MediaRecorder` for PCM | `AudioContext` + `AudioWorklet` |
| Audio field name `data` | `audio` |
| Setup `audio_format: { encoding, sample_rate, channels }` | Just `model_name` + `input_format: 'pcm'` |
| Server event names `transcription` / `vad` | `text` / `end_text` / `step` |
| Sending audio before `ready` | Wait for `ready` first |
