name	vox
description	Lightweight voice MCP server with local Moonshine STT + Kokoro TTS
version	0.1.0
tags	["mcp","voice","tts","stt","rust"]

Vox — Claude Code Skill

Project Overview

Vox is a lightweight voice MCP server (~1,500 lines of Rust) providing local text-to-speech (Kokoro) and speech-to-text (Moonshine Base) via the MCP protocol. It runs as a stdio subprocess per MCP client, or as a shared HTTP daemon.

Build & Test

cargo check                    # type-check only
cargo test                     # run all unit tests (84 tests across 10 modules)
cargo clippy -- -D warnings    # lint — must pass with zero warnings
cargo build --release          # optimized build (LTO + single codegen unit)
cargo bench -- resample        # benchmark resampling

All of cargo test, cargo clippy -- -D warnings, and cargo fmt --check must pass before submitting changes.

Architecture

Module	Purpose
`main.rs`	Entry point, config loading, model download, stdio/daemon startup
`cli.rs`	Clap CLI parser: daemon, config, download-models subcommands
`server.rs`	MCP tool handlers (`say`, `listen`, `converse`), streaming TTS pipeline
`tts.rs`	Kokoro TTS engine wrapper, voice name → speaker ID resolution, sentence splitting
`audio.rs`	cpal-based mic capture and speaker playback, Lanczos-3 sinc resampling
`stt.rs`	Moonshine Base STT engine wrapper
`vad.rs`	Voice activity detection (Silero ONNX)
`config.rs`	TOML config loading, env var overrides (`VOX_*` prefix), path resolution
`daemon.rs`	HTTP daemon lifecycle: daemonize, PID file, start/stop/status/log
`models.rs`	Model readiness checks and download/extraction
`lib.rs`	Public re-exports for benchmarks (`audio`, `config`, `error`, `tts`)
`error.rs`	`VoiceError` enum with `thiserror` derives

Transport Modes

Stdio (default): rmcp::transport::stdio(). One process per MCP client.
Daemon (vox daemon start [--port PORT]): StreamableHttpService via rmcp. Single process, models loaded once, multiple clients connect over HTTP/SSE. Factory closure creates a VoiceMcpServer per session with shared Arc<Mutex<TtsEngine>> and Arc<Mutex<SttEngine>>.

Config System

Precedence (highest wins):

Environment variables: VOX_SPEED, VOX_VOICE, VOX_MODEL_DIR, VOX_LOG_LEVEL, VOX_PORT
TOML file: $XDG_CONFIG_HOME/vox/config.toml
Compiled defaults (Config::default())

CLI management: vox config get [key], vox config set <key> <value>, vox config path

MCP Tools

Tool	Description
`say`	Speak text aloud through speakers (TTS only)
`listen`	Record from microphone and transcribe (STT only)
`converse`	Speak text then listen for response (TTS + STT round-trip)

Available Voices

American female (af_*): heart, alloy, aoede, bella, jessica, kore, nicole, nova, river, sarah, sky American male (am_*): adam, echo, eric, liam, michael, onyx, puck, santa British female (bf_*): alice, emma, lily British male (bm_*): daniel, fable, george, lewis

Default: af_heart (ID 0). Voices can be specified by name or numeric ID.

Code Conventions

Edition 2024 — uses let chains (if let Ok(x) = ... && let Ok(y) = ...)
Visibility: pub(crate) for test-only exposure, not fully pub
Clippy: treat all warnings as errors (-D warnings)
Tests: inline #[cfg(test)] mod tests per module, tempfile for filesystem tests
unsafe impl Send: TtsEngine and CaptureHandle have manual Send impls due to non-Send cpal/sherpa internals confined to dedicated threads

Dev Workflow

Make changes
cargo test — verify all 84 tests pass
cargo clippy -- -D warnings — zero warnings
cargo fmt --check — formatting
If touching audio.rs resampling: cargo bench -- resample

name	vox
description	Lightweight voice MCP server with local Moonshine STT + Kokoro TTS
version	0.1.0
tags	["mcp","voice","tts","stt","rust"]