impressions
| name | impressions |
| description | Add and use custom voices for VoiceMode TTS via local mlx-audio. Use when the user wants to clone a voice, do an impression, add a reference clip, or use voice="<name>" in converse. |
Make VoiceMode speak in any voice. The model takes a short reference clip and synthesises fresh speech in that voice via local Qwen3-TTS on top of mlx-audio.
Status: Preview / experimental. Apple Silicon only. Opt-in.
voice= argument in voicemode:converse doesn't match a known Kokoro voice

mlx-audio service

# 1. Install the local TTS service (one-time, Apple Silicon only)
voicemode service install mlx-audio
# 2. Add a voice from a reference clip
voicemode clone add fleabag ~/Downloads/fleabag-clip.wav
# 3. Use it
voicemode converse --voice fleabag
In the MCP converse tool, pass voice="fleabag" -- VoiceMode auto-routes any voice that matches a profile in VOICEMODE_VOICES_DIR to mlx-audio instead of Kokoro / OpenAI.
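The routing rule above can be sketched in a few lines. This is a minimal illustration, not VoiceMode's actual code: `pick_backend` is a hypothetical helper, and it assumes that "matches a profile" means a directory named after the voice that contains a `default.wav`.

```python
import os
from pathlib import Path


def voices_dir() -> Path:
    """Return the profile root, honouring VOICEMODE_VOICES_DIR when it is set."""
    raw = os.environ.get("VOICEMODE_VOICES_DIR", "~/.voicemode/voices")
    return Path(raw).expanduser()


def pick_backend(voice: str, root: Path) -> str:
    """Hypothetical router: a voice with a profile goes to mlx-audio.

    Assumption: a profile is <root>/<voice>/default.wav. Anything else
    falls through to the regular providers (Kokoro / OpenAI).
    """
    if (root / voice / "default.wav").is_file():
        return "mlx-audio"
    return "default"
```

With a `fleabag` profile on disk, `pick_backend("fleabag", voices_dir())` would return `"mlx-audio"`, while a name with no profile directory falls through to `"default"`.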
voicemode clone add validates the input before doing any expensive work:
default.wav.

If your source is longer than 9 seconds, trim with the same one-liner the runtime error suggests:
ffmpeg -i in.wav -ss 0 -t 8 out.wav
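If you want to decide programmatically whether a clip needs trimming, a small sketch along these lines may help. It reads the duration with Python's standard `wave` module and builds the same ffmpeg argument list as the one-liner above. The helper names are hypothetical, and the sketch assumes the source is already a PCM WAV file.

```python
import wave

MAX_SECONDS = 9.0  # upper bound quoted in the text above


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def needs_trim(path: str) -> bool:
    """True when the clip is longer than the 9 second limit."""
    return wav_duration_seconds(path) > MAX_SECONDS


def trim_command(src: str, dst: str, seconds: int = 8) -> list[str]:
    """Argument list equivalent to: ffmpeg -i src -ss 0 -t 8 dst."""
    return ["ffmpeg", "-i", src, "-ss", "0", "-t", str(seconds), dst]
```

The list returned by `trim_command` can be passed to `subprocess.run` if ffmpeg is installed; the sketch itself never invokes it.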
Voices live as directories under ~/.voicemode/voices/<name>/:
~/.voicemode/voices/fleabag/
├── default.wav # required: 3-9s of clean reference audio, mono 24kHz 16-bit PCM
└── voice.md # auto-generated by `voicemode clone add` -- name, source, duration, format, transcript
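The format requirement on default.wav (3-9 s, mono, 24 kHz, 16-bit PCM) can be checked with the standard library alone. The following is a rough sketch under that reading of the layout above; `clip_problems` is a hypothetical name and this is not the validation that `voicemode clone add` itself performs.

```python
import wave


def clip_problems(path: str) -> list[str]:
    """Return human-readable reasons why a clip does not match the
    documented default.wav format; an empty list means it looks fine.

    Note: wave.open raises wave.Error for files that are not PCM WAV.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
        seconds = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != 24000:
        problems.append(f"expected 24000 Hz, found {rate} Hz")
    if width != 2:
        problems.append(f"expected 16-bit samples, found {width * 8}-bit")
    if not 3.0 <= seconds <= 9.0:
        problems.append(f"expected 3-9 s, found {seconds:.2f} s")
    return problems
```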
voice.md carries YAML front matter with name, source (original input path), duration_seconds, format (literal mono 24kHz 16-bit PCM, loudnorm I=-16 TP=-1.5 LRA=11), and transcript. It documents what the clip is and where it came from.
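To make the field list concrete, here is a sketch that renders front matter with those five keys. The exact layout that `voicemode clone add` writes is not shown in this document, so the key order, quoting style, and the `render_voice_md` helper are all assumptions; values are emitted as JSON scalars, which are also valid YAML.

```python
import json

# Literal format string quoted in the text above.
FORMAT = "mono 24kHz 16-bit PCM, loudnorm I=-16 TP=-1.5 LRA=11"


def render_voice_md(name: str, source: str, duration_seconds: float, transcript: str) -> str:
    """Render hypothetical voice.md front matter for one profile."""
    fields = {
        "name": name,
        "source": source,
        "duration_seconds": duration_seconds,
        "format": FORMAT,
        "transcript": transcript,
    }
    lines = ["---"]
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Calling it with placeholder values, for example `render_voice_md("fleabag", "~/Downloads/fleabag-clip.wav", 7.5, "placeholder transcript")`, yields a block delimited by `---` lines with one key per line.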
voices.json at the voices root is retained as a legacy index -- voicemode clone add writes an entry pointing at <name>/default.wav so older consumers keep working. Prefer the directory layout above for new work.
Multiple WAVs are allowed alongside default.wav; symlink whichever one is "active" to default.wav. A directory with multiple WAVs and no default.wav is treated as a sample bin and skipped.
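The discovery rule and the symlink convention can be sketched as follows. Both helpers are hypothetical and only model the behaviour described above: a directory counts as a voice when it has a default.wav, and every other directory is skipped.

```python
import os
from pathlib import Path


def discover_voices(root: Path) -> dict[str, Path]:
    """Map voice name -> default.wav for every usable profile under root."""
    voices = {}
    for entry in sorted(root.iterdir()):
        clip = entry / "default.wav"
        # exists() follows symlinks, so a dangling link is skipped too.
        if entry.is_dir() and clip.exists():
            voices[entry.name] = clip
    return voices


def set_active(voice_dir: Path, wav_name: str) -> None:
    """Point default.wav at one of the WAVs stored next to it."""
    link = voice_dir / "default.wav"
    if link.is_symlink() or link.exists():
        link.unlink()
    # Relative target keeps the profile directory relocatable.
    os.symlink(wav_name, link)
```

For a profile holding `calm.wav` and `angry.wav`, `set_active(profile, "calm.wav")` makes it discoverable; a directory of loose WAVs without default.wav never appears in the result.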
5-9 seconds of clean conversational speech beats 30 seconds of noisy podcast audio. The model copies what it hears -- including hum, music beds, and laugh tracks. See docs/finding-samples.md for ranking heuristics, an mlx-whisper word-timestamp ranker concept, and ffmpeg loudnorm recipes.
| Variable | Default | Purpose |
|---|---|---|
| VOICEMODE_VOICES_DIR | ~/.voicemode/voices | Where voice profiles live |
| VOICEMODE_REMOTE_VOICES_DIR | (unset) | Path on remote mlx-audio host (path translation) |
| VOICEMODE_MLX_AUDIO_BASE_URL | http://127.0.0.1:8890/v1 | OpenAI-compatible mlx-audio endpoint |
| VOICEMODE_IMPRESSIONS_MODEL | mlx-community/Qwen3-TTS-12Hz-1.7B-Base-bf16 | Hugging Face model ID |
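As an illustration of how these settings might combine, the sketch below resolves the defaults from the table and builds (but does not send) a speech request. It assumes the mlx-audio endpoint mirrors the shape of OpenAI's `/audio/speech` route (`model`, `input`, `voice`), which the table implies but this document does not spell out; `setting` and `build_speech_request` are hypothetical helpers.

```python
import json
import os
import urllib.request

# Defaults copied from the table above.
DEFAULTS = {
    "VOICEMODE_VOICES_DIR": "~/.voicemode/voices",
    "VOICEMODE_MLX_AUDIO_BASE_URL": "http://127.0.0.1:8890/v1",
    "VOICEMODE_IMPRESSIONS_MODEL": "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-bf16",
}


def setting(name: str, env=os.environ) -> str:
    """Environment value if present, otherwise the documented default."""
    return env.get(name, DEFAULTS[name])


def build_speech_request(text: str, voice: str, env=os.environ) -> urllib.request.Request:
    """Build a POST request in the assumed OpenAI-style shape."""
    base = setting("VOICEMODE_MLX_AUDIO_BASE_URL", env).rstrip("/")
    body = {
        "model": setting("VOICEMODE_IMPRESSIONS_MODEL", env),
        "input": text,
        "voice": voice,
    }
    return urllib.request.Request(
        f"{base}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out because it needs a running mlx-audio service.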
The unreleased 8.7.0 candidate used VOICEMODE_CLONE_* names. They're honoured in 8.7.x with a one-shot deprecation warning and will be removed in 8.8.0:
| Deprecated | Use instead |
|---|---|
| VOICEMODE_CLONE_BASE_URL | VOICEMODE_MLX_AUDIO_BASE_URL |
| VOICEMODE_CLONE_MODEL | VOICEMODE_IMPRESSIONS_MODEL |
| VOICEMODE_CLONE_PORT | VOICEMODE_MLX_AUDIO_PORT |
If you see those in a user's voicemode.env, suggest updating them.
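The rename table lends itself to a small helper. The sketch below shows one way the fallback with a one-shot warning could work, plus a function that rewrites deprecated keys in the text of a voicemode.env file. Both functions are hypothetical, and the precedence (new name wins over old) is an assumption rather than documented behaviour.

```python
import warnings

RENAMED = {
    "VOICEMODE_CLONE_BASE_URL": "VOICEMODE_MLX_AUDIO_BASE_URL",
    "VOICEMODE_CLONE_MODEL": "VOICEMODE_IMPRESSIONS_MODEL",
    "VOICEMODE_CLONE_PORT": "VOICEMODE_MLX_AUDIO_PORT",
}
_warned: set[str] = set()


def resolve(new_name: str, env: dict) -> "str | None":
    """Look up new_name, falling back to its deprecated alias."""
    if new_name in env:
        return env[new_name]
    for old, new in RENAMED.items():
        if new == new_name and old in env:
            if old not in _warned:  # warn once per deprecated name
                _warned.add(old)
                warnings.warn(f"{old} is deprecated; use {new}",
                              DeprecationWarning, stacklevel=2)
            return env[old]
    return None


def migrate_env_text(text: str) -> str:
    """Rewrite KEY=value lines that use a deprecated key."""
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        new = RENAMED.get(key.strip())
        out.append(f"{new}{sep}{value}" if new and sep else line)
    return "\n".join(out)
```

`migrate_env_text` leaves comments and unrelated lines untouched, and drops a trailing newline if the input had one.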
A profile named af_sky (or any other Kokoro voice name) shadows the Kokoro voice. Pick distinctive names like fleabag, mike-2026, bryan_morning.
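A quick collision check before adding a profile might look like the sketch below. `shadowed` is a hypothetical helper, and the Kokoro voice list is passed in by the caller because this document names only af_sky; the set in the usage line is an illustrative subset, not the full catalogue.

```python
def shadowed(profile_names, kokoro_voices) -> list[str]:
    """Return profile names that would shadow a Kokoro voice of the same name."""
    return sorted(set(profile_names) & set(kokoro_voices))
```

For example, `shadowed(["fleabag", "af_sky"], {"af_sky"})` reports the second name as a clash.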