---
name: mlx-audio
description: Local TTS on Apple Silicon via mlx-audio. Use when the user asks to generate speech, read text aloud, or manage local TTS settings. Requires the mlx-audio OpenClaw plugin to be installed and enabled.
---
# MLX Audio - Local TTS
Generate speech locally on Apple Silicon using mlx-audio. No API key, no cloud dependency.
## Tool: `mlx_audio_tts`

### Generate Speech
```json
{
  "action": "generate",
  "text": "Text to synthesize",
  "outputPath": "/optional/path/to/output.mp3"
}
```
Returns the path to the generated audio file. `outputPath` is optional and restricted to `/tmp` or `~/.openclaw/mlx-audio/outputs`; paths containing symbolic-link segments are rejected.
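The `outputPath` restriction can be sketched as follows. This is a minimal, synchronous illustration, not the plugin's implementation (which uses async filesystem calls); `is_allowed_output_path` and `ALLOWED_ROOTS` are hypothetical names. One design choice worth noting: on macOS `/tmp` is itself a symlink to `/private/tmp`, so this sketch only checks path segments *below* the allowed root.

```python
import os
from pathlib import Path

# Hypothetical allow-list mirroring the two roots named above.
ALLOWED_ROOTS = (
    Path("/tmp"),
    Path.home() / ".openclaw" / "mlx-audio" / "outputs",
)


def is_allowed_output_path(path, allowed_roots=ALLOWED_ROOTS):
    """Return True if `path` is under an allowed root and no existing
    segment below that root is a symbolic link."""
    # Expand "~" and collapse ".." so "/tmp/../etc/x" cannot escape a root.
    candidate = Path(os.path.abspath(os.path.expanduser(str(path))))
    for root in allowed_roots:
        root = Path(os.path.abspath(os.path.expanduser(str(root))))
        try:
            relative = candidate.relative_to(root)
        except ValueError:
            continue  # not under this root; try the next one
        current = root
        for part in relative.parts:
            current = current / part
            # is_symlink() returns False for segments that do not exist yet.
            if current.is_symlink():
                return False
        return True
    return False
```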
### Check Status

```json
{
  "action": "status"
}
```
Returns the server status, loaded model, server `startedAt` timestamp, and config. While the server is warming up, the result also includes the startup phase and approximate model cache download progress.
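The approximate cache progress is reported as a text bar plus a percentage. A minimal sketch of that kind of rendering, assuming progress is estimated from bytes already in the model cache versus an expected total; `render_progress`, the `#`/`-` glyphs, and the bar width are hypothetical choices, not the plugin's actual output format.

```python
def render_progress(downloaded_bytes, expected_bytes, width=20):
    """Render approximate cache progress as "[####----] NN%"."""
    if expected_bytes <= 0:
        # Total size unknown: show an indeterminate bar.
        return "[" + "?" * width + "] ?%"
    # Clamp to [0, 1]; cache size estimates can overshoot the expected total.
    fraction = min(max(downloaded_bytes / expected_bytes, 0.0), 1.0)
    filled = int(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {fraction * 100:.0f}%"
```

Clamping keeps the display sane when the estimate is off, which matters because the progress is only approximate.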
## Commands

| Command | Description |
|---|---|
| `/mlx-tts status` | Server status, startup phase, and approximate model cache progress |
| `/mlx-tts test <text>` | Runs a test generation and replies with the output file path and size |
| `/mlx-tts reload` | Reloads plugin config without restarting the OpenClaw gateway |
## Models

| Model | Languages | Description |
|---|---|---|
| Kokoro-82M (default) | EN, JA, ZH, FR, ES, IT, PT, HI | Lightweight and multilingual, with 54 preset voices |
| Qwen3-TTS-0.6B-Base | ZH, EN, JA, KO, and more | Higher-quality Chinese speech. Supports voice cloning from a 3-second reference audio clip |
| Qwen3-TTS-1.7B-VoiceDesign | ZH, EN, JA, KO, and more | Generates voices from natural-language descriptions. Requires 16 GB+ of memory |
| Chatterbox | 16 languages | Widest language coverage. Requires 16 GB+ of memory |
## Notes
- Audio is generated locally. No data leaves the machine.
- The proxy starts first. When `autoStart` is enabled, the server warms up in the background; otherwise it starts on the first generation request or `GET /v1/models` call.
- Startup readiness requires `/v1/models` to pass a health check within about 10 seconds. If the server is not ready by then, the request returns an unavailable error and startup is retried on the next request.
- Startup status tracks the phase and approximate model cache progress (text bar plus percentage). The same status appears in the startup-timeout error details returned to OpenClaw.
- `pythonEnvMode: managed` (default) bootstraps `uv`, syncs `~/.openclaw/mlx-audio/runtime/` from the bundled `pyproject.toml` and `uv.lock`, and launches the server with `uv run --project ...`.
- `pythonEnvMode: external` uses `pythonExecutable` directly, after validating that it is Python 3.11-3.13 and has the required modules installed.
- Single-port mode is the default: `port` is the public OpenAI-compatible endpoint, and the server itself listens on an internally derived port.
- `proxyPort` is a legacy compatibility field. When it is set, the plugin switches to legacy dual-port semantics.
- First generation may be slower due to model warmup.
- The server runs as a background subprocess and auto-restarts on crash.
- The proxy injects `model`, `lang_code`, `speed`, `temperature`, `top_p`, `top_k`, and `repetition_penalty`, and forces `response_format=mp3` for speech requests.
- `langCode` is Kokoro-specific. Qwen3-TTS auto-detects the language from the text, and other models ignore this field.
- `/v1/audio/speech` request bodies larger than 1 MB are rejected with HTTP 413.
- Proxy requests are canceled upstream when the downstream client disconnects before completion.
- Generated audio is streamed to disk, and payloads larger than 64 MB are rejected to avoid memory spikes.
- Output path safety checks use async filesystem operations and still reject symbolic-link path segments.
- Config lives in `openclaw.json` under `plugins.entries.openclaw-mlx-audio.config`. While the service is running, changes are applied automatically by a background refresh (about every 2 seconds); `/mlx-tts reload` or the tool's `reload` action forces an immediate apply.
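The startup readiness rule (health-check `GET /v1/models` within about 10 seconds, otherwise report unavailable and retry later) can be sketched as a bounded poll. The function names, the `base_url` shape, and the 0.25 s poll interval are assumptions for illustration, not the plugin's code.

```python
import time
import urllib.request


def probe_models(base_url, timeout=1.0):
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False


def wait_until_ready(probe, deadline_s=10.0, interval_s=0.25):
    """Poll `probe` until it succeeds or roughly `deadline_s` seconds pass.

    Returns False on timeout; a caller would then report "unavailable"
    and retry startup on the next request instead of blocking forever.
    """
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

Usage would look like `wait_until_ready(lambda: probe_models("http://127.0.0.1:PORT"))`, where `PORT` stands for the server's internal port. Taking the probe as a callable keeps the timing logic testable without a live server.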
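In single-port mode a client only needs the public `port`. A minimal sketch of building a request against the OpenAI-compatible speech endpoint, assuming the OpenAI-style `input` field carries the text; `build_speech_request` and the example port `8000` are hypothetical. Because the proxy injects `model` and the sampling settings and forces `response_format=mp3`, the body here carries only the text.

```python
import json
import urllib.request


def build_speech_request(text, port, host="127.0.0.1"):
    """Build a POST to the proxy's OpenAI-compatible /v1/audio/speech endpoint."""
    # Only the text is sent; the proxy is described as injecting model,
    # lang_code, speed, etc. and forcing response_format=mp3.
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response body would then be the MP3 audio.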
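The proxy's handling of `/v1/audio/speech` bodies (size limit, field injection, forced MP3) can be sketched as one rewrite step. This is an illustration under assumptions, not the plugin's code: `rewrite_speech_body` and `RequestTooLarge` are hypothetical, "1 MB" is taken as 1 MiB, and configured values are assumed to overwrite client-supplied ones.

```python
import json

# Assumption: "1 MB" is taken as 1 MiB; the real limit may be 1,000,000 bytes.
MAX_SPEECH_BODY_BYTES = 1024 * 1024


class RequestTooLarge(Exception):
    """Hypothetical error the proxy would map to HTTP 413."""
    status = 413


def rewrite_speech_body(raw_body: bytes, injected: dict) -> bytes:
    """Rewrite a /v1/audio/speech JSON body before forwarding it upstream."""
    # Check size before parsing so oversized bodies are never decoded.
    if len(raw_body) > MAX_SPEECH_BODY_BYTES:
        raise RequestTooLarge(f"body is {len(raw_body)} bytes")
    payload = json.loads(raw_body)  # invalid JSON raises ValueError
    # Assumption: configured values win over client-supplied ones;
    # unset (None) settings are skipped.
    payload.update({k: v for k, v in injected.items() if v is not None})
    payload["response_format"] = "mp3"  # forced regardless of the request
    return json.dumps(payload).encode("utf-8")
```

Here `injected` would hold the configured `model`, `lang_code`, `speed`, `temperature`, `top_p`, `top_k`, and `repetition_penalty` values.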
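Streaming generated audio to disk with a hard size cap keeps memory bounded to one chunk at a time. A minimal sketch, assuming "64 MB" means 64 MiB and that a partial file should not be left behind on overflow; `stream_to_file` and `AudioTooLarge` are hypothetical names, and `source` can be any binary file-like object such as an HTTP response.

```python
import os

# Assumption: "64 MB" is taken as 64 MiB; the plugin's exact limit may differ.
MAX_AUDIO_BYTES = 64 * 1024 * 1024


class AudioTooLarge(Exception):
    """Raised when the upstream audio exceeds the size cap."""


def stream_to_file(source, dest_path, max_bytes=MAX_AUDIO_BYTES, chunk_size=64 * 1024):
    """Copy a binary stream to dest_path in chunks, enforcing max_bytes.

    Only one chunk is held in memory at a time. On overflow the partial
    file is deleted so no truncated audio is left behind.
    """
    written = 0
    try:
        with open(dest_path, "wb") as out:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                written += len(chunk)
                if written > max_bytes:
                    raise AudioTooLarge(f"audio exceeds {max_bytes} bytes")
                out.write(chunk)
    except AudioTooLarge:
        # The with-block has already closed the file at this point.
        os.remove(dest_path)
        raise
    return written
```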
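The config lookup and roughly 2-second background refresh can be sketched as a change-detecting poll. This assumes `openclaw.json` parses as plain JSON (a file using JSON5 syntax would need a JSON5 parser instead); `load_plugin_config` and `ConfigWatcher` are hypothetical helpers, and the file location is passed in rather than guessed.

```python
import json
from pathlib import Path

# Key path named in the notes: plugins.entries.openclaw-mlx-audio.config
CONFIG_KEYS = ("plugins", "entries", "openclaw-mlx-audio", "config")


def load_plugin_config(path):
    """Return the mlx-audio plugin config from openclaw.json, or {} if absent."""
    node = json.loads(Path(path).read_text(encoding="utf-8"))
    for key in CONFIG_KEYS:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return {}
    return node if isinstance(node, dict) else {}


class ConfigWatcher:
    """Hypothetical helper: call poll() on a timer (about every 2 seconds)."""

    def __init__(self, path):
        self.path = path
        self.current = None  # last applied config; None until the first poll

    def poll(self):
        """Reload the file; return True when the config changed and must be re-applied."""
        latest = load_plugin_config(self.path)
        if latest != self.current:
            self.current = latest
            return True
        return False
```

In this sketch a forced reload would simply call `poll()` immediately instead of waiting for the next timer tick.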