| name | vocal |
| description | Speak text aloud (TTS) and transcribe speech (STT). Supports local (macOS say, mlx-whisper) and cloud (ElevenLabs) providers. Use when user asks to speak, read aloud, listen, transcribe, or use vocal. |
| license | Apache-2.0 |
| disable-model-invocation | true |
| metadata | {"status":"experimental","experimental_reason":"Voice workflows depend on local audio devices and optional ElevenLabs credentials, so reliability is environment-sensitive."} |
Vocal
Speak text aloud and transcribe speech with local and cloud providers. User-invocable only (/vocal) — audio is a side-effect surface, not something to auto-trigger on.
Usage
/vocal — turn-based vocal loop
Runs an ask-aloud / listen / respond / keep-listening cycle using the vocal-listener background agent.
/vocal What should we work on next?
Optional inline config:
/vocal stt=local tts=local duration=8 What should we work on next?
/vocal stt=elevenlabs tts=elevenlabs duration=10 Ready when you are.
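The inline config format in the examples above can be parsed with a few lines of Python. This is a minimal sketch, not the skill's actual parser: `VocalConfig` and `parse_vocal_args` are hypothetical names, and the rule that config tokens must come before the prompt is an assumption drawn from the examples. The defaults follow the ones listed under Loop behavior.

```python
from dataclasses import dataclass

PROVIDERS = {"local", "elevenlabs"}


@dataclass
class VocalConfig:
    stt: str = "local"   # documented default
    tts: str = ""        # empty means "match stt" until resolved below
    duration: int = 8    # documented default, in seconds
    prompt: str = ""


def parse_vocal_args(text: str) -> VocalConfig:
    """Split leading key=value tokens from the spoken prompt (hypothetical helper)."""
    cfg = VocalConfig()
    tokens = text.split()
    i = 0
    while i < len(tokens):
        key, sep, value = tokens[i].partition("=")
        if sep and key in ("stt", "tts") and value in PROVIDERS:
            setattr(cfg, key, value)
        elif sep and key == "duration" and value.isdecimal():
            cfg.duration = int(value)
        else:
            break  # assumption: the first non-config token starts the prompt
        i += 1
    # Note: joining on single spaces collapses repeated whitespace in the prompt.
    cfg.prompt = " ".join(tokens[i:])
    if not cfg.tts:
        cfg.tts = cfg.stt  # documented default: tts matches stt
    return cfg
```

An empty `prompt` corresponds to the case where no first prompt is provided.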
Loop behavior
- Parse inline config from the command text:
  - stt=local|elevenlabs (default: local)
  - tts=local|elevenlabs (default: match stt)
  - duration=<seconds> (default: 8)
  - Remaining text becomes the first spoken prompt.
- Validate selected providers before starting (run only the checks needed):
uv run ~/.claude/skills/vocal/scripts/stt_local.py --check
uv run ~/.claude/skills/vocal/scripts/stt_elevenlabs.py --check
uv run ~/.claude/skills/vocal/scripts/tts_local.py --check
uv run ~/.claude/skills/vocal/scripts/tts_elevenlabs.py --check
- Launch the listener. Create or reuse a team named vocal and launch vocal-listener as a background task with config:
stt_provider=<local|elevenlabs>
duration_seconds=<duration>
continue_token=keep-listening
stop_token=stop-listening
- Speak the first prompt aloud (if provided). If none is provided, speak: Vocal mode active. I'm listening.
- For every listener message starting with [voice-input]:
  - Treat the transcript as the user turn.
  - Produce a concise assistant response.
  - Speak the response with the selected TTS provider.
  - Send keep-listening to the listener agent.
- Stop conditions:
  - Transcript asks to stop (e.g. "stop vocal mode", "goodbye", "exit vocal"): speak a confirmation and send stop-listening.
  - Listener reports [voice-error]: surface the error and pause vocal mode.
Turn-based, not full-duplex realtime. Each listen cycle is a separate background agent turn. Keep spoken responses short unless the user asks for detail.
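The message handling described in the loop steps can be sketched as a small dispatch function. This is only an illustration: `handle_listener_message` is a hypothetical helper, the assumption is that the transcript or error text directly follows the tag, and the substring match on stop phrases is a simplification of what the skill leaves to the assistant's judgment.

```python
# Example stop phrases taken from the stop conditions in this section.
STOP_PHRASES = ("stop vocal mode", "goodbye", "exit vocal")
KEEP_TOKEN = "keep-listening"
STOP_TOKEN = "stop-listening"
INPUT_TAG = "[voice-input]"
ERROR_TAG = "[voice-error]"


def handle_listener_message(message: str) -> dict:
    """Classify one listener message and pick the token to send back.

    Returns a dict with the kind of event, the token for the listener
    (None when nothing should be sent), and the remaining payload text.
    """
    if message.startswith(ERROR_TAG):
        # Errors are surfaced to the user and vocal mode pauses; no token is sent.
        return {"kind": "error", "token": None,
                "payload": message[len(ERROR_TAG):].strip()}
    if message.startswith(INPUT_TAG):
        transcript = message[len(INPUT_TAG):].strip()
        if any(phrase in transcript.lower() for phrase in STOP_PHRASES):
            # Speak a confirmation, then end the loop.
            return {"kind": "stop", "token": STOP_TOKEN, "payload": transcript}
        # Normal turn: respond, speak the response, then keep listening.
        return {"kind": "turn", "token": KEEP_TOKEN, "payload": transcript}
    # Anything without a known tag is left alone.
    return {"kind": "ignore", "token": None, "payload": message}
```

Keeping the decision separate from the speaking and sending steps makes the loop logic testable without a microphone.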
Web tuning console
uv run --script ~/.claude/skills/vocal/scripts/web_console.py
Open http://127.0.0.1:8765 to tune the skill from a local browser.
The console supports:
- TTS sample playback for local say and ElevenLabs
- Local and ElevenLabs voice listing
- Browser microphone recording and audio-file transcription
- Provider checks from the same scripts used by the skill
- Saved local defaults in skills/vocal/data/preferences.json
Options:
uv run --script ~/.claude/skills/vocal/scripts/web_console.py --port 8799
VOCAL_DATA_DIR=~/Library/Application\ Support/vocal-skill \
uv run --script ~/.claude/skills/vocal/scripts/web_console.py
Local TTS (macOS say)
uv run --script ~/.claude/skills/vocal/scripts/tts_local.py --text "Hello Michael"
Examples:
uv run --script ~/.claude/skills/vocal/scripts/tts_local.py \
--text "Build succeeded" \
--voice Alex \
--rate 200 \
--output /tmp/build.aiff
uv run --script ~/.claude/skills/vocal/scripts/tts_local.py --list-voices
Local STT (mlx-whisper, Apple Silicon)
uv run --script ~/.claude/skills/vocal/scripts/stt_local.py --duration 5
uv run --script ~/.claude/skills/vocal/scripts/stt_local.py --file ./meeting.wav
uv run --script ~/.claude/skills/vocal/scripts/stt_local.py --list-devices
uv run --script ~/.claude/skills/vocal/scripts/stt_local.py --duration 5 --device 1
ElevenLabs TTS (cloud)
uv run --script ~/.claude/skills/vocal/scripts/tts_elevenlabs.py \
--text "Hello Michael" \
--voice George
Examples:
uv run --script ~/.claude/skills/vocal/scripts/tts_elevenlabs.py \
--text "Deployment complete" \
--model eleven_turbo_v2_5 \
--output /tmp/deploy.mp3 \
--play
ElevenLabs STT (Scribe v2)
uv run --script ~/.claude/skills/vocal/scripts/stt_elevenlabs.py --duration 5
uv run --script ~/.claude/skills/vocal/scripts/stt_elevenlabs.py --file ./call.wav
uv run --script ~/.claude/skills/vocal/scripts/stt_elevenlabs.py --list-devices
uv run --script ~/.claude/skills/vocal/scripts/stt_elevenlabs.py --duration 5 --device 1
Provider checks
uv run --script ~/.claude/skills/vocal/scripts/tts_local.py --check
uv run --script ~/.claude/skills/vocal/scripts/stt_local.py --check
uv run --script ~/.claude/skills/vocal/scripts/tts_elevenlabs.py --check
uv run --script ~/.claude/skills/vocal/scripts/stt_elevenlabs.py --check
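When only some providers are selected, only the matching checks need to run. The sketch below builds the two relevant commands from the selected STT and TTS providers; `check_commands` is a hypothetical helper, and the script paths and `--check` flag are the ones shown above.

```python
import os

PROVIDERS = ("local", "elevenlabs")
SCRIPTS_DIR = os.path.expanduser("~/.claude/skills/vocal/scripts")


def check_commands(stt: str, tts: str) -> list:
    """Return one `--check` command (as an argv list) per selected provider."""
    for name in (stt, tts):
        if name not in PROVIDERS:
            raise ValueError(f"unknown provider: {name!r}")
    scripts = [f"stt_{stt}.py", f"tts_{tts}.py"]
    return [
        ["uv", "run", "--script", os.path.join(SCRIPTS_DIR, script), "--check"]
        for script in scripts
    ]


# Usage (not executed here): run each command with subprocess.run(cmd).
# Treating a non-zero exit code as a failed check is an assumption about
# the scripts, not something this document states.
```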
Provider Comparison
| Provider | Mode | Latency | Quality | Cost |
|---|---|---|---|---|
| tts_local.py | Local | Low | Good | Free |
| stt_local.py | Local | Medium (first run downloads model) | Good | Free |
| tts_elevenlabs.py | Cloud | Very low with flash model | Very high | Paid API |
| stt_elevenlabs.py | Cloud | Low | Very high | Paid API |
Environment Variables
| Variable | Required | Used by |
|---|---|---|
| ELEVENLABS_API_KEY | Yes (cloud only) | tts_elevenlabs.py, stt_elevenlabs.py |
| ELEVEN_LABS_API_KEY | Accepted alias | tts_elevenlabs.py, stt_elevenlabs.py |
Set via ~/.env or shell export.
Recommended local setup:
ELEVENLABS_API_KEY=your-key-here
ELEVEN_LABS_API_KEY=your-key-here
Put one of those lines in ~/.env, then restart web_console.py.
The vocal scripts load ~/.env automatically before checking the process environment.
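The key lookup can be pictured as follows. This is a sketch under stated assumptions, not the scripts' actual code: `parse_dotenv` and `resolve_api_key` are hypothetical names, the `.env` parsing is deliberately simple, and the precedence (an exported variable wins over a value in ~/.env) is an assumption, since this document only says that ~/.env is loaded first.

```python
from typing import Mapping, Optional

# Primary name first, then the accepted alias.
KEY_NAMES = ("ELEVENLABS_API_KEY", "ELEVEN_LABS_API_KEY")


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding whitespace and quotes, a common cause of "invalid key".
        values[key] = value.strip().strip("'\"")
    return values


def resolve_api_key(dotenv: Mapping[str, str],
                    environ: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty key found, or None.

    Assumed precedence: process environment first, then ~/.env values.
    """
    for source in (environ, dotenv):
        for name in KEY_NAMES:
            value = source.get(name, "").strip()
            if value:
                return value
    return None


# Usage (not executed here): pass the contents of ~/.env to parse_dotenv
# and os.environ as the second argument of resolve_api_key.
```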
Troubleshooting
Getting an ElevenLabs API key
- Open https://elevenlabs.io/app/settings/api-keys
- Create a key
- Export it:
export ELEVENLABS_API_KEY=your-key-here
macOS microphone permissions
If transcription fails with permission errors:
- Open System Settings -> Privacy & Security -> Microphone
- Allow Terminal (or your Claude host app)
- Re-run the command
Common issues
- say: command not found: install or restore the macOS command line tools
- mlx-whisper import error: run the command via uv run so dependencies are installed
- API key invalid: regenerate the key and make sure it has no leading or trailing whitespace
Self-Validation
Run fast provider checks:
uv run --script ~/.claude/skills/vocal/tests/test_voice.py
Run file-based ask/listen/respond loop (no microphone required):
uv run --script ~/.claude/skills/vocal/tests/test_voice_loop.py
Include cloud loop validation (requires ElevenLabs key):
uv run --script ~/.claude/skills/vocal/tests/test_voice_loop.py --cloud
Run web console helper tests:
uv run --script ~/.claude/skills/vocal/tests/test_web_console.py
Run browser validation for the web console:
uv run --script ~/.claude/skills/vocal/tests/test_web_console_playwright.py
uv run --script ~/.claude/skills/vocal/tests/test_web_console_playwright.py \
--url http://127.0.0.1:8765
uv run --script ~/.claude/skills/vocal/tests/test_web_console_playwright.py \
--url http://127.0.0.1:8765 \
--cloud
uv run --script ~/.claude/skills/vocal/tests/test_web_console_playwright.py \
--url http://127.0.0.1:8765 \
--headed \
--slow-mo 100
Fixture files for loop validation:
tests/fixtures/loop_prompt.txt
tests/fixtures/expected_keyword.txt
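One plausible reading of this fixture pair is that the prompt text goes through a speak-then-transcribe round trip and the transcript is checked for the expected keyword. The sketch below shows only that final comparison; `normalize` and `transcript_contains_keyword` are hypothetical names, and the actual test script may compare differently.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def transcript_contains_keyword(transcript: str, keyword: str) -> bool:
    """True when the normalized keyword appears in the normalized transcript.

    This is a substring match, so a short keyword can also match inside a
    longer word; an empty keyword never matches.
    """
    needle = normalize(keyword)
    return bool(needle) and needle in normalize(transcript)
```

Normalizing both sides keeps the check tolerant of the capitalization and punctuation differences that STT providers tend to introduce.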
References
- Architecture & research: See references/architecture.md (three-tier design, ElevenLabs API details, Claude Code background communication research, CLI programmatic modes)
- Voice bridge backlog: See backlog/voice-bridge-plan.md (standalone process for continuous voice conversation with a self-eval loop)