whisper-setup
// Use when the user wants to set up whisper for PTT (push-to-talk) voice input. Guides through choosing API vs local mode and configuring whisper.cpp if local.
| name | whisper-setup |
| description | Use when the user wants to set up whisper for PTT (push-to-talk) voice input. Guides through choosing API vs local mode and configuring whisper.cpp if local. |
This skill guides users through setting up Whisper for the PTT plugin.
The PTT plugin supports two transcription backends: the OpenAI API and a local whisper.cpp installation.
Ask the user which mode they prefer:
Which Whisper mode would you like to set up?
1. **OpenAI API** (Recommended for ease of use)
   - Requires an OpenAI API key
   - Costs ~$0.006 per minute of audio
   - Best transcription quality
   - Requires an internet connection
2. **Local whisper.cpp** (Recommended for privacy)
   - Free, no API costs
   - Works offline
   - Requires ~75MB-3GB of disk space (depending on model)
   - Transcription speed depends on your hardware
If the user chooses API:
Check whether the OPENAI_API_KEY environment variable is set:
echo "$OPENAI_API_KEY" | head -c 10
If it is not set, ask the user to provide their API key.
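A check that only reports whether the variable is set avoids echoing any part of the key to the terminal. The sketch below is one way to do that; `api_key_status` is a hypothetical helper name and not part of the PTT plugin.

```shell
# Report whether OPENAI_API_KEY is set without printing any of its contents.
# api_key_status is a hypothetical helper name, not part of the PTT plugin.
api_key_status() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "set"
  else
    echo "not set"
  fi
}

api_key_status
```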
Update config:
# Read current config and update
cat ~/.claude/ptt-config.json
Set whisper.openaiApiKey to the user's key, or instruct them to set the OPENAI_API_KEY environment variable.
Set whisper.preferredMode to "api".
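The preferredMode setting could be applied with `jq`, assuming it is installed and the config file already contains valid JSON. This is only a sketch: `set_preferred_mode` is a hypothetical helper, and the key name is taken from the step above.

```shell
# Set whisper.preferredMode in a PTT config file using jq (assumed installed).
# set_preferred_mode is a hypothetical helper; usage:
#   set_preferred_mode <api|local> [config_path]
set_preferred_mode() {
  mode="$1"
  config="${2:-$HOME/.claude/ptt-config.json}"
  tmp="$(mktemp)" || return 1
  # Write to a temp file first so a jq failure never truncates the config.
  if jq --arg mode "$mode" '.whisper.preferredMode = $mode' "$config" > "$tmp"; then
    mv "$tmp" "$config"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, `set_preferred_mode api` would update the default config path, leaving all other keys untouched.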
If the user chooses local:
Check available RAM:
free -h
Check available disk space:
df -h ~
Check CPU info:
lscpu | grep -E "(Model name|CPU\(s\)|Thread)"
Check for NVIDIA GPU (for CUDA acceleration):
nvidia-smi 2>/dev/null || echo "No NVIDIA GPU detected"
Present model options with recommendations based on system:
| Model | Size | RAM Required | Speed | Quality | Best For |
|---|---|---|---|---|---|
| tiny.en | 75MB | ~400MB | Fastest | Basic | Low-resource systems, quick tests |
| base.en | 142MB | ~500MB | Fast | Good | Most desktop systems (RECOMMENDED) |
| small.en | 466MB | ~1GB | Medium | Better | Systems with 8GB+ RAM |
| medium.en | 1.5GB | ~2.5GB | Slow | Great | Systems with 16GB+ RAM |
| large-v3 | 3GB | ~4GB | Slowest | Best | High-end systems, accuracy critical |
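The table can be turned into a rough selection rule keyed on total RAM. The sketch below is illustrative only: `recommend_model` is a hypothetical helper, the 7500 and 15000 MB cut-offs approximate the "8GB+" and "16GB+" rows (the kernel reports slightly less than the nominal amount), and the 2000 MB cut-off for tiny.en is an assumption not stated in the table.

```shell
# Suggest a model from total RAM in MB, loosely following the table above.
# All thresholds are assumptions; recommend_model is a hypothetical helper.
recommend_model() {
  ram_mb="$1"
  if [ "$ram_mb" -lt 2000 ]; then
    echo "tiny.en"
  elif [ "$ram_mb" -lt 7500 ]; then
    echo "base.en"
  elif [ "$ram_mb" -lt 15000 ]; then
    echo "small.en"
  else
    echo "medium.en"
  fi
}
```

On Linux, total RAM in MB can be read with `awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo` and passed to the function. large-v3 is deliberately left as a manual choice for accuracy-critical setups.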
Recommendations:
- tiny.en for low-resource systems or quick tests
- base.en for most desktop systems (default recommendation)
- small.en for better quality
- medium.en or large-v3 if accuracy is critical
Install dependencies:
sudo apt-get update && sudo apt-get install -y build-essential cmake git
Clone and build:
cd ~ && git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make -j$(nproc)
Download chosen model:
./models/download-ggml-model.sh <model_name>
Replace <model_name> with: tiny.en, base.en, small.en, medium.en, or large-v3
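A small guard can catch a mistyped model name before anything is downloaded. This is a sketch: `model_file` is a hypothetical helper, and the `models/ggml-<model>.bin` pattern is taken from the commands in this guide.

```shell
# Map a model name to the file the download script is expected to create.
# model_file is a hypothetical helper; the path pattern follows this guide.
model_file() {
  case "$1" in
    tiny.en|base.en|small.en|medium.en|large-v3)
      echo "models/ggml-$1.bin"
      ;;
    *)
      echo "unknown model: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `model_file base.en` prints `models/ggml-base.en.bin`, while an unlisted name returns a non-zero status.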
Test the installation:
./build/bin/whisper-cli -m models/ggml-<model>.bin -f samples/jfk.wav
Update ~/.claude/ptt-config.json:
{
  "whisper": {
    "localModelPath": "/home/<user>/whisper.cpp/models/ggml-<model>.bin",
    "whisperExecutable": "/home/<user>/whisper.cpp/build/bin/whisper-cli",
    "preferredMode": "local"
  }
}
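Before recording a test clip, it can help to confirm that the two paths written to the config actually exist. The sketch below assumes nothing about the plugin itself; `check_local_setup` is a hypothetical helper that takes the executable and model paths as arguments.

```shell
# Check that the whisper executable and model file from the config exist.
# check_local_setup is a hypothetical helper; usage:
#   check_local_setup <whisper_executable> <model_path>
check_local_setup() {
  exe="$1"
  model="$2"
  if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
    echo "whisper-cli not found or not executable: $exe" >&2
    return 1
  fi
  if [ ! -f "$model" ]; then
    echo "Model file not found: $model" >&2
    return 1
  fi
  echo "ok"
}
```

A call such as `check_local_setup ~/whisper.cpp/build/bin/whisper-cli ~/whisper.cpp/models/ggml-base.en.bin` prints `ok` only when both files are present.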
Verify the setup works:
# Record a short test
arecord -f S16_LE -r 16000 -c 1 -d 3 /tmp/test.wav
# Transcribe
~/whisper.cpp/build/bin/whisper-cli -m ~/whisper.cpp/models/ggml-base.en.bin -f /tmp/test.wav
Ask whether the user wants fallback enabled:
- If yes, set enableFallback: true
Troubleshooting:
- "whisper-cli not found"
- "Model file not found"
- "API key invalid"
- "Out of memory" during local transcription
Build options:
- CPU only: make without additional flags
- NVIDIA GPU (CUDA): make GGML_CUDA=1
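Recent whisper.cpp checkouts document a CMake-based build in which CUDA is enabled with `-DGGML_CUDA=1` at configure time. The sketch below only prints the commands for each case so they can be reviewed before running; `whisper_build_cmds` is a hypothetical helper, and whether the `make` variants above behave identically depends on the checkout.

```shell
# Print the CMake configure/build commands for a whisper.cpp checkout.
# whisper_build_cmds is a hypothetical helper; usage:
#   whisper_build_cmds        # CPU-only build
#   whisper_build_cmds cuda   # NVIDIA GPU build
whisper_build_cmds() {
  flags=""
  if [ "${1:-}" = "cuda" ]; then
    flags=" -DGGML_CUDA=1"
  fi
  echo "cmake -B build${flags}"
  echo "cmake --build build -j --config Release"
}
```

Run from the whisper.cpp directory, the printed lines can be executed as-is once they look right for the machine.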