| name | phone-agent |
| description | Run a real-time AI phone agent using Twilio, Deepgram, and OpenAI/ElevenLabs TTS. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot. |
Phone Agent Skill
Runs a local FastAPI server that acts as a real-time voice bridge.
Architecture
Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
|
+--> OpenAI (LLM)
+--> OpenAI TTS or ElevenLabs (TTS)
Prerequisites
- Twilio Account: Phone number + TwiML App.
- Deepgram API Key: For fast speech-to-text.
- OpenAI API Key: For conversation logic + TTS (default).
- ElevenLabs API Key (optional): For higher-quality TTS (set
TTS_PROVIDER=elevenlabs).
- Ngrok (or similar): To expose your local port 8080 to Twilio.
Setup
-
Install Dependencies:
pip install -r scripts/requirements.txt
-
Set Environment Variables (in legacy ~/.moltbot/.env, ~/.clawdbot/.env, or export):
export DEEPGRAM_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export TWILIO_ACCOUNT_SID="your_sid"
export TWILIO_AUTH_TOKEN="your_token"
export PORT=8080
export TTS_PROVIDER="openai"
export OPENAI_TTS_VOICE="echo"
export OPENAI_TTS_MODEL="tts-1"
export ELEVENLABS_API_KEY="your_key"
export ELEVENLABS_VOICE_ID="onwK4e9ZLuTAKqWW03F9"
Optional - System Prompt Customization (priority: file > env var > built-in):
export SYSTEM_PROMPT_FILE="/path/to/custom-prompt.txt"
export SYSTEM_PROMPT="You are a helpful phone assistant. Be concise and friendly."
export AGENT_NAME="Niemand"
export OWNER_NAME="Martin's"
-
Start the Server:
python3 scripts/server.py
-
Expose to Internet:
ngrok http 8080
-
Configure Twilio:
- Go to your Phone Number settings.
- Set "Voice & Fax" -> "A Call Comes In" to Webhook.
- URL:
https://<your-ngrok-url>.ngrok.io/incoming
- Method:
POST
Usage
Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.
Customization
- System Prompt: Configure via
SYSTEM_PROMPT_FILE (load from file), SYSTEM_PROMPT (env var), or modify the built-in defaults with AGENT_NAME and OWNER_NAME.
- TTS Provider: Set
TTS_PROVIDER=openai (default, $0.03/min) or TTS_PROVIDER=elevenlabs ($0.17/min, higher quality).
- Voice (OpenAI): Set
OPENAI_TTS_VOICE — options: alloy, echo, fable, onyx, nova, shimmer.
- Voice (ElevenLabs): Change
ELEVENLABS_VOICE_ID to use different voices.
- Model: Switch
gpt-4o-mini to gpt-4 for smarter (but slower) responses.
- Language: Set
AGENT_LANGUAGE to en or de for English or German.