Name: Ai Voice Agent
Author: M4ST3R-C0NTR0L

// Build and deploy production-grade AI voice agents for businesses. Use when: user asks about "voice AI," "AI phone agent," "IVR replacement," "automated calling," "Twilio voice bot," "AI receptionist," "voice assistant," "phone automation," "call center AI," or "voice conversational AI."

stars:0

forks:0

updated: 2026-03-18 03:00

related-skills.json

Same repository

ai-company-ops.md

from "M4ST3R-C0NTR0L/cybrflux-skills"

Run your company with AI agents. Agent team architecture, autonomous workflows, client onboarding automation, inbox management, and heartbeat systems. Use when: user asks about "AI operations," "autonomous company," "agent workforce," "AI team structure," "automate my business," "AI employees," "agent orchestration," "run business with AI," or "AI-powered operations."

2026-03-18 0

lead-machine.md

from "M4ST3R-C0NTR0L/cybrflux-skills"

End-to-end lead generation system. Scraping, enrichment, scoring, cold outreach, and pipeline management with real templates and benchmarks. Use when: user asks about "lead generation," "cold email," "lead scraping," "outbound sales," "prospecting," "lead qualification," "Apollo.io," "LinkedIn scraping," "email outreach," "sales pipeline," or "B2B leads."

2026-03-18 0

revenue-website.md

from "M4ST3R-C0NTR0L/cybrflux-skills"

Build websites that generate revenue, not just look pretty. Conversion architecture, pricing psychology, trust engineering, and revenue-optimized design patterns. Use when: user asks about "revenue-focused website," "conversion optimization," "pricing page design," "high-converting landing page," "website that makes money," "sales page," "checkout optimization," "Stripe integration," "conversion rate," or "website ROI."

2026-03-18 0

viral-video-ads.md

from "M4ST3R-C0NTR0L/cybrflux-skills"

Create scroll-stopping video ads using Remotion + AI. Hook formulas, scene structure, kinetic typography, platform specs for IG Reels, TikTok, YouTube Shorts. Use when: user asks about "video ads," "viral video," "Remotion," "TikTok ads," "Instagram Reels," "video marketing," "motion graphics," "animated ads," "scroll-stopping video," or "video creative."

2026-03-18 0

cybrscan.md

from "M4ST3R-C0NTR0L/cybrflux-skills"

AI-powered website inspector using Playwright + vision models. Opens any URL, captures screenshots, analyzes design/UX/technical issues, and suggests fixes. Use when: user asks to review a website, check a site visually, inspect a page, or improve a web design.

2026-03-18 0

marketing-arsenal.md

from "M4ST3R-C0NTR0L/cybrflux-skills"

The ultimate marketing knowledge base for Cybrflux AI Marketing Agency. Comprehensive reference files covering cold outreach, email marketing, SEO (traditional + AI SEO), paid ads (Meta, Google, LinkedIn, TikTok), content strategy, copywriting, CRO, lead generation, analytics, social media, brand strategy, sales enablement, client management, and marketing automation. 14 reference files, 2000-5000+ words each, with real templates, frameworks, and actionable guidance. Use when the user asks about any marketing discipline, tactic, or strategy.

2026-03-18 0

package.json

"author": "M4ST3R-C0NTR0L"

"repository": "M4ST3R-C0NTR0L/cybrflux-skills"

name	ai-voice-agent
description	Build and deploy production-grade AI voice agents for businesses. Use when: user asks about "voice AI," "AI phone agent," "IVR replacement," "automated calling," "Twilio voice bot," "AI receptionist," "voice assistant," "phone automation," "call center AI," or "voice conversational AI."
metadata	{"author":"Cybrflux","version":"1.0.0","tags":["voice","ai","twilio","telephony","automation","conversational-ai"],"requires":["twilio","openai","deepgram"]}

AI Voice Agent — Build Production Voice AI That Actually Works

This isn't a toy. This is how you build voice agents that handle real business calls—booking appointments, qualifying leads, providing support, and making sales—24/7 without human intervention.

Cybrflux has deployed voice agents for real estate (R7), healthcare, and e-commerce. This skill distills everything we learned building VoxKit into a battle-tested framework.

When to Use

Inbound Call Handling — Replace IVRs, answer FAQs, route calls intelligently
Outbound Sales/Prospecting — Cold calls that don't sound robotic
Appointment Scheduling — Book, confirm, reschedule without human agents
Lead Qualification — Pre-qualify leads before passing to sales
Customer Support — Handle tier-1 support, escalate complex issues
Follow-up Automation — Post-purchase calls, satisfaction surveys, renewal reminders

Trigger phrases: "voice AI," "AI phone agent," "automated calling system," "Twilio voice bot," "AI receptionist," "phone automation," "voice assistant for business," "IVR replacement," "call center AI"

Architecture Overview

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────────┐
│   Caller    │────▶│   Twilio    │────▶│  WebSocket  │────▶│  Your Server  │
│  (Phone)    │◄────│  (Voice)    │◄────│  (Media)    │◄────│ (Node/Python) │
└─────────────┘     └─────────────┘     └─────────────┘     └───────┬───────┘
                                                                    │
                    ┌───────────────┐     ┌───────────────┐         │
                    │      TTS      │◄────│      LLM      │◄────────┘
                    │ (ElevenLabs   │     │   (GPT-4o/    │
                    │  or Cartesia) │     │    Claude)    │
                    └───────▲───────┘     └───────▲───────┘
                            │                     │
                    ┌───────┴───────┐     ┌───────┴───────┐
                    │      STT      │────▶│ Conversation  │
                    │  (Deepgram    │     │   Manager     │
                    │  or Whisper)  │     │               │
                    └───────────────┘     └───────────────┘

Latency Budget (target <800ms end-to-end):

STT: 150-300ms
LLM: 200-500ms
TTS: 150-300ms
Network: 50-100ms
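
The budget above can be checked with simple arithmetic. The following is a minimal sketch (the helper name `budgetRange` is made up for illustration) that sums the per-stage ranges and compares them with the 800ms target. It shows that the target only holds when most stages stay near their lower bounds: the best case is 550ms, the worst case 1200ms.

```javascript
// Per-stage latency ranges in milliseconds, copied from the budget above.
const LATENCY_BUDGET_MS = {
  stt: [150, 300],
  llm: [200, 500],
  tts: [150, 300],
  network: [50, 100],
};

// Hypothetical helper: sums the best-case and worst-case latency across stages.
function budgetRange(budget) {
  let best = 0;
  let worst = 0;
  for (const [min, max] of Object.values(budget)) {
    best += min;
    worst += max;
  }
  return { best, worst };
}

const TARGET_MS = 800;
const { best, worst } = budgetRange(LATENCY_BUDGET_MS);
console.log(`best case: ${best}ms, worst case: ${worst}ms, target: <${TARGET_MS}ms`);
console.log(worst < TARGET_MS ? 'always within target' : 'worst case exceeds target');
```

In practice this suggests measuring each stage separately and spending optimization effort on whichever stage sits closest to its upper bound.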

Stack Recommendations

Telephony Layer

Provider	Best For	Cost/Min	Notes
Twilio	General purpose	$0.0085/min	Best docs, most integrations
Vonage	International	$0.005-0.02/min	Better rates outside US
Telnyx	Cost optimization	$0.004/min	Lowest per-minute rates

Speech-to-Text (STT)

Provider	Latency	Accuracy	Cost/Hour	Best For
Deepgram Nova-2	~200ms	95%+	$0.75	Production default
Deepgram Nova-3	~250ms	97%	$1.25	High-accuracy needs
Whisper API	~400ms	94%	$0.36	Budget option
Whisper Local	Variable	92%	$0	On-premise requirements

Text-to-Speech (TTS)

Provider	Latency	Naturalness	Cost/1K chars	Best For
ElevenLabs Turbo v2.5	~150ms	⭐⭐⭐⭐⭐	$0.10	Premium experience
Cartesia Sonic	~100ms	⭐⭐⭐⭐⭐	$0.08	Ultra-low latency
OpenAI TTS HD	~200ms	⭐⭐⭐⭐	$0.03	Budget-conscious
Azure Neural	~250ms	⭐⭐⭐⭐	$0.016	Enterprise fallback

LLM (Conversation Brain)

Model	Latency	Intelligence	Cost/1K tokens	Best For
GPT-4o	~300ms	⭐⭐⭐⭐⭐	$0.005/0.015	Default choice
Claude 3.5 Sonnet	~400ms	⭐⭐⭐⭐⭐	$0.003/0.015	Complex reasoning
GPT-4o-mini	~150ms	⭐⭐⭐⭐	$0.00015/0.0006	High-volume, simple flows

Core Framework: The VOX Pattern

Every production voice agent follows the VOX Pattern:

V - Voice (STT capture + audio streaming)
O - Orchestrate (conversation state + context management)  
X - eXecute (LLM reasoning + function calling + TTS output)

Step 1: Voice Layer (STT + Audio Streaming)

// Twilio WebSocket handler (Node.js, @deepgram/sdk v3-style API)
const WebSocket = require('ws');
const { createClient, LiveTranscriptionEvents } = require('@deepgram/sdk');

class VoiceStreamHandler {
  constructor(twilioWs, config) {
    this.twilioWs = twilioWs;
    this.callSid = config.callSid;           // Identifies the session in the orchestrator
    this.orchestrator = config.orchestrator; // ConversationOrchestrator instance (Step 2)
    this.deepgram = createClient(config.deepgramKey);
    this.transcriptionBuffer = [];
    this.isProcessing = false;
    
    // Initialize Deepgram live transcription
    this.dgConnection = this.deepgram.listen.live({
      model: 'nova-2',
      language: 'en-US',
      encoding: 'mulaw',      // Twilio streams 8 kHz mu-law audio
      sample_rate: 8000,
      smart_format: true,
      interim_results: true,
      utterance_end_ms: 1500, // Pause detection
      vad_events: true,
      endpointing: 400,
    });
    
    this.setupDeepgramHandlers();
  }
  
  setupDeepgramHandlers() {
    // Transcript events carry both interim (partial) and final results
    this.dgConnection.on(LiveTranscriptionEvents.Transcript, (data) => {
      const transcript = data.channel.alternatives[0].transcript;
      
      // Keep only finalized, non-empty segments
      if (data.is_final && transcript) {
        this.transcriptionBuffer.push({
          text: transcript,
          confidence: data.channel.alternatives[0].confidence,
          timestamp: Date.now()
        });
        
        // speech_final marks the end of speech detected by endpointing
        if (data.speech_final && !this.isProcessing) {
          this.processTranscription();
        }
      }
    });
    
    // Utterance end = natural pause in speech
    this.dgConnection.on(LiveTranscriptionEvents.UtteranceEnd, () => {
      if (this.transcriptionBuffer.length > 0 && !this.isProcessing) {
        this.processTranscription();
      }
    });
  }
  
  async processTranscription() {
    this.isProcessing = true;
    const utterance = this.transcriptionBuffer.map(u => u.text).join(' ');
    this.transcriptionBuffer = [];
    
    try {
      // Send to orchestration layer
      await this.orchestrator.handleUserInput(this.callSid, utterance);
    } finally {
      // Always release the flag, even if the orchestrator throws
      this.isProcessing = false;
    }
  }
  
  // Handle incoming audio from Twilio (mu-law 8kHz, base64-encoded)
  handleAudio(payload) {
    const audioBuffer = Buffer.from(payload, 'base64');
    this.dgConnection.send(audioBuffer);
  }
}
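
The handler above expects something to feed it audio from the Twilio media stream. As a rough sketch of that glue, the function below routes the JSON messages Twilio sends over the WebSocket (`start`, `media`, `stop`). The function name `dispatchTwilioMessage` and the `handlers` callback object are assumptions for illustration, not part of any SDK.

```javascript
// Dispatches one raw Twilio Media Streams message (a JSON string).
// `handlers` is a hypothetical object of callbacks supplied by the caller.
function dispatchTwilioMessage(raw, handlers) {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case 'start':
      // The stream SID is needed later to send audio back on this stream.
      handlers.onStart(msg.start.streamSid, msg.start.callSid);
      break;
    case 'media':
      // Payload is base64-encoded mu-law audio at 8 kHz.
      handlers.onAudio(msg.media.payload);
      break;
    case 'stop':
      handlers.onStop();
      break;
    default:
      // 'connected', 'mark', and other events are ignored in this sketch.
      break;
  }
  return msg.event;
}

// Usage with in-memory messages (no network involved).
const seen = [];
const handlers = {
  onStart: (streamSid, callSid) => seen.push(['start', streamSid, callSid]),
  onAudio: (payload) => seen.push(['audio', Buffer.from(payload, 'base64').length]),
  onStop: () => seen.push(['stop']),
};
dispatchTwilioMessage(JSON.stringify({ event: 'start', start: { streamSid: 'MZ123', callSid: 'CA456' } }), handlers);
dispatchTwilioMessage(JSON.stringify({ event: 'media', media: { payload: Buffer.from([255, 255, 255]).toString('base64') } }), handlers);
dispatchTwilioMessage(JSON.stringify({ event: 'stop' }), handlers);
console.log(seen);
```

In a real server, `onAudio` would call `handleAudio` on the stream handler, and `onStart` would be the place to create the call session and remember the stream SID.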

Step 2: Orchestration Layer (State Management)

class ConversationOrchestrator {
  constructor(config) {
    this.sessions = new Map(); // Active call sessions
    this.config = config;
  }
  
  async createSession(callSid, fromNumber, toNumber) {
    const session = {
      callSid,
      fromNumber,
      toNumber,
      startTime: Date.now(),
      context: {
        userInfo: null,
        appointmentDate: null,
        qualificationScore: 0,
        intents: [],
        objects: {} // Any extracted entities
      },
      history: [],
      currentNode: 'greeting',
      transferRequested: false
    };
    
    // Enrich with caller data if available
    session.context.userInfo = await this.enrichCaller(fromNumber);
    this.sessions.set(callSid, session);
    
    return session;
  }
  
  async handleUserInput(callSid, utterance) {
    const session = this.sessions.get(callSid);
    if (!session) return;
    
    // Add to conversation history
    session.history.push({ role: 'user', content: utterance, timestamp: Date.now() });
    
    // Determine current state and available actions
    const stateContext = this.buildStateContext(session);
    
    // Execute LLM reasoning
    const response = await this.executeLLM(session, stateContext);
    
    // Handle function calls
    let reply = response;
    if (response.functionCall) {
      const result = await this.executeFunction(response.functionCall, session);
      // Re-prompt with function result
      reply = await this.executeLLM(session, {
        ...stateContext,
        functionResult: result
      });
    }
    await this.speakResponse(callSid, reply.content);
    
    // Store the reply that was actually spoken (the follow-up when a function was called)
    session.history.push({ role: 'assistant', content: reply.content, timestamp: Date.now() });
  }
  
  buildStateContext(session) {
    return {
      currentNode: session.currentNode,
      extractedInfo: session.context,
      availableFunctions: this.getAvailableFunctions(session.currentNode),
      systemPrompt: this.getSystemPromptForNode(session.currentNode),
      maxHistory: 10 // Rolling window
    };
  }
}

Step 3: Execution Layer (LLM + Function Calling)

const OpenAI = require('openai');

class LLMExecutor {
  constructor(config) {
    this.openai = new OpenAI({ apiKey: config.openaiKey });
    this.model = config.model || 'gpt-4o';
  }
  
  async generateResponse(session, context) {
    const messages = [
      { role: 'system', content: context.systemPrompt },
      ...session.history.slice(-context.maxHistory).map(h => ({
        role: h.role,
        content: h.content
      }))
    ];
    
    const tools = context.availableFunctions.map(fn => ({
      type: 'function',
      function: fn
    }));
    
    const response = await this.openai.chat.completions.create({
      model: this.model,
      messages,
      tools: tools.length > 0 ? tools : undefined,
      tool_choice: tools.length > 0 ? 'auto' : undefined,
      temperature: 0.7,
      max_tokens: 300 // Keep responses concise for voice
    });
    
    const choice = response.choices[0];
    
    if (choice.message.tool_calls) {
      return {
        content: choice.message.content,
        functionCall: {
          name: choice.message.tool_calls[0].function.name,
          arguments: JSON.parse(choice.message.tool_calls[0].function.arguments)
        }
      };
    }
    
    return { content: choice.message.content };
  }
}

// Example function definitions
const FUNCTIONS = {
  bookAppointment: {
    name: 'bookAppointment',
    description: 'Book an appointment for the caller',
    parameters: {
      type: 'object',
      properties: {
        date: { type: 'string', format: 'date' },
        time: { type: 'string', pattern: '^[0-9]{2}:[0-9]{2}$' },
        service: { type: 'string' },
        name: { type: 'string' },
        phone: { type: 'string' }
      },
      required: ['date', 'time', 'service', 'name', 'phone']
    }
  },
  
  checkAvailability: {
    name: 'checkAvailability',
    description: 'Check available time slots for a given date',
    parameters: {
      type: 'object',
      properties: {
        date: { type: 'string', format: 'date' },
        service: { type: 'string' }
      },
      required: ['date']
    }
  },
  
  transferToHuman: {
    name: 'transferToHuman',
    description: 'Transfer the call to a human agent',
    parameters: {
      type: 'object',
      properties: {
        reason: { type: 'string' },
        priority: { type: 'string', enum: ['low', 'normal', 'high', 'urgent'] }
      },
      required: ['reason']
    }
  },
  
  qualifyLead: {
    name: 'qualifyLead',
    description: 'Score the lead based on qualification criteria',
    parameters: {
      type: 'object',
      properties: {
        budget: { type: 'string' },
        timeline: { type: 'string' },
        authority: { type: 'boolean' },
        need: { type: 'string' }
      }
    }
  }
};
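
The orchestrator calls an `executeFunction` step that is not shown in this skill. One possible shape for it, sketched under the assumption that handlers live in a plain object keyed by function name, is a dispatcher that checks the `required` list from the definition before calling the handler. The names `missingRequired`, `definitions`, and `stubHandlers` are illustrative only.

```javascript
// Checks that every parameter listed in `required` is present in `args`.
function missingRequired(definition, args) {
  const required = definition.parameters.required || [];
  return required.filter((name) => args[name] === undefined || args[name] === '');
}

// Hypothetical dispatcher: looks up a handler by name and validates arguments
// against the function definition before calling it.
async function executeFunction(functionCall, definitions, handlers) {
  const definition = definitions[functionCall.name];
  const handler = handlers[functionCall.name];
  if (!definition || !handler) {
    return { ok: false, error: `Unknown function: ${functionCall.name}` };
  }
  const missing = missingRequired(definition, functionCall.arguments);
  if (missing.length > 0) {
    return { ok: false, error: `Missing parameters: ${missing.join(', ')}` };
  }
  return { ok: true, result: await handler(functionCall.arguments) };
}

// Usage with one definition from above and a stub handler.
const definitions = {
  checkAvailability: {
    name: 'checkAvailability',
    parameters: {
      type: 'object',
      properties: { date: { type: 'string' }, service: { type: 'string' } },
      required: ['date'],
    },
  },
};
const stubHandlers = {
  checkAvailability: async ({ date }) => [`${date} 09:00`, `${date} 14:30`],
};

const outcome = executeFunction(
  { name: 'checkAvailability', arguments: { service: 'cleaning' } },
  definitions,
  stubHandlers
);
outcome.then((r) => console.log(r));
```

Returning an error object instead of throwing lets the result be fed back to the LLM, which can then ask the caller for the missing detail.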

Step 4: TTS Output Streaming

class TTSStreamer {
  constructor(config) {
    this.elevenLabsKey = config.elevenLabsKey;
    this.voiceId = config.voiceId;
    this.twilioWs = null;
  }
  
  // streamSid comes from Twilio's 'start' message (it is not the call SID)
  async speakResponse(streamSid, text, twilioWs) {
    // Stream TTS directly to avoid latency.
    // The output format is a query parameter; ulaw_8000 matches Twilio's 8 kHz mu-law audio.
    const response = await fetch(
      `https://api.elevenlabs.io/v1/text-to-speech/${this.voiceId}/stream?output_format=ulaw_8000`,
      {
        method: 'POST',
        headers: {
          'xi-api-key': this.elevenLabsKey,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          text,
          model_id: 'eleven_turbo_v2_5',
          voice_settings: {
            stability: 0.5,
            similarity_boost: 0.75,
            style: 0.3,
            use_speaker_boost: true
          }
        })
      }
    );
    
    if (!response.ok) {
      throw new Error(`TTS request failed with status ${response.status}`);
    }
    
    // Stream chunks to Twilio as they arrive
    const reader = response.body.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      
      // Send to Twilio as media message
      twilioWs.send(JSON.stringify({
        event: 'media',
        streamSid,
        media: {
          payload: Buffer.from(value).toString('base64')
        }
      }));
    }
  }
}

Conversation Design Framework

The SPICE Method for Voice Scripts

Every voice interaction follows SPICE:

Element	Purpose	Example
Set Context	Who/what/where	"Hi, this is Sarah from Apex Dental. I'm calling about your appointment request."
Proceed with Purpose	Clear next step	"I can check our availability for this week. What day works best for you?"
Invite Response	Open-ended prompt	"Tell me a bit about what you're looking for."
Confirm Understanding	Verify extraction	"So you're looking for a cleaning on Tuesday afternoon—is that right?"
Exit or Advance	Move forward or close	"Perfect! I've got you booked. You'll receive a confirmation text. Anything else I can help with?"

Voice Persona Template

name: "Sarah"
purpose: "Dental appointment scheduling and lead qualification"
tone: "warm, professional, efficient"
speaking_style: "natural pace, occasional brief pauses, friendly but not overly casual"
avoid:
  - 'Filler words: "um", "uh", "like"'
  - Over-apologizing
  - Robot-speak ("I am an AI assistant...")
  - Rushing through important details
use_when:
  - Inbound appointment calls
  - Follow-up on web inquiries
  - Rescheduling requests
greeting: "Hi, this is Sarah from Apex Dental. How can I help you today?"

Conversation Flow Templates

Template 1: Appointment Booking

const APPOINTMENT_FLOW = {
  nodes: {
    greeting: {
      prompt: `You're Sarah from {businessName}. Answer the phone warmly.
      If they want to book an appointment, proceed to qualification.
      If they have other questions, answer helpfully or offer to transfer.`,
      next: ['qualifyNeed', 'handleQuestion', 'transfer']
    },
    
    qualifyNeed: {
      prompt: `Ask what service they need and when they'd prefer to come in.
      Services: {availableServices}.
      Extract: service type, preferred date range, urgency level.`,
      extract: ['service', 'preferredDate', 'urgency'],
      next: ['checkAvailability']
    },
    
    checkAvailability: {
      function: 'checkAvailability',
      prompt: `Share 2-3 available slots. Present them clearly with day and time.
      Ask which they prefer or if they need different options.`,
      next: ['confirmBooking', 'offerAlternatives']
    },
    
    confirmBooking: {
      prompt: `Confirm the details: "Just to confirm, that's a {service} on {date} at {time} for {name}. Is that correct?"`,
      function: 'bookAppointment',
      next: ['bookingConfirmed', 'correctDetails']
    },
    
    bookingConfirmed: {
      prompt: `Great! You're all set. Mention they'll get a confirmation text 24 hours before.
      Ask if there's anything else they need.`,
      next: ['additionalHelp', 'closeCall']
    },
    
    closeCall: {
      prompt: `Thank them warmly and say goodbye. Keep it brief.`,
      hangup: true
    }
  }
};
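
The template above references several nodes in `next` that it does not define (for example `transfer` and `offerAlternatives`), so they have to be added before the flow can run. A small check like the following sketch can catch such gaps early; `findMissingNodes` is a made-up helper name, and the flow object is a reduced copy that keeps only the `next` edges.

```javascript
// Returns the names of nodes that are referenced in `next` but never defined.
function findMissingNodes(flow) {
  const defined = new Set(Object.keys(flow.nodes));
  const missing = new Set();
  for (const node of Object.values(flow.nodes)) {
    for (const target of node.next || []) {
      if (!defined.has(target)) missing.add(target);
    }
  }
  return [...missing].sort();
}

// Reduced copy of the appointment flow: only the `next` edges matter here.
const flow = {
  nodes: {
    greeting: { next: ['qualifyNeed', 'handleQuestion', 'transfer'] },
    qualifyNeed: { next: ['checkAvailability'] },
    checkAvailability: { next: ['confirmBooking', 'offerAlternatives'] },
    confirmBooking: { next: ['bookingConfirmed', 'correctDetails'] },
    bookingConfirmed: { next: ['additionalHelp', 'closeCall'] },
    closeCall: { hangup: true },
  },
};

console.log(findMissingNodes(flow));
// → [ 'additionalHelp', 'correctDetails', 'handleQuestion', 'offerAlternatives', 'transfer' ]
```

Running this at startup, and refusing to take calls when the list is non-empty, is one way to keep a half-finished flow out of production.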

Template 2: Lead Qualification (Outbound)

const OUTBOUND_QUALIFICATION_FLOW = {
  nodes: {
    intro: {
      prompt: `Quick intro: "Hi {name}, this is {agentName} from {company}. 
      I saw you recently {triggerEvent}. Do you have a quick minute?"
      
      If YES → proceed to discovery
      If NO → ask for better time, schedule callback
      If voicemail → leave 15-second message with callback number`,
      next: ['discovery', 'scheduleCallback', 'voicemail']
    },
    
    discovery: {
      prompt: `Ask BANT questions naturally:
        - Budget: "What's your budget range for this?" or "Are you the decision-maker for budget?"
        - Authority: "Besides yourself, who else is involved in this decision?"
        - Need: "What prompted you to look into this now?"
        - Timeline: "When are you hoping to have this in place?"
      
      Don't ask all at once—let it flow conversationally.
      Score each response (1-10) and store in context.`,
      extract: ['budgetRange', 'decisionMaker', 'painPoints', 'timeline', 'competitors'],
      next: ['presentSolution', 'nurture', 'disqualify']
    },
    
    presentSolution: {
      condition: 'qualificationScore >= 7',
      prompt: `They're qualified! Briefly present the solution that matches their needs.
      Focus on outcomes, not features.
      End with: "Would it make sense to schedule a 15-minute demo this week?"`,
      next: ['bookMeeting', 'handleObjection', 'nurture']
    },
    
    nurture: {
      condition: 'qualificationScore >= 4 && qualificationScore < 7',
      prompt: `Not ready to buy yet. Add value:
      - Send relevant case study
      - Offer to add them to monthly insights email
      - Set follow-up in 30-60 days
      
      Get permission and their preferred contact method.`,
      function: 'addToNurtureSequence',
      next: ['closeCall']
    },
    
    disqualify: {
      condition: 'qualificationScore < 4',
      prompt: `They're not a fit. Be polite but direct:
      "Based on what you've shared, I don't think we're the right fit right now. 
      But if things change, feel free to reach out. Thanks for your time!"`,
      next: ['closeCall']
    }
  }
};
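
The `condition` strings in this flow amount to a three-way split on `qualificationScore`. The sketch below shows that routing as plain code. How the per-question scores combine into one number is not specified in the template, so the plain average used in `averageScore` is an assumption, as are both function names.

```javascript
// Thresholds taken from the `condition` strings in the flow above.
function routeByScore(qualificationScore) {
  if (qualificationScore >= 7) return 'presentSolution';
  if (qualificationScore >= 4) return 'nurture';
  return 'disqualify';
}

// Assumption: the overall score is the plain average of the per-question
// scores (1-10). The template does not say how scores are combined.
function averageScore(scores) {
  const values = Object.values(scores);
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const score = averageScore({ budget: 8, authority: 6, need: 9, timeline: 7 });
console.log(score, routeByScore(score)); // → 7.5 presentSolution
```

Evaluating the thresholds in code rather than parsing the condition strings at runtime avoids `eval` and keeps the boundaries (7 and 4) in one place.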

Voice Selection Guide

Matching Voice to Brand

Industry	Voice Characteristics	Recommended Voices
Healthcare	Warm, calming, trustworthy	ElevenLabs: "Grace" (premade) or custom clone
Legal/Finance	Professional, measured, authoritative	ElevenLabs: "Adam" or "Daniel"
E-commerce	Energetic, helpful, efficient	ElevenLabs: "Bella" or "Rachel"
SaaS/Tech	Modern, confident, solution-oriented	Cartesia: "Sonic" upbeat variants
Luxury	Refined, sophisticated, unhurried	Custom clone of premium brand voice

Creating Custom Voice Clones

# ElevenLabs voice cloning best practices
"""
Requirements:
- Minimum 30 minutes of clean audio
- Consistent tone throughout samples
- No background music or noise
- Diverse sentence structures (questions, statements, emotions)
- Speaker should sound like they want the clone to sound

Upload via: https://elevenlabs.io/voice-lab
Training time: ~30 minutes
Cost: ~$5/month per custom voice
"""

# Sample recording script for voice actor:
SCRIPT = """
Welcome to our service. I'm here to help you find exactly what you need.

Did you know we offer same-day delivery in your area? That's right—order by 2 PM and it's at your door by evening.

Let me check that availability for you. One moment please... Okay, great news! We have three slots open tomorrow.

Are you looking for something specific, or would you like me to make some recommendations?

I understand this is frustrating. Let me see what I can do to make this right.

Congratulations! Your booking is confirmed. You'll receive a confirmation shortly.

Is there anything else I can help you with today?
"""

Error Handling & Edge Cases

Common Failure Modes

Issue	Detection	Recovery Strategy
STT fails/no transcription	5+ seconds silence with audio	"I'm having trouble hearing you. Could you speak up or call back from a quieter location?"
LLM hallucinates	Nonsensical response detected	Interrupt, apologize, and restate: "Let me rephrase that..."
User asks "Are you a robot?"	Keyword detection	"I'm an AI assistant helping {company} with scheduling. I'm real in the sense that I can actually book your appointment right now. How can I help?"
Angry caller	Sentiment analysis (tone + words)	"I understand you're frustrated. Let me get you to a supervisor right away." → immediate transfer
Confusing request	Low confidence on intent	"I want to make sure I help you correctly. Are you looking to [option A] or [option B]?"
Background noise	Audio level detection	"It sounds like there's some background noise. Could you move to a quieter spot?"

Barge-In Handling

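// Allow users to interrupt the AI
class BargeInHandler {
  constructor(twilioWs, streamSid) {
    this.twilioWs = twilioWs;   // Twilio media stream WebSocket
    this.streamSid = streamSid; // From Twilio's 'start' message
    this.isSpeaking = false;
    this.interruptThreshold = 0.3; // Seconds of speech (300ms) that trigger an interrupt
  }
  
  onUserSpeechDetected() {
    if (this.isSpeaking) {
      // Stop current TTS immediately
      this.stopSpeaking();
      
      // Brief acknowledgment then let them speak
      setTimeout(() => {
        this.playAcknowledgment();
      }, 200);
    }
  }
  
  stopSpeaking() {
    // Twilio's 'clear' message discards audio that is buffered but not yet played
    this.twilioWs.send(JSON.stringify({ event: 'clear', streamSid: this.streamSid }));
    this.isSpeaking = false;
  }
  
  playAcknowledgment() {
    // Picks a short cue such as "Yes?" or "Go ahead"; the caller passes it to TTS
    const acks = ['Yes?', 'Go ahead.', "I'm listening."];
    return acks[Math.floor(Math.random() * acks.length)];
  }
}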

Deployment Patterns

Pattern 1: Serverless (Vercel/Netlify Functions)

Best for: Low volume (<1,000 calls/day), simple flows, quick MVP

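// api/voice.js (Vercel Serverless)
// Handles Twilio's voice webhook, which POSTs form fields such as CallSid, From, and To.
export default async function handler(req, res) {
  if (req.method !== 'POST') {
    res.status(405).send('Method Not Allowed');
    return;
  }
  
  // Initialize call session in Redis/Upstash (createSession is an app-specific helper, not shown)
  await createSession(req.body.CallSid);
  
  // Return TwiML to connect WebSocket.
  // The WebSocket endpoint needs a host that supports long-lived connections,
  // which serverless functions generally do not.
  res.setHeader('Content-Type', 'text/xml');
  res.send(`<?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <Stream url="wss://your-domain.com/ws" />
      </Connect>
    </Response>`);
}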

Pattern 2: Dedicated Server (Docker)

Best for: High volume, complex state management, low latency requirements

# Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

# docker-compose.yml
version: '3.8'
services:
  voice-agent:
    build: .
    ports:
      - "3000:3000"
    environment:
      - TWILIO_ACCOUNT_SID=${TWILIO_ACCOUNT_SID}
      - TWILIO_AUTH_TOKEN=${TWILIO_AUTH_TOKEN}
      - DEEPGRAM_API_KEY=${DEEPGRAM_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ELEVENLABS_API_KEY=${ELEVENLABS_API_KEY}
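    deploy:
      # Note: several replicas cannot all bind host port 3000 under plain
      # `docker compose up`; use Swarm mode or put a reverse proxy in front.
      replicas: 3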
      resources:
        limits:
          cpus: '1'
          memory: 512M

Pattern 3: Kubernetes (Production Scale)

# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voice-agent
spec:
  replicas: 5
  selector:
    matchLabels:
      app: voice-agent
  template:
    metadata:
      labels:
        app: voice-agent
    spec:
      containers:
      - name: voice-agent
        image: your-registry/voice-agent:latest
        ports:
        - containerPort: 3000
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        envFrom:
        - secretRef:
            name: voice-agent-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: voice-agent-service
spec:
  selector:
    app: voice-agent
  ports:
  - port: 80
    targetPort: 3000
  type: LoadBalancer

Pricing Calculator

Cost Per Minute Breakdown

Component	Provider	Cost/Min	Notes
Telephony	Twilio	$0.0085	Inbound US calls
STT	Deepgram Nova-2	$0.0125	Assuming ~150 words/min
LLM	GPT-4o	$0.015	~300 tokens avg response
TTS	ElevenLabs	$0.018	~180 chars avg response
Total		~$0.054/min	~$3.24/hour of talk time

Monthly Cost Projections

Volume	Minutes/Month	Est. Cost	Use Case
Startup	500	$27	Testing, 1-2 active clients
Small Biz	3,000	$162	Single business inbound line
Growing	15,000	$810	Multi-client, moderate volume
Scale	50,000	$2,700	High-volume outbound + inbound
Enterprise	200,000	$10,800	Call center replacement
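
The projections above follow directly from the per-minute breakdown. As a minimal sketch, the calculation can be reproduced and adapted to other volumes or providers; the constant and function names here are illustrative.

```javascript
// Per-minute component costs in USD, copied from the breakdown table above.
const COST_PER_MINUTE = {
  telephony: 0.0085,
  stt: 0.0125,
  llm: 0.015,
  tts: 0.018,
};

// Sums the components; rounded to 4 decimals to hide floating-point noise.
function totalPerMinute(costs) {
  const sum = Object.values(costs).reduce((a, b) => a + b, 0);
  return Math.round(sum * 10000) / 10000;
}

// Estimated monthly cost in whole dollars for a given number of minutes.
function monthlyCost(minutes, costs = COST_PER_MINUTE) {
  return Math.round(minutes * totalPerMinute(costs));
}

for (const minutes of [500, 3000, 15000, 50000, 200000]) {
  console.log(`${minutes} min/month: $${monthlyCost(minutes)}`);
}
```

Swapping a single entry, for example a different TTS rate, shows immediately how much one component moves the monthly total.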

Cost Optimization Tips

Use GPT-4o-mini for simple flows — roughly 25-33x cheaper per token at the rates listed above, often sufficient
Batch TTS when possible — Cache common phrases (greetings, closings)
Implement smart call routing — Route simple requests to FAQ handler
Use Deepgram's tiered pricing — Commit to volume for 20-40% discounts
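
Caching common phrases, as suggested in the tips above, could look roughly like the sketch below. The `TTSCache` class and the injected `synthesize` function are assumptions for illustration; a real version would wrap the actual TTS call and might persist audio in Redis or on disk instead of in memory.

```javascript
// In-memory cache for synthesized audio, keyed by voice and exact text.
// `synthesize` is a hypothetical async function (text, voiceId) => Buffer
// that would wrap a real TTS call; it is injected so the cache stays testable.
class TTSCache {
  constructor(synthesize, maxEntries = 100) {
    this.synthesize = synthesize;
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map preserves insertion order
  }

  async get(text, voiceId) {
    const key = `${voiceId}:${text}`;
    if (this.entries.has(key)) return this.entries.get(key);

    const audio = await this.synthesize(text, voiceId);
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest entry (first inserted key).
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(key, audio);
    return audio;
  }
}

// Usage with a fake synthesizer that counts how often it is called.
let calls = 0;
const fakeSynthesize = async (text) => {
  calls += 1;
  return Buffer.from(text);
};

const cache = new TTSCache(fakeSynthesize);
const done = (async () => {
  await cache.get('Hi, how can I help you today?', 'voice-1');
  await cache.get('Hi, how can I help you today?', 'voice-1');
  console.log(`synthesize calls: ${calls}`);
  return calls;
})();
```

Only fixed phrases such as greetings and closings benefit from this; responses that include names or dates rarely repeat. Two simultaneous requests for the same uncached phrase would both call the synthesizer in this simple version.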

Testing & Monitoring

Load Testing Script

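// test/load-test.js
// VoiceTestClient is a project-specific test harness (not shown here)
import { VoiceTestClient } from './test-client';

const CONCURRENT_CALLS = 10;

async function runLoadTest() {
  // Start all calls at once so they actually overlap
  const calls = Array.from({ length: CONCURRENT_CALLS }, () =>
    new VoiceTestClient({
      scenario: 'appointment_booking',
      audioSample: './test-audio/sample-conversation.wav'
    }).run()
  );
  
  // allSettled keeps one crashed call from aborting the whole run
  const settled = await Promise.allSettled(calls);
  const results = settled.map(s => s.status === 'fulfilled'
    ? {
        latency: s.value.avgLatency,
        success: s.value.bookingCompleted,
        errors: s.value.errors,
        duration: s.value.callDuration
      }
    : { latency: null, success: false, errors: [s.reason], duration: null });
  
  // Average latency over calls that produced a measurement
  const measured = results.filter(r => r.latency !== null);
  
  console.log('Load Test Results:');
  console.log(`Success Rate: ${results.filter(r => r.success).length / results.length * 100}%`);
  console.log(`Avg Latency: ${measured.reduce((a, r) => a + r.latency, 0) / (measured.length || 1)}ms`);
}

runLoadTest();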

Key Metrics to Track

const METRICS = {
  // Latency
  'stt.latency': 'Time from speech end to transcript',
  'llm.latency': 'Time from transcript to response text',
  'tts.latency': 'Time from response text to audio playback',
  'e2e.latency': 'Total round-trip time',
  
  // Quality
  'stt.word_error_rate': 'Transcription accuracy',
  'conversation.completion_rate': 'Successful call completions',
  'transfer.rate': 'Percentage transferred to human',
  'user.interruption_rate': 'How often users interrupt AI',
  
  // Business
  'booking.conversion': 'Appointment bookings / total calls',
  'lead.qualification_rate': 'Qualified leads / total leads',
  'call.duration': 'Average call duration'
};
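
Averages hide the slow calls that callers actually notice, so latency metrics such as `e2e.latency` are often reported as percentiles. The sketch below uses the nearest-rank method; the sample values are invented purely for illustration and are not measurements from any deployment.

```javascript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up samples for illustration only.
const e2eLatencyMs = [620, 710, 540, 980, 760, 690, 1150, 640, 720, 805];
console.log('p50:', percentile(e2eLatencyMs, 50)); // → p50: 710
console.log('p95:', percentile(e2eLatencyMs, 95)); // → p95: 1150
```

Comparing p95 rather than the mean against the 800ms target gives a more honest picture of how the agent feels on a bad turn.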

Cross-References

revenue-website — Integrate voice agents into high-converting landing pages
lead-machine — Connect voice qualification to your lead pipeline
ai-company-ops — Deploy voice agents as part of your AI workforce

Quick Start Checklist

Time to first call: 2-4 hours with this skill.
