| name | openclaw-voice-avatar |
| description | Give your OpenClaw agent a face and voice. Build real-time voice and video avatar agents using LiveKit, LemonSlice, ElevenLabs, and Deepgram. Covers agent setup, LLM routing through OpenClaw Gateway, avatar integration, and STT/TTS configuration. |
OpenClaw Voice Avatar
Give your OpenClaw agent a face and voice. Users speak into their microphone, the agent thinks via OpenClaw Gateway, and responds with synthesized speech and a live animated avatar.
How it works
User speaks → LiveKit room → Deepgram STT → OpenClaw Gateway (LLM)
↓
User sees ← LemonSlice avatar ← ElevenLabs TTS ← Agent response
- LiveKit Cloud hosts the WebRTC room — handles audio/video transport
- Deepgram Nova-3 transcribes user speech to text (streaming STT)
- OpenClaw Gateway routes the text to your agent for reasoning
- ElevenLabs Flash v2.5 synthesizes the agent's response to speech
- LemonSlice generates real-time lip-synced avatar video from the audio
Stack
| Component | Provider | Purpose |
|---|
| Transport | LiveKit Cloud | WebRTC rooms, audio/video routing |
| STT | Deepgram Nova-3 | Speech-to-text (fastest streaming, <300ms) |
| LLM | OpenClaw Gateway | Routes to your OpenClaw agent |
| TTS | ElevenLabs Flash v2.5 | Text-to-speech (lowest latency) |
| Avatar | LemonSlice | Real-time lip-synced video |
| Framework | LiveKit Agents SDK | Python agent orchestration |
Quick Start
Prerequisites
Installation
mkdir my-voice-agent && cd my-voice-agent
uv init
Add dependencies to pyproject.toml:
[project]
name = "my-voice-agent"
version = "0.1.0"
requires-python = ">=3.10,<3.13"
dependencies = [
"livekit-agents[elevenlabs]~=1.3",
"livekit-plugins-lemonslice~=1.3",
"livekit-plugins-noise-cancellation>=0.2.5",
"livekit-plugins-openai~=1.3",
"python-dotenv>=1.2.1",
]
uv sync
Environment Variables
Create a .env file:
# LiveKit Cloud (from cloud.livekit.io → Settings → Keys)
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
# LemonSlice (from lemonslice.com)
LEMONSLICE_API_KEY=your_lemonslice_key
# ElevenLabs (from elevenlabs.io)
ELEVEN_API_KEY=your_elevenlabs_key
# OpenClaw Gateway
OPENCLAW_GATEWAY_URL=https://your-openclaw-gateway.example.com
OPENCLAW_TOKEN=your_openclaw_token
Run & Test
uv run python agent.py dev
Open agents-playground.livekit.io, select your LiveKit project, and click Connect. You'll see the avatar and can start talking to your agent immediately — no frontend code needed.
Web Frontend
Once the agent works in the playground, you can build your own website so anyone can talk to it.
Setup
npx create-next-app@latest web --typescript --tailwind --app
cd web
npm install @livekit/components-react @livekit/components-styles livekit-client livekit-server-sdk
Add to web/.env.local (same LiveKit credentials as the agent):
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
Token API (app/api/token/route.ts)
The browser needs a short-lived JWT to join a LiveKit room. This server-side route generates one — credentials stay on the server.
import { AccessToken } from "livekit-server-sdk";
import { NextResponse } from "next/server";
export async function POST() {
const apiKey = process.env.LIVEKIT_API_KEY;
const apiSecret = process.env.LIVEKIT_API_SECRET;
const wsUrl = process.env.LIVEKIT_URL;
if (!apiKey || !apiSecret || !wsUrl) {
return NextResponse.json({ error: "Server misconfigured" }, { status: 500 });
}
const roomName = `room-${crypto.randomUUID().slice(0, 8)}`;
const identity = `user-${crypto.randomUUID().slice(0, 8)}`;
const at = new AccessToken(apiKey, apiSecret, { identity, ttl: "10m" });
at.addGrant({ roomJoin: true, room: roomName, canPublish: true, canSubscribe: true });
return NextResponse.json({ token: await at.toJwt(), serverUrl: wsUrl });
}
Room Component (components/Room.tsx)
A client component that connects to the LiveKit room, renders the avatar video, and plays agent audio.
"use client";
import { useCallback, useState } from "react";
import {
DisconnectButton, LiveKitRoom, RoomAudioRenderer,
TrackToggle, VideoTrack, useVoiceAssistant,
} from "@livekit/components-react";
import { Track } from "livekit-client";
import "@livekit/components-styles";
function AgentView() {
const { state, videoTrack } = useVoiceAssistant();
return (
<div style={{ width: 480, height: 480, background: "#111", borderRadius: 12, overflow: "hidden", display: "flex", alignItems: "center", justifyContent: "center" }}>
{videoTrack
? <VideoTrack trackRef={videoTrack} style={{ width: "100%", height: "100%", objectFit: "contain" }} />
: <p style={{ color: "#888" }}>{state === "connecting" ? "Connecting..." : "Waiting for avatar..."}</p>}
</div>
);
}
export default function Room() {
const [conn, setConn] = useState<{ token: string; serverUrl: string } | null>(null);
const connect = useCallback(async () => {
const res = await fetch("/api/token", { method: "POST" });
setConn(await res.json());
}, []);
if (!conn) return <button onClick={connect}>Talk to Agent</button>;
return (
<LiveKitRoom token={conn.token} serverUrl={conn.serverUrl} connect audio video={false} onDisconnected={() => setConn(null)}>
<AgentView />
<TrackToggle source={Track.Source.Microphone} />
<DisconnectButton onClick={() => setConn(null)}>Disconnect</DisconnectButton>
<RoomAudioRenderer />
</LiveKitRoom>
);
}
Page (app/page.tsx)
import Room from "@/components/Room";
export default function Home() {
return (
<main style={{ display: "flex", height: "100vh", alignItems: "center", justifyContent: "center", background: "#000" }}>
<Room />
</main>
);
}
How it works
- User clicks Talk to Agent → browser fetches a JWT from
/api/token
<LiveKitRoom> connects to LiveKit Cloud with the JWT and requests microphone access
- LiveKit Cloud dispatches your Python agent to the room
useVoiceAssistant() picks up the agent's avatar video track → <VideoTrack> renders it
<RoomAudioRenderer> plays the agent's TTS audio through the speakers
Run both the agent and frontend:
uv run python agent.py dev
cd web && npm run dev
Deploy the frontend to Vercel (or any Next.js host) with the same LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET env vars.
Agent Architecture
Agent Subclass
Define your agent's personality via the instructions parameter:
from livekit.agents import Agent
class MyAgent(Agent):
def __init__(self) -> None:
super().__init__(instructions="You are a helpful assistant.")
LLM via OpenClaw Gateway
OpenClaw Gateway exposes an OpenAI-compatible /v1/chat/completions endpoint. Use the openai.LLM plugin to connect:
import os
import httpx
from livekit.plugins import openai
openclaw_llm = openai.LLM(
model="openclaw:main",
base_url=os.environ.get("OPENCLAW_GATEWAY_URL") + "/v1",
api_key=os.environ.get("OPENCLAW_TOKEN", ""),
extra_headers={"x-openclaw-agent-id": "main"},
user="my-agent-session",
timeout=httpx.Timeout(connect=10, read=120, write=10, pool=10),
)
model — "openclaw:main" routes to your primary OpenClaw agent
extra_headers — x-openclaw-agent-id selects which agent handles the request
user — session identifier for conversation persistence across turns
timeout — generous read timeout (120s) since agent reasoning can take time
STT (Speech-to-Text)
stt="deepgram/nova-3"
Deepgram Nova-3 is the fastest streaming STT (<300ms). Auto-configured by LiveKit — no separate API key needed.
TTS (Text-to-Speech)
from livekit.plugins import elevenlabs
tts=elevenlabs.TTS(
voice_id="your_voice_id",
model="eleven_flash_v2_5",
)
ElevenLabs Flash v2.5 is the lowest-latency model. Find voice IDs at elevenlabs.io/voices.
Avatar
from livekit.plugins import lemonslice
avatar = lemonslice.AvatarSession(
agent_image_url="https://your-image-url.png",
agent_prompt="Description of how the avatar should look and behave.",
)
await avatar.start(session, room=ctx.room)
Order matters: call avatar.start() before session.start().
Greeting
session.say("Hello! How can I help you today?")
Call after session.start(). Queues speech without waiting for user input.
Reference: reference.md (full API) · troubleshooting.md (common issues) · examples/agent.py (complete example)