Jeden Skill in Manus ausführen
mit einem Klick

Jeden Skill in Manus mit einem Klick ausführen

openclaw-voice-avatar

Sterne9

Forks2

Aktualisiert21. Februar 2026 um 00:41

Give your OpenClaw agent a face and voice. Build real-time voice and video avatar agents using LiveKit, LemonSlice, ElevenLabs, and Deepgram. Covers agent setup, LLM routing through OpenClaw Gateway, avatar integration, and STT/TTS configuration.

Installation

Mit Codex oder Claude installieren Kopieren Sie diesen Prompt, fügen Sie ihn in Codex, Claude oder einen anderen Assistant ein und lassen Sie die Skill-Seite prüfen und installieren.

In Manus ausführen

Quelle

openserv-labs

openserv-labs/openclaw-voice-avatar

GitHub-Repository öffnen Creator-Repositorys ansehen

Download

In Manus ausführen

Verwandte BerufeSOC

Basierend auf der SOC-Berufsklassifikation

SoftwareentwicklerInformatik- und Mathematikberufe·SOC 15-1252

Datei-Explorer

5 Dateien

SKILL.md

readonly

name	openclaw-voice-avatar
description	Give your OpenClaw agent a face and voice. Build real-time voice and video avatar agents using LiveKit, LemonSlice, ElevenLabs, and Deepgram. Covers agent setup, LLM routing through OpenClaw Gateway, avatar integration, and STT/TTS configuration.

OpenClaw Voice Avatar

Give your OpenClaw agent a face and voice. Users speak into their microphone, the agent thinks via OpenClaw Gateway, and responds with synthesized speech and a live animated avatar.

How it works

User speaks → LiveKit room → Deepgram STT → OpenClaw Gateway (LLM)
                                                    ↓
User sees ← LemonSlice avatar ← ElevenLabs TTS ← Agent response

LiveKit Cloud hosts the WebRTC room — handles audio/video transport
Deepgram Nova-3 transcribes user speech to text (streaming STT)
OpenClaw Gateway routes the text to your agent for reasoning
ElevenLabs Flash v2.5 synthesizes the agent's response to speech
LemonSlice generates real-time lip-synced avatar video from the audio

Stack

Component	Provider	Purpose
Transport	LiveKit Cloud	WebRTC rooms, audio/video routing
STT	Deepgram Nova-3	Speech-to-text (fastest streaming, <300ms)
LLM	OpenClaw Gateway	Routes to your OpenClaw agent
TTS	ElevenLabs Flash v2.5	Text-to-speech (lowest latency)
Avatar	LemonSlice	Real-time lip-synced video
Framework	LiveKit Agents SDK	Python agent orchestration

Quick Start

Prerequisites

Python 3.10–3.12
uv package manager
LiveKit Cloud account (cloud.livekit.io)
LemonSlice account (starter subscription or above)
ElevenLabs account
OpenClaw Gateway credentials

Installation

mkdir my-voice-agent && cd my-voice-agent
uv init

Add dependencies to pyproject.toml:

[project]
name = "my-voice-agent"
version = "0.1.0"
requires-python = ">=3.10,<3.13"
dependencies = [
    "livekit-agents[elevenlabs]~=1.3",
    "livekit-plugins-lemonslice~=1.3",
    "livekit-plugins-noise-cancellation>=0.2.5",
    "livekit-plugins-openai~=1.3",
    "python-dotenv>=1.2.1",
]

uv sync

Environment Variables

Create a .env file:

# LiveKit Cloud (from cloud.livekit.io → Settings → Keys)
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret

# LemonSlice (from lemonslice.com)
LEMONSLICE_API_KEY=your_lemonslice_key

# ElevenLabs (from elevenlabs.io)
ELEVEN_API_KEY=your_elevenlabs_key

# OpenClaw Gateway
OPENCLAW_GATEWAY_URL=https://your-openclaw-gateway.example.com
OPENCLAW_TOKEN=your_openclaw_token

Run & Test

uv run python agent.py dev

Open agents-playground.livekit.io, select your LiveKit project, and click Connect. You'll see the avatar and can start talking to your agent immediately — no frontend code needed.

Web Frontend

Once the agent works in the playground, you can build your own website so anyone can talk to it.

Setup

npx create-next-app@latest web --typescript --tailwind --app
cd web
npm install @livekit/components-react @livekit/components-styles livekit-client livekit-server-sdk

Add to web/.env.local (same LiveKit credentials as the agent):

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret

Token API (`app/api/token/route.ts`)

The browser needs a short-lived JWT to join a LiveKit room. This server-side route generates one — credentials stay on the server.

import { AccessToken } from "livekit-server-sdk";
import { NextResponse } from "next/server";

export async function POST() {
  const apiKey = process.env.LIVEKIT_API_KEY;
  const apiSecret = process.env.LIVEKIT_API_SECRET;
  const wsUrl = process.env.LIVEKIT_URL;

  if (!apiKey || !apiSecret || !wsUrl) {
    return NextResponse.json({ error: "Server misconfigured" }, { status: 500 });
  }

  const roomName = `room-${crypto.randomUUID().slice(0, 8)}`;
  const identity = `user-${crypto.randomUUID().slice(0, 8)}`;

  const at = new AccessToken(apiKey, apiSecret, { identity, ttl: "10m" });
  at.addGrant({ roomJoin: true, room: roomName, canPublish: true, canSubscribe: true });

  return NextResponse.json({ token: await at.toJwt(), serverUrl: wsUrl });
}

Room Component (`components/Room.tsx`)

A client component that connects to the LiveKit room, renders the avatar video, and plays agent audio.

"use client";

import { useCallback, useState } from "react";
import {
  DisconnectButton, LiveKitRoom, RoomAudioRenderer,
  TrackToggle, VideoTrack, useVoiceAssistant,
} from "@livekit/components-react";
import { Track } from "livekit-client";
import "@livekit/components-styles";

function AgentView() {
  const { state, videoTrack } = useVoiceAssistant();
  return (
    <div style={{ width: 480, height: 480, background: "#111", borderRadius: 12, overflow: "hidden", display: "flex", alignItems: "center", justifyContent: "center" }}>
      {videoTrack
        ? <VideoTrack trackRef={videoTrack} style={{ width: "100%", height: "100%", objectFit: "contain" }} />
        : <p style={{ color: "#888" }}>{state === "connecting" ? "Connecting..." : "Waiting for avatar..."}</p>}
    </div>
  );
}

export default function Room() {
  const [conn, setConn] = useState<{ token: string; serverUrl: string } | null>(null);

  const connect = useCallback(async () => {
    const res = await fetch("/api/token", { method: "POST" });
    setConn(await res.json());
  }, []);

  if (!conn) return <button onClick={connect}>Talk to Agent</button>;

  return (
    <LiveKitRoom token={conn.token} serverUrl={conn.serverUrl} connect audio video={false} onDisconnected={() => setConn(null)}>
      <AgentView />
      <TrackToggle source={Track.Source.Microphone} />
      <DisconnectButton onClick={() => setConn(null)}>Disconnect</DisconnectButton>
      <RoomAudioRenderer />
    </LiveKitRoom>
  );
}

Page (`app/page.tsx`)

import Room from "@/components/Room";

export default function Home() {
  return (
    <main style={{ display: "flex", height: "100vh", alignItems: "center", justifyContent: "center", background: "#000" }}>
      <Room />
    </main>
  );
}

How it works

User clicks Talk to Agent → browser fetches a JWT from /api/token
<LiveKitRoom> connects to LiveKit Cloud with the JWT and requests microphone access
LiveKit Cloud dispatches your Python agent to the room
useVoiceAssistant() picks up the agent's avatar video track → <VideoTrack> renders it
<RoomAudioRenderer> plays the agent's TTS audio through the speakers

Run both the agent and frontend:

# Terminal 1 — agent
uv run python agent.py dev

# Terminal 2 — frontend
cd web && npm run dev

Deploy the frontend to Vercel (or any Next.js host) with the same LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET env vars.

Agent Architecture

Agent Subclass

Define your agent's personality via the instructions parameter:

from livekit.agents import Agent

class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")

LLM via OpenClaw Gateway

OpenClaw Gateway exposes an OpenAI-compatible /v1/chat/completions endpoint. Use the openai.LLM plugin to connect:

import os
import httpx
from livekit.plugins import openai

openclaw_llm = openai.LLM(
    model="openclaw:main",
    base_url=os.environ.get("OPENCLAW_GATEWAY_URL") + "/v1",
    api_key=os.environ.get("OPENCLAW_TOKEN", ""),
    extra_headers={"x-openclaw-agent-id": "main"},
    user="my-agent-session",
    timeout=httpx.Timeout(connect=10, read=120, write=10, pool=10),
)

model — "openclaw:main" routes to your primary OpenClaw agent
extra_headers — x-openclaw-agent-id selects which agent handles the request
user — session identifier for conversation persistence across turns
timeout — generous read timeout (120s) since agent reasoning can take time

STT (Speech-to-Text)

stt="deepgram/nova-3"

Deepgram Nova-3 is the fastest streaming STT (<300ms). Auto-configured by LiveKit — no separate API key needed.

TTS (Text-to-Speech)

from livekit.plugins import elevenlabs

tts=elevenlabs.TTS(
    voice_id="your_voice_id",
    model="eleven_flash_v2_5",
)

ElevenLabs Flash v2.5 is the lowest-latency model. Find voice IDs at elevenlabs.io/voices.

Avatar

from livekit.plugins import lemonslice

avatar = lemonslice.AvatarSession(
    agent_image_url="https://your-image-url.png",
    agent_prompt="Description of how the avatar should look and behave.",
)

# Start avatar BEFORE starting the session
await avatar.start(session, room=ctx.room)

Order matters: call avatar.start() before session.start().

Greeting

session.say("Hello! How can I help you today?")

Call after session.start(). Queues speech without waiting for user input.

Reference: reference.md (full API) · troubleshooting.md (common issues) · examples/agent.py (complete example)

openclaw-voice-avatar

OpenClaw Voice Avatar

How it works

Stack

Quick Start

Prerequisites

Installation

Environment Variables

Run & Test

Web Frontend

Setup

Token API (app/api/token/route.ts)

Room Component (components/Room.tsx)

Page (app/page.tsx)

How it works

Agent Architecture

Agent Subclass

LLM via OpenClaw Gateway

STT (Speech-to-Text)

TTS (Text-to-Speech)

Avatar

Greeting

OpenClaw Voice Avatar

How it works

Stack

Quick Start

Prerequisites

Installation

Environment Variables

Run & Test

Web Frontend

Setup

Token API (app/api/token/route.ts)

Room Component (components/Room.tsx)

Page (app/page.tsx)

How it works

Agent Architecture

Agent Subclass

LLM via OpenClaw Gateway

STT (Speech-to-Text)

TTS (Text-to-Speech)

Avatar

Greeting

Token API (`app/api/token/route.ts`)

Room Component (`components/Room.tsx`)

Page (`app/page.tsx`)

Token API (`app/api/token/route.ts`)

Room Component (`components/Room.tsx`)

Page (`app/page.tsx`)