
hermes-voice-call


Last updated: April 29, 2026, 16:27

Real-time voice calling with Hermes Agent via Pipecat + Daily.co WebRTC on VPS


Source: hermesmarcoai-ai/hermes-skills


Related occupations (based on the SOC occupational classification): Software Developers · Computer and Mathematical Occupations · SOC 15-1252

SKILL.md


name	hermes-voice-call
description	Real-time voice calling with Hermes Agent via Pipecat + Daily.co WebRTC on VPS

Hermes Voice Call Setup

Enable real-time voice conversation (phone-call style) with Hermes from iPhone/Android/browser.

Architecture

iPhone → Daily.co (WebRTC) → Pipecat (VPS) → Whisper STT → MiniMax M2.7 (OpenRouter) → MiniMax TTS → Daily.co → iPhone

Estimated latency: 3-5 seconds end-to-end. This is not truly simultaneous conversation; it is voice in, then voice out, in sequence.
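The sequential flow above can be illustrated with a toy model in plain asyncio. The stage functions (`stt`, `llm`, `tts`) and the queue-based `run_stage` helper are hypothetical stand-ins, not Pipecat APIs. They only show how one turn moves through STT, LLM, and TTS in order.

```python
import asyncio

# Hypothetical stand-ins for the real services. Pipecat's processors exchange
# audio and text frames, not plain strings.
async def stt(audio: str) -> str:
    return audio.removeprefix("audio:")      # "transcribe" the caller's speech

async def llm(text: str) -> str:
    return f"reply to {text!r}"              # "generate" an answer

async def tts(text: str) -> str:
    return f"audio:{text}"                   # "synthesize" the answer

async def run_stage(fn, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    # A stage emits output only after it has received a complete input item,
    # so one turn passes through STT, LLM, and TTS one after another.
    while (item := await inbox.get()) is not None:
        await outbox.put(await fn(item))
    await outbox.put(None)                   # propagate end-of-stream downstream

async def one_turn(user_audio: str) -> list[str]:
    queues = [asyncio.Queue() for _ in range(4)]
    stages = [run_stage(fn, queues[i], queues[i + 1])
              for i, fn in enumerate((stt, llm, tts))]
    await queues[0].put(user_audio)
    await queues[0].put(None)
    await asyncio.gather(*stages)
    replies = []
    while (item := queues[-1].get_nowait()) is not None:
        replies.append(item)
    return replies
```

In this simplified model a turn takes roughly the sum of the three stage latencies, which is why a voice-in, voice-out design lands in the multi-second range. The real framework may overlap some work (for example by streaming LLM text into TTS), so treat this as a conceptual picture only.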

Current Status (2026-04-29)

Pipecat 1.1.0 ✅ installed on VPS
DailyTransport ✅ working
MiniMaxHttpTTSService ✅ available
OpenRouterLLMService ✅ available
WhisperSTTService ✅ available
SileroVADAnalyzer ✅ installed
Bot code: /home/hermes/voice-bot/bot.py ✅ written
BLOCKED: Missing OPENAI_API_KEY for Whisper STT (key not found anywhere)
BLOCKED: MINIMAX_API_KEY not on VPS (only OPENROUTER_API_KEY present)
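Both BLOCKED items are missing environment variables. A small preflight script (hypothetical name `check_keys.py`) can report them before the bot starts. The four key names are the ones the bot code and the run command in this skill read; the component labels are only descriptions.

```python
import os
from collections.abc import Mapping

# Keys read by bot.py / the run command, and what each one unlocks.
REQUIRED_KEYS = {
    "DAILY_API_KEY": "Daily.co room and token",
    "OPENAI_API_KEY": "Whisper STT",
    "OPENROUTER_API_KEY": "LLM via OpenRouter",
    "MINIMAX_API_KEY": "MiniMax TTS",
}

def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required key names that are unset or blank in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]

if __name__ == "__main__":
    # Only reports; it never prints the key values themselves.
    for name in missing_keys(os.environ):
        print(f"BLOCKED: {name} is not set ({REQUIRED_KEYS[name]})")
```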

VPS Setup — Full Steps

1. Install Dependencies

ssh vps
pip3 install --break-system-packages \
  "pipecat-ai[daily]" \
  aiohttp fastapi uvicorn pydantic-settings python-dotenv silero-vad

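Note: Ubuntu 24.04 marks its system Python 3.12 as externally managed (PEP 668), so pip refuses to install packages unless you pass --break-system-packages. Without root, apt is not an option; pip installs into ~/.local instead (a user install). Quote "pipecat-ai[daily]" so the shell does not treat the brackets as a glob pattern.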
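To confirm that the installed Pipecat build actually exposes the module paths bot.py uses, a hypothetical helper can try each import and report failures. This is a sketch that assumes the module layout listed in step 3; run it with the same PATH and PYTHONPATH as the bot (step 5).

```python
import importlib

# Modules imported by bot.py (paths as listed in this skill for Pipecat v1.1.0).
BOT_MODULES = [
    "pipecat.transports.daily.transport",
    "pipecat.services.openrouter.llm",
    "pipecat.services.whisper.stt",
    "pipecat.services.minimax.tts",
    "pipecat.audio.vad.silero",
    "pipecat.runner.daily",
    "aiohttp",
    "loguru",
]

def import_failures(modules: list[str]) -> dict[str, str]:
    """Try to import each module; map every failing name to its error message."""
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # missing package, missing extra, or incompatible version
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures

if __name__ == "__main__":
    problems = import_failures(BOT_MODULES)
    for name, error in problems.items():
        print(f"FAIL {name}: {error}")
    print("all imports OK" if not problems else f"{len(problems)} import(s) failed")
```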

2. Create Bot Directory

ssh vps "mkdir -p /home/hermes/voice-bot"

3. Bot Code (Pipecat v1.1.0 — CORRECTED API)

The Pipecat v1.1.0 API differs significantly from older versions. Correct class names and imports:

#!/usr/bin/env python3
import asyncio
import os

import aiohttp
from loguru import logger

from pipecat.transports.daily.transport import DailyTransport, DailyParams
from pipecat.services.openrouter.llm import OpenRouterLLMService
from pipecat.services.whisper.stt import WhisperSTTService
from pipecat.services.minimax.tts import MiniMaxHttpTTSService
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair, LLMUserAggregatorParams,
)
from pipecat.runner.daily import configure
from pipecat.frames.frames import LLMRunFrame

DAILY_API_KEY = os.environ["DAILY_API_KEY"]

async def main():
    async with aiohttp.ClientSession() as session:
        # Create a Daily room and a token for the bot
        (room_url, token) = await configure(session, api_key=DAILY_API_KEY, room_exp_duration=24.0)
        logger.info(f"Open this room URL on the phone: {room_url}")

        transport = DailyTransport(
            room_url, token, "Hermes",
            DailyParams(audio_in_enabled=True, audio_out_enabled=True, transcription_enabled=True),
        )

        # Build ONE context and ONE aggregator pair. The user and assistant
        # aggregators must share the same LLMContext; two separate pairs would
        # keep the bot's replies out of the history the LLM sees next turn.
        context = LLMContext()
        aggregators = LLMContextAggregatorPair(
            context=context,
            user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
        )

        # Pipeline: user audio → STT → LLM → TTS → bot audio
        # Correct v1.1.0 pipeline order:
        pipeline = Pipeline([
            transport.input(),          # Raw user audio
            WhisperSTTService(api_key=os.environ["OPENAI_API_KEY"]),
            aggregators[0],             # user aggregator
            OpenRouterLLMService(api_key=os.environ["OPENROUTER_API_KEY"], settings=OpenRouterLLMService.Settings(model="minimax/minimax-2026-04-15")),
            MiniMaxHttpTTSService(api_key=os.environ["MINIMAX_API_KEY"], settings=MiniMaxHttpTTSService.Settings(model="speech-02-turbo", voice="male-qn-qingse")),
            transport.output(),         # Bot audio out
            aggregators[1],             # assistant aggregator (same pair, same context)
        ])

        task = PipelineTask(pipeline, params=PipelineParams())

        # ... rest of bot setup

        await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(main())
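In Pipecat, the user aggregator records what the caller said and the assistant aggregator records what the bot said, both into the LLMContext given to LLMContextAggregatorPair. The toy model below uses hypothetical classes (`ToyContext`, `ToyAggregator`, `toy_pair`), not Pipecat's, to show why both aggregators in the pipeline should come from one pair sharing one context: if they come from separately constructed pairs, each side writes to a different context and the LLM never sees its own earlier replies.

```python
from dataclasses import dataclass, field

@dataclass
class ToyContext:
    """Hypothetical stand-in for LLMContext: just a list of chat messages."""
    messages: list = field(default_factory=list)

class ToyAggregator:
    """Hypothetical stand-in for one side of an aggregator pair."""
    def __init__(self, context: ToyContext, role: str):
        self.context = context
        self.role = role

    def add(self, text: str) -> None:
        self.context.messages.append({"role": self.role, "content": text})

def toy_pair(context: ToyContext) -> tuple[ToyAggregator, ToyAggregator]:
    # Like a context-aggregator pair: both sides write into the same context.
    return ToyAggregator(context, "user"), ToyAggregator(context, "assistant")

# Right: one pair, one context, so the next LLM call sees the whole dialogue.
shared = ToyContext()
user, assistant = toy_pair(shared)
user.add("What time is it?")
assistant.add("It's noon.")

# Wrong: two pairs built separately, so each side writes to its own context.
user_only, _ = toy_pair(ToyContext())
_, assistant_only = toy_pair(ToyContext())
user_only.add("What time is it?")
assistant_only.add("It's noon.")
```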

4. VPS Environment Variables

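Besides DAILY_API_KEY, the bot needs three provider keys: OPENAI_API_KEY, MINIMAX_API_KEY, and OPENROUTER_API_KEY. Check which provider credentials are already stored on the VPS:

# On VPS (hermes user): prints provider, label, and token length (never the token itself)
python3 -c "import json; d=json.load(open('/home/hermes/.hermes/auth.json')); pool=d.get('credential_pool',{}); [print(p, c.get('label'), len(c.get('access_token') or '')) for p,v in pool.items() if isinstance(v,list) for c in v]"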

Currently: OPENROUTER_API_KEY ✅ on VPS. MINIMAX_API_KEY and OPENAI_API_KEY ❌ missing.
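The one-liner above is easier to read and extend as a short script. This sketch assumes the auth.json layout implied by that one-liner (a `credential_pool` object mapping each provider to a list of entries with `label` and `access_token`); `summarize_pool` is a hypothetical name. Like the one-liner, it prints only the token length, never the token.

```python
import json
from pathlib import Path

def summarize_pool(auth: dict) -> list[tuple[str, str, int]]:
    """Return (provider, label, token length) per credential; never the token itself."""
    rows = []
    for provider, creds in auth.get("credential_pool", {}).items():
        if not isinstance(creds, list):      # skip entries that are not credential lists
            continue
        for cred in creds:
            if isinstance(cred, dict):
                label = cred.get("label") or "?"
                rows.append((provider, label, len(cred.get("access_token") or "")))
    return rows

if __name__ == "__main__":
    # For the hermes user this is /home/hermes/.hermes/auth.json
    auth_path = Path.home() / ".hermes" / "auth.json"
    if auth_path.exists():
        for provider, label, length in summarize_pool(json.loads(auth_path.read_text())):
            print(f"{provider:<12} {label:<20} token length {length}")
    else:
        print(f"{auth_path} not found")
```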

To add the missing keys, either:

Copy them from ~/.hermes/auth.json on the Surface machine (same credential_pool structure)
Or export them as environment variables before running the bot (see step 5)

5. Run the Bot

ssh vps
cd /home/hermes/voice-bot
OPENAI_API_KEY=sk-pro... MINIMAX_API_KEY=sk-cp-aOTN... OPENROUTER_API_KEY=sk-or-v1-... DAILY_API_KEY=pk_... \
  PATH=$HOME/.local/bin:$PATH PYTHONPATH=$HOME/.local/lib/python3.12/site-packages:$PYTHONPATH \
  python3 bot.py
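If the bot is launched from Python instead of the shell (for example from a supervisor script), the PATH and PYTHONPATH prefixes in the command above can be reproduced with a hypothetical helper like `bot_env`. It assumes the Linux VPS and the Python 3.12 user site-packages path shown above, and it inherits the API keys from whatever environment it is given.

```python
from collections.abc import Mapping
from pathlib import PurePosixPath

def bot_env(base: Mapping[str, str], home: PurePosixPath) -> dict[str, str]:
    """Copy `base` and prepend the ~/.local paths that the run command adds.

    API keys are not set here; they are inherited from `base` (e.g. os.environ).
    """
    env = dict(base)
    local_bin = str(home / ".local" / "bin")
    site_packages = str(home / ".local" / "lib" / "python3.12" / "site-packages")
    # Join with ":" (the Linux separator) and skip empty values, so an unset
    # variable does not leave an empty entry (in PATH that means the current directory).
    env["PATH"] = ":".join(p for p in (local_bin, base.get("PATH", "")) if p)
    env["PYTHONPATH"] = ":".join(p for p in (site_packages, base.get("PYTHONPATH", "")) if p)
    return env
```

For example, `subprocess.run(["python3", "bot.py"], env=bot_env(os.environ, PurePosixPath(os.path.expanduser("~"))), check=True)`, run from /home/hermes/voice-bot, would start the bot with whatever keys are already exported.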

6. Access from iPhone

Open Safari on iPhone
Navigate to the room URL (printed on bot start)
Allow microphone access
Speak — Hermes responds with voice

Key Pipecat v1.1.0 Discoveries

Item	Old/Wrong	Correct (v1.1.0)
Daily transport class	`DailyTransportClient`	`DailyTransport`
Daily init signature	`(room_url, api_key, ...)`	`(room_url, token, bot_name, params)`
Room URL + token	Manual API call	`configure()` from `pipecat.runner.daily`
MiniMax TTS class	`MiniMaxTTSService`	`MiniMaxHttpTTSService`
MiniMax TTS model	`speech-02-hd` (not in the $40 plan)	`speech-02-turbo`
Pipeline user agg	`user_aggregator` separate	`LLMContextAggregatorPair(...)[0]`
Pipeline bot agg	`assistant_aggregator` separate	`LLMContextAggregatorPair(...)[1]`
Transport input	`transport` directly	`transport.input()`
Transport output	`transport` directly	`transport.output()`
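When porting older Pipecat code, the plain class renames in the table can be applied mechanically. The hypothetical `apply_renames` helper below handles only those two renames and matches whole identifiers, so `MiniMaxHttpTTSService` is left untouched; the constructor signature, aggregator, and transport input/output changes still need manual edits.

```python
import re

# Old -> new class names taken from the table (plain renames only).
RENAMES = {
    "DailyTransportClient": "DailyTransport",
    "MiniMaxTTSService": "MiniMaxHttpTTSService",
}

# \b ... \b matches whole identifiers only, so substrings of longer names are skipped.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")

def apply_renames(source: str) -> str:
    """Replace each old class name in `source` with its v1.1.0 name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)
```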

Critical Constraints

MiniMax speech-02-hd is not included in the $40 plan: use speech-02-turbo, or upgrade the plan to use speech-02-hd
VPS Python 3.12 is externally managed: pip needs --break-system-packages
No sudo on the VPS: apt-get install is unavailable, so install everything with pip --break-system-packages
Missing OPENAI_API_KEY: Marco must provide one to enable Whisper STT
~3-5 s latency: turns are sequential (voice in, then voice out), not simultaneous conversation
Must run on the VPS: the Surface machine is behind NAT and not reachable from the iPhone

Daily.co API Key

Saved in Bitwarden as "Daily.co API Key" (token prefix pk_...)

More from this repository

hermes-gateway-troubleshooting

hermesmarcoai-ai/hermes-skills

Diagnose and fix Hermes messaging gateway connectivity issues (Telegram/Discord down, stale locks, PM2 problems)

2026-05-16

hermes-vps-migration

hermesmarcoai-ai/hermes-skills

Backup Hermes agent to GitHub and restore on a new VPS. Covers what to include/exclude, GitHub token requirements, and restore steps. Includes automated scripts.

2026-05-16

github-auth

hermesmarcoai-ai/hermes-skills

GitHub auth setup: HTTPS tokens, SSH keys, gh CLI login.

2026-05-16

github-repo-management

hermesmarcoai-ai/hermes-skills

Clone, create, fork, configure, and manage GitHub repositories. Manage remotes, secrets, releases, and workflows. Works with gh CLI or falls back to git + GitHub REST API via curl.

2026-05-16

youtube-content

hermesmarcoai-ai/hermes-skills

Fetch YouTube video transcripts and transform them into structured content (chapters, summaries, threads, blog posts). Use when the user shares a YouTube URL or video link, asks to summarize a video, requests a transcript, or wants to extract and reformat content from any YouTube video.

2026-05-16

linear

hermesmarcoai-ai/hermes-skills

Manage Linear issues, projects, and teams via the GraphQL API. Create, update, search, and organize issues. Uses API key auth (no OAuth needed). All operations via curl — no dependencies.

2026-05-16
