f5-tts

Updated March 14, 2026 at 16:37

Use the local F5TTS-FASTAPI service for voice-cloned text-to-speech, voice discovery, and direct synthesis. Trigger when working with the user's local F5 TTS Docker service, listing available voice profiles, validating health or auth, generating speech in a specific cloned voice, or using the F5 API independently of Hermes' built-in text_to_speech provider.

Source

helix4u

helix4u/hermes-agent-private

name	f5-tts
description	Use the local F5TTS-FASTAPI service for voice-cloned text-to-speech, voice discovery, and direct synthesis. Trigger when working with the user's local F5 TTS Docker service, listing available voice profiles, validating health or auth, generating speech in a specific cloned voice, or using the F5 API independently of Hermes' built-in text_to_speech provider.
metadata	{"hermes":{"tags":["TTS","F5","Voice Cloning","FastAPI","Audio"]}}

f5-tts

Use this skill when the task is specifically about the user's local F5 TTS server or when you need to choose a voice profile directly instead of relying on Hermes' default text_to_speech configuration.

Core workflow

Check the service first with GET /health.
Read the base URL from tts.f5.base_url in ~/.hermes/config.yaml when available. Default to http://localhost:8081.
Use the already-loaded environment variable F5TTS_SECRET_KEY when you need to authenticate directly.
Generate a short-lived HS256 bearer token from that env value.
Call GET /api/v1/voices/list to discover valid voice_profile names instead of guessing.
Call POST /api/v1/tts/synthesize with exact user text plus the selected voice_profile.
Save the returned WAV bytes to disk and return the file path or MEDIA: tag as needed.
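
The token and synthesis steps above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the service's documented contract: the JWT claim set (`sub`, `iat`, `exp`), the JSON field names in the request body, and the helper names `make_token` and `build_synthesize_request` are all hypothetical. Check references/api.md for the exact shapes before relying on it.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.request

DEFAULT_BASE_URL = "http://localhost:8081"  # default named in the skill


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(secret: str, ttl_seconds: int = 60) -> str:
    """Build a short-lived HS256 JWT. The claim set is an assumption."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": "hermes", "iat": now, "exp": now + ttl_seconds}  # hypothetical claims
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def build_synthesize_request(
    token: str, text: str, voice_profile: str, base_url: str = DEFAULT_BASE_URL
) -> urllib.request.Request:
    """Prepare (but do not send) the POST. The JSON field names are assumptions."""
    body = json.dumps({"text": text, "voice_profile": voice_profile}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/tts/synthesize",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To use it, read the secret from `os.environ["F5TTS_SECRET_KEY"]`, pass the prepared request to `urllib.request.urlopen(req, timeout=...)`, and write the bytes from `.read()` to a `.wav` file. Keeping the token lifetime short limits the damage if a token leaks into logs.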

Preferred behavior

Prefer the built-in Hermes text_to_speech tool when the user simply wants speech output and the configured default voice is fine.
Prefer direct F5 API use when the user wants to inspect voices, pick a specific cloned voice, debug the local TTS service, or bypass Hermes' default provider selection.
Preserve the user's text exactly unless they asked for rewriting.
For long text, keep each FastAPI request within the service's 1000-character text limit by chunking the text and stitching the audio, rather than relaxing validation.
Do not ask the user for the secret key if F5TTS_SECRET_KEY is already present in the environment.

Long text

The local F5 FastAPI endpoint accepts text up to 1000 characters per request.
For text longer than that, split on sentence boundaries when possible.
If a sentence still exceeds the limit, split on whitespace, then hard-split only as a last resort.
Synthesize each chunk separately and concatenate the WAV files in order.
For Telegram voice bubbles, convert the final WAV to OGG Opus after stitching.
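
The splitting and stitching rules above could look roughly like the following. This is a sketch, not the skill's own implementation: the helper names `chunk_text` and `concat_wavs` are hypothetical, the sentence splitter is a simple punctuation heuristic, and the stitching step assumes the service returns PCM WAV data that Python's `wave` module can read, with the same channel count, sample width, and sample rate in every chunk.

```python
import io
import re
import wave

MAX_CHARS = 1000  # per-request limit stated in the skill


def _split_long(piece: str, limit: int) -> list[str]:
    """Split one over-long sentence on whitespace; hard-split only oversized words."""
    out, current = [], ""
    for word in piece.split():
        while len(word) > limit:  # last resort: hard split
            if current:
                out.append(current)
                current = ""
            out.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            out.append(current)
            current = word
    if current:
        out.append(current)
    return out


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        pieces = [sentence] if len(sentence) <= limit else _split_long(sentence, limit)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(wav_blobs: list[bytes]) -> bytes:
    """Concatenate WAV byte strings, in order, requiring one shared format."""
    if not wav_blobs:
        raise ValueError("no WAV chunks to concatenate")
    out_buf = io.BytesIO()
    params = None
    with wave.open(out_buf, "wb") as out:
        for blob in wav_blobs:
            with wave.open(io.BytesIO(blob), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if params is None:
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError("WAV chunks have mismatched formats")
                out.writeframes(src.readframes(src.getnframes()))
    return out_buf.getvalue()
```

Note that this chunker normalizes runs of whitespace between sentences to a single space when it repacks them; the words themselves are left untouched. The OGG Opus conversion for Telegram is not shown, since it needs an external encoder.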

Failure handling

If /health fails, report that the local F5 container is unavailable before trying anything else.
If auth fails, first check whether F5TTS_SECRET_KEY is already present in the environment and use it before asking the user for anything.
If auth still fails after using the env var, verify F5TTS_SECRET_KEY matches the FastAPI container SECRET_KEY.
If a requested voice is missing, list available voices from /api/v1/voices/list.
If synthesis fails on long text, retry with shorter chunks instead of sending a larger single request.
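
The last rule above, retrying with shorter chunks rather than a larger request, might be sketched as follows. Everything named here is hypothetical: `SynthesisError`, the `synthesize` callable (which stands in for one authenticated POST per chunk), and the halving strategy with a `min_limit` floor are illustrative choices, not behavior the service or Hermes defines. `textwrap.wrap` is used only for brevity; it splits on whitespace and normalizes it, so a sentence-aware chunker is preferable in practice.

```python
import textwrap
from typing import Callable


class SynthesisError(RuntimeError):
    """Hypothetical error raised by the caller's synthesize function."""


def synthesize_with_shrinking_chunks(
    text: str,
    synthesize: Callable[[str], bytes],
    limit: int = 1000,
    min_limit: int = 125,
) -> list[bytes]:
    """Retry with half-sized chunks on failure; never send a larger request."""
    while True:
        chunks = textwrap.wrap(text, width=limit)
        try:
            return [synthesize(chunk) for chunk in chunks]
        except SynthesisError:
            if limit // 2 < min_limit:
                raise  # give up and surface the last error
            limit //= 2
```

A failed attempt discards any audio already produced for earlier chunks and starts over at the smaller size, which keeps the sketch simple at the cost of some repeated work.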

Reference file

Load references/api.md when you need exact endpoint shapes, token generation snippets, or direct request examples.
