
gemini-tts-fast

Stars: 3

Forks: 1

Updated: February 14, 2026, 13:52

Convert text to speech with Google Gemini TTS API at fixed 1.2x playback speed and WAV output. Use when users need fast narration generation from English or Chinese text, with optional voice and output filename.


Source

joyehuang

joyehuang/skills


Related occupations (SOC)

Based on the Standard Occupational Classification (SOC)

Software Developers · Computer and Mathematical Occupations · SOC 15-1252

File explorer: 4 files

SKILL.md

name	gemini-tts-fast
description	Convert text to speech with Google Gemini TTS API at fixed 1.2x playback speed and WAV output. Use when users need fast narration generation from English or Chinese text, with optional voice and output filename.

Gemini Text-to-Speech Skill (1.2x Speed)

Convert text to speech using Google Gemini's TTS API with fixed 1.2x playback speed. Automatically generates standard WAV audio files optimized for video narration.

Features

Converts text to natural-sounding speech using Gemini 2.5 Pro TTS
Fixed 1.2x speed for more dynamic video narration
Outputs standard WAV format (24kHz, 16-bit, mono)
Supports multiple voice styles
Handles both English and Chinese text
Automatic format conversion using ffmpeg

Requirements

Python 3.x with google-genai package
ffmpeg (for audio format conversion)
GOOGLE_API_KEY environment variable (stored in .env file)

Usage

When the user requests text-to-speech conversion, the skill applies the fixed 1.2x speed automatically:

1. Load environment: source the .env file to get GOOGLE_API_KEY
2. Parse arguments:
   - Text to convert (required); supports multiple languages
   - --output=filename.wav (optional, default: output.wav)
   - --voice=VoiceName (optional, default: Puck)
   - Speed is fixed at 1.2x and does not need to be specified
3. Generate audio: run the script to create the WAV file at 1.2x speed
4. Confirm success: report the output file's location and size
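The "Parse arguments" step can be sketched with Python's standard argparse module. This is a hypothetical parser, not the actual contents of tts_cli.py: it mirrors the defaults listed above (output.wav, Puck) and also accepts --speed so that the command template's --speed=1.2 flag parses, with 1.2 as the default so users never need to pass it.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the documented tts_cli.py arguments."""
    parser = argparse.ArgumentParser(description="Gemini TTS with fixed 1.2x speed")
    # Positional text; English, Chinese, and other languages are passed through as-is.
    parser.add_argument("text", help="Text to convert to speech")
    parser.add_argument("--output", default="output.wav", help="Output WAV filename")
    parser.add_argument("--voice", default="Puck", help="Prebuilt voice name, e.g. Kore")
    # Accepted so the command template parses; the skill always uses 1.2.
    parser.add_argument("--speed", type=float, default=1.2, help="Playback speed")
    return parser.parse_args(argv)
```

For example, `parse_args(["Welcome to our app", "--output=welcome.wav"])` yields output `welcome.wav`, voice `Puck`, and speed `1.2`.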

Available Voices

Puck (default) - 中性、清晰 (Neutral, clear)
Charon - 深沉、权威 (Deep, authoritative)
Kore - 温暖、友好 (Warm, friendly)
Fenrir - 强劲、动感 (Strong, dynamic)
Aoede - 流畅、富有表现力 (Smooth, expressive)

Command Template

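The skill bundles its own tts_cli.py script in the skill directory, so it does not depend on scripts elsewhere in the project. All paths in the command are relative, so run it from the project root (the directory containing .env, venv/, and .claude/):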

set -a && source .env && set +a && \
source venv/bin/activate && \
python .claude/skills/gemini-tts-fast/tts_cli.py "<text>" --output="<filename>" --voice="<voice>" --speed=1.2

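Note: before first use, set up the environment in this order, so the package is installed into the virtual environment:

Create a Python virtual environment: python -m venv venv
Activate it and install the Python dependency: source venv/bin/activate && pip install google-genai
Create a .env file with GOOGLE_API_KEY=your-key
Install ffmpeg: brew install ffmpeg (macOS), sudo apt install ffmpeg (Debian/Ubuntu), or your platform's equivalent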
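The shell template exports the .env file with `set -a && source .env && set +a`. If you drive the skill from Python instead, a minimal loader can stand in for that step. This is a sketch under stated assumptions: one KEY=VALUE per line, optional `export ` prefix, `#` comments, and optional surrounding quotes; it does not do shell variable expansion or multi-line values, and the function names are hypothetical.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict (a rough stand-in for `source .env`)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "abc" or 'abc'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path=".env"):
    """Export the file's variables into os.environ, overriding existing ones like `source`."""
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env_lines(fh).items():
            os.environ[key] = value
```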

Error Handling

If GOOGLE_API_KEY is missing from .env, instruct the user to add it
If ffmpeg is not installed, instruct the user to install it (brew install ffmpeg)
If the script fails, show its error message
If the model is unavailable, suggest checking the Gemini API status
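The first two checks can run before any API call, so the user gets an actionable message instead of a stack trace. A minimal preflight sketch, assuming the environment has already been loaded; `preflight` is a hypothetical helper, and `which` is injectable so the check can be tested without touching the real PATH.

```python
import shutil


def preflight(environ, which=shutil.which):
    """Return user-facing problems that would stop a TTS run (empty list = ready)."""
    problems = []
    if not environ.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set: add GOOGLE_API_KEY=your-key to .env")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH: install it (e.g. brew install ffmpeg)")
    return problems
```

In practice you would call `preflight(os.environ)` and print each message if the list is non-empty; script failures and model availability still have to be handled around the actual API call.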

Examples

Simple usage (English):

/gemini-tts-fast "Hello world"

→ Generates output.wav at 1.2x speed

With custom output:

/gemini-tts-fast "Welcome to our app" --output=welcome.wav

→ Generates welcome.wav at 1.2x speed

With custom voice:

/gemini-tts-fast "Thank you for listening" --output=thanks.wav --voice=Aoede

→ Generates thanks.wav at 1.2x speed with Aoede voice

Chinese text:

/gemini-tts-fast "你好世界" --output=hello_cn.wav --voice=Kore

→ Generates hello_cn.wav at 1.2x speed with Kore voice

Processing script.json:

/gemini-tts-fast @script.json

→ Automatically processes all narration scenes from script.json at 1.2x speed
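The skill does not document the layout of script.json, so the sketch below assumes a HYPOTHETICAL schema, `{"scenes": [{"id": ..., "narration": "..."}]}`, purely to show how batch processing could turn scenes into one TTS job each. Adjust the key names to whatever your script.json actually contains; `narration_jobs`, `load_jobs`, and the `scene_<id>.wav` naming are illustrative choices.

```python
import json


def narration_jobs(script, prefix="scene"):
    """Turn a parsed script.json into (text, output_filename) pairs.

    HYPOTHETICAL schema: {"scenes": [{"id": ..., "narration": "..."}, ...]}.
    Scenes with missing or blank narration are skipped; scenes without an
    "id" fall back to their 1-based position.
    """
    jobs = []
    for index, scene in enumerate(script.get("scenes", []), start=1):
        text = (scene.get("narration") or "").strip()
        if not text:
            continue
        scene_id = scene.get("id", index)
        jobs.append((text, f"{prefix}_{scene_id}.wav"))
    return jobs


def load_jobs(path="script.json"):
    """Read script.json from disk and build the job list."""
    with open(path, encoding="utf-8") as fh:
        return narration_jobs(json.load(fh))
```

Each `(text, filename)` pair can then be passed to tts_cli.py through the command template, one call per scene.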

Technical Details

Input: Raw PCM data from Gemini API
Processing: Converts to WAV and applies 1.2x speed using ffmpeg atempo filter
Output format: RIFF WAVE, 24000 Hz, mono, 16-bit PCM
Playback speed: Fixed at 1.2x (shortens duration by ~17%)
Temporary files are automatically cleaned up
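The processing chain above (raw PCM wrapped as WAV, sped up with ffmpeg's atempo filter, temporary file removed) can be sketched as follows. This is an illustration under assumptions, not the contents of tts_cli.py: the function names are hypothetical, the PCM is assumed to be 16-bit mono at 24 kHz as stated in the output format, and the ffmpeg flags are standard options chosen to keep that format. atempo changes tempo without changing pitch; dividing the duration by 1.2 is where the ~17% figure comes from (1 - 1/1.2 ≈ 0.167).

```python
import os
import subprocess
import tempfile
import wave

SAMPLE_RATE = 24_000  # Hz, matching the documented output format
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit PCM
CHANNELS = 1          # mono


def write_wav(pcm, path):
    """Wrap raw 16-bit mono PCM bytes in a RIFF/WAVE header."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)


def atempo_command(src, dst, speed=1.2):
    """Build the ffmpeg argv that speeds audio up via atempo, keeping 24 kHz/mono/16-bit."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={speed}",
            "-ar", str(SAMPLE_RATE), "-ac", str(CHANNELS), "-c:a", "pcm_s16le", dst]


def pcm_to_fast_wav(pcm, output, speed=1.2):
    """PCM -> temporary WAV -> ffmpeg atempo -> final WAV; the temp file is always removed."""
    fd, tmp = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        write_wav(pcm, tmp)
        subprocess.run(atempo_command(tmp, output, speed), check=True, capture_output=True)
    finally:
        os.remove(tmp)


def sped_up_duration(seconds, speed=1.2):
    """Playing at `speed` divides duration by `speed` (12 s becomes 10 s at 1.2x)."""
    return seconds / speed
```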

Why 1.2x Speed?

1.2x was chosen as the default pace for video narration because it:

Keeps speech clear and natural-sounding
Keeps the content engaging and dynamic
Shortens the video without sounding rushed
Matches the brisker pace common in professional video voiceovers
