
happy-audio-gen

Name: Happy Audio Gen
Author: iamzhihuix



stars:295

forks:28

updated:April 17, 2026 at 11:48


SKILL.md


name	happy-audio-gen
description	Universal AI voice / text-to-speech skill supporting OpenAI TTS (gpt-4o-mini-tts, tts-1), ElevenLabs multilingual TTS with voice cloning, Bailian Qwen TTS (qwen-tts / qwen3-tts-vd with voice-design custom voices, long-text chunking built in), MiniMax speech-02-hd, SiliconFlow CosyVoice / SenseVoice, and PlayHT 2.0. Use this skill whenever the user asks to read text aloud, synthesize speech, generate narration, create voice-over, dub a script, or turn any text into audio (mp3 / wav / ogg / flac). Typical phrases include "read this aloud", "generate voice for ...", "create a narration of ...", "tts this", "把这段念出来", "做个配音", "合成语音", or mentions of voices / TTS model names like Alloy, Ash, Cherry, Rachel, CosyVoice, PlayHT. Always use this skill even if the user does not specify a provider — pick one from EXTEND.md defaults or available env keys.
version	0.1.0

happy-audio-gen

Converts text to speech across six providers through a single CLI. All providers are synchronous (TTS is fast, typically under 10 seconds) except Bailian's voice-design flow, which is also supported but uses a longer polling window.

Quick usage

# Shortest path — OpenAI default voice
bun scripts/main.ts --text "Hello, world" --out ./hello.mp3

# Chinese, MiniMax
bun scripts/main.ts --provider minimax --text "大家好" --voice male-qn-qingse --out ./hello.mp3

# Long-form, Bailian (auto-splits by sentence)
bun scripts/main.ts --provider bailian --textfiles ./script.md --out ./narration.mp3

When to invoke this skill

User asks to synthesize speech / TTS / read aloud / narrate / dub / make a voice-over.
User asks to convert script / text / article into audio.
User names a TTS voice or model.

Do not route here when the user wants to transcribe audio to text (that is speech-to-text, a different domain) or to edit or mix audio files (use a dedicated audio editor instead).

Step 0: Preflight (BLOCKING)

Locate EXTEND.md:
- ./.happy-skills/happy-audio-gen/EXTEND.md
- $XDG_CONFIG_HOME/happy-skills/happy-audio-gen/EXTEND.md
- ~/.happy-skills/happy-audio-gen/EXTEND.md
If none is found, run bun scripts/main.ts --setup and walk the user through references/config/first-time-setup.md.
Verify that at least one provider has credentials (an environment variable or a 1Password reference).
Verify that Bun is available; if it is not, fall back to npx -y bun.
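A minimal TypeScript sketch of the EXTEND.md lookup. It assumes the three locations are checked in the order listed and that an unset XDG_CONFIG_HOME falls back to ~/.config (the XDG Base Directory default). The function names are hypothetical and are not taken from scripts/main.ts.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Candidate EXTEND.md locations from Step 0, in the order listed
// (the precedence is an assumption, not documented behavior).
function extendMdCandidates(
  cwd: string,
  home: string,
  xdgConfigHome: string | undefined,
): string[] {
  // Assumption: fall back to ~/.config when XDG_CONFIG_HOME is unset or empty.
  const xdg = xdgConfigHome && xdgConfigHome.length > 0 ? xdgConfigHome : join(home, ".config");
  return [
    join(cwd, ".happy-skills", "happy-audio-gen", "EXTEND.md"),
    join(xdg, "happy-skills", "happy-audio-gen", "EXTEND.md"),
    join(home, ".happy-skills", "happy-audio-gen", "EXTEND.md"),
  ];
}

// Returns the first candidate that exists, or null when setup is still needed.
// `exists` is injectable so the lookup can be tested without touching the disk.
function findExtendMd(
  exists: (path: string) => boolean = existsSync,
  cwd: string = process.cwd(),
  home: string = homedir(),
  xdgConfigHome: string | undefined = process.env.XDG_CONFIG_HOME,
): string | null {
  for (const candidate of extendMdCandidates(cwd, home, xdgConfigHome)) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

A null result corresponds to the "none is found" case, where the --setup flow applies.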

Step 1: Choose provider

Preference order:

--provider <id>
EXTEND.md default_provider
Auto-detect env vars: openai > elevenlabs > bailian > minimax > siliconflow > playht
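The preference order can be sketched as a small resolver. This is illustrative only. The environment-variable names are assumptions: OPENAI_API_KEY and DASHSCOPE_API_KEY follow those vendors' SDK conventions, the others are guesses, and the authoritative list is references/providers.md.

```typescript
type ProviderId = "openai" | "elevenlabs" | "bailian" | "minimax" | "siliconflow" | "playht";

// Auto-detect priority, copied from Step 1.
const AUTO_DETECT_ORDER: ProviderId[] = [
  "openai", "elevenlabs", "bailian", "minimax", "siliconflow", "playht",
];

// Hypothetical credential variable per provider (see references/providers.md).
const ASSUMED_KEY_VARS: Record<ProviderId, string> = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  bailian: "DASHSCOPE_API_KEY",
  minimax: "MINIMAX_API_KEY",
  siliconflow: "SILICONFLOW_API_KEY",
  playht: "PLAYHT_API_KEY",
};

function resolveProvider(
  flag: ProviderId | undefined,
  extendDefault: ProviderId | undefined,
  env: Record<string, string | undefined>,
): ProviderId | null {
  // 1. An explicit --provider flag wins.
  if (flag) return flag;
  // 2. Then default_provider from EXTEND.md.
  if (extendDefault) return extendDefault;
  // 3. Otherwise the first provider, in priority order, with a non-empty key.
  for (const id of AUTO_DETECT_ORDER) {
    const value = env[ASSUMED_KEY_VARS[id]];
    if (value && value.trim() !== "") return id;
  }
  return null; // No credentials at all; the Step 0 preflight should catch this.
}
```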

Pick by language / voice intent:

English, natural + fast → openai (gpt-4o-mini-tts / tts-1).
Multilingual, voice cloning → elevenlabs.
Chinese, long-form → bailian (qwen-tts auto-chunks long scripts) or minimax.
Chinese dialect / voice design → bailian (voice-design with qwen3-tts-vd) or siliconflow (CosyVoice2).
Ultra-realistic, short-form → playht (2.0).

Step 2: Fill parameters

--text or --textfiles: the input, given inline or as a file path. Always quote the value.
--out <path>: REQUIRED. The file extension determines the format (.mp3 / .wav / .ogg / .flac).
--voice <id>: provider-specific. See references/voices.md for a short list of well-known voices.
--rate 0.5..2.0: speaking rate.
--instruction "...": voice direction (honored only by openai gpt-4o-mini-tts and siliconflow).
--language <code>: e.g. en, zh, ja. Only a few providers honor this explicitly.
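A hedged sketch of client-side checks for these parameters: deriving the format from the --out extension, enforcing the 0.5..2.0 rate range, and flagging --instruction for providers that ignore it. Whether the real CLI rejects or silently ignores an unsupported flag is not documented here; this sketch assumes a warning is enough. TtsParams, formatFromOut, and checkParams are hypothetical names.

```typescript
type AudioFormat = "mp3" | "wav" | "ogg" | "flac";
const FORMATS: readonly AudioFormat[] = ["mp3", "wav", "ogg", "flac"];

interface TtsParams {
  provider: string;
  model?: string;
  out: string;
  rate?: number;
  instruction?: string;
}

// The --out extension determines the output format.
function formatFromOut(out: string): AudioFormat {
  const dot = out.lastIndexOf(".");
  const ext = dot >= 0 ? out.slice(dot + 1).toLowerCase() : "";
  if (!(FORMATS as readonly string[]).includes(ext)) {
    throw new Error(`--out must end in .mp3, .wav, .ogg or .flac (got "${out}")`);
  }
  return ext as AudioFormat;
}

// Throws on hard errors; returns warnings for flags the provider ignores.
function checkParams(p: TtsParams): string[] {
  const warnings: string[] = [];
  formatFromOut(p.out);
  if (p.rate !== undefined && (p.rate < 0.5 || p.rate > 2.0)) {
    throw new Error(`--rate must be within 0.5..2.0 (got ${p.rate})`);
  }
  // Only openai gpt-4o-mini-tts and siliconflow honor --instruction. The
  // OpenAI default model is not assumed here, so pass --model explicitly.
  const honorsInstruction =
    (p.provider === "openai" && p.model === "gpt-4o-mini-tts") || p.provider === "siliconflow";
  if (p.instruction && !honorsInstruction) {
    warnings.push(`--instruction is ignored by ${p.provider}${p.model ? ` (${p.model})` : ""}`);
  }
  return warnings;
}
```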

Step 3: Run

bun scripts/main.ts \
  --provider openai \
  --model gpt-4o-mini-tts \
  --voice alloy \
  --text "..." \
  --out ./out.mp3

Example output in JSON mode:

{ "success": true, "provider": "openai", "model": "gpt-4o-mini-tts", "voice": "alloy", "output": "/abs/out.mp3", "size_bytes": 76032, "format": "mp3" }
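If another script consumes JSON mode, a typed parser along these lines can guard against partial results. The TtsResult shape is inferred from the single sample above; failure payloads are not documented, so this sketch treats anything without success: true as an error. The names are hypothetical.

```typescript
// Shape inferred from the sample JSON-mode output.
interface TtsResult {
  success: true;
  provider: string;
  model: string;
  voice: string;
  output: string; // absolute path of the written file
  size_bytes: number;
  format: "mp3" | "wav" | "ogg" | "flac";
}

function parseTtsResult(stdout: string): TtsResult {
  const data: unknown = JSON.parse(stdout);
  if (typeof data !== "object" || data === null) throw new Error("result is not a JSON object");
  const r = data as Record<string, unknown>;
  if (r.success !== true) throw new Error(`synthesis failed: ${stdout}`);
  for (const key of ["provider", "model", "voice", "output", "format"]) {
    if (typeof r[key] !== "string") throw new Error(`missing string field "${key}"`);
  }
  if (!["mp3", "wav", "ogg", "flac"].includes(r.format as string)) {
    throw new Error(`unexpected format "${String(r.format)}"`);
  }
  const size = r.size_bytes;
  if (typeof size !== "number" || size <= 0) throw new Error("missing or empty output file");
  return r as unknown as TtsResult;
}
```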

Step 4: Long text handling

happy-audio-gen automatically splits long input for providers that cap per-call length (Bailian accepts at most 200 Chinese characters per call). The resulting chunks are concatenated byte-for-byte into the output file.
For best fidelity with MP3 output, stitch the segments with ffmpeg afterward instead of relying on byte concatenation.
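The splitting and stitching can be sketched as follows. chunkBySentence is a hypothetical illustration (greedy packing of sentences into chunks of at most 200 code points, with hard splits for over-long sentences), not the skill's actual algorithm. buildConcatList writes an input file for ffmpeg's concat demuxer.

```typescript
// Greedily pack sentences into chunks of at most maxChars code points.
// Boundaries: Chinese or Western terminal punctuation, or a newline.
// Naive by design: "3.14" is also split at the dot.
function chunkBySentence(text: string, maxChars = 200): string[] {
  const sentences = text
    .split(/(?<=[。！？；!?;.\n])/u)
    .filter((s) => s.trim().length > 0);
  const chunks: string[] = [];
  let current = "";
  const flush = () => {
    const trimmed = current.trim();
    if (trimmed.length > 0) chunks.push(trimmed);
    current = "";
  };
  for (const sentence of sentences) {
    // Count code points (Array.from), not UTF-16 units, and hard-split
    // any sentence that alone exceeds the limit.
    const chars = Array.from(sentence);
    for (let i = 0; i < chars.length; i += maxChars) {
      const piece = chars.slice(i, i + maxChars).join("");
      if (Array.from(current).length + Array.from(piece).length > maxChars) flush();
      current += piece;
    }
  }
  flush();
  return chunks;
}

// Input list for ffmpeg's concat demuxer. A single quote inside a path is
// written as '\'' (close quote, escaped quote, reopen quote).
function buildConcatList(segmentPaths: string[]): string {
  return segmentPaths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}
```

Save the list as list.txt and run ffmpeg -f concat -safe 0 -i list.txt -c copy narration.mp3. Stream copy assumes every segment shares the same codec and parameters, which should hold when all segments come from one provider and model; otherwise, drop -c copy and let ffmpeg re-encode.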

Step 5: Errors

[openai] OpenAI TTS 400 with invalid voice → the voice name is not supported by the model. Use one of alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer.
[minimax] ... 2049 invalid api key → the key may belong to a different region; try MINIMAX_BASE_URL=https://api.minimaxi.com/v1.
[bailian] ... 400 DataInspectionFailed → the input was rejected by Aliyun's content filter. Surface this to the user.
[elevenlabs] 401 → the API key is invalid or the subscription has expired.
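The error table can be turned into a lookup, sketched below. The regular expressions are assumptions about how each provider words its errors; adjust them to the messages scripts/main.ts actually prints. The OpenAI voice set is copied from the table, and hintForError and isKnownOpenAiVoice are hypothetical names.

```typescript
// OpenAI voices listed in the error table.
const OPENAI_VOICES = new Set([
  "alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer",
]);

// Map a provider error message to the remediation from the table, or null.
function hintForError(provider: string, message: string): string | null {
  if (provider === "openai" && /\b400\b/.test(message) && /voice/i.test(message)) {
    return `Unsupported voice; use one of: ${Array.from(OPENAI_VOICES).join(", ")}`;
  }
  if (provider === "minimax" && /\b2049\b/.test(message)) {
    return "Key may belong to another region; try MINIMAX_BASE_URL=https://api.minimaxi.com/v1";
  }
  if (provider === "bailian" && message.includes("DataInspectionFailed")) {
    return "Rejected by Aliyun's content filter; surface this to the user";
  }
  if (provider === "elevenlabs" && /\b401\b/.test(message)) {
    return "API key invalid or subscription expired";
  }
  return null;
}

// Checking the voice before the call can catch typos that would trigger the 400.
function isKnownOpenAiVoice(voice: string): boolean {
  return OPENAI_VOICES.has(voice);
}
```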

References

references/providers.md — per-provider env vars, default models, voice lists.
references/voices.md — curated voices for each provider.
references/error_codes.md — common errors and fixes.
references/config/first-time-setup.md
references/config/extend-schema.md
assets/EXTEND.template.md

Related skills (same repository)

happy-app-audit.md

from "iamzhihuix/happy-claude-skills"

Audit a local macOS app's telemetry / reporting behavior using static analysis only. Reverse-engineers an .app bundle to identify embedded SDKs (AppLog/TEA, Parfait, TTNet, mars, MMKV, Sentry, Firebase, Bugly, Umeng, etc.), mapped upload endpoints, local on-disk queues, and privacy-relevant fields — without packet capture, network requests, debugger attach, or DRM bypass. Use when user asks to investigate, audit, or reverse-engineer a macOS app for telemetry, reporting, data upload, privacy, or SDK fingerprinting. Targets /Applications, ~/Applications, /Library/Input Methods, /Library/PrivilegedHelperTools, and similar local install paths.

2026-04-19 · stars: 295

mkfast-deploy.md

from "iamzhihuix/happy-claude-skills"

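Deploy mkfast-template / TanStarter-derived projects to Cloudflare Workers. Automatically detects which components are actually enabled (D1 / R2 / Email / Better Auth / Stripe / Creem / Beehiiv / notifications / analytics) and trims the deployment steps accordingly. Use PROACTIVELY whenever the user mentions deploying, going live, deploying to Cloudflare, wrangler, pnpm deploy, pushing to production, publishing the site, or launching on CF Workers. Even if the user does not explicitly say "mkfast", as long as the project contains `wrangler.jsonc` + `@tanstack/react-start` or any one of `drizzle-orm` / `better-auth`, **prefer this skill over the generic wrangler / cloudflare skill** (it understands mkfast's placeholders, the logpush default-value pitfall, the enable=false pitfall, and the tailoring strategy better). SKIP when the project is clearly not derived from mkfast-template (no wrangler.jsonc and no @tanstack/react-start) → use the generic wrangler skill instead. Trigger phrases: "deploy to Cloudflare", "mkfast deployment", "launch TanStarter", "wrangler deploy", "prepare to deploy", "what to do before going live", "deploy process", "pnpm deploy", "push to Cloudflare production", "ship to CF", "go live", "publish to CF Workers", "production deploy", "take the website live", "launch my SaaS".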

2026-04-19 · stars: 295

happy-dreamina.md

from "iamzhihuix/happy-claude-skills"

ByteDance Jimeng (Dreamina) image and video generation via the official `dreamina` CLI. Use this skill whenever the user mentions 即梦, Dreamina, Jimeng, or asks to generate images or videos specifically through ByteDance's Jimeng service. Covers text2image, image2image, text2video, image2video, plus async task query and task-history browsing via list_task. Trigger this skill instead of happy-image-gen or happy-video-gen whenever the user explicitly names 即梦 or dreamina — it uses browser-based login (`dreamina login`) rather than API keys and has access to Jimeng-exclusive models. Common phrases include "用即梦画张...", "Jimeng generate a video of...", "Dreamina 文生视频", "用 dreamina CLI 做图", "查下我即梦的历史任务".

2026-04-19 · stars: 295

open-source-prep.md

from "iamzhihuix/happy-claude-skills"

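Helps the user turn a private project into a repository ready to be open-sourced. Core capabilities: (1) scan source code and git history for leaked secrets/tokens; (2) recommend an open-source license (MIT / Apache 2.0 / GPL, etc.) suited to the project and generate the LICENSE file; (3) add essential open-source documents such as a README disclaimer, CONTRIBUTING.md, SECURITY.md, and .gitignore; (4) check the bundle identifier, package.json, and similar files for ownership/trademark risks. Trigger phrases: "open source", "choose a license", "LICENSE", "prepare to open-source", "detect secrets", "secret scan", "upload to GitHub".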

2026-04-17 · stars: 295

happy-image-gen.md

from "iamzhihuix/happy-claude-skills"

Universal AI image generation supporting OpenAI DALL·E / gpt-image, Google Gemini Image / Imagen, Replicate (Flux / SDXL / any model), Stability AI, FAL, Ark (Seedream 4.5), Bailian (qwen-image / wanx), and SiliconFlow. Use this skill whenever the user asks to generate, create, draw, illustrate, render, or synthesize images from text prompts or reference images. Typical phrases include "draw a ...", "generate an image of ...", "画一张 ...", "给我来张图", "make a poster of ...", "create an illustration ...", or any mention of image-generation model families like DALL·E, gpt-image, Flux, SDXL, Seedream, Imagen, Gemini image, Kolors, or Wanx. Always use this skill even if the user does not name a specific model — pick a provider based on their EXTEND.md defaults or available API keys in the environment. Do NOT use this skill when the user explicitly mentions 即梦 / Dreamina / Jimeng — those go to happy-dreamina instead.

2026-04-17 · stars: 295

happy-video-gen.md

from "iamzhihuix/happy-claude-skills"

Universal AI video generation supporting OpenAI Sora, Google Veo 2/3, Runway Gen-3/Gen-4, Pika 2.2, Luma Dream Machine (Ray 2), FAL (Kling / Wan / Veo / Sora wrappers), Ark Seedance 1.5 Pro/Lite, Bailian Wanx (i2v), MiniMax Hailuo-02, and Vidu Q3. Use this skill whenever the user asks to generate, create, make, or synthesize a video from a text prompt or from a first-frame image. Covers text-to-video and image-to-video, with optional last-frame control on providers that support it. Typical phrases include "generate a video of ...", "make a 5-second clip of ...", "animate this image", "生成一段视频", "做个短片", or any mention of video-generation model families like Sora, Veo, Runway Gen, Kling, Wan, Seedance, Hailuo, Pika, Dream Machine, Vidu. Always use this skill even if the user does not name a specific model — pick a provider from their EXTEND.md defaults or available API keys. Do NOT use this skill when the user explicitly mentions 即梦 / Dreamina / Jimeng — those go to happy-dreamina instead.

2026-04-17 · stars: 295

package.json

"author": "iamzhihuix"

"repository": "iamzhihuix/happy-claude-skills"
