Run any Skill in Manus with one click

$pwd:

audio-generation

Name: Audio Generation
Author: massgen

// Guide to audio generation and understanding in MassGen. Covers text-to-speech, music, sound effects, and audio understanding across ElevenLabs and OpenAI backends.


$ git log --oneline --stat

stars: 1,041

forks: 161

updated: March 1, 2026 at 20:18

File explorer

4 files

SKILL.md

readonly

name	audio-generation
description	Guide to audio generation and understanding in MassGen. Covers text-to-speech, music, sound effects, and audio understanding across ElevenLabs and OpenAI backends.

Audio Generation

Generate audio using `generate_media` with `mode="audio"`. Supports speech (TTS), music, and sound effects. ElevenLabs is preferred when available, with OpenAI as the fallback.

Quick Start

# Text-to-speech (auto-selects ElevenLabs if key available)
generate_media(prompt="Hello, welcome to our presentation!", mode="audio")

# With specific voice
generate_media(prompt="Hello!", mode="audio", voice="Rachel")

# Music generation (ElevenLabs only)
generate_media(prompt="Upbeat jazz piano with soft drums", mode="audio",
               audio_type="music", duration=30)

# Sound effects (ElevenLabs only)
generate_media(prompt="Thunder rolling across a mountain valley", mode="audio",
               audio_type="sound_effect", duration=5)

Audio Types

| Type | Backends | Description |
| --- | --- | --- |
| `"speech"` (default) | ElevenLabs, OpenAI | Text-to-speech with voice selection |
| `"music"` | ElevenLabs only | Music generation from a text prompt |
| `"sound_effect"` | ElevenLabs only | Sound effect generation |
| `"voice_conversion"` | ElevenLabs only | Change the voice of existing audio (speech-to-speech) |
| `"audio_isolation"` | ElevenLabs only | Remove background noise, isolate vocals |
| `"voice_design"` | ElevenLabs only | Create a new synthetic voice from a text description |
| `"voice_clone"` | ElevenLabs only | Clone a voice from audio samples |
| `"dubbing"` | ElevenLabs only | Translate and dub audio into another language |
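
As a rough illustration, the table above can be transcribed into a lookup that checks an audio type before a request is made. This is only a sketch: `AUDIO_TYPE_BACKENDS` and `backends_for` are hypothetical names, not part of MassGen, and the `"elevenlabs"` backend identifier is an assumption (only `"openai"` appears in this guide).

```python
# Hypothetical lookup transcribed from the Audio Types table.
# Not a MassGen API; "elevenlabs" as a backend identifier is an assumption.
AUDIO_TYPE_BACKENDS = {
    "speech": ("elevenlabs", "openai"),
    "music": ("elevenlabs",),
    "sound_effect": ("elevenlabs",),
    "voice_conversion": ("elevenlabs",),
    "audio_isolation": ("elevenlabs",),
    "voice_design": ("elevenlabs",),
    "voice_clone": ("elevenlabs",),
    "dubbing": ("elevenlabs",),
}


def backends_for(audio_type="speech"):
    """Return the backends the table lists for an audio type, in priority order."""
    try:
        return AUDIO_TYPE_BACKENDS[audio_type]
    except KeyError:
        raise ValueError(f"unknown audio_type: {audio_type!r}") from None
```

A caller could use `backends_for("music")` to decide early that a music request cannot be served when only an OpenAI key is configured.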

Backend Comparison

| Backend | Default Model | Supports | API Key |
| --- | --- | --- | --- |
| ElevenLabs (priority 1) | `eleven_multilingual_v2` | Speech, music, SFX | `ELEVENLABS_API_KEY` |
| OpenAI (priority 2) | `gpt-4o-mini-tts` | Speech only | `OPENAI_API_KEY` |

If ElevenLabs TTS fails, the system automatically falls back to OpenAI TTS.
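
The priority-and-fallback behavior described above can be pictured with a small sketch. It is not MassGen's implementation: `available_backends`, `speech_with_fallback`, and the `synthesizers` mapping are hypothetical stand-ins for the real backend calls, and the `"elevenlabs"` identifier is an assumption. Only the priority order and the environment variable names come from the table.

```python
import os

# Priority order and key names come from the Backend Comparison table.
# Everything else here is an illustrative assumption, not MassGen code.
BACKEND_PRIORITY = (
    ("elevenlabs", "ELEVENLABS_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
)


def available_backends(env=None):
    """List backends whose API key is set, highest priority first."""
    env = os.environ if env is None else env
    return [name for name, key in BACKEND_PRIORITY if env.get(key)]


def speech_with_fallback(text, synthesizers, env=None):
    """Try each available backend in order; return (backend_name, result).

    `synthesizers` maps a backend name to a callable taking the text.
    """
    errors = {}
    for name in available_backends(env):
        try:
            return name, synthesizers[name](text)
        except Exception as exc:  # record the failure and try the next backend
            errors[name] = exc
    raise RuntimeError(f"no speech backend succeeded: {errors}")
```

With both keys set and a failing ElevenLabs callable, the function returns the OpenAI result, which mirrors the fallback the guide describes for speech.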

Key Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| `prompt` | Text to speak (speech) or description (music/SFX) | `"Hello world!"` |
| `voice` | Voice name or ID | `"Rachel"`, `"nova"`, `"alloy"` |
| `audio_type` | Type of audio | `"speech"`, `"music"`, `"sound_effect"` |
| `duration` | Length in seconds (music/SFX only) | `30` |
| `instructions` | Speaking style (OpenAI `gpt-4o-mini-tts` only) | `"warm, reflective tone"` |
| `audio_format` | Output format | `"mp3"`, `"wav"`, `"opus"` |
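
The constraints in the table (`duration` for music/SFX only, `instructions` for OpenAI only) could be checked before calling `generate_media`. The helper below is a hypothetical sketch, not part of MassGen. Requiring an explicit `backend_type="openai"` whenever `instructions` is passed is a design choice made here for illustration.

```python
def build_audio_request(prompt, audio_type="speech", backend_type=None, **options):
    """Assemble keyword arguments for generate_media, rejecting mismatched options.

    Hypothetical helper: the rules mirror the Key Parameters table, but the
    function itself is an illustration and not a MassGen API.
    """
    if "duration" in options and audio_type not in ("music", "sound_effect"):
        raise ValueError("duration applies to music and sound_effect only")
    if "instructions" in options and backend_type != "openai":
        raise ValueError("instructions requires backend_type='openai'")

    request = {"prompt": prompt, "mode": "audio", "audio_type": audio_type}
    if backend_type is not None:
        request["backend_type"] = backend_type
    request.update(options)  # voice, duration, instructions, audio_format, ...
    return request
```

The resulting dictionary can be unpacked into the tool call, for example `generate_media(**build_audio_request("Upbeat jazz piano", audio_type="music", duration=30))`.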

Voice Quick Reference

ElevenLabs (top voices):

| Voice | Character |
| --- | --- |
| Rachel | Warm, conversational female |
| Sarah | Clear, professional female |
| Josh | Friendly male |
| Adam | Deep, authoritative male |
| Emily | Bright, energetic female |

OpenAI voices: alloy, echo, fable, onyx, nova, shimmer, coral, sage
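
Because the two backends use different voice names, a caller may want to guess which backend a voice belongs to. The sketch below does this using only the voices listed above. `backend_for_voice` is a hypothetical helper, the `"elevenlabs"` identifier is an assumption, and an unrecognized value is left undecided because it may be a voice ID or a catalog voice not shown here.

```python
# Voice names copied from the quick reference; the helper is illustrative only.
ELEVENLABS_TOP_VOICES = {"rachel", "sarah", "josh", "adam", "emily"}
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "sage"}


def backend_for_voice(voice):
    """Return "openai", "elevenlabs", or None when the voice is not listed here."""
    key = voice.casefold()
    if key in OPENAI_VOICES:
        return "openai"
    if key in ELEVENLABS_TOP_VOICES:
        return "elevenlabs"
    return None  # possibly a voice ID or a voice from the full catalog
```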

Important: prompt vs instructions

For speech, `prompt` is the literal text to speak. Style guidance goes in `instructions`:

# CORRECT: prompt = text to speak, instructions = how to speak it
generate_media(
    prompt="Welcome to the annual report presentation.",
    mode="audio",
    voice="alloy",
    instructions="warm, reflective tone with measured pacing",
    backend_type="openai"
)

# WRONG: Don't put style instructions in prompt
generate_media(prompt="Say this warmly: Welcome...", mode="audio")  # Bad!

`instructions` only works with OpenAI `gpt-4o-mini-tts`. ElevenLabs relies on voice selection for tone.
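
One way to catch the mistake shown above is a rough heuristic that flags prompts starting with a speaking directive such as "Say this warmly:". The check below is an assumption-laden sketch, not something MassGen provides. The verb list and the 40-character window are arbitrary choices, and legitimate text that happens to start this way would also be flagged.

```python
import re

# Heuristic only: matches a leading verb like "say" followed by a short
# phrase and a colon, e.g. "Say this warmly: ...". Not a MassGen feature.
_DIRECTIVE = re.compile(r"^\s*(say|speak|read)\b[^:]{0,40}:", re.IGNORECASE)


def looks_like_style_directive(prompt):
    """Return True if the prompt appears to open with a speaking instruction."""
    return bool(_DIRECTIVE.match(prompt))
```

A prompt that trips this check is a candidate for moving the style text into `instructions` and leaving only the spoken words in `prompt`.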

Audio Understanding

Use `read_media` (not `generate_media`) to analyze existing audio:

read_media(path="recording.mp3", prompt="Transcribe and summarize this audio")

Need More Control?

Full ElevenLabs voice catalog (28+ voices): See references/voices.md
Music and sound effects details: See references/music_and_sfx.md
Advanced audio capabilities (voice conversion, cloning, isolation, dubbing): See references/advanced.md

related-skills.json

same repository

massgen-log-analyzer.md

from "massgen/MassGen"

Run MassGen experiments and analyze logs using automation mode, logfire tracing, and SQL queries. Use this skill for performance analysis, debugging agent behavior, evaluating coordination patterns, and improving the logging structure, or whenever an ANALYSIS_REPORT.md is needed in a log directory.

2026-03-31 · 1.0k

massgen.md

from "massgen/MassGen"

Invoke MassGen's multi-agent system. Use when the user wants multiple AI agents on a task: writing, code, review, planning, specs, research, design, or any task where parallel iteration beats working alone.

2026-03-28 · 1.0k

backend-integrator.md

from "massgen/MassGen"

Complete guide for integrating a new LLM backend into MassGen. Use when adding a new provider (e.g., Codex, Mistral, DeepSeek) or when auditing an existing backend for missing integration points. Covers all ~15 files that need touching.

2026-03-10 · 1.0k

image-generation.md

from "massgen/MassGen"

Guide to image generation and editing in MassGen. Use when creating images, editing existing images, iterating on image designs, or choosing between image backends (OpenAI, Google Gemini/Imagen, Grok, OpenRouter).

2026-03-06 · 1.0k

video-generation.md

from "massgen/MassGen"

Guide to video generation in MassGen. Use when creating videos from text prompts or images across Grok, Google Veo, and OpenAI Sora backends.

2026-03-03 · 1.0k

multimedia-backend-integrator.md

from "massgen/MassGen"

Reference guide for adding new media generation backends to MassGen's unified generate_media tool.

2026-03-02 · 1.0k

package.json

"author": "massgen"

"repository": "massgen/MassGen"


$ useful --forSOC

Other computer occupations · Computer and mathematical occupations · 15-1299 · L4

