| name | media-toolkit-production |
| description | Voice generation and audio mixing using ElevenLabs v3 with local timing optimization. Trigger when the user mentions voice generation, TTS, ElevenLabs, audio production, or the script format CHARACTER (emotion): dialogue. |
Media Toolkit Production
Instructions
Critical Requirements (ALL MUST BE ENFORCED)
- Accent Persistence - Every ElevenLabs API call MUST include an accent descriptor at the start of the text:
  text: "[Irish Midlands accent] Stay close, Jess."
- Clip Reuse from History - Before generating any clip:
  - Check the ElevenLabs History API
  - If a clip with the same text and voice exists, reuse it
  - Log: "Reusing from history: [id]"
- Re-Timing Workflow - Adjust timing locally, with NO regeneration:
  - Shift downstream start times based on the actual clip durations
  - Zero API calls for timing fixes
- Zero Voice Overlaps (PRIORITY 1) - Voices NEVER overlap unless the overlap is explicitly marked as intentional
- Line Refinement - User approval loop:
  - User: "Line 2 more panicked"
  - Adjust: stability=0.3, style=0.8, text="[panicked, breathless]..."
  - Archive both the original and the reworked version
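The re-timing and zero-overlap requirements can be illustrated with a minimal sketch. It assumes each clip carries a planned start, a planned duration, and the duration measured after generation; the `Clip` type, its field names, and the `allow_overlap` flag are hypothetical and not part of this toolkit. Downstream clips are shifted by the accumulated difference between actual and planned durations, so no API call is involved.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    """One voice clip on the timeline; all times are in seconds."""
    line: int
    start: float                 # planned start time
    planned: float               # duration estimated before generation
    actual: float                # duration measured from the generated audio
    allow_overlap: bool = False  # hypothetical marker for an intentional overlap


def retime(clips: list[Clip]) -> list[Clip]:
    """Shift each clip by the drift accumulated from the clips before it."""
    result = []
    drift = 0.0  # sum of (actual - planned) over all earlier clips
    for clip in sorted(clips, key=lambda c: c.start):
        result.append(replace(clip, start=clip.start + drift))
        drift += clip.actual - clip.planned
    return result


def find_overlaps(clips: list[Clip], tolerance: float = 1e-9) -> list[tuple[int, int]]:
    """Return (line, line) pairs that overlap without being marked intentional."""
    ordered = sorted(clips, key=lambda c: c.start)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.start + a.actual - tolerance:
                break  # sorted by start, so no later clip can overlap a
            if not (a.allow_overlap or b.allow_overlap):
                overlaps.append((a.line, b.line))
    return overlaps
```

If the planned timeline is free of overlaps, shifting by the accumulated drift keeps it that way, because every planned gap between consecutive clips is preserved.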
Script Format
CHARACTER (emotion): dialogue text
[SFX: description, duration: 5s, volume: 0.8]
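A parser for the two line types above might look like the following sketch. The type and function names (`Dialogue`, `Sfx`, `parse_script`) are hypothetical, and the sketch assumes that character names are written in upper case, that the SFX description contains no comma, and that `duration` and `volume` are always present.

```python
import re
from dataclasses import dataclass

# CHARACTER (emotion): dialogue text
DIALOGUE = re.compile(
    r"^(?P<character>[A-Z][A-Z0-9 _'-]*?)\s*\((?P<emotion>[^)]*)\):\s*(?P<text>.+)$"
)
# [SFX: description, duration: 5s, volume: 0.8]
SFX = re.compile(r"^\[SFX:\s*(?P<body>.+)\]$")


@dataclass
class Dialogue:
    character: str
    emotion: str
    text: str


@dataclass
class Sfx:
    description: str
    duration: float  # seconds
    volume: float


def parse_script(script: str) -> list:
    """Parse a script into Dialogue and Sfx events, in script order."""
    events = []
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no event
        if m := SFX.match(line):
            events.append(_parse_sfx(m["body"], number))
        elif m := DIALOGUE.match(line):
            events.append(Dialogue(m["character"], m["emotion"].strip(), m["text"].strip()))
        else:
            raise ValueError(f"line {number}: unrecognized script line: {line!r}")
    return events


def _parse_sfx(body: str, number: int) -> Sfx:
    """Split 'description, key: value, ...' into an Sfx record."""
    description, *options = [part.strip() for part in body.split(",")]
    fields = {}
    for option in options:
        key, _, value = option.partition(":")
        fields[key.strip()] = value.strip()
    try:
        duration = float(fields["duration"].removesuffix("s"))
        volume = float(fields["volume"])
    except (KeyError, ValueError) as exc:
        raise ValueError(f"line {number}: bad SFX options") from exc
    return Sfx(description, duration, volume)
```

Rejecting unrecognized lines with the line number, rather than skipping them, keeps a typo in the script from silently dropping a line of dialogue.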
Production Workflow
- Load character database (Notion)
- Parse script (dialogue + SFX)
- Generate timeline
- Detect overlaps
- Generate audio (v3 with history reuse)
- Optimize timing
- Mix audio (FFmpeg)
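The final mixing step can be sketched as a function that only builds the FFmpeg argument list from the optimized timeline. The `Track` type and `build_mix_command` are hypothetical names. The sketch relies on FFmpeg's `adelay`, `volume`, and `amix` filters; the `all=1` option of `adelay` and the `normalize=0` option of `amix` need a reasonably recent FFmpeg build, so treat those two options as assumptions about the installed version.

```python
from dataclasses import dataclass


@dataclass
class Track:
    path: str
    start: float        # start time on the timeline, in seconds
    volume: float = 1.0


def build_mix_command(tracks: list[Track], output: str) -> list[str]:
    """Build an ffmpeg command that delays each track to its start time and mixes them."""
    if not tracks:
        raise ValueError("at least one track is required")
    cmd = ["ffmpeg", "-y"]
    chains = []
    labels = []
    for index, track in enumerate(tracks):
        cmd += ["-i", track.path]
        delay_ms = round(track.start * 1000)
        filters = []
        if delay_ms > 0:
            # all=1 applies the same delay to every channel of the input
            filters.append(f"adelay={delay_ms}:all=1")
        filters.append(f"volume={track.volume}")
        chains.append(f"[{index}:a]{','.join(filters)}[a{index}]")
        labels.append(f"[a{index}]")
    # normalize=0 keeps amix from scaling each input down by the number of inputs
    chains.append(
        f"{''.join(labels)}amix=inputs={len(tracks)}:duration=longest:normalize=0[mix]"
    )
    cmd += ["-filter_complex", ";".join(chains), "-map", "[mix]", output]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it makes the filter graph easy to log and inspect before any audio is written.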
DO NOT
- Generate audio without checking history first
- Omit accent descriptors from any API call
- Allow voice overlaps without explicit marking
- Regenerate clips for timing adjustments
DO
- Always include the accent descriptor in the text prompt
- Check history → archive → then generate if needed
- Preserve both original and reworked versions
- Track and report token usage efficiency
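The accent and history rules can be combined into one decision made before any generation request. The sketch below is a hedged illustration, not the toolkit's implementation: it assumes the history has already been fetched as a list of dictionaries with `history_item_id`, `voice_id`, and `text` keys (field names assumed here), and `with_accent`, `find_reusable`, and `plan_clip` are hypothetical helpers.

```python
from typing import Optional


def with_accent(text: str, accent: str) -> str:
    """Prefix the accent descriptor unless the text already starts with it."""
    tag = f"[{accent}]"
    return text if text.startswith(tag) else f"{tag} {text}"


def find_reusable(history: list[dict], text: str, voice_id: str) -> Optional[str]:
    """Return the id of a history item with the same text and voice, if any."""
    for item in history:
        if item.get("text") == text and item.get("voice_id") == voice_id:
            return item.get("history_item_id")
    return None


def plan_clip(history: list[dict], text: str, accent: str, voice_id: str) -> dict:
    """Decide whether to reuse an existing clip or generate a new one."""
    prompt = with_accent(text, accent)
    item_id = find_reusable(history, prompt, voice_id)
    if item_id is not None:
        print(f"Reusing from history: {item_id}")
        return {"action": "reuse", "history_item_id": item_id}
    return {"action": "generate", "text": prompt, "voice_id": voice_id}
```

Matching on the accent-prefixed prompt rather than the raw line means a clip generated without the accent descriptor is never picked up for reuse.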