tts-voiceover
| name | tts-voiceover |
| description | Text-to-speech voice-over generation from YAML speaker notes using Azure Speech SDK with SSML pronunciation control - Brought to you by microsoft/hve-core |
| metadata | {"authors":"microsoft/hve-core","spec_version":"1.0"} |
Generates per-slide WAV voice-over files from YAML speaker_notes using Azure Speech SDK with SSML pronunciation control.
This skill reads content.yaml files from a PowerPoint skill content directory, extracts the speaker_notes fields, applies SSML acronym aliases so that technical terms are pronounced correctly, and produces one WAV file per slide. It also supports a dry-run mode that verifies SSML templates without Azure credentials.
- An Azure Speech resource with key authentication (SPEECH_KEY) or Microsoft Entra ID (SPEECH_RESOURCE_ID).
- uv for virtual environment management.
Key authentication:
export SPEECH_KEY="your-speech-key"
export SPEECH_REGION="eastus"
Microsoft Entra ID authentication requires a custom domain on the Speech resource and the Cognitive Services Speech User role.
export SPEECH_RESOURCE_ID="/subscriptions/.../Microsoft.CognitiveServices/accounts/your-resource"
export SPEECH_REGION="eastus"
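The choice between the two authentication modes can be sketched as a small helper that inspects the environment. This is a minimal illustration, not the skill's actual code: the `SpeechAuth` type, the `resolve_speech_auth` name, and the precedence (key first when both variables are set) are assumptions made for the example.

```python
import os
from typing import Mapping, NamedTuple


class SpeechAuth(NamedTuple):
    mode: str        # "key" or "entra"
    credential: str  # value of SPEECH_KEY or SPEECH_RESOURCE_ID
    region: str      # value of SPEECH_REGION


def resolve_speech_auth(env: Mapping[str, str] = os.environ) -> SpeechAuth:
    """Pick an auth mode from environment variables.

    Assumption: SPEECH_KEY wins when both credentials are set.
    """
    region = env.get("SPEECH_REGION", "")
    if not region:
        raise RuntimeError("Set SPEECH_REGION")
    if env.get("SPEECH_KEY"):
        return SpeechAuth("key", env["SPEECH_KEY"], region)
    if env.get("SPEECH_RESOURCE_ID"):
        return SpeechAuth("entra", env["SPEECH_RESOURCE_ID"], region)
    raise RuntimeError("Set SPEECH_KEY or SPEECH_RESOURCE_ID")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching real credentials.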
Install dependencies:
# run from this skill folder
uv sync
Verify SSML templates without generating audio:
uv run scripts/generate_voiceover.py --dry-run --content-dir path/to/content
Generate voice-over WAV files:
uv run scripts/generate_voiceover.py --content-dir path/to/content --output-dir voice-over
Embed audio into a PPTX deck:
uv run scripts/embed_audio.py --input deck.pptx --audio-dir voice-over --output deck-narrated.pptx
| Parameter | Type | Default | Description |
|---|---|---|---|
| --dry-run | flag | false | Print SSML templates without generating audio |
| --voice | string | en-US-Andrew:DragonHDLatestNeural | Azure TTS voice name |
| --rate | string | +10% | Speech prosody rate |
| --content-dir | path | content | Path to slide content directory |
| --output-dir | path | voice-over | Path to WAV output directory |
| --lexicon | path | (auto-detect) | Custom acronyms.yaml path |
| --verbose / -v | flag | false | Enable verbose (DEBUG) logging output |
embed_audio.py embeds WAV files into the corresponding PPTX slides and adds narration timing XML so that PowerPoint recognizes the audio for video export via File > Export > Create a Video > Use Recorded Timings and Narrations.
| Parameter | Type | Default | Description |
|---|---|---|---|
| --input | path | (required) | Source PPTX file path |
| --audio-dir | path | voice-over | Directory with slide-NNN.wav |
| --output | path | *-narrated.pptx | Output PPTX file path |
| --verbose / -v | flag | false | Enable verbose (DEBUG) logging output |
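
The defaults in the table above can be illustrated with a short sketch. It shows one way to derive the `*-narrated.pptx` output name and to pair `slide-NNN.wav` files with slide numbers. The function names are hypothetical, and the mapping of `slide-NNN.wav` to the NNN-th slide (1-based) is an assumption; the real embed_audio.py may resolve both differently.

```python
import re
from pathlib import Path

# Matches the slide-NNN.wav naming used for generated audio.
WAV_PATTERN = re.compile(r"^slide-(\d+)\.wav$")


def default_output_path(input_path: Path) -> Path:
    """deck.pptx -> deck-narrated.pptx, next to the input file."""
    return input_path.with_name(f"{input_path.stem}-narrated{input_path.suffix}")


def audio_by_slide_number(audio_dir: Path) -> dict[int, Path]:
    """Map slide numbers (assumed 1-based) to slide-NNN.wav files in audio_dir."""
    mapping: dict[int, Path] = {}
    for wav in sorted(audio_dir.iterdir()):
        match = WAV_PATTERN.match(wav.name)
        if match:
            mapping[int(match.group(1))] = wav
    return mapping
```

Files that do not follow the `slide-NNN.wav` pattern are ignored, so stray files in the audio directory do not shift the numbering.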
Generate with custom voice and rate:
uv run scripts/generate_voiceover.py \
--content-dir content \
--output-dir voice-over \
--voice "en-US-Jenny:DragonHDLatestNeural" \
--rate "+5%"
Use a custom lexicon:
uv run scripts/generate_voiceover.py \
--content-dir content \
--lexicon custom-acronyms.yaml
Embed generated audio:
uv run scripts/embed_audio.py \
--input slide-deck/presentation.pptx \
--audio-dir voice-over \
--output slide-deck/presentation-narrated.pptx
The lexicon controls SSML <sub alias> replacements for acronyms and technical terms. Create an acronyms.yaml file:
acronyms:
HVE-Core: "H V E Core"
OWASP: "Oh wasp"
SBOM: "S Bomb"
SLSA: "Salsa"
CI/CD: "C I C D"
Lexicon resolution order:
1. --lexicon argument.
2. acronyms.yaml in the content directory.
Each slide produces an SSML document:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-Andrew:DragonHDLatestNeural">
<prosody rate="+10%">
Text with <sub alias="Oh wasp">OWASP</sub> aliases applied.
</prosody>
</voice>
</speak>
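A minimal sketch of how such a document could be assembled from narration text and a lexicon follows. It is not taken from generate_voiceover.py: the names `apply_aliases` and `build_ssml`, the whole-term matching rule, and the longest-term-first ordering are assumptions chosen for illustration. Only the Python standard library is used.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text: str, acronyms: dict[str, str]) -> str:
    """Escape text for XML and wrap known terms in <sub alias="...">."""
    if not acronyms:
        return escape(text)
    # Longest terms first so a longer term wins over an overlapping shorter one.
    terms = sorted(acronyms, key=len, reverse=True)
    # Only match whole terms: not preceded or followed by a word char or hyphen.
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(t) for t in terms) + r")(?![\w-])"
    )
    parts, last = [], 0
    for m in pattern.finditer(text):
        parts.append(escape(text[last:m.start()]))
        term = m.group(1)
        parts.append(f"<sub alias={quoteattr(acronyms[term])}>{escape(term)}</sub>")
        last = m.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


def build_ssml(
    text: str,
    acronyms: dict[str, str],
    voice: str = "en-US-Andrew:DragonHDLatestNeural",
    rate: str = "+10%",
) -> str:
    """Wrap aliased text in the speak/voice/prosody structure shown above."""
    body = apply_aliases(text, acronyms)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"\n'
        '       xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">\n'
        f"  <voice name={quoteattr(voice)}>\n"
        f"    <prosody rate={quoteattr(rate)}>\n"
        f"      {body}\n"
        "    </prosody>\n"
        "  </voice>\n"
        "</speak>"
    )
```

Escaping the narration text before inserting the `<sub>` elements keeps characters such as `&` or `<` in speaker notes from producing malformed SSML.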
This skill reads from the PowerPoint skill's content directory structure:
content/
├── slide-001/
│ └── content.yaml # Must include speaker_notes: field
├── slide-002/
│ └── content.yaml
└── ...
Each content.yaml should contain a speaker_notes: field with the narration text. The generated WAV files are named slide-NNN.wav matching the directory names.
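The directory-to-file naming described above can be sketched as a planning step that pairs each slide's content.yaml with the WAV path it would produce. This is an illustrative helper under stated assumptions: `plan_voiceover_outputs` is a hypothetical name, and the sketch only inspects directory names; it does not parse YAML or check that speaker_notes is present.

```python
import re
from pathlib import Path

# Slide directories are assumed to follow the slide-NNN naming shown above.
SLIDE_DIR = re.compile(r"^slide-\d+$")


def plan_voiceover_outputs(content_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each slide's content.yaml with the WAV path named after its directory."""
    pairs = []
    for slide_dir in sorted(content_dir.iterdir()):
        content_file = slide_dir / "content.yaml"
        if (
            slide_dir.is_dir()
            and SLIDE_DIR.match(slide_dir.name)
            and content_file.is_file()
        ):
            pairs.append((content_file, output_dir / f"{slide_dir.name}.wav"))
    return pairs
```

Listing the planned pairs before synthesis is a cheap way to spot slide directories that lack a content.yaml.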
| Issue | Solution |
|---|---|
| Set SPEECH_KEY ... or SPEECH_RESOURCE_ID | Export SPEECH_KEY (key auth) or SPEECH_RESOURCE_ID (Entra ID) with SPEECH_REGION. |
| 401 with Entra ID auth | Verify custom domain on the Speech resource and Cognitive Services Speech User role. RBAC propagation takes up to 5 minutes. |
| Empty WAV files or skipped slides | Verify speaker_notes: is present and non-empty in content.yaml. |
| Mispronounced acronyms | Add entries to acronyms.yaml with phonetic aliases. |
| azure-cognitiveservices-speech package is required | Run uv sync in the skill directory. |
| Audio icon visible in PPTX | Reposition or resize the audio object in PowerPoint after embedding. |
| Authored slide animations missing after embedding | embed_audio.py replaces existing p:timing with narration timing; re-apply animations in PowerPoint after embedding audio. |
| Slides no longer advance on click after embedding | embed_audio.py sets advClick="0" for auto-advance. To re-enable, select all slides in PowerPoint and check Advance Slide > On Mouse Click in the Transitions tab. |
| Video export shows "No timings recorded" | Re-embed audio with the updated embed_audio.py which adds narration timing XML automatically. |
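
For the empty or skipped WAV case in the table above, a quick check with Python's standard `wave` module can flag problem files before embedding. This is a sketch, not part of the skill: the function names are hypothetical, and it assumes the generated files are uncompressed PCM WAV, which is the format the `wave` module reads.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file, or 0.0 if it cannot be read."""
    try:
        with wave.open(str(path), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError):
        # Zero-byte or non-WAV files are treated as empty.
        return 0.0
    return frames / rate if rate else 0.0


def find_empty_or_missing(audio_dir: Path, expected_names: list[str]) -> list[str]:
    """Report expected WAV files that are absent or contain no audio."""
    problems = []
    for name in expected_names:
        path = audio_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif wav_duration_seconds(path) == 0.0:
            problems.append(f"{name}: empty")
    return problems
```

A file reported as missing or empty usually points back to an absent or blank speaker_notes field for that slide.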