| name | caption-and-emphasis |
| description | Design captions, keywords, labels, and emphasis text for the video pipeline. Use when deciding what transcript words should appear on screen, where captions should sit, how they avoid presenter/content collisions, and when to prefer no caption. |
Caption And Emphasis
Overview
Treat captions as editorial emphasis, not transcript dumping. The visual should feel guided, not subtitled by default.
Rules
- Show text only when it improves comprehension, memory, or orientation.
- Treat burned-in subtitles as signaling, not accessibility captions. Keep full accessibility captions in a separate caption track when needed.
- Budget emphasis captions before writing them: default to zero, then add at most one caption per idea beat or roughly every 8-12 seconds. Back-to-back captions are allowed only for an explicit contrast beat.
- Every caption must pass a reason test: memory hook, section turn, contrast label, or disambiguation of a visual object. Delete captions that only repeat narration.
- Do not duplicate a widget, object label, title, or callout. If the same words already appear near the relevant object, remove the bottom subtitle.
- Prefer short keywords, labels, and callouts.
- Do not cover the presenter's face, important hand motion, or the content focal area.
- Do not place captions inside a reserved presenter dock rectangle.
- Avoid changing type style across shots.
- Remove captions during dense generated clips unless one memory phrase is essential.
- For direct presenter sections, prefer side widgets for orientation and use captions only for a thesis or turn that needs to be remembered.
- For content sections, prefer labels attached to objects over bottom subtitles.
- When a shot uses a creator effect, state the caption budget explicitly as none, keyword, object label, callout, or lower third; include the duplication check and safe zones in the shot brief.
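Several of the rules above (the reason test, the duplication check, the reserved presenter dock, and the pacing budget) can be checked mechanically before a caption reaches the edit. The following is a minimal sketch, not part of the pipeline: `Rect`, `Caption`, `check_captions`, the normalized 0-1 frame coordinates, and the 8-second minimum gap (the lower end of the 8-12 second guideline) are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical names throughout; coordinates are assumed to be
# normalized to the frame (0.0-1.0), origin at the top-left.

@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Rect") -> bool:
        # Axis-aligned rectangles overlap when they intersect on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


@dataclass(frozen=True)
class Caption:
    text: str
    start: float                 # seconds from the start of the shot
    box: Rect                    # where the caption is drawn
    reason: str                  # must be one of VALID_REASONS
    contrast_beat: bool = False  # explicit contrast beats may run back-to-back


VALID_REASONS = {"memory hook", "section turn", "contrast label", "disambiguation"}
MIN_GAP_SECONDS = 8.0  # assumption: lower end of the 8-12 s guideline


def check_captions(captions, reserved_zones, existing_labels=()):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    labels = {label.strip().lower() for label in existing_labels}
    previous = None
    for cap in sorted(captions, key=lambda c: c.start):
        if cap.reason not in VALID_REASONS:
            problems.append(f"{cap.text!r}: fails the reason test")
        if cap.text.strip().lower() in labels:
            problems.append(f"{cap.text!r}: duplicates an on-screen label")
        if any(cap.box.overlaps(zone) for zone in reserved_zones):
            problems.append(f"{cap.text!r}: overlaps a reserved zone")
        too_close = previous is not None and cap.start - previous.start < MIN_GAP_SECONDS
        if too_close and not cap.contrast_beat:
            problems.append(f"{cap.text!r}: exceeds the caption budget")
        previous = cap
    return problems
```

Pass the presenter dock rectangle and any other safe zones as `reserved_zones`, and the text of widgets, titles, and object labels already on screen as `existing_labels`. The check only flags candidates; whether a caption earns its place is still an editorial call.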
Research Basis
- Mayer and Fiorella's multimedia-learning work supports signaling cues but warns against redundant on-screen text that repeats narration already accompanied by graphics.
- WCAG captions are a separate accessibility requirement: they should fully represent speech and meaningful audio, while this skill controls editorial text burned into the video.
Output
Return:
- displayed text
- timing
- placement
- relationship to object or presenter
- animation style
- collision notes
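One possible shape for a returned caption, sketched as a record with one field per item in the list above. The field names, the `(start, end)` timing tuple in seconds, and the `to_record` helper are assumptions for illustration, not a format the pipeline prescribes.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CaptionSpec:
    # Hypothetical field names, one per item in the output list.
    displayed_text: str
    timing: tuple[float, float]  # assumed (start, end) in seconds
    placement: str               # e.g. "upper left, outside presenter dock"
    relationship: str            # relationship to object or presenter
    animation_style: str
    collision_notes: str


def to_record(spec: CaptionSpec) -> dict:
    """Convert a spec to a plain dict that can be serialized to JSON."""
    return asdict(spec)


example = CaptionSpec(
    displayed_text="Cache first",
    timing=(12.0, 15.5),
    placement="upper left, outside presenter dock",
    relationship="labels the cache box in the diagram",
    animation_style="fade in, hold, fade out",
    collision_notes="clear of presenter face and hands",
)

if __name__ == "__main__":
    print(json.dumps(to_record(example), indent=2))
```

A "no caption" decision can be represented by returning no record at all for the shot, which keeps the default-to-zero budget visible in the output.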
Read references/caption-patterns.md for pattern choices.