
hyperframes-media

Name: Hyperframes Media
Author: OneWave-AI



updated: May 22, 2026, 03:17

SKILL.md


name	hyperframes-media
description	Asset preprocessing for HyperFrames compositions — text-to-speech narration (Kokoro), audio/video transcription (Whisper), and background removal for transparent overlays. Use when a HyperFrames project needs a voiceover, captions/subtitles from existing audio, or a clean cutout from a photo/video for use as an overlay.

HyperFrames Media

Three preprocessing pipelines that turn raw inputs into HyperFrames-ready assets:

Narration — text to spoken audio (Kokoro TTS, runs locally)
Transcription — audio/video to timestamped captions (Whisper)
Cutouts — remove backgrounds from images for transparent overlays

Use this skill when the user wants any of these as input to a HyperFrames composition. For standalone TTS or transcription that won't end up in a video, suggest a simpler approach.

When to use

"Add a voiceover that says ..."
"Generate narration for this script"
"Caption this audio / video"
"Turn this podcast into a captioned video"
"Remove the background from this photo so I can overlay it"
"Make this product shot transparent"

Narration (Kokoro TTS)

Local, fast, no API key needed. Good defaults:

Voice: Kokoro ships several. Default to a neutral American voice for product/marketing; pick a warmer voice for storytelling.
Speed: 1.0x for most cases. Slow to 0.9x when pronunciation matters (technical demos); speed up to 1.1x for energetic shorts.
Output: WAV at 24kHz, then convert to MP3 if size matters.
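The MP3 step can be scripted. A minimal sketch, assuming ffmpeg is installed and built with the libmp3lame encoder; the function names and the default VBR quality of 2 are illustrative choices, not part of this skill.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def mp3_command(wav: str, mp3: str, vbr_quality: int = 2) -> list[str]:
    """Build an ffmpeg command that encodes a WAV voiceover as VBR MP3.

    -q:a picks LAME's VBR quality: 0 is best/largest, 9 is smallest.
    """
    if not 0 <= vbr_quality <= 9:
        raise ValueError("vbr_quality must be between 0 and 9")
    return [
        "ffmpeg", "-y",            # overwrite an existing output file
        "-i", wav,
        "-codec:a", "libmp3lame",  # LAME MP3 encoder
        "-q:a", str(vbr_quality),
        mp3,
    ]


def convert(wav: str, mp3: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    Path(mp3).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(mp3_command(wav, mp3), check=True)
```

For example, `convert("assets/raw/voiceover.wav", "assets/audio/voiceover.mp3")` (hypothetical paths) keeps the WAV as the regenerable source and writes the MP3 the composition references.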

Drop the audio file into the HyperFrames project's assets/audio/ and reference it from a layer with data-audio="./assets/audio/voiceover.mp3". The composition's data-duration should match (or exceed) the audio length.
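Because data-duration has to cover the whole voiceover, it helps to measure the audio before converting it. A minimal sketch using only Python's standard wave module, which reads WAV but not MP3, so it measures the Kokoro WAV output. It assumes data-duration is expressed in milliseconds like the data-start/data-end values in this skill's examples (worth confirming in the HyperFrames docs); the helper names are hypothetical.

```python
from __future__ import annotations

import wave


def wav_duration_ms(path: str) -> int:
    """Length of a WAV file in milliseconds, rounded up so no audio is cut off."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return -(-frames * 1000 // rate)  # ceiling division on integers


def check_duration(composition_ms: int, audio_ms: int) -> None:
    """Raise if the composition would end before the voiceover does."""
    if composition_ms < audio_ms:
        raise ValueError(
            f"data-duration {composition_ms} ms is shorter than the audio ({audio_ms} ms)"
        )
```

Rounding up matters here: a duration that is one millisecond short would clip the last syllable.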

Transcription (Whisper)

For captions/subtitles, use Whisper to produce a timestamped transcript.

Model size: base is fast and good enough for most accents. Step up to small or medium only if base makes noticeable errors.
Output: ask for SRT (line-level cues) or JSON with per-word timestamps. Word-level timestamps unlock animated/karaoke-style captions.
Format for HyperFrames: convert the timestamps to milliseconds and emit caption layers with data-start / data-end per line (or per word for kinetic captions).
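The conversion to milliseconds and caption layers can be sketched as follows, assuming openai-whisper's JSON layout: a `segments` list whose entries carry `start`, `end` and `text` in seconds, plus a `words` list of `word`/`start`/`end` entries when word timestamps were enabled. The `caption` class and the plain `<div>` wrapper are hypothetical; only `data-start`/`data-end` come from this skill.

```python
from __future__ import annotations

import html


def to_ms(seconds: float) -> int:
    """Whisper reports float seconds; these caption layers use integer ms."""
    return int(round(seconds * 1000))


def srt_time_to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as '00:01:02,345' to milliseconds."""
    hms, millis = stamp.strip().split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def caption_layers(result: dict, per_word: bool = False) -> list[str]:
    """Emit one caption <div> per segment (line) or, for kinetic captions, per word."""
    layers = []
    for segment in result["segments"]:
        items = segment.get("words", []) if per_word else [segment]
        for item in items:
            text = (item["word"] if per_word else item["text"]).strip()
            if not text:
                continue  # skip empty tokens
            layers.append(
                f'<div class="caption" data-start="{to_ms(item["start"])}" '
                f'data-end="{to_ms(item["end"])}">{html.escape(text)}</div>'
            )
    return layers
```

Per-line layers suit standard subtitles; `per_word=True` produces one layer per word. `srt_time_to_ms` covers the case where only an SRT file is available.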

If the user has source audio plus a script, prefer script-guided transcription (Whisper with --initial_prompt to bias it toward the known script) over raw transcription. The result is cleaner.
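With the openai-whisper CLI, this means passing the script text as `--initial_prompt` (the Python API accepts the same value as `transcribe(..., initial_prompt=...)`). A sketch that only builds the command, assuming the `whisper` CLI is installed; the function name and the output directory are hypothetical.

```python
from __future__ import annotations


def whisper_command(
    audio: str,
    script: str | None = None,
    model: str = "base",
    out_dir: str = "assets/captions",
) -> list[str]:
    """Build an openai-whisper CLI call that writes JSON with word timestamps."""
    cmd = [
        "whisper", audio,
        "--model", model,
        "--output_format", "json",
        "--word_timestamps", "True",
        "--output_dir", out_dir,
    ]
    if script:
        # Bias decoding toward the known wording. Whisper keeps only the last
        # part of a very long prompt, so long scripts lose their opening lines.
        cmd += ["--initial_prompt", " ".join(script.split())]
    return cmd
```

Run it with `subprocess.run(whisper_command(...), check=True)`; the JSON is written to `out_dir`, named after the audio file.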

Background removal (cutouts)

For transparent overlays — speaker headshots floating over a colored background, product cutouts, mascot characters, etc.

Library: rembg (uses U2-Net or similar) is the standard. Output a PNG with alpha.
For video: process every nth frame, then chain with ffmpeg to reassemble. Slower but works.
Quality check: zoom in on hair edges and semi-transparent areas such as glass; that is where rembg struggles. If quality matters more than speed, suggest the user re-shoot against a green screen.
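The video path can be sketched as three commands. This assumes ffmpeg (with libvpx-vp9) and the rembg CLI are installed, that `rembg p` keeps the input file names, and that the HyperFrames renderer can play WebM with an alpha channel, which is browser-dependent. The function name and directory names are hypothetical.

```python
from __future__ import annotations

from fractions import Fraction


def cutout_commands(
    src: str,
    out: str,
    src_fps: Fraction | int,
    every_nth: int,
    frames_dir: str = "assets/raw/frames",
    cut_dir: str = "assets/raw/cutouts",
) -> list[list[str]]:
    """Return three commands: sample frames, remove backgrounds, re-encode with alpha."""
    if every_nth < 1:
        raise ValueError("every_nth must be at least 1")
    rate = Fraction(src_fps) / every_nth
    rate_str = f"{rate.numerator}/{rate.denominator}"
    return [
        # 1. Resample to src_fps / n; for constant-frame-rate video this keeps
        #    roughly every nth frame, written as numbered PNGs.
        ["ffmpeg", "-i", src, "-vf", f"fps={rate_str}", f"{frames_dir}/%05d.png"],
        # 2. Batch background removal over the whole folder.
        ["rembg", "p", frames_dir, cut_dir],
        # 3. Reassemble at the reduced rate; VP9 in WebM can carry alpha (yuva420p).
        ["ffmpeg", "-framerate", rate_str, "-i", f"{cut_dir}/%05d.png",
         "-c:v", "libvpx-vp9", "-pix_fmt", "yuva420p", out],
    ]
```

Create both frame directories before running the commands. Keeping every nth frame lowers the output frame rate by the same factor, which is the speed/quality trade-off described above.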

Drop the alpha PNG into assets/images/ and place it as a layer:

<img src="./assets/images/host.png"
     class="absolute right-12 bottom-12 w-64"
     data-start="0" data-end="6000" />

Tying it together

A common flow for a 30-second product demo:

User writes a 30s script
hyperframes-media → Kokoro generates the voiceover (~28s of audio)
Whisper transcribes the voiceover with word-level timestamps
website-to-hyperframes captures the product page at the right moments
HyperFrames composition layers: site capture (background), captions (timed to voiceover), cutout of the founder in the corner (start → 3s and end - 3s → end)
npx hyperframes render produces a captioned, narrated demo
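The founder cutout timing (the first 3 seconds and the last 3 seconds) reduces to a little arithmetic on the composition length. A sketch, assuming millisecond data-start/data-end values as in the overlay example above and that repeating the same image as two layers is acceptable in HyperFrames; the function names are hypothetical.

```python
from __future__ import annotations


def cutout_windows(total_ms: int, hold_ms: int = 3000) -> list[tuple[int, int]]:
    """Show the cutout for the first and last `hold_ms` of the composition."""
    if total_ms <= 2 * hold_ms:
        return [(0, total_ms)]  # windows would touch or overlap: show it throughout
    return [(0, hold_ms), (total_ms - hold_ms, total_ms)]


def img_layers(src: str, total_ms: int) -> str:
    """One <img> per window, styled like the host overlay example in this skill."""
    return "\n".join(
        f'<img src="{src}" class="absolute right-12 bottom-12 w-64" '
        f'data-start="{start}" data-end="{end}" />'
        for start, end in cutout_windows(total_ms)
    )
```

For the 30-second demo, `img_layers("./assets/images/founder.png", 30000)` yields one layer from 0 to 3000 ms and one from 27000 to 30000 ms.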

Notes

All three pipelines run locally. No API keys, no data leaving the machine — useful for client work under NDA.
Keep raw inputs in assets/raw/ and processed outputs in assets/ so the project can be regenerated from its sources.
If a tool isn't installed yet, run the install command and proceed — don't ask the user to leave the conversation to set things up.
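A quick check for missing tools tells the agent what to install before running a pipeline. A sketch assuming the ffmpeg binary plus the Python import names `kokoro`, `whisper` (from openai-whisper) and `rembg`; adjust the lists to whatever the project actually uses.

```python
from __future__ import annotations

import importlib.util
import shutil
from typing import Callable, Optional

CLI_TOOLS = ("ffmpeg",)
PY_PACKAGES = ("kokoro", "whisper", "rembg")


def missing_tools(
    clis: tuple[str, ...] = CLI_TOOLS,
    packages: tuple[str, ...] = PY_PACKAGES,
    which: Callable[[str], Optional[str]] = shutil.which,
    find_spec: Callable = importlib.util.find_spec,
) -> list[str]:
    """Return every CLI or Python package that is not available yet."""
    missing = [tool for tool in clis if which(tool) is None]         # not on PATH
    missing += [pkg for pkg in packages if find_spec(pkg) is None]   # not importable
    return missing
```

If the list is non-empty, install what is missing and carry on in the same conversation.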

related-skills.json

same repository

hyperframes-cli.md

from "OneWave-AI/Crest"

HyperFrames CLI dev loop — use `npx hyperframes` for scaffolding (init), validation (lint, inspect), preview, render, and environment troubleshooting (doctor, browser, info, upgrade). Use when working in or alongside a HyperFrames project and the user asks to scaffold, preview, render, lint, install a registry block, or diagnose a broken environment.

2026-05-22

website-to-hyperframes.md

from "OneWave-AI/Crest"

Capture a website and create a HyperFrames video from it. Use whenever the user (1) provides a URL and wants a video, (2) says "capture this site", "turn this into a video", "make a video tour of this page", or (3) wants a scrolling product walkthrough, marketing reel, or before/after visual built from a real site.

2026-05-22

hyperframes.md

from "OneWave-AI/Crest"

Create video compositions, animations, title cards, overlays, captions, voiceovers, audio-reactive visuals, and scene transitions. Use whenever the user asks to build a video, motion graphic, animated explainer, intro, outro, title card, or convert a website / podcast / talk into a video. HyperFrames is HTML-based — fast iteration, real rendering.

2026-05-22

accessibility-auditor.md

from "OneWave-AI/Crest"

Audit websites for accessibility issues and WCAG compliance. Use when checking accessibility, fixing a11y issues, or ensuring WCAG compliance.

2026-01-09

api-endpoint-scaffolder.md

from "OneWave-AI/Crest"

Generate REST API endpoints with proper structure, validation, error handling, and types. Use when creating new API routes, endpoints, or backend services.

2026-01-09

css-animation-creator.md

from "OneWave-AI/Crest"

Create professional CSS animations, transitions, micro-interactions, and complex motion design. Use when adding animations, hover effects, loading states, page transitions, scroll animations, or any motion design work.

2026-01-09

package.json

"author": "OneWave-AI"

"repository": "OneWave-AI/Crest"


