name

video-short-maker

description

Create short clips from local video files using ffmpeg, with optional local Whisper subtitles. Use when Codex is asked to turn a long local video into a short, trim boring pauses, remove dead air, cut silence, create a 30-60 second clip, export a vertical 9:16 MP4, generate captions/subtitles, burn subtitles into a video, or generate a simple edit report. Base editing is ffmpeg-based; subtitles use local whisper.cpp and do not require an API key.

Video Short Maker

Overview

Use this skill to turn a local video into a shorter MP4 by detecting silence and keeping the strongest non-silent parts. The base workflow depends only on ffmpeg and ffprobe. Captions are optional and use local whisper.cpp.

Requirements

Check dependencies first:

ffmpeg -version
ffprobe -version

If missing on macOS, tell the user:

brew install ffmpeg

For subtitles, set up local Whisper once:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/setup_captions.py

This installs whisper.cpp with Homebrew when needed, installs Pillow for burned-in captions when needed, and downloads the default tiny.en model to ~/.cache/video-short-maker/models/.

Workflow

Identify the input video path and output goal.
- Default target duration: 45s.
- Default format: landscape/same aspect ratio.
- Default quality: high for crisp text in screen recordings.
- Use vertical 9:16 only when the user asks for TikTok/Reels/Shorts/vertical.
- Use captions only when the user asks for subtitles/captions.
Run the bundled script:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/make_short.py INPUT_VIDEO --duration 45

For vertical output:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/make_short.py INPUT_VIDEO --duration 45 --vertical

For a custom output path:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/make_short.py INPUT_VIDEO --duration 45 --output ./short.mp4

For smaller files:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/make_short.py INPUT_VIDEO --duration 45 --quality small

For local captions:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/make_short.py INPUT_VIDEO --duration 45 --vertical --captions

Report the result succinctly:
- output file path
- duration target
- number of segments kept
- report path
- whether vertical crop was applied
- quality preset
- subtitle and captioned video paths when captions were used

Quality Presets

high: CRF 18, slower encode, sharper text, larger file. Default.
balanced: CRF 20, faster encode, good quality/size balance.
small: CRF 24, smaller file, visibly more compression.

Captions

Captions are optional and local. They do not require an API key.

When the user asks for subtitles/captions:

Check whether whisper-cli or whisper-cpp exists.
If missing, run or suggest:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/setup_captions.py

Run make_short.py with --captions.

Caption outputs:

*.srt: generated subtitle file
*.captioned.mp4: final video with burned-in subtitles

Burned-in captions are rendered with Python/Pillow so they work even when the local ffmpeg build does not include the subtitles filter.

Default caption model: tiny.en.

For Italian or non-English speech, use the multilingual model and language:

python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/setup_captions.py --model base
python3 /Users/francescomistero/.codex/skills/video-short-maker/scripts/make_short.py INPUT_VIDEO --captions --caption-model ~/.cache/video-short-maker/models/ggml-base.bin --caption-language it

Editing Behavior

The script:

detects silent ranges with ffmpeg silencedetect
converts them into non-silent candidate segments
supports --cut-style light|normal|aggressive
keeps 0.5s of silence on both sides of each cut by default in normal, so speech feels less abrupt
drops very short fragments
keeps earlier non-silent segments until the target duration is reached
concatenates selected segments
optionally crops/scales to 9:16
exports H.264 MP4 with AAC audio
optionally extracts audio, transcribes with local Whisper, writes SRT, and burns captions
writes an edit report JSON next to the output

This is not content-aware yet. It removes dead air and compresses the video, but it does not understand the spoken content.

Output Guidance

Keep responses short. Do not paste full ffmpeg logs unless the command fails.

Use this format:

Created short:
`/path/to/output.short.mp4`

- Target: 45s
- Kept segments: 6
- Vertical: yes/no
- Quality: high/balanced/small
- Report: `/path/to/output.report.json`
- Subtitles: `/path/to/output.srt` if used
- Captioned video: `/path/to/output.captioned.mp4` if used

If the result is rough, say so plainly and suggest the next useful option:

use --cut-style aggressive if it barely cut anything
use --cut-style light if cuts feel too jumpy
increase --keep-silence to make cuts feel softer
increase --duration if context is missing
use --vertical for Shorts/Reels/TikTok
use --quality balanced or --quality small if the file is too large
use --captions when subtitles are needed