| name | video-analyse |
| description | Analyse a video file by extracting timestamped frames and a Whisper transcript, then producing a structured analysis document. Use when the user asks to analyse a video, summarise a recording, get meeting notes from a video, or understand what happens in a video file. Triggers on phrases like "analyse this video", "video analysis", "what happens in this video", "summarise this recording", "meeting notes from this video", "video-analyse", "break down this video". |
Video Analyse
Instructions
You are analysing a video file for the user. Follow these steps precisely.
Step 1: Find the video
If the user provided a path as an argument, use that. Otherwise, look for video files in the current directory:
ls -1 *.{mp4,mov,mkv,avi,webm,m4v} 2>/dev/null
If multiple videos are found, ask the user which one. If none are found, ask for a path.
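The lookup above can also be sketched in Python, which avoids shell-specific glob behaviour. This is a minimal illustration, not part of the skill's script: `find_videos` and `VIDEO_EXTENSIONS` are hypothetical names, and unlike the `ls` pattern it matches extensions case-insensitively.

```python
from pathlib import Path

# Extensions copied from the ls pattern above.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm", ".m4v"}


def find_videos(directory="."):
    """Return video files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


if __name__ == "__main__":
    videos = find_videos()
    if not videos:
        print("No video files found; ask the user for a path.")
    elif len(videos) == 1:
        print(f"Using {videos[0]}")
    else:
        print("Multiple videos found; ask the user which one:")
        for video in videos:
            print(f"  {video}")
```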
Step 2: Check dependencies
Before running anything, verify the tools are available:
which ffmpeg; python3 -c "import whisper; print('whisper OK')"
If ffmpeg is missing: tell the user to run brew install ffmpeg (macOS) or apt install ffmpeg (Linux).
If whisper is missing: tell the user to run pip install openai-whisper.
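The same check can be sketched in Python so that both tools are always reported, whichever one is missing. `check_dependencies` is a hypothetical helper, not part of the skill. It only tests that the `whisper` module can be located, not that it imports cleanly.

```python
import importlib.util
import shutil


def check_dependencies():
    """Return install hints for missing tools (an empty list means all good)."""
    hints = []
    # shutil.which mirrors the shell's `which`: None when not on PATH.
    if shutil.which("ffmpeg") is None:
        hints.append(
            "ffmpeg is missing: brew install ffmpeg (macOS) "
            "or apt install ffmpeg (Linux)"
        )
    # find_spec locates the module without importing it.
    if importlib.util.find_spec("whisper") is None:
        hints.append("whisper is missing: pip install openai-whisper")
    return hints


if __name__ == "__main__":
    for hint in check_dependencies():
        print(hint)
```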
Step 3: Determine frame rate
The script at ~/.claude/skills/video-analyse/analyse_video.py has an auto mode built in. Unless the user specified a frame rate, pass --auto, which selects the frames per minute (FPM) from the video's duration:
| Duration | FPM | Interval | Approx frames |
|---|---|---|---|
| < 5 min | 2 | 30s | ~10 |
| 5–15 min | 1 | 60s | ~15 |
| 15–45 min | 0.5 | 2 min | ~8–22 |
| 45–90 min | 0.33 | 3 min | ~15–30 |
| 90+ min | 0.2 | 5 min | ~18–24 |
If the user asked for a specific rate, pass --fpm <value> instead.
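To make the table concrete, here is a sketch of how the auto selection could be expressed. It is not the script's actual implementation: the function names are hypothetical, and the handling of exact boundaries (for example, whether a 15-minute video counts as 5–15 or 15–45) is an assumption.

```python
def auto_fpm(duration_seconds):
    """Pick frames per minute from the duration table (boundaries assumed)."""
    minutes = duration_seconds / 60
    if minutes < 5:
        return 2.0
    if minutes < 15:
        return 1.0
    if minutes < 45:
        return 0.5
    if minutes < 90:
        return 0.33
    return 0.2


def frame_interval_seconds(fpm):
    """Seconds between extracted frames at a given FPM."""
    return 60 / fpm


def approx_frame_count(duration_seconds, fpm):
    """Rough number of frames a video of this length would yield."""
    return round(duration_seconds / 60 * fpm)


if __name__ == "__main__":
    for seconds in (120, 600, 1800, 3600, 7200):
        fpm = auto_fpm(seconds)
        print(seconds, fpm, approx_frame_count(seconds, fpm))
```

Note that 0.33 FPM works out to roughly 182 seconds between frames, so the 3-minute interval in the table is a rounded figure.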
Step 4: Run the extraction pipeline
python3 ~/.claude/skills/video-analyse/analyse_video.py "<video_path>" --auto
This creates a {video-stem}-analysis/ directory containing:
- summary.md: frame index and metadata
- transcript.md: timestamped Whisper transcript
- frames/: numbered JPG stills with timestamps
The script will take a while for long videos (transcription is the slow part). Let the user know it's running.
Step 5: Read all outputs
Once the script completes, read everything into context:
- Read {video-stem}-analysis/summary.md
- Read {video-stem}-analysis/transcript.md
- Read every frame image in {video-stem}-analysis/frames/ (use the Read tool on each .jpg; Claude can see images)
This gives you full multimodal context: what was said (transcript) and what was shown (frames).
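The reading order above can be sketched as a small helper that lists what exists in the analysis directory. `files_to_read` is a hypothetical name. The sketch assumes the frame filenames are zero-padded so that sorting by name matches chronological order, and it skips transcript.md when it is absent (for example after a --no-transcribe run).

```python
from pathlib import Path


def files_to_read(analysis_dir):
    """Return summary, transcript, then frames in filename order."""
    root = Path(analysis_dir)
    # Text outputs first; keep only the ones that were actually produced.
    texts = [root / "summary.md", root / "transcript.md"]
    found = [p for p in texts if p.is_file()]
    frames_dir = root / "frames"
    if frames_dir.is_dir():
        # Assumes zero-padded names, so name order is time order.
        found.extend(sorted(frames_dir.glob("*.jpg")))
    return found


if __name__ == "__main__":
    for path in files_to_read("example-analysis"):
        print(path)
```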
Step 6: Write the analysis
Create analysis.md in the current working directory with the following structure:
# Video Analysis: {filename}
**Duration:** {duration}
**Date analysed:** {today}
## Participants
List everyone visible or audible in the video. Note speaker identification confidence.
## Timeline Summary
For each major section/topic change, write a timestamped entry:
### [{timestamp range}] Section Title
What happened in this segment. Reference both audio (what was said) and visual (what was on screen) evidence.
> Key quote if relevant — [{timestamp}]
## Key Decisions
Bulleted list of decisions made (if applicable, e.g. for meetings).
## Action Items
- [ ] Action item with owner if identifiable — [{timestamp}]
## Visual Reference Points
Notable moments where the screen content is important:
- [{timestamp}] Description of what's shown and why it matters
## Whisper Accuracy Notes
Flag any sections where the transcript seems unreliable (garbled text, obvious misheard words, low-confidence sections).
Adapt the structure to fit the video type — a meeting needs decisions/action items, a tutorial needs step-by-step breakdowns, a presentation needs slide-by-slide coverage, etc.
Step 7: Stay in context
After writing the analysis, tell the user it's done and that you have the full transcript + frames in context. Offer to:
- Answer questions about specific moments
- Find when a particular topic was discussed
- Extract more detail from any section
- Compare what was said vs what was shown on screen
Options
- --fpm <number>: Override the auto frame rate (e.g. --fpm 2 for one frame every 30s)
- --whisper-model <size>: Use a different Whisper model (tiny, base, small, medium, large). Default is base. Use small or medium for better accuracy on difficult audio.
- --no-transcribe: Skip Whisper transcription (frames only). Useful if the video has no speech or you only care about visuals.
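As an illustration of how these options combine with the command from Step 4, the following sketch assembles the argument list. `build_command` is a hypothetical helper. It assumes that --whisper-model can be combined with either --auto or --fpm, and that a model choice is pointless alongside --no-transcribe. Neither assumption is confirmed by the script itself.

```python
from pathlib import Path

# Script location from Step 3; expanduser resolves the leading "~".
SCRIPT = Path("~/.claude/skills/video-analyse/analyse_video.py").expanduser()
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def build_command(video_path, fpm=None, whisper_model=None, no_transcribe=False):
    """Build the argv list for the extraction script."""
    cmd = ["python3", str(SCRIPT), str(video_path)]
    if fpm is None:
        cmd.append("--auto")  # no explicit rate: let the script choose
    else:
        if fpm <= 0:
            raise ValueError("fpm must be positive")
        cmd += ["--fpm", str(fpm)]
    if no_transcribe:
        cmd.append("--no-transcribe")
    elif whisper_model is not None:
        if whisper_model not in WHISPER_MODELS:
            raise ValueError(f"unknown Whisper model: {whisper_model}")
        cmd += ["--whisper-model", whisper_model]
    return cmd


if __name__ == "__main__":
    print(build_command("meeting.mp4", fpm=2, whisper_model="small"))
```

Passing the result to `subprocess.run(cmd, check=True)` as a list, rather than as a single shell string, keeps video paths that contain spaces intact.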