
video-understanding

Stars: 1

Forks: 0

Updated: February 3, 2026, 14:16

Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.

Installation

Install with Codex or Claude: copy this prompt, paste it into Codex, Claude, or another assistant, and let it check the skill page and install the skill for you.


Source

az9713

az9713/whatsapp-claude


Related occupations (SOC)

Based on the Standard Occupational Classification (SOC)

Software Developers · Computer and Mathematical Occupations · SOC 15-1252

SKILL.md


name: video-understanding
description: Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.
allowed-tools: ["Bash", "Read", "Write"]

Video Understanding Skill

Download videos and transcribe their content for analysis.

Prerequisites

yt-dlp installed (pip install yt-dlp or brew install yt-dlp)
ffmpeg installed (brew install ffmpeg or apt install ffmpeg)
Whisper installed (pip install openai-whisper)
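A quick way to confirm these prerequisites before running the pipeline is to look each command up on PATH. This is a minimal bash sketch; the function name missing_tools is hypothetical, and it only checks that the commands exist, not which versions are installed.

```bash
#!/usr/bin/env bash
# missing_tools [NAME...]   (hypothetical helper)
# Prints each command that is not found on PATH, one per line.
# With no arguments it checks the three tools this skill needs.
missing_tools() {
  if [ "$#" -eq 0 ]; then
    set -- yt-dlp ffmpeg whisper
  fi
  local tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a runnable command
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running missing_tools with no arguments prints nothing when yt-dlp, ffmpeg, and whisper are all available.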

Pipeline

1. Download Video

# Download video with yt-dlp
yt-dlp -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For best quality
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For audio only (faster)
yt-dlp -x --audio-format mp3 \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"
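The three invocations above differ only in their format options, so they can be wrapped in one function. This is a sketch: download_video, its best/audio mode names, and the DRY_RUN variable are hypothetical; the yt-dlp flags are exactly the ones shown above.

```bash
#!/usr/bin/env bash
# download_video URL [default|best|audio]   (hypothetical helper)
# Runs one of the three yt-dlp invocations above; all save to assets/downloads/.
# With DRY_RUN=1 the command is printed (unquoted, for display) instead of run.
download_video() {
  local url="$1" mode="${2:-default}"
  local -a cmd=(yt-dlp)
  case "$mode" in
    default) ;;                                   # let yt-dlp pick the format
    best)    cmd+=(-f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best") ;;
    audio)   cmd+=(-x --audio-format mp3) ;;      # audio only, converted to MP3
    *)       echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  cmd+=(-o "assets/downloads/%(title)s.%(ext)s" "$url")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, download_video "<VIDEO_URL>" audio runs the audio-only variant, while DRY_RUN=1 download_video "<VIDEO_URL>" best only prints the best-quality command.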

2. Extract Audio (only if you downloaded the full video)

ffmpeg -i "assets/downloads/video.mp4" \
  -vn -acodec mp3 -ab 128k \
  "assets/downloads/audio.mp3"

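The ffmpeg command above hard-codes video.mp4, while the download step names files after the video title. A minimal sketch that derives the MP3 name from whatever file was downloaded; extract_audio and DRY_RUN are hypothetical names, and the input is assumed to have a file extension.

```bash
#!/usr/bin/env bash
# extract_audio VIDEO   (hypothetical helper)
# Drops the video stream (-vn) and encodes the audio as 128 kbit/s MP3 next to
# the input, e.g. "Talk.mp4" -> "Talk.mp3". Assumes the name has an extension.
# With DRY_RUN=1 the ffmpeg command is printed (unquoted) instead of executed.
extract_audio() {
  local video="$1"
  local audio="${video%.*}.mp3"   # replace the last extension with .mp3
  local -a cmd=(ffmpeg -i "$video" -vn -acodec mp3 -ab 128k "$audio")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    # on success, print the MP3 path so it can be passed to whisper
    "${cmd[@]}" && printf '%s\n' "$audio"
  fi
}
```

Because Whisper decodes its input with ffmpeg, it can also be pointed at the video file directly; extracting an MP3 first mainly produces a smaller file to keep around.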
3. Transcribe with Whisper

# Basic transcription
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format txt \
  --output_dir output/

# Higher quality (slower)
whisper "assets/downloads/audio.mp3" \
  --model medium \
  --output_format all \
  --output_dir output/

# With timestamps
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format srt \
  --output_dir output/
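Whisper names each output file after its input file, with the extension replaced by the output format, so audio.mp3 transcribed with --output_format srt --output_dir output/ becomes output/audio.srt. The hypothetical helper below computes that path, which is handy when reading the transcript in the next step; it mirrors Whisper's naming rule but does not call Whisper.

```bash
#!/usr/bin/env bash
# transcript_path INPUT FORMAT OUTPUT_DIR   (hypothetical helper)
# Prints OUTPUT_DIR/<input file name without its extension>.<FORMAT>,
# matching how the whisper CLI names the files it writes.
transcript_path() {
  local input="$1" format="$2" outdir="${3%/}"  # drop one trailing slash
  local name="${input##*/}"                      # file name without directories
  printf '%s/%s.%s\n' "$outdir" "${name%.*}" "$format"
}
```

For example: whisper "$audio" --model base --output_format srt --output_dir output/ && cat "$(transcript_path "$audio" srt output/)".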

4. Read and Analyze Transcript

Read the generated transcript file from output/
Summarize key points
Extract quotes and timestamps
Identify speakers, if there are several (Whisper does not label speakers, so infer them from context)
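For pulling out quotes with timestamps, the SRT output is the easiest to work with: each cue is an index line, a "start --> end" timing line, the text, and a blank line. A minimal awk-based sketch, assuming that layout; srt_quotes is a hypothetical name, and it keeps only the HH:MM:SS part of each start time.

```bash
#!/usr/bin/env bash
# srt_quotes FILE.srt   (hypothetical helper)
# Prints each subtitle text line prefixed with its cue start time, for example
# "[00:01:02] Second line.". Assumes standard SRT blocks separated by blank lines.
srt_quotes() {
  awk '
    { sub(/\r$/, "") }                   # tolerate CRLF line endings
    NF == 0 { state = 0; next }          # blank line ends the current cue
    state == 0 { state = 1; next }       # first line of a cue: its index
    state == 1 { ts = substr($1, 1, 8); state = 2; next }  # timing line: HH:MM:SS of the start
    { print "[" ts "] " $0 }             # any further lines: cue text
  ' "$1"
}
```

Tracking the position inside each cue, rather than skipping every all-digit line, keeps spoken text such as "42" from being mistaken for a cue index.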

Model Options

Model	Parameters	Speed	Quality
tiny	39M	Fastest	Lower
base	74M	Fast	Good
small	244M	Medium	Better
medium	769M	Slow	High
large	1550M	Slowest	Highest

Output Formats

txt - Plain text transcript
srt - SubRip subtitles with timestamps
vtt - WebVTT subtitles
json - Detailed JSON with segment-level start/end times (add --word_timestamps True for word-level timing)
all - All formats
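The JSON output holds the full text plus a "segments" list in which each segment carries start and end times in seconds and its text. A small sketch that prints one tab-separated line per segment; json_segments is a hypothetical name, and it relies on python3, which is already present because Whisper is a Python package.

```bash
#!/usr/bin/env bash
# json_segments FILE.json   (hypothetical helper)
# Prints "start<TAB>end<TAB>text" for every segment in a Whisper JSON transcript,
# with times in seconds rounded to two decimals.
json_segments() {
  python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1], encoding="utf-8") as f:
    data = json.load(f)
for seg in data["segments"]:
    # Whisper segment text usually starts with a space, so strip it
    print(f'{seg["start"]:.2f}\t{seg["end"]:.2f}\t{seg["text"].strip()}')
PY
}
```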

Tips

Use the base model for speed and the medium model for accuracy
Add --language en to skip language detection and force English
Use --task translate to translate non-English speech into English
Check assets/downloads/ for downloaded files
Store transcripts in output/transcripts/ by passing --output_dir output/transcripts/

More from this repository

avatar-video

az9713/whatsapp-claude

Generate lip-synced avatar video from text using OmniHuman v1.5. Use when creating talking-head or avatar videos.

2026-02-03 · 1 star

gmail

az9713/whatsapp-claude

Send and read emails via Gmail browser automation. Use when asked to send email or check inbox.

2026-02-03 · 1 star

schedule-job

az9713/whatsapp-claude

Schedule tasks using natural language time expressions. Use when asked to schedule a recurring or timed task.

2026-02-03 · 1 star

tts

az9713/whatsapp-claude

Generate voice-over audio using OpenAI TTS. Use when creating narration or voice for videos.

2026-02-03 · 1 star

video-render

az9713/whatsapp-claude

Render videos using Remotion compositions. Use when creating or generating videos.

2026-02-03 · 1 star

video-research

az9713/whatsapp-claude

Research topics for video content creation. Use when researching ideas for videos.

2026-02-03 · 1 star
