video-understanding

Stars: 1

Forks: 0

Updated: 2026-02-03 14:16

Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.

Source

az9713

az9713/whatsapp-claude

Video Understanding Skill

Download videos and transcribe their content for analysis.

Prerequisites

yt-dlp installed (pip install yt-dlp or brew install yt-dlp)
ffmpeg installed (brew install ffmpeg or apt install ffmpeg)
Whisper installed (pip install openai-whisper)
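
Before running the pipeline it can help to confirm that all three tools are on PATH. The following is a minimal sketch, not part of the original skill; the function name missing_tools is hypothetical.

```shell
# Print the name of each listed tool that is not found on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   missing_tools yt-dlp ffmpeg whisper || echo "install the tools listed above"
```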

Pipeline

1. Download Video

# Download video with yt-dlp
yt-dlp -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For best quality
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For audio only (faster)
yt-dlp -x --audio-format mp3 \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"
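
Because the output template uses the video title, the final file name is not known in advance. One possible way to capture it, sketched below, is to combine yt-dlp's --restrict-filenames option (ASCII-only names without spaces) with --print after_move:filepath, which prints the final path once the file is in place. The function name download_audio and the YT_DLP override variable are hypothetical and exist only to make the sketch easy to dry-run.

```shell
# Download audio only and print the path of the resulting file.
# $1 = video URL
# YT_DLP (hypothetical) can point at another binary, e.g. "echo" for a dry run.
download_audio() {
  "${YT_DLP:-yt-dlp}" -x --audio-format mp3 \
    --restrict-filenames \
    --print after_move:filepath \
    -o "assets/downloads/%(title)s.%(ext)s" "$1"
}

# Example:
#   audio=$(download_audio "<VIDEO_URL>")
```

Setting YT_DLP=echo prints the assembled arguments instead of downloading anything, which is a cheap way to check the command line.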

2. Extract Audio (if you downloaded a video file)

# Drop the video stream (-vn) and encode the audio as 128 kbit/s MP3
ffmpeg -i "assets/downloads/video.mp4" \
  -vn -c:a libmp3lame -b:a 128k \
  "assets/downloads/audio.mp3"

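The ffmpeg command in this step uses the placeholder name video.mp4, while downloaded files are named after the video title. A small sketch of deriving the audio path from whatever file was downloaded follows; the helper names audio_path_for and extract_audio are hypothetical, and -c:a and -b:a are the current spellings of ffmpeg's audio codec and audio bitrate options.

```shell
# Derive the MP3 path that sits next to a downloaded video file.
# Assumes the file name has an extension (for example .mp4 or .webm).
audio_path_for() {
  printf '%s\n' "${1%.*}.mp3"
}

# Extract the audio track of "$1" into the derived MP3 path.
extract_audio() {
  ffmpeg -i "$1" -vn -c:a libmp3lame -b:a 128k "$(audio_path_for "$1")"
}

# Example:
#   extract_audio "assets/downloads/My Talk.mp4"
```
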
3. Transcribe with Whisper

# Basic transcription
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format txt \
  --output_dir output/

# Higher quality (slower)
whisper "assets/downloads/audio.mp3" \
  --model medium \
  --output_format all \
  --output_dir output/

# With timestamps
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format srt \
  --output_dir output/
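
Whisper names each output file after the audio file's base name, so the transcript location can be predicted for the next step. The helper below is a sketch under that assumption; transcript_path is a hypothetical name, and with --output_format all one file per format is written, so call it once per extension.

```shell
# Print the path Whisper is expected to write for a given audio file.
# $1 = audio file, $2 = --output_dir value, $3 = output format (txt, srt, vtt, json)
transcript_path() {
  name=$(basename "$1")                 # e.g. audio.mp3
  printf '%s/%s.%s\n' "${2%/}" "${name%.*}" "$3"
}

# Example:
#   cat "$(transcript_path assets/downloads/audio.mp3 output/ txt)"
```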

4. Read and Analyze Transcript

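Read the generated transcript file from output/
Summarize the key points
Extract quotes with their timestamps (use the srt or vtt output)
Identify speakers if there are several (Whisper does not label speakers, so infer them from context)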
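
For pulling quotes with timestamps, it can be convenient to flatten an SRT file into one line per subtitle. The following is one possible sketch using awk in paragraph mode; srt_to_lines is a hypothetical helper, and it assumes the usual SRT layout of an index line, a timing line, then the text, with blank lines between entries.

```shell
# Convert SRT input (file argument or stdin) into "[HH:MM:SS] text" lines.
srt_to_lines() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       NF >= 3 {
         split($2, t, " --> ")          # t[1] is the start time, e.g. 00:00:04,000
         text = $3
         for (i = 4; i <= NF; i++) text = text " " $i   # join wrapped text lines
         printf "[%s] %s\n", substr(t[1], 1, 8), text
       }' "$@"
}

# Example:
#   srt_to_lines output/audio.srt | grep -i "keyword"
```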

Model Options

Model	Parameters	Speed	Quality
tiny	39M	Fastest	Lower
base	74M	Fast	Good
small	244M	Medium	Better
medium	769M	Slow	High
large	1550M	Slowest	Highest

Output Formats

txt - Plain text transcript
srt - SubRip subtitles with timestamps
vtt - WebVTT subtitles
json - Detailed JSON with segment-level timing (add --word_timestamps True for word-level timing)
all - All formats

Tips

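Use the base model for speed and medium for accuracy
Add --language en to skip language detection and treat the audio as English
Use --task translate to translate non-English speech into English
Check assets/downloads/ for downloaded files
Store transcripts in output/transcripts/ (pass --output_dir output/transcripts/ to write them there directly)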
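
These tips can be folded into a small wrapper. The sketch below is illustrative only: pick_model and transcribe are hypothetical names, and the "fast" and "accurate" labels are an assumption layered on the base/medium recommendation.

```shell
# Map a speed/accuracy preference to a Whisper model name.
pick_model() {
  case "$1" in
    fast)     echo base ;;
    accurate) echo medium ;;
    *)        echo "unknown preference: $1" >&2; return 1 ;;
  esac
}

# transcribe AUDIO PREFERENCE [LANGUAGE]
# Writes an SRT transcript into output/transcripts/.
transcribe() {
  audio=$1
  model=$(pick_model "$2") || return 1
  lang=${3:-}
  set -- "$audio" --model "$model" --output_format srt --output_dir output/transcripts/
  if [ -n "$lang" ]; then
    set -- "$@" --language "$lang"     # skip language detection when given
  fi
  whisper "$@"
}

# Example:
#   transcribe assets/downloads/audio.mp3 accurate en
```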

Other Skills in This Repository (az9713/whatsapp-claude)

Skill	Description	Updated	Stars
avatar-video	Generate lip-synced avatar video from text using OmniHuman v1.5. Use when creating talking-head or avatar videos.	2026-02-03	1
gmail	Send and read emails via Gmail browser automation. Use when asked to send email or check inbox.	2026-02-03	1
schedule-job	Schedule tasks using natural language time expressions. Use when asked to schedule a recurring or timed task.	2026-02-03	1
tts	Generate voice-over audio using OpenAI TTS. Use when creating narration or voice for videos.	2026-02-03	1
video-render	Render videos using Remotion compositions. Use when creating or generating videos.	2026-02-03	1
video-research	Research topics for video content creation. Use when researching ideas for videos.	2026-02-03	1

name	video-understanding
description	Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.
allowed-tools	["Bash","Read","Write"]
