video-understanding

Stars: 1

Forks: 0

Updated: February 3, 2026, 14:16

Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.


Source

az9713

az9713/whatsapp-claude


Video Understanding Skill

Download videos and transcribe their content for analysis.

Prerequisites

yt-dlp installed (pip install yt-dlp or brew install yt-dlp)
ffmpeg installed (brew install ffmpeg or apt install ffmpeg)
Whisper installed (pip install openai-whisper)
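
Before running the pipeline it can help to confirm that all three tools are on the PATH. The following is a minimal sketch, not part of the Skill itself; the function name missing_tools is hypothetical, and it relies only on the standard command -v lookup.

```shell
# Hypothetical helper (not part of the Skill): print every tool from the
# argument list that cannot be found on the PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds when the tool resolves to something runnable
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling missing_tools yt-dlp ffmpeg whisper prints nothing when everything is installed, and otherwise lists the names that still need to be installed.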

Pipeline

1. Download Video

# Download video with yt-dlp
yt-dlp -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For best quality
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For audio only (faster)
yt-dlp -x --audio-format mp3 \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"
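
The three variants above differ only in a few flags, so they can be wrapped in one function. This is a sketch under stated assumptions: the function name download_video, the mode names (default, best, audio), and the DRY_RUN variable are all hypothetical and exist only for illustration. The yt-dlp flags are the ones already shown above.

```shell
# Hypothetical wrapper (not part of the Skill) around the yt-dlp calls above.
# Usage: download_video <default|best|audio> <url>
# With DRY_RUN=1 the command is printed for inspection instead of being run.
download_video() {
  mode="$1"; url="$2"
  # Build the argument list in the positional parameters (POSIX sh has no arrays)
  set -- yt-dlp -o 'assets/downloads/%(title)s.%(ext)s'
  case "$mode" in
    best)    set -- "$@" -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' ;;
    audio)   set -- "$@" -x --audio-format mp3 ;;
    default) ;;
    *)       echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  set -- "$@" "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Printed without shell quoting: meant for reading, not for copy-paste
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, download_video audio "<VIDEO_URL>" runs the audio-only variant, which is usually enough when only a transcript is needed.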

2. Extract Audio (if you downloaded a video file)

# Replace video.mp4 with the file yt-dlp actually saved (it is named after the video title)
ffmpeg -i "assets/downloads/video.mp4" \
  -vn -c:a libmp3lame -b:a 128k \
  "assets/downloads/audio.mp3"
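
Because step 1 names files after the video title, the real input is rarely called video.mp4. A small sketch of deriving the audio path from whatever file was downloaded follows; the helper name audio_path_for is hypothetical, and it assumes the file name carries an extension.

```shell
# Hypothetical helper (not part of the Skill): derive the .mp3 path that
# sits next to a downloaded video by swapping the file extension.
# Assumes the file name itself contains an extension such as .mp4 or .webm.
audio_path_for() {
  # ${1%.*} removes the shortest trailing ".something" from the path
  printf '%s.mp3\n' "${1%.*}"
}
```

With video set to the downloaded file, the extraction could then be run as ffmpeg -i "$video" -vn -b:a 128k "$(audio_path_for "$video")".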

3. Transcribe with Whisper

# Basic transcription
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format txt \
  --output_dir output/

# Higher quality (slower)
whisper "assets/downloads/audio.mp3" \
  --model medium \
  --output_format all \
  --output_dir output/

# With timestamps
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format srt \
  --output_dir output/
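
To read the result in step 4, the transcript's location has to be known. As far as I know, recent openai-whisper releases name each output file after the audio file's base name with the extension replaced by the output format, while older releases are reported to have kept the original extension, so treat the sketch below as an assumption to verify against the installed version. The helper name transcript_path is hypothetical.

```shell
# Hypothetical helper (not part of the Skill): predict where Whisper writes
# a transcript, assuming the naming scheme <output_dir>/<audio name>.<format>.
# Usage: transcript_path <audio file> <output dir> <format>
transcript_path() {
  base="$(basename "$1")"
  # ${2%/} drops one trailing slash so "output/" and "output" behave the same
  printf '%s/%s.%s\n' "${2%/}" "${base%.*}" "$3"
}
```

For the basic transcription command above, transcript_path "assets/downloads/audio.mp3" output/ txt yields output/audio.txt.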

4. Read and Analyze Transcript

Read the generated transcript file from output/
Summarize key points
Extract quotes and timestamps
Identify speakers if there is more than one (Whisper does not label speakers, so infer them from context)
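
For pulling quotes with timestamps, the srt output is easier to scan once each subtitle block is flattened to one line. The following is a minimal sketch using awk in paragraph mode; the function name srt_to_lines is hypothetical, and it assumes the usual SRT layout of an index line, a timing line, and one or more text lines per block.

```shell
# Hypothetical helper (not part of the Skill): turn SRT blocks into
# "[HH:MM:SS] text" lines. Reads the named files, or stdin when none are given.
srt_to_lines() {
  awk 'BEGIN { RS = ""; FS = "\n" }   # blank-line separated records, one field per line
       {
         split($2, t, " --> ")        # t[1] is the start time, t[2] the end time
         sub(/,[0-9]+$/, "", t[1])    # drop the millisecond part
         text = $3
         for (i = 4; i <= NF; i++) text = text " " $i
         printf "[%s] %s\n", t[1], text
       }' "$@"
}
```

For example, srt_to_lines output/audio.srt | grep -i "keyword" lists every line mentioning a keyword together with the time at which it was spoken.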

Model Options

Model	Parameters	Speed	Quality
tiny	39M	Fastest	Lower
base	74M	Fast	Good
small	244M	Medium	Better
medium	769M	Slow	High
large	1550M	Slowest	Highest

Output Formats

txt - Plain text transcript
srt - SubRip subtitles with timestamps
vtt - WebVTT subtitles
json - Detailed JSON with segment-level timing (add --word_timestamps True for word-level timing)
all - All formats

Tips

Use the base model for speed, medium for accuracy
Add --language en to skip language detection and treat the audio as English
Use --task translate to translate to English
Check assets/downloads/ for downloaded files
Store transcripts in output/transcripts/ (pass --output_dir output/transcripts/ so Whisper writes them there)
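
The tips above can be combined into a single call. This is a sketch under stated assumptions: the function name transcribe, its argument order, and the DRY_RUN variable are hypothetical. The Whisper flags (--model, --task, --language, --output_format, --output_dir) are the ones named in this document.

```shell
# Hypothetical wrapper (not part of the Skill) combining the tips above.
# Usage: transcribe <audio file> [model] [language] [task]
# With DRY_RUN=1 the command is printed for inspection instead of being run.
transcribe() {
  audio="$1"; model="${2:-base}"; lang="${3:-}"; task="${4:-transcribe}"
  set -- whisper "$audio" --model "$model" --task "$task" \
         --output_format txt --output_dir output/transcripts
  # Only pin the language when one was given; otherwise Whisper detects it
  if [ -n "$lang" ]; then
    set -- "$@" --language "$lang"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, transcribe assets/downloads/audio.mp3 medium en favors accuracy on English audio, while transcribe assets/downloads/audio.mp3 base "" translate produces an English translation of speech in a detected language.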

More Skills from the same repository

avatar-video

az9713/whatsapp-claude

Generate lip-synced avatar video from text using OmniHuman v1.5. Use when creating talking-head or avatar videos.

Updated: 2026-02-03 · Stars: 1

gmail

az9713/whatsapp-claude

Send and read emails via Gmail browser automation. Use when asked to send email or check inbox.

Updated: 2026-02-03 · Stars: 1

schedule-job

az9713/whatsapp-claude

Schedule tasks using natural language time expressions. Use when asked to schedule a recurring or timed task.

Updated: 2026-02-03 · Stars: 1

tts

az9713/whatsapp-claude

Generate voice-over audio using OpenAI TTS. Use when creating narration or voice for videos.

Updated: 2026-02-03 · Stars: 1

video-render

az9713/whatsapp-claude

Render videos using Remotion compositions. Use when creating or generating videos.

Updated: 2026-02-03 · Stars: 1

video-research

az9713/whatsapp-claude

Research topics for video content creation. Use when researching ideas for videos.

Updated: 2026-02-03 · Stars: 1

name	video-understanding
description	Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.
allowed-tools	["Bash","Read","Write"]
