video-understanding

Stars: 1

Forks: 0

Updated: February 3, 2026, 14:16

Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.

Source

az9713/whatsapp-claude (GitHub repository by az9713)

Video Understanding Skill

Download videos and transcribe their content for analysis.

Prerequisites

yt-dlp installed (pip install yt-dlp or brew install yt-dlp)
ffmpeg installed (brew install ffmpeg or apt install ffmpeg)
Whisper installed (pip install openai-whisper)
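Before running the pipeline, it can help to confirm that the three tools above are on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, not part of the skill.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
# The body runs in a subshell, so the loop variable does not leak.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Example: list whatever the pipeline still needs.
# missing_tools yt-dlp ffmpeg whisper
```

An empty result means every listed tool was found.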

Pipeline

1. Download Video

# Download video with yt-dlp
yt-dlp -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For best quality
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"

# For audio only (faster)
yt-dlp -x --audio-format mp3 \
  -o "assets/downloads/%(title)s.%(ext)s" "<VIDEO_URL>"
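Title-based filenames can contain spaces and punctuation, which makes the later steps harder to script. One possible approach, sketched below, names the file after the video ID and adds yt-dlp's `--restrict-filenames` option. `download_video`, `DOWNLOAD_DIR`, and the `DRY_RUN` switch are illustrative names, not part of the skill.

```shell
# Sketch only: download_video, DOWNLOAD_DIR and DRY_RUN are hypothetical names.
DOWNLOAD_DIR="assets/downloads"

# Usage: download_video URL [video|audio]
# Set DRY_RUN=1 to print the command instead of running it.
# The body runs in a subshell, so its variables stay local.
download_video() (
  url="$1"
  mode="${2:-video}"
  # Name the file after the video ID and keep the name free of spaces.
  set -- --restrict-filenames -o "$DOWNLOAD_DIR/%(id)s.%(ext)s"
  if [ "$mode" = "audio" ]; then
    # Audio-only download, converted to MP3 (needs ffmpeg).
    set -- "$@" -x --audio-format mp3
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo yt-dlp "$@" "$url"
  else
    mkdir -p "$DOWNLOAD_DIR" && yt-dlp "$@" "$url"
  fi
)
```

Running `DRY_RUN=1 download_video "<VIDEO_URL>" audio` prints the command so it can be checked before anything is downloaded.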

2. Extract Audio (if downloaded video)

# Drop the video stream (-vn) and encode the audio as 128 kbps MP3
ffmpeg -i "assets/downloads/video.mp4" \
  -vn -c:a libmp3lame -b:a 128k \
  "assets/downloads/audio.mp3"

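The extraction step uses fixed file names, while the download step names files after the video. A minimal sketch of deriving the audio path from whatever was downloaded might look like this. `audio_path_for` and `extract_audio` are hypothetical helpers, and the sketch assumes the input file name has an extension.

```shell
# Hypothetical helper: swap the file extension for .mp3.
audio_path_for() {
  printf '%s\n' "${1%.*}.mp3"
}

# Hypothetical helper: extract audio unless the input is already an MP3,
# then print the path of the audio file. Runs in a subshell.
extract_audio() (
  src="$1"
  dst="$(audio_path_for "$src")"
  if [ "$src" != "$dst" ]; then
    # -vn drops video; ffmpeg logs go to stderr, so stdout carries only the path.
    ffmpeg -nostdin -y -loglevel error -i "$src" \
      -vn -c:a libmp3lame -b:a 128k "$dst" || exit 1
  fi
  printf '%s\n' "$dst"
)
```

For example, `audio="$(extract_audio "assets/downloads/video.mp4")"` leaves the MP3 path in `$audio` for the transcription step.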
3. Transcribe with Whisper

# Basic transcription
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format txt \
  --output_dir output/

# Higher quality (slower)
whisper "assets/downloads/audio.mp3" \
  --model medium \
  --output_format all \
  --output_dir output/

# With timestamps
whisper "assets/downloads/audio.mp3" \
  --model base \
  --output_format srt \
  --output_dir output/
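Whisper names each output file after the input file, replacing the extension with the output format. The sketch below wraps the commands above and prints where the transcript should land. `transcript_path_for` and `transcribe` are hypothetical names, and the defaults simply mirror the basic example.

```shell
# Hypothetical helper: predict the transcript path for an audio file.
# Usage: transcript_path_for AUDIO FORMAT OUTPUT_DIR
transcript_path_for() (
  name="$(basename "$1")"
  printf '%s/%s.%s\n' "${3%/}" "${name%.*}" "$2"
)

# Hypothetical wrapper: transcribe, then print the transcript path.
# Usage: transcribe AUDIO [MODEL] [FORMAT] [OUTPUT_DIR]
transcribe() (
  audio="$1"
  model="${2:-base}"
  fmt="${3:-txt}"
  out_dir="${4:-output}"
  # Whisper echoes the transcript to stdout; discard it so only the path is printed.
  whisper "$audio" --model "$model" --output_format "$fmt" \
    --output_dir "$out_dir" >/dev/null || exit 1
  transcript_path_for "$audio" "$fmt" "$out_dir"
)
```

With `--output_format all` several files are written, so the path helper only makes sense for a single format.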

4. Read and Analyze Transcript

Read the generated transcript file from output/
Summarize key points
Extract quotes and timestamps
Identify speakers if there are several; Whisper does not label speakers, so infer them from context
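Pulling quotes with their timestamps can be partly automated when the transcript is an SRT file. The following sketch reads each subtitle cue as one record and prints the start time of every cue that mentions a keyword. `find_quotes` is a hypothetical helper, and it assumes the usual SRT layout of index line, timing line, then text.

```shell
# Hypothetical helper: print "[start time] text" for cues containing a keyword.
# Usage: find_quotes FILE.srt KEYWORD   (the match is case-insensitive)
find_quotes() {
  awk -v kw="$2" 'BEGIN { RS = ""; FS = "\n"; kw = tolower(kw) }
  {
    # Field 1 is the cue index, field 2 the timing line, the rest is text.
    text = $3
    for (i = 4; i <= NF; i++) text = text " " $i
    if (index(tolower(text), kw) > 0) {
      split($2, t, " --> ")
      print "[" t[1] "] " text
    }
  }' "$1"
}
```

For example, `find_quotes output/audio.srt pricing` lists every cue that mentions "pricing" along with when it starts.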

Model Options

Model	Parameters	Speed	Quality
tiny	39M	Fastest	Lower
base	74M	Fast	Good
small	244M	Medium	Better
medium	769M	Slow	High
large	1550M	Slowest	Highest
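The table can be turned into a small lookup so scripts choose a model by intent rather than by name. This is only a sketch: `pick_model` and the preference labels are made up here, and the mapping simply follows the Speed and Quality columns above.

```shell
# Hypothetical helper: map a speed/quality preference to a model name.
# The labels (fastest, fast, balanced, accurate, best) are illustrative.
pick_model() {
  case "${1:-fast}" in
    fastest)  echo tiny ;;
    fast)     echo base ;;
    balanced) echo small ;;
    accurate) echo medium ;;
    best)     echo large ;;
    *) echo "unknown preference: $1" >&2; return 1 ;;
  esac
}
```

A call such as `whisper audio.mp3 --model "$(pick_model accurate)"` would then select the medium model.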

Output Formats

txt - Plain text transcript
srt - SubRip subtitles with timestamps
vtt - WebVTT subtitles
json - Detailed JSON with segment-level timing (add --word_timestamps True for word-level timing)
all - All formats
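The two subtitle formats write timestamps slightly differently: SRT uses a comma before the milliseconds and always includes hours, while Whisper's VTT output uses a dot and, as far as I can tell, omits the hours field when it is zero. A sketch of converting either form to whole seconds, for example to jump to that point in the video, follows. `cue_seconds` is a hypothetical helper.

```shell
# Hypothetical helper: convert an SRT or VTT timestamp to whole seconds.
# Accepts HH:MM:SS,mmm, HH:MM:SS.mmm, or MM:SS.mmm; milliseconds are dropped.
cue_seconds() {
  awk -v t="$1" 'BEGIN {
    n = split(t, p, /[,.:]/)
    if (n == 4)      print p[1] * 3600 + p[2] * 60 + p[3]
    else if (n == 3) print p[1] * 60 + p[2]
    else             exit 1
  }'
}
```

For instance, `cue_seconds "00:01:05,500"` and `cue_seconds "01:05.500"` both print 65.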

Tips

Use base model for speed, medium for accuracy
Add --language en to skip automatic language detection and treat the audio as English
Use --task translate to translate non-English speech into English
Check assets/downloads/ for downloaded files
Store transcripts in output/transcripts/ (pass --output_dir output/transcripts/ so Whisper writes them there)
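Several of these tips can be combined in one call. The sketch below builds a Whisper command that translates speech into English, optionally names the spoken language, and writes to output/transcripts. `translate_to_english` and the `DRY_RUN` switch are hypothetical names, and the choice of the medium model and SRT output is only one reasonable default.

```shell
# Hypothetical wrapper: translate speech to English subtitles.
# Usage: translate_to_english AUDIO [SPOKEN_LANGUAGE]
# Set DRY_RUN=1 to print the command instead of running it.
translate_to_english() (
  audio="$1"
  lang="${2:-}"
  set -- whisper "$audio" --model medium --task translate \
    --output_format srt --output_dir output/transcripts
  if [ -n "$lang" ]; then
    # --language names the language spoken in the audio, not the target.
    set -- "$@" --language "$lang"
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
)
```

Leaving out the second argument lets Whisper detect the spoken language on its own.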

Other Skills in this repository

avatar-video

az9713/whatsapp-claude

Generate lip-synced avatar video from text using OmniHuman v1.5. Use when creating talking-head or avatar videos.

2026-02-03 · 1 star

gmail

az9713/whatsapp-claude

Send and read emails via Gmail browser automation. Use when asked to send email or check inbox.

2026-02-03 · 1 star

schedule-job

az9713/whatsapp-claude

Schedule tasks using natural language time expressions. Use when asked to schedule a recurring or timed task.

2026-02-03 · 1 star

tts

az9713/whatsapp-claude

Generate voice-over audio using OpenAI TTS. Use when creating narration or voice for videos.

2026-02-03 · 1 star

video-render

az9713/whatsapp-claude

Render videos using Remotion compositions. Use when creating or generating videos.

2026-02-03 · 1 star

video-research

az9713/whatsapp-claude

Research topics for video content creation. Use when researching ideas for videos.

2026-02-03 · 1 star

name	video-understanding
description	Download videos and transcribe their content. Use when asked to understand, summarize, or analyze a video.
allowed-tools	["Bash","Read","Write"]
