一键导入
audio-extraction
Audio Extraction: Extracting audio from videos, converting formats, and managing audio collections
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
菜单
Audio Extraction: Extracting audio from videos, converting formats, and managing audio collections
用 Codex 或 Claude 帮你安装 复制这段 Prompt,粘贴到 Codex、Claude 或其他助手里,让它检查 Skill 页面并帮你完成安装。
基于 SOC 职业分类
Generate a 15-30 second scrolling video tour of any GitHub repository page with ElevenLabs AI narration and word-by-word subtitle sync. Captures a full-page mobile-viewport screenshot, scrolls top-to-bottom with GSAP, and burns synced subtitles onto the final MP4 using HyperFrames CLI.
Lightweight personal knowledge base — markdown + YAML frontmatter structured notes with full-text search and cross-referencing for AI agents
Automated daily tech briefing — multi-source collection → knowledge-base deduplication → AI summarization → TTS speech synthesis, generating MP3 audio briefings
Generate 1080x1920 Instagram Reels video promos for GitHub repositories using HyperFrames. 7-beat structure with fullscreen scrolling phone mockup, GSAP animations, dark GitHub theme, repo stats, ElevenLabs AI voiceover synced to scroll duration, and follow CTA. Depends on the website-to-hyperframes skill for HyperFrames composition patterns.
Design safe X/Twitter automation workflows for tweet search, reply reads, monitoring, posting, and agent-operated social media actions
Assess worker classification and compliance risk for temporary event staffing in the US and Canada. Use when a user asks about W-2 vs 1099 event workers, misclassification penalties, joint-employer liability, COI requirements, or wage/hour rules for event staff. Includes live state-by-state lookups via the TempGuru MCP server.
| name | audio-extraction |
| description | Audio Extraction: Extracting audio from videos, converting formats, and managing audio collections |
| metadata | {"author":"cosmicstack-labs","version":"1.0.0","category":"media-download","tags":["audio-extraction","mp3-conversion","audio-conversion","ffmpeg","music"]} |
Extract high-quality audio from video files, convert between formats, manage metadata, and build organized audio collections. This skill covers everything from one-off audio rips to batch processing pipelines.
You cannot create quality that wasn't captured. Start with the highest quality source available — lossy-to-lossy transcoding degrades audio further. Always extract from the best original source.
Untagged audio files are unmanageable at scale. Proper ID3 tags, cover art, and consistent naming conventions turn a pile of files into a browsable music library.
Always keep a copy of the original file or at minimum log what source was used. Once you transcode, you lose information. Archival means keeping the best available original plus a convenient playback copy.
# Simplest audio extraction (best quality)
yt-dlp -x "https://youtube.com/watch?v=VIDEO_ID"
# Specific audio format
yt-dlp -x --audio-format mp3 "https://youtube.com/watch?v=VIDEO_ID"
# Best quality with metadata
yt-dlp -x --audio-format mp3 --audio-quality 0 \
--embed-thumbnail --embed-metadata "URL"
# MP3 at various quality levels
yt-dlp -x --audio-format mp3 --audio-quality 0 "URL" # 320kbps (best)
yt-dlp -x --audio-format mp3 --audio-quality 2 "URL" # ~256kbps
yt-dlp -x --audio-format mp3 --audio-quality 5 "URL" # ~192kbps (good)
yt-dlp -x --audio-format mp3 --audio-quality 9 "URL" # ~128kbps (acceptable)
# FLAC (lossless)
yt-dlp -x --audio-format flac --audio-quality 0 "URL"
# AAC/M4A
yt-dlp -x --audio-format m4a "URL"
# Opus (best quality-per-bitrate)
yt-dlp -x --audio-format opus "URL"
# WAV (uncompressed)
yt-dlp -x --audio-format wav "URL"
# List available audio formats
yt-dlp -F "URL" | grep -E "audio|opus|aac|mp3|m4a"
# Download specific audio stream
yt-dlp -f "140" "URL" # 128kbps AAC (YouTube standard)
# Download highest bitrate audio
yt-dlp -f "bestaudio[abr>128]/bestaudio" "URL"
# Download Opus stream (YouTube music)
yt-dlp -f "251" "URL" # 160kbps Opus
# Convert MP4 to MP3
ffmpeg -i input.mp4 -vn -acodec libmp3lame -ab 320k output.mp3
# Convert any video to FLAC
ffmpeg -i input.mkv -vn -c:a flac output.flac
# Batch convert all MP4s in directory
for f in *.mp4; do
ffmpeg -i "$f" -vn -acodec libmp3lame -ab 320k "${f%.mp4}.mp3"
done
# Trim from 30s to 1m30s
ffmpeg -i input.mp3 -ss 00:00:30 -to 00:01:30 -c copy output.mp3
# Trim from start for 45 seconds
ffmpeg -i input.mp3 -t 45 -c copy output.mp3
# Trim with re-encoding (for precise cuts)
ffmpeg -i input.mp3 -ss 00:00:30 -to 00:01:30 output.mp3
# Concatenate with ffmpeg (same format)
ffmpeg -i "concat:file1.mp3|file2.mp3|file3.mp3" -c copy merged.mp3
# Using concat demuxer
echo "file 'part1.mp3'" > files.txt
echo "file 'part2.mp3'" >> files.txt
echo "file 'part3.mp3'" >> files.txt
ffmpeg -f concat -safe 0 -i files.txt -c copy merged.mp3
# Merge with crossfade
ffmpeg -i part1.mp3 -i part2.mp3 -filter_complex \
"[0:a][1:a]acrossfade=d=2:c1=tri:c2=tri[a]" \
-map "[a]" merged.mp3
# EBU R128 loudness normalization (broadcast standard)
ffmpeg -i input.mp3 -af loudnorm=I=-16:LRA=11:TP=-1.5 output.mp3
# Peak normalization (simpler)
ffmpeg -i input.mp3 -af volume=3dB output.mp3
# Dynamic range compression
ffmpeg -i input.mp3 -af acompressor=threshold=-21dB:ratio=9:attack=200:release=1000 output.mp3
# Normalize batch files
for f in *.mp3; do
ffmpeg -i "$f" -af loudnorm=I=-16:LRA=11:TP=-1.5 "normalized_$f"
done
# Install eyeD3
pip install eyeD3
# Set basic tags
eyeD3 -a "Artist Name" -A "Album Title" -t "Song Title" -n 1 -N 10 track.mp3
# Set genre and year
eyeD3 -G "Rock" -Y 2024 track.mp3
# Add album art
eyeD3 --add-image cover.jpg:FRONT_COVER track.mp3
# Remove all tags
eyeD3 --remove-all track.mp3
from mutagen.mp3 import MP3
from mutagen.id3 import ID3, TIT2, TPE1, TALB, TRCK, TYER, APIC
import os
def tag_audio_file(filepath, metadata, cover_art_path=None):
"""
Tag an audio file with comprehensive metadata.
Args:
filepath: Path to the audio file
metadata: Dict with keys: title, artist, album, track, year, genre
cover_art_path: Path to cover art image
"""
audio = MP3(filepath, ID3=ID3)
audio.tags.add(TIT2(encoding=3, text=metadata['title']))
audio.tags.add(TPE1(encoding=3, text=metadata['artist']))
audio.tags.add(TALB(encoding=3, text=metadata['album']))
audio.tags.add(TRCK(encoding=3, text=str(metadata['track'])))
audio.tags.add(TYER(encoding=3, text=str(metadata['year'])))
if cover_art_path and os.path.exists(cover_art_path):
with open(cover_art_path, 'rb') as img:
audio.tags.add(
APIC(
encoding=3,
mime='image/jpeg',
type=3, # Front cover
desc='Cover',
data=img.read()
)
)
audio.save()
# Usage
tag_audio_file('track.mp3', {
'title': 'Bohemian Rhapsody',
'artist': 'Queen',
'album': 'A Night at the Opera',
'track': 11,
'year': 1975,
'genre': 'Rock'
}, 'cover.jpg')
import os
import re
from mutagen.mp3 import MP3
from mutagen.id3 import ID3, TIT2, TPE1, TALB
def tag_from_filename(directory, pattern=r"(.+?) - (.+?) - (.+)\.mp3"):
"""
Tag files based on filename pattern.
Default pattern: "Artist - Album - Title.mp3"
"""
for filename in os.listdir(directory):
if not filename.endswith('.mp3'):
continue
match = re.match(pattern, filename)
if not match:
continue
artist, album, title = match.groups()
filepath = os.path.join(directory, filename)
audio = MP3(filepath, ID3=ID3)
audio.tags.add(TPE1(encoding=3, text=artist.strip()))
audio.tags.add(TALB(encoding=3, text=album.strip()))
audio.tags.add(TIT2(encoding=3, text=title.strip()))
audio.save()
print(f"Tagged: {filename} → {artist} / {album} / {title}")
# Usage
tag_from_filename("~/Music/Downloads/")
# Download podcast episode from RSS
yt-dlp -x --audio-format mp3 --audio-quality 0 "PODCAST_RSS_URL"
# Download only the latest episode
yt-dlp --playlist-end 1 -x --audio-format mp3 "RSS_URL"
# Download with consistent naming
yt-dlp -o "%(title)s.%(ext)s" -x --audio-format mp3 "RSS_URL"
# Install gPodder
pip install gpodder
# Subscribe to a podcast
gpo add "https://example.com/podcast/rss"
# Download new episodes
gpo download
# List subscriptions
gpo list
import feedparser
import requests
import os
from urllib.parse import urlparse
def download_podcast_episodes(rss_url, output_dir="~/Podcasts"):
"""Download all episodes from an RSS feed."""
output_dir = os.path.expanduser(output_dir)
os.makedirs(output_dir, exist_ok=True)
feed = feedparser.parse(rss_url)
podcast_title = feed.feed.get('title', 'Unknown Podcast')
podcast_dir = os.path.join(output_dir, podcast_title)
os.makedirs(podcast_dir, exist_ok=True)
for entry in feed.entries:
title = entry.get('title', 'Unknown Episode')
# Sanitize filename
safe_title = "".join(c for c in title if c.isalnum() or c in ' -_').rstrip()
# Find audio enclosure
for link in entry.get('links', []):
if link.get('type', '').startswith('audio/'):
audio_url = link['href']
ext = os.path.splitext(urlparse(audio_url).path)[1] or '.mp3'
filepath = os.path.join(podcast_dir, f"{safe_title}{ext}")
if os.path.exists(filepath):
print(f"✓ Already downloaded: {title}")
continue
print(f"↓ Downloading: {title}")
response = requests.get(audio_url, stream=True)
with open(filepath, 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
print(f"✓ Saved: {filepath}")
break
# Usage
download_podcast_episodes("https://feeds.example.com/podcast/rss.xml")
# Extract audio from all videos in directory
for f in *.mp4 *.mkv *.webm; do
[ -e "$f" ] || continue
ffmpeg -i "$f" -vn -acodec libmp3lame -ab 320k "${f%.*}.mp3"
done
import os
import subprocess
def extract_audio_recursive(root_dir, output_format='mp3', bitrate='320k'):
"""Extract audio from all video files in directory tree."""
video_extensions = {'.mp4', '.mkv', '.webm', '.avi', '.mov', '.flv'}
for dirpath, dirnames, filenames in os.walk(root_dir):
for filename in filenames:
ext = os.path.splitext(filename)[1].lower()
if ext not in video_extensions:
continue
input_path = os.path.join(dirpath, filename)
output_name = os.path.splitext(filename)[0] + f'.{output_format}'
output_path = os.path.join(dirpath, output_name)
if os.path.exists(output_path):
print(f"✓ Already exists: {output_name}")
continue
print(f"⟳ Extracting: {filename} → {output_name}")
cmd = [
'ffmpeg', '-i', input_path,
'-vn',
'-c:a', 'libmp3lame' if output_format == 'mp3' else output_format,
'-b:a', bitrate,
'-y', output_path
]
subprocess.run(cmd, capture_output=True)
print(f"✓ Done: {output_name}")
# Usage
extract_audio_recursive("~/Videos/Recordings", output_format='mp3', bitrate='320k')
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
def extract_audio_parallel(root_dir, workers=4):
"""Extract audio using multiple parallel workers."""
video_files = []
video_extensions = {'.mp4', '.mkv', '.webm'}
for dirpath, _, filenames in os.walk(root_dir):
for f in filenames:
if os.path.splitext(f)[1].lower() in video_extensions:
video_files.append(os.path.join(dirpath, f))
def process_file(filepath):
output = os.path.splitext(filepath)[0] + '.mp3'
if os.path.exists(output):
return f"✓ Skipped (exists): {os.path.basename(filepath)}"
cmd = [
'ffmpeg', '-i', filepath,
'-vn', '-c:a', 'libmp3lame',
'-b:a', '320k', '-y', output
]
subprocess.run(cmd, capture_output=True, timeout=300)
return f"✓ Extracted: {os.path.basename(filepath)}"
with ThreadPoolExecutor(max_workers=workers) as executor:
futures = {executor.submit(process_file, f): f for f in video_files}
for future in as_completed(futures):
print(future.result())
# Usage
extract_audio_parallel("~/Videos", workers=4)
import subprocess
import json
def normalize_loudness(input_file, output_file, target_lufs=-16):
"""
Normalize audio to target loudness using EBU R128 standard.
Args:
input_file: Source audio file
output_file: Output file path
target_lufs: Target loudness in LUFS (default: -16 for podcasts, -14 for music)
"""
# First pass: measure loudness
measure_cmd = [
'ffmpeg', '-i', input_file,
'-af', f'loudnorm=I={target_lufs}:LRA=11:TP=-1.5:print_format=json',
'-f', 'null', '-'
]
result = subprocess.run(measure_cmd, capture_output=True, text=True, timeout=60)
# Second pass: apply normalization
normalize_cmd = [
'ffmpeg', '-i', input_file,
'-af', f'loudnorm=I={target_lufs}:LRA=11:TP=-1.5',
'-c:a', 'libmp3lame', '-b:a', '320k',
'-y', output_file
]
subprocess.run(normalize_cmd, capture_output=True, timeout=120)
print(f"Normalized to {target_lufs} LUFS: {output_file}")
# Usage
normalize_loudness("input.mp3", "output.mp3", target_lufs=-16)
import subprocess
import json
def split_by_chapters(input_file, output_dir="splits"):
"""
Split an audio file into chapters using ffmpeg chapter metadata.
"""
import os
os.makedirs(output_dir, exist_ok=True)
# Get chapter info
cmd = [
'ffprobe', '-i', input_file,
'-print_format', 'json',
'-show_chapters',
'-loglevel', 'error'
]
result = subprocess.run(cmd, capture_output=True, text=True)
chapters = json.loads(result.stdout).get('chapters', [])
if not chapters:
print("No chapters found in the file.")
return
for chapter in chapters:
start = chapter['start_time']
end = chapter['end_time']
title = chapter.get('tags', {}).get('title', f'Chapter {chapter["id"]}')
safe_title = "".join(c for c in title if c.isalnum() or c in ' -_')
output_path = os.path.join(output_dir, f"{safe_title}.mp3")
cmd = [
'ffmpeg', '-i', input_file,
'-ss', str(start),
'-to', str(end),
'-c:a', 'libmp3lame', '-b:a', '320k',
'-y', output_path
]
subprocess.run(cmd, capture_output=True, timeout=300)
print(f"✓ Split: {title} ({start}s → {end}s)")
# Usage
split_by_chapters("podcast.mp3", "~/Music/Splits")
import subprocess
import os
def prepare_for_transcription(video_file, output_wav="speech.wav"):
"""
Extract clean speech-optimized audio for transcription.
Converts to mono 16kHz WAV (standard for speech recognition).
"""
cmd = [
'ffmpeg', '-i', video_file,
'-vn', # No video
'-acodec', 'pcm_s16le', # 16-bit PCM
'-ac', '1', # Mono
'-ar', '16000', # 16kHz sample rate
'-af', 'highpass=200,lowpass=8000', # Speech frequency filter
'-y', output_wav
]
subprocess.run(cmd, capture_output=True, timeout=300)
print(f"✓ Audio prepared for transcription: {output_wav}")
return output_wav
# Usage
prepare_for_transcription("lecture.mp4", "lecture_audio.wav")
| Level | Coverage | Quality | Metadata | Automation |
|---|---|---|---|---|
| 1: Basic | One-off extractions | Default quality | None | Manual |
| 2: Consistent | Format selection, basic batch | Target bitrate | Basic tags | Shell scripts |
| 3: Organized | Batch processing, normalization | Optimized per use case | Full ID3 + album art | Config presets |
| 4: Automated | Watch folders, scheduled jobs | Verified quality | Automatic tagging | Cron jobs + webhooks |
| 5: Library | Full pipeline, multi-format archive | Lossless originals + playback copies | Complete metadata + cover | Full automation with monitoring |
Target: Level 3 for personal music collections. Level 4 for podcast production pipelines. Level 5 for media archiving at scale.
-y cautiously.