| name | download-podcast-transcripts |
| description | Download transcripts for the latest episodes of a podcast hosted on YouTube. Use when the user asks to "download podcast transcripts", "get All-in transcripts", "fetch transcripts from a YouTube channel", "archive podcast captions", or similar. Defaults to the All-in Podcast (@allin) but works with any YouTube channel that posts full episodes. |
Download podcast transcripts from YouTube
Goal: pull plain-text transcripts for the latest N episodes of a podcast hosted on a YouTube channel into a local archive, so that re-runs fetch only new episodes. Optimized for podcast channels but works on any channel.
Inputs
All optional:
channel — YouTube handle (@allin), channel URL, or single video URL. Default: @allin (All-in Podcast). If a single video URL is given, only that episode is fetched.
N — number of latest episodes to download (default 5, max 50).
since — YYYY-MM-DD; download all episodes with upload_date >= since. Overrides N when provided.
min_duration_minutes — drop entries shorter than this (default 10, to filter Shorts and trailers). Set to 0 to disable.
If the user names a podcast without giving a URL (e.g. "Acquired", "Lex Fridman"), do a quick web search to resolve the canonical YouTube channel handle and confirm before proceeding.
Storage layout
Under ~/knowledge/podcasts/<channel-slug>/ (override with $KNOWLEDGE_ROOT):
<channel-slug>/
  index.json                      # all episodes ever indexed
  transcripts/
    <YYYY-MM-DD>-<videoid>.txt    # cleaned transcript
    <YYYY-MM-DD>-<videoid>.json   # episode metadata sidecar
<channel-slug> is the lowercased handle without @ (e.g. allin). For channels without handles, use the channel ID slug yt-dlp returns (lowercased, prefix-stripped).
Method
1. Verify yt-dlp is available
yt-dlp --version
If missing, tell the user to install it (brew install yt-dlp or pip install yt-dlp) and stop. Do not try to work around its absence.
2. List candidate videos
For a channel handle or URL, enumerate the Videos tab:
yt-dlp \
--flat-playlist \
--playlist-end <N or 50> \
--print "%(id)s|%(upload_date)s|%(title)s|%(duration)s" \
"https://www.youtube.com/<channel>/videos"
Notes:
- For a handle, the URL is https://www.youtube.com/@<handle>/videos; for a channel ID, https://www.youtube.com/channel/<id>/videos.
- yt-dlp's flat-playlist output is newest-first by upload date.
- Use --playlist-end <N> to stop yt-dlp from enumerating the entire channel; this is much faster.
- For since mode, set --playlist-end 50 and filter the output to upload_date >= since after fetching.
- If the input is a single video URL, skip this step and treat that one video as the only candidate.
Drop candidates with duration < min_duration_minutes * 60 to filter out Shorts.
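The parse-and-filter logic for step 2 can be sketched in Python. This is an illustrative helper, not part of the skill's bundled code; the function name and the returned dict shape are assumptions. Note that yt-dlp prints "NA" for fields it cannot resolve, and titles may themselves contain "|", so the line is split from both ends:

```python
from datetime import date

def filter_candidates(lines, min_duration_minutes=10, since=None):
    """Parse yt-dlp --print "%(id)s|%(upload_date)s|%(title)s|%(duration)s"
    output lines and drop entries that are too short or older than `since`."""
    kept = []
    for line in lines:
        parts = line.strip().split("|")
        video_id, upload_date, duration = parts[0], parts[1], parts[-1]
        title = "|".join(parts[2:-1])  # titles may themselves contain "|"
        if duration != "NA" and float(duration) < min_duration_minutes * 60:
            continue  # Shorts / trailers
        if since and upload_date != "NA" and upload_date < since.strftime("%Y%m%d"):
            continue  # upload_date is YYYYMMDD, so string comparison works
        kept.append({"id": video_id, "upload_date": upload_date,
                     "title": title, "duration": duration})
    return kept
```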
3. Skip already-downloaded episodes
Load index.json (if present) and build a set of known video IDs. Drop any candidate whose ID is already indexed. If index.json doesn't exist yet, treat the set as empty.
4. Download metadata + auto-captions per new video
For each remaining candidate:
yt-dlp \
--skip-download \
--write-info-json \
--write-sub --write-auto-sub \
--sub-lang "en,en-US,en-GB" \
--sub-format vtt \
-o "<knowledge_root>/podcasts/<channel-slug>/_tmp/%(id)s" \
"https://www.youtube.com/watch?v=<video-id>"
This writes <videoid>.info.json and <videoid>.<lang>.vtt into _tmp/. yt-dlp prefers manual subs over auto-generated when both exist. If no English captions exist for a video (rare on major podcast channels), log it and skip.
5. Convert VTT to clean plain text
Use the bundled helper:
uv run ~/.claude/skills/download-podcast-transcripts/clean_vtt.py \
"<knowledge_root>/podcasts/<channel-slug>/_tmp/<videoid>.en.vtt" \
> "<knowledge_root>/podcasts/<channel-slug>/transcripts/<YYYY-MM-DD>-<videoid>.txt"
The script strips WEBVTT headers, cue timings, inline <00:00:00.000><c> timing tags, and the rolling-window duplicates that YouTube auto-captions emit, leaving plain space-joined text.
If the VTT filename uses a regional variant (en-US.vtt, en-GB.vtt), pass that path instead.
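For reference, the cleanup the bundled script performs can be approximated as below. This is a minimal sketch of the same transformations, not the actual clean_vtt.py; the real script may handle edge cases differently:

```python
import re

def clean_vtt(vtt_text):
    """Sketch of the VTT cleanup: drop WEBVTT headers and cue timing lines,
    strip inline <00:00:00.000>/<c> tags, and collapse the rolling-window
    duplicate lines that YouTube auto-captions emit."""
    lines = []
    prev = None
    for raw in vtt_text.splitlines():
        line = raw.strip()
        # Skip header metadata, blank lines, and cue timing lines
        if not line or line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE")):
            continue
        if "-->" in line:
            continue
        # Remove inline timing and styling tags
        line = re.sub(r"<[^>]+>", "", line).strip()
        if not line or line == prev:
            continue  # rolling-window duplicate of the previous cue
        lines.append(line)
        prev = line
    return " ".join(lines)
```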
6. Write the metadata sidecar
For each downloaded episode, write <YYYY-MM-DD>-<videoid>.json next to the .txt:
{
  "video_id": "abc123",
  "title": "E218: Buffett retires, Tariff fallout, Anthropic raises",
  "channel": "allin",
  "upload_date": "2026-04-25",
  "duration_seconds": 5421,
  "url": "https://www.youtube.com/watch?v=abc123",
  "captions_source": "auto",
  "transcript_path": "transcripts/2026-04-25-abc123.txt",
  "downloaded_at": "2026-04-29"
}
Pull title, upload_date, and duration from <videoid>.info.json. Set captions_source to manual if info_json["subtitles"]["en"] (or any en-* variant) exists — that means yt-dlp downloaded a manually-uploaded track. Otherwise auto.
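A sketch of assembling the sidecar from the .info.json dict (the function name is illustrative; the field names read from `info` are yt-dlp's standard keys):

```python
import json
from datetime import date

def build_sidecar(info, channel_slug, transcript_path):
    """Build the metadata sidecar dict from a parsed yt-dlp .info.json.
    A manually-uploaded English track appears under info["subtitles"]."""
    subs = info.get("subtitles") or {}
    manual = any(lang == "en" or lang.startswith("en-") for lang in subs)
    d = info["upload_date"]  # yt-dlp formats this as YYYYMMDD
    return {
        "video_id": info["id"],
        "title": info["title"],
        "channel": channel_slug,
        "upload_date": f"{d[:4]}-{d[4:6]}-{d[6:]}",
        "duration_seconds": int(info["duration"]),
        "url": f"https://www.youtube.com/watch?v={info['id']}",
        "captions_source": "manual" if manual else "auto",
        "transcript_path": transcript_path,
        "downloaded_at": date.today().isoformat(),
    }
```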
7. Update index.json
Schema:
{
  "channel": "allin",
  "channel_url": "https://www.youtube.com/@allin",
  "episodes": [
    {
      "video_id": "abc123",
      "upload_date": "2026-04-25",
      "title": "E218: ...",
      "duration_seconds": 5421,
      "captions_source": "auto",
      "transcript_path": "transcripts/2026-04-25-abc123.txt"
    }
  ]
}
Sort episodes[] by upload_date descending. On re-run, merge: keep existing entries, prepend new ones.
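The merge can be sketched as follows (an illustrative helper, assuming the index schema above; de-duplicating by video_id keeps existing entries authoritative):

```python
def merge_index(existing, new_entries, channel, channel_url):
    """Merge new episode entries into an existing index dict (or None),
    dropping duplicates by video_id, newest upload_date first."""
    episodes = (existing or {}).get("episodes", [])
    known = {e["video_id"] for e in episodes}
    merged = episodes + [e for e in new_entries if e["video_id"] not in known]
    merged.sort(key=lambda e: e["upload_date"], reverse=True)
    return {"channel": channel, "channel_url": channel_url, "episodes": merged}
```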
8. Clean up _tmp/
After successful processing, remove <channel-slug>/_tmp/. On any failure, leave it in place so the user can inspect raw .vtt and .info.json files.
9. Commit and push
~/knowledge is a git repository. From $KNOWLEDGE_ROOT (default ~/knowledge):
git add podcasts/<channel-slug>/
git commit -m "podcasts(<channel-slug>): <n> new episode(s) — <oldest-date>..<newest-date>"
git push
Rules:
- Stage only this channel's directory; never git add -A.
- No commit if no new transcripts were added.
- One commit per skill run.
- Transcripts run 50–200KB each, which is fine to commit. Don't gitignore transcripts/.
- If git push fails (no upstream, network, conflict), report it and stop. Do not force-push.
Output
After the run, output a summary:
## Podcast Transcripts: <channel> (<YYYY-MM-DD>)
Downloaded <n> new episode(s):
- E218 — Buffett retires, Tariff fallout (2026-04-25, 90 min) → transcripts/2026-04-25-abc123.txt
- E217 — ... (2026-04-18, 88 min) → ...
Skipped <m> already-downloaded episode(s).
[Filtered <k> entries shorter than <min_duration_minutes> min.]
[Note any caption fetch failures.]
On the first run for a channel, append once:
- Auto-generated YouTube captions have no speaker labels.
- Timestamps are stripped from the cleaned transcript. If you need timed cues later, re-fetch with yt-dlp --write-auto-sub to get the original .vtt.
Style rules
- Don't summarize, analyze, or quote the transcript content unless explicitly asked. This skill archives; it does not interpret.
- Always check index.json before fetching; never re-download an existing episode.
- Never delete or overwrite existing transcripts during a normal run.
- If yt-dlp returns a sign-in / age-gate / region error, surface the exact yt-dlp message and stop. Don't try cookie injection or proxies unless the user explicitly asks.
- If the channel resolves to multiple plausible matches (e.g. multiple "All-in" channels), confirm with the user before downloading.