Name: Tubescribe
Author: duclm1x1

Description: YouTube video summarizer with speaker detection, formatted documents, and audio output. Use when the user sends a YouTube URL or asks to summarize or transcribe a YouTube video.

Updated: February 10, 2026 at 17:50

Repository: duclm1x1/Dive-Ai

name	TubeScribe
description	YouTube video summarizer with speaker detection, formatted documents, and audio output. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video.

# TubeScribe 🎬

Turn any YouTube video into a polished document + audio summary.

Drop a YouTube link → get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.

## 💸 100% Free & Local

- No subscription: runs locally on your machine
- No API keys required: works out of the box
- Your content stays private: no data leaves your computer beyond the requests needed to fetch the video's captions, metadata, and comments from YouTube

## ✨ Features

- 📄 Transcript with summary and key quotes: export as DOCX, HTML, or Markdown
- 🎯 Smart Speaker Detection: automatically identifies participants
- 🔊 Audio Summaries: listen to key points (MP3/WAV)
- 📝 Clickable Timestamps: every quote links directly to that moment in the video
- 💬 YouTube Comments: viewer sentiment analysis and best comments
- 📋 Queue Support: send multiple links, and they are processed in order
- 🚀 Non-Blocking Workflow: the conversation continues while the video processes in the background

## 🎬 Works With Any Video

- Interviews & podcasts (multi-speaker detection)
- Lectures & tutorials (single speaker)
- Music videos (lyrics extraction)
- News & documentaries
- Any YouTube content with captions

## Quick Start

When the user sends a YouTube URL:

1. Spawn a sub-agent with the full pipeline task immediately.
2. Reply: "🎬 TubeScribe is processing - I'll let you know when it's ready!"
3. Continue the conversation (don't wait!).
4. The sub-agent's completion notification announces the title and details.

DO NOT BLOCK: spawn and move on instantly.
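The non-blocking pattern can be illustrated in plain Python. In this sketch a background thread stands in for `sessions_spawn`. That substitution is for illustration only, because the real sub-agent runs as a separate agent session. `run_pipeline` and `handle_youtube_url` are hypothetical names.

```python
import threading
import time
from typing import List, Tuple


def run_pipeline(url: str, done: List[str]) -> None:
    """Hypothetical stand-in for the full TubeScribe pipeline."""
    time.sleep(0.2)  # simulate slow work: extraction, formatting, audio
    done.append(url)  # record completion so it can be announced later


def handle_youtube_url(url: str, done: List[str]) -> Tuple[str, threading.Thread]:
    """Start the pipeline in the background and return the reply right away."""
    worker = threading.Thread(target=run_pipeline, args=(url, done), daemon=True)
    worker.start()  # returns immediately; the work continues on the thread
    reply = "🎬 TubeScribe is processing - I'll let you know when it's ready!"
    return reply, worker
```

The caller sends `reply` at once and moves on. Nothing waits on `worker` except a test or the completion notice.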

## First-Time Setup

Run setup to check dependencies and configure defaults:

```bash
python3 skills/tubescribe/scripts/setup.py
```

This checks for the summarize CLI, pandoc, ffmpeg, yt-dlp, and Kokoro TTS.
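Here is a minimal sketch of what such a dependency check might look like. It assumes each tool is detected by its executable name on PATH via `shutil.which`. This is not the actual `setup.py`. The names `check_dependencies`, `REQUIRED`, and `OPTIONAL` are hypothetical. Kokoro is left out because it is a Python project rather than a single executable. A missing required tool makes the check fail, while a missing optional tool only costs a feature.

```python
import shutil
from typing import Callable, Dict, Optional

# Executable names are assumptions based on the Dependencies section.
REQUIRED = ["summarize"]
OPTIONAL = {"pandoc": "DOCX output", "ffmpeg": "MP3 audio", "yt-dlp": "YouTube comments"}


def check_dependencies(which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, object]:
    """Report missing required tools and the features lost to missing optional ones."""
    missing_required = [tool for tool in REQUIRED if which(tool) is None]
    lost_features = [feature for tool, feature in OPTIONAL.items() if which(tool) is None]
    return {
        "ok": not missing_required,  # only required tools decide success
        "missing_required": missing_required,
        "lost_features": lost_features,
    }
```

The `which` parameter exists so the check can be exercised without changing the real PATH.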

## Full Workflow (Single Sub-Agent)

Spawn ONE sub-agent that does the entire pipeline:

````python
sessions_spawn(
    task=f"""
## TubeScribe: Process {youtube_url}

⚠️ CRITICAL: Do NOT install any software.
No pip, brew, curl, venv, or binary downloads.
If a tool is missing, STOP and report what's needed.

Run the COMPLETE pipeline — do not stop until all steps are done.

### Step 1: Extract
```bash
python3 skills/tubescribe/scripts/tubescribe.py "{youtube_url}"
```

Note the video_id from the output (e.g., "Source: /tmp/tubescribe_ABC123_source.json" → video_id is ABC123).

### Step 2: Read source JSON

Read /tmp/tubescribe_<video_id>_source.json and note:

- metadata.title (for filename)
- metadata.video_id
- metadata.channel, upload_date, duration_string

### Step 3: Create formatted markdown

Write to /tmp/tubescribe_<video_id>_output.md:

# **<title>**

Video info block — Channel, Date, Duration, URL (clickable). Empty line between each field.

## **Participants** — table with bold headers:

| **Name** | **Role** | **Description** |
|----------|----------|-----------------|

## **Summary** — 3-5 paragraphs of prose

## **Key Quotes** — 5 best with clickable YouTube timestamps. Format each as:
```
"Quote text here." - [12:34](https://www.youtube.com/watch?v=ID&t=754s)

"Another quote." - [25:10](https://www.youtube.com/watch?v=ID&t=1510s)
```
Use regular dash -, NOT em dash —. Do NOT use blockquotes >. Plain paragraphs only.

## **Viewer Sentiment** (if comments exist)

## **Best Comments** (if comments exist) — Top 5, NO lines between them:
```
Comment text here.

*- ▲ 123 @AuthorName*

Next comment text here.

*- ▲ 45 @AnotherAuthor*
```
Attribution line: dash + italic. Just a blank line between comments, NO --- separators.

## **Full Transcript** — merge segments, speaker labels, clickable timestamps

### Step 4: Create DOCX

Clean the title for the filename (remove special characters), then:
```bash
mkdir -p ~/Documents/TubeScribe
pandoc /tmp/tubescribe_<video_id>_output.md -o ~/Documents/TubeScribe/"<safe_title>.docx"
```

### Step 5: Generate audio

Read ~/.tubescribe/config.json and check audio.enabled, audio.tts_engine, and kokoro.path.

- If audio.enabled is false: skip this step.
- If tts_engine is "kokoro": activate Kokoro from the configured path, generate with the voice blend from config, and save as MP3 to the configured output folder.
- If tts_engine is "builtin": use the say command (macOS) to generate audio.

### Step 6: Cleanup
```bash
python3 skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>
```

### Step 7: Open folder

If output.open_folder_after is true (the default):
```bash
open ~/Documents/TubeScribe/
```

### Report

Tell what was created: DOCX name, MP3 name + duration, video stats.
""",
    label="tubescribe",
    runTimeoutSeconds=900,
    cleanup="delete",
)
````
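Two details of the sub-agent task above are mechanical enough to sketch. The first is pulling `video_id` out of the Step 1 output. The second is building the clickable timestamps used in Key Quotes, where 754 seconds becomes `[12:34](…&t=754s)`. The helper names are hypothetical. The `H:MM:SS` label for videos over an hour is an assumption, because the format rules only show `MM:SS`.

```python
import re
from typing import Optional

# Matches the path printed by Step 1, e.g. "Source: /tmp/tubescribe_ABC123_source.json".
SOURCE_RE = re.compile(r"/tmp/tubescribe_(?P<video_id>[A-Za-z0-9_-]+)_source\.json")


def video_id_from_output(output: str) -> Optional[str]:
    """Return the video_id embedded in the extractor's Source line, if present."""
    match = SOURCE_RE.search(output)
    return match.group("video_id") if match else None


def timestamp_link(video_id: str, seconds: int) -> str:
    """Format a clickable timestamp: [MM:SS](https://www.youtube.com/watch?v=ID&t=Ns)."""
    seconds = int(seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    # H:MM:SS for videos over an hour (an assumption; the format rules only show MM:SS).
    label = f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"
    return f"[{label}](https://www.youtube.com/watch?v={video_id}&t={seconds}s)"


def format_quote(text: str, video_id: str, seconds: int) -> str:
    """One Key Quotes line: plain paragraph, regular dash, no blockquote."""
    return f'"{text}" - {timestamp_link(video_id, seconds)}'
```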


**After spawning, reply immediately:**
> 🎬 TubeScribe is processing - I'll let you know when it's ready!

Then continue the conversation. The sub-agent's notification announces completion.

## Configuration

Config file: `~/.tubescribe/config.json`

```json
{
  "output": {
    "folder": "~/Documents/TubeScribe",
    "open_folder_after": true,
    "open_document_after": false,
    "open_audio_after": false
  },
  "document": {
    "format": "docx",
    "engine": "pandoc"
  },
  "audio": {
    "enabled": true,
    "format": "mp3",
    "tts_engine": "builtin"
  },
  "kokoro": {
    "path": "~/.openclaw/tools/kokoro",
    "voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
    "speed": 1.05
  },
  "processing": {
    "subagent_timeout": 600,
    "cleanup_temp_files": true
  }
}
```

### Output Options

| Option | Default | Description |
|--------|---------|-------------|
| `output.folder` | `~/Documents/TubeScribe` | Where to save files |
| `output.open_folder_after` | `true` | Open output folder when done |
| `output.open_document_after` | `false` | Auto-open generated document |
| `output.open_audio_after` | `false` | Auto-open generated audio summary |

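### Document Options

| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| `document.format` | `docx` | `docx`, `html`, `md` | Output format |
| `document.engine` | `pandoc` | `pandoc` | Converter for DOCX (falls back to HTML if pandoc is unavailable) |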

### Audio Options

| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| `audio.enabled` | `true` | `true`, `false` | Generate audio summary |
| `audio.format` | `mp3` | `mp3`, `wav` | Audio format (mp3 needs ffmpeg) |
| `audio.tts_engine` | `builtin` | `builtin`, `kokoro` | TTS engine (builtin = macOS `say`) |
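Here is a sketch of how these audio settings might translate into commands for the builtin engine. It assumes macOS `say` writes AIFF, with the output type chosen by the `.aiff` extension. It then converts the AIFF with `ffmpeg` for MP3 or with `afconvert` for WAV. The function only builds argument lists, so they can be inspected before running. `plan_audio` is a hypothetical name, and the Kokoro branch is deliberately left out because its invocation isn't documented here.

```python
from typing import List


def plan_audio(config: dict, text_file: str, out_base: str) -> List[List[str]]:
    """Return argv lists for the builtin-engine audio step, or [] when audio is disabled."""
    audio = config.get("audio", {})
    if not audio.get("enabled", True):
        return []  # audio.enabled is false: skip the step
    engine = audio.get("tts_engine", "builtin")
    if engine != "builtin":
        # Kokoro's invocation isn't documented here, so it is not sketched.
        raise NotImplementedError(f"engine {engine!r} is not covered by this sketch")
    aiff = f"{out_base}.aiff"
    commands = [["say", "-f", text_file, "-o", aiff]]  # macOS say; .aiff selects AIFF output
    fmt = audio.get("format", "mp3")
    if fmt == "mp3":
        commands.append(["ffmpeg", "-y", "-i", aiff, f"{out_base}.mp3"])  # needs ffmpeg
    elif fmt == "wav":
        commands.append(["afconvert", "-f", "WAVE", "-d", "LEI16", aiff, f"{out_base}.wav"])
    else:
        raise ValueError(f"unsupported audio.format: {fmt!r}")
    return commands
```

Each list can then be run in order with `subprocess.run(argv, check=True)`, which stops at the first failing step.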

### Kokoro TTS Options (optional)

| Option | Default | Description |
|--------|---------|-------------|
| `kokoro.path` | `~/.openclaw/tools/kokoro` | Kokoro repo location |
| `kokoro.voice_blend` | `{af_heart: 0.6, af_sky: 0.4}` | Custom voice mix |
| `kokoro.speed` | `1.05` | Speech speed (1.0 = normal, 1.05 = 5% faster) |
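A `voice_blend` such as `{af_heart: 0.6, af_sky: 0.4}` reads naturally as a weighted average of voices. Below is a minimal sketch of that arithmetic. It assumes each voice is available as an equal-length list of floats, whereas real Kokoro voice data would likely be model tensors handled by a tensor library. This is not TubeScribe's or Kokoro's code, and the function names are hypothetical. Weights are normalized, so a blend like `{a: 3, b: 1}` behaves as 0.75/0.25.

```python
from typing import Dict, List


def normalize_weights(blend: Dict[str, float]) -> Dict[str, float]:
    """Scale the weights so they sum to 1."""
    total = sum(blend.values())
    if total <= 0:
        raise ValueError("voice_blend weights must sum to a positive number")
    return {name: weight / total for name, weight in blend.items()}


def blend_voices(voices: Dict[str, List[float]], blend: Dict[str, float]) -> List[float]:
    """Weighted average of equal-length voice vectors, using the normalized blend weights."""
    weights = normalize_weights(blend)
    first = next(iter(weights))
    mixed = [0.0] * len(voices[first])
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != len(mixed):
            raise ValueError(f"voice {name!r} has a different length")
        for i, value in enumerate(vector):
            mixed[i] += weight * value  # accumulate this voice's weighted contribution
    return mixed
```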

### Processing Options

| Option | Default | Description |
|--------|---------|-------------|
| `processing.subagent_timeout` | `600` | Sub-agent timeout in seconds (increase for long videos) |
| `processing.cleanup_temp_files` | `true` | Remove /tmp files after completion |

### Comment Options

| Option | Default | Description |
|--------|---------|-------------|
| `comments.max_count` | `50` | Maximum number of comments to fetch |
| `comments.timeout` | `90` | Timeout for fetching comments, in seconds |

### Queue Options

| Option | Default | Description |
|--------|---------|-------------|
| `queue.stale_minutes` | `30` | Consider a processing job stale after this many minutes |
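Keys left out of `config.json` presumably take the defaults listed in the tables above. Below is a minimal sketch of that lookup. It assumes the user's file is merged recursively over built-in defaults, and only a subset of the defaults is repeated here. `DEFAULTS`, `deep_merge`, and `load_config` are hypothetical names, not TubeScribe's code.

```python
import json
from pathlib import Path

# A subset of the documented defaults.
DEFAULTS = {
    "output": {"folder": "~/Documents/TubeScribe", "open_folder_after": True},
    "audio": {"enabled": True, "format": "mp3", "tts_engine": "builtin"},
    "processing": {"subagent_timeout": 600, "cleanup_temp_files": True},
    "comments": {"max_count": 50, "timeout": 90},
    "queue": {"stale_minutes": 30},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: values from override win, nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: str = "~/.tubescribe/config.json") -> dict:
    """Read the user's config (if any) and fill in missing keys from DEFAULTS."""
    config_path = Path(path).expanduser()
    user = json.loads(config_path.read_text()) if config_path.exists() else {}
    return deep_merge(DEFAULTS, user)
```

Path values such as `output.folder` still contain `~` after loading. They need `Path(...).expanduser()` before use.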

## Output Structure

```
~/Documents/TubeScribe/
├── {Video Title}.docx         # Formatted document (or .html / .md)
└── {Video Title}_summary.mp3  # Audio summary (or .wav)
```

When `output.open_folder_after` is true (the default), TubeScribe opens the folder rather than individual files, so everything is in one place.
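Both Step 4 of the sub-agent task and the `{Video Title}` names above depend on cleaning the title. Here is a sketch of one way to do it. It assumes "special chars" means characters that are invalid in file names on common filesystems. `safe_title` and its 120-character cap are illustrative choices, not TubeScribe's documented behavior.

```python
import re
import unicodedata

# Characters invalid in Windows file names, path separators, and control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_title(title: str, max_length: int = 120) -> str:
    """Turn a video title into a file-name stem that is safe on macOS, Linux, and Windows."""
    text = unicodedata.normalize("NFKC", title)  # fold compatibility forms, e.g. full-width letters
    text = _UNSAFE.sub("", text)
    text = re.sub(r"\s+", " ", text).strip(" .")  # collapse whitespace; no leading/trailing dots
    return text[:max_length].rstrip(" .") or "untitled"
```

For example, `safe_title('What is AI? / Part 1: "Basics"')` gives `What is AI Part 1 Basics`, which then fills `<safe_title>` in the pandoc command.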

## Dependencies

Required:

- summarize CLI: `brew install steipete/tap/summarize`
- Python 3.8+

Optional (each enables an extra output format or feature):

- pandoc, for DOCX output: `brew install pandoc`
- ffmpeg, for MP3 audio: `brew install ffmpeg`
- yt-dlp, for YouTube comments: `brew install yt-dlp`
- Kokoro TTS, for higher-quality voices: see https://github.com/hexgrad/kokoro

### yt-dlp Search Paths

TubeScribe checks these locations in order:

| Priority | Path | Source |
|----------|------|--------|
| 1 | `which yt-dlp` | System PATH |
| 2 | `/opt/homebrew/bin/yt-dlp` | Homebrew (Apple Silicon) |
| 3 | `/usr/local/bin/yt-dlp` | Homebrew (Intel) / Linux |
| 4 | `~/.local/bin/yt-dlp` | pip install --user |
| 5 | `~/.local/pipx/venvs/yt-dlp/bin/yt-dlp` | pipx |
| 6 | `~/.openclaw/tools/yt-dlp/yt-dlp` | TubeScribe auto-install |

If yt-dlp is not found in any of these locations, setup downloads a standalone binary to `~/.openclaw/tools/yt-dlp/`. Because that copy lives in its own tools directory, it doesn't conflict with system installations.
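The lookup order can be sketched with `shutil.which` for PATH, followed by the fixed fallback paths. This mirrors the table but is an illustration, not TubeScribe's code. `find_yt_dlp` is a hypothetical name, and `which` is a parameter only so the search can be tested without touching the real PATH.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Optional, Sequence

# Fallback locations after PATH, in the priority order of the search-path table.
CANDIDATES = (
    "/opt/homebrew/bin/yt-dlp",               # Homebrew (Apple Silicon)
    "/usr/local/bin/yt-dlp",                  # Homebrew (Intel) / Linux
    "~/.local/bin/yt-dlp",                    # pip install --user
    "~/.local/pipx/venvs/yt-dlp/bin/yt-dlp",  # pipx
    "~/.openclaw/tools/yt-dlp/yt-dlp",        # TubeScribe's own copy
)


def find_yt_dlp(
    candidates: Sequence[str] = CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first usable yt-dlp executable, or None if none is found."""
    on_path = which("yt-dlp")  # priority 1: system PATH
    if on_path:
        return on_path
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file() and os.access(path, os.X_OK):  # must exist and be executable
            return str(path)
    return None
```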

## Queue Handling

When the user sends another YouTube URL while one is still processing:

### Check Before Starting

```bash
python3 skills/tubescribe/scripts/tubescribe.py --queue-status
```

### If Already Processing

```bash
# Add to queue instead of starting parallel processing
python3 skills/tubescribe/scripts/tubescribe.py --queue-add "NEW_URL"
# → Replies: "📋 Added to queue (position 2)"
```

### After Completion

```bash
# Check whether more URLs are queued
python3 skills/tubescribe/scripts/tubescribe.py --queue-next
# → Automatically pops and processes the next URL
```

### Queue Commands

| Command | Description |
|---------|-------------|
| `--queue-status` | Show what's processing + queued items |
| `--queue-add URL` | Add URL to queue |
| `--queue-next` | Process next item from queue |
| `--queue-clear` | Clear entire queue |
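The queue commands amount to a first-in, first-out list plus a "currently processing" slot. `queue.stale_minutes` keeps a crashed job from blocking that slot forever. Below is a minimal sketch of the behavior. It assumes state lives in a small JSON file. The file layout, the `UrlQueue` class, and counting the in-progress job in the reported position are assumptions, not TubeScribe's implementation.

```python
import json
import time
from pathlib import Path
from typing import Optional


class UrlQueue:
    """Minimal FIFO queue of YouTube URLs persisted to a JSON file (hypothetical layout)."""

    EMPTY = {"processing": None, "started_at": None, "queued": []}

    def __init__(self, path: str, stale_minutes: float = 30):
        self.path = Path(path).expanduser()
        self.stale_seconds = stale_minutes * 60

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return dict(self.EMPTY, queued=[])  # fresh list, never shared with EMPTY

    def _save(self, state: dict) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(state, indent=2))

    def is_busy(self, now: Optional[float] = None) -> bool:
        """True if a job is processing and has not exceeded stale_minutes."""
        state = self._load()
        if state["processing"] is None:
            return False
        now = time.time() if now is None else now
        return now - state["started_at"] <= self.stale_seconds

    def add(self, url: str) -> int:
        """Append url; return its 1-based position, counting the job in progress."""
        state = self._load()
        state["queued"].append(url)
        self._save(state)
        return len(state["queued"]) + (1 if state["processing"] else 0)

    def next(self, now: Optional[float] = None) -> Optional[str]:
        """Pop the oldest URL and mark it as processing (None when the queue is empty)."""
        state = self._load()
        url = state["queued"].pop(0) if state["queued"] else None
        state["processing"] = url
        state["started_at"] = (time.time() if now is None else now) if url else None
        self._save(state)
        return url

    def clear(self) -> None:
        self._save(dict(self.EMPTY, queued=[]))
```

A queue shared between several processes would also need file locking, which this sketch omits.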

## Batch Processing (multiple URLs at once)

```bash
python3 skills/tubescribe/scripts/tubescribe.py url1 url2 url3
```

Processes all URLs sequentially, with a summary at the end.
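Batch mode processes URLs one after another and reports at the end. Here is a sketch of that loop. It assumes a per-URL `process` function (hypothetical) that returns a result on success and raises on failure. Collecting failures instead of stopping matches the Error Handling rule that an error skips only the affected video.

```python
from typing import Callable, Iterable, List, Tuple


def run_batch(urls: Iterable[str], process: Callable[[str], str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Process URLs one at a time; a failure is recorded and the loop moves on."""
    done: List[str] = []
    failed: List[Tuple[str, str]] = []
    for url in urls:
        try:
            done.append(process(url))  # e.g. the name of the created document
        except Exception as exc:  # report the error and skip only this video
            failed.append((url, str(exc)))
    return done, failed


def batch_summary(done: List[str], failed: List[Tuple[str, str]]) -> str:
    """Build the end-of-batch report."""
    lines = [f"✅ {len(done)} done, ❌ {len(failed)} failed"]
    lines += [f"  {url}: {reason}" for url, reason in failed]
    return "\n".join(lines)
```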

## Error Handling

The script detects and reports these errors with clear messages:

| Error | Message |
|-------|---------|
| Invalid URL | ❌ Not a valid YouTube URL |
| Private video | ❌ Video is private — can't access |
| Video removed | ❌ Video not found or removed |
| No captions | ❌ No captions available for this video |
| Age-restricted | ❌ Age-restricted video — can't access without login |
| Region-blocked | ❌ Video blocked in your region |
| Live stream | ❌ Live streams not supported — wait until it ends |
| Network error | ❌ Network error — check your connection |
| Timeout | ❌ Request timed out — try again later |

When an error occurs, report it to the user and don't proceed with that video.
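The first check in the error table, rejecting invalid URLs, can happen before any network call. Here is a sketch that assumes the common URL shapes (`youtube.com/watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`) and the usual 11-character ID alphabet. The exact rules TubeScribe applies aren't documented, and `youtube_video_id` is a hypothetical name. Private, removed, or caption-less videos can only be detected after contacting YouTube.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")  # usual YouTube ID alphabet and length


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID for common YouTube URL shapes, or None if the URL isn't one."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com", "music.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            candidate = parsed.path.split("/")[2]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```

A `None` result maps to the "❌ Not a valid YouTube URL" message, and that video is not processed further.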

## Tips

- For long videos (>30 min), increase the sub-agent timeout (`processing.subagent_timeout`) to 900 seconds.
- Speaker detection works best with clear interview or podcast formats.
- Single-speaker videos (tutorials, lectures) skip speaker labels automatically.
- Timestamps link directly to that moment in the YouTube video.
- Use batch mode for multiple videos: `python3 skills/tubescribe/scripts/tubescribe.py url1 url2 url3`
