Run any Skill in Manus with one click

youtube-cache

YouTube video cache operations with Qdrant. Auto-checks cache when YouTube URLs detected. Provides semantic search, verification, ingestion, and archive access. Shows metadata by default, transcript on request.

Run Skill in Manus

Stars0

Forks0

UpdatedNovember 25, 2025 at 14:45

Source

ilude

ilude/agent-spike

View GitHub Repository View Creator Repositories

Install command

Download

Run Skill in Manus

Useful forSOC

Software DevelopersComputer and Mathematical Occupations15-1252L4

SKILL.md

readonly

YouTube Cache Operations

Activation triggers:

YouTube URLs in conversation (youtube.com/watch?v=..., youtu.be/...)
Mentions: "youtube", "video", "cache", "qdrant", "transcript"
Working with compose/cli/ or compose/services/ directories

Token efficiency:

Upfront: ~50 tokens (just YAML description)
On activation: ~2-3k tokens (this full document)
Compare to MCP: 5-10k tokens loaded immediately

Core Principles

Auto-check YouTube URLs - When user pastes URL, immediately verify cache status
Metadata-first display - Show title, summary, tags by default (NOT full transcript)
Archive-aware - Check both Qdrant cache and archive for historical data
Leverage existing scripts - All operations use tested compose/cli/* commands

Quick Commands

Auto-Check (Automatic on URL Detection)

When user pastes a YouTube URL, automatically run:

uv run python compose/cli/verify_video.py VIDEO_ID

Display format (metadata only):

Video ID: {video_id}
URL: {url}
Title: {title}
Summary: {summary[:100]}...
Subject: {subject_matter[:3]}
Style: {content_style}
Transcript: {length:,} characters

If not cached:

[INFO] Video not found in cache.

To ingest:
uv run python compose/cli/ingest_video.py "URL"

Search Videos (Semantic)

For natural language queries:

uv run python compose/cli/search_videos.py "query text" --limit 10

Examples:

# Find videos about MCP
uv run python compose/cli/search_videos.py "MCP protocol" --limit 5

# Python tutorials
uv run python compose/cli/search_videos.py "python async programming"

# Custom collection
uv run python compose/cli/search_videos.py "agents" --collection test_basic

Output: Relevance scores + metadata (no transcript unless requested)

List All Cached Videos

# Default collection (cached_content)
uv run python compose/cli/list_videos.py

# Custom collection
uv run python compose/cli/list_videos.py my_collection

# Limit results
uv run python compose/cli/list_videos.py cached_content 20

Ingest New Video

Single video (recommended):

uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=VIDEO_ID"

Batch/REPL mode:

make ingest
# Or: uv run python compose/cli/ingest_youtube.py

What happens:

Fetch transcript (via Webshare proxy if configured)
Archive transcript immediately → projects/data/archive/youtube/YYYY-MM/
Generate metadata with Claude Haiku (title, summary, tags)
Archive LLM output
Cache in Qdrant with embeddings

Features:

Dry run: --dry-run flag
Custom collection: Add collection name as second arg
Archive-first: All expensive data saved before processing

Delete from Cache

# Delete with confirmation prompt (safe)
uv run python compose/cli/delete_video.py VIDEO_ID

# Custom collection
uv run python compose/cli/delete_video.py VIDEO_ID --collection my_collection

# Skip confirmation (use with caution!)
uv run python compose/cli/delete_video.py VIDEO_ID --yes

Note: Only deletes from Qdrant cache, NOT from archive. Always prompts for confirmation unless --yes flag is used.

Advanced Operations

Search by Reference/Entity

Filter videos mentioning specific people, companies, or concepts:

uv run python compose/cli/search_by_reference.py "entity_name"

# Examples
uv run python compose/cli/search_by_reference.py "Anthropic"
uv run python compose/cli/search_by_reference.py "MCP"

Archive Access

Location: projects/data/archive/youtube/YYYY-MM/VIDEO_ID.json

Structure:

{
  "video_id": "...",
  "url": "...",
  "fetched_at": "...",
  "raw_transcript": "...",
  "llm_outputs": [...],
  "processing_history": [...]
}

Reading archive:

from compose.services.archive import create_local_archive_reader

archive = create_local_archive_reader()
video_data = archive.get("VIDEO_ID")

Reingest from archive:

uv run python compose/cli/reingest_from_archive.py VIDEO_ID

Batch Processing

Fetch channel videos to CSV:

uv run python compose/cli/fetch_channel_videos.py "@channel_handle"
uv run python compose/cli/fetch_channel_videos.py "@channel_handle" output.csv

Requires: YOUTUBE_API_KEY in .env

Then ingest CSV:

make ingest
# Select CSV file when prompted

Collection Management

Default collection: cached_content

Custom collections: Pass as argument to most scripts

Check what's in a collection:

uv run python compose/cli/list_videos.py my_collection

No auto-discovery: Must specify collection name explicitly.

Usage Examples

Example 1: User Pastes YouTube URL

User: "Check this out: https://www.youtube.com/watch?v=1_z3h2r93OY"

Auto-action:

uv run python compose/cli/verify_video.py 1_z3h2r93OY

Response:

Video ID: 1_z3h2r93OY
URL: https://www.youtube.com/watch?v=1_z3h2r93OY
Title: MCP token waste and the solution: skills and code execution
Summary: A breakdown of MCP's token inefficiency and context-rot, and how Anthropic's Claude Skills...
Subject: MCP protocol and token inefficiency, context window and context rot, real-time discovery...
Style: demonstration
Transcript: 11,460 characters

[FOUND] Video is cached in Qdrant.

Example 2: Search for Related Content

User: "Do we have other videos about MCP servers?"

Action:

uv run python compose/cli/search_videos.py "MCP servers" --limit 5

Response:

Found 5 results:

1. OIKTsVjTVJE (relevance: 0.575)
   Title: MCP server implementation guide
   Summary: Building MCP servers with practical examples...

2. D92aDGVFcRE (relevance: 0.488)
   Title: Seven MCP architecture failure modes
   ...

Example 3: Ingest New Video

User: "Can you ingest https://youtube.com/watch?v=ABC123 into the cache?"

Action:

uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=ABC123"

Response:

[1/5] Checking cache...
[2/5] Fetching transcript...
[3/5] Archiving transcript...
[4/5] Generating metadata...
[5/5] Caching in Qdrant...

SUCCESS! Video cached.

Example 4: Video Not Cached

User: "https://youtube.com/watch?v=XYZ789"

Auto-check result:

[INFO] Video XYZ789 not found in cache.

To ingest this video:
uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=XYZ789"

Would you like me to ingest it now?

Metadata Structure

Full metadata format (from LLM tagging):

{
    "video_id": str,
    "url": str,
    "transcript": str,           # Full transcript text
    "transcript_length": int,    # Character count
    "metadata": {
        "title": str,
        "summary": str,          # 1-2 sentence overview
        "subject_matter": list[str],  # Main topics (3-6 items)
        "content_style": str,    # tutorial | demonstration | lecture | discussion
        "difficulty": str,       # beginner | intermediate | advanced
        "entities": {
            "named_things": list[str],
            "people": list[str],
            "companies": list[str]
        },
        "techniques_or_concepts": list[str],
        "tools_or_materials": list[str],
        "key_points": list[str],
        "references": list[{
            "type": str,
            "name": str,
            "url": str | None,
            "description": str
        }]
    }
}

Important Patterns

1. Archive-First Philosophy

ALWAYS archive before processing:

Fetch transcript → Archive immediately
Generate LLM output → Archive immediately
Process/embed/cache → Derived data (can be rebuilt)

Why: Protects against data loss, enables reprocessing, tracks costs.

2. Transcript Display Rules

Default: Show metadata only (title, summary, subject, style, length)

Never show full transcript unless explicitly requested:

Transcripts are 10k-50k+ characters
Floods context window
User can request: "show me the transcript for VIDEO_ID"

If user requests transcript:

uv run python compose/cli/verify_video.py VIDEO_ID
# Then show full transcript field from output

3. URL Format Support

All formats supported:

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://youtube.com/watch?v=VIDEO_ID
Just VIDEO_ID for verify/delete operations

4. Collection Defaults

Default collection: cached_content
Override: Pass --collection NAME or as positional arg
No validation: Script will create collection if doesn't exist

5. Error Handling

Video not found:

[INFO] Video not found in cache.

→ Suggest ingestion command

Transcript unavailable:

[ERROR] Could not fetch transcript (disabled, private, etc.)

→ Cannot ingest, inform user

Already cached:

[INFO] Video already cached, skipping.

→ Use verify to show metadata

Environment Requirements

Required in .env (root directory):

ANTHROPIC_API_KEY=sk-ant-...     # For metadata generation
OPENAI_API_KEY=sk-proj-...       # For embeddings (sentence-transformers)

Optional:

WEBSHARE_PROXY_USERNAME=...      # For YouTube rate limit bypass
WEBSHARE_PROXY_PASSWORD=...
YOUTUBE_API_KEY=...              # For fetch_channel_videos.py

Data paths (auto-created):

Qdrant: projects/data/qdrant/
Archive: projects/data/archive/youtube/YYYY-MM/

Service Layer Reference

If direct Python imports needed:

# Cache operations
from compose.services.cache import create_qdrant_cache

cache = create_qdrant_cache(collection_name="cached_content")
results = cache.search("query", limit=10)
exists = cache.exists("youtube:video:VIDEO_ID")
cache.close()

# Archive operations
from compose.services.archive import create_local_archive_reader

archive = create_local_archive_reader()
video = archive.get("VIDEO_ID")

# YouTube operations
from compose.services.youtube import extract_video_id, get_transcript

video_id = extract_video_id("https://youtube.com/watch?v=...")
transcript = await get_transcript(video_id)

Script Design Philosophy

From compose/cli/README.md:

Scripts should:

Build on services - Use compose.services.* for business logic
CLI-focused - User-friendly command-line interfaces
Self-contained - Run directly or via make targets
Well-documented - Clear help text and examples
Error handling - Graceful exits and helpful messages

This skill wraps these scripts to provide Claude Code with the same capabilities in a token-efficient, context-aware manner.

Future Enhancements (Phase 2)

Not implemented yet, but potential additions:

Direct service imports (skip bash wrapper for speed)
Batch operation helpers (process multiple URLs)
Collection discovery and auto-switching
Cost tracking from archive metadata
Duplicate detection before ingestion
Smart excerpt showing (relevant transcript sections based on query)

Design philosophy: Start simple, add features only when real pain points emerge.

Token efficiency achieved: This skill provides full YouTube cache access with ~50 tokens upfront, loading full documentation only when needed. Compare to MCP server approach which would load 5-10k tokens immediately for similar functionality.

name	youtube-cache
description	YouTube video cache operations with Qdrant. Auto-checks cache when YouTube URLs detected. Provides semantic search, verification, ingestion, and archive access. Shows metadata by default, transcript on request.

YouTube Cache Operations

Activation triggers:

YouTube URLs in conversation (youtube.com/watch?v=..., youtu.be/...)
Mentions: "youtube", "video", "cache", "qdrant", "transcript"
Working with compose/cli/ or compose/services/ directories

Token efficiency:

Upfront: ~50 tokens (just YAML description)
On activation: ~2-3k tokens (this full document)
Compare to MCP: 5-10k tokens loaded immediately

Core Principles

Auto-check YouTube URLs - When user pastes URL, immediately verify cache status
Metadata-first display - Show title, summary, tags by default (NOT full transcript)
Archive-aware - Check both Qdrant cache and archive for historical data
Leverage existing scripts - All operations use tested compose/cli/* commands

Quick Commands

Auto-Check (Automatic on URL Detection)

When user pastes a YouTube URL, automatically run:

uv run python compose/cli/verify_video.py VIDEO_ID

Display format (metadata only):

Video ID: {video_id}
URL: {url}
Title: {title}
Summary: {summary[:100]}...
Subject: {subject_matter[:3]}
Style: {content_style}
Transcript: {length:,} characters

If not cached:

[INFO] Video not found in cache.

To ingest:
uv run python compose/cli/ingest_video.py "URL"

Search Videos (Semantic)

For natural language queries:

uv run python compose/cli/search_videos.py "query text" --limit 10

Examples:

# Find videos about MCP
uv run python compose/cli/search_videos.py "MCP protocol" --limit 5

# Python tutorials
uv run python compose/cli/search_videos.py "python async programming"

# Custom collection
uv run python compose/cli/search_videos.py "agents" --collection test_basic

Output: Relevance scores + metadata (no transcript unless requested)

List All Cached Videos

# Default collection (cached_content)
uv run python compose/cli/list_videos.py

# Custom collection
uv run python compose/cli/list_videos.py my_collection

# Limit results
uv run python compose/cli/list_videos.py cached_content 20

Ingest New Video

Single video (recommended):

uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=VIDEO_ID"

Batch/REPL mode:

make ingest
# Or: uv run python compose/cli/ingest_youtube.py

What happens:

Fetch transcript (via Webshare proxy if configured)
Archive transcript immediately → projects/data/archive/youtube/YYYY-MM/
Generate metadata with Claude Haiku (title, summary, tags)
Archive LLM output
Cache in Qdrant with embeddings

Features:

Dry run: --dry-run flag
Custom collection: Add collection name as second arg
Archive-first: All expensive data saved before processing

Delete from Cache

# Delete with confirmation prompt (safe)
uv run python compose/cli/delete_video.py VIDEO_ID

# Custom collection
uv run python compose/cli/delete_video.py VIDEO_ID --collection my_collection

# Skip confirmation (use with caution!)
uv run python compose/cli/delete_video.py VIDEO_ID --yes

Note: Only deletes from Qdrant cache, NOT from archive. Always prompts for confirmation unless --yes flag is used.

Advanced Operations

Search by Reference/Entity

Filter videos mentioning specific people, companies, or concepts:

uv run python compose/cli/search_by_reference.py "entity_name"

# Examples
uv run python compose/cli/search_by_reference.py "Anthropic"
uv run python compose/cli/search_by_reference.py "MCP"

Archive Access

Location: projects/data/archive/youtube/YYYY-MM/VIDEO_ID.json

Structure:

{
  "video_id": "...",
  "url": "...",
  "fetched_at": "...",
  "raw_transcript": "...",
  "llm_outputs": [...],
  "processing_history": [...]
}

Reading archive:

from compose.services.archive import create_local_archive_reader

archive = create_local_archive_reader()
video_data = archive.get("VIDEO_ID")

Reingest from archive:

uv run python compose/cli/reingest_from_archive.py VIDEO_ID

Batch Processing

Fetch channel videos to CSV:

uv run python compose/cli/fetch_channel_videos.py "@channel_handle"
uv run python compose/cli/fetch_channel_videos.py "@channel_handle" output.csv

Requires: YOUTUBE_API_KEY in .env

Then ingest CSV:

make ingest
# Select CSV file when prompted

Collection Management

Default collection: cached_content

Custom collections: Pass as argument to most scripts

Check what's in a collection:

uv run python compose/cli/list_videos.py my_collection

No auto-discovery: Must specify collection name explicitly.

Usage Examples

Example 1: User Pastes YouTube URL

User: "Check this out: https://www.youtube.com/watch?v=1_z3h2r93OY"

Auto-action:

uv run python compose/cli/verify_video.py 1_z3h2r93OY

Response:

Video ID: 1_z3h2r93OY
URL: https://www.youtube.com/watch?v=1_z3h2r93OY
Title: MCP token waste and the solution: skills and code execution
Summary: A breakdown of MCP's token inefficiency and context-rot, and how Anthropic's Claude Skills...
Subject: MCP protocol and token inefficiency, context window and context rot, real-time discovery...
Style: demonstration
Transcript: 11,460 characters

[FOUND] Video is cached in Qdrant.

Example 2: Search for Related Content

User: "Do we have other videos about MCP servers?"

Action:

uv run python compose/cli/search_videos.py "MCP servers" --limit 5

Response:

Found 5 results:

1. OIKTsVjTVJE (relevance: 0.575)
   Title: MCP server implementation guide
   Summary: Building MCP servers with practical examples...

2. D92aDGVFcRE (relevance: 0.488)
   Title: Seven MCP architecture failure modes
   ...

Example 3: Ingest New Video

User: "Can you ingest https://youtube.com/watch?v=ABC123 into the cache?"

Action:

uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=ABC123"

Response:

[1/5] Checking cache...
[2/5] Fetching transcript...
[3/5] Archiving transcript...
[4/5] Generating metadata...
[5/5] Caching in Qdrant...

SUCCESS! Video cached.

Example 4: Video Not Cached

User: "https://youtube.com/watch?v=XYZ789"

Auto-check result:

[INFO] Video XYZ789 not found in cache.

To ingest this video:
uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=XYZ789"

Would you like me to ingest it now?

Metadata Structure

Full metadata format (from LLM tagging):

{
    "video_id": str,
    "url": str,
    "transcript": str,           # Full transcript text
    "transcript_length": int,    # Character count
    "metadata": {
        "title": str,
        "summary": str,          # 1-2 sentence overview
        "subject_matter": list[str],  # Main topics (3-6 items)
        "content_style": str,    # tutorial | demonstration | lecture | discussion
        "difficulty": str,       # beginner | intermediate | advanced
        "entities": {
            "named_things": list[str],
            "people": list[str],
            "companies": list[str]
        },
        "techniques_or_concepts": list[str],
        "tools_or_materials": list[str],
        "key_points": list[str],
        "references": list[{
            "type": str,
            "name": str,
            "url": str | None,
            "description": str
        }]
    }
}

Important Patterns

1. Archive-First Philosophy

ALWAYS archive before processing:

Fetch transcript → Archive immediately
Generate LLM output → Archive immediately
Process/embed/cache → Derived data (can be rebuilt)

Why: Protects against data loss, enables reprocessing, tracks costs.

2. Transcript Display Rules

Default: Show metadata only (title, summary, subject, style, length)

Never show full transcript unless explicitly requested:

Transcripts are 10k-50k+ characters
Floods context window
User can request: "show me the transcript for VIDEO_ID"

If user requests transcript:

uv run python compose/cli/verify_video.py VIDEO_ID
# Then show full transcript field from output

3. URL Format Support

All formats supported:

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://youtube.com/watch?v=VIDEO_ID
Just VIDEO_ID for verify/delete operations

4. Collection Defaults

Default collection: cached_content
Override: Pass --collection NAME or as positional arg
No validation: Script will create collection if doesn't exist

5. Error Handling

Video not found:

[INFO] Video not found in cache.

→ Suggest ingestion command

Transcript unavailable:

[ERROR] Could not fetch transcript (disabled, private, etc.)

→ Cannot ingest, inform user

Already cached:

[INFO] Video already cached, skipping.

→ Use verify to show metadata

Environment Requirements

Required in .env (root directory):

ANTHROPIC_API_KEY=sk-ant-...     # For metadata generation
OPENAI_API_KEY=sk-proj-...       # For embeddings (sentence-transformers)

Optional:

WEBSHARE_PROXY_USERNAME=...      # For YouTube rate limit bypass
WEBSHARE_PROXY_PASSWORD=...
YOUTUBE_API_KEY=...              # For fetch_channel_videos.py

Data paths (auto-created):

Qdrant: projects/data/qdrant/
Archive: projects/data/archive/youtube/YYYY-MM/

Service Layer Reference

If direct Python imports needed:

# Cache operations
from compose.services.cache import create_qdrant_cache

cache = create_qdrant_cache(collection_name="cached_content")
results = cache.search("query", limit=10)
exists = cache.exists("youtube:video:VIDEO_ID")
cache.close()

# Archive operations
from compose.services.archive import create_local_archive_reader

archive = create_local_archive_reader()
video = archive.get("VIDEO_ID")

# YouTube operations
from compose.services.youtube import extract_video_id, get_transcript

video_id = extract_video_id("https://youtube.com/watch?v=...")
transcript = await get_transcript(video_id)

Script Design Philosophy

From compose/cli/README.md:

Scripts should:

Build on services - Use compose.services.* for business logic
CLI-focused - User-friendly command-line interfaces
Self-contained - Run directly or via make targets
Well-documented - Clear help text and examples
Error handling - Graceful exits and helpful messages

This skill wraps these scripts to provide Claude Code with the same capabilities in a token-efficient, context-aware manner.

Future Enhancements (Phase 2)

Not implemented yet, but potential additions:

Direct service imports (skip bash wrapper for speed)
Batch operation helpers (process multiple URLs)
Collection discovery and auto-switching
Cost tracking from archive metadata
Duplicate detection before ingestion
Smart excerpt showing (relevant transcript sections based on query)

Design philosophy: Start simple, add features only when real pain points emerge.

youtube-cache

More from this repository

YouTube Cache Operations

Core Principles

Quick Commands

Auto-Check (Automatic on URL Detection)

Search Videos (Semantic)

List All Cached Videos

Ingest New Video

Delete from Cache

Advanced Operations

Search by Reference/Entity

Archive Access

Batch Processing

Collection Management

Usage Examples

Example 1: User Pastes YouTube URL

Example 2: Search for Related Content

Example 3: Ingest New Video

Example 4: Video Not Cached

Metadata Structure

Important Patterns

1. Archive-First Philosophy

2. Transcript Display Rules

3. URL Format Support

4. Collection Defaults

5. Error Handling

Environment Requirements

Service Layer Reference

Script Design Philosophy

Future Enhancements (Phase 2)

YouTube Cache Operations

Core Principles

Quick Commands

Auto-Check (Automatic on URL Detection)

Search Videos (Semantic)

List All Cached Videos

Ingest New Video

Delete from Cache

Advanced Operations

Search by Reference/Entity

Archive Access

Batch Processing

Collection Management

Usage Examples

Example 1: User Pastes YouTube URL

Example 2: Search for Related Content

Example 3: Ingest New Video

Example 4: Video Not Cached

Metadata Structure

Important Patterns

1. Archive-First Philosophy

2. Transcript Display Rules

3. URL Format Support

4. Collection Defaults

5. Error Handling

Environment Requirements

Service Layer Reference

Script Design Philosophy

Future Enhancements (Phase 2)

More from this repository