content-core
Extract text content from external sources: URLs, PDFs, documents, YouTube videos, and audio/video files. Use when you need to read, analyze, or summarize content from a URL, file, or media source.
| Field | Value |
|---|---|
| name | content-core |
| description | Extract text content from external sources: URLs, PDFs, documents, YouTube videos, and audio/video files. Use when you need to read, analyze, or summarize content from a URL, file, or media source. |
| allowed-tools | ["Read","Bash","Grep","Glob"] |
Content Core extracts text from external sources so you can read, analyze, or summarize them. Use it whenever you need content from a URL, PDF, document, YouTube video, or audio/video file.
Most extraction works without API keys. Only audio/video transcription and summarization require an LLM API key (e.g., OPENAI_API_KEY).
Content Core runs via uvx (zero-install), which requires uv to be available. Check with:
uv --version
If uv is not found, help the user install it:
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Homebrew
brew install uv
# pip
pip install uv
After installation, the user may need to restart the shell or run source ~/.bashrc (or source ~/.zshrc) so that uv is on PATH.
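A minimal sketch for picking one of the install commands above, assuming a POSIX shell where uname -s reports the platform. The helper names uv_install_hint and ensure_uv are hypothetical, and the platform-to-command mapping is a judgment call (macOS users may prefer Homebrew). The script only suggests a command; it never installs anything on its own.

```bash
# uv_install_hint: hypothetical helper that maps `uname -s` output to one of
# the install commands listed above. It only prints the command.
uv_install_hint() {
  case "$1" in
    Linux|Darwin)
      # Standalone installer script for macOS and Linux
      echo 'curl -LsSf https://astral.sh/uv/install.sh | sh' ;;
    MINGW*|MSYS*|CYGWIN*)
      # Windows shells such as Git Bash: use the PowerShell installer
      echo 'powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"' ;;
    *)
      # Unknown platform: fall back to installing from PyPI
      echo 'pip install uv' ;;
  esac
}

# ensure_uv: print the uv version if uv is on PATH; otherwise print a
# suggested install command to stderr and return non-zero.
ensure_uv() {
  if command -v uv >/dev/null 2>&1; then
    uv --version
  else
    echo "uv not found; try: $(uv_install_hint "$(uname -s)")" >&2
    return 1
  fi
}
```

Running ensure_uv before the first uvx call lets the agent show the user a concrete install command instead of a bare "command not found".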
| Source | Examples | API Key Needed |
|---|---|---|
| Web pages | Any URL | No |
| YouTube | Video transcript | No |
| Documents | PDF, DOCX, PPTX, XLSX, EPUB, Markdown | No |
| Audio | MP3, WAV, M4A, FLAC, OGG | Yes (STT) |
| Video | MP4, AVI, MOV, MKV | Yes (STT) |
| Plain text / HTML | Raw text, auto-detects HTML | No |
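The table can double as a pre-flight check: before extracting a local media file, an agent can warn that transcription needs a key. This is a sketch under stated assumptions: needs_stt_key and warn_if_no_stt_key are hypothetical helpers that look only at the file extension, while Content Core performs its own type detection. Checking OPENAI_API_KEY specifically is also an assumption, since other STT providers use other variables.

```bash
# needs_stt_key: hypothetical pre-flight check based on the table above.
# Prints "yes" for the audio/video extensions listed there, "no" otherwise.
# Content Core does its own type detection; this only looks at the name.
needs_stt_key() (
  # Lower-case the name so .MP3 and .mp3 are treated the same
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.mp3|*.wav|*.m4a|*.flac|*.ogg) echo yes ;;  # audio
    *.mp4|*.avi|*.mov|*.mkv)        echo yes ;;  # video
    *)                              echo no ;;   # web pages, YouTube, documents, text
  esac
)

# warn_if_no_stt_key: warn when a media file is about to be extracted but
# OPENAI_API_KEY is unset. Other STT providers use other variables, so a
# missing OPENAI_API_KEY does not prove that transcription will fail.
warn_if_no_stt_key() {
  if [ "$(needs_stt_key "$1")" = yes ] && [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "warning: $1 needs transcription; OPENAI_API_KEY is not set" >&2
  fi
}
```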
All commands use uvx content-core, which runs without a separate installation step.
# From a URL
uvx content-core extract "https://example.com"
# From a file
uvx content-core extract document.pdf
# From a YouTube video
uvx content-core extract "https://www.youtube.com/watch?v=VIDEO_ID"
# JSON output (includes title, content, metadata)
uvx content-core extract --format json "https://example.com"
# With a specific extraction engine
uvx content-core extract --engine firecrawl "https://example.com"
uvx content-core extract --engine docling document.pdf
# Enable formula extraction (LaTeX)
uvx content-core extract --engine docling --formulas paper.pdf
# Enable image descriptions and chart data extraction
uvx content-core extract --engine docling --pictures paper.pdf
# Disable OCR (faster, for PDFs with embedded text)
uvx content-core extract --engine docling --no-ocr paper.pdf
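When the Docling options are combined often, a small wrapper keeps the flags consistent. This is a minimal sketch: extract_paper and its keywords (formulas, pictures, no-ocr) are hypothetical. Only the uvx content-core flags they map to come from the commands above, and the wrapper always passes --engine docling because those flags are shown only together with that engine.

```bash
# extract_paper: hypothetical wrapper around the Docling commands above.
# Keywords map to flags: formulas -> --formulas, pictures -> --pictures,
# no-ocr -> --no-ocr. --engine docling is always passed.
extract_paper() (
  file=$1
  shift
  flags=""
  for opt in "$@"; do
    case "$opt" in
      formulas) flags="$flags --formulas" ;;  # LaTeX formula extraction
      pictures) flags="$flags --pictures" ;;  # image descriptions and chart data
      no-ocr)   flags="$flags --no-ocr" ;;    # skip OCR for PDFs with embedded text
      *) echo "extract_paper: unknown option '$opt'" >&2; exit 2 ;;  # exits this subshell only
    esac
  done
  # $flags is left unquoted on purpose so each flag becomes a separate argument
  uvx content-core extract --engine docling $flags "$file"
)
```

For example, extract_paper paper.pdf formulas pictures > paper.md runs the same command as the two enrichment examples above combined.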
Summarization requires an LLM API key (OPENAI_API_KEY or another provider's key).
# Summarize text
uvx content-core summarize "Long text here..."
# With context to guide the summary
uvx content-core summarize --context "bullet points" "Long text..."
# Pipe extraction into summarization
uvx content-core extract "https://example.com" | uvx content-core summarize --context "key takeaways"
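The pipe above fails outright when no LLM key is configured. A minimal sketch of a fallback, assuming (as the example above shows) that summarize reads text from stdin when no text argument is given: summarize_or_raw is a hypothetical helper, and it treats any non-zero exit status from summarize as a failure, because the exit code alone cannot distinguish an API-key error from other errors.

```bash
# summarize_or_raw: hypothetical helper. Extracts SOURCE, tries to summarize
# it with CONTEXT, and prints the raw extracted text if summarization fails.
summarize_or_raw() (
  src=$1
  context=$2
  # Stop here if extraction itself fails; there is nothing to fall back to
  raw=$(uvx content-core extract "$src") || exit 1
  if summary=$(printf '%s\n' "$raw" | uvx content-core summarize --context "$context" 2>/dev/null); then
    printf '%s\n' "$summary"
  else
    echo "summarize failed; returning raw content" >&2
    printf '%s\n' "$raw"
  fi
)
```

With the raw text in hand, the agent can write the summary itself, which is the same fallback described for the MCP tools.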
# View current config
uvx content-core config list
# Set persistent defaults
uvx content-core config set llm_provider anthropic
uvx content-core config set llm_model claude-sonnet-4-20250514
uvx content-core config set url_engine firecrawl
# Delete a config value
uvx content-core config delete llm_provider
# See all available config keys
uvx content-core config --help
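Setting several defaults means one config set call per key. The sketch below batches them; apply_cc_config is a hypothetical helper, and the keys in the usage line are only the ones shown above (uvx content-core config --help lists the rest). It stops at the first malformed pair or failing call, so pairs before that point will already have been saved.

```bash
# apply_cc_config: hypothetical helper that runs "config set" once per
# key=value argument, stopping at the first failure.
apply_cc_config() (
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "apply_cc_config: expected key=value, got '$pair'" >&2; exit 2 ;;
    esac
    key=${pair%%=*}      # text before the first "="
    value=${pair#*=}     # text after the first "="
    uvx content-core config set "$key" "$value" || exit 1
  done
)

# Usage:
# apply_cc_config llm_provider=anthropic llm_model=claude-sonnet-4-20250514 url_engine=firecrawl
```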
Content Core can also run as an MCP server. It may or may not be available in your current environment.
Look for content-core in the list of available MCP servers. If available, you will have access to these tools:
extract_content: extracts text from a URL or file. No API key is needed for most sources.
extract_content(url="https://example.com")
extract_content(file_path="/path/to/document.pdf")
extract_content(url="https://youtube.com/watch?v=ID")
# With engine override
extract_content(file_path="paper.pdf", engine="docling")
# With Docling enrichment
extract_content(file_path="paper.pdf", engine="docling", formulas=true, pictures=true)
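Normally the agent's MCP client builds the protocol message for a call like extract_content(url=...). For readers debugging their own client, here is a minimal sketch of that request. The tools/call method and the name/arguments layout follow the Model Context Protocol specification; the argument names url and file_path come from the examples above, and whether the server accepts the other arguments in the same form is an assumption. mcp_extract_request is a hypothetical helper that does no JSON escaping, so it is only safe for plain URLs and paths.

```bash
# mcp_extract_request: hypothetical helper that prints one JSON-RPC 2.0
# "tools/call" request for the extract_content tool.
#   $1: numeric request id
#   $2: argument name (url or file_path, as in the examples above)
#   $3: argument value (not JSON-escaped)
mcp_extract_request() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"extract_content","arguments":{"%s":"%s"}}}\n' "$1" "$2" "$3"
}
```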
summarize_content: summarizes text using an LLM. Requires an API key.
summarize_content(content="Long text...", context="bullet points")
If summarization fails with an API key error, fall back to extract_content and return the raw content instead.
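- For large outputs, redirect extraction to a file (e.g., uvx content-core extract "URL" > output.md). This avoids flooding the agent's context window with large payloads; read only the relevant sections from the file as needed.
- Run commands through uvx content-core; no installation step is needed.
- Audio and video transcription requires OPENAI_API_KEY (or another STT provider key).
- Use --format json when you need structured metadata (title, source type, identified type).
- For formulas or figures, use --engine docling with --formulas or --pictures.
- If uvx is not found, help the user install uv (see the installation commands above).
- If the MCP server is not available, use the CLI via uvx content-core.
- If summarization fails, use extract_content instead and summarize the content yourself.
- If a specific engine fails, omit --engine to use the auto-detection fallback chain.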
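The tip about saving output to a file and reading only the relevant parts can be sketched as follows. extract_to_file, list_headings, and show_section are hypothetical helpers, and the section slicing assumes the extracted Markdown uses "#" headings, which is not guaranteed for every source. For output without headings, a plain line range such as sed -n '120,180p' output.md serves the same purpose.

```bash
# extract_to_file: hypothetical helper; writes extracted text to a file
# instead of printing it into the agent's context.
extract_to_file() {
  uvx content-core extract "$1" > "$2"
}

# list_headings: show Markdown headings with their line numbers,
# so the agent can decide which section to read.
list_headings() {
  grep -n '^#' "$1"
}

# show_section: print from line START up to, but not including,
# the next heading after it.
show_section() (
  file=$1
  start=$2
  awk -v s="$start" 'NR == s { print; next } NR > s && /^#/ { exit } NR > s { print }' "$file"
)
```

A typical sequence is extract_to_file "URL" output.md, then list_headings output.md, then show_section output.md on the line number of the heading of interest.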