Format-specific document extraction workflows
Extract text, tables, metadata, and images from 91+ document formats (PDF, Office, images, HTML, email, archives, academic) using Kreuzberg. Use when writing code that calls Kreuzberg APIs from Python, Node.js/TypeScript, or Rust, or when running the Kreuzberg CLI. Covers installation, extraction (sync/async), configuration (OCR, chunking, output format), batch processing, error handling, and plugins.
REST API server and MCP protocol integration
Chunking, embeddings, and RAG pipeline integration
Document extraction pipeline architecture and patterns
Plugin architecture, registration, and trait patterns