Request Flow:
HTTP Client / AI Agent (Claude)
    |
[Transport Layer]
├── REST API (Axum HTTP)
└── MCP Protocol (HTTP or Stdio)
    |
[Middleware Layer]
├── CORS, Request Logging (TraceLayer)
├── Request/Response size limits
└── Rate limiting (optional)
    |
[Router]
├── REST Endpoints
│   ├── POST /extract - File upload extraction
│   ├── POST /extract-url - URL-based extraction
│   ├── GET /formats - List supported formats
│   ├── GET /health - Server health check
│   ├── POST /batch - Batch document processing
│   ├── GET /cache/stats - Cache statistics
│   └── DELETE /cache - Clear extraction cache
├── MCP Endpoints
│   ├── POST /mcp/tools - List available tools
│   ├── POST /mcp/tools/call - Call a tool
│   ├── GET /mcp/resources - List resources
│   ├── GET /mcp/resources/:uri - Read resource
│   ├── GET /mcp/prompts - List prompts
│   └── GET /mcp/prompts/:name - Get prompt
    |
[Handler / Tool Layer]
├── extract_handler / extract_file tool
├── batch_handler / batch_extract tool
├── health_handler / get_capabilities tool
└── format_handler
    |
[Extraction Core]
├── Format detection
├── Extraction pipeline
├── Post-processing (chunking, embeddings)
└── Result formatting
    |
JSON Response / MCP ToolResult

Server Setup & Configuration

Location: crates/xberg/src/api/server.rs

Server initialization pattern: Create ApiState (holds ExtractionConfig + ExtractionCache), build Axum Router with all REST + MCP routes, apply middleware layers (body limits, CORS, tracing), serve via tokio::net::TcpListener.

Key middleware layers applied in order:

DefaultBodyLimit::max(100MB) + RequestBodyLimitLayer -- configurable via env vars
CorsLayer::permissive() -- restrict in production via CORS_ALLOWED_ORIGINS
TraceLayer::new_for_http() -- request/response logging

Core REST Handlers

Location: crates/xberg/src/api/handlers.rs

Handler	Method	Description
`extract_handler`	POST /extract	Multipart upload: parse file + optional config JSON, check cache, call `extract_bytes()`, cache result
`extract_url_handler`	POST /extract-url	Fetch URL via reqwest, extract bytes
`batch_handler`	POST /batch	Parallel extraction with `Semaphore`-limited concurrency (default: CPU count)
`health_handler`	GET /health	Report status, version, uptime, feature availability (OCR, embeddings), cache stats
`formats_handler`	GET /formats	Return supported format categories (office, pdf, images, web, email, archives, academic)
`cache_stats_handler`	GET /cache/stats	Hit/miss counts and hit rate
`cache_clear_handler`	DELETE /cache	Clear LRU cache

Caching Strategy

Location: crates/xberg/src/cache/mod.rs

LRU cache keyed by SHA256(file_content), stores Arc<ExtractionResult>. Default 1000 entries. Thread-safe via RwLock. Tracks hit/miss counters with AtomicU64 for stats endpoint.

Error Handling

Location: crates/xberg/src/api/error.rs

ApiError enum maps to HTTP status codes:

MissingFile -> 400, FileNotFound -> 404
OnnxRuntimeMissing / TesseractMissing -> 503 (with remediation message)
PayloadTooLarge -> 413
ExtractionFailed / InvalidConfig / UnsupportedFormat -> 500

MCP Server Implementation

Location: crates/xberg/src/mcp/server.rs

The MCP server allows Claude and other AI agents to call Xberg extraction functions through the Model Context Protocol.

MCP Tools (Callable Functions)

Three tools are registered:

Tool	Purpose	Required Params
`extract_file`	Extract text/tables/metadata from documents (75+ formats)	`file_path`
`batch_extract`	Extract from multiple documents in parallel	`file_paths[]`
`get_capabilities`	List supported formats, features, backends	(none)

Tool registration pattern (example: extract_file):

// Define Tool with name, description, JSON Schema inputSchema
// Register with server.register_tool(tool, handler_fn)
// Handler: parse params -> build ExtractionConfig -> call extract_file() -> return ToolResult as JSON

extract_file optional params: format, extract_tables, extract_images, ocr_enabled, extract_metadata, chunking_preset, generate_embeddings.

MCP Resources (Static Knowledge)

Three resources provide static information to agents:

xberg://formats -- Supported format list as JSON
xberg://features -- Cross-binding feature matrix (from FEATURE_MATRIX.md)
xberg://api-reference -- Generated API documentation

MCP Prompts (Agent Templates)

Two prompts guide agent extraction workflows:

extract_for_rag -- Document type-specific RAG extraction guidance (research paper, contract, report). Recommends chunking preset and embedding config.
batch_document_processing -- Optimal concurrency, grouping, and error handling for batch workflows.

MCP Transport Protocols

HTTP/REST: MCP routes mounted alongside REST API on separate /mcp/ prefix
Stdio: JSON-RPC 2.0 over stdin/stdout for local CLI integration (e.g., Claude Desktop)

Integration with Claude Desktop

{
  "mcpServers": {
    "xberg": {
      "command": "xberg-mcp",
      "env": {
        "XBERG_API_BASE": "http://localhost:8000",
        "XBERG_MCP_TRANSPORT": "stdio"
      }
    }
  }
}

MCP Error Handling

ToolError variants: FileNotFound, UnsupportedFormat, ExtractionFailed, OnnxRuntimeMissing, TesseractMissing, Timeout. Each maps to an MCP ToolResultError with descriptive code and message.

Environment Configuration

See .env.example for all configurable variables. Key categories:

Server: XBERG_HOST, XBERG_PORT
Size limits: XBERG_MAX_REQUEST_BODY_BYTES (default 100MB), XBERG_MAX_MULTIPART_FIELD_BYTES
Features: XBERG_ENABLE_OCR, XBERG_ENABLE_EMBEDDINGS, XBERG_ENABLE_KEYWORDS
Cache: XBERG_CACHE_ENABLED, XBERG_CACHE_SIZE
CORS: CORS_ALLOWED_ORIGINS (comma-separated)
MCP: XBERG_MCP_HOST, XBERG_MCP_PORT, XBERG_MCP_TRANSPORT (stdio/http)
Logging: RUST_LOG=xberg=info,tower_http=debug

Critical Rules

REST API Rules

Always validate multipart file uploads - Check MIME type, size, magic bytes
Timeout long-running extractions - Set per-handler timeout (5 min default)
Stream large files - Never buffer entire multi-GB file in memory
Cache aggressively - Identical files should return from cache in <1ms
Parallel extraction is CPU-bound - Limit workers to CPU count + 1
Error responses must be actionable - Include error code and remediation suggestion
Health checks must verify features - Report missing dependencies (ONNX, Tesseract)
Size limits are configurable - Allow override via env var for large deployments
CORS is permissive by default - Restrict in production via env var
Logging all requests - Track extraction metrics for observability

MCP Rules

All tools must have timeout - Prevent hanging on large files (default 5 min)
Error responses must be detailed - Include suggestions for missing dependencies
Feature gates must be checked - Return helpful message if feature unavailable (embeddings, OCR)
Resources should be static - Don't query external services in resource handlers
Prompts guide agents - Provide clear examples and best practices
Batch tools must support cancellation - Allow agent to stop long-running batch operations
Logging all tool calls - Track usage for analytics and debugging

Related Skills

extraction-pipeline-patterns - Core extraction called by handlers and MCP tools
chunking-embeddings - Optional chunking/embedding parameters in extraction
ocr-backend-management - OCR engine selection and image preprocessing