| description | REST API server and MCP protocol integration |
| name | api-server-mcp |
| priority | critical |
API Server & MCP Protocol
Axum server design for document extraction endpoints, middleware, async processing, and Model Context Protocol integration for AI agents
Xberg API Architecture
Location: crates/xberg/src/api/, crates/xberg-cli/
Xberg provides a dual REST API + MCP server built with Axum + Tokio.
Request Flow:
HTTP Client / AI Agent (Claude)
|
[Transport Layer]
├── REST API (Axum HTTP)
└── MCP Protocol (HTTP or Stdio)
|
[Middleware Layer]
├── CORS, Request Logging (TraceLayer)
├── Request/Response size limits
└── Rate limiting (optional)
|
[Router]
├── REST Endpoints
│ ├── POST /extract - File upload extraction
│ ├── POST /extract-url - URL-based extraction
│ ├── GET /formats - List supported formats
│ ├── GET /health - Server health check
│ ├── POST /batch - Batch document processing
│ ├── GET /cache/stats - Cache statistics
│ └── DELETE /cache - Clear extraction cache
├── MCP Endpoints
│ ├── POST /mcp/tools - List available tools
│ ├── POST /mcp/tools/call - Call a tool
│ ├── GET /mcp/resources - List resources
│ ├── GET /mcp/resources/:uri - Read resource
│ ├── GET /mcp/prompts - List prompts
│ └── GET /mcp/prompts/:name - Get prompt
|
[Handler / Tool Layer]
├── extract_handler / extract_file tool
├── batch_handler / batch_extract tool
├── health_handler / get_capabilities tool
└── format_handler
|
[Extraction Core]
├── Format detection
├── Extraction pipeline
├── Post-processing (chunking, embeddings)
└── Result formatting
|
JSON Response / MCP ToolResult
Server Setup & Configuration
Location: crates/xberg/src/api/server.rs
Server initialization pattern: Create ApiState (holds ExtractionConfig + ExtractionCache), build Axum Router with all REST + MCP routes, apply middleware layers (body limits, CORS, tracing), serve via tokio::net::TcpListener.
Key middleware layers applied in order:
DefaultBodyLimit::max(100MB) + RequestBodyLimitLayer -- configurable via env vars
CorsLayer::permissive() -- restrict in production via CORS_ALLOWED_ORIGINS
TraceLayer::new_for_http() -- request/response logging
Core REST Handlers
Location: crates/xberg/src/api/handlers.rs
| Handler | Method | Description |
|---|
extract_handler | POST /extract | Multipart upload: parse file + optional config JSON, check cache, call extract_bytes(), cache result |
extract_url_handler | POST /extract-url | Fetch URL via reqwest, extract bytes |
batch_handler | POST /batch | Parallel extraction with Semaphore-limited concurrency (default: CPU count) |
health_handler | GET /health | Report status, version, uptime, feature availability (OCR, embeddings), cache stats |
formats_handler | GET /formats | Return supported format categories (office, pdf, images, web, email, archives, academic) |
cache_stats_handler | GET /cache/stats | Hit/miss counts and hit rate |
cache_clear_handler | DELETE /cache | Clear LRU cache |
Caching Strategy
Location: crates/xberg/src/cache/mod.rs
LRU cache keyed by SHA256(file_content), stores Arc<ExtractionResult>. Default 1000 entries. Thread-safe via RwLock. Tracks hit/miss counters with AtomicU64 for stats endpoint.
Error Handling
Location: crates/xberg/src/api/error.rs
ApiError enum maps to HTTP status codes:
MissingFile -> 400, FileNotFound -> 404
OnnxRuntimeMissing / TesseractMissing -> 503 (with remediation message)
PayloadTooLarge -> 413
ExtractionFailed / InvalidConfig / UnsupportedFormat -> 500
MCP Server Implementation
Location: crates/xberg/src/mcp/server.rs
The MCP server allows Claude and other AI agents to call Xberg extraction functions through the Model Context Protocol.
MCP Tools (Callable Functions)
Three tools are registered:
| Tool | Purpose | Required Params |
|---|
extract_file | Extract text/tables/metadata from documents (75+ formats) | file_path |
batch_extract | Extract from multiple documents in parallel | file_paths[] |
get_capabilities | List supported formats, features, backends | (none) |
Tool registration pattern (example: extract_file):
extract_file optional params: format, extract_tables, extract_images, ocr_enabled, extract_metadata, chunking_preset, generate_embeddings.
MCP Resources (Static Knowledge)
Three resources provide static information to agents:
xberg://formats -- Supported format list as JSON
xberg://features -- Cross-binding feature matrix (from FEATURE_MATRIX.md)
xberg://api-reference -- Generated API documentation
MCP Prompts (Agent Templates)
Two prompts guide agent extraction workflows:
extract_for_rag -- Document type-specific RAG extraction guidance (research paper, contract, report). Recommends chunking preset and embedding config.
batch_document_processing -- Optimal concurrency, grouping, and error handling for batch workflows.
MCP Transport Protocols
- HTTP/REST: MCP routes mounted alongside REST API on separate
/mcp/ prefix
- Stdio: JSON-RPC 2.0 over stdin/stdout for local CLI integration (e.g., Claude Desktop)
Integration with Claude Desktop
{
"mcpServers": {
"xberg": {
"command": "xberg-mcp",
"env": {
"XBERG_API_BASE": "http://localhost:8000",
"XBERG_MCP_TRANSPORT": "stdio"
}
}
}
}
MCP Error Handling
ToolError variants: FileNotFound, UnsupportedFormat, ExtractionFailed, OnnxRuntimeMissing, TesseractMissing, Timeout. Each maps to an MCP ToolResultError with descriptive code and message.
Environment Configuration
See .env.example for all configurable variables. Key categories:
- Server:
XBERG_HOST, XBERG_PORT
- Size limits:
XBERG_MAX_REQUEST_BODY_BYTES (default 100MB), XBERG_MAX_MULTIPART_FIELD_BYTES
- Features:
XBERG_ENABLE_OCR, XBERG_ENABLE_EMBEDDINGS, XBERG_ENABLE_KEYWORDS
- Cache:
XBERG_CACHE_ENABLED, XBERG_CACHE_SIZE
- CORS:
CORS_ALLOWED_ORIGINS (comma-separated)
- MCP:
XBERG_MCP_HOST, XBERG_MCP_PORT, XBERG_MCP_TRANSPORT (stdio/http)
- Logging:
RUST_LOG=xberg=info,tower_http=debug
Critical Rules
REST API Rules
- Always validate multipart file uploads - Check MIME type, size, magic bytes
- Timeout long-running extractions - Set per-handler timeout (5 min default)
- Stream large files - Never buffer entire multi-GB file in memory
- Cache aggressively - Identical files should return from cache in <1ms
- Parallel extraction is CPU-bound - Limit workers to CPU count + 1
- Error responses must be actionable - Include error code and remediation suggestion
- Health checks must verify features - Report missing dependencies (ONNX, Tesseract)
- Size limits are configurable - Allow override via env var for large deployments
- CORS is permissive by default - Restrict in production via env var
- Logging all requests - Track extraction metrics for observability
MCP Rules
- All tools must have timeout - Prevent hanging on large files (default 5 min)
- Error responses must be detailed - Include suggestions for missing dependencies
- Feature gates must be checked - Return helpful message if feature unavailable (embeddings, OCR)
- Resources should be static - Don't query external services in resource handlers
- Prompts guide agents - Provide clear examples and best practices
- Batch tools must support cancellation - Allow agent to stop long-running batch operations
- Logging all tool calls - Track usage for analytics and debugging
Related Skills
- extraction-pipeline-patterns - Core extraction called by handlers and MCP tools
- chunking-embeddings - Optional chunking/embedding parameters in extraction
- ocr-backend-management - OCR engine selection and image preprocessing