بنقرة واحدة
api-server-mcp
REST API server and MCP protocol integration
التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.
القائمة
REST API server and MCP protocol integration
التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.
Chunking, embeddings, and RAG pipeline integration
Document extraction pipeline architecture and patterns
Plugin architecture, registration, and trait patterns
Format-specific document extraction workflows
| description | REST API server and MCP protocol integration |
| name | api-server-mcp |
| priority | critical |
Axum server design for document extraction endpoints, middleware, async processing, and Model Context Protocol integration for AI agents
Location: crates/xberg/src/api/, crates/xberg-cli/
Xberg provides a dual REST API + MCP server built with Axum + Tokio.
Request Flow:
HTTP Client / AI Agent (Claude)
|
[Transport Layer]
├── REST API (Axum HTTP)
└── MCP Protocol (HTTP or Stdio)
|
[Middleware Layer]
├── CORS, Request Logging (TraceLayer)
├── Request/Response size limits
└── Rate limiting (optional)
|
[Router]
├── REST Endpoints
│ ├── POST /extract - File upload extraction
│ ├── POST /extract-url - URL-based extraction
│ ├── GET /formats - List supported formats
│ ├── GET /health - Server health check
│ ├── POST /batch - Batch document processing
│ ├── GET /cache/stats - Cache statistics
│ └── DELETE /cache - Clear extraction cache
├── MCP Endpoints
│ ├── POST /mcp/tools - List available tools
│ ├── POST /mcp/tools/call - Call a tool
│ ├── GET /mcp/resources - List resources
│ ├── GET /mcp/resources/:uri - Read resource
│ ├── GET /mcp/prompts - List prompts
│ └── GET /mcp/prompts/:name - Get prompt
|
[Handler / Tool Layer]
├── extract_handler / extract_file tool
├── batch_handler / batch_extract tool
├── health_handler / get_capabilities tool
└── format_handler
|
[Extraction Core]
├── Format detection
├── Extraction pipeline
├── Post-processing (chunking, embeddings)
└── Result formatting
|
JSON Response / MCP ToolResult
Location: crates/xberg/src/api/server.rs
Server initialization pattern: Create ApiState (holds ExtractionConfig + ExtractionCache), build Axum Router with all REST + MCP routes, apply middleware layers (body limits, CORS, tracing), serve via tokio::net::TcpListener.
Key middleware layers applied in order:
DefaultBodyLimit::max(100MB) + RequestBodyLimitLayer -- configurable via env varsCorsLayer::permissive() -- restrict in production via CORS_ALLOWED_ORIGINSTraceLayer::new_for_http() -- request/response loggingLocation: crates/xberg/src/api/handlers.rs
| Handler | Method | Description |
|---|---|---|
extract_handler | POST /extract | Multipart upload: parse file + optional config JSON, check cache, call extract_bytes(), cache result |
extract_url_handler | POST /extract-url | Fetch URL via reqwest, extract bytes |
batch_handler | POST /batch | Parallel extraction with Semaphore-limited concurrency (default: CPU count) |
health_handler | GET /health | Report status, version, uptime, feature availability (OCR, embeddings), cache stats |
formats_handler | GET /formats | Return supported format categories (office, pdf, images, web, email, archives, academic) |
cache_stats_handler | GET /cache/stats | Hit/miss counts and hit rate |
cache_clear_handler | DELETE /cache | Clear LRU cache |
Location: crates/xberg/src/cache/mod.rs
LRU cache keyed by SHA256(file_content), stores Arc<ExtractionResult>. Default 1000 entries. Thread-safe via RwLock. Tracks hit/miss counters with AtomicU64 for stats endpoint.
Location: crates/xberg/src/api/error.rs
ApiError enum maps to HTTP status codes:
MissingFile -> 400, FileNotFound -> 404OnnxRuntimeMissing / TesseractMissing -> 503 (with remediation message)PayloadTooLarge -> 413ExtractionFailed / InvalidConfig / UnsupportedFormat -> 500Location: crates/xberg/src/mcp/server.rs
The MCP server allows Claude and other AI agents to call Xberg extraction functions through the Model Context Protocol.
Three tools are registered:
| Tool | Purpose | Required Params |
|---|---|---|
extract_file | Extract text/tables/metadata from documents (75+ formats) | file_path |
batch_extract | Extract from multiple documents in parallel | file_paths[] |
get_capabilities | List supported formats, features, backends | (none) |
Tool registration pattern (example: extract_file):
// Define Tool with name, description, JSON Schema inputSchema
// Register with server.register_tool(tool, handler_fn)
// Handler: parse params -> build ExtractionConfig -> call extract_file() -> return ToolResult as JSON
extract_file optional params: format, extract_tables, extract_images, ocr_enabled, extract_metadata, chunking_preset, generate_embeddings.
Three resources provide static information to agents:
xberg://formats -- Supported format list as JSONxberg://features -- Cross-binding feature matrix (from FEATURE_MATRIX.md)xberg://api-reference -- Generated API documentationTwo prompts guide agent extraction workflows:
extract_for_rag -- Document type-specific RAG extraction guidance (research paper, contract, report). Recommends chunking preset and embedding config.batch_document_processing -- Optimal concurrency, grouping, and error handling for batch workflows./mcp/ prefix{
"mcpServers": {
"xberg": {
"command": "xberg-mcp",
"env": {
"XBERG_API_BASE": "http://localhost:8000",
"XBERG_MCP_TRANSPORT": "stdio"
}
}
}
}
ToolError variants: FileNotFound, UnsupportedFormat, ExtractionFailed, OnnxRuntimeMissing, TesseractMissing, Timeout. Each maps to an MCP ToolResultError with descriptive code and message.
See .env.example for all configurable variables. Key categories:
XBERG_HOST, XBERG_PORTXBERG_MAX_REQUEST_BODY_BYTES (default 100MB), XBERG_MAX_MULTIPART_FIELD_BYTESXBERG_ENABLE_OCR, XBERG_ENABLE_EMBEDDINGS, XBERG_ENABLE_KEYWORDSXBERG_CACHE_ENABLED, XBERG_CACHE_SIZECORS_ALLOWED_ORIGINS (comma-separated)XBERG_MCP_HOST, XBERG_MCP_PORT, XBERG_MCP_TRANSPORT (stdio/http)RUST_LOG=xberg=info,tower_http=debug