api-server-mcp

Stars: 8,552

Forks: 504

Updated: June 25, 2026 at 06:26

REST API server and MCP protocol integration

Source: xberg-io/xberg

description	REST API server and MCP protocol integration
name	api-server-mcp
priority	critical

API Server & MCP Protocol

Axum server design for document extraction endpoints, middleware, async processing, and Model Context Protocol integration for AI agents

Xberg API Architecture

Location: crates/xberg/src/api/, crates/xberg-cli/

Xberg provides a dual REST API + MCP server built with Axum + Tokio.

Request Flow:
HTTP Client / AI Agent (Claude)
    |
[Transport Layer]
├── REST API (Axum HTTP)
└── MCP Protocol (HTTP or Stdio)
    |
[Middleware Layer]
├── CORS, Request Logging (TraceLayer)
├── Request/Response size limits
└── Rate limiting (optional)
    |
[Router]
├── REST Endpoints
│   ├── POST /extract - File upload extraction
│   ├── POST /extract-url - URL-based extraction
│   ├── GET /formats - List supported formats
│   ├── GET /health - Server health check
│   ├── POST /batch - Batch document processing
│   ├── GET /cache/stats - Cache statistics
│   └── DELETE /cache - Clear extraction cache
└── MCP Endpoints
    ├── POST /mcp/tools - List available tools
    ├── POST /mcp/tools/call - Call a tool
    ├── GET /mcp/resources - List resources
    ├── GET /mcp/resources/:uri - Read resource
    ├── GET /mcp/prompts - List prompts
    └── GET /mcp/prompts/:name - Get prompt
    |
[Handler / Tool Layer]
├── extract_handler / extract_file tool
├── batch_handler / batch_extract tool
├── health_handler / get_capabilities tool
└── format_handler
    |
[Extraction Core]
├── Format detection
├── Extraction pipeline
├── Post-processing (chunking, embeddings)
└── Result formatting
    |
JSON Response / MCP ToolResult

Server Setup & Configuration

Location: crates/xberg/src/api/server.rs

Server initialization pattern: Create ApiState (holds ExtractionConfig + ExtractionCache), build Axum Router with all REST + MCP routes, apply middleware layers (body limits, CORS, tracing), serve via tokio::net::TcpListener.

Key middleware layers applied in order:

DefaultBodyLimit::max(100MB) + RequestBodyLimitLayer -- configurable via env vars
CorsLayer::permissive() -- restrict in production via CORS_ALLOWED_ORIGINS
TraceLayer::new_for_http() -- request/response logging

Core REST Handlers

Location: crates/xberg/src/api/handlers.rs

Handler	Method and route	Description
`extract_handler`	POST /extract	Multipart upload: parse file + optional config JSON, check cache, call `extract_bytes()`, cache result
`extract_url_handler`	POST /extract-url	Fetch URL via reqwest, extract bytes
`batch_handler`	POST /batch	Parallel extraction with `Semaphore`-limited concurrency (default: CPU count)
`health_handler`	GET /health	Report status, version, uptime, feature availability (OCR, embeddings), cache stats
`formats_handler`	GET /formats	Return supported format categories (office, pdf, images, web, email, archives, academic)
`cache_stats_handler`	GET /cache/stats	Hit/miss counts and hit rate
`cache_clear_handler`	DELETE /cache	Clear LRU cache
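
The bounded concurrency described for `batch_handler` can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it swaps the async `Semaphore` for a fixed pool of scoped threads that claim work from a shared counter, and `extract_stub` and `batch_extract` are hypothetical names standing in for the real extraction call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the real extraction call: returns the byte count.
fn extract_stub(doc: &[u8]) -> Result<usize, String> {
    if doc.is_empty() {
        Err("empty document".to_string())
    } else {
        Ok(doc.len())
    }
}

/// Extracts every document with at most `max_workers` threads running at once
/// (defaults to the CPU count). Results come back in input order.
pub fn batch_extract(
    docs: &[Vec<u8>],
    max_workers: Option<usize>,
) -> Vec<Result<usize, String>> {
    let cpu_count = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let workers = max_workers.unwrap_or(cpu_count).max(1).min(docs.len().max(1));

    // Shared cursor: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<Result<usize, String>>>> =
        Mutex::new(vec![None; docs.len()]);

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                let index = next.fetch_add(1, Ordering::Relaxed);
                if index >= docs.len() {
                    break;
                }
                // Extract outside the lock so workers do not serialize.
                let outcome = extract_stub(&docs[index]);
                let mut slots = results.lock().unwrap();
                slots[index] = Some(outcome);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("every index is claimed exactly once"))
        .collect()
}
```

One failing document does not abort the batch; its slot carries the error so the caller can report per-file outcomes.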

Caching Strategy

Location: crates/xberg/src/cache/mod.rs

LRU cache keyed by SHA256(file_content), stores Arc<ExtractionResult>. Default 1000 entries. Thread-safe via RwLock. Tracks hit/miss counters with AtomicU64 for stats endpoint.
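
A compact sketch of such a cache follows, using only the standard library. It rests on several assumptions: `DefaultHasher` stands in for SHA256 (it is not collision resistant and is used here only to keep the example dependency-free), `ExtractionResult` is reduced to a single field, and the method names are illustrative rather than taken from the crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Simplified stand-in for the real result type.
#[derive(Debug)]
pub struct ExtractionResult {
    pub text: String,
}

/// Stand-in for SHA256(file_content). Assumption: a real cache would use a
/// cryptographic hash; DefaultHasher only keeps this sketch std-only.
pub fn content_key(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

struct Inner {
    entries: HashMap<u64, Arc<ExtractionResult>>,
    order: VecDeque<u64>, // front = least recently used
}

pub struct ExtractionCache {
    inner: RwLock<Inner>,
    capacity: usize,
    hits: AtomicU64,
    misses: AtomicU64,
}

impl ExtractionCache {
    pub fn new(capacity: usize) -> Self {
        Self {
            inner: RwLock::new(Inner { entries: HashMap::new(), order: VecDeque::new() }),
            capacity,
            hits: AtomicU64::new(0),
            misses: AtomicU64::new(0),
        }
    }

    pub fn get(&self, key: u64) -> Option<Arc<ExtractionResult>> {
        // A write lock is taken even for reads because a hit updates recency.
        let mut inner = self.inner.write().unwrap();
        let found = inner.entries.get(&key).cloned();
        match found {
            Some(result) => {
                inner.order.retain(|k| *k != key);
                inner.order.push_back(key);
                self.hits.fetch_add(1, Ordering::Relaxed);
                Some(result)
            }
            None => {
                self.misses.fetch_add(1, Ordering::Relaxed);
                None
            }
        }
    }

    pub fn insert(&self, key: u64, result: ExtractionResult) {
        let mut inner = self.inner.write().unwrap();
        if inner.entries.insert(key, Arc::new(result)).is_some() {
            // Existing key: drop its old position before re-appending.
            inner.order.retain(|k| *k != key);
        }
        inner.order.push_back(key);
        // Evict least recently used entries until within capacity.
        while inner.entries.len() > self.capacity {
            match inner.order.pop_front() {
                Some(oldest) => {
                    inner.entries.remove(&oldest);
                }
                None => break,
            }
        }
    }

    /// Fraction of lookups served from the cache (0.0 before any lookup).
    pub fn hit_rate(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed) as f64;
        let total = hits + self.misses.load(Ordering::Relaxed) as f64;
        if total == 0.0 { 0.0 } else { hits / total }
    }
}
```

Storing `Arc<ExtractionResult>` means a hit hands out a cheap reference-counted pointer instead of cloning the extracted text. The `VecDeque` recency list makes each hit O(n); a production LRU would normally use a linked structure or a dedicated crate.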

Error Handling

Location: crates/xberg/src/api/error.rs

ApiError enum maps to HTTP status codes:

MissingFile -> 400, FileNotFound -> 404
OnnxRuntimeMissing / TesseractMissing -> 503 (with remediation message)
PayloadTooLarge -> 413
ExtractionFailed / InvalidConfig / UnsupportedFormat -> 500
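
The mapping above can be written as a single match. The sketch below is illustrative: the variant names and status codes come from the list, while the variant payloads, the `status_code` and `remediation` method names, and the hint wording are assumptions.

```rust
/// Error variants named in the mapping above. Payloads are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum ApiError {
    MissingFile,
    FileNotFound(String),
    OnnxRuntimeMissing,
    TesseractMissing,
    PayloadTooLarge { limit_bytes: u64 },
    ExtractionFailed(String),
    InvalidConfig(String),
    UnsupportedFormat(String),
}

impl ApiError {
    /// HTTP status code for each variant, as listed in this section.
    pub fn status_code(&self) -> u16 {
        match self {
            ApiError::MissingFile => 400,
            ApiError::FileNotFound(_) => 404,
            ApiError::OnnxRuntimeMissing | ApiError::TesseractMissing => 503,
            ApiError::PayloadTooLarge { .. } => 413,
            ApiError::ExtractionFailed(_)
            | ApiError::InvalidConfig(_)
            | ApiError::UnsupportedFormat(_) => 500,
        }
    }

    /// Remediation hint for missing-dependency errors (wording is illustrative).
    pub fn remediation(&self) -> Option<&'static str> {
        match self {
            ApiError::OnnxRuntimeMissing => {
                Some("install ONNX Runtime or disable embeddings")
            }
            ApiError::TesseractMissing => Some("install Tesseract or disable OCR"),
            _ => None,
        }
    }
}
```

Keeping the mapping in one exhaustive match means the compiler flags any new variant that has no status code assigned.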

MCP Server Implementation

Location: crates/xberg/src/mcp/server.rs

The MCP server allows Claude and other AI agents to call Xberg extraction functions through the Model Context Protocol.

MCP Tools (Callable Functions)

Three tools are registered:

Tool	Purpose	Required Params
`extract_file`	Extract text/tables/metadata from documents (75+ formats)	`file_path`
`batch_extract`	Extract from multiple documents in parallel	`file_paths[]`
`get_capabilities`	List supported formats, features, backends	(none)

Tool registration pattern (example: extract_file):

// Define Tool with name, description, JSON Schema inputSchema
// Register with server.register_tool(tool, handler_fn)
// Handler: parse params -> build ExtractionConfig -> call extract_file() -> return ToolResult as JSON

extract_file optional params: format, extract_tables, extract_images, ocr_enabled, extract_metadata, chunking_preset, generate_embeddings.
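
To make the registration pattern concrete, here is a self-contained sketch. Everything in it is hypothetical: `Tool`, `ToolRegistry`, `Params`, and `build_registry` are illustrative types, not the crate's MCP API, and string-valued parameters replace a real JSON Schema. Only the tool name `extract_file` and the parameters `file_path` and `ocr_enabled` come from this document.

```rust
use std::collections::HashMap;

/// Hypothetical tool description; a real one would carry a JSON Schema.
pub struct Tool {
    pub name: &'static str,
    pub description: &'static str,
    pub required: Vec<&'static str>,
}

/// Assumption: parameters are plain strings to keep the sketch std-only.
pub type Params = HashMap<String, String>;
pub type Handler = Box<dyn Fn(&Params) -> Result<String, String>>;

pub struct ToolRegistry {
    tools: HashMap<&'static str, (Tool, Handler)>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    pub fn register_tool(&mut self, tool: Tool, handler: Handler) {
        self.tools.insert(tool.name, (tool, handler));
    }

    /// Looks up the tool, checks required params, then runs the handler.
    pub fn call(&self, name: &str, params: &Params) -> Result<String, String> {
        let (tool, handler) = self
            .tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))?;
        for key in &tool.required {
            if !params.contains_key(*key) {
                return Err(format!("missing required param: {key}"));
            }
        }
        handler(params)
    }
}

pub fn build_registry() -> ToolRegistry {
    let mut registry = ToolRegistry::new();
    registry.register_tool(
        Tool {
            name: "extract_file",
            description: "Extract text/tables/metadata from documents",
            required: vec!["file_path"],
        },
        Box::new(|params: &Params| {
            let path = &params["file_path"];
            // Optional flag; a real handler would build an ExtractionConfig.
            let ocr = params.get("ocr_enabled").map(|v| v == "true").unwrap_or(false);
            // Placeholder result; the real handler returns a ToolResult as JSON.
            Ok(format!("extracted {path} (ocr_enabled={ocr})"))
        }),
    );
    registry
}
```

Checking required parameters in `call` rather than in each handler keeps the validation in one place.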

MCP Resources (Static Knowledge)

Three resources provide static information to agents:

xberg://formats -- Supported format list as JSON
xberg://features -- Cross-binding feature matrix (from FEATURE_MATRIX.md)
xberg://api-reference -- Generated API documentation

MCP Prompts (Agent Templates)

Two prompts guide agent extraction workflows:

extract_for_rag -- Document type-specific RAG extraction guidance (research paper, contract, report). Recommends chunking preset and embedding config.
batch_document_processing -- Optimal concurrency, grouping, and error handling for batch workflows.

MCP Transport Protocols

HTTP/REST: MCP routes mounted alongside REST API on separate /mcp/ prefix
Stdio: JSON-RPC 2.0 over stdin/stdout for local CLI integration (e.g., Claude Desktop)

Integration with Claude Desktop

{
  "mcpServers": {
    "xberg": {
      "command": "xberg-mcp",
      "env": {
        "XBERG_API_BASE": "http://localhost:8000",
        "XBERG_MCP_TRANSPORT": "stdio"
      }
    }
  }
}

MCP Error Handling

ToolError variants: FileNotFound, UnsupportedFormat, ExtractionFailed, OnnxRuntimeMissing, TesseractMissing, Timeout. Each maps to an MCP ToolResultError with a descriptive code and message.

Environment Configuration

See .env.example for all configurable variables. Key categories:

Server: XBERG_HOST, XBERG_PORT
Size limits: XBERG_MAX_REQUEST_BODY_BYTES (default 100MB), XBERG_MAX_MULTIPART_FIELD_BYTES
Features: XBERG_ENABLE_OCR, XBERG_ENABLE_EMBEDDINGS, XBERG_ENABLE_KEYWORDS
Cache: XBERG_CACHE_ENABLED, XBERG_CACHE_SIZE
CORS: CORS_ALLOWED_ORIGINS (comma-separated)
MCP: XBERG_MCP_HOST, XBERG_MCP_PORT, XBERG_MCP_TRANSPORT (stdio/http)
Logging: RUST_LOG=xberg=info,tower_http=debug
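
A sketch of how a few of these variables might be loaded into a typed config follows. The variable names and the 100MB body limit come from this section. The `ServerConfig` struct, the `127.0.0.1` host default, the port default of 8000 (borrowed from the Claude Desktop example), the cache default of enabled, and treating 100MB as 100 * 1024 * 1024 bytes are all assumptions. The lookup is injected as a closure so the parsing can be exercised without touching the process environment.

```rust
#[derive(Debug, PartialEq)]
pub struct ServerConfig {
    pub host: String,
    pub port: u16,
    pub max_request_body_bytes: u64,
    pub cache_enabled: bool,
    pub cors_allowed_origins: Vec<String>,
}

impl ServerConfig {
    /// Builds a config from any key lookup (environment, map, test fixture).
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        // Assumed default host.
        let host = lookup("XBERG_HOST").unwrap_or_else(|| "127.0.0.1".to_string());
        let port = match lookup("XBERG_PORT") {
            Some(raw) => raw
                .trim()
                .parse::<u16>()
                .map_err(|e| format!("XBERG_PORT: {e}"))?,
            None => 8000, // assumed default
        };
        let max_request_body_bytes = match lookup("XBERG_MAX_REQUEST_BODY_BYTES") {
            Some(raw) => raw
                .trim()
                .parse::<u64>()
                .map_err(|e| format!("XBERG_MAX_REQUEST_BODY_BYTES: {e}"))?,
            None => 100 * 1024 * 1024, // documented default of 100MB
        };
        let cache_enabled = match lookup("XBERG_CACHE_ENABLED") {
            Some(raw) => matches!(
                raw.trim().to_ascii_lowercase().as_str(),
                "1" | "true" | "yes"
            ),
            None => true, // assumed default
        };
        // Comma-separated list; an empty list means no restriction is configured.
        let cors_allowed_origins = lookup("CORS_ALLOWED_ORIGINS")
            .map(|raw| {
                raw.split(',')
                    .map(str::trim)
                    .filter(|origin| !origin.is_empty())
                    .map(String::from)
                    .collect::<Vec<String>>()
            })
            .unwrap_or_default();

        Ok(Self { host, port, max_request_body_bytes, cache_enabled, cors_allowed_origins })
    }

    /// Reads the same keys from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Failing fast on an unparsable value, instead of silently falling back to the default, surfaces a typo in a deployment file at startup rather than as an unexpected limit later.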

Critical Rules

REST API Rules

Always validate multipart file uploads - Check MIME type, size, magic bytes
Timeout long-running extractions - Set per-handler timeout (5 min default)
Stream large files - Never buffer entire multi-GB file in memory
Cache aggressively - Identical files should return from cache in <1ms
Parallel extraction is CPU-bound - Limit workers to CPU count + 1
Error responses must be actionable - Include error code and remediation suggestion
Health checks must verify features - Report missing dependencies (ONNX, Tesseract)
Size limits are configurable - Allow override via env var for large deployments
CORS is permissive by default - Restrict in production via env var
Log all requests - Track extraction metrics for observability
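
The upload validation rule (MIME type, size, magic bytes) can be illustrated as follows. This is a sketch under stated assumptions: `Sniffed`, `sniff`, and `validate_upload` are hypothetical names, and only four signatures are checked, whereas a real server supporting many formats would need a much larger table.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Sniffed {
    Pdf,
    Zip,
    Png,
    Jpeg,
    Unknown,
}

/// Identifies a file from its leading magic bytes.
pub fn sniff(bytes: &[u8]) -> Sniffed {
    const PNG: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if bytes.starts_with(b"%PDF-") {
        Sniffed::Pdf
    } else if bytes.starts_with(b"PK\x03\x04") {
        Sniffed::Zip
    } else if bytes.starts_with(PNG) {
        Sniffed::Png
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Sniffed::Jpeg
    } else {
        Sniffed::Unknown
    }
}

/// Checks size first, then that the declared MIME type agrees with the content.
pub fn validate_upload(
    declared_mime: &str,
    bytes: &[u8],
    max_bytes: usize,
) -> Result<Sniffed, String> {
    if bytes.is_empty() {
        return Err("empty upload".to_string());
    }
    if bytes.len() > max_bytes {
        return Err(format!("payload exceeds {max_bytes} bytes"));
    }
    let sniffed = sniff(bytes);
    let consistent = match (declared_mime, sniffed) {
        ("application/pdf", Sniffed::Pdf) => true,
        ("image/png", Sniffed::Png) => true,
        ("image/jpeg", Sniffed::Jpeg) => true,
        // OOXML office files (docx, xlsx, pptx) are ZIP containers.
        (mime, Sniffed::Zip) => {
            mime == "application/zip"
                || mime.starts_with("application/vnd.openxmlformats-officedocument.")
        }
        _ => false,
    };
    if consistent {
        Ok(sniffed)
    } else {
        Err(format!("declared type {declared_mime} does not match content ({sniffed:?})"))
    }
}
```

Sniffing needs only the first few bytes, so it can run on the first chunk of a streamed upload before the rest of the body is read.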

MCP Rules

All tools must have a timeout - Prevent hanging on large files (default 5 min)
Error responses must be detailed - Include suggestions for missing dependencies
Feature gates must be checked - Return helpful message if feature unavailable (embeddings, OCR)
Resources should be static - Don't query external services in resource handlers
Prompts guide agents - Provide clear examples and best practices
Batch tools must support cancellation - Allow agent to stop long-running batch operations
Log all tool calls - Track usage for analytics and debugging
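
The timeout rule can be sketched with a worker thread and a channel. This is a std-only illustration, not the project's approach: an async server would more likely wrap the future in its runtime's timeout helper. `run_with_timeout` and `DEFAULT_TOOL_TIMEOUT` are hypothetical names; the 5 minute value is the default stated above.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Default taken from the rule above (5 minutes).
pub const DEFAULT_TOOL_TIMEOUT: Duration = Duration::from_secs(5 * 60);

/// Runs `work` on a separate thread and waits at most `timeout` for its result.
pub fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the caller already gave up; ignore that case.
        let _ = sender.send(work());
    });
    receiver.recv_timeout(timeout).map_err(|err| match err {
        mpsc::RecvTimeoutError::Timeout => format!("timed out after {:?}", timeout),
        mpsc::RecvTimeoutError::Disconnected => {
            "worker stopped before producing a result".to_string()
        }
    })
}
```

This only stops the caller from waiting: the worker thread keeps running after a timeout. Real cancellation, as the batch rule requires, needs cooperation from the work itself, for example a shared flag that the extraction loop checks between documents.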

Related Skills

extraction-pipeline-patterns - Core extraction called by handlers and MCP tools
chunking-embeddings - Optional chunking/embedding parameters in extraction
ocr-backend-management - OCR engine selection and image preprocessing