
mlx-serving

Stars: 2
Forks: 2
Updated: May 9, 2026, 22:13

This skill should be used when the user asks about "MLX serving", "mlx_lm.server", "oMLX", "Apple Silicon LLM serving", or "local LLM on Mac" — and when troubleshooting symptoms like model fails to load, OOM during load or inference, server hangs or crashes at batch>1, tool calls returning as plaintext content, throughput regression, or choosing between mlx-lm and oMLX. Also applies to oMLX feature-flag tuning ("turboquant_kv", "dflash", "MTP", "specprefill", "thinking_budget", "max-concurrent-requests", "force_sampling"), OptiQ proxy for models exceeding RAM, Llama-4 ChunkedKVCache batch handling, Llama-3 tool-call JSON format ("name"/"parameters"), and bench-driven validation of serving configs. For Apple Silicon (M-series) only — not for cloud LLM hosting (Bedrock, OpenAI API, Anthropic API), not for non-MLX backends (llama.cpp, Ollama, vLLM), not for model training.

Installation

Install with Codex or Claude: copy this prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install the skill.

File explorer
5 files
SKILL.md (read-only)