
mlx-serving

Stars: 2
Forks: 2
Updated: May 9, 2026 at 22:13

This skill should be used when the user asks about "MLX serving", "mlx_lm.server", "oMLX", "Apple Silicon LLM serving", or "local LLM on Mac", and when troubleshooting symptoms such as a model failing to load, OOM during load or inference, server hangs or crashes at batch>1, tool calls returned as plaintext content, or throughput regressions. It also covers choosing between mlx-lm and oMLX, oMLX feature-flag tuning ("turboquant_kv", "dflash", "MTP", "specprefill", "thinking_budget", "max-concurrent-requests", "force_sampling"), the OptiQ proxy for models exceeding RAM, Llama-4 ChunkedKVCache batch handling, the Llama-3 tool-call JSON format ("name"/"parameters"), and bench-driven validation of serving configs. It is for Apple Silicon (M-series) only: not for cloud LLM hosting (Bedrock, OpenAI API, Anthropic API), not for non-MLX backends (llama.cpp, Ollama, vLLM), and not for model training.

Installation

Install with Codex or Claude

Copy this prompt, paste it into Codex, Claude, or another assistant, then let it check the skill page and install it for you.
