skippy-server
// Use this skill when running, configuring, debugging, or embedding skippy-server, binary stage transport, OpenAI frontend integration, activation wire dtype settings, stage configs, lifecycle status, or nonblocking telemetry.
| name | skippy-server |
| description | Use this skill when running, configuring, debugging, or embedding skippy-server, binary stage transport, OpenAI frontend integration, activation wire dtype settings, stage configs, lifecycle status, or nonblocking telemetry. |
| metadata | {"short-description":"Run and debug skippy serving"} |
Use this skill for skippy serving, embedded runtime lifecycle, and binary stage-to-stage transport.
The mesh integration embeds skippy-server through Rust APIs instead of
launching it as mesh's public OpenAI surface. Public OpenAI compatibility
belongs in openai-frontend; skippy-server should remain the backend stage
runtime.
Important crates:
crates/skippy-server
crates/skippy-protocol
crates/skippy-runtime
crates/mesh-llm/src/inference/skippy
Run cargo commands serially:
cargo check -p mesh-llm
cargo test -p skippy-server --lib
cargo test -p skippy-protocol --lib
cargo test -p mesh-llm-host-runtime --lib inference::skippy
For lifecycle/status changes, also run:
cargo test -p mesh-llm-host-runtime --lib
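A minimal sketch of what "serially" means in practice: each command is started only after the previous one has exited, and the run stops at the first failure. The helper name `run_serially` is hypothetical and is not part of any skippy or mesh-llm crate; it uses only `std::process::Command` from the standard library.

```rust
use std::process::Command;

/// Hypothetical helper: runs each command to completion before starting
/// the next one, and stops at the first command that fails.
fn run_serially(commands: &[&[&str]]) -> Result<(), String> {
    for argv in commands {
        // The first element is the program, the rest are its arguments.
        let (program, args) = argv.split_first().ok_or("empty command")?;

        // `status()` blocks until the child process exits, so at most one
        // command is ever running at a time.
        let status = Command::new(program)
            .args(args)
            .status()
            .map_err(|e| format!("failed to start {}: {}", program, e))?;

        if !status.success() {
            return Err(format!("`{}` exited with {}", argv.join(" "), status));
        }
    }
    Ok(())
}
```

Under these assumptions, the checks listed above could be passed as `&["cargo", "check", "-p", "mesh-llm"]`, `&["cargo", "test", "-p", "skippy-server", "--lib"]`, and so on, in the order given.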
Do not reintroduce standalone kv-server or ngram-pool dependencies into
mesh. Keep structured outputs, tools, logprobs, and /v1/responses
compatibility in openai-frontend.
Stage status exposed by mesh should be backend-neutral at the API boundary. Backend-specific details can remain in internal skippy structs.
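One way to picture the boundary described above is a conversion from an internal, skippy-specific state into a smaller backend-neutral status. The sketch below is illustrative only: `SkippyStageState`, `StageStatus`, and all of their variants and fields are hypothetical names, not the real types in skippy-server or mesh-llm.

```rust
/// Hypothetical internal state. Backend-specific details such as layer
/// counts or the activation wire dtype stay on this side of the boundary.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SkippyStageState {
    LoadingLayers { loaded: u32, total: u32 },
    Serving { wire_dtype: String },
    Failed { reason: String },
}

/// Hypothetical backend-neutral status exposed at the mesh API boundary.
/// Nothing in it names skippy or any skippy-only concept.
#[derive(Debug, Clone, PartialEq)]
enum StageStatus {
    Starting,
    Ready,
    Failed { message: String },
}

impl From<&SkippyStageState> for StageStatus {
    fn from(state: &SkippyStageState) -> Self {
        match state {
            // Load progress is an internal detail, so it collapses to Starting.
            SkippyStageState::LoadingLayers { .. } => StageStatus::Starting,
            // The wire dtype is not surfaced to API consumers.
            SkippyStageState::Serving { .. } => StageStatus::Ready,
            // Only a human-readable message crosses the boundary.
            SkippyStageState::Failed { reason } => StageStatus::Failed {
                message: reason.clone(),
            },
        }
    }
}
```

Keeping the conversion one-way means another backend could supply its own `From` implementation without changing what API consumers see.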