with one click
kv-tool-loop-stability
// Use this skill when certifying mesh-llm KV/cache stability under repeated OpenAI tool-call loops, same-prefix cache reuse, suffix-prefill limits, or native Skippy slot/decode/eviction failures.
// Use this skill when certifying mesh-llm KV/cache stability under repeated OpenAI tool-call loops, same-prefix cache reuse, suffix-prefill limits, or native Skippy slot/decode/eviction failures.
Use this skill when running benchmark orchestration, local single-stage or split benchmarks, benchmark report flow, or performance-oriented skippy runtime checks.
Use this skill when validating skippy staged execution against full-model execution, adding model families, changing split boundaries, testing activation wire dtypes, or diagnosing mismatch behavior.
Use this skill when inspecting GGUF models, planning layer ranges, generating or validating skippy package artifacts, fake packages for direct GGUFs, materialized stage cache behavior, or GGUF writer integration.
Use this skill when running, configuring, debugging, or embedding skippy-server, binary stage transport, OpenAI frontend integration, activation wire dtype settings, stage configs, lifecycle status, or nonblocking telemetry.
Use when changing mesh-llm automation or CLI flows that discover Hugging Face GGUF models, plan CPU Hugging Face Jobs for layer-package splitting, estimate max cost, or publish skippy layer packages/catalog entries.
Use this skill when adding, renaming, removing, or reviewing mesh-llm OTLP metrics, telemetry attributes, metrics exporter settings, or telemetry documentation.
| name | kv-tool-loop-stability |
| description | Use this skill when certifying mesh-llm KV/cache stability under repeated OpenAI tool-call loops, same-prefix cache reuse, suffix-prefill limits, or native Skippy slot/decode/eviction failures. |
| metadata | {"short-description":"Certify KV/tool-loop stability"} |
Use this skill when changing Skippy KV slot cleanup, prefix-cache lookup,
OpenAI tool-loop behavior, agent harnesses, or any runtime path related to
llama_decode failed, failed to find a memory slot, low same-prefix cache
reuse, or proactive eviction failures.
/v1 endpoint. This harness does
not start nodes, load models, join meshes, or change routing policy.auto
only when intentionally validating routed behavior.--print-plan first and confirm the models, attempts,
pressure_turns, timeout, cache thresholds, output directory, and native
logs.manifest.json, results.jsonl,
summary.json, summary.md, and transcripts/*.jsonl.Preview the run without touching the endpoint:
scripts/qa-kv-tool-loop-stability.py \
--base-url http://127.0.0.1:9337/v1 \
--models Qwen/Qwen2.5-3B-Instruct-GGUF:q4_k_m \
--attempts 5 \
--pressure-turns 8 \
--timeout 180 \
--min-cached-tokens 2048 \
--suffix-prefill-limit 256 \
--native-log ~/.mesh-llm/runtime/<pid>/logs/skippy-native.log \
--output-dir target/kv-tool-loop-stability/local \
--print-plan
Run the certification:
scripts/qa-kv-tool-loop-stability.py \
--base-url http://127.0.0.1:9337/v1 \
--models Qwen/Qwen2.5-3B-Instruct-GGUF:q4_k_m \
--attempts 5 \
--pressure-turns 8 \
--timeout 180 \
--min-cached-tokens 2048 \
--suffix-prefill-limit 256 \
--native-log ~/.mesh-llm/runtime/<pid>/logs/skippy-native.log \
--output-dir target/kv-tool-loop-stability/local
summary.md or
summary.json.When changing this harness, run:
python3 -m unittest scripts.tests.test_qa_kv_tool_loop_stability
python3 -m py_compile scripts/qa-kv-tool-loop-stability.py scripts/tests/test_qa_kv_tool_loop_stability.py