Install vLLM Semantic Router in agent-safe mode, import supported OpenClaw model providers into canonical VSR config, and rewrite OpenClaw to target VSR.

2026-07-13

vllm-semantic-router-harness

Other Computer Occupations

Bridges native skill discovery into the vLLM Semantic Router repository harness, routing tasks through the canonical agent-report flow, repo-local skill registry, and validation commands. Use when starting any task inside the vLLM Semantic Router repository to resolve the correct primary skill, read canonical docs, and run harness validation.

2026-05-30

config-platform-change

Software Developers

Synchronizes config representations across router config, Python CLI schema, and dashboard config UI. Use when adding or changing a config concept that spans those surfaces or addressing config representation debt before Kubernetes-facing translation.

2026-05-30

k8s-platform-change

Software Developers

Modifies Kubernetes-facing operator, CRD, deployment-profile, or DSL translation behavior for semantic-router platform integration. Use when changing operator APIs or controllers, deployment stack manifests, profile-owned platform wiring, or router-to-Kubernetes translation layers.

2026-05-30

maintainer-issue-pr-management

Other Computer Occupations

Manages GitHub issue and pull-request lifecycle including creation, updates, triage labelling, and closeout metadata using canonical templates and repository taxonomy. Use when a maintainer asks to create, update, close, or triage GitHub issues or PRs, or when issue creation requires codebase analysis for scope, labels, or acceptance criteria.

2026-05-30

maintainer-release-ops

Other Computer Occupations

Maintainer release and milestone operating workflow. Use when a maintainer wants to plan a release, create milestone issues, sync GitHub issue or PR state, generate a daily review brief, or manage stale PRs and backlog routing.

2026-05-30

routing-calibration-loop

Software Developers

Calibrates routing changes against a live router endpoint with executable probes, local DSL validation, versioned deploys, and structured failure review. Use when tuning signals, projections, decisions, or maintained route examples against a real apiserver.

2026-05-30

plugin-end-to-end

Software Developers

Implements end-to-end plugin changes spanning router config, post-decision processing, optional CLI/UI exposure, and E2E test coverage. Use when adding a new plugin type, changing plugin config schema or execution semantics, updating plugin chain behavior, or modifying plugin-exposed metadata across surfaces.

2026-05-30

Showing the top 8 of 17 collected skills in this repository.

#002

vllm-omni

7 Skills 5.6k 1.3k updated 2026-07-16

16% of the creator's skills

Skill

Occupation

Description

Updated

add-diffusion-model

Software Developers

Add a new diffusion model (text-to-image, text-to-video, image-to-video, text-to-audio, image editing) to vLLM-Omni, including Cache-DiT acceleration and parallelism support (TP, SP/USP, CFG-Parallel, HSDP). Use when integrating a new diffusion model, porting a diffusers pipeline or a custom model repo to vllm-omni, creating a new DiT transformer adapter, adding diffusion model support, or enabling multi-GPU parallelism and cache acceleration for an existing model.

2026-07-16

vllm-omni-test

Software Quality Assurance Analysts and Testers

Generate and run tests for vllm-project/vllm-omni with CI-aligned levels and markers; wire new tests into Buildkite (test-ready.yml for L1/L2, test-merge.yml for L3, test-nightly.yml for L4). On completion, always provide copy-paste local and CI-like pytest commands plus prerequisites. Use when creating regression tests, adding L1-L4 coverage, selecting pytest markers, or validating fixes from issues/PRs.

2026-07-16

precheck-pr

Software Quality Assurance Analysts and Testers

Self-check your branch before creating a PR — catch dead code, verify accuracy/perf claims, validate PR title format, and confirm merge readiness. Use when the user says "precheck", "self review", "pre-submit check", or "check my PR before I open it." Never posts to GitHub.

2026-06-26

add-tts-model

Software Developers

Integrate a new text-to-speech model into vLLM-Omni from HuggingFace reference implementation through production-ready serving with streaming and CUDA graph acceleration. Use when adding a new TTS model, wiring stage separation for speech synthesis, enabling online voice generation serving, debugging TTS integration behavior, or building audio output pipelines.

2026-06-18

quantization

Software Developers

Work on vLLM-Omni quantization for diffusion, autoregressive, omni, or multi-stage models. Use when choosing or adding methods such as fp8, int8, gguf, mxfp8, mxfp4, mxfp4_dualscale, ModelOpt, AutoRound, INC, msModelSlim, awq, or gptq; debugging quantized loading; or validating memory, speed, and output quality.

2026-06-10

diffusion-perf-opt

Software Developers

Diagnose and optimize vLLM-Omni diffusion workloads, especially Wan/Qwen/Flux-style image and video generation. Use when Codex is asked to analyze profiling traces, choose parallel strategies, inspect torch profiler trace.json or trace.json.gz timelines, estimate optimization ROI, investigate GPU idle/free bubbles, compare USP/CFG/HSDP/VAE parallelism, or design operator/host/quantization optimizations for vLLM-Omni.

2026-05-26

vllm-omni-npu-model-runner-upgrade

Software Developers

Upgrade vllm-omni NPU model runners (OmniNPUModelRunner, NPUARModelRunner, NPUGenerationModelRunner) to align with the latest vllm-ascend NPUModelRunner while preserving omni-specific logic.

2026-04-18

#003

vime

6 Skills 37060 updated 2026-06-29

14% of the creator's skills

Skill

Occupation

Description

Updated

add-tests-and-ci

Software Quality Assurance Analysts and Testers

Guide for adding or updating vime tests and CI wiring. Use when tasks require new test cases, CI registration, test matrix updates, or workflow template changes.

2026-06-29

vime-code-review-preferences

Software Quality Assurance Analysts and Testers

Use when reviewing or editing vime code, especially refactors around helper APIs, branch selection, argument validation, or recurring reviewer preferences about avoiding unnecessary wrappers and making control flow self-explanatory.

2026-06-29

add-dynamic-filter

Software Developers

Guide for adding dynamic/filter hooks in vime rollout pipeline. Use when user wants sample-group selection during rollout, buffer filtering before training, or per-sample masking/processing hooks.

2026-06-04

add-eval-dataset-config

Software Developers

Guide for adding and validating evaluation dataset configuration in vime. Use when user wants to configure eval datasets via --eval-config or --eval-prompt-data, add per-dataset overrides, or customize evaluation rollout behavior.

2026-06-04

add-reward-function

Software Developers

Guide for adding a custom reward function in vime and wiring it through --custom-rm-path (and optional reward post-processing). Use when user wants new reward logic, remote/service reward integration, or task-specific reward shaping.

2026-06-04

add-rollout-function

Software Developers

Guide for adding a new rollout function in vime and wiring it through --rollout-function-path. Use when user wants to implement custom rollout data generation logic, custom train/eval rollout outputs, or migrate from the default vLLM rollout path.

2026-06-04

#004

vllm-skills

6 Skills 8723 updated 2026-04-03

14% of the creator's skills

Skill

Occupation

Description

Updated

vllm-bench-random-synthetic

Data Scientists

Run vLLM performance benchmark using synthetic random data to measure throughput, TTFT (Time to First Token), TPOT (Time per Output Token), and other key performance metrics. Use when the user wants to quickly test vLLM serving performance without downloading external datasets.

2026-04-03

vllm-bench-serve

Data Scientists

Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.

2026-04-03

vllm-deploy-docker

Network and Computer Systems Administrators

Deploy vLLM using Docker (pre-built images or build-from-source) with NVIDIA GPU support and run the OpenAI-compatible server.

2026-04-03

vllm-deploy-k8s

Network and Computer Systems Administrators

Deploy vLLM to Kubernetes (K8s) with GPU support, health probes, and OpenAI-compatible API endpoint. Use this skill whenever the user wants to deploy, run, or serve vLLM on a Kubernetes cluster, including creating deployments, services, checking existing deployments, or managing vLLM on K8s.

2026-04-03

vllm-deploy-simple

Software Developers

Quick install and deploy vLLM, start serving with a simple LLM, and test OpenAI API.

2026-04-03

vllm-prefix-cache-bench

Software Developers

Benchmark the efficiency of automatic prefix caching in vLLM using fixed prompts, real-world datasets, or synthetic prefix/suffix patterns. Use when the user asks to benchmark prefix caching hit rate, caching efficiency, or repeated-prompt performance in vLLM.

2026-04-03

#005

llm-compressor

3 Skills 3.6k 581 updated 2026-07-08

6.8% of the creator's skills

Skill

Occupation

Description

Updated

fp8

Software Developers

Generate a working FP8 quantization example script and save a compressed-tensors checkpoint. Triggers on: "fp8", "FP8_DYNAMIC", "FP8_BLOCK", "MXFP8", "fp8 example", "quantize to fp8".

2026-07-08

nvfp4

Software Developers

Generate a working NVFP4 (W4A4) quantization example script and save a compressed-tensors checkpoint. Triggers on: "nvfp4", "NVFP4", "fp4", "nvfp4 example", "quantize to nvfp4", "w4a4".

2026-07-08

create-tiny-model

Software Developers

Create and manage tiny models for testing and development. Includes utilities for saving tiny models, inspecting tensors, and finetuning workflows.

2026-06-30

#006

vllm-ascend

2 Skills 2.4k 1.7k updated 2026-07-15

4.5% of the creator's skills

Skill

Occupation

Description

Updated

vllm-ascend-release

Software Developers

End-to-end release management skill for vLLM Ascend. Creates release checklist issues, identifies critical bugs, runs functional tests, invokes release note generation, and guides through the complete release process.

2026-07-15

vllm-ascend-model-adapter

Software Developers

Adapt and debug existing or new models for vLLM on Ascend NPU. Implement in /vllm-workspace/vllm and /vllm-workspace/vllm-ascend, validate via direct vllm serve from /workspace, and deliver one signed commit in the current repo.

2026-02-26

#007

vllm

1 Skill 86.5k 19.5k updated 2026-06-19

2.3% of the creator's skills

Skill

Occupation

Description

Updated

ci-fails-buildkite

Software Quality Assurance Analysts and Testers

Fetch and diagnose vLLM Buildkite CI failure logs. Use when investigating failing CI jobs on a PR or build, when the user pastes a buildkite.com URL, or asks to fetch/diagnose CI logs.

2026-06-19

#008

recipes

1 Skill 919332 updated 2026-07-13

2.3% of the creator's skills

Skill

Occupation

Description

Updated

add-recipe

Software Developers

Use when the user asks to add, contribute, or create a new vLLM recipe in this repo (e.g. "add a recipe for Qwen/Qwen3-XYZ", "create a recipe for huggingface.co/org/model"). Walks through fetching HF metadata, authoring the YAML at models/<hf_org>/<hf_repo>.yaml, picking variants/strategies, validating, and committing.

2026-07-13

#009

speculators

1 Skill 624158 updated 2026-07-16

2.3% of the creator's skills

Skill

Occupation

Description

Updated

pr-review

Software Quality Assurance Analysts and Testers

Review a GitHub PR with design-first analysis, posted as a GitHub review.

2026-07-16

Showing 9 of 9 repositories