vllm-project
GitHub creator profile

Repository-level view of 30 collected skills across 5 GitHub repositories, including approximate occupation coverage.

skills collected: 30
repositories: 5
occupation fields: 1
updated: 2026-05-30
occupation focus
Major fields detected across this creator.
repository explorer

Repositories and representative skills

#001
semantic-router
17 skills · 4.2k · 689 · updated 2026-05-30
57% of creator
vllm-semantic-router-harness
Software developers

Bridges native skill discovery into the vLLM Semantic Router repository harness, routing tasks through the canonical agent-report flow, repo-local skill registry, and validation commands. Use when starting any task inside the vLLM Semantic Router repository to resolve the correct primary skill, read canonical docs, and run harness validation.

2026-05-30
config-platform-change
Software developers

Synchronizes config representations across router config, Python CLI schema, and dashboard config UI. Use when adding or changing a config concept that spans those surfaces or addressing config representation debt before Kubernetes-facing translation.

2026-05-30
k8s-platform-change
Software developers

Modifies Kubernetes-facing operator, CRD, deployment-profile, or DSL translation behavior for semantic-router platform integration. Use when changing operator APIs or controllers, deployment stack manifests, profile-owned platform wiring, or router-to-Kubernetes translation layers.

2026-05-30
maintainer-issue-pr-management
Software developers

Manages GitHub issue and pull-request lifecycle including creation, updates, triage labelling, and closeout metadata using canonical templates and repository taxonomy. Use when a maintainer asks to create, update, close, or triage GitHub issues or PRs, or when issue creation requires codebase analysis for scope, labels, or acceptance criteria.

2026-05-30
maintainer-release-ops
Software developers

Maintainer release and milestone operating workflow. Use when a maintainer wants to plan a release, create milestone issues, sync GitHub issue or PR state, generate a daily review brief, or manage stale PRs and backlog routing.

2026-05-30
routing-calibration-loop
Software developers

Calibrates routing changes against a live router endpoint with executable probes, local DSL validation, versioned deploys, and structured failure review. Use when tuning signals, projections, decisions, or maintained route examples against a real apiserver.

2026-05-30
plugin-end-to-end
Software developers

Implements end-to-end plugin changes spanning router config, post-decision processing, optional CLI/UI exposure, and E2E test coverage. Use when adding a new plugin type, changing plugin config schema or execution semantics, updating plugin chain behavior, or modifying plugin-exposed metadata across surfaces.

2026-05-30
router-service-platform-change
Software developers

Modifies router-side API, authz, memory, provider, storage, or runtime service modules outside config, decision, selection, and extproc plugin chains. Use when changing apiserver endpoints, authz or rate-limit policy code, memory or response storage flows, provider adapters, or other router service-platform modules.

2026-05-30
Showing top 8 of 17 collected skills in this repository.
#002
vllm-skills
5 skills · 7622 · updated 2026-04-03
17% of creator
vllm-bench-random-synthetic
Software developers

Run vLLM performance benchmark using synthetic random data to measure throughput, TTFT (Time to First Token), TPOT (Time per Output Token), and other key performance metrics. Use when the user wants to quickly test vLLM serving performance without downloading external datasets.

2026-04-03
vllm-bench-serve
Software developers

Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.

2026-04-03
vllm-deploy-k8s
Network and computer systems administrators

Deploy vLLM to Kubernetes (K8s) with GPU support, health probes, and OpenAI-compatible API endpoint. Use this skill whenever the user wants to deploy, run, or serve vLLM on a Kubernetes cluster, including creating deployments, services, checking existing deployments, or managing vLLM on K8s.

2026-04-03
vllm-deploy-simple
Network and computer systems administrators

Quickly install and deploy vLLM, start serving a simple LLM, and test the OpenAI-compatible API.

2026-04-03
vllm-prefix-cache-bench
Software developers

Benchmarks the efficiency of automatic prefix caching in vLLM using fixed prompts, real-world datasets, or synthetic prefix/suffix patterns. Use when the user asks to benchmark prefix caching hit rate, caching efficiency, or repeated-prompt performance in vLLM.

2026-04-03
#003
vllm-omni
4 skills · 4.9k · 1.0k · updated 2026-05-26
13% of creator
diffusion-perf-opt
Software developers

Diagnose and optimize vLLM Omni diffusion workloads, especially Wan/Qwen/Flux-style image and video generation. Use when Codex is asked to analyze profiling traces, choose parallel strategies, inspect torch profiler trace.json or trace.json.gz timelines, estimate optimization ROI, investigate GPU idle/free bubbles, compare USP/CFG/HSDP/VAE parallelism, or design operator/host/quantization optimizations for vLLM Omni.

2026-05-26
add-diffusion-model
Software developers

Add a new diffusion model (text-to-image, text-to-video, image-to-video, text-to-audio, image editing) to vLLM-Omni, including Cache-DiT acceleration and parallelism support (TP, SP/USP, CFG-Parallel, HSDP). Use when integrating a new diffusion model, porting a diffusers pipeline or a custom model repo to vllm-omni, creating a new DiT transformer adapter, adding diffusion model support, or enabling multi-GPU parallelism and cache acceleration for an existing model.

2026-05-11
add-tts-model
Software developers

Integrate a new text-to-speech model into vLLM-Omni from HuggingFace reference implementation through production-ready serving with streaming and CUDA graph acceleration. Use when adding a new TTS model, wiring stage separation for speech synthesis, enabling online voice generation serving, debugging TTS integration behavior, or building audio output pipelines.

2026-05-05
vllm-omni-npu-model-runner-upgrade
Software developers

Upgrade vllm-omni NPU model runners (OmniNPUModelRunner, NPUARModelRunner, NPUGenerationModelRunner) to align with the latest vllm-ascend NPUModelRunner while preserving omni-specific logic.

2026-04-18
#004
vllm-ascend
3 skills · 2.2k · 1.3k · updated 2026-05-22
10% of creator
Showing 5 of 5 repositories
All repositories shown