model-pr-history-knowledge

Name: Model Pr History Knowledge
Author: BBuf

Use when an SGLang, vLLM, or TensorRT-LLM serving/model optimization task needs prior model-family PR evidence. Query and read the PR-driven history docs under model-pr-optimization-history before choosing source paths, fast paths, kernel/fusion ideas, regression risks, or validation lanes.

stars:483

forks:41

updated:May 26, 2026 at 12:53

SKILL.md

name	model-pr-history-knowledge
description	Use when an SGLang, vLLM, or TensorRT-LLM serving/model optimization task needs prior model-family PR evidence. Query and read the PR-driven history docs under model-pr-optimization-history before choosing source paths, fast paths, kernel/fusion ideas, regression risks, or validation lanes.

Model PR History Knowledge

This is a PR-driven knowledge base for model optimization history. It is not a set of per-model skills. Each model family keeps bilingual docs with inspected PR diffs, implementation file coverage, timelines, changed files, code excerpts, and validation/risk notes.

Use it before patching model-specific serving paths, choosing an SGLang SOTA optimization target, or explaining why a framework already has a faster path.

Query

Run commands from this directory:

python3 scripts/query.py --list
python3 scripts/query.py --framework sglang --model qwen3-core --paths-only
python3 scripts/query.py --framework sglang --model qwen3-core "fused qk norm rope"
python3 scripts/query.py --framework vllm "DeepSeek-V4 fused norm router" --limit 5

Useful options:

- --framework sglang|vllm: restrict to one serving framework.
- --model <slug>: restrict to one model family directory.
- --lang en|zh|both: select English, Chinese, or both docs.
- --paths-only: print the exact docs to read without snippets.
- --limit N: bound search results.
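
When the query is driven from another script rather than typed by hand, the options above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: the helper name build_query_argv is hypothetical, and the argument order simply mirrors the example commands shown earlier. Only the flags documented above are used.

```python
from typing import List, Optional

FRAMEWORKS = {"sglang", "vllm"}
LANGS = {"en", "zh", "both"}


def build_query_argv(
    text: Optional[str] = None,
    framework: Optional[str] = None,
    model: Optional[str] = None,
    lang: Optional[str] = None,
    paths_only: bool = False,
    limit: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for scripts/query.py (hypothetical helper)."""
    argv = ["python3", "scripts/query.py"]
    if framework is not None:
        if framework not in FRAMEWORKS:
            raise ValueError(f"unknown framework: {framework!r}")
        argv += ["--framework", framework]
    if model is not None:
        argv += ["--model", model]
    if lang is not None:
        if lang not in LANGS:
            raise ValueError(f"unknown lang: {lang!r}")
        argv += ["--lang", lang]
    if paths_only:
        argv.append("--paths-only")
    if text:
        # Free-text search terms travel as a single argument, as in the examples.
        argv.append(text)
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv
```

The resulting list can be handed to subprocess.run(argv, cwd=<skill directory>, check=True), which avoids shell quoting problems with multi-word search text.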

Workflow

1. Infer the model-family slug from the user's model id, checkpoint path, or SGLang source path. If unsure, run scripts/query.py "<model name>".
2. Read the matching SGLang history first for SGLang patch work. Read the vLLM history too when vLLM is the leading competitor or its trace suggests a missing SGLang fast path.
3. Extract only actionable evidence:
   - model implementation files and symbols
   - PRs that changed the hot source path
   - prior fusions, overlap work, quantization, MoE, attention, cache, sampler, or loader changes
   - open/watch PRs that may explain a known gap or pending support issue
   - validation lanes and regression risks implied by the PR cards
4. Save a short note in the active run artifacts, for example history/model-pr-history-notes.md, with paths read, PR numbers, source files, and the decision each item influenced.
5. Do not copy long PR cards into the final answer. Cite paths and summarize the relevant implementation/risk.
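
The note-saving step can be sketched as a small writer. This is an illustration under assumptions: the Evidence record, its field names, and the rendered layout are hypothetical. Only the output path history/model-pr-history-notes.md and the four kinds of information (paths read, PR numbers, source files, decisions) come from the workflow above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass
class Evidence:
    """One actionable item extracted from a history doc (hypothetical shape)."""
    doc_path: str     # history doc that was read
    pr: str           # PR identifier exactly as cited in that doc
    source_file: str  # implementation file the PR touched
    decision: str     # what this item changed in the plan


def render_notes(items: Iterable[Evidence]) -> str:
    """Render a short note: one compact entry per item, no copied PR cards."""
    lines = ["# Model PR history notes", ""]
    for item in items:
        lines.append(f"- doc: {item.doc_path}")
        lines.append(f"  PR: {item.pr}")
        lines.append(f"  source: {item.source_file}")
        lines.append(f"  decision: {item.decision}")
    return "\n".join(lines) + "\n"


def write_notes(run_dir: Path, items: Iterable[Evidence]) -> Path:
    """Write the note under <run_dir>/history/ and return its path."""
    out = run_dir / "history" / "model-pr-history-notes.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_notes(items), encoding="utf-8")
    return out
```

Keeping each entry to four short fields follows the last workflow step: the note cites paths and records decisions instead of reproducing PR cards.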

Model Slugs

Current frameworks:

sglang
vllm

Current model-family slugs include:

deepseek-ocr, deepseek-ocr-2, deepseek-v3-r1, deepseek-v31, deepseek-v32,
deepseek-v4, ernie45, gemma4, glm-vlm-ocr, glm45, glm46-glm47, glm5-glm51,
gpt-oss, intern-s1, internvl35, jina-reranker-m0, kimi, ling25, llada21,
llama31, llama33-70b, llama4, mimo-v2-flash, minimax, mistral-small-4,
mixtral-quark-int4fp8-moe, nemotron-super, qwen-vlm-omni-asr, qwen3-coder,
qwen3-core, qwen3-next, qwen35, ring25, step35
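
For a first guess at the slug, a model id can be matched against this list. The sketch below is a rough heuristic and not how scripts/query.py resolves slugs: candidate_slugs is a hypothetical helper, the slug list is copied from above and may be incomplete, and the match is a plain substring test after stripping punctuation. Ids that do not spell out the slug return no candidates, in which case running scripts/query.py "<model name>" remains the fallback.

```python
import re
from typing import List

# Copied from the slug list above; the source says "include", so it may be partial.
KNOWN_SLUGS = [
    "deepseek-ocr", "deepseek-ocr-2", "deepseek-v3-r1", "deepseek-v31",
    "deepseek-v32", "deepseek-v4", "ernie45", "gemma4", "glm-vlm-ocr",
    "glm45", "glm46-glm47", "glm5-glm51", "gpt-oss", "intern-s1",
    "internvl35", "jina-reranker-m0", "kimi", "ling25", "llada21",
    "llama31", "llama33-70b", "llama4", "mimo-v2-flash", "minimax",
    "mistral-small-4", "mixtral-quark-int4fp8-moe", "nemotron-super",
    "qwen-vlm-omni-asr", "qwen3-coder", "qwen3-core", "qwen3-next",
    "qwen35", "ring25", "step35",
]


def _normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def candidate_slugs(model_id: str) -> List[str]:
    """Return known slugs whose normalized form occurs in the model id, longest first."""
    needle = _normalize(model_id)
    hits = [slug for slug in KNOWN_SLUGS if _normalize(slug) in needle]
    return sorted(hits, key=len, reverse=True)
```

Sorting longest first puts the most specific slug at the front when one slug is a prefix of another, for example deepseek-ocr-2 ahead of deepseek-ocr.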

SOTA Loop Contract

For sglang-sota-humanize-loop, this knowledge base is an early context source:

- Read it after model identification and before patch planning.
- Include the history paths and key PR evidence in analysis/root-cause.md or history/model-pr-history-notes.md.
- If the profiler points at a known model path, check whether the history has prior changes on that file before writing a new patch.
- If a competitor is faster, search that competitor's model history for the same model family and stage before assuming the gap is kernel-local.
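
The check for prior changes on a profiler-identified file can be approximated with a plain text scan. This is a sketch under assumptions: it supposes the history docs are Markdown files somewhere below a root directory passed in by the caller, and docs_mentioning is a hypothetical helper. The documented interface for searching remains scripts/query.py.

```python
from pathlib import Path
from typing import List


def docs_mentioning(history_root: Path, source_path: str) -> List[Path]:
    """Return Markdown docs under history_root whose text contains source_path."""
    hits: List[Path] = []
    for doc in sorted(history_root.rglob("*.md")):
        # Ignore undecodable bytes so one odd file does not abort the scan.
        text = doc.read_text(encoding="utf-8", errors="ignore")
        if source_path in text:
            hits.append(doc)
    return hits
```

An empty result only means the path string does not appear verbatim; a renamed or moved file may still have relevant history under its older name.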

related-skills.json

same repository

vllm-sota-humanize-loop.md

from "BBuf/AI-Infra-Auto-Driven-SKILLS"

Run an autonomous Humanize-governed vLLM SOTA performance loop for one LLM model: first perform the fixed fair vLLM/SGLang/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches vLLM code, optionally uses ncu-report-skill for kernel evidence, and revalidates until vLLM matches or beats the best observed framework under the same workload and SLA.

updated:2026-05-26, stars:483

sglang-sota-humanize-loop.md

from "BBuf/AI-Infra-Auto-Driven-SKILLS"

Run an autonomous Humanize-governed SGLang SOTA performance loop for one LLM model: first perform the fixed fair SGLang/vLLM/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches SGLang code, optionally uses ncu-report-skill for kernel evidence, and revalidates until SGLang matches or beats the best observed framework under the same workload and SLA.

updated:2026-05-26, stars:483

llm-pipeline-analysis.md

from "BBuf/AI-Infra-Auto-Driven-SKILLS"

Inspect LLM torch profiler traces at forward-pass, layer, and kernel level. Use when you need layer timings, anchor-kernel boundaries, representative kernel flows, or Perfetto time ranges.

updated:2026-05-26, stars:483

sglang-humanize-review.md

from "BBuf/AI-Infra-Auto-Driven-SKILLS"

Perform SGLang code review in the style of human maintainers by consulting the 2024-2025 non-agent PR review corpus, including inline code snippets, original multilingual comments, and discussion threads. Use when reviewing SGLang PRs, diffs, patches, or local changes for correctness, tests, performance, GPU/runtime risks, API compatibility, and maintainability.

updated:2026-05-20, stars:483

llm-serving-capacity-planner.md

from "BBuf/AI-Infra-Auto-Driven-SKILLS"

Parse SGLang/vLLM startup logs to explain GPU memory use and request capacity. Use for KV cache budget, mem-fraction-static comparisons, OOM triage, and max-concurrency estimates.

updated:2026-05-20, stars:483

model-compute-simulation.md

from "BBuf/AI-Infra-Auto-Driven-SKILLS"

Build an operator-level compute template for an LLM and estimate FLOPs/MFU for a serving shape. Use when you need tensor shapes, per-op FLOPs, kernel-to-op MFU mapping, or parallelism what-if analysis.

updated:2026-05-20, stars:483

package.json

"author": "BBuf"

"repository": "BBuf/AI-Infra-Auto-Driven-SKILLS"

