refresh-tpu-vllm-forks
Refresh Marin TPU-vLLM forks from a tpu-inference release/LKG pair, update exact SHA pins, run TPU smokes, and open the Marin PR.
| Field | Value |
|---|---|
| name | refresh-tpu-vllm-forks |
| description | Refresh Marin TPU-vLLM forks from a tpu-inference release/LKG pair, update exact SHA pins, run TPU smokes, and open the Marin PR. |
Read first:
@AGENTS.md
Marin maintains forks of vllm and tpu-inference with required patches.
Update those forks to the latest tested upstream pair, reconcile Marin overlay
commits, then open the Marin PR that pins the refreshed fork tips.
Example run: marin-community/marin#6453.
Use the same algorithm in CI and local runs. In local/manual mode, ask before external mutations: pushing fork branches, publishing the logs Gist, opening the Marin PR, or filing/updating a GitHub issue. Do not ask before required TPU smoke tests.
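The ask-before-mutation rule can be expressed as a small predicate. What follows is a minimal sketch. `Action`, `EXTERNAL_MUTATIONS`, and `needs_confirmation` are hypothetical names that do not come from Marin. It also assumes that CI runs never prompt.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the actions this skill performs."""

    PUSH_FORK_BRANCH = "push fork branch"
    PUBLISH_LOGS_GIST = "publish logs Gist"
    OPEN_MARIN_PR = "open Marin PR"
    FILE_OR_UPDATE_ISSUE = "file or update GitHub issue"
    RUN_TPU_SMOKE = "run required TPU smoke test"


# External mutations that need explicit approval in local/manual mode.
EXTERNAL_MUTATIONS = frozenset(
    {
        Action.PUSH_FORK_BRANCH,
        Action.PUBLISH_LOGS_GIST,
        Action.OPEN_MARIN_PR,
        Action.FILE_OR_UPDATE_ISSUE,
    }
)


def needs_confirmation(action: Action, *, ci: bool) -> bool:
    """Return True if the agent should ask the user before `action`."""
    if ci:
        # Assumption: CI runs the same algorithm without interactive prompts.
        return False
    # Required TPU smoke tests never prompt; only external mutations do.
    return action in EXTERNAL_MUTATIONS
```

With this predicate, `needs_confirmation(Action.RUN_TPU_SMOKE, ci=False)` is `False`. Smoke tests therefore proceed without a prompt.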
| Repo | Role | Upstream |
|---|---|---|
| marin-community/vllm | Marin vLLM overlay branches. | vllm-project/vllm |
| marin-community/tpu-inference | Marin TPU inference overlay branches. | vllm-project/tpu-inference |
| marin-community/marin | Pins fork branch tips and receives the only PR. | n/a |
- Open one draft PR in `marin-community/marin` after the required smoke tests pass, and request @yonromai as reviewer.
- The PR updates root `pyproject.toml` and `uv.lock`. It reports bases, branches/tips, carried/dropped/fixed overlays, validation, and residual risk.
- Do not open fork PRs or change fork `main`. Fork review happens through pushed branches and the compare links in the Marin PR.

If a real external blocker remains after repair attempts, do not open a Marin PR. Instead, create or update one `marin-community/marin` issue assigned to @yonromai and titled `TPU-vLLM fork refresh blocked: <short reason>`. The issue should include:

- the current pins;
- the selected release;
- branch names/SHAs (if created);
- attempted fixes;
- the remaining failure;
- artifacts;
- the logs Gist.
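The blocker issue can be assembled from the items listed above. In the sketch below, `BlockerReport` and its fields are hypothetical, while the title format is the one this skill specifies. The resulting title and body could then be passed to something like `gh issue create --title ... --body ... --assignee yonromai`, or to `gh issue edit` when the issue already exists.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlockerReport:
    # Hypothetical fields mirroring what the skill asks the issue to contain.
    short_reason: str
    current_pins: dict[str, str]
    selected_release: str
    attempted_fixes: list[str]
    remaining_failure: str
    logs_gist_url: str
    branches: dict[str, str] = field(default_factory=dict)  # name -> SHA, if created
    artifacts: list[str] = field(default_factory=list)

    def title(self) -> str:
        return f"TPU-vLLM fork refresh blocked: {self.short_reason}"

    def body(self) -> str:
        lines = ["Current pins:"]
        lines += [f"- {pkg}: {sha}" for pkg, sha in sorted(self.current_pins.items())]
        lines.append(f"Selected release: {self.selected_release}")
        if self.branches:  # only present if branches were created
            lines.append("Branches:")
            lines += [f"- {name} @ {sha}" for name, sha in sorted(self.branches.items())]
        lines.append("Attempted fixes:")
        lines += [f"- {fix}" for fix in self.attempted_fixes]
        lines.append(f"Remaining failure: {self.remaining_failure}")
        if self.artifacts:
            lines.append("Artifacts:")
            lines += [f"- {item}" for item in self.artifacts]
        lines.append(f"Logs Gist: {self.logs_gist_url}")
        return "\n".join(lines)
```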
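Work from the `marin-community/marin` repo. In CI, the run ID is `${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}`. Clone both forks under `/tmp/marin-fork-refresh/<run-id>`:

```bash
# Scratch directory for this run (<run-id> as described above).
mkdir -p /tmp/marin-fork-refresh/<run-id>
cd /tmp/marin-fork-refresh/<run-id>

git clone https://github.com/marin-community/vllm.git vllm
git -C vllm remote add upstream https://github.com/vllm-project/vllm.git
# --multiple is required to fetch from more than one remote in one command.
git -C vllm fetch --multiple --tags origin upstream

git clone https://github.com/marin-community/tpu-inference.git tpu-inference
git -C tpu-inference remote add upstream https://github.com/vllm-project/tpu-inference.git
git -C tpu-inference fetch --multiple --tags origin upstream
```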
Keep two running notes files:

- `notes-summary.md`: major decisions, selected bases, branch SHAs, validation outcomes, and the final no-op/PR/issue result.
- `sharp-edges.md`: surprising failures, compatibility traps, memorable fixes, and open operational risks.

Read the current `vllm` and `tpu-inference` SHAs from `tool.uv.sources` in the root `pyproject.toml`, and check `uv.lock` against them. If a compare comment is missing or stale, recover the upstream base with `git merge-base <fork-sha> upstream/main`. Include the repaired compare comments in the Marin change.

If Marin still pins `vllm-tpu==...` / `tpu-inference==...` in `lib/marin/pyproject.toml`, treat this as a one-time bootstrap migration:

- record the package versions;
- do not require old fork SHAs or compare comments;
- migrate to exact fork SHA pins after validation.

Select the base from GitHub releases of `vllm-project/tpu-inference`. Do not use raw tags, branches, or standalone latest vLLM releases as the selection signal.

- A release is eligible only if `draft == false`, `prerelease == false`, and the tag is exactly `vMAJOR.MINOR.PATCH`.
- Resolve the latest eligible release to its `tpu-inference` commit SHA. If it matches the current Marin `tpu-inference` upstream base, exit as a no-op unless the only work is repairing pin metadata.
- Read `.buildkite/vllm_lkg.version` at the selected `tpu-inference` release. That exact SHA is the vLLM base; verify that it resolves in `vllm-project/vllm`.
- … `requirements/tpu.txt`, `pyproject.toml`, and `setup.py`.
- Do not fall back to older `tpu-inference` releases when the latest eligible release fails. Fix the refresh or file a blocking issue.

Create one branch per fork from the selected upstream base:
`auto-refresh/<YYYYMMDD>/<base-id>-<shortsha>`

For `tpu-inference`, use the selected release tag as `<base-id>`; for vLLM, use `lkg`. Keep the same date prefix for both branches of the pair, and sanitize names. Never rewrite an existing remote refresh branch. On a collision, use the next `-rN` suffix.
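The release-selection, LKG, and branch-naming rules above can be sketched as three pure functions. This sketch rests on several assumptions:

- release dicts mimic GitHub REST release objects (`tag_name`, `draft`, `prerelease`);
- `.buildkite/vllm_lkg.version` holds a bare 40-character SHA;
- the sanitizer's character set is illustrative;
- the first collision suffix is `-r2`.

All function names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import date

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def select_release(releases: list[dict]) -> dict | None:
    """Return the newest eligible release, or None if none qualifies.

    Each item is assumed to look like a GitHub REST release object; only
    `tag_name`, `draft`, and `prerelease` are read.
    """
    best_key, best = None, None
    for rel in releases:
        m = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", rel["tag_name"])
        if rel["draft"] or rel["prerelease"] or m is None:
            continue  # drafts, prereleases, and non-vMAJOR.MINOR.PATCH tags are ineligible
        key = tuple(int(part) for part in m.groups())  # numeric: v0.10.0 > v0.9.9
        if best_key is None or key > best_key:
            best_key, best = key, rel
    return best


def parse_lkg_sha(version_file_text: str) -> str:
    """Extract the vLLM base SHA from `.buildkite/vllm_lkg.version` contents.

    Assumes the file holds one full 40-character lowercase hex commit SHA.
    """
    sha = version_file_text.strip()
    if not _FULL_SHA.fullmatch(sha):
        raise ValueError(f"unexpected vllm_lkg.version contents: {sha!r}")
    return sha


def refresh_branch_name(
    day: date, base_id: str, base_sha: str, existing: set[str], short_len: int = 7
) -> str:
    """Build auto-refresh/<YYYYMMDD>/<base-id>-<shortsha>, adding -rN on collision."""
    # Illustrative sanitizer: replace characters outside [A-Za-z0-9._-]. It does
    # not enforce every git ref-format rule (for example, it allows "..").
    safe_id = re.sub(r"[^A-Za-z0-9._-]+", "-", base_id).strip("-.")
    name = f"auto-refresh/{day:%Y%m%d}/{safe_id}-{base_sha[:short_len]}"
    if name not in existing:
        return name
    n = 2  # assumption: the unsuffixed branch counts as attempt 1
    while f"{name}-r{n}" in existing:
        n += 1
    return f"{name}-r{n}"
```

Comparing parsed integer tuples rather than tag strings avoids ranking `v0.9.9` above `v0.10.0`. Pass the same `day` for both forks to keep the pair's date prefix aligned.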
For each fork:
1. Read `old_base` from the current compare-link comment, `old_tip` from the current Marin pin, and `new_base` from the selected upstream metadata.
2. List the overlay commits with `git log --reverse old_base..old_tip`.
3. For each commit, determine whether it is already in `new_base`, still required by Marin, or broken by new upstream APIs/deps. Use patch comparison and targeted diffs for the touched files.
4. Classify each delta:
   - `carry`: the behavior is still needed and not upstreamed;
   - `drop`: upstream absorbed it, it is obsolete, or it was only temporary;
   - `fix`: the intent is still needed, but the implementation must change.
5. Replay `carry` and `fix` deltas onto `new_base` in the old logical order. Use clean cherry-picks for carries. Rewrite fixes as new commits that reference the original commit SHA(s).
6. Use `git range-diff old_base..old_tip new_base..<new_tip>` as the replay audit. Explain every dropped or rewritten delta in the notes/PR.
7. … `drop`.

For bootstrap migrations without a pin-derived `old_base..old_tip`, create the first managed branches from the selected upstream bases. Replay only Marin fork deltas whose source and intent are explicit.
Push the finished branch to the corresponding marin-community fork.
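The replay step can be sketched as a function that turns carry/drop/fix decisions into an ordered checklist. `OverlayCommit` and `replay_plan` are hypothetical names. `git switch -c` and `git cherry-pick` are standard commands. Fixes and drops appear only as comments, because fixes must be rewritten by hand and drops only need an explanation. After the replay, `git range-diff` remains the audit.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

Decision = Literal["carry", "drop", "fix"]


@dataclass(frozen=True)
class OverlayCommit:
    sha: str  # from `git log --reverse old_base..old_tip`, oldest first
    subject: str
    decision: Decision
    reason: str = ""  # why it was dropped or rewritten; goes into notes/PR


def replay_plan(commits: list[OverlayCommit], new_base: str, branch: str) -> list[str]:
    """Turn carry/drop/fix decisions into an ordered replay checklist."""
    plan = [f"git switch -c {branch} {new_base}"]
    for c in commits:  # preserve the old logical order
        if c.decision in ("drop", "fix") and not c.reason:
            raise ValueError(f"{c.decision} of {c.sha} needs an explanation")
        if c.decision == "carry":
            plan.append(f"git cherry-pick {c.sha}")
        elif c.decision == "fix":
            plan.append(
                f"# fix {c.sha} ({c.subject}): {c.reason} "
                f"[rewrite as a new commit referencing {c.sha}]"
            )
        elif c.decision == "drop":
            plan.append(f"# drop {c.sha} ({c.subject}): {c.reason}")
        else:
            raise ValueError(f"unknown decision {c.decision!r} for {c.sha}")
    return plan
```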
Update Marin root pyproject.toml so tool.uv.sources pins exact fork branch
tip SHAs. Add adjacent compare comments that show the retained overlay commits
against the selected upstream base:
```toml
[tool.uv.sources]
# https://github.com/marin-community/vllm/compare/<vllm-upstream-base-sha>...<vllm-branch-tip-sha>
vllm = { git = "https://github.com/marin-community/vllm.git", rev = "<vllm-branch-tip-sha>" }
# https://github.com/marin-community/tpu-inference/compare/<tpu-inference-upstream-base-sha>...<tpu-inference-branch-tip-sha>
tpu-inference = { git = "https://github.com/marin-community/tpu-inference.git", rev = "<tpu-inference-branch-tip-sha>" }
```
In Marin, make only the changes that the fork-stack update requires:

- … the `vllm-tpu==0.19.0` path;
- … `marin-core[vllm]` own the TPU-vLLM runtime stack;
- … the `tpu` and `vllm` extras, unless refreshed-stack validation proves they must change;
- … `VLLM_TARGET_DEVICE=tpu` for TPU source-build workers.

Do not bundle unrelated usability, cleanup, or refactor work. If you find any, log it separately.
Run before PR creation:
- … `marin` cluster, always with `interactive` priority, targeting `v6e-4` in GCP region `europe-west4`;
- … `v6e-4` / `europe-west4` hardware before resubmitting Iris workloads;
- … `vllm.LLM.generate` TPU smoke;
- … `experiments/evals/served_qwen3_humaneval.py` over writing a new smoke.

Run the brokered smoke with a bounded HumanEval sample, unless an existing brokered test is already closer to the touched code:
```bash
uv run python experiments/evals/served_qwen3_humaneval.py \
  --limit 8 \
  --region europe-west4 \
  --tpu-type v6e-4 \
  --priority interactive \
  --job-name served-qwen3-humaneval-<run-id> \
  --output-path /tmp/served-qwen3-humaneval-<run-id>
```
Inspect the Iris parent, broker, and worker logs; confirm the proxy served completions, lm-eval wrote HumanEval metrics and sample outputs, and no TPU/vLLM build, import, or runtime tracebacks occurred.
When a workload smoke fails, rerun the same workload against Marin's current pins on the old fork stack, using the same Iris target/priority. Fix only failures that pass on the old stack and fail on the refreshed stack. If the old stack is already broken, record it as a baseline failure; do not rewrite that workload or smoke test as part of this refresh.
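The old-stack comparison reduces to a three-way decision. The sketch below uses hypothetical names (`SmokeVerdict`, `triage`). It assumes both runs used the same Iris target and priority.

```python
from __future__ import annotations

from enum import Enum


class SmokeVerdict(Enum):
    PASS = "pass"  # refreshed stack passes
    REGRESSION = "regression"  # passes on old pins, fails on refreshed: fix it
    BASELINE_FAILURE = "baseline-failure"  # already broken on old pins: record only


def triage(refreshed_passed: bool, old_stack_passed: bool | None) -> SmokeVerdict:
    """Classify a workload smoke; `old_stack_passed` is None until the old-stack rerun exists."""
    if refreshed_passed:
        return SmokeVerdict.PASS
    if old_stack_passed is None:
        raise ValueError("rerun the workload on the old fork stack (same target/priority) first")
    return SmokeVerdict.REGRESSION if old_stack_passed else SmokeVerdict.BASELINE_FAILURE
```

Only `REGRESSION` justifies code changes in this refresh. `BASELINE_FAILURE` belongs in the baseline-failure notes.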
Do a PR-review-style pass over the fork commits and Marin diff. Use
.agents/skills/review-pr/ as a checklist, then run
./infra/pre-commit.py --review before opening the PR and fix or respond to
every finding.
Check that:
After required smoke tests pass, publish the logs Gist, push the Marin branch,
and open one draft marin-community/marin PR. Request @yonromai as reviewer.
PR body:
- … `tpu-inference` release, selected vLLM LKG, fork branch/tip SHAs, smoke-test outcome, logs Gist link, unresolved risks, and a one-line dropped-overlay summary.
- `<details>` blocks: base-selection evidence, carry/drop/fix table, explicit dropped-overlay reasons, smoke artifacts, and baseline-failure notes.

Final checks:

- … `main` branches are unchanged and no fork PRs are opened.
- … `pyproject.toml` includes overlay compare links and `uv.lock` is refreshed.
- … @yonromai.
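The PR body layout described above needs a carry/drop/fix table, collapsible `<details>` blocks, and a one-line dropped-overlay summary. The sketch below renders all three. The helper names and table columns are hypothetical. Only the `<details>`/`<summary>` HTML is standard, and GitHub renders it.

```python
from __future__ import annotations

import html

Row = tuple[str, str, str, str]  # (fork, sha, decision, reason)


def details_block(summary: str, body_markdown: str) -> str:
    """Wrap supporting evidence in a collapsible <details> block for the PR body."""
    # Blank lines around the body let GitHub render the Markdown inside the block.
    return (
        f"<details>\n<summary>{html.escape(summary)}</summary>\n\n"
        f"{body_markdown.strip()}\n\n</details>"
    )


def overlay_table(rows: list[Row]) -> str:
    """Render carry/drop/fix decisions as a Markdown table."""
    out = ["| Fork | Commit | Decision | Reason |", "|---|---|---|---|"]
    for fork, sha, decision, reason in rows:
        cell = reason.replace("|", r"\|")  # keep pipes from splitting the row
        out.append(f"| {fork} | `{sha[:12]}` | {decision} | {cell} |")
    return "\n".join(out)


def dropped_summary(rows: list[Row]) -> str:
    """One-line dropped-overlay summary for the top of the PR body."""
    dropped = [f"{fork}@{sha[:12]}" for fork, sha, decision, _ in rows if decision == "drop"]
    return f"Dropped overlays ({len(dropped)}): " + (", ".join(dropped) or "none")
```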