mit einem Klick
scrub-issue
// Fetch, analyze, reproduce, and minimize GitHub issue reproductions. Use when asked to check if an issue reproduces, minimize a repro, analyze a bug report, or scrub/triage an issue for reproducibility.
// Fetch, analyze, reproduce, and minimize GitHub issue reproductions. Use when asked to check if an issue reproduces, minimize a repro, analyze a bug report, or scrub/triage an issue for reproducibility.
Fix bugs reported in PyTorch GitHub issues by reproducing, root-causing, and implementing a fix in the local working tree. Use when the user asks to fix a PyTorch GitHub issue.
Write Metal/MPS kernels for PyTorch operators. Use when adding MPS device support to operators, implementing Metal shaders, or porting CUDA kernels to Apple Silicon. Covers native_functions.yaml dispatch, host-side operators, and Metal kernel implementation.
Triages GitHub issues by routing to oncall teams, applying labels, and closing questions. Use when processing new PyTorch issues or when asked to triage an issue.
Sub-triages issues in the oncall:distributed queue by assigning distributed module labels, routing to sub-oncalls, and marking triaged. Use when an issue has been routed to oncall:distributed and needs second-level triage.
Migrate a file to use stricter Pyrefly type checking with annotations required for all functions, classes, and attributes.
Review PyTorch pull requests for code quality, test coverage, security, and backward compatibility. Use when reviewing PRs, when asked to review code changes, or when the user mentions "review PR", "code review", or "check this PR".
| name | scrub-issue |
| disable-model-invocation | true |
| description | Fetch, analyze, reproduce, and minimize GitHub issue reproductions. Use when asked to check if an issue reproduces, minimize a repro, analyze a bug report, or scrub/triage an issue for reproducibility. |
Fetch a GitHub issue, evaluate whether it has a reasonable repro, check if it still reproduces, and systematically minimize the repro to the smallest possible self-contained script.
Assume the current environment is correct and run python directly. Only use
conda run -n <env> for version bisection (step 5a) where you need to
temporarily use a different environment. Use the Bash tool's timeout
parameter to enforce timeouts when running repro scripts.
gh issue view <NUMBER> --repo pytorch/pytorch to fetch the issue bodygh issue view --comments <NUMBER> --repo pytorch/pytorch to fetch commentspython <script> to run repro scriptsgh issue comment <NUMBER> --repo pytorch/pytorch --body <BODY> to commentgh issue edit <NUMBER> --repo pytorch/pytorch --add-label <LABEL> to add labelsMultiple gh issue edit flags can be combined in a single command (e.g.
--add-label "bug,help wanted" --add-assignee "@me"). Prefer batching
edits into one command to minimize API calls and reduce the chance of
auto-subscribing to notifications.
Modifying an issue (commenting, adding labels) auto-subscribes you to
notifications. Use tools/stale_issues.py to save and restore subscription
state:
gh issue edit, gh issue comment, or
gh issue close command is executed:
python tools/stale_issues.py subscription save <NUMBER>
gh issue close), in which case skip the restore
so the user stays subscribed to follow any responses to the closure. The
restore must be the very last GitHub API call — do not run it in the
background or in parallel with any gh issue edit or gh issue comment
commands:
python tools/stale_issues.py subscription restore <NUMBER>
If either the save or restore command fails, warn the user and continue without the save/restore mechanism.
Important: Never run gh issue edit, gh issue comment, gh issue close,
or subscription save/restore commands in the background. These must all run in
the foreground so their completion can be verified before proceeding.
If commenting or editing fails because the issue is locked, report this to the
user and skip the modification.
Before running any repro code, check for the following concerns:
/tmp/subprocess used to launch torchrun or mp.spawn for distributed repros
is expected and not a concerntorch.load on untrusted .pt/.pth filesMASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE,
CUDA_VISIBLE_DEVICES, or other standard PyTorch/CUDA env vars is expected
and not a concernIf any of these are present, explain the concern to the user and ask whether
to proceed, skip, or modify the repro to remove the risky parts. If the user
chooses to skip, still refresh the triaged label timestamp (remove and
re-add, or just add if not present) before reporting that the analysis is
finished.
Even if the repro passes the checklist above, check whether the author of the
repro code is a PyTorch collaborator by running
python tools/stale_issues.py collaborator-check <username>. If the command
exits with a non-zero status (user is not a collaborator), show the repro code
to the user and ask them to verify it is safe to run before executing it.
Fetch the issue body and comments in parallel. Identify the reported repro script and error. If multiple repros are present, prefer the most recent one from the original poster. If a commenter has posted a strictly shorter and more self-contained repro that doesn't require additional context from the issue description, prefer that one. Note which repro you selected. If the repro code is only present in screenshots or images rather than copyable text, stop and report this to the user.
Before investing effort in reproduction, check whether the issue is actionable. Do not proceed to later steps if any of the following apply:
feature label, add it via gh issue edit. If the issue is a
better-engineering / refactoring task and doesn't already have the
better-engineering label, add it via gh issue edit. If the issue
includes a repro script that demonstrates the current behavior, apply the
security review checklist first, then run it to verify the behavior
persists. If the repro needed to be modernized (e.g. updated imports for
renamed APIs), or if you verified that the behavior still persists, comment
on the issue with the findings and updated repro. After adding any
applicable labels (and optionally running/commenting on a repro), report
the analysis to the user and do not proceed to later steps.tracker label, add it via gh issue edit. Then stop and
report to the user.For non-actionable issues that are old (more than ~1 year), have no assignees,
no recent progress, and are underspecified or lack concrete motivation, suggest
closing them as "not planned" (gh issue close --reason "not planned") with a
comment explaining the rationale. Ask the user before closing.
If the issue is not actionable and no GitHub-visible modification was made (no
label added, no comment posted — saving subscription state does not count),
refresh the triaged label to update the issue's
"last updated" timestamp. A single gh issue edit with both --remove-label
and --add-label doesn't work because the remove and add cancel each other
out. Instead, chain both edits in a single Bash tool call:
gh issue edit ... --remove-label triaged && gh issue edit ... --add-label triaged.
If the issue doesn't have the triaged label, just add it.
Then summarize why the issue is not actionable. If a label or other update was made, just report that the analysis is finished. Only ask the user how to proceed if no update was made and the situation is ambiguous. Always ask the user before closing an issue.
Evaluate whether the issue has a reasonable repro:
torch.distributed repros often don't require special hardware — they
can be launched with torchrun --nproc_per_node=1 or mp.spawn on a single
machine.If there is no repro code at all, or the issue is missing critical info (no
error message, no description of expected vs actual behavior), add the
needs reproduction label (if not already present) via gh issue edit, then
stop and report to the user. Do not attempt to write a repro from the
description without being asked.
Conversely, if the issue already has the needs reproduction label but does
have a valid repro, remove the label via gh issue edit --remove-label "needs reproduction".
Apply the security review checklist (see above) to the repro before running it.
If the repro requires third-party packages that are not installed (e.g.
transformers, torchvision, numpy), stop and ask the user how to proceed
rather than installing them yourself.
Summarize your assessment before proceeding.
If a comment from the last six months already confirms the issue still reproduces (with a matching error and a reasonable repro), stop and ask the user whether they want to re-verify or skip ahead to minimization.
Extract the repro code into a temporary file under /tmp/ and run it with a
timeout of 120 seconds. For repros involving torch.compile that call compile
multiple times in the same process, add torch._dynamo.reset() between
invocations to reset in-memory Dynamo state. This is unnecessary for scripts
that compile once and exit.
If the script times out, consider whether a hang is the reported bug or an unrelated issue, and report to the user.
Record the PyTorch version before running
(python -c "import torch; print(torch.__version__)") for inclusion in the
report.
Check both the exit code and output to determine the result. An exit code > 128
indicates the process was killed by a signal (e.g. segfault = 139, OOM kill =
137) — this is a valid crash reproduction even without a Python traceback. If
the repro crashes with a CUDA out-of-memory error and OOM is not the reported
bug, try reducing tensor sizes before concluding it doesn't reproduce. For
correctness bugs (wrong numerical results rather than crashes), the repro
should include an assertion that fails when the bug is present. If the
original repro only prints output without asserting, add a simple assertion
based on the expected behavior described in the issue (e.g.
assert torch.allclose(actual, expected)) so the repro has a clear
pass/fail signal.
If the result is inconsistent across runs, run 3-5 times to assess flakiness.
For non-deterministic bugs, try setting PYTHONHASHSEED=0 and a fixed
torch.manual_seed to stabilize reproduction. Report the success/failure ratio
to the user. Flaky repros are still valid bugs — note the flakiness in the
issue comment (step 8) and include the success/failure ratio.
Three possible outcomes (to determine which outcome applies, match on the
exception class and a distinctive substring of the error message — the
substring should be specific enough to identify the bug, e.g.
RuntimeError: expected scalar type Float rather than just RuntimeError):
TORCH_LOGS=+dynamo or other relevant logging flags
for more diagnostic output. Report genuinely different errors to the user.When the bug no longer reproduces, try to determine which version fixed it and which PR introduced the fix.
Version bisection: Check if conda environments named pytorch-<version>
(e.g. pytorch-2.6, pytorch-2.8) are available (conda env list | grep pytorch-). If they exist, binary-search across them to find the first version
where the bug is fixed. Run from /tmp in a subshell and clear PYTHONPATH
to avoid picking up the local source tree (which would cause torch._C import
errors): (cd /tmp && PYTHONPATH= conda run -n pytorch-<version> python /tmp/repro_....py). To speed up bisection, pick 2-3 evenly-spaced probe
points from the candidate range and test them in parallel each round (e.g.
if candidates are 2.2 through 2.8, test 2.4 and 2.6 simultaneously to split
the range into thirds). If no versioned conda environments are
available, skip bisection and just report that the bug no longer reproduces on
the current version.
PR identification: Try to find the specific PR that fixed the bug. Use
version control blame on the relevant fix code to find the changeset, then look
up the commit message for the PR number (format (#NNNNN)). Alternatively,
search version control history for commits touching the relevant file with a
related keyword. If this doesn't yield a clear answer quickly, just report the
version — don't spend extra time on PR identification.
Save the original working repro to /tmp/repro_<issue_number>_original.py
before making any changes.
First, assess whether the repro is already reasonably minimal. Only minimize if
the repro has significant unnecessary complexity (large models, unused code
paths, unnecessary dependencies, etc.). When counting complexity, don't count
irreducible boilerplate — e.g. a tensor subclass definition that only contains
the required dunder methods (__new__, __init__, __torch_dispatch__,
__tensor_flatten__, __tensor_unflatten__) is not reducible even if it's
20+ lines. Focus on whether the trigger code and model/setup complexity can
be meaningfully reduced. If the only possible "simplifications" are cosmetic
(inlining variables, removing __repr__), skip minimization.
Systematically reduce the repro by testing whether each simplification still triggers the same error. Use a shorter timeout (30-60 seconds) during minimization since simplified repros should run faster. Run multiple candidate simplifications in parallel when they are independent (i.e. they modify non-overlapping parts of the code and neither depends on what the other removes).
Reduction strategies (apply in roughly this order):
aot_eager instead of inductor) if the bug is not backend-specificAfter each round, verify the error still reproduces — same exception class and a distinctive error message substring as described in step 5 (minor traceback differences are fine). For correctness bugs, preserve the assertion that demonstrates the wrong result and verify it still shows the same incorrect behavior. When merging multiple successful parallel simplifications, verify the combined result still reproduces since independent simplifications can interact. Stop minimizing when the repro is under ~20 lines of non-blank non-import code, or when two consecutive rounds (where a round is one full pass through the applicable reduction strategies) fail to simplify further.
If the analysis reveals a trivial fix (e.g. removing a stale xfailIfTorchDynamo
or expectedFailure annotation from a test because the underlying issue is
fixed), report the fix to the user and ask whether to apply it. Do not modify
source files without the user's approval. In step 8, mention the fix
regardless of whether it was applied or declined.
If the repro was minimized, save it to /tmp/repro_<issue_number>.py so the
user can run it directly.
Present findings to the user including:
After presenting findings, always comment on the issue with the results. Keep the comment concise — don't repeat information already on the issue. Only include a repro if it was materially changed from the original (e.g. minimized, modernized imports, fixed to run on current API). Only include trigger conditions if they are new findings not already discussed in prior comments. If the only finding is "still reproduces" or "no longer reproduces", a short comment is sufficient.
This issue [still reproduces / no longer reproduces] on PyTorch <version>.
[If fixed and PR identified:]
Fixed by #NNNNN.
[If minimized or modernized — only include repro if changed from original:]
Minimized repro:
\`\`\`python
<repro script>
\`\`\`
[Only if new findings about trigger conditions:]
All of the following are necessary to trigger the bug:
- <condition 1>
- <condition 2>
(Analysis done by Claude.)
If the bug no longer reproduces, after commenting close the issue:
gh issue close <NUMBER> --repo pytorch/pytorch --reason completed --comment "Closing as this was fixed in PyTorch <version>.".
Do not ask the user before closing — fixed bugs should always be closed.
After the last GitHub modification, restore the notification subscription state (see "Preserving notification subscription state" above) — unless the issue was closed, in which case skip the restore.