ワンクリックでManusで任意のスキルを実行

$pwd:

rota-bench-regression-analysis

Name: Rota Bench Regression Analysis
Author: oracle

// Analyze recent GraalPy benchmark regressions on `master` as part of the weekly rota. Use when asked to analyze benchmarks for rota.

Manusで実行

$ git log --oneline --stat

stars:1,597

forks:152

updated:2026年5月6日 08:08

SKILL.md

readonly

name	rota-bench-regression-analysis
description	Analyze recent GraalPy benchmark regressions on `master` as part of the weekly rota. Use when asked to analyze benchmarks for rota.

Bench Regression Analysis

Use This Skill For

Recent regression summaries from scripts/compare_bench_regressions.py --rota.
Follow-up inspection of unattributed plausible change points.

Core Workflow

Run:

scripts/compare_bench_regressions.py --rota --json-out /tmp/compare_bench_regressions_rota.json

Use the text output for the human summary and the JSON for precise inspection.
Focus on plausible regressions. Ignore flaky and inconclusive items unless they help explain a plausible shift.
Split the summary into:

Attributed
To bisect
To watch

Show the current summary
Execute the bisect script for each "to bisect" entry in parallel, then wait for all of them to finish. The builds can take many hours without the script showing any output, make sure you wait for them with a long timeout. If running in codex: round-robin poll the processes with write_stdin and 1 hour timeout (the configuration might cap this at a lower timeout in practice)
Collect the bisect results and move any benchmarks that were attributed by the bisections.
Show the final summary. Note any failed bisects.

Useful JSON Queries

jq '.direct_suspects[] | {good_commit, bad_commit, bad_author_email, bad_subject}' \
  /tmp/compare_bench_regressions_rota.json

jq '[.change_points[] | select(.classification == "plausible")]' \
  /tmp/compare_bench_regressions_rota.json

Attributed Regressions

Start from direct_suspects in /tmp/compare_bench_regressions_rota.json.
For each suspect, keep the abbreviated bad commit ID, full author email, full commit subject, and the worst example benchmarks per suite, not the full list.
Prefer one worst example per affected suite such as micro, meso, macro...

Unattributed Regressions

Start from plausible change_points whose (good_commit, bad_commit] pair is not already covered by direct_suspects.
Inspect the range with:

git log --first-parent --reverse --format='%H%x09%ae%x09%s' GOOD..BAD
git show --stat --summary --format=fuller BAD
git diff --stat GOOD..BAD

If needed, inspect individual commits in the range with git show --stat --summary --format=fuller COMMIT.

Attribution Rules

If the change point is an exact single-parent GraalPy commit and mx.graalpython/suite.py imports did not change, it can usually be attributed to that commit.
Changes to imports in mx.graalpython/suite.py can never be confidently attributed without bisecting Graal. Keep those unattributed and say so explicitly (including the graal commit range)
If an unattributed first-parent range contains one plausible GraalPy code change and the rest are documentation, tests, retags, or other non-performance changes, attribute it to that one code change.
If the series is already shifted by an earlier attributed commit and a later unattributed range only preserves the new level, fold the later item into the earlier attribution.
Cross-configuration correlation matters.
If native shows an exact jump on one commit and jvm later shows the same benchmark shifted upward through a range containing that commit, treat them as likely the same cause unless the later range has a better candidate.
If both jvm and native jump at or immediately after a Graal import update, keep both under the same unattributed Graal-side cause.

Flakiness Check

Use Bench Server data when the unattributed item is small or suspicious.
Query the benchmark series with bench-cli run - and check whether the change is a clean step up that stays high, a one-point spike that immediately falls back, or already present before the reported range.
A stable step change is a real regression candidate.
An isolated last-point bump with no supporting related regressions is usually watch-and-rerun material, not a strong attribution.

Bench Server Checks

Prefer querying only the specific benchmark and configuration under investigation.
Typical filters: branch = master, target benchmark, host-vm = graalvm-ee, target host-vm-config, guest-vm = graalpython, target guest-vm-config, metric.name = time, commit.committer-ts last-n 30d.
Reduce output to commit.rev and average metric value so the step pattern is easy to inspect.
bench-cli sometimes fails with 404 when the server is overloaded. If that happens, wait for a minute and try again

Output Contract

List findings first, not process notes.
Keep three top-level sections (if not empty): Attributed and To bisect and To watch
In the attributed section, use this header format: abcd1234efgh | author@oracle.com | Full subject
Unattributed changes that look plausible go to "to bisect", flaky ones go to "to watch"
In the "to bisect" section, add an invocation (don't execute yet) of scripts/bisect_benchmark_regression.py that can bisect it (use unabbreviated commits in this case)
In the "to watch" section, say whether the item looks flaky, or likely the same cause as another attributed item.
Do not abbreviate commit subjects.
Keep author emails.
Abbreviate commit IDs to 12 characters.
Do not list every benchmark if there are many; only the worst examples from each affected suite. If you didn't list all, say "and X others".

Guardrails

If the script or you can't find bench-cli, ask the user to provide it from the bench-server repo.
- You may offer to clone the repo and create the cli for the user. The repo is on the same bitbucket as graalpython, the project is INFRA and the repo is called bench-server
Don't submit more than 5 bisect jobs. If there are more in the "to bisect" list, pick 5 that look the most serious and leave the rest as "to bisect".

related-skills.json

同じリポジトリ

third-party-package-patches.md

from "oracle/graalpython"

Create or update GraalPy third-party package compatibility patches under graalpython/lib-graalpython/patches, including PyPI source preparation, rebasing existing patches, metadata.toml updates, license checks, version-range validation, and verify_patches.py validation.

2026-05-111.6k

rota-check-periodic-jobs.md

from "oracle/graalpython"

Analyze current GraalPy periodic job failures for ROTA. Use when asked to triage, summarize, or plan work for current periodic job failures, starting from scripts/rota_ci_failures.py output, validating linked Jira issues, inspecting logs, forming hypotheses, reproduction commands, and implementation order.

2026-04-291.6k

github-pr-mirror.md

from "oracle/graalpython"

Mirror an external GitHub pull request into the internal GraalPython Bitbucket review flow, including OCA label checks, Jira creation or reuse, preserving PR commits, pre-commit cleanup, and handoff to the shared GraalPython Bitbucket PR flow for PR creation, Graal Bot tasks, gates, and fixes.

2026-04-291.6k

graalpython-bitbucket-pr.md

from "oracle/graalpython"

Create or continue a GraalPython Bitbucket pull request and drive it through Graal Bot tasks, gate start, gate monitoring, failure investigation, fixes, pushes, and gate reruns. Use after a branch is ready for internal GraalPython review, or when an automation command has already created the PR and the remaining work is task cleanup and gate follow-up.

2026-04-291.6k

rota-update-import.md

from "oracle/graalpython"

Run the GraalPy ROTA import update workflow. Use when asked to refresh imports, create the standard Graal import update pull request, inspect generated commits, and hand off to the shared GraalPython Bitbucket PR flow for tasks, gates, and failure fixes.

2026-04-291.6k

pr-gate-check.md

from "oracle/graalpython"

Check gate status for a Bitbucket PR by resolving the PR head commit, finding the gate merge commit, inspecting builds on that merge commit, and summarizing root-cause failures with actionable next steps.

2026-04-141.6k

package.json

"author": "oracle"

"repository": "oracle/graalpython"

GitHub リポジトリを開く Creator のリポジトリを見る

$ install --global

$ download --local

Manusで実行

$ useful --forSOC

ソフトウェア開発者コンピュータ・数学職15-1252L4

name	rota-bench-regression-analysis
description	Analyze recent GraalPy benchmark regressions on `master` as part of the weekly rota. Use when asked to analyze benchmarks for rota.

Bench Regression Analysis

Use This Skill For

Recent regression summaries from scripts/compare_bench_regressions.py --rota.
Follow-up inspection of unattributed plausible change points.

Core Workflow

Run:

scripts/compare_bench_regressions.py --rota --json-out /tmp/compare_bench_regressions_rota.json

Use the text output for the human summary and the JSON for precise inspection.
Focus on plausible regressions. Ignore flaky and inconclusive items unless they help explain a plausible shift.
Split the summary into:

Attributed
To bisect
To watch

Show the current summary
Execute the bisect script for each "to bisect" entry in parallel, then wait for all of them to finish. The builds can take many hours without the script showing any output, make sure you wait for them with a long timeout. If running in codex: round-robin poll the processes with write_stdin and 1 hour timeout (the configuration might cap this at a lower timeout in practice)
Collect the bisect results and move any benchmarks that were attributed by the bisections.
Show the final summary. Note any failed bisects.

Useful JSON Queries

jq '.direct_suspects[] | {good_commit, bad_commit, bad_author_email, bad_subject}' \
  /tmp/compare_bench_regressions_rota.json

jq '[.change_points[] | select(.classification == "plausible")]' \
  /tmp/compare_bench_regressions_rota.json

Attributed Regressions

Start from direct_suspects in /tmp/compare_bench_regressions_rota.json.
For each suspect, keep the abbreviated bad commit ID, full author email, full commit subject, and the worst example benchmarks per suite, not the full list.
Prefer one worst example per affected suite such as micro, meso, macro...

Unattributed Regressions

Start from plausible change_points whose (good_commit, bad_commit] pair is not already covered by direct_suspects.
Inspect the range with:

git log --first-parent --reverse --format='%H%x09%ae%x09%s' GOOD..BAD
git show --stat --summary --format=fuller BAD
git diff --stat GOOD..BAD

If needed, inspect individual commits in the range with git show --stat --summary --format=fuller COMMIT.

Attribution Rules

If the change point is an exact single-parent GraalPy commit and mx.graalpython/suite.py imports did not change, it can usually be attributed to that commit.
Changes to imports in mx.graalpython/suite.py can never be confidently attributed without bisecting Graal. Keep those unattributed and say so explicitly (including the graal commit range)
If an unattributed first-parent range contains one plausible GraalPy code change and the rest are documentation, tests, retags, or other non-performance changes, attribute it to that one code change.
If the series is already shifted by an earlier attributed commit and a later unattributed range only preserves the new level, fold the later item into the earlier attribution.
Cross-configuration correlation matters.
If native shows an exact jump on one commit and jvm later shows the same benchmark shifted upward through a range containing that commit, treat them as likely the same cause unless the later range has a better candidate.
If both jvm and native jump at or immediately after a Graal import update, keep both under the same unattributed Graal-side cause.

Flakiness Check

Use Bench Server data when the unattributed item is small or suspicious.
Query the benchmark series with bench-cli run - and check whether the change is a clean step up that stays high, a one-point spike that immediately falls back, or already present before the reported range.
A stable step change is a real regression candidate.
An isolated last-point bump with no supporting related regressions is usually watch-and-rerun material, not a strong attribution.

Bench Server Checks

Prefer querying only the specific benchmark and configuration under investigation.
Typical filters: branch = master, target benchmark, host-vm = graalvm-ee, target host-vm-config, guest-vm = graalpython, target guest-vm-config, metric.name = time, commit.committer-ts last-n 30d.
Reduce output to commit.rev and average metric value so the step pattern is easy to inspect.
bench-cli sometimes fails with 404 when the server is overloaded. If that happens, wait for a minute and try again

Output Contract

List findings first, not process notes.
Keep three top-level sections (if not empty): Attributed and To bisect and To watch
In the attributed section, use this header format: abcd1234efgh | author@oracle.com | Full subject
Unattributed changes that look plausible go to "to bisect", flaky ones go to "to watch"
In the "to bisect" section, add an invocation (don't execute yet) of scripts/bisect_benchmark_regression.py that can bisect it (use unabbreviated commits in this case)
In the "to watch" section, say whether the item looks flaky, or likely the same cause as another attributed item.
Do not abbreviate commit subjects.
Keep author emails.
Abbreviate commit IDs to 12 characters.
Do not list every benchmark if there are many; only the worst examples from each affected suite. If you didn't list all, say "and X others".

Guardrails

If the script or you can't find bench-cli, ask the user to provide it from the bench-server repo.
- You may offer to clone the repo and create the cli for the user. The repo is on the same bitbucket as graalpython, the project is INFRA and the repo is called bench-server
Don't submit more than 5 bisect jobs. If there are more in the "to bisect" list, pick 5 that look the most serious and leave the rest as "to bisect".

rota-bench-regression-analysis

Bench Regression Analysis

Use This Skill For

Core Workflow

Useful JSON Queries

Attributed Regressions

Unattributed Regressions

Attribution Rules

Flakiness Check

Bench Server Checks

Output Contract

Guardrails

このリポジトリの他の Skills

このリポジトリの他の Skills

Bench Regression Analysis

Use This Skill For

Core Workflow

Useful JSON Queries

Attributed Regressions

Unattributed Regressions

Attribution Rules

Flakiness Check

Bench Server Checks

Output Contract

Guardrails