원클릭으로
close-flaky-issues
// Audit open "flaky test" GitHub issues and close those whose tests are no longer failing on master. Cross-references CI history from play.clickhouse.com with git log to attribute fixes.
// Audit open "flaky test" GitHub issues and close those whose tests are no longer failing on master. Cross-references CI history from play.clickhouse.com with git log to attribute fixes.
Create a ClickHouse git worktree with submodules hardlinked from the main repo. Use when the user wants to create a new worktree for ClickHouse development.
Review a ClickHouse Pull Request for correctness, safety, performance, and compliance. Use when the user wants to review a PR or diff.
Continue work on an existing PR - resolve conflicts, fix CI failures, address reviewer feedback, and push updates. Use when the user wants to pick up and advance a pull request.
Generate PR descriptions for ClickHouse/ClickHouse that match maintainer expectations. Use when creating or updating PR descriptions.
Analyze ClickHouse Keeper stress-test results from play.clickhouse.com / keeper_stress_tests data warehouse. Use whenever the user asks about Keeper performance, validates Keeper PRs against stress dashboards, investigates regressions or improvements in Keeper nightlies, asks about specific date windows / SHAs / PR-sets in Keeper stress tests, wants per-PR or window-vs-window comparisons, asks "did this PR break Keeper", asks "what changed in Keeper between dates", or wants a summary report of Keeper stress runs. Triggers on terms like "keeper stress", "keeper PR", "keeper p99", "keeper memory", "keeper rps", "keeper nightly", "keeper-stress-tests", "keeper validation", "keeper regression", or any question referencing the keeper-stress Grafana dashboard. ALWAYS prefer this skill over re-deriving the workflow from scratch — it captures hard-learned lessons about cgroup-vs-Keeper memory, bench-harness confounds, noise floors, and per-PR attribution limits.
Edit an auto-generated ClickHouse release changelog into the form that gets committed to CHANGELOG.md. Use when the user has the output of `utils/changelog/changelog.py` and wants it cleaned up and re-categorized for a release.
| name | close-flaky-issues |
| description | Audit open "flaky test" GitHub issues and close those whose tests are no longer failing on master. Cross-references CI history from play.clickhouse.com with git log to attribute fixes. |
| argument-hint | [threshold-days] [dry-run] |
| disable-model-invocation | false |
| allowed-tools | Bash, Read, Grep, Glob |
$0 (optional): number of days the test must be failure-free on master before it is eligible for closing. Default: 14.$1 (optional): pass dry-run to print the table of close candidates and stop without calling gh issue close. Any other value (or none) runs the full sweep.Sweep open flaky test issues and close ones where:
These issues are auto-generated by the CI bot — closing one is safe because the bot reopens automatically if the test fails again on master.
mkdir -p tmp/flaky
gh issue list --repo ClickHouse/ClickHouse --label "flaky test" --state open \
--limit 200 --json number,title,body > tmp/flaky/issues.json
Each body has the format:
Test name: <test_name>
Failure reason: <reason>
CI report: <url>
Failing test history: <play.clickhouse.com link with base64 SQL>
Extract (number, test_name) pairs:
jq -r '.[] | [.number, ((.body | capture("Test name: (?<n>.+)")? | .n) // "")] | @tsv' tmp/flaky/issues.json
The trailing ? after capture is required: without it, an issue with a null body or one that doesn't contain Test name: raises an error that aborts the whole pipeline and silently drops every subsequent issue. With ?, those rows produce an empty string instead, and you can fall back to the title or to the base64 SQL after # in the Failing test history URL — some old issues use the title rather than Test name: in the body.
Run one batch query against play.clickhouse.com for all extracted test names. The checks table is publicly readable via the play user.
Before running the query, guard against an empty list. If jq extracted no non-empty test names (zero open flaky test issues, or all bodies missing Test name:), test_name IN () is invalid SQL and aborts the sweep. Skip the query in that case and report "nothing to close".
Escape single quotes in test names before interpolation. Per ClickHouse SQL rules, a literal ' inside a string literal is written as ''. Names that contain a quote (e.g. parametrized integration tests like test_foo[a'b]) will otherwise produce invalid SQL or, worse, parse as a different list. Apply s/'/''/g to each extracted name before joining them with , .
curl -sS 'https://play.clickhouse.com/?user=play' --data-binary "
SELECT test_name,
countIf(test_status IN ('FAIL','ERROR')) AS failures_90d,
countIf(test_status IN ('FAIL','ERROR') AND check_start_time >= now() - INTERVAL 30 DAY) AS failures_30d,
countIf(test_status IN ('FAIL','ERROR') AND check_start_time >= now() - INTERVAL 14 DAY) AS failures_14d,
countIf(test_status IN ('FAIL','ERROR') AND check_start_time >= now() - INTERVAL 7 DAY) AS failures_7d,
maxIf(check_start_time, test_status IN ('FAIL','ERROR')) AS last_fail
FROM checks
WHERE check_start_time >= now() - INTERVAL 90 DAY
AND test_name IN ( '<test1>', '<test2>', ... )
AND (pull_request_number = 0 OR base_ref IN ('master',''))
GROUP BY test_name
ORDER BY failures_7d DESC, failures_14d DESC
FORMAT TabSeparatedWithNames
"
Important:
pull_request_number = 0 OR base_ref IN ('master','') filters to master commits and PRs targeting master — fork-PR noise is dropped. The empty-string case covers older rows where base_ref was not populated for direct-master runs.last_fail defaults to 1970-01-01 if the test never failed.For each issue, decide based on failures_<threshold-days>d:
0 failures in the window → close candidate>= 1 failure in the window → keep openDefault threshold is 14 days. Use 7 days for an aggressive sweep, 30 days for a conservative one.
For each close candidate, search git log:
# Look for "fix flaky" commits naming the test
git log origin/master --since="<date 90 days ago>" --format="%H %s" \
--grep="<test_name_fragment>" -i
# Look at recent direct modifications to the test file
git log origin/master --since="<date 90 days ago>" --format="%H %s" -- "<test_path>"
Not every close candidate has an explicit fix commit — some stabilize via surrounding infra changes. That's fine; close those too, just without a commit reference.
For integration tests (test_X/test.py::...), search the directory: tests/integration/test_X/.
For stateless tests (NNNNN_name), search: tests/queries/0_stateless/NNNNN_name.*.
Before issuing gh issue close, print a table showing:
The user already authorized closing when they invoked the skill, so do not ask follow-up questions — proceed to step 6 immediately after printing. The table is for after-the-fact spot-checking, not for interactive approval. If the user wants a preview without changes, they can pass dry-run as a second argument and the skill should stop here.
For each closeable issue, post a closing comment that names what fixed it and how recent the last failure was:
# With fix commit linked
gh issue close <N> --repo ClickHouse/ClickHouse \
--comment "Fixed by <SHA> (\`<commit subject>\`). No failures on master in the last <D> days (last seen <YYYY-MM-DD>). Closing."
# Without a specific fix
gh issue close <N> --repo ClickHouse/ClickHouse \
--comment "No failures on master in the last <D> days (last seen <YYYY-MM-DD>). Closing as no longer flaky; will reopen automatically by CI bot if it fails again."
Use full 40-char SHAs in comments — short SHAs can become ambiguous later.
Wrap test names, commit subjects, and identifiers in backticks per the project's CLAUDE.md style rule.
End-of-task summary: how many were closed (split into "with fix linked" vs "stabilized without fix"), how many remain open, and whether any high-frequency offenders are left (e.g. tests still failing >5 times per week — those may need an owner ping rather than a sweep).
DO NOT modify the body content. DO NOT remove labels. because CI matches against them. Closing with a comment is the only allowed action.tmp/flaky/ for working files, never /tmp (per CLAUDE.md).failures_14d == 0. Tests can have multi-week dormancy and then reappear — 14 days is a reasonable balance between cleaning up cruft and avoiding premature closure.