원클릭으로
원클릭으로
| name | fix-flaky-tests |
| description | Use this when asked to fix flaky bazel tests. |
This guide explains how to find flaky tests to fix and how to debug them. Flaky tests are bazel tests that run on GitHub workflows that pass after having failed in a previous attempt.
Make sure you're on an up-to-date master branch to ensure you're using and reading the latest code:
git checkout master && git pull
Run gh auth status to check if gh is authenticated with github.com using Git operations protocol: ssh.
If not run:
gh auth login --hostname github.com --git-protocol ssh --skip-ssh-key --web
This prints a one-time device code and a URL. Instruct the user to open the URL in their browser and enter the code.
Do not use the bare gh auth login command, as the interactive prompts are unreliable when run from an AI agent.
If not instructed to fix a test with a specified label determine which test to fix by picking the most flaky test in the last week which has not yet been fixed. To do this:
Run the following command to get the top 100 tests ordered descendingly by how much percent of their total runs they flaked in the last week, showing only tests which flaked 1% or more of their runs:
bazel run //ci/githubstats:query -- top 100 flaky% --ge 1 --week
Pick the label of the top most test which doesn't have an open PR or git commit in the last week mentioning its <test_name> which is the part of the label after the :.
<test_name> might be suffixed with _head_nns or _colocate which are variants of the same test. Strip those suffixes when checking for open PRs or commits to avoid missing matches.
To check if there is an open PR mentioning the test, run the following command (replace underscores with spaces because GitHub search doesn't match underscored compound words):
gh pr list --search "$(echo '<test_name>' | tr '_' ' ')" --state open
To check if there is a git commit mentioning the test, run the following command:
git log --oneline --since 'last week' | grep "$(echo '<test_name>' | tr '_' '.')"
Continue with the next test if you find an open PR or commit mentioning <test_name>
even if it seems the commit is not about fixing flakiness.
It's better to pick a test which has no other work being done on it to avoid conflicts.
Get the last flaky runs of the test named label in the last week by running the following command, replacing <label> with the label of the test:
bazel run //ci/githubstats:query -- last --flaky --week --download-ic-logs --download-console-logs <label>
Note the command will print Downloading logs to: <LOG_DIR>.
Read <LOG_DIR>/README.md to understand how the logs are organized.
Analyze the source code of label and the logs in <LOG_DIR> to determine the root causes of the flakiness.
Ignore failures containing the error:
Retried too many times: sending a request to Farm
as those are due to infrastructure issues unrelated to the test code.
When asked to just figure out the root causes without fixing them, document the root causes in a markdown file and finish without running the following steps.
Once the root causes have been determined, pick the most common or recent one
and fix the test by addressing that root cause taking .claude/CLAUDE.md into account.
Don't address multiple root causes in the same fix. Prefer making separate PRs for each root cause to make it easier to review and revert if needed.
Verify the test still passes by running:
bazel test --test_output=errors --runs_per_test=3 --jobs=3 <label>
This executes 3 runs of the test in parallel to increase the chances of reproducing the flakiness. If it fails, analyze the failure and fix it until it passes reliably.
Make a draft Pull Request with the fix, following these steps:
From the root of the repository, create a new git branch named ai/deflake-<test_name>-<date>,
replacing <test_name> with the name of the test
and <date> with the current date in YYYY-MM-DD format,
and commit your fix to that branch.
Push the branch to origin (assuming it's git@github.com:dfinity/ic.git) using:
git push --set-upstream origin HEAD
Submit a draft PR using gh with the fix.
Name it: fix: deflake <label>.
Include the root cause analysis in the PR description
and mention the PR was created following the steps in .claude/skills/fix-flaky-tests/SKILL.md.