Name: Flaky Test Detector
Author: dotnet

// Detect flaky tests by scanning recent AzDo CI builds for test failures recurring across multiple unrelated PRs. Use when investigating intermittent failures, CI instability, deciding which tests to quarantine, or checking if RunTestCasesInSequence no-ops are causing parallel-safety issues.


stars:4,288

forks:859

updated:March 16, 2026 at 10:26


SKILL.md


name	flaky-test-detector
description	Detect flaky tests by scanning recent AzDo CI builds for test failures recurring across multiple unrelated PRs. Use when investigating intermittent failures, CI instability, deciding which tests to quarantine, or checking if RunTestCasesInSequence no-ops are causing parallel-safety issues.
metadata	{"author":"fsharp-team","version":"1.0"}

Flaky Test Detector

Identifies tests that fail intermittently across unrelated PRs, which is a strong signal of flakiness rather than a genuine regression. It also cross-references the flagged tests with existing GitHub PRs that may already fix them.

When to Use

Investigating CI instability ("is this test failure my fault or flaky?")
Periodic hygiene: finding tests to quarantine or fix
Before marking a test as Skip = "Flaky", to confirm that it actually is flaky
Checking if RunTestCasesInSequence (a no-op in xUnit 2) is masking parallelism bugs

How It Works

Queries the Azure DevOps Builds API directly for recent failed fsharp-ci PR builds
Extracts test failures from each build via Get-BuildErrors.ps1
Aggregates by test name across distinct PRs
Cross-references with GitHub PRs that may address the flaky tests
Flags tests that fail in 3 or more distinct PRs as flaky (3 is the default threshold, configurable via -MinPRFailures)
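
The aggregation and threshold steps above could be sketched roughly as follows. This is a minimal illustration, not the code in Get-FlakyTests.ps1: the function name Get-FlakyCandidate, the input record shape (TestName, PR), and the sample data are all assumptions made for the example. The point it shows is that a PR is counted once per test, however many of its builds failed.

```powershell
# Hypothetical input: one record per observed test failure in a PR build.
# The property names TestName and PR are assumptions for this sketch.
$failures = @(
    [pscustomobject]@{ TestName = 'Tests.A'; PR = 101 }
    [pscustomobject]@{ TestName = 'Tests.A'; PR = 102 }
    [pscustomobject]@{ TestName = 'Tests.A'; PR = 102 }  # same PR, second failed build
    [pscustomobject]@{ TestName = 'Tests.A'; PR = 103 }
    [pscustomobject]@{ TestName = 'Tests.B'; PR = 101 }
    [pscustomobject]@{ TestName = 'Tests.B'; PR = 104 }
)

function Get-FlakyCandidate {
    param(
        [Parameter(Mandatory)] [object[]] $Failures,
        [int] $MinPRFailures = 3
    )
    $Failures |
        Group-Object -Property TestName |
        ForEach-Object {
            # Deduplicate PR numbers so repeated builds of one PR count once.
            $prs = @($_.Group.PR | Sort-Object -Unique)
            [pscustomobject]@{
                TestName    = $_.Name
                DistinctPRs = $prs.Count
                PRs         = $prs
            }
        } |
        Where-Object { $_.DistinctPRs -ge $MinPRFailures } |
        Sort-Object -Property DistinctPRs -Descending
}

# With the default threshold only Tests.A (PRs 101, 102, 103) qualifies.
$flaky = @(Get-FlakyCandidate -Failures $failures)
```

Counting distinct PRs rather than raw failures is what separates "fails everywhere" from "one PR that was rebuilt many times".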

Usage

Quick scan (last 14 days, 50 builds, threshold = 3)

pwsh .github/skills/flaky-test-detector/scripts/Get-FlakyTests.ps1

Custom parameters

# More aggressive: 2+ PRs over 7 days
pwsh .github/skills/flaky-test-detector/scripts/Get-FlakyTests.ps1 -MinPRFailures 2 -DaysBack 7

# Wider net: 100 builds over 30 days  
pwsh .github/skills/flaky-test-detector/scripts/Get-FlakyTests.ps1 -MaxBuilds 100 -DaysBack 30

Parameters

Parameter	Default	Description
`-MaxBuilds`	50	Maximum number of failed builds to scan from AzDo
`-MinPRFailures`	3	Min distinct PRs a test must fail in to be flagged
`-DaysBack`	14	Only consider builds within this time window
`-DefinitionId`	90	AzDo pipeline definition ID (90 = fsharp-ci)
`-Org`	dnceng-public	Azure DevOps organization
`-Project`	public	Azure DevOps project
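
To make the parameters more concrete, here is a sketch of how they might map onto an Azure DevOps Builds list request. The helper name New-FailedBuildsUri, the -Now parameter, and the choice of query filters (reasonFilter, resultFilter, minTime, $top, api-version 7.1) are assumptions for illustration; the actual script may build its query differently. Only the default values come from the table above.

```powershell
function New-FailedBuildsUri {
    param(
        [string]   $Org          = 'dnceng-public',
        [string]   $Project      = 'public',
        [int]      $DefinitionId = 90,
        [int]      $MaxBuilds    = 50,
        [int]      $DaysBack     = 14,
        # Injected so the time window is reproducible (assumption for this sketch).
        [datetime] $Now          = [datetime]::UtcNow
    )
    # Lower bound of the time window as an ISO 8601 UTC timestamp.
    $minTime = $Now.ToUniversalTime().AddDays(-$DaysBack).ToString('s') + 'Z'
    $query = @(
        "definitions=$DefinitionId"
        'reasonFilter=pullRequest'   # PR-triggered builds only
        'resultFilter=failed'        # failed builds only
        "minTime=$([uri]::EscapeDataString($minTime))"
        "`$top=$MaxBuilds"
        'api-version=7.1'
    ) -join '&'
    "https://dev.azure.com/$Org/$Project/_apis/build/builds?$query"
}

$uri = New-FailedBuildsUri
```

Passing the result to Invoke-RestMethod -Uri $uri should return a response whose value property holds the build list, provided the project permits the request without authentication.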

Output

The script produces:

Console report with ranked flaky tests, PR numbers, job names, and sample errors
Structured objects (PowerShell) for programmatic consumption

Interpreting Results

DistinctPRs ≥ 5: Almost certainly flaky; consider quarantining immediately
DistinctPRs = 3–4: Likely flaky; investigate the root cause
DistinctPRs = 2: Possibly flaky, or a shared dependency issue; monitor (reported only when -MinPRFailures is lowered to 2, since the default threshold is 3)
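
The tiers above can be applied mechanically to the structured output. The sketch below assumes each result object exposes a DistinctPRs property, as the tier names suggest; the TestName property, the Get-FlakinessTier helper, and the sample data are hypothetical.

```powershell
function Get-FlakinessTier {
    param([Parameter(Mandatory)] [int] $DistinctPRs)
    if     ($DistinctPRs -ge 5) { 'Almost certainly flaky' }
    elseif ($DistinctPRs -ge 3) { 'Likely flaky' }
    elseif ($DistinctPRs -eq 2) { 'Possibly flaky' }
    else                        { 'Below reporting range' }
}

# Hypothetical results standing in for the script's structured objects.
$results = @(
    [pscustomobject]@{ TestName = 'Tests.A'; DistinctPRs = 6 }
    [pscustomobject]@{ TestName = 'Tests.B'; DistinctPRs = 3 }
    [pscustomobject]@{ TestName = 'Tests.C'; DistinctPRs = 2 }
)

# Add a Tier column via a calculated property.
$triaged = $results | Select-Object TestName, DistinctPRs, @{
    Name       = 'Tier'
    Expression = { Get-FlakinessTier -DistinctPRs $_.DistinctPRs }
}
```

In practice $results would be the captured output of Get-FlakyTests.ps1 rather than sample data.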

Follow-up Actions

After identifying a flaky test:

Check if there's already a GitHub issue for it
If not, file one with the Area-flaky-test label
Consider marking with [<Fact(Skip = "Flaky: #ISSUE")>] if it blocks CI
Fix the root cause (timing, file locking, thread safety, etc.)

related-skills.json

same repository

reviewing-compiler-prs.md

from "dotnet/fsharp"

Performs multi-agent, multi-model code review of F# compiler PRs across 19 dimensions including type checking, IL emission, binary compatibility, and IDE performance. Dispatches parallel assessment agents per dimension, consolidates with cross-model agreement scoring, and filters false positives. Invoke when reviewing compiler changes, requesting expert feedback, or performing pre-merge quality checks.

2026-05-20 · 4.3k

fsharp-diagnostics.md

from "dotnet/fsharp"

Always invoke after editing .fs files. Provides fast parse/typecheck feedback without a full dotnet build. Prefer this over dotnet build for iterative changes. Also finds symbol references and inferred type hints.

2026-04-01 · 4.3k

vsintegration-ide-debugging.md

from "dotnet/fsharp"

Fix F# debugging issues (breakpoints, .pdb, sequence points). Build, run VS integration tests, inspect IL/PDB.

2026-03-13 · 4.3k

pr-build-status.md

from "dotnet/fsharp"

Retrieve and analyze Azure DevOps build failures for GitHub PRs. Use when CI fails. CRITICAL: Collect ALL errors from ALL platforms FIRST, write hypotheses to file, then fix systematically.

2026-03-02 · 4.3k

hypothesis-driven-debugging.md

from "dotnet/fsharp"

Investigate compiler failures, test errors, or unexpected behavior through systematic minimal reproduction, 3-hypothesis testing, and verification. Always re-run builds and tests after changes.

2026-02-19 · 4.3k

ilverify-failure.md

from "dotnet/fsharp"

Fix ILVerify baseline failures when IL shape changes (codegen, new types, method signatures). Use when CI fails on ILVerify job.

2026-02-12 · 4.3k

package.json

"author": "dotnet"

"repository": "dotnet/fsharp"


