| name | test-anti-patterns |
| description | Audits existing test code in any language for anti-patterns and quality issues — produces a severity-ranked report (Critical / Warning / Info) with concrete code-level fixes. Polyglot: .NET (MSTest/xUnit/NUnit/ TUnit), Python (pytest/unittest), TS/JS (Jest/Vitest/Mocha/node:test), Java (JUnit/TestNG), Go, Ruby (RSpec/Minitest), Rust, Swift, Kotlin (JUnit/Kotest), PowerShell (Pester), C++ (GoogleTest/Catch2). INVOKE when asked to audit, review, rank, or find problems in existing tests — "audit my tests", "test smell audit", "rank by severity", tests that pass but verify nothing, no/missing assertions, swallowed exceptions, always-true / self-comparing / tautological assertions, broad exception types, flakiness (sleep/Date.now/time.sleep), ordering dependency, shared global state, duplicated tests, magic values, missing await on async assertions. DO NOT USE FOR: writing new tests (use code-testing-agent, or writing-mstest-tests for MSTest); running tests (use run-tests); framework migration.
|
| license | MIT |
Test Anti-Pattern Detection
Quick, pragmatic analysis of test code in any supported language for anti-patterns and quality issues that undermine test reliability, maintainability, and diagnostic value.
Language-specific guidance: Call the test-analysis-extensions skill to discover available extension files, then read the file matching the target codebase (e.g., extensions/dotnet.md, extensions/python.md, extensions/typescript.md, extensions/go.md). The extension file tells you which sleep / time / random / skip / setup-teardown / mystery-guest APIs to look for in that language.
When to Use
- User asks to review test quality or find test smells
- User wants to know why tests are flaky or unreliable
- User asks "are my tests good?" or "what's wrong with my tests?"
- User requests a test audit or test code review
- User wants to improve existing test code
When Not to Use
- User wants to write new tests from scratch (use
code-testing-agent for any language, or writing-mstest-tests for MSTest specifically)
- User wants direct implementation fixes rather than a diagnostic review (use the relevant write/edit skill)
- User asks to fix swapped
Assert.AreEqual argument order in MSTest (use writing-mstest-tests)
- User asks to convert MSTest
DynamicData from IEnumerable<object[]> to ValueTuple (use writing-mstest-tests)
- User wants to run or execute tests (use
run-tests for .NET)
- User wants to migrate between test frameworks or versions (use migration skills)
- User wants to measure code coverage (out of scope)
- User wants a deep formal test smell audit with academic taxonomy and extended catalog (use
test-smell-detection)
Inputs
| Input | Required | Description |
|---|
| Test code | Yes | One or more test files or classes to analyze |
| Production code | No | The code under test, for context on what tests should verify |
| Specific concern | No | A focused area like "flakiness" or "naming" to narrow the review |
Workflow
Step 1: Detect language and load extension
Identify the target codebase's language and test framework. Call the test-analysis-extensions skill and read the matching extension file. The extension file documents framework-specific anti-pattern markers — what counts as a sleep/wait, a test marker, a skip, a setup/teardown, a shared-state hot spot, and an integration boundary — so this skill stays language-neutral.
Step 2: Gather the test code
Read the test files the user wants reviewed. If the user points to a directory or project, scan for all test files using the discovery markers in the loaded language extension file (e.g., [TestClass]/[Fact]/[Test] for .NET, test_*.py / def test_* for pytest, *.test.ts / it() for Jest, *Test.java / @Test for JUnit, *_test.go / func TestXxx for Go, *_spec.rb for RSpec, #[test] for Rust, *.Tests.ps1 / Describe for Pester, TEST(...) for GoogleTest, TEST_CASE(...) for Catch2/doctest).
If production code is available, read it too -- this is critical for detecting tests that are coupled to implementation details rather than behavior.
Step 3: Scan for anti-patterns
Check each test file against the anti-pattern catalog below. Report findings grouped by severity. The examples are .NET-centric but the patterns generalize — use the loaded language extension file to map each pattern to the framework you are auditing.
Critical -- Tests that give false confidence
| Anti-Pattern | What to Look For |
|---|
| No assertions | Test methods that execute code but never assert anything. A passing test without assertions proves nothing. In .NET look for missing Assert.*; in pytest a function with no assert and no pytest.raises; in Jest no expect(...); in JUnit no assert*/assertThat; in Go a test that never calls t.Error*, t.Fatal*, or testify; in RSpec a block with no expect; in Pester no Should. Mock-call verifications (verify(mock), expect(mock).toHaveBeenCalled, Should -Invoke) are real assertions. |
| Missing await on async assertions (JS/TS, .NET, Python, Kotlin, Swift) | expect(promise).resolves.toBe(x) without await/return, pytest-asyncio test with un-awaited coroutine, async Task xUnit test calling Assert.ThrowsAsync without await, Kotest suspending test without runTest, Swift Testing async test without await. These tests silently pass even when the underlying assertion would have failed. |
| Coverage touching | Test class that methodically calls every public member on a type — often in alphabetical or declaration order — without asserting meaningful outcomes. Each test typically does var result = sut.MethodName(...) (or result = sut.method_name(...), sut.methodName(), sut.MethodName(t)) with no assertion, or only a trivial null/None/nil check. The intent is to inflate code-coverage metrics rather than verify behavior. Distinct from a single assertion-free test: the pattern is systematic coverage of the surface area with no real verification. |
| Self-referential assertion | Asserts that the output of an operation equals its input when the operation is expected to be an identity or no-op, e.g. Assert.AreEqual(input, Parse(input.ToString())), assert input == parse(str(input)), expect(parse(input.toString())).toBe(input), assert.Equal(t, input, parse(input)). Also flags Assert.AreEqual(dto.Name, dto.Name) / assert dto.name == dto.name / expect(dto.name).toBe(dto.name) (asserting a field against itself). The test is tautological — it can only fail if the round-trip is broken, but never verifies that a transformation actually happened. |
| Swallowed exceptions | try { ... } catch { }, catch (Exception) without rethrowing or asserting (.NET); bare except: or except Exception: with pass (Python); try { ... } catch (e) {} (JS/TS/Java); defer recover() without re-panic and no assertion (Go); rescue StandardError with no assertion (Ruby); Result::unwrap_or(...) swallowing errors in a test (Rust); empty catch block (Kotlin/Swift). |
| Assert in catch block only | try { Act(); } catch (Exception ex) { Assert.Fail(ex.Message); } (and equivalents in other languages) -- use Assert.ThrowsException / pytest.raises / expect(fn).toThrow / assertThrows / assert.Error(t, err) / #[should_panic] / Should -Throw / EXPECT_THROW instead. The test passes when no exception is thrown even if the result is wrong. |
| Always-true assertions | Assert.IsTrue(true), Assert.AreEqual(x, x), assert True, expect(true).toBe(true), assert.True(t, true), assert!(true), or conditions that can never fail. |
| Commented-out assertions | Assertions that were disabled but the test still runs, giving the illusion of coverage. |
High -- Tests likely to cause pain
| Anti-Pattern | What to Look For |
|---|
| Flakiness indicators | Wall-clock sleeps/waits used for synchronization: Thread.Sleep / Task.Delay (.NET), time.sleep (Python), setTimeout / await new Promise(r => setTimeout(...)) (JS/TS), Thread.sleep (Java/Kotlin), time.Sleep (Go), sleep (Ruby/Bash), std::thread::sleep (Rust), Start-Sleep (Pester), std::this_thread::sleep_for (C++). Wall-clock reads without abstraction: DateTime.Now/UtcNow, datetime.now()/datetime.utcnow(), Date.now() / new Date(), System.currentTimeMillis(), time.Now(), Time.now, Instant::now(), Date()/Date.now, Get-Date, std::chrono::system_clock::now. Unseeded randomness: new Random(), random.random()/random.randint(), Math.random(), new Random() (Java/Kotlin), rand.Int() without seed, rand (Ruby), rand::random() (Rust). Environment-dependent paths (hard-coded C:\..., /tmp/..., network hosts). |
| Test ordering dependency | Static/global mutable state modified across tests; setup that doesn't fully reset state ([TestInitialize], setUp, beforeEach, before(:each), BeforeEach, t.Cleanup); tests that fail when run individually but pass in suite (or vice versa). Examples per language: static fields (.NET/Java), module-level globals (Python), top-level let/const in test file (JS/TS), var package globals (Go), class variables (Ruby), static mut/lazy_static!/OnceCell (Rust), $script: variables (PowerShell). |
| Over-mocking | More mock setup lines than actual test logic. Verifying exact call sequences on mocks rather than outcomes. Mocking types the test owns. Per language: Moq/NSubstitute/FakeItEasy (.NET), unittest.mock / pytest-mock (Python), Jest auto-mocks / Sinon (JS/TS), Mockito/PowerMock (Java), gomock/testify mock (Go), RSpec mocks/mocha (Ruby), mockall (Rust), MockK (Kotlin), Mock cmdlet (Pester), gmock (C++). For a deep mock audit in .NET, use exp-mock-usage-analysis. |
| Implementation coupling | Testing private methods via reflection (MethodInfo.Invoke, getattr in Python, (thing as any) in TS, Field.setAccessible(true) in Java, Object#send in Ruby, internal pub(crate) access in Rust). Asserting on internal state instead of observable behavior. Verifying exact method call counts on collaborators instead of business outcomes. |
| Broad exception assertions | Assert.ThrowsException<Exception>(...) (.NET) / pytest.raises(Exception) / expect(fn).toThrow(Error) without a message matcher / assertThrows(Exception.class, ...) (Java) / assert.Error(t, err) without checking the kind / expect { ... }.to raise_error without class (RSpec) / #[should_panic] without expected = "..." / Should -Throw without -ExpectedMessage / EXPECT_ANY_THROW instead of EXPECT_THROW(stmt, SpecificType). |
Medium -- Maintainability and clarity issues
| Anti-Pattern | What to Look For |
|---|
| Poor naming | Test names like Test1, TestMethod, test, names that don't describe the scenario or expected outcome. Good naming differs by language convention — see the loaded language extension file (e.g., Add_NegativeNumber_ThrowsArgumentException for .NET, test_add_negative_number_raises_value_error for pytest, addNegativeNumber_throwsArgumentException for Java, 'adds negative number throws' for Jest descriptions, TestAdd_NegativeNumber_ReturnsError for Go). |
| Magic values | Unexplained numbers or strings in arrange/assert: Assert.AreEqual(42, result) / assert result == 42 / expect(result).toBe(42) -- what does 42 mean? |
| Duplicate tests | Three or more test methods with near-identical bodies that differ only in a single input value. Should be parametrized: [DataRow]/[Theory]/[TestCase] (.NET), @pytest.mark.parametrize (pytest), test.each / it.each (Jest/Vitest), @ParameterizedTest + @ValueSource (JUnit 5), @DataProvider (TestNG), Go table-driven tests, where / shared examples (RSpec), #[rstest] (Rust), @ParameterizedTest + @MethodSource (Kotlin), -ForEach / -TestCases (Pester), INSTANTIATE_TEST_SUITE_P (GoogleTest), SECTION / GENERATE (Catch2), TEST_CASE_TEMPLATE (doctest). For a detailed duplication analysis in .NET, use exp-test-maintainability. Note: Two tests covering distinct boundary conditions (e.g., zero vs. negative) are NOT duplicates -- separate tests for different edge cases provide clearer failure diagnostics and are a valid practice. |
| Giant tests | Test methods exceeding ~30 lines or testing multiple behaviors at once. Hard to diagnose when they fail. |
| Assertion messages that repeat the assertion | Assert.AreEqual(expected, actual, "Expected and actual are not equal") / assert x == y, "x is not equal to y" / assertEquals(x, y, "values not equal") add no information. Messages should describe the business meaning. |
| Missing AAA / Given-When-Then separation | Arrange/Act/Assert (or Given/When/Then for BDD frameworks like RSpec, Kotest behavior specs, Pester) phases are interleaved or indistinguishable. |
Low -- Style and hygiene
| Anti-Pattern | What to Look For |
|---|
| Unused test infrastructure | Setup/teardown hooks that do nothing — [TestInitialize]/[SetUp]/[BeforeEach], setUp/@BeforeEach/@BeforeAll, beforeEach/beforeAll, before(:each)/before(:all), BeforeEach/BeforeAll (Pester), setUpWithError (XCTest) — and test helper methods that are never called. |
| Unmanaged resources | Test creates disposable/closeable resources without cleanup: HttpClient/Stream without using (.NET), file/connection without with block or try/finally (Python), FileInputStream without try-with-resources (Java), defer file.Close() missing (Go), connection without ensure (Ruby), Drop not relied on / forgotten close (Rust), missing teardown for temp files / DBs in any language. |
| Print debugging | Leftover Console.WriteLine / Debug.WriteLine / print() / console.log / System.out.println / fmt.Println / puts / dbg! / Write-Host / std::cout statements used during test development. |
| Inconsistent naming convention | Mix of naming styles in the same test class/module/file (e.g., some use Method_Scenario_Expected, others use ShouldDoSomething). |
Step 4: Calibrate severity honestly
Before reporting, re-check each finding against these severity rules:
- Critical/High: Only for issues that cause tests to give false confidence or be unreliable. A test that always passes regardless of correctness is Critical. Flaky shared state is High. Missing-await on async assertions is Critical (silent pass).
- Medium: Only for issues that actively harm maintainability -- 5+ nearly-identical tests, truly meaningless names like
Test1 / test / it1.
- Low: Cosmetic naming mismatches, minor style preferences, assertion messages that could be better. When in doubt, rate Low.
- Not an issue (per-language nuance):
- Go and Rust table-driven loops with sub-tests (
t.Run / for case in cases { ... }) are idiomatic, not "Conditional Test Logic". Do NOT flag.
- pytest bare
assert is the canonical assertion form, not a missing assertion library. Do NOT flag.
- Go tests use
if got != want { t.Errorf(...) } as canonical equality. Do NOT flag as ad-hoc.
- Separate tests for distinct boundary conditions (zero vs. negative vs. null). Do NOT flag as duplicates.
- Explicit per-test setup instead of
[TestInitialize] / beforeEach (this improves isolation).
- Tests that are short and clear but could theoretically be consolidated.
IMPORTANT: If the tests are well-written, say so clearly up front. Do not inflate severity to justify the review. A review that finds zero Critical/High issues and only minor Low suggestions is a valid and valuable outcome. Lead with what the tests do well.
Step 5: Report findings
Present findings in this structure:
- Summary -- Total issues found, broken down by severity (Critical / High / Medium / Low). If tests are well-written, lead with that assessment.
- Critical and High findings -- List each with:
- The anti-pattern name
- The specific location (file, method name, line)
- A brief explanation of why it's a problem
- A concrete fix (show before/after code when helpful)
- Medium and Low findings -- Summarize in a table unless the user wants full detail
- Positive observations -- Call out things the tests do well (sealed class, specific exception types, data-driven tests, clear AAA structure, proper use of fakes, good naming). Don't only report negatives.
Step 6: Prioritize recommendations
If there are many findings, recommend which to fix first:
- Critical -- Fix immediately, these tests may be giving false confidence
- High -- Fix soon, these cause flakiness or maintenance burden
- Medium/Low -- Fix opportunistically during related edits
Validation
Common Pitfalls
| Pitfall | Solution |
|---|
| Reporting style issues as critical | Naming and formatting are Medium/Low, never Critical |
| Suggesting rewrites instead of targeted fixes | Show minimal diffs -- change the assertion, not the whole test |
| Flagging intentional design choices | If Thread.Sleep / time.sleep / time.Sleep is in an integration test testing actual timing, that's not an anti-pattern. Consider context. |
| Inventing false positives on clean code | If tests follow best practices, say so. A review finding "0 Critical, 0 High, 1 Low" is perfectly valid. Don't inflate findings to justify the review. |
| Flagging separate boundary tests as duplicates | Two tests for zero and negative inputs test different edge cases. Only flag as duplicates when 3+ tests have truly identical bodies differing by a single value. |
| Rating cosmetic issues as Medium | Naming mismatches (e.g., method name says ArgumentException but asserts ArgumentOutOfRangeException) are Low, not Medium -- the test still works correctly. |
| Ignoring the test framework | Use correct terminology per the loaded language extension: xUnit [Fact]/[Theory], NUnit [Test]/[TestCase], MSTest [TestMethod]/[DataRow], pytest def test_* / @pytest.mark.parametrize, Jest it.each / describe, JUnit @Test / @ParameterizedTest, Go func TestXxx(t *testing.T) + table-driven, RSpec describe/it, Pester Describe/It, Rust #[test] / #[rstest], Catch2 TEST_CASE/SECTION. |
| Treating idiomatic patterns as smells | Go/Rust table-driven loops are idiomatic. Pytest bare assert is canonical. Go's if got != want { t.Errorf(...) } is canonical. JS/TS expect(mock).toHaveBeenCalledWith(...) is a real assertion, not an over-mock. Do NOT flag these. |
| Missing async-test pitfalls | A Jest test that calls expect(promise).resolves.toBe(x) without returning/awaiting the promise silently passes; a TUnit/xUnit async Task test calling Assert.ThrowsAsync without await silently passes; pytest-asyncio tests with un-awaited coroutines silently pass. Always flag as Critical. |
| Missing the forest for the trees | If 80% of tests have no assertions, lead with that systemic issue rather than listing every instance |