---
name: reviewing-test-quality
description: Review unit and integration test quality in the FitnessApp project. Use when the user asks to review tests, check test quality, analyze test coverage, assess testability, find test gaps, or improve test structure. Covers Swift Testing (@Test/@Suite) and FitnessTestSupport conventions.
---
# Reviewing Test Quality

## Context

- Test framework: Swift Testing (`@Test`, `@Suite`, `#expect`, `Issue.record`)
- Shared test utilities: `FitnessTestSupport` package — `makeExercise()`, `MockAnalyticsStorage`, `StubAnalyticsStorage`, `waitUntil()`
- Package structure: tests live under `Packages/<PackageName>/Tests/<PackageName>Tests/`
- Build system: always use `xcodebuild` with `DEVELOPER_DIR` and `-skipMacroValidation` (see build-and-test rule)
- For domain models, services, and project structure, see architecture-documentation.md
## Context Management

For reviews of multiple test files (3+), use a subagent/Task to perform the analysis in an isolated context. Return only the summary to the main conversation.
## Review Process

- Identify scope — which test files to review (user-specified, or all changed test files via `git diff`)
- Read the test files and the production code they test
- Check each category below and report findings with severity and line numbers
- Suggest fixes with concrete code snippets
- Fix, don't just mention. When reviewing your own tests, fix issues immediately.
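The scope-identification step can be scripted. A minimal sketch, assuming the `Packages/<PackageName>/Tests/<PackageName>Tests/` layout from the Context section and `main` as the diff base (both assumptions):

```shell
#!/bin/sh
# Keep only paths that match the Packages/<Pkg>/Tests/<Pkg>Tests/ convention.
filter_test_files() {
  grep -E '^Packages/[^/]+/Tests/[^/]+Tests/.+\.swift$'
}

# In the repo you would feed it the changed files ('main...' base is an assumption):
#   git diff --name-only main... | filter_test_files

# Demo on a literal file list — only the Tests path survives:
printf '%s\n' \
  'Packages/FitnessUI/Tests/FitnessUITests/SnapshotTests.swift' \
  'Packages/FitnessUI/Sources/FitnessUI/Badge.swift' \
  | filter_test_files
```

The same filter works on any file list, so it composes with `git status --porcelain` output as well.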
## What to Check

### A: Test Style

| Pattern | Expected | Severity |
|---|---|---|
| Test function without descriptive name | `@Test func <verb>_<scenario>_<expected>()` or `@Test("description")` | Warning |
| Tests not grouped in `@Suite` | Group related tests in `@Suite struct <Subject>Tests` | Warning |
| XCTest patterns (`XCTAssert*`) in Swift Testing files | Use `#expect`, `#require`, `Issue.record` | Critical |
| Missing `@MainActor` on tests touching `@MainActor` types | Add `@MainActor` to the `@Suite` or individual `@Test` | Critical |
| Arrange-Act-Assert not clearly separated | Use blank lines or `// Arrange` / `// Act` / `// Assert` markers | Info |
### B: DRY — Shared Utilities

| Pattern | Expected | Severity |
|---|---|---|
| Local `makeExercise()` factory | Use `FitnessTestSupport.makeExercise()` | Critical |
| Local `MockAnalyticsStorage` / `StubAnalyticsStorage` | Use the versions from `FitnessTestSupport` | Critical |
| Local `waitUntil()` helper | Use `FitnessTestSupport.waitUntil()` | Critical |
| Duplicated mock/stub across test targets | Move to `FitnessTestSupport` | Warning |
| Missing `import FitnessTestSupport` when utilities are available | Add the import | Warning |
### C: Assertion Strength

| Pattern | Expected | Severity |
|---|---|---|
| `#expect(value != nil)` | `let unwrapped = try #require(value)`, then assert on `unwrapped` | Warning |
| `#expect(array.count > 0)` | `#expect(array.count == <exact>)` or assert specific elements | Warning |
| `waitUntil` without `Issue.record` on timeout | Use the shared `waitUntil()`, which includes `Issue.record` | Critical |
| No negative test (only happy path) | Add tests for invalid input, empty state, error cases | Warning |
| `#expect(true)` or `#expect(false)` as placeholder | Replace with a meaningful assertion | Critical |
### D: Coverage Gaps

Check the production code that the tests cover:

- List all `public` / `internal` methods on ViewModels, Coordinators, UseCases, and Services
- For each method, check whether at least one test exercises it
- Flag untested methods, prioritized by:
  - Critical: state-mutating methods (e.g. `completeExercise`, `editMore`, `resetExercise`)
  - Warning: query methods (e.g. `getDailyWeightProgression`, `totalWeight`)
  - Info: convenience/delegation methods

Also check for untested edge cases:

- Empty collections (no exercises, no analytics entries)
- Boundary values (weight = 0, sets = 0, reps = 0)
- Concurrent state changes (exercise completion during edit)
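The method-inventory pass can be roughed out with a script. A minimal sketch — the source and test contents below are fabricated, and a plain `grep` for the method name is only a heuristic (it cannot distinguish a real invocation from a comment):

```shell
#!/bin/sh
# Sketch: flag methods in a source file that no test file references.
src=$(mktemp) ; tst=$(mktemp)
cat > "$src" <<'EOF'
public func completeExercise() {}
func totalWeight() -> Double { 0 }
public func resetExercise() {}
EOF
cat > "$tst" <<'EOF'
_ = sut.completeExercise()
#expect(sut.totalWeight() == 0)
EOF

# Extract method names from the source, then look each one up in the tests.
grep -oE 'func [A-Za-z_]+' "$src" | sed 's/func //' | while read -r name; do
  grep -q "$name" "$tst" || echo "UNTESTED: $name"
done
rm -f "$src" "$tst"
```

In the repo, `$src` would be a ViewModel/Coordinator/UseCase/Service file under `Sources/` and `$tst` the corresponding test target; any `UNTESTED:` line then feeds the severity triage above.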
### E: Testability Hints (Production Code Issues)

Tests often reveal production code problems. Flag these:

| Pattern in Test | Production Code Problem | Severity |
|---|---|---|
| Test uses `Container.shared` to set up dependencies | Production code has hard `Container` coupling — consider protocol injection | Warning |
| Test cannot mock a dependency (concrete class, no protocol) | Production type needs a protocol extraction | Warning |
| Test requires complex setup (> 10 lines of Arrange) | Production type has too many responsibilities — consider splitting | Info |
| Test uses `@Suite(.serialized)` without clear reason | Production code may have shared mutable state — investigate thread safety | Warning |
| Test initializes a type with two different init signatures | Dual-init pattern — consider consolidating or documenting intent | Info |
| Test uses a real `Task.sleep` to exercise time-dependent production code | Production type lacks a clock/timer abstraction — introduce a protocol (e.g. `TimerClock`) plus a fake for deterministic tests | Warning |
| Test drives the production tick loop with a real-time sleep + tolerance (>= N) | Tick cadence is hardcoded — extract it as an initializer parameter so tests can shorten it | Warning |
#### E.1 — Diagnosing .serialized

A `@Suite(.serialized)` trait is legitimate only when production code shares mutable state across tests in that suite. Before keeping it, perform this check:

- Grep the suite for `Container.shared` usage.
- Grep the suite for `UserDefaults.standard` or other process-global mutables.
- Grep for file-backed state under a shared path.

If none of these are present, `.serialized` is legacy and can be removed. Parallel execution then gives a 3–5× run-time speedup within that suite at zero risk. If any are present, do NOT remove `.serialized` without first migrating the production code to constructor-injected dependencies (see reviewing-code-changes section 13c).
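The three greps can be combined into one go/no-go check. A sketch, with `FileManager.default` used as a rough stand-in for "file-backed state under a shared path" (an assumption, not an exhaustive pattern):

```shell
#!/bin/sh
# Sketch: decide whether a suite's .serialized trait is still justified.
check_serialized() {
  f="$1"
  if grep -qE 'Container\.shared|UserDefaults\.standard|FileManager\.default' "$f"; then
    echo "KEEP .serialized: shared mutable state found"
  else
    echo "REMOVE .serialized: no process-global state detected"
  fi
}

# Demo on a throwaway file with no global state:
f=$(mktemp)
echo 'let vm = ViewModel()  // constructor-injected, no globals' > "$f"
check_serialized "$f"
rm -f "$f"
```

A "KEEP" verdict is the trigger for the migration path above, not a reason to leave the suite as-is forever.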
#### E.2 — Callback Fidelity (Mock Contract Breach)

When tests instantiate a service with closure parameters directly (instead of via the production factory/cache), the test closures must perform all the side effects the production wiring performs. Otherwise tests "pass" while the real write path is never exercised — exactly the bug class we are looking for.

**Smell** — test closures that bump state instead of calling the service:

```swift
let coordinator = TrainingCoordinator(
    findCategory: { _ in group },
    onExerciseUpdate: { _, _ in storage.bumpVersion() },
    onExerciseReset: { _, _ in }
)
```

The production wiring, for comparison:

```swift
let coordinator = TrainingCoordinator(
    findCategory: { _ in group },
    onExerciseUpdate: { exercise, category in
        managementService.updateExercise(exercise, category: category)
    },
    onExerciseReset: { exercise, category in
        managementService.resetExercise(exercise, category: category)
    }
)
```

The test "proves" the UI reacts to `bumpVersion` — not that `updateExercise` is ever called. The "missing write path" bug class stays invisible. Real example: MuscleCategorySelectionViewModelTests.swift.
**Search Pattern**

```sh
rg -n 'TrainingCoordinator\(|XStorageService\(' Packages/*/Tests --glob '*.swift' -A5 \
  | rg -B1 'on\w+: \{'
rg -n 'TrainingCoordinator\(' Packages/*/Sources --glob '*.swift' -A8
```

If the test closure makes fewer service calls than production → smell.
**Fix Options**

- Shared test fixture — extract the production wiring into a test-helper module (e.g. `TrainingCoordinatorCache(storage: testStorage, management: testManagement)`) and use it instead of hand-rolled closures.
- Real coordinator cache in tests — when the cache is `@MainActor` and has no heavy dependencies, use it directly.
- Integration via a real container — in-memory `ModelContainer` + production `ExerciseManagementService` + `TrainingCoordinatorCache`.
**Acceptance Criterion**

Every `on*`-closure in test setup must either:

- make exactly the same service calls as the production counterpart, or
- include a comment `// MARK: mirrors X.swift line Y` referencing the production wiring, or
- run via a shared fixture that guarantees parity.
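A quick way to surface candidates for the first bullet is to grep for `on*`-closures with empty bodies, which by definition make zero service calls. A heuristic sketch (the regex is not a Swift parser, and the file content below is fabricated):

```shell
#!/bin/sh
# Sketch: flag no-op on*-closures in test setup, e.g. `onExerciseReset: { _, _ in }`.
f=$(mktemp)
cat > "$f" <<'EOF'
onExerciseUpdate: { e, c in managementService.updateExercise(e, category: c) },
onExerciseReset: { _, _ in }
EOF
grep -nE 'on[A-Za-z]+: \{ (_,? ?)+in \}' "$f"
rm -f "$f"
```

In the repo, run the grep over `Packages/*/Tests` instead of the temp file; each hit still needs a manual parity check against the production wiring, since a non-empty closure can also under-call the service.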
#### E.3 — State Pre-Priming (test writes what production should write)

**Smell**

```swift
for _ in 0..<exercise.sets { coordinator.completeSet() }

var completed = exercise
completed.isCompleted = true
mock.exercisesByCategory[.arms] = [completed]

coordinator.finishExercise()
#expect(cardVM.exercise.isCompleted)
```

The test pre-primes the expected end state before the call under test runs. Production may be entirely broken and the test still passes.
**Real example (caught in this codebase)**

Packages/FitnessExercise/Tests/FitnessExerciseTests/MuscleCategoryViewModelTests.swift:232–237:

```swift
completed.isCompleted = true
storage.savedExercises[.arms] = [completed]
vm.refreshExercises()
#expect(vm.exercises.first?.isCompleted == true)
```

What the test claims to verify: "after `refreshExercises()`, the VM reflects completion."

What it actually verifies: "`refreshExercises()` reads from `savedExercises`" — which was hand-set one line above. If the production path that should have written `[completed]` to storage (e.g. `FinishExerciseUseCase`) is broken, this test stays green. This is one of the test shapes that let Bug 2 (category progress not refreshing) ship.

The Fix below shows the corrected shape for this exact case.
**Search Pattern**

```sh
rg -n 'exercisesByCategory\[\..+\]\s*=' Packages/*/Tests --glob '*.swift' -B2 -A5 \
  | rg -B5 'finishExercise\(\)|completeSet\(\)'
```

Any sequence `mock.X = expectedEndState` directly before `coordinator.action()` is suspect.
**Fix**

Tests must not "help" the domain path. Real setup:

```swift
mock.exercisesByCategory[.arms] = [activeExercise]

coordinator.finishExercise()

#expect(mock.updateExerciseCalls.contains { $0.exercise.id == activeExercise.id })
#expect(mock.updateExerciseCalls.last?.exercise.isCompleted == true)
```
### F: Test Maintenance

| Pattern | Expected | Severity |
|---|---|---|
| `Container.shared.reset()` not called in init/deinit | Add it to the test suite's `init()` to ensure isolation | Critical |
| Tests depend on execution order | Use the `.serialized` trait explicitly, or make tests independent | Critical |
| Magic numbers in assertions without context | Use named constants or a comment explaining the expected value | Warning |
| Test file > 300 lines without suite splitting | Split into focused `@Suite` structs | Info |
### G: Performance Smells

Slow tests are not just an inconvenience — they are a signal. Before accepting a slow test or "just making CI faster", diagnose the root cause using this matrix.

#### G.1 — The three legitimate reasons for Task.sleep in a test

| Purpose | Example | Verdict |
|---|---|---|
| Negative assertion — "after N ms, state has NOT changed" | `sleep(200ms); #expect(!isCompleted)` | Keep. The sleep IS the test. |
| Observation-setup race — an `@Observable` subscription activates only after the first read | Subscribe / sleep / mutate / wait | Keep, with a comment documenting why the sleep cannot be replaced. Ideally fix the observation setup in the VM. |
| Waiting for async propagation after an event | `event(); sleep(100ms); #expect(state)` | Remove. Use `waitUntil { state == expected }` instead. |
A redundant sleep after `waitUntil` is always removable:

```swift
// Before: the extra sleep adds 100 ms per test and proves nothing
coordinator.startTraining(for: ex)
try await waitUntil { coordinator.activeSessions[ex.id] != nil }
try await Task.sleep(for: .milliseconds(100))
#expect(vm.exercises.count == 2)

// After: drop the sleep
coordinator.startTraining(for: ex)
try await waitUntil { coordinator.activeSessions[ex.id] != nil }
#expect(vm.exercises.count == 2)

// Best: wait directly on the asserted condition
coordinator.startTraining(for: ex)
try await waitUntil { vm.exercises.count == 2 }
```
#### G.2 — Clock abstraction for time-dependent code

When production code reads `Date()` or drives a loop with `Task.sleep`, introduce a protocol (see `TimerService` / `TimerClock`). Tests inject a `FakeClock` that advances synchronously. Never test a 1 s `Task.sleep` with a 1 s real-time wait — that measures Foundation, not your code.

Pattern:

```swift
public protocol TimerClock: Sendable { func now() -> Date }

public struct SystemTimerClock: TimerClock {
    public init() {}
    public func now() -> Date { Date() }
}

// Production init takes the clock and tick cadence as parameters:
public init(clock: any TimerClock = SystemTimerClock(), tickInterval: Duration = .seconds(1)) { … }
```

Test usage:

```swift
let clock = FakeClock()
let sut = Service(clock: clock, tickInterval: .milliseconds(5))

clock.advance(by: 1)
try await waitUntil { sut.publishedValue == 1 }
```

This gives deterministic coverage of the same production path, orders of magnitude faster than real-time sleeps.
#### G.3 — Parallelisation diagnostic

Swift Testing parallelises `@Test` functions by default unless suppressed by `.serialized`. A healthy suite has a parallelisation ratio (sum of per-test durations ÷ wall-clock time) of 20–40× on Apple Silicon. Compute it from the package log:

```sh
sum=$(grep -E "^✔ Test .* passed after [0-9.]+ seconds" LOG | \
  sed -E 's/.*after ([0-9.]+) seconds.*/\1/' | awk '{s+=$1} END {print s}')
wall=$(grep -E "^✔ Test run with" LOG | sed -E 's/.*after ([0-9.]+) seconds.*/\1/')
echo "ratio=$(echo "$sum / $wall" | bc -l)"
```

Ratio interpretation:

- >20× — healthy, nothing to do.
- 5–15× — suppressed parallelism somewhere. Look for `.serialized`, a shared `ModelContainer`, or process-global state.
- <5× — either too few tests for a meaningful signal, or a single slow test dominates (look at the top five slowest tests in the log).
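For the <5× case, the top slowest tests can be extracted from the same log. A sketch over fabricated log lines, assuming the `✔ Test "…" passed after N seconds` line format used above:

```shell
#!/bin/sh
# Sketch: rank tests by duration, slowest first, from a swift-testing log.
log=$(mktemp)
cat > "$log" <<'EOF'
✔ Test "fast case" passed after 0.012 seconds
✔ Test "slow case" passed after 2.400 seconds
✔ Test "medium case" passed after 0.300 seconds
EOF
grep -E 'passed after [0-9.]+ seconds' "$log" \
  | sed -E 's/.*Test "(.*)" passed after ([0-9.]+) seconds.*/\2 \1/' \
  | sort -rn | head -5
rm -f "$log"
```

If a single test accounts for most of the wall-clock time, apply G.1 and G.2 to that test before touching the suite's parallelism.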
### H: Snapshot Tests

Snapshot tests guard the visual contract of public Views in FitnessUI and FitnessPersistenceUI. They live in:

- Packages/FitnessUI/Tests/FitnessUITests/SnapshotTests.swift
- Packages/FitnessPersistenceUI/Tests/FitnessPersistenceUITests/IdleCardSnapshotTests.swift

Recorded baselines are PNGs under each package's `Tests/<Package>Tests/__Snapshots__/<TestFileName>/`.
#### H.1 — Conventions

```swift
@Suite("<ViewName> — Snapshots", .tags(.snapshot))
@MainActor
struct <ViewName>SnapshotTests {
    @Test func <variant>() {
        let view = <ViewName>()
        assertSnapshot(of: view, named: "<file-friendly-variant>", size: CGSize(width: 100, height: 100))
    }
}
```

- Suite name pattern: `"<ViewName> — Snapshots"` (em-dash separator). Tag every snapshot suite with `.tags(.snapshot)` so they can be filtered or skipped as a group.
- `@MainActor` goes on the suite, not the test — Views must be constructed on the main actor.
- The `named:` argument is the file-name fragment for the PNG. Use kebab-case to match the convention seen in `__Snapshots__/SnapshotTests/*.png` (e.g. `idle-styled-reset`, `with-weight`, `expanded`).
- `size:` — 100×100 for small leaf components (icons, buttons, chips); `CGSize(width: 360, height: <natural>)` for cards or full-width components. Avoid arbitrary frames that hide overflow.
- One `@Test` per meaningful visual variant, not per prop combination. Three to five variants (default, edge case, interaction state) is healthy; ten variants of the same View is a smell.
#### H.2 — Where snapshot tests are required vs optional

| View kind | Snapshot required? |
|---|---|
| New `public struct …: View` in FitnessUI/Sources/ | Yes — at least one default-state snapshot. |
| New `public struct …: View` in FitnessPersistenceUI/Sources/ | Yes — typically a full-card snapshot via the existing IdleCardSnapshotTests.swift patterns. |
| Thin wrapper (1:1 delegation to a covered View) | Optional. Note the exception in the diff. |
| `internal` / `fileprivate` Views | Optional — covered transitively by the public View that embeds them. |
| Pure non-View types (modifiers, helpers, data) | No. |
#### H.3 — When to re-record a baseline

Re-record when, and only when, the visual change is intentional:

- a new color/gradient/font token
- changed layout proportions (sizes, paddings, offsets)
- an added or removed visual layer (halo, ring, edge indicator, gradient)
- a corner radius, opacity, or shadow change

Do not re-record to "make a failing test pass" without auditing the diff. A snapshot failure on a refactor that should be visually equivalent is a real regression — investigate before re-recording.

Re-record flow:

```sh
DEVELOPER_DIR=… RECORD_SNAPSHOTS=1 xcodebuild test -scheme FitnessUI -destination '…' …
git add Packages/FitnessUI/Tests/FitnessUITests/__Snapshots__/
```

Or, for a single suite, temporarily flip `record: true` on the failing `assertSnapshot(...)` call, run, commit the new PNG, and revert the `record:` change.
#### H.4 — Smells

| Smell | Why it hurts | Fix |
|---|---|---|
| Snapshot test exists but no PNG checked in | The first run records on the developer's machine and fails CI. | Run with `RECORD_SNAPSHOTS=1` once; commit the baseline PNG alongside the test. |
| One `@Test` per prop combination (12+ tests for one View) | Every visual tweak forces re-recording dozens of baselines. | Collapse to 3–5 meaningful variants; let unit tests cover prop logic. |
| Re-recorded baseline in a "no-visual-change" refactor PR | The refactor was not pixel-equivalent — a real regression hidden as a re-record. | Stop, diff the old and new PNGs visually (`git diff -- '**/*.png'` or open both), explain in the PR description, or revert. |
| Snapshot of a View that owns dynamic data (date, random, animated) | Flaky baseline. | Inject the dynamic input as a parameter; pass a fixed value in the snapshot. |
| Missing `@MainActor` — compiler/runtime error | n/a | Add `@MainActor` on the struct. |
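The first smell (missing baseline PNG) can be checked mechanically by comparing `named:` identifiers against the `__Snapshots__` directory. A sketch over a fabricated directory layout and file names:

```shell
#!/bin/sh
# Sketch: find `named:` snapshot identifiers with no committed PNG baseline.
dir=$(mktemp -d)
mkdir -p "$dir/__Snapshots__/SnapshotTests"
touch "$dir/__Snapshots__/SnapshotTests/with-weight.png"
cat > "$dir/SnapshotTests.swift" <<'EOF'
assertSnapshot(of: view, named: "with-weight", size: size)
assertSnapshot(of: view, named: "expanded", size: size)
EOF

# Pull out each named: identifier and check for a matching baseline PNG.
grep -oE 'named: "[^"]+"' "$dir/SnapshotTests.swift" \
  | sed -E 's/named: "([^"]+)"/\1/' \
  | while read -r n; do
      [ -f "$dir/__Snapshots__/SnapshotTests/$n.png" ] || echo "MISSING BASELINE: $n"
    done
rm -rf "$dir"
```

In the repo, point it at the real test file and its `__Snapshots__/<TestFileName>/` folder; any `MISSING BASELINE:` line means CI will fail on first run.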
## Report Format

Group findings by category. For each finding:

```
**[Category:Severity]** `TestFile.swift:LINE`
Found: <what the test does wrong>
Fix: <specific replacement with code snippet>
```
End with a summary:

```
## Summary
- A (Test Style): N findings (N critical, N warning, N info)
- B (DRY): N findings
- C (Assertion Strength): N findings
- D (Coverage Gaps): N untested methods
- E (Testability Hints): N production code issues
- F (Test Maintenance): N findings
- G (Performance Smells): N findings (parallelism ratio per package)
- H (Snapshot Tests): N findings

Overall: PASS / NEEDS WORK / CRITICAL ISSUES
```
## Running Tests After Review

After fixing test issues, run the affected package tests to verify:

```sh
cd ~/Documents/repo/FitnessApp/Packages/<Package> && \
DEVELOPER_DIR=/Users/jose.nunez/Downloads/Xcode.app/Contents/Developer \
PATH="/Users/jose.nunez/Downloads/Xcode.app/Contents/Developer/usr/bin:$PATH" \
xcodebuild test -scheme <Package> -destination 'platform=iOS Simulator,name=iPhone 17 Pro Max' \
  -skipMacroValidation 2>&1 | tail -30
```

Write a test stamp to `.claude/hooks/state/test-execution.stamp.md` after successful execution.