hypothesis-driven-debugging
Investigate compiler failures, test errors, or unexpected behavior through systematic minimal reproduction, 3-hypothesis testing, and verification. Always re-run builds and tests after changes.
| name | hypothesis-driven-debugging |
| description | Investigate compiler failures, test errors, or unexpected behavior through systematic minimal reproduction, 3-hypothesis testing, and verification. Always re-run builds and tests after changes. |
A systematic, rigorous approach to debugging failures in the F# compiler codebase.
Use this skill when:
Before forming hypotheses, create the smallest possible reproduction:
Extract the failure:
# For test failures - run just the failing test
dotnet test -- --filter-method "*YourTest*"
# For build failures - try to isolate the problematic file
# Create a minimal .fs file that reproduces the issue
Reduce to essentials:
Document the repro:
## Minimal Reproduction
File: test-case.fs
Command: dotnet test -- --filter-method "*TestName*"
Expected: <expected behavior>
Actual: <actual behavior>
Always form at least 3 competing hypotheses about the root cause:
## Hypothesis 1: [Brief description]
**Theory**: The failure occurs because...
**How to verify**: Run/change X and observe Y
**Verification result**: [To be filled]
**Implications**: If true, this means...
## Hypothesis 2: [Brief description]
**Theory**: The failure occurs because...
**How to verify**: Add instrumentation/logging at point Z
**Verification result**: [To be filled]
**Implications**: If true, this means...
## Hypothesis 3: [Brief description]
**Theory**: The failure occurs because...
**How to verify**: Check assumption A by running test B
**Verification result**: [To be filled]
**Implications**: If true, this means...
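One way to enforce the three-hypothesis minimum is to count the hypothesis headings in the notes file before starting verification. This is a minimal sketch: it assumes headings follow the `## Hypothesis N:` or `### Hypothesis N:` form used in this document, and the helper names `count_hypotheses` and `has_enough_hypotheses` are illustrative, not part of the repository.

```shell
# Count headings of the form "## Hypothesis N:" or "### Hypothesis N:".
# count_hypotheses is a hypothetical helper name.
count_hypotheses() {
    grep -cE '^#{2,3} Hypothesis [0-9]+:' "$1"
}

# Succeeds (exit status 0) only when the file lists at least three hypotheses.
has_enough_hypotheses() {
    [ "$(count_hypotheses "$1")" -ge 3 ]
}
```

Usage might look like `has_enough_hypotheses HYPOTHESIS.md || echo "Add more hypotheses first"`.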
For each hypothesis, use one or more verification methods:
// Add temporary debugging output
printfn "DEBUG: Value at checkpoint: %A" someValue
printfn "DEBUG: Entering function X with args: %A %A" arg1 arg2
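Because this output is temporary, it helps to have a quick way to find it again before committing. The sketch below assumes every temporary print uses the `DEBUG:` prefix shown above and that your `grep` supports `--include` (GNU and BSD grep do); `find_debug_prints` is a hypothetical helper name.

```shell
# List leftover temporary debug prints in F# sources under a directory.
# Prints file:line:text for each match; exit status is non-zero when none remain.
find_debug_prints() {
    grep -rn --include='*.fs' 'printfn "DEBUG:' "${1:-.}"
}
```

Running `find_debug_prints src` before a commit should print nothing once all instrumentation has been removed.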
// Create focused test to verify specific behavior
[<Test>]
let ``Hypothesis 1 verification test`` () =
    // functionUnderTest, input, and expectedValue are placeholders
    let result = functionUnderTest input
    result |> should equal expectedValue
# Try different configurations
./build.sh -c Debug
./build.sh -c Release
# Compare outputs
diff debug-output.log release-output.log
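The two log files have to be captured before they can be compared. The sketch below is one possible way to do that in a single step: `compare_runs` is a hypothetical helper that runs two command lines, captures stdout and stderr from each, and diffs the results. Real build logs usually contain timestamps and paths that differ between runs, so expect some noise in the diff.

```shell
# Run two command lines, capture stdout and stderr of each, and diff the logs.
# compare_runs is a hypothetical helper; each argument is a full command line.
compare_runs() {
    cmp_log_a=$(mktemp)
    cmp_log_b=$(mktemp)
    # A failing command must not stop the comparison, so its status is ignored.
    sh -c "$1" > "$cmp_log_a" 2>&1 || true
    sh -c "$2" > "$cmp_log_b" 2>&1 || true
    cmp_status=0
    diff "$cmp_log_a" "$cmp_log_b" || cmp_status=$?
    rm -f "$cmp_log_a" "$cmp_log_b"
    # 0 when the outputs are identical, 1 when they differ.
    return "$cmp_status"
}
```

For the configurations above this could be invoked as `compare_runs './build.sh -c Debug' './build.sh -c Release'`.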
# Enable verbose logging for specific component
export FSHARP_COMPILER_VERBOSE=1
dotnet build
Maintain a HYPOTHESIS.md file in the working directory:
# Hypothesis Investigation
## Issue Summary
Brief description of the failure/bug being investigated.
## Minimal Reproduction
[Code/commands to reproduce]
## Hypotheses
### Hypothesis 1: Token position tracking issue
**Theory**: The warning check compares line numbers but lastNonCommentTokenLine is not being updated correctly.
**How to verify**: Add printfn debugging in LexFilter.fs to log every token and its line number.
**Verification result**: ✅ CONFIRMED - Logging showed LBRACE tokens were updating the tracking when they shouldn't.
**Implications**: Need to exclude LBRACE and potentially other structural tokens from tracking.
### Hypothesis 2: Lexer pattern matching order
**Theory**: The /// pattern might be matched after other patterns, losing context.
**How to verify**: Check lex.fsl pattern order and add logging in the /// rule.
**Verification result**: ❌ DENIED - Pattern order is correct; /// is matched specifically.
**Implications**: Issue is not in the lexer pattern matching.
### Hypothesis 3: Test expectations wrong
**Theory**: The test expectations might not match actual compiler behavior.
**How to verify**: Manually compile test code and check actual warning positions.
**Verification result**: ⚠️ PARTIAL - Some tests had wrong expectations, but underlying issue still exists.
**Implications**: Fixed test expectations, but still need to address token tracking.
## Resolution
[Final solution and verification]
## Lessons Learned
- What worked well
- What to do differently next time
- Patterns to remember
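Before closing an investigation, it can be useful to confirm that no hypothesis still carries the template placeholder. This sketch assumes the exact `**Verification result**: [To be filled]` wording from the template; `pending_verifications` is a hypothetical helper name.

```shell
# Print the line numbers of hypotheses whose verification result is still the placeholder.
# Exit status is 0 when at least one placeholder remains, non-zero when none do.
pending_verifications() {
    grep -nF '**Verification result**: [To be filled]' "$1"
}
```

For example, `pending_verifications HYPOTHESIS.md && echo "Some hypotheses are still unverified"`.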
ABSOLUTELY REQUIRED: After implementing any fix:
Build from scratch:
./build.sh -c Release
# Record: Time, exit code, number of errors
Run affected tests:
# For targeted testing
dotnet test -- --filter-class "*AffectedTestSuite*"
# Record: Passed, Failed, Skipped, Time
Verify the fix:
Document results:
## Verification Results
Build:
- Command: ./build.sh -c Release
- Time: 4m 23s
- Errors: 0
Tests:
- Command: dotnet test -- --filter-class "*XmlDocTests*"
- Total: 61
- Passed: 56
- Failed: 0
- Skipped: 5
- Time: 2.1s
Minimal Repro:
- Status: ✅ PASSING
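Recording the command, exit code, and duration by hand is easy to forget, so a small wrapper can help. The following is a minimal sketch, not part of the repository: `record_run` and the results-file layout are assumptions made for illustration, and the timing is whole seconds from `date +%s`.

```shell
# Run a command, then append the command line, exit code, and elapsed seconds
# to a results file. record_run is a hypothetical helper name.
record_run() {
    rr_file=$1
    shift
    rr_start=$(date +%s)
    rr_status=0
    "$@" || rr_status=$?
    rr_end=$(date +%s)
    printf '%s\n' \
        "- Command: $*" \
        "- Exit code: $rr_status" \
        "- Time: $((rr_end - rr_start))s" >> "$rr_file"
    # Propagate the wrapped command's exit status to the caller.
    return "$rr_status"
}
```

Usage might look like `record_run verification.md ./build.sh -c Release`, followed by the same wrapper around the targeted `dotnet test` command.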
# 1. Observe failure
dotnet test -- --filter-class "*XmlDocTests*"
# Result: 15 tests failing
# 2. Create minimal repro
cat > test-case.fs <<EOF
type R = { /// field doc
    Field: int
}
EOF
dotnet fsc test-case.fs
# Observe: Warning FS3879 incorrectly triggered
# 3. Form hypotheses (in HYPOTHESIS.md)
# - H1: LBRACE token incorrectly tracked
# - H2: Lexer pattern issue
# - H3: Test expectations wrong
# 4. Verify H1
# Add: printfn "DEBUG: Token %A at line %d" token lineNum
./build.sh -c Release && dotnet test ...
# Result: Confirms LBRACE is being tracked
# 5. Implement fix
# Exclude LBRACE from tracking in LexFilter.fs
# 6. CRITICAL: Re-run everything
./build.sh -c Release
# 4m 44.9s, 0 errors
dotnet test -- --filter-class "*XmlDocTests*"
# 61 total, 56 passed, 0 failed, 5 skipped, 2s
# 7. Verify minimal repro
dotnet fsc test-case.fs
# No warning - ✅ FIXED
# 8. Update HYPOTHESIS.md with results
# 9. Commit with evidence
❌ Don't:
✅ Do:
After using this skill:
HYPOTHESIS.md, docs/DEVGUIDE.md, docs/testing.md