---
name: hypothesis-driven-debugging
description: Investigate compiler failures, test errors, or unexpected behavior through systematic minimal reproduction, 3-hypothesis testing, and verification. Always re-run builds and tests after changes.
---
# Hypothesis-Driven Debugging

A systematic, rigorous approach to debugging failures in the F# compiler codebase.
## When to Use This Skill

Use this skill when:
- Investigating test failures (unit tests, integration tests, end-to-end tests)
- Debugging build errors or compilation failures
- Analyzing unexpected runtime behavior
- Troubleshooting performance regressions
- Examining warning/error message issues

Related: for a build, compile, or restore failure, run the `binlog-analysis` skill first. It fetches the build's MSBuild binary log and analyzes it live through the `binlog-mcp` MCP server (structured errors plus root-cause diagnosis), which is a fast way to scope the minimal reproduction below.

## Core Principles

- Always start with a minimal reproduction
- Form multiple competing hypotheses
- Design verification for each hypothesis
- Document findings rigorously
- Re-run builds and tests after every change

## Process

### Step 1: Create Minimal Reproduction

Before forming hypotheses, create the smallest possible reproduction:

- Extract the failure:

  ```bash
  dotnet test -- --filter-method "*YourTest*"
  ```

- Reduce to essentials:
  - Remove unrelated code
  - Simplify to the core issue
  - Verify the minimal case still fails

- Document the repro:

  ```markdown
  ## Minimal Reproduction
  File: test-case.fs
  Command: dotnet test -- --filter-method "*TestName*"
  Expected: <expected behavior>
  Actual: <actual behavior>
  ```
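
The "verify the minimal case still fails" check can be made mechanical, so that every reduction step is judged the same way. The following is a minimal bash sketch. `repro_still_fails` is a hypothetical helper and is not part of the repository's tooling. It defines "still fails" as "the output still matches PATTERN", because some bugs, such as a misplaced warning, do not change the exit code.

```bash
# repro_still_fails PATTERN CMD [ARGS...]
# Hypothetical helper: runs CMD, captures stdout+stderr, and succeeds (exit 0)
# only if the combined output still matches PATTERN (an extended regex).
# Run it after every reduction step to confirm the bug has not been "reduced away".
repro_still_fails() {
  local pattern="$1"
  shift
  local output
  # Ignore CMD's own exit status: a warning-level bug may not fail the command.
  output="$("$@" 2>&1)" || true
  if grep -Eq -- "$pattern" <<<"$output"; then
    echo "repro preserved: output matches /$pattern/"
    return 0
  else
    echo "repro LOST: output no longer matches /$pattern/" >&2
    return 1
  fi
}
```

For example, run `repro_still_fails 'warning FS[0-9]+' dotnet fsc test-case.fs` after each edit to `test-case.fs`. If it reports the repro as lost, undo the last reduction.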

### Step 2: Form 3 Hypotheses

Always form at least 3 competing hypotheses about the root cause:

```markdown
## Hypothesis 1: [Brief description]
**Theory**: The failure occurs because...
**How to verify**: Run/change X and observe Y
**Verification result**: [To be filled]
**Implications**: If true, this means...

## Hypothesis 2: [Brief description]
**Theory**: The failure occurs because...
**How to verify**: Add instrumentation/logging at point Z
**Verification result**: [To be filled]
**Implications**: If true, this means...

## Hypothesis 3: [Brief description]
**Theory**: The failure occurs because...
**How to verify**: Check assumption A by running test B
**Verification result**: [To be filled]
**Implications**: If true, this means...
```
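
Every hypothesis uses the same template, so the skeleton can be generated instead of copied by hand. The following is a sketch that assumes a bash shell. `scaffold_hypotheses` is a hypothetical helper. It refuses to emit fewer than three sections, which enforces the rule above.

```bash
# scaffold_hypotheses [COUNT]
# Hypothetical helper: prints COUNT (default 3, minimum 3) hypothesis sections
# in the template format above, ready to append to HYPOTHESIS.md.
scaffold_hypotheses() {
  local count="${1:-3}"
  # Reject non-numbers (and leading zeros) before the arithmetic comparison.
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]] || (( count < 3 )); then
    echo "need at least 3 competing hypotheses (got '$count')" >&2
    return 1
  fi
  local i
  for (( i = 1; i <= count; i++ )); do
    printf '## Hypothesis %d: [Brief description]\n' "$i"
    printf '**Theory**: The failure occurs because...\n'
    printf '**How to verify**: ...\n'
    printf '**Verification result**: [To be filled]\n'
    printf '**Implications**: If true, this means...\n\n'
  done
}
```

Typical use is `scaffold_hypotheses >> HYPOTHESIS.md`, after which you fill in each section.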

### Step 3: Verification Methods

For each hypothesis, use one or more verification methods:

#### Code Instrumentation

```fsharp
// Add temporary debugging output
printfn "DEBUG: Value at checkpoint: %A" someValue
printfn "DEBUG: Entering function X with args: %A %A" arg1 arg2
```
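
Instrumentation is most informative when you compare its output between a passing input and a failing input. The following is a minimal bash sketch that assumes the `DEBUG:` prefix used above. `debug_trace`, `passing-case.fs`, `pass.trace`, and `fail.trace` are hypothetical names.

```bash
# debug_trace CMD [ARGS...]
# Hypothetical helper: runs CMD and keeps only the lines produced by the
# temporary `printfn "DEBUG: ..."` instrumentation, so two runs can be diffed.
debug_trace() {
  # A failing run is expected while debugging, so CMD's exit status is ignored;
  # the trailing `|| true` keeps "no DEBUG lines" from counting as an error.
  { "$@" 2>&1 || true; } | grep -F 'DEBUG:' || true
}

# Example usage with the instrumented compiler (hypothetical file names):
#   debug_trace dotnet fsc passing-case.fs > pass.trace
#   debug_trace dotnet fsc test-case.fs    > fail.trace
#   diff pass.trace fail.trace
```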
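
#### Minimal Test Cases

```fsharp
// Create a focused test to verify specific behavior.
// functionUnderTest, input and expectedValue are placeholders.
// [<Test>] / `should equal` are NUnit/FsUnit style; match the suite's framework.
[<Test>]
let ``Hypothesis 1 verification test`` () =
    let result = functionUnderTest input
    result |> should equal expectedValue
```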
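
#### Build with Different Flags

```bash
# Capture each configuration's output so the two runs can be compared
./build.sh -c Debug > debug-output.log 2>&1
./build.sh -c Release > release-output.log 2>&1
diff debug-output.log release-output.log
```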

#### Targeted Logging

```bash
export FSHARP_COMPILER_VERBOSE=1
dotnet build
```

### Step 4: Document Findings

Maintain a `HYPOTHESIS.md` file in the working directory:

```markdown
# Hypothesis Investigation

## Issue Summary
Brief description of the failure/bug being investigated.

## Minimal Reproduction
[Code/commands to reproduce]

## Hypotheses

### Hypothesis 1: Token position tracking issue
**Theory**: The warning check compares line numbers, but `lastNonCommentTokenLine` is not being updated correctly.
**How to verify**: Add `printfn` debugging in `LexFilter.fs` to log every token and its line number.
**Verification result**: ✅ CONFIRMED - Logging showed `LBRACE` tokens were updating the tracking when they shouldn't.
**Implications**: Need to exclude `LBRACE` and potentially other structural tokens from tracking.

### Hypothesis 2: Lexer pattern matching order
**Theory**: The `///` pattern might be matched after other patterns, losing context.
**How to verify**: Check the pattern order in `lex.fsl` and add logging in the `///` rule.
**Verification result**: ❌ DENIED - Pattern order is correct; `///` is matched specifically.
**Implications**: The issue is not in the lexer's pattern matching.

### Hypothesis 3: Test expectations are wrong
**Theory**: The test expectations might not match actual compiler behavior.
**How to verify**: Manually compile the test code and check the actual warning positions.
**Verification result**: ⚠️ PARTIAL - Some tests had wrong expectations, but the underlying issue still exists.
**Implications**: Fixed the test expectations, but the token tracking still needs to be addressed.

## Resolution
[Final solution and verification]

## Lessons Learned
- What worked well
- What to do differently next time
- Patterns to remember
```
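
Before writing the Resolution section, it helps to see at a glance which hypotheses are still open. The following is a minimal bash sketch that assumes the heading and `**Verification result**:` line format shown above. `hypothesis_status` is a hypothetical helper that matches on the words CONFIRMED, DENIED, and PARTIAL rather than on the emoji.

```bash
# hypothesis_status [FILE]
# Hypothetical helper: tallies the hypotheses in FILE (default HYPOTHESIS.md)
# by the status words used above. Returns 1 while any result is still
# "[To be filled]", so it can gate writing the Resolution section.
hypothesis_status() {
  local file="${1:-HYPOTHESIS.md}"
  if [[ ! -f "$file" ]]; then
    echo "no such file: $file" >&2
    return 2
  fi
  local total confirmed denied partial pending
  # grep -c prints 0 but exits 1 when nothing matches; `|| true` keeps the 0.
  total=$(grep -cE '^#{2,3} Hypothesis [0-9]' "$file" || true)
  confirmed=$(grep -cE '^\*\*Verification result\*\*:.*CONFIRMED' "$file" || true)
  denied=$(grep -cE '^\*\*Verification result\*\*:.*DENIED' "$file" || true)
  partial=$(grep -cE '^\*\*Verification result\*\*:.*PARTIAL' "$file" || true)
  pending=$(grep -cE '^\*\*Verification result\*\*: *\[To be filled\]' "$file" || true)
  printf 'total=%d confirmed=%d denied=%d partial=%d pending=%d\n' \
    "$total" "$confirmed" "$denied" "$partial" "$pending"
  # Non-zero exit while any hypothesis is unverified.
  (( pending == 0 ))
}
```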

### Step 5: Critical - Always Re-run Tests

**ABSOLUTELY REQUIRED**: After implementing any fix:

- Build from scratch:

  ```bash
  ./build.sh -c Release
  ```

- Run affected tests:

  ```bash
  dotnet test -- --filter-class "*AffectedTestSuite*"
  ```

- Verify the fix:
  - Run the minimal reproduction - confirm it passes
  - Run related tests - confirm no regressions
  - Build the full project - confirm no new errors

- Document results:

  ```markdown
  ## Verification Results
  Build:
  - Command: ./build.sh -c Release
  - Time: 4m 23s
  - Errors: 0

  Tests:
  - Command: dotnet test -- --filter-class "*XmlDocTests*"
  - Total: 61
  - Passed: 56
  - Failed: 0
  - Skipped: 5
  - Time: 2.1s

  Minimal Repro:
  - Status: ✅ PASSING
  ```
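
Recording the command and its timing at the moment it runs is easier than reconstructing them later. The following is a minimal bash sketch. `run_and_record` and the log file name are hypothetical. It records the exit status instead of parsing error or test counts, which differ between build and test output.

```bash
# run_and_record LOGFILE CMD [ARGS...]
# Hypothetical helper: runs CMD, then appends its command line, wall-clock
# time, and exit status to LOGFILE in the "Verification Results" style above.
# Returns CMD's exit status so it can still gate a script.
run_and_record() {
  local log="$1"
  shift
  local start end status=0
  start=$(date +%s)
  "$@" || status=$?
  end=$(date +%s)
  local elapsed=$(( end - start ))
  {
    printf -- '- Command: %s\n' "$*"
    printf -- '- Time: %dm %ds\n' $(( elapsed / 60 )) $(( elapsed % 60 ))
    printf -- '- Exit status: %d\n' "$status"
  } >> "$log"
  return "$status"
}
```

For example, `run_and_record verification.log ./build.sh -c Release` appends the build entry, and the test run can be logged the same way.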
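
## Example Workflow

```bash
# 1. Reproduce the failure
dotnet test -- --filter-class "*XmlDocTests*"

# 2. Create a minimal reproduction
cat > test-case.fs <<'EOF'
type R = { /// field doc
           Field: int
         }
EOF

# 3. Confirm the minimal repro shows the problem
dotnet fsc test-case.fs

# 4. Form hypotheses in HYPOTHESIS.md, then verify each one:
#    ./build.sh -c Release && dotnet test ...

# 5. After the fix: rebuild, re-run the tests, and re-check the repro
./build.sh -c Release
dotnet test -- --filter-class "*XmlDocTests*"
dotnet fsc test-case.fs
```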

## Anti-Patterns to Avoid

❌ Don't:
- Skip the minimal reproduction
- Form only one hypothesis
- Make changes without verification
- Forget to re-run tests after fixes
- Claim "fixed" without build evidence

✅ Do:
- Start with the smallest possible repro
- Consider multiple explanations
- Verify each hypothesis systematically
- Always re-run build and tests
- Document commands, timings, and results

## Integration with Development Workflow

After using this skill:
- Clean up temporary debugging code
- Remove or archive `HYPOTHESIS.md`
- Update documentation with lessons learned
- Add regression tests if appropriate
- Consider whether findings reveal deeper issues
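
Leftover instrumentation is easy to miss during cleanup. The following is a minimal bash sketch that assumes the `printfn "DEBUG:` convention from Step 3. `find_debug_leftovers` is a hypothetical helper that scans F# sources, including lexer and parser specs such as `lex.fsl`.

```bash
# find_debug_leftovers [DIR]
# Hypothetical helper: lists source lines under DIR (default .) that still
# contain the temporary `printfn "DEBUG: ...` instrumentation from Step 3.
# Returns 1 if any remain, 0 if clean, and grep's status (2) on errors.
find_debug_leftovers() {
  local dir="${1:-.}"
  local status=0
  grep -rnF --include='*.fs' --include='*.fsi' --include='*.fsl' --include='*.fsy' \
    'printfn "DEBUG:' "$dir" || status=$?
  case "$status" in
    0) echo "temporary debug output still present" >&2; return 1 ;;
    1) return 0 ;;          # no matches: clean
    *) return "$status" ;;  # grep error, e.g. missing directory
  esac
}
```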

## References