Performs multi-agent, multi-model code review of F# compiler PRs across 19 dimensions including type checking, IL emission, binary compatibility, and IDE performance. Dispatches parallel assessment agents per dimension, consolidates with cross-model agreement scoring, and filters false positives. Invoke when reviewing compiler changes, requesting expert feedback, or performing pre-merge quality checks.
Reviewing Compiler PRs
Full dimension definitions and CHECK rules live in the expert-reviewer agent.
When to Invoke
PR touches src/Compiler/ — invoke the expert-reviewer agent
PR touches src/FSharp.Core/ — focus on FSharp.Core Stability, API Surface, Backward Compat, XML Docs
PR touches vsintegration/ or LanguageServer/ — focus on IDE Responsiveness, Concurrency, Memory
| Files changed | Focus dimensions |
|---|---|
| src/FSharp.Core/ | FSharp.Core Stability, Backward Compat, XML Docs, RFC Process |
| vsintegration/ | IDE Responsiveness, Memory Footprint, Cross-Platform |
| eng/, setup/, build scripts | Build Infrastructure, Cross-Platform |
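The path-based routing above can be sketched as a small lookup. This is only an illustration, not the expert-reviewer's actual logic: the `RULES` list and the `select_dimensions` helper are hypothetical names, the dimension lists are copied from the "When to Invoke" entries, and matching a directory name anywhere in the path is an assumption.

```python
# Hypothetical routing rules: (directory markers, dimensions to focus on).
# Dimension names are taken from the text above; the matching strategy is assumed.
RULES = [
    (("src/FSharp.Core/",),
     ["FSharp.Core Stability", "API Surface", "Backward Compat", "XML Docs"]),
    (("vsintegration/", "LanguageServer/"),
     ["IDE Responsiveness", "Concurrency", "Memory"]),
    (("eng/", "setup/"),
     ["Build Infrastructure", "Cross-Platform"]),
]


def select_dimensions(changed_paths):
    """Return the focus dimensions for a PR, in rule order, without duplicates."""
    selected = []
    for markers, dimensions in RULES:
        # Prefix each side with "/" so a marker only matches whole directory names.
        touched = any(
            "/" + marker in "/" + path
            for path in changed_paths
            for marker in markers
        )
        if touched:
            for dimension in dimensions:
                if dimension not in selected:
                    selected.append(dimension)
    return selected
```

A PR that touches none of the listed directories yields an empty list; in that case this sketch makes no selection, and the reviewer would fall back to whatever default the expert-reviewer agent defines.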
Subagent Dispatch
For each selected dimension from the table above, the expert-reviewer agent MUST launch an independent subagent (background task) to assess that dimension. This is not optional — a single agent doing all dimensions sequentially produces shallow analysis and wall-of-text summaries.
Each subagent receives:
The dimension's CHECK rules (from expert-reviewer.md)
The relevant file diffs (filtered by the dimension's hotspot paths)
Instructions to produce a structured finding: {file, line, severity, dimension, issue, suggestion} or LGTM if no findings
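As a rough sketch of the dispatch contract, the finding shape and the one-subagent-per-dimension fan-out might look like the following. The `Finding` fields mirror the structure listed above; the `dispatch` function, the `assess` callable standing in for a subagent, and the use of a thread pool are assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Fields mirror {file, line, severity, dimension, issue, suggestion}.
    file: str
    line: int
    severity: str
    dimension: str
    issue: str
    suggestion: str


def dispatch(dimensions, check_rules, diffs, assess):
    """Run one independent assessment per dimension, in parallel.

    `assess(dimension, rules, diff)` is a hypothetical stand-in for a subagent;
    it returns a list of Finding objects (empty when nothing is wrong).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(dimensions))) as pool:
        futures = {
            dim: pool.submit(assess, dim, check_rules[dim], diffs[dim])
            for dim in dimensions
        }
        # An empty result is reported as the literal "LGTM".
        return {dim: (future.result() or "LGTM") for dim, future in futures.items()}
```

Each call to `assess` sees only its own dimension's rules and filtered diff, which is the isolation the dispatch rule asks for.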
The expert-reviewer consolidates subagent results, deduplicates them, applies the assessment gates below, and posts the surviving findings as inline comments per Wave 5.
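One possible way to picture consolidation with cross-model agreement is to merge findings that point at the same location and count how many models reported each. The `consolidate` helper, the `(file, line, dimension)` dedup key, and the `agreement` field are all assumptions for this sketch; the actual scoring lives in the expert-reviewer agent.

```python
from collections import defaultdict


def consolidate(results_by_model):
    """Merge findings from several models and score them by agreement.

    `results_by_model` maps a model name to a list of finding dicts with at
    least "file", "line", and "dimension" keys (hypothetical input shape).
    """
    grouped = defaultdict(lambda: {"finding": None, "models": set()})
    for model, findings in results_by_model.items():
        for finding in findings:
            # Assumed dedup key: same file, line, and dimension is one finding.
            key = (finding["file"], finding["line"], finding["dimension"])
            entry = grouped[key]
            if entry["finding"] is None:
                entry["finding"] = finding
            entry["models"].add(model)

    merged = [
        dict(entry["finding"], agreement=len(entry["models"]))
        for entry in grouped.values()
    ]
    # Findings confirmed by more models come first.
    return sorted(merged, key=lambda f: (-f["agreement"], f["file"], f["line"]))
```

Sorting by agreement puts findings confirmed by several models ahead of single-model reports, which are the likelier false positives.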
Assessment gates — apply before flagging:
Understand execution context before judging (test harness ≠ compiler runtime)
Classify as regression, improvement, or unclear — only regressions are findings
Require a concrete failing scenario — no hypotheticals
Code that follows the correct convention for the context in use is not a finding; discard it
"Unexplained" ≠ "wrong" — missing rationale in a commit message is a doc gap, not a defect
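The gates above could be applied as a simple filter over candidate findings. This is a minimal sketch under stated assumptions: the `Candidate` record and its field names are hypothetical, and the first gate (understanding the execution context) is a judgment the reviewer makes before filling in these fields, so it has no check of its own here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    issue: str
    classification: str  # "regression", "improvement", or "unclear"
    failing_scenario: Optional[str] = None    # concrete repro, if one exists
    follows_context_convention: bool = False  # correct for the context in use
    only_missing_rationale: bool = False      # unexplained, but not shown wrong


def passes_gates(candidate):
    """Return True only if the candidate survives every assessment gate."""
    if candidate.classification != "regression":
        return False  # improvements and unclear changes are not findings
    if not candidate.failing_scenario:
        return False  # no hypotheticals
    if candidate.follows_context_convention:
        return False  # correct convention for its context
    if candidate.only_missing_rationale:
        return False  # a doc gap, not a defect
    return True
```

For example, a candidate classified as "unclear" is dropped even when it comes with a plausible scenario, because only regressions count as findings.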