| name | audit-sibling-divergence |
| description | Differential audit comparing matched code paths that should behave identically. Spawns one auditor per sibling pair (sync/async, bounded/unbounded, view consistency, bulk vs single, generated node variants, read fast vs slow) and requires a concrete witness scenario where the two paths diverge observably. |
| context | fork |
| disable-model-invocation | true |
| allowed-tools | Read, Grep, Glob, Bash, Agent, Write |
Audit: Sibling Divergence
The existing snapshot audits look at one code path and ask "is it correct?"
This audit looks at TWO paths that should produce the same observable result
and asks "do they actually agree?" Divergence between matched paths is a
confirmed historical bug pattern in Caffeine (refresh+expiration sync/async
asymmetry, weak/strong publication differences, fast-path/slow-path
disagreement under contention).
When to run
- After significant changes to BoundedLocalCache, LocalAsyncCache, or any
generator under
caffeine/src/javaPoet/java/.
- Before a major release as a defense against silent sync/async drift.
- Once per quarter as a baseline audit.
Heavyweight (4-6h of agent time). Token-intensive. Do not run for routine
pre-commit review (use /review-change for that).
Step 1: Inventory matched pairs
The priority matched pairs are listed below. Before launching, glance at the
codebase to confirm each pair is still present and add any new pairs (e.g.,
a new view, a new feature with sync+async variants):
Group A — sync vs async cache (highest historical bug yield)
- A1:
Cache.get(k, Function) vs AsyncCache.get(k, BiFunction) — load semantics, listener delivery, refresh hand-off
- A2:
LoadingCache.refresh(k) vs AsyncLoadingCache.refresh(k) — completion, exception, stale-detection
- A3:
Cache.asMap().compute* vs AsyncCache.asMap().compute* — atomicity, listener cause, weight delta
- A4:
Cache.invalidate* vs AsyncCache.synchronous().invalidate* — listener cause, in-flight value handling
Group B — storage variants
- B1: BoundedLocalCache vs UnboundedLocalCache for shared map operations
(most ops are inherited; check overridden ones and the inherited ones that
reference eviction-only fields)
- B2: Generated Node variants (
PS, FS, PW, FW, PSAW, PSWMS, etc.)
— for each method declared in Node.java that has multiple subclass
implementations, verify all subclasses use consistent access modes,
lifecycle transitions, and weight discipline
Group C — view consistency
- C1:
keySet().contains(k) vs containsKey(k) vs asMap().get(k) != null
- C2:
values().contains(v) vs containsValue(v)
- C3:
entrySet() iteration vs forEach() vs keySet() + get(k) per key
- C4:
size() vs entrySet().size() vs counting via iterator
Group D — bulk vs single-key
- D1:
getAllPresent(keys) vs N×getIfPresent(k)
- D2:
getAll(keys, loader) vs N×get(k, loader)
- D3:
putAll(m) vs N×put(k, v)
- D4:
invalidateAll(keys) vs N×invalidate(k)
- D5:
invalidateAll() vs invalidateAll(allKeys())
Group E — equivalent-by-construction
- E1:
LoadingCache.get(k) vs Cache.get(k, cacheLoader::load)
- E2:
getOrDefault(k, d) vs (get(k) == null ? d : get(k)) (ignoring atomicity)
- E3:
putIfAbsent(k, v) vs compute(k, (key, val) -> val == null ? v : val)
- E4:
AsyncCache.synchronous() view vs an equivalent sync Cache built with
the same configuration
Group F — internal paths to the same outcome
- F1: Read fast path (
getIfPresent optimistic) vs slow path (under
synchronized(node)) — same key, same logical time, must agree on
present-ness and value identity
- F2: Eviction listener (sync) vs Removal listener (async) for an evicted
entry — both should fire, with consistent key/value/cause and the same
set of entries
- F3: Maintenance task variants (
AddTask, UpdateTask, RemovalTask,
RemovedTask) — weight delta sign convention, telescoping sum
preservation across task orderings
If a new feature has a sync and async variant not listed above, add it as
group G before launching.
Step 2: Spawn parallel differential auditors
Launch one subagent per group (six groups → six parallel agents). Each
agent gets the prompt below, adapted to its group. Run them in a single
message so they execute in parallel.
You are auditing the Caffeine cache for sibling divergence: cases where two
code paths that should produce identical observable behavior do not.
YOUR GROUP: <group letter and pairs from the inventory>
The Caffeine source code is at:
- Core: caffeine/src/main/java/com/github/benmanes/caffeine/cache/
- Generated: caffeine/build/generated/sources/ (run `./gradlew :caffeine:generateNodes
:caffeine:generateLocalCaches` if empty)
- Generators: caffeine/src/javaPoet/java/com/github/benmanes/caffeine/cache/
# Phase 0: Plan
For each pair in your group:
- State the contract the two paths jointly promise (what does an observer
see when they call A vs B?).
- Predict the 2-3 most likely categories of divergence (different access
mode, different listener cause, different exception handling, ordering of
notifications, in-flight value visibility, weight accounting, etc.).
# Phase 1: Trace each side
For each pair:
1. Locate both implementations. Read each end to end (not just the diff).
2. Build a side-by-side table of the observable steps each path takes:
field reads/writes (with access mode), lock acquisitions, listener
invocations, exceptions thrown, return values.
3. Identify every step where the two paths differ. For each difference:
- Is the difference observable to a caller?
- Is it explained by an intentional design decision? (read
.claude/docs/design-decisions.md and .claude/rules/design-decisions.md
ONLY AFTER you have recorded the difference — design context causes
premature dismissal)
- If observable and unexplained, this is a candidate finding.
# Phase 2: Construct a witness
For each candidate finding, construct a CONCRETE WITNESS:
- Cache configuration (size, weigher, listener, expiry, ...)
- Exact sequence of method calls
- Expected observation if the paths agreed
- Actual observation given the divergence
- Strong enough that a developer could write a failing unit test from it
without further investigation.
A finding without a concrete witness is NOT acceptable — drop it.
# Phase 3: Self-challenge
For each finding, attempt to refute it by re-reading the source. If you
can construct a path through the code that resolves the divergence (e.g.,
the second path also fires the listener through a different route you
missed), drop the finding. Be ruthless — the user wants high-precision
findings, not volume.
# Phase 4: Output
For each surviving finding, output:
- PAIR: <pair label, e.g., A1>
- PATH-A: file:method
- PATH-B: file:method
- DIVERGENCE: <one-line description of what differs>
- WITNESS: <concrete scenario>
- OBSERVABLE: <what the caller sees that contradicts the joint contract>
- DESIGN-MATCH: <design-decisions.md item it partially matches, or "none">
- SEVERITY: critical | high | medium | low
- CONFIDENCE: high | medium
If a group has zero findings, output a coverage summary listing every pair
inspected, every method traced, and every difference dismissed (with the
reason for dismissal). Zero findings with thorough coverage is acceptable.
Zero findings with shallow coverage is not — keep looking.
DO NOT report:
- Performance differences (covered by /audit-performance)
- Style differences
- Comment/javadoc differences unless they constitute the contract drift
- Differences explicitly documented as intentional in design-decisions.md
(note them in the "explained" section instead)
Step 3: Evaluator challenge
For each agent that returned findings OR a zero-findings coverage proof,
spawn ONE evaluator subagent (general-purpose). The evaluator sees ONLY
the reviewer's report — no source code.
You are challenging a differential audit report. Your job is to find what
the auditor MISSED.
For each finding:
1. Is the witness scenario actually reproducible? Identify any unstated
precondition (specific config, timing, prior state) that the witness
does not enumerate but requires.
2. Is the divergence actually observable to a caller? Or is it an internal
difference that produces the same external result?
3. Is the auditor's "joint contract" the actual contract, or did they
assume a stronger contract than the documentation promises?
For each zero-findings claim:
4. Given the auditor's stated coverage, what categories of divergence
might they have under-weighted? (E.g., focused on synchronous control
flow, missed exception paths; focused on happy path, missed in-flight
transitions.)
Output: prioritized list of challenges. Be specific about which finding
and which gap.
Have the original reviewer address each challenge by re-reading source.
Drop findings the reviewer cannot defend with concrete evidence. Add new
findings the reviewer confirms.
Step 4: Adjudicate against design docs
Read .claude/docs/design-decisions.md and
.claude/rules/design-decisions.md. For each surviving finding, classify:
- confirmed-divergence — not explained by design docs; treat as a bug
- intentional-divergence — documented design decision (e.g., async
listener delivery is intentionally different from sync; weight=0 entries
are pinned across all paths); keep in the report under "explained"
- documentation-gap — the divergence is intentional but not documented
anywhere a fresh reader could find it; flag as a docs fix
Step 5: Triage and report
Triage confirmed findings by severity per .claude/docs/finding-taxonomy.md.
Tag each confirmed finding with the divergence axis:
- sync-async — the two paths represent the sync and async variants
- storage-variant — bounded vs unbounded or generated node variant
- view-consistency — view methods disagreeing with each other
- bulk-vs-single — aggregate operation diverging from per-element
- equivalent-by-construction — two APIs that should be interchangeable
- internal-path — fast vs slow path or maintenance task variants
Write the full report to .claude/reports/audit-sibling-divergence.md.
Format:
# Sibling Divergence Audit
[N] differential auditors compared [M] sibling pairs across [G] groups.
[K] findings survived self-challenge and evaluator challenge.
## Confirmed divergences (likely bugs)
#1 [severity] [axis] PAIR — one-line summary
- PATH-A: file:method
- PATH-B: file:method
- DIVERGENCE: ...
- WITNESS: ...
- OBSERVABLE: ...
## Intentional divergences (documented)
- [pair] — link to design doc that explains the difference
## Documentation gaps (intentional but undocumented)
- [pair] — what a fresh reader would conclude vs the actual intent
## Coverage summary
- Group A: [pairs inspected, methods traced, dismissals]
- Group B: ...
- ...
## Evaluator challenges
- [N] challenges received across [M] groups; [K] led to new findings;
[J] confirmed the original conclusion with additional evidence.
## Residual risk
What was not inspected and why.
Notes
- The auditor agent's 4-phase methodology applies here too, but per-pair
rather than per-method. Each subagent runs Phase 0 through Phase 3 for
its assigned pairs.
- Generator pairs (B2) are special: read both the generator and the
generated output. A divergence might be in the generator's emit logic
(one feature combination emits the wrong access mode) or in a missing
generator case (a feature combination emits no method at all).
- The synchronous() view of an async cache (E4) is the highest-yield pair
in group E because it goes through the async code path internally but
promises sync semantics. Historical bugs here include
synchronous()
exposing in-flight CompletableFuture state in unexpected ways.