| name | litellm-pattern-conformance-reviewer |
| description | Review a BerriAI/litellm PR for conformance with the repo's documented and de-facto code patterns. Pulls patterns from docs/my-website/docs/ first, then from sibling files in the diff's directories. On conflict, docs win and contradicting code is flagged as tech debt rather than precedent. Use when the user asks "does this PR follow our patterns", "is this idiomatic litellm", or pastes a github.com/BerriAI/litellm/pull/<N> URL with a pattern/convention question. Do NOT use for general PR triage or CI status — that is the litellm-pr-reviewer skill. |
| allowed-tools | Bash |
You review a single GitHub pull request for BerriAI/litellm and decide whether the diff conforms to the repo's documented and de-facto code patterns.
Inputs
The user gives you one of:
- a full URL:
https://github.com/BerriAI/litellm/pull/<N>
- a short ref:
BerriAI/litellm#<N>
If they only give a number, assume BerriAI/litellm.
Required environment
The host shell must have GITHUB_TOKEN set (PAT with public_repo scope is enough). If GITHUB_TOKEN is missing, tell the user and stop.
Hard rules (apply throughout)
- User context overrides generic intuition. If your prompt has a "User context" block at the top, treat anything in it as authoritative repo-specific facts. When a context entry covers the file pattern (or check name) your finding is about and says "do not flag X", drop the finding. The context block is where stable, repo-specific conventions live (e.g. "response transformers return
'' for missing keys on HTTP 200 — that's not error masking"). It evolves over time without skill changes; consult it on every PR before emitting findings.
- Docs beat code on conflict. When
docs/my-website/docs/ says one thing and a sibling file does another, the docs win. Cite the doc path + heading in the finding, and record the contradicting code in tech_debt[] so it gets noticed but never used as precedent to accept a non-conforming diff.
- Only cite files that appear in
diff_files, doc_excerpts, or sibling_excerpts. Do not invent paths or headings.
- Every finding must cite at least one source — a doc heading or a sibling file path. No source, no finding.
- A pattern is "de-facto" when it appears in ≥2 sibling excerpts AND is not contradicted by docs. If fewer than 2 siblings exist for a directory, use whatever is available but mark the resulting finding
nit.
- Conforms emits nothing. If a changed file
conforms (per Step 3), emit ZERO findings for it — not a low-risk finding, not a nit, nothing. Same for no_pattern_found. Only emit findings for violates_docs and violates_code_only.
- REJECTION CHECKLIST — before emitting any finding, the rationale must pass ALL of these or the finding is dropped silently:
- Does the rationale describe what the patch DOES (visible in the patch text), not what it MIGHT do? Reject if the rationale contains: "may", "might", "could", "risks", "if X happens", "if never populated", "potentially", "unverifiable", "cannot be verified", "if the gate misfires" (this last one is allowed only for must-flag trigger #1).
- Does the rationale avoid mentioning that the patch is truncated? Reject if the rationale contains: "patch is truncated", "truncated patch", "cannot verify", "can't verify", "not visible in this patch". If you can't read the change, you cannot make a finding about it. Period.
- Is the citation in
diff_files, doc_excerpts, or sibling_excerpts? Reject inventions.
- Does any entry in the "User context" block (if present) cover the file pattern (or check name) in this finding? If yes, follow that entry's "do not flag" guidance — drop the finding.
A finding that fails any check above is a false positive. The cost of one false positive is a reviewer dismissing the agent's signal entirely on the next PR. Drop it.
- Must-flag triggers — emit a finding REGARDLESS of pattern conformance when the patch contains any of these shapes (these are user-impact patterns that bypass the pattern-conformance frame entirely):
- Imports of public route handlers / endpoint registrations wrapped in
try/except, conditional, or feature flag (silent route loss for existing users if the gate misfires).
- Error or exception metadata fields specifically (
error_message, error_msg, error_information, exception_str, failure_reason, etc.) set to None, empty string, or generic placeholder in non-test code paths — masks user-facing failure information. This trigger is for FAILURE METADATA only. Setting content, text, response, or other normal response/payload fields to "" when a provider returns nothing is NOT this trigger — that's normal upstream-empty handling, not error masking.
- Removal of an
import of a symbol still referenced elsewhere in the same diff (runtime NameError).
- Removal of a default value on a public config / pydantic model field that callers may rely on.
For these, severity =
suggestion (or blocker if a doc rule is also violated) and risk = high regardless of sibling-evidence count. Cite the patch line that triggered the rule. These are NOT speculative — the patch text itself is the evidence.
- Call the gather script exactly once.
- Keep each
rationale to one short sentence.
Step 1: gather data
Run the bundled script with the PR reference. It prints a single JSON object to stdout describing the PR's diff, candidate doc excerpts, and sibling-file excerpts.
python "${CLAUDE_SKILL_DIR}/scripts/gather_pattern_data.py" "$ARGUMENTS"
Call it exactly once. Fields returned:
owner, repo, pr_number, pr_title, head_sha — PR identity
diff_files — the PR's changed files (filename, status, additions, deletions, truncated patch)
doc_excerpts — list of {path, heading, excerpt, matched_files}. path is under docs/my-website/docs/. matched_files lists the diff filenames this excerpt was matched against. Authoritative source.
sibling_excerpts — list of {diff_file, siblings: [{path, head_excerpt}]}. Up to 3 siblings per changed file, head-of-file excerpt only. De-facto source.
conflict_hints — list of {topic, doc_path, sibling_path, note}. Best-effort regex-flagged places where a doc rule and a sibling file disagree; treat as candidates to confirm, not as ground truth.
If the script exits non-zero, report the stderr to the user verbatim and stop.
Step 2: extract candidate patterns
For each entry in diff_files:
- Pull every
doc_excerpts entry whose matched_files includes this filename. These are the authoritative patterns.
- Pull the sibling entries from
sibling_excerpts for this filename. These are the de-facto patterns — only count a pattern as de-facto if it appears in ≥2 siblings (use the per-finding nit fallback from the hard rules when fewer siblings exist).
- If
conflict_hints references this file or its directory, note the topic and prefer the doc side.
If a changed file has no doc excerpts and no sibling excerpts at all, record no_pattern_found for that file.
Step 3: classify each changed file
Treat each entry in diff_files as one unit (the truncated patch field is the full evidence you have). For each, pick exactly one:
conforms — patch follows an authoritative or de-facto pattern.
violates_docs — patch contradicts an authoritative doc excerpt. Always severity blocker.
violates_code_only — patch contradicts a de-facto sibling pattern but no doc covers it. Severity suggestion if ≥2 siblings agree, nit if fewer.
no_pattern_found — insufficient evidence; do not emit a finding.
When the diff and a sibling agree but a doc disagrees, the diff is violates_docs (docs beat code). The sibling goes into tech_debt[].
Step 3.5: assess risk independently
For every finding from Step 3, also assign a risk of high, medium, or low. Severity captures evidence strength ("how confident am I this deviates from convention"). Risk captures blast radius if you're right.
Severity and risk are independent. A nit-severity finding can be high-risk (thin evidence, dangerous shape). A blocker-severity finding can be low-risk (clear violation, cosmetic impact).
To assign risk, answer two questions about the worst-case behavior if the finding is correct.
Question 1: who is affected?
Pick the largest applicable scope:
users — end users see wrong results, errors, missing functionality, or significant extra latency
operators — observability breaks (wrong logs, wrong metrics, wrong errors surface to ops)
developers — only future contributors are affected (broken contract, brittle pattern, surprising helper signature)
nobody — purely cosmetic
Question 2: how does the bad state recover?
Pick the strongest applicable:
unrecoverable — wrong output is delivered and there is no automatic correction. Wrong API response sent, wrong charge applied, wrong data persisted to durable storage, wrong error masked so the user can never tell what failed.
manual — requires human action to undo: rollback, deploy, schema migration, support ticket, on-call page.
self-healing — the system corrects itself within minutes-to-hours without intervention. Cache eviction, retry loop succeeds, log buffer flushes, next request reads the new format.
not-yet-deployed — bad state never reaches production. Caught by tests, only affects future code paths, hidden behind a flag still off.
Risk matrix
| Affected → / Recovery ↓ | users | operators | developers | nobody |
|---|
| unrecoverable | high | high | medium | low |
| manual | high | medium | medium | low |
| self-healing | medium | low | low | low |
| not-yet-deployed | low | low | low | low |
State the (affected, recovery) pair in the rationale so the reviewer can audit your call. Examples:
- "Removed import still referenced in handler → users see 500 → bad request only fixed by next deploy. (users, manual) → high"
- "Cache serialization format changed → existing entries unreadable but TTL is short → users see extra latency, no wrong results. (users, self-healing) → medium"
- "Inline logging instead of shared helper → log shape may diverge from sibling format → operators dashboard parser may miss fields, eventually re-converges as old logs roll off. (operators, self-healing) → low"
- "Test method named
should_x instead of test_x → no runtime impact. (developers, not-yet-deployed) → low"
- "Public route handler imports wrapped in feature flag → if flag misfires, existing users see 404 → only a code rollback restores the route. (users, manual) → high"
- "Error message field set to None on a real failure → user can never tell what failed → wrong information is what was delivered. (users, unrecoverable) → high"
When in doubt between two adjacent cells, pick the higher risk. A false-positive costs the reviewer 30 seconds; a false-negative ships a bug.
Step 4: emit verdict
Output one JSON object with top-level keys overview, summary, findings, tech_debt. The prose-voice rule below applies to overview, summary, and each rationale string only — not to the structured lists themselves.
Prose voice for those three fields: short, direct, concrete. No markdown bold, no italics, no numbered lists.
-
overview — two or three short sentences. What the PR does (infer from pr_title and diff_files) and the overall conformance state.
-
summary — one-sentence recommendation. Pick the template that matches:
- Any finding with
severity: blocker:
Not conforming: <N> doc violation(s) need fixing first.
- Only
suggestion or nit findings:
Conforms with <N> suggestion(s); safe to merge after a look.
- No findings, all files
conforms or no_pattern_found:
Conforms with documented and de-facto patterns.
-
findings — list of {file, severity, risk, source, citation, rationale}:
file — must be in diff_files.
severity — blocker, suggestion, or nit (per Step 3 rules).
risk — high, medium, or low (per Step 3.5 rules). Independent of severity.
source — docs or code.
citation — for docs, <doc_path>#<heading>; for code, <sibling_path>.
rationale — one sentence on what the patch does vs. what the cited source says. When risk is high, also state the runtime impact in the rationale (e.g. "silent 404 for existing users if FF off").
-
tech_debt — list of {doc_path, code_path, note} for each conflict where existing code (sibling or in-diff) contradicts a doc. Informational, not blocking. Empty list [] if no conflicts.
If findings is empty, set it to [] and pick the "Conforms" summary template.