// Code quality standards for PR review agents including try/except rules, logging patterns, type hints, and verification requirements. Use when reviewing code, spawning PR review agents, or validating code quality.
| name | pr-review-standards |
| description | Code quality standards for PR review agents including try/except rules, logging patterns, type hints, and verification requirements. Use when reviewing code, spawning PR review agents, or validating code quality. |
| allowed-tools | ["Read"] |
Purpose: Critical standards for code review agents to ensure consistent, evidence-based PR reviews with proper verification requirements.
Use when:
| Principle | Description | Why It Matters |
|---|---|---|
| NO Assumptions | If uncertain, flag as "needs_verification" | Prevents false positives and wasted effort |
| PROOF Required | Every finding needs evidence (code snippet, trace, test) | Enables actionable fixes, not vague suggestions |
| Read IN CONTEXT | Entire function, not just flagged line | Prevents missing validation/handling elsewhere |
| Trace Execution | Confirm issue actually occurs | Distinguishes ACTUAL bugs from POTENTIAL risks |
| Context-Aware | External APIs need higher scrutiny than internal utils | Focuses effort on real risk areas |
| Allowed For | NEVER Wrap | Evidence Pattern |
|---|---|---|
Network calls (requests.*, http.*) | dict.get() | Connection errors only |
Database ops (pymongo.*, sqlalchemy.*, redis.*) | json.loads() | DB connection failures |
| External APIs (cache, message queues) | File I/O (files auto-close) | API timeouts |
| Type conversions | ||
List operations (list[0], list.append()) | ||
| String operations |
Exception wrapping security: try/except can hide auth failures
try: verify_token() except: pass (silent auth bypass)if not verify_token(): return abort(401) (explicit)Location: templates/pr-review-code-analysis.md:69-73
| Requirement | Pattern | Anti-Pattern |
|---|---|---|
| Import | import logging; LOG = logging.getLogger(__name__) | import logging (root logger) |
| Usage | LOG.info('msg', extra={'key': val}) | print() or logging.info() |
| NO PII | extra={'user_id': id} | f"User {email}" |
| NO Secrets | Never log passwords, tokens, API keys | LOG.debug(f"API key: {key}") |
Location: templates/pr-review-code-analysis.md:20
| Standard | Requirement | Exception |
|---|---|---|
| Type hints | Required for new code | Legacy code not being modified |
| Line length | 80 chars max | Long strings, URLs (with # noqa + reason) |
# noqa | Must document reason | Never use without comment |
Location: templates/pr-review-code-analysis.md:21-22
| Check | Look For | Severity |
|---|---|---|
| Single-line wrappers | Functions that just call another function | Medium |
| Commented code | Code blocks in comments (should be deleted) | Medium |
| Error messages | Include context for debugging | High |
| Error handling | Errors logged with sufficient context | High |
Location: templates/pr-review-code-analysis.md:23-24
| Finding Type | Required Evidence | Example |
|---|---|---|
| Logic bug | Code snippet + reproduce scenario | average = total / count # count can be 0 if list empty |
| Security | Vulnerable code + exploitation scenario | query = f"SELECT * WHERE id={user_id}" + user_id="1 OR 1=1" |
| Performance | Before/after + quantified impact | 1000 users = 1000 queries (was 1 batch) |
| Breaking change | Changed code + affected callers | def func(a, b): → def func(a, b, c): + caller list |
Location: templates/pr-review-code-analysis.md:26-31, pr-review-security.md:28-32
Location: All templates have variations, consolidated here
critical: WILL break production (with proof)
TypeError with inputs, SQL injection with payload, crash on empty listhigh: WILL cause user-visible problems (with proof)
medium: Should fix, increases bug risk
low: Nice to have, not urgent
DO NOT ASSIGN SEVERITY (don't flag at all):
THE THRESHOLD: Can you write a failing test? If NO, don't flag it.
| Severity | Code Analysis | Security | Performance | Tests |
|---|---|---|---|---|
| Critical | Crashes, data corruption, breaking changes | Exploitable vulnerabilities (injection, auth bypass, secrets) | 100x+ regression (O(n²), GB RAM) | Critical functions untested (payment, auth) |
| High | Logic bugs with user impact, missing error handling | Security weaknesses (missing validation, PII in logs) | 10x+ regression (N+1 queries, removed caching) | Modified functions tests not updated, incorrect tests |
| Medium | Code quality issues, improper try/except, missing edge cases | Defense-in-depth issues (missing size limits, weak crypto) | 2-5x regression (suboptimal algorithm) | Missing edge cases, test organization issues |
| Low | Style issues, optimization opportunities | Best practice violations (not immediate risk) | <2x regression, micro-optimizations | Minor improvements (better names, fixtures) |
| Uncertain | Can't confirm without human review | Can't confirm exploit without context | Can't measure without profiling | Test looks wrong but can't confirm |
Location: templates/pr-review-code-analysis.md:178-184, pr-review-security.md:263-269, pr-review-performance.md:201-207, pr-review-tests.md:231-237
Use "uncertain" severity when:
DON'T use "uncertain" for:
Location: All templates in needs_verification sections
| Level | Applies To | Additional Checks | Skip Checks |
|---|---|---|---|
| HIGH | Public APIs, payment logic, PII handling, auth/authz | Rate limiting, CSRF, input size limits, output encoding, SQL parameterization, audit logging | None |
| MEDIUM | Internal APIs, backend services, no direct user input | Input validation (less paranoid), SQL parameterization, no secrets in logs | Rate limiting, CSRF, output encoding |
| LOW | Pure utilities, data transformations, no I/O | Logic correctness, edge cases | Most security checks (no attack surface) |
Detection patterns:
@app.route, @api.route, FastAPI decoratorsstripe, payment, transaction, billingemail, ssn, phone, address in user modelsinternal, rpc, service-to-serviceLocation: templates/pr-review-security.md:34-79
| Standard | Requirement | Location |
|---|---|---|
| 1:1 mapping | One test file per production file | tests/unit/test_<module>.py for src/<module>.py |
| Coverage | 95%+ line, 100% function | Unit tests |
| Pattern | AAA (Arrange-Act-Assert) | All tests |
| Mocking | Mock EVERYTHING external + OTHER INTERNAL FUNCTIONS | Use @patch() |
| Parametrization | Use @pytest.mark.parametrize for multiple cases | When >3 similar test cases |
| Integration | 2-4 files per module (ADD to existing) | Don't create new files |
Location: templates/pr-review-tests.md:16-27, TESTING_STANDARDS.md (referenced)
| Always Mock | NEVER Mock |
|---|---|
Database calls (db.*, cursor.*) | Function being tested |
API calls (requests.*, external services) | Pure functions (no side effects) |
File I/O (open(), write()) | Constants/config |
Cache operations (redis.*, memcache.*) | |
| Other internal functions called by function under test |
WHY: Test function in ISOLATION, not with dependencies
Location: templates/pr-review-tests.md:22-24
| Rule | Why | Applies To |
|---|---|---|
| Every finding MUST have file:line reference | No vague claims | All reviews |
| Every finding MUST have evidence | Actionable fixes | All reviews |
| Be specific and actionable | Exact fix, not "fix this" | All reviews |
| If uncertain, flag as needs_verification | Prevent false positives | All reviews |
| Context matters - read entire function | Catch validation elsewhere | All reviews |
| Check ALL changed files | Don't skip tests/config | All reviews |
Location: templates/pr-review-code-analysis.md:186-194
Progressive investigation:
If >10 issues in one file: Flag for architectural discussion (not nitpicking)
Location: templates/pr-review-code-analysis.md:196-207
Required fields:
status: "COMPLETE"[category]_issues: Array of findingsneeds_verification: Array of uncertain findingspositive_findings: Array of good patterns foundEach finding MUST include:
file: File pathline: Line number (or range)issue/vulnerability/problem: What's wrongevidence: Code snippet showing issueseverity: critical/high/medium/low/uncertainfix: Specific actionable solutionOptional but recommended:
impact: What happens if not fixedbefore: Previous code (for comparisons)after: Current codeLocation: All templates include JSON response format sections
| Category | Cases to Test | Example |
|---|---|---|
| None/Null | Does function handle None gracefully? | if value is None: return default |
| Empty | Empty list, dict, string | if not items: return [] |
| Zero | Numeric 0 (division by zero) | if count == 0: return 0.0 |
| Negative | Negative numbers where positive expected | if amount < 0: raise ValueError |
| Large | Very large numbers, long strings | Overflow, memory issues |
| Unicode | Non-ASCII characters, emojis | "test 🎉".encode('utf-8') |
| Duplicates | Duplicate items in list | Deduplication logic |
| Boundary | Min/max values (INT_MAX, 1-item list) | Off-by-one errors |
Location: templates/pr-review-tests.md:283-293
| Pattern | Evidence | Impact Calculation |
|---|---|---|
| N+1 queries | DB query inside loop | N items × 1 query = N queries (should be 1) |
| Algorithm regression | Nested loops replacing single pass | O(n) → O(n²): 1000 items = 1M ops |
| Removed caching | @lru_cache deleted | 1 calc + cache → N calcs (1000x) |
| No batching | for item: db.insert(item) | 100 items = 100 round-trips (should be 1) |
| Memory bloat | list(db.find()) on large collection | 1M records × 1KB = 1GB RAM |
Location: templates/pr-review-performance.md:40-67, 219-247
Always quantify performance impact:
Location: templates/pr-review-performance.md:210-217
| Verdict | When to Use | Evidence Required |
|---|---|---|
| CONFIRMED | Issue is real | Additional proof, execution trace |
| CONFIRMED + DOWNGRADED | Issue real but less severe | Mitigating factors found |
| FALSE_POSITIVE | Issue doesn't exist | Code preventing issue, explanation why |
| UNCERTAIN | Can't determine | What's needed to resolve, why uncertain |
Location: templates/pr-review-verification.md:17-24
1. Read code in full context
↓
2. Can you find evidence issue exists?
YES → Continue to 3
NO → Continue to 6
↓
3. Trace execution - can issue actually be triggered?
YES → CONFIRMED
NO → Continue to 4
↓
4. Is there code preventing issue that agent missed?
YES → FALSE_POSITIVE
NO → Continue to 5
↓
5. Unclear if issue can be triggered?
YES → UNCERTAIN
↓
6. Can you prove issue doesn't exist?
YES → FALSE_POSITIVE
NO → UNCERTAIN
Location: templates/pr-review-verification.md:232-256
| Pattern | What Agent Missed | Resolution |
|---|---|---|
| Validation in caller | Agent checked function, not caller | FALSE_POSITIVE |
| Error handling | try/except or guard clause | FALSE_POSITIVE |
| Type checking | Type hints guarantee non-None | FALSE_POSITIVE |
| Business logic | "Missing auth" is intentional (public endpoint) | FALSE_POSITIVE |
Location: templates/pr-review-verification.md:258-304
Include these standards in agent prompt:
Code Analysis Agent:
LOG = logging.getLogger(__name__)# noqa needs reasonSecurity Agent:
Performance Agent:
Test Agent:
Verification Agent:
Relationship: pr-review-standards provides the WHAT (standards to check), agent-prompting provides the HOW (how to write prompts).
When to load both:
Typical usage:
Can you prove the issue exists with concrete evidence? If not, flag as "needs_verification".
Evidence-based reviews that produce actionable findings, minimize false positives, and respect context.
Remember: Better to flag as uncertain than to claim something is wrong without proof.