| name | backend-code-review |
| description | Review backend code for quality, security, maintainability, and best practices based on established checklist rules. Use when the user requests a review, analysis, or improvement of backend files (e.g., `.py`) under the `src/backend/` directory. Do NOT use for frontend files (e.g., `.tsx`, `.ts`, `.js`). Supports pending-change review, code snippets review, and file-focused review. |
Backend Code Review
When to use this skill
Use this skill whenever the user asks to review, analyze, or improve backend code (e.g., .py) under the src/backend/ directory. Supports the following review modes:
- Pending-change review: when the user asks to review current changes (inspect staged/working-tree files slated for commit to get the changes).
- Code snippets review: when the user pastes code snippets (e.g., a function/class/module excerpt) into the chat and asks for a review.
- File-focused review: when the user points to specific files and asks for a review of those files (one file or a small, explicit set of files, e.g.,
src/backend/base/langflow/api/v1/flows.py).
Do NOT use this skill when:
- The request is about frontend code or UI (e.g.,
.tsx, .ts, .js, src/frontend/).
- The user is not asking for a review/analysis/improvement of backend code.
- The scope is not under
src/backend/ (unless the user explicitly asks to review backend-related changes outside src/backend/).
How to use this skill
Follow these steps when using this skill:
- Identify the review mode (pending-change vs snippet vs file-focused) based on the user's input. Keep the scope tight: review only what the user provided or explicitly referenced.
- Follow the rules defined in Checklist to perform the review. If no Checklist rule matches, apply General Review Rules as a fallback to perform the best-effort review.
- Compose the final output strictly following the Required Output Format.
Notes when using this skill:
- Always include actionable fixes or suggestions (including possible code snippets).
- Use best-effort
File:Line references when a file path and line numbers are available; otherwise, use the most specific identifier you can.
Checklist
- db schema design: if the review scope includes code/files under
src/backend/base/langflow/services/database/models/ or Alembic migrations under src/backend/base/langflow/alembic/versions/, follow references/db-schema-rule.md to perform the review
- architecture: if the review scope involves route/service/model layering, dependency direction, or moving responsibilities across modules, follow references/architecture-rule.md to perform the review
- service abstraction: if the review scope contains table/model operations (e.g.,
select(...), session.execute(...), joins, CRUD) and is not already inside a service under src/backend/base/langflow/services/, follow references/repositories-rule.md to perform the review
- sqlalchemy patterns: if the review scope involves SQLAlchemy/SQLModel session/query usage, db transaction/crud usage,
session_scope() usage, or raw SQL usage, follow references/sqlalchemy-rule.md to perform the review
General Review Rules
1. Security Review
Check for:
- SQL injection vulnerabilities (especially raw
text() queries with string interpolation). Consequence: attacker can read/modify/delete any data in the database.
- Server-Side Request Forgery (SSRF) in component HTTP calls. Consequence: attacker uses the server to scan internal networks or access cloud metadata endpoints.
- Command injection (especially in subprocess or shell-executing components). Consequence: attacker gains shell access to the server.
- Insecure deserialization (pickle, yaml.load without SafeLoader). Consequence: arbitrary code execution on the server.
- Hardcoded secrets/credentials. Consequence: secrets leak via git history and are impossible to fully revoke.
- Improper authentication/authorization (missing
CurrentActiveUser dependency). Consequence: unauthenticated users can access protected endpoints.
- Insecure direct object references (missing
user_id scoping on queries). Consequence: user A can read/modify user B's flows, variables, API keys.
- Path traversal in file storage operations. Consequence: attacker reads arbitrary server files (e.g.,
/etc/passwd, .env).
2. Performance Review
Check for:
- N+1 queries (especially in loops calling
session.execute()). Consequence: 100 flows = 101 DB queries instead of 2; page load goes from 50ms to 5s.
- Missing database indexes on frequently queried columns. Consequence: full table scans on large datasets; queries degrade from O(log n) to O(n).
- Memory leaks (unbounded caches, retained references in long-lived services). Consequence: server OOM after hours of operation; pods restart in production.
- Blocking operations in async code (
time.sleep(), synchronous I/O, CPU-bound work without run_in_executor). Consequence: entire event loop stalls; all concurrent requests hang until the blocking call completes.
- Missing caching opportunities for expensive computations. Consequence: repeated computation of the same result on every request.
- Large result sets loaded entirely into memory without pagination. Consequence: memory spike + slow response when user has 10K+ flows.
3. Code Quality Review
Check for:
- Code forward compatibility with Python 3.10-3.13
- Code duplication (DRY violations — extract when the exact same business rule is duplicated in 3+ places)
- Functions doing too much (SRP violations — if you need "and" to describe it, split it)
- Deep nesting / complex conditionals (prefer early returns and guard clauses)
- Magic numbers/strings (extract to named constants or enums)
- Poor naming: unclear abbreviations, misleading names, generic names (
data, result, obj, temp). Functions should use verbs (get, create, validate). Booleans should use prefixes (is_, has_, can_, should_).
- Missing error handling (bare
except, swallowed exceptions, silent failures)
- Incomplete type coverage (use strong typing, avoid
Any where a concrete type is known)
- Use Python 3.10+ union syntax (
X | Y not Union[X, Y], X | None not Optional[X])
- Use
TYPE_CHECKING guard for imports only needed for type annotations (prevents circular imports)
- Use
Annotated[Type, Depends(...)] with project aliases (CurrentActiveUser, DbSession, DbSessionReadOnly) for FastAPI DI
- Google-style docstrings (enforced by Ruff):
Args:, Returns:, Raises: sections for public functions
- Violations of SOLID principles
- YAGNI violations (code that anticipates future needs without a present requirement)
- Line length exceeding 120 characters (project Ruff config)
- Comments that explain WHAT instead of WHY (comments should only explain reasoning, not restate code)
- Commented-out code (use version control instead)
- Boolean parameters that switch function behavior (split into two named functions instead)
- Mutable shared state where immutable alternatives exist (prefer returning new objects over mutation)
4. File Structure Review
Check for:
- Production files exceeding ~500 lines of code (excluding imports, types, and docstrings). Files above 600 lines are a red flag and should be split by responsibility. Why: Files above 500 lines have statistically higher defect rates and take longer to review. They signal multiple responsibilities (SRP violation). In Langflow, services like
DatabaseService that grow beyond this limit should have their CRUD operations extracted to dedicated modules.
- Test files exceeding ~1000 lines. Split by logical grouping if exceeded.
- No more than 5 functions with different responsibilities in a single file (per AGENTS-example.md).
- Each file has a single reason to exist and a single reason to change (SRP).
- No generic file names:
utils.py, helpers.py, misc.py, common.py as standalone files. Why: A file named utils.py becomes a dumping ground for unrelated functions. Within months it has 50+ functions covering formatting, validation, parsing, and HTTP calls — violating SRP. Each function group should be in a file named after its responsibility (formatting.py, validation.py).
5. Testing Review
Check for:
- Missing test coverage for new code paths
- Tests that don't test behavior (testing implementation details)
- Flaky test patterns (time-dependent, order-dependent, external-service-dependent)
- Proper use of
pytest.mark.asyncio for async tests
- Excessive mocking (prefer real integrations per project conventions)
- Coverage target: 80% (minimum acceptable: 75%)
- Test anti-patterns: The Liar (passes but doesn't verify claimed behavior), The Mirror (asserts exactly what code does), The Giant (50+ lines setup), The Mockery (tests only mock setup), The Inspector (coupled to implementation), The Chain Gang (depends on execution order), The Flaky (inconsistent results)
Happy path tests are the foundation but are NOT enough. Tests MUST also challenge the code to find real defects:
- Unexpected inputs:
None, "", [], {}, 0, -1, UUID("00000000-0000-0000-0000-000000000000")
- Boundary values: max length strings, exactly at the limit, one past the limit, zero items, max items
- Malformed data: missing required fields, extra unexpected fields, wrong types, invalid formats
- Error states: what happens when the database is down? When an external API returns 500? When the user doesn't exist?
- What should NOT happen: verify that user A CANNOT access user B's flows. Verify that a deleted flow returns 404. Verify that invalid
endpoint_name is rejected with 422.
- Error messages and types: not just that it fails, but that it fails with the RIGHT exception and the RIGHT message
- Concurrency: what happens when two requests try to update the same flow simultaneously?
Write tests based on REQUIREMENTS/SPEC, not on what the source code currently does. This is how you catch bugs where the code diverges from expected behavior.
When a test fails: first ask if the CODE is wrong, not the test. Do NOT silently change a failing assertion to match the current code without understanding WHY.
6. Observability Review
Check for:
- Use the async logger from
lfx.log.logger with a-prefixed methods (adebug, ainfo, awarning, aerror, aexception). Never use print() or stdlib logging.
- Log at key decision points and boundaries, not inside tight loops
- Include: operation name, relevant IDs, outcome (success/failure), duration if relevant
- Correct log levels: ERROR (broken, needs attention), WARN (degraded but recoverable), INFO (significant events), DEBUG (diagnostic, off in prod)
- ZERO PII TOLERANCE: Never log email addresses, user names, phone numbers, tokens, passwords. Only approved identifiers:
user_id, flow_id, session_id
- No
print() statements — these go to production logs
- Use
{e!s} for string representation of exceptions in log messages
7. Pre-Commit Verification
For pending-change reviews, verify the author has run:
make format_backend (Ruff formatter) — inconsistent formatting creates noisy diffs that hide real changes in code review. Format first, review second.
make lint (MyPy type checking) — type errors caught at lint time are 10x cheaper to fix than runtime crashes in production. Langflow services use duck typing via Service base class; MyPy catches mismatches early.
make unit_tests (pytest) — a failing test means the change breaks existing behavior. Never merge with failing tests; investigate whether the code or the test is wrong.
Required Output Format
When this skill is invoked, the response must exactly follow one of the two templates:
Template A (any findings)
# Code Review Summary
Found <X> critical issues need to be fixed:
## 🔴 Critical (Must Fix)
### 1. <brief description of the issue>
FilePath: <path> line <line>
<relevant code snippet or pointer>
#### Explanation
<detailed explanation and references of the issue>
#### Suggested Fix
1. <brief description of suggested fix>
2. <code example> (optional, omit if not applicable)
---
... (repeat for each critical issue) ...
Found <Y> suggestions for improvement:
## 🟡 Suggestions (Should Consider)
### 1. <brief description of the suggestion>
FilePath: <path> line <line>
<relevant code snippet or pointer>
#### Explanation
<detailed explanation and references of the suggestion>
#### Suggested Fix
1. <brief description of suggested fix>
2. <code example> (optional, omit if not applicable)
---
... (repeat for each suggestion) ...
Found <Z> optional nits:
## 🟢 Nits (Optional)
### 1. <brief description of the nit>
FilePath: <path> line <line>
<relevant code snippet or pointer>
#### Explanation
<explanation and references of the optional nit>
#### Suggested Fix
- <minor suggestions>
---
... (repeat for each nits) ...
## ✅ What's Good
- <Positive feedback on good patterns>
- If there are no critical issues or suggestions or optional nits or good points, just omit that section.
- If the issue number is more than 10, summarize as "Found 10+ critical issues/suggestions/optional nits" and only output the first 10 items.
- Don't compress the blank lines between sections; keep them as-is for readability.
- If there is any issue that requires code changes, append a brief follow-up question to ask whether the user wants to apply the fix(es) after the structured output. For example: "Would you like me to use the Suggested fix(es) to address these issues?"
Template B (no issues)
## Code Review Summary
✅ No issues found.