| name | optimizing-queries |
| description | Analyzes and optimizes SQL/NoSQL queries for performance. Use when reviewing query performance, optimizing slow queries, analyzing EXPLAIN output, suggesting indexes, identifying N+1 problems, recommending query rewrites, or improving database access patterns. Supports PostgreSQL, MySQL, SQLite, MongoDB, Redis, DynamoDB, and Elasticsearch. |
Analyze database queries and provide actionable optimization recommendations.
Copy and track progress:
Query Optimization:
- [ ] Identify database type
- [ ] Analyze query structure
- [ ] Check for red flags
- [ ] Provide recommendations
- [ ] Suggest verification
For complex queries (3+ JOINs or nested subqueries):
For each query, evaluate:
- `SELECT *` on large tables
- `WHERE function(column)` (prevents index use)
- `LIKE '%pattern'` (leading wildcard)
- `OR` conditions spanning different columns
- Large `OFFSET` values
- `NOT IN` with a nullable subquery
- Missing `LIMIT` on unbounded queries
- MongoDB: `$where`, unbounded arrays
- Redis: `KEYS *`, missing TTL, wrong data structure

### Example 1: Missing Index

Input:
```sql
SELECT * FROM orders WHERE customer_email = 'user@example.com';
```
Output:
## Issue [HIGH]
Full table scan on `orders` - no index on `customer_email`
## Impact
Query time grows linearly with table size; causes lock contention
## Solution
```sql
CREATE INDEX idx_orders_customer_email ON orders(customer_email);
```
## Verification
```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'user@example.com';
-- Should show "Index Scan" instead of "Seq Scan"
```
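The verification step can also be checked programmatically in a pinch. A minimal sketch that only string-matches the scan type (the bundled parse_explain.py performs the full analysis):

```python
def uses_index(explain_output: str) -> bool:
    # True when the plan contains an index-based scan and no sequential scan.
    text = explain_output.lower()
    return "index scan" in text and "seq scan" not in text

before = "Seq Scan on orders  (cost=0.00..431.00 rows=1 width=244)"
after = "Index Scan using idx_orders_customer_email on orders"
print(uses_index(before), uses_index(after))
```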
### Example 2: N+1 Query Pattern

Input:
```python
users = db.query("SELECT * FROM users")
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
```
Output:
## Issue [HIGH]
N+1 query pattern - executes 1 + N queries instead of 1
## Impact
100 users = 101 queries; network latency multiplied
## Solution
```sql
SELECT u.*, o.* FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
```
## Verification
Check query count before/after; should reduce from N+1 to 1
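When a JOIN is awkward (for example, the two result sets are consumed separately), the same reduction can be done application-side with one batched IN query. A runnable sketch using an in-memory SQLite database as a stand-in for the real one:

```python
import sqlite3

# Two bounded queries instead of 1 + N. Table and column names follow the
# example above; the in-memory database is only a stand-in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO orders (user_id) VALUES (1), (1), (3);
""")

ids = [row[0] for row in conn.execute("SELECT id FROM users")]

# One IN query replaces the N per-user queries from the loop.
# Only the placeholder count is interpolated; values stay parameterized.
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT user_id, id FROM orders WHERE user_id IN ({placeholders})", ids
).fetchall()

orders_by_user = {}
for user_id, order_id in rows:
    orders_by_user.setdefault(user_id, []).append(order_id)
print(orders_by_user)
```

This trades one wide JOIN result for two narrow queries, which is often the shape ORMs use for eager loading.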
### Example 3: When NOT to Optimize

Input:
```sql
SELECT * FROM config WHERE key = 'app_version';
```
Output:
## No Issues Detected
This query on a small lookup table (typically <100 rows) is acceptable.
Adding an index would add write overhead without meaningful read benefit.
## When to Reconsider
- If table grows beyond 10,000 rows
- If query is called >100 times/second
### Example 4: Incomplete Information

Input:
```sql
SELECT * FROM large_table WHERE status = ?;
```
Output:
## Issue [MEDIUM]
Preliminary analysis without schema information
## Potential Issues
1. `SELECT *` - may fetch unnecessary columns
2. `status` column - if low cardinality (e.g., 5 values), index may not help
## Information Needed
- Table row count
- Cardinality of `status` column
- Existing indexes: `SHOW INDEX FROM large_table;`
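The cardinality point can be made concrete with a rough heuristic: an index on `status` only pays off when the average predicate matches a small fraction of the table. The 1% selectivity threshold and 10,000-row floor below are illustrative assumptions, not fixed rules:

```python
def index_likely_helps(distinct_values: int, row_count: int,
                       threshold: float = 0.01) -> bool:
    # Assumes roughly uniform distribution: an equality predicate matches
    # about 1/distinct_values of the rows on average.
    if row_count == 0 or distinct_values == 0:
        return False
    avg_fraction_matched = 1 / distinct_values
    return avg_fraction_matched <= threshold and row_count > 10_000

# A 5-value status column: each value matches ~20% of rows, index unlikely to help.
print(index_likely_helps(distinct_values=5, row_count=1_000_000))
# A high-cardinality column: index likely helps.
print(index_likely_helps(distinct_values=50_000, row_count=1_000_000))
```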
### Example 5: MongoDB COLLSCAN

Input:
```javascript
db.orders.find({ customer_email: "user@example.com" })
```
Output:
## Issue [HIGH]
COLLSCAN on `orders` - no index on `customer_email`
## Impact
Query scans entire collection; O(n) performance
## Solution
```javascript
db.orders.createIndex({ customer_email: 1 })
```
## Verification
```javascript
db.orders.find({ customer_email: "..." }).explain("executionStats")
// Should show "stage": "IXSCAN" instead of "COLLSCAN"
```
Structure recommendations as:
## Issue [HIGH/MEDIUM/LOW]
[Specific problem identified]
## Impact
[Performance/resource cost]
## Solution
[Concrete fix with code]
## Verification
[EXPLAIN command or metric to confirm]
When uncertainty exists, indicate confidence:
| Issue Type | Impact | Priority |
|---|---|---|
| Missing JOIN index | Query blocks | P0 |
| Full table scan (>100k rows) | Slow response | P0 |
| N+1 pattern | Latency × N | P1 |
| SELECT * | Memory/bandwidth | P1 |
| Missing LIMIT | Resource exhaustion | P1 |
| Large OFFSET | Slow pagination | P2 |
| Suboptimal ORDER BY | Sort overhead | P2 |
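Several of the red flags above lend themselves to mechanical detection. A minimal sketch of the idea (the bundled analyze_query.py is the real tool; the patterns and messages here are illustrative only):

```python
import re

# Hypothetical mini-scanner for a few of the red flags; a real tool
# would parse the SQL rather than pattern-match it.
RED_FLAGS = [
    (r"select\s+\*", "SELECT * - may fetch unnecessary columns"),
    (r"like\s+'%", "leading-wildcard LIKE - prevents index use"),
    (r"\bnot\s+in\s*\(\s*select\b", "NOT IN with subquery - beware NULLs"),
    (r"\boffset\s+\d{4,}", "large OFFSET - slow pagination"),
]

def scan(sql: str) -> list[str]:
    lowered = sql.lower()
    return [msg for pattern, msg in RED_FLAGS if re.search(pattern, lowered)]

print(scan("SELECT * FROM users WHERE name LIKE '%smith'"))
```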
Consult these for detailed patterns (use grep to find specific sections):
- `references/sql-patterns.md` - SQL optimization (sections: `## Index`, `## Query Rewrite`, `## Join`, `## Aggregation`)
- `references/nosql-patterns.md` - MongoDB, Redis, DynamoDB, Elasticsearch (sections: `## MongoDB`, `## Redis`, `## DynamoDB`, `## Elasticsearch`)
- `references/explain-analysis.md` - Execution plans (sections: `## PostgreSQL`, `## MySQL`, `## MongoDB`)
- `references/common-scenarios.md` - Common patterns (sections: `## Slow pagination`, `## N+1`, `## Dashboard`, `## Bulk`)
- `references/orm-patterns.md` - ORM fixes (sections: `### Django`, `### SQLAlchemy`, `### ActiveRecord`, `### Prisma`)

All scripts use the Python standard library only (no pip install required).
Run these for automated analysis:
`analyze_query.py` - Detect anti-patterns and score complexity:
```shell
python scripts/analyze_query.py "SELECT * FROM orders WHERE ..."
python scripts/analyze_query.py --file query.sql --json
```
`suggest_index.py` - Generate index recommendations:
```shell
python scripts/suggest_index.py "SELECT ... FROM orders WHERE status = 'pending'"
```
`parse_explain.py` - Analyze EXPLAIN output:
```shell
python scripts/parse_explain.py --pg explain_output.txt
python scripts/parse_explain.py --mysql explain_output.txt
```
`diff_explain.py` - Compare before/after EXPLAIN outputs:
```shell
python scripts/diff_explain.py --pg before.txt after.txt
python scripts/diff_explain.py --mysql before.txt after.txt --json
```
`parse_slow_log.py` - Analyze slow query logs:
```shell
python scripts/parse_slow_log.py --pg /var/log/postgresql.log --top 20
python scripts/parse_slow_log.py --mysql slow-query.log --json
```
`analyze_schema.py` - Find schema optimization opportunities:
```shell
python scripts/analyze_schema.py schema.sql
```
`detect_orm_issues.py` - Find ORM anti-patterns in code:
```shell
python scripts/detect_orm_issues.py app.py
python scripts/detect_orm_issues.py --dir ./models
```
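The core idea behind such a detector can be sketched with the `ast` module: flag query-looking calls that occur inside a loop. This is a hypothetical simplification, not the script's actual implementation:

```python
import ast
import textwrap

# Attribute names that look like database calls (an illustrative set).
QUERY_NAMES = {"query", "execute", "filter", "get"}

def find_queries_in_loops(source: str) -> list[int]:
    """Return line numbers of query-like calls nested inside for/while loops."""
    tree = ast.parse(source)
    hits = []
    for loop in ast.walk(tree):
        if isinstance(loop, (ast.For, ast.While)):
            for node in ast.walk(loop):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Attribute)
                        and node.func.attr in QUERY_NAMES):
                    hits.append(node.lineno)
    return hits

sample = textwrap.dedent("""
    users = db.query("SELECT * FROM users")
    for user in users:
        orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
""")
print(find_queries_in_loops(sample))
```

The query outside the loop is not flagged; only the per-iteration call is, which is exactly the N+1 shape.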
`check_migration.py` - Check migration safety:
```shell
python scripts/check_migration.py --pg migration.sql
python scripts/check_migration.py --mysql migration.sql
```
For query optimization:
1. `analyze_query.py` → detect issues
2. `suggest_index.py` → generate indexes
3. `diff_explain.py` → verify improvement

For verifying optimization impact:
```shell
# Save before state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > before.txt

# Apply optimization (add index, rewrite query, etc.)

# Save after state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > after.txt

# Compare
python scripts/diff_explain.py --pg before.txt after.txt
```
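The comparison step can be approximated by pulling the top plan node's total "actual time" out of each PostgreSQL EXPLAIN ANALYZE file. A sketch (diff_explain.py is the fuller tool):

```python
import re

def total_time_ms(explain_text: str) -> float:
    # The first "actual time=start..total" figure belongs to the top plan
    # node; its second number is total elapsed milliseconds.
    match = re.search(r"actual time=[\d.]+\.\.([\d.]+)", explain_text)
    return float(match.group(1)) if match else float("nan")

before = "Seq Scan on orders (actual time=0.02..182.33 rows=1 loops=1)"
after = "Index Scan using idx_orders_customer_email (actual time=0.03..0.05 rows=1 loops=1)"
speedup = total_time_ms(before) / total_time_ms(after)
print(f"{speedup:.0f}x faster")
```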
For codebase audit:
1. `detect_orm_issues.py --dir ./` → find N+1 patterns
2. `analyze_schema.py` → check schema

For migration review:
1. `check_migration.py` → safety check

Provide appropriate commands:
```sql
-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- MySQL
EXPLAIN ANALYZE SELECT ...;

-- SQLite
EXPLAIN QUERY PLAN SELECT ...;
```
```javascript
// MongoDB
db.collection.find({...}).explain("executionStats")
```
When suggesting indexes:
```sql
-- PostgreSQL/MySQL
CREATE INDEX [CONCURRENTLY] idx_table_columns
ON table(col1, col2, col3);
-- Rationale: [why this column order, covering considerations]
```
```javascript
// MongoDB (ESR: Equality, Sort, Range)
db.collection.createIndex({ equality_field: 1, sort_field: -1, range_field: 1 })
```
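The ESR rule itself is mechanical: once each field's role is known, the key order follows. A sketch (field roles are supplied by the caller here; inferring them from the query shape is the harder part a real tool would do):

```python
# ESR: equality predicates first, then sort keys, then range predicates.
ESR_ORDER = {"equality": 0, "sort": 1, "range": 2}

def esr_index(fields: list[tuple[str, str, int]]) -> dict[str, int]:
    """fields: (name, role, direction) tuples, direction 1 or -1."""
    ordered = sorted(fields, key=lambda f: ESR_ORDER[f[1]])  # stable sort
    return {name: direction for name, _role, direction in ordered}

# e.g. find({status: "shipped", total: {$gt: 100}}).sort({created: -1})
print(esr_index([("total", "range", 1),
                 ("created", "sort", -1),
                 ("status", "equality", 1)]))
```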
When user provides incomplete context:
No schema provided:
"To give precise recommendations, I need the table schema. Could you share:
- Column data types
- Existing indexes
- Approximate row counts
Meanwhile, here's what I can infer from the query..."
No EXPLAIN output:
"Run this command and share the output:
```sql
EXPLAIN (ANALYZE, BUFFERS) <their query>;
```
This will show actual vs estimated rows and scan types."
Unknown database type:
"Which database are you using? (PostgreSQL, MySQL, SQLite, MongoDB, etc.) The optimization strategies differ significantly between them."
- Analyzing a query: run `analyze_query.py` first
- Interpreting EXPLAIN output: run `parse_explain.py` with the appropriate flag
- Full optimization request:
Complex optimization (3+ issues or architectural decisions): Use extended reasoning to:
If scripts fail:
Common errors:
- `FileNotFoundError` → Check the file path; use absolute paths
- `json.JSONDecodeError` → Input is not valid JSON; check the --pg/--mysql flags

When scripts are unavailable: perform manual analysis using the Quick Reference red flags and the reference files.
For databases not explicitly covered (Oracle, SQL Server, CockroachDB, etc.), start from the general principles in `references/sql-patterns.md`.