# sqlite-review
| name | sqlite-review |
|---|---|
| description | Review and tune SQLite schemas, queries, indexes, and pragmas. Connects to the actual database to gather concrete evidence (EXPLAIN QUERY PLAN, page counts, table stats) before recommending changes. |
Audit and optimize SQLite usage in this project. Every recommendation must be backed by concrete evidence gathered from the live database — never guess when you can measure.
## Databases

Persistence lives in crates/scanner-scheduler/src/store/ or related persistence modules. Two databases are in scope:

| Database | Path pattern | Purpose |
|---|---|---|
| findings.db | <store_root>/findings.db | Star-schema store for scan results (runs, occurrences, secrets, rules, paths, roots) |
| triage.sqlite | <store_root>/triage/triage.sqlite | User triage decisions (occurrence_triage, secret_triage) |
Key source files:
| File | Role |
|---|---|
| crates/scanner-scheduler/src/store.rs | Store module (DDL, pragmas, write/read paths) |
## Phase 1: Find or create a database

Find or create a database to work with. Prefer a real database from a test run over an empty one, since stats on empty tables are meaningless.
```
# Find existing databases.
find . -name 'findings.db' -o -name 'triage.sqlite' 2>/dev/null

# If none exist, run the integration tests to generate one.
cargo test --test integration sqlite_persistence -- --nocapture 2>&1 | head -40

# Or create one by running the scanner briefly against a test fixture.
```
If no database is available, create a temporary one by executing the schema DDL and populating it with synthetic data, so that EXPLAIN QUERY PLAN and page-count queries return meaningful results. Use the sqlite3 CLI:

```
sqlite3 /tmp/scanner_review.db <<'SQL'
-- Extract the DDL from the persistence backend (not yet in-tree as raw SQL):
-- search crates/scanner-scheduler/src/store.rs and the rest of the codebase
-- for CREATE TABLE statements, paste them here, then INSERT synthetic rows.
SQL
```
## Phase 2: Gather baseline evidence

Connect to the database with sqlite3 and gather facts. Capture all of these before proposing any changes.

```
sqlite3 <db_path> <<'SQL'
-- 1. Integrity check.
PRAGMA integrity_check;

-- 2. Confirm pragmas and WAL state (returns busy, log pages, checkpointed pages).
PRAGMA journal_mode;
PRAGMA synchronous;
PRAGMA foreign_keys;
PRAGMA cache_size;
PRAGMA page_size;
PRAGMA wal_checkpoint(PASSIVE);

-- 3. Schema version.
PRAGMA user_version;

-- 4. List all tables and indexes.
SELECT name, type FROM sqlite_master WHERE type IN ('table','index') ORDER BY type, name;

-- 5. Per-table row counts.
SELECT 'roots' AS tbl, COUNT(*) FROM roots
UNION ALL SELECT 'paths', COUNT(*) FROM paths
UNION ALL SELECT 'rules', COUNT(*) FROM rules
UNION ALL SELECT 'secrets', COUNT(*) FROM secrets
UNION ALL SELECT 'runs', COUNT(*) FROM runs
UNION ALL SELECT 'occurrences', COUNT(*) FROM occurrences
UNION ALL SELECT 'observations', COUNT(*) FROM observations
UNION ALL SELECT 'run_rules', COUNT(*) FROM run_rules;

-- 6. Database file size (page_count * page_size).
SELECT page_count * page_size AS db_bytes,
       page_count,
       page_size
FROM pragma_page_count(), pragma_page_size();

-- 7. Freelist pages (fragmentation indicator).
PRAGMA freelist_count;

-- 8. Planner statistics (sqlite_stat1 exists only after ANALYZE has run).
SELECT * FROM sqlite_stat1;
SQL
```
Record the output — it is the baseline for comparison.
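Per-object page counts make the size picture concrete. This uses the dbstat virtual table, which is available when the sqlite3 CLI is built with SQLITE_ENABLE_DBSTAT_VTAB (most distributed binaries are); if your build lacks it, skip this step:

```
sqlite3 <db_path> <<'SQL'
-- Pages and bytes per table/index, largest first.
SELECT name, COUNT(*) AS pages, SUM(pgsize) AS bytes
FROM dbstat
GROUP BY name
ORDER BY pages DESC;
SQL
```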
### Schema integrity

Read the DDL source and the live schema, then check each item:

- Every table and index in the source DDL exists live with a matching definition.
- PRAGMA foreign_key_check returns no rows.
- user_version matches the latest migration number.

```
sqlite3 <db_path> <<'SQL'
-- Dump live schema for comparison with source code.
.schema
-- Check FK integrity.
PRAGMA foreign_key_check;
SQL
```
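For a column-level comparison, pragma_table_info lists live columns, types, NOT NULL flags, defaults, and primary-key membership. The table name below is just an example; repeat for each table in the DDL:

```
sqlite3 <db_path> <<'SQL'
-- Live column definitions for one table.
SELECT * FROM pragma_table_info('occurrences');
SQL
```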
## Phase 3: Query plan review

For every query in crates/scanner-scheduler/src/store.rs and related modules, run EXPLAIN QUERY PLAN and verify the planner uses the expected index. Flag:

- Any full-table scan (SCAN) on a table with >1000 rows.
- Any sort step (USE TEMP B-TREE FOR ORDER BY) that could be eliminated by a covering index.

```
sqlite3 <db_path> <<'SQL'
-- Example: check the list_findings query plan.
EXPLAIN QUERY PLAN
SELECT o.occurrence_id, o.object_path, o.start_byte, o.end_byte,
r.rule_name, s.secret_hash
FROM observations obs
JOIN occurrences o ON o.occ_pk = obs.occ_pk
JOIN rules r ON r.rule_pk = o.rule_pk
JOIN secrets s ON s.secret_pk = o.secret_pk
WHERE obs.run_pk = 1
ORDER BY o.object_path, o.start_byte;
-- Example: check the diff_runs NOT EXISTS subquery.
EXPLAIN QUERY PLAN
SELECT o.occurrence_id
FROM observations obs1
JOIN occurrences o ON o.occ_pk = obs1.occ_pk
WHERE obs1.run_pk = 1
AND NOT EXISTS (
SELECT 1 FROM observations obs2
WHERE obs2.run_pk = 2 AND obs2.occ_pk = obs1.occ_pk
);
-- Refresh planner statistics, then review per-index cardinalities.
-- Note: SQLite keeps no usage counters; treat an index as unused only if
-- no EXPLAIN QUERY PLAN across the query corpus references it.
ANALYZE;
SELECT * FROM sqlite_stat1 ORDER BY tbl, idx;
SQL
```
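Where a sort step shows up, test whether an index matching the ORDER BY columns removes it. The index below is illustrative, not a recommendation; measure before keeping it:

```
sqlite3 <db_path> <<'SQL'
-- An index in ORDER BY order replaces the
-- "USE TEMP B-TREE FOR ORDER BY" step with an index walk.
CREATE INDEX IF NOT EXISTS idx_occ_path_start ON occurrences(object_path, start_byte);
EXPLAIN QUERY PLAN
SELECT occurrence_id FROM occurrences ORDER BY object_path, start_byte;
DROP INDEX IF EXISTS idx_occ_path_start;
SQL
```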
### Covering indexes

A covering index lets SQLite satisfy a query entirely from the index without touching the main table. Look for queries that:

- select only columns already present in (or easily appended to) an existing index;
- filter and sort on the same leading index columns.

```
sqlite3 <db_path> <<'SQL'
-- Does the observations PK cover the diff query? (It should — WITHOUT ROWID.)
EXPLAIN QUERY PLAN
SELECT 1 FROM observations WHERE run_pk = 2 AND occ_pk = 42;
SQL
```
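The plan line itself tells you when coverage is achieved: look for USING COVERING INDEX. A hypothetical illustration:

```
sqlite3 <db_path> <<'SQL'
-- Appending the selected column makes the index cover the query
-- (index name and columns are illustrative).
CREATE INDEX IF NOT EXISTS idx_occ_path_rule ON occurrences(path_pk, rule_pk);
EXPLAIN QUERY PLAN
SELECT rule_pk FROM occurrences WHERE path_pk = 42;
-- Expect: SEARCH occurrences USING COVERING INDEX idx_occ_path_rule (path_pk=?)
DROP INDEX IF EXISTS idx_occ_path_rule;
SQL
```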
## Phase 4: Benchmark proposed changes

For each query identified in Phase 3 as suboptimal:

- Measure the baseline with .timer on against a representative dataset.
- Apply the candidate change, re-measure, and compare.

```
sqlite3 <db_path> <<'SQL'
.timer on
-- Before: measure baseline.
SELECT COUNT(*) FROM occurrences WHERE path_pk = 42;
-- After: with proposed composite index.
CREATE INDEX IF NOT EXISTS idx_occ_path_rule ON occurrences(path_pk, rule_pk);
SELECT COUNT(*) FROM occurrences WHERE path_pk = 42;
-- Compare times, then drop the test index if not keeping.
DROP INDEX IF EXISTS idx_occ_path_rule;
SQL
```
### Query patterns to verify

| Pattern | What to verify |
|---|---|
| INSERT OR IGNORE + SELECT (surrogate key lookup) | The UNIQUE index on the lookup column is hit, not a scan |
| BEGIN IMMEDIATE batching | Batch sizes are reasonable (100–1000 rows); not one-row-at-a-time inside a loop |
| ON CONFLICT DO UPDATE (triage upserts) | PRIMARY KEY covers the conflict target; no redundant index |
| hex(run_id) LIKE 'PREFIX%' (prefix match) | Cannot use index on BLOB column — flag if table grows large |
| JOIN chains (observations → occurrences → rules/secrets) | Join order matches index availability; no Cartesian products |
| NOT EXISTS subqueries (diff_runs) | Subquery uses index seek, not a scan per outer row |
| ORDER BY clauses | Satisfied by index or, if not, the sort cost is acceptable for expected row counts |
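For the surrogate-key pattern, the lookup SELECT should seek the UNIQUE index rather than scan; the table, column, and literal below are illustrative:

```
sqlite3 <db_path> <<'SQL'
-- The plan should show a SEARCH using the UNIQUE (auto)index on rule_name,
-- e.g. sqlite_autoindex_rules_1, never "SCAN rules".
EXPLAIN QUERY PLAN
SELECT rule_pk FROM rules WHERE rule_name = 'aws-access-key';
SQL
```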
## Phase 5: Pragma review

Review the pragma settings in schema.rs and triage/schema.rs against the workload characteristics.
| Pragma | Current | Check |
|---|---|---|
| journal_mode | WAL | Correct for concurrent readers + single writer |
| synchronous | NORMAL | Acceptable for WAL; verify durability requirements |
| foreign_keys | ON | Must stay ON; verify it is not accidentally toggled |
| busy_timeout | 5000 ms | Adequate? Check for SQLITE_BUSY errors in the logs |
| cache_size | -64000 (64 MB) | Proportional to working set? Check with PRAGMA cache_spill |
| page_size | (default 4096) | Should match filesystem block size; 4096 is usually correct |
| mmap_size | (not set) | Consider setting for read-heavy workloads; measure with .timer on |
| temp_store | (not set) | MEMORY can help if temp tables/sorts are frequent |
| wal_autocheckpoint | (default 1000) | May need tuning for write-heavy bursts; check WAL file size |
```
sqlite3 <db_path> <<'SQL'
-- Check WAL file size (indicator of checkpoint pressure).
PRAGMA wal_checkpoint(PASSIVE);
-- Returns: busy, log_pages, checkpointed_pages
-- Check if mmap would help (read-heavy).
PRAGMA mmap_size = 268435456; -- 256 MB; measure query times before/after.
-- Check page fragmentation.
PRAGMA freelist_count;
-- If freelist is >10% of page_count, consider VACUUM.
SQL
```
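If the freelist is large, VACUUM INTO (SQLite 3.27+) writes a compacted copy without rewriting the live file in place; swap it in while the writer is stopped. The destination path is illustrative:

```
sqlite3 <db_path> <<'SQL'
-- Write a defragmented copy; the source database stays readable throughout.
VACUUM INTO '/tmp/findings_compacted.db';
SQL
```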
## Phase 6: Write path and concurrency

Examine the writer module for correctness and performance:

- Multi-row writes run inside explicit BEGIN IMMEDIATE ... COMMIT transactions; no implicit autocommit per inserted row (see the sketch below).
- Hot statements go through conn.prepare_cached() instead of being re-prepared on every call.
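A minimal sketch of the batched shape, with a hypothetical table and column:

```
sqlite3 <db_path> <<'SQL'
-- One IMMEDIATE transaction per batch of 100-1000 rows,
-- not one transaction (or worse, autocommit) per row.
BEGIN IMMEDIATE;
INSERT INTO paths (path) VALUES ('a/b.txt'), ('c/d.txt'), ('e/f.txt');
COMMIT;
SQL
```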
If the database is expected to handle concurrent access:

```
# Run the concurrent writer stress test.
cargo test --lib writer -- concurrent --nocapture

# Run the integration suite to verify no SQLITE_BUSY failures.
cargo test --test integration sqlite_persistence -- --nocapture
```
Check for SQLITE_BUSY errors surfacing to callers, writer starvation under reader load, and read transactions held open long enough to block WAL checkpoints.

## Report format

Produce the review as a single report using this template:
```
## SQLite Review: [database name]

### Baseline Metrics
| Metric | Value |
|--------|-------|
| DB size | X MB |
| Page count | N |
| Freelist pages | N (X%) |
| WAL size | N pages |
| Row counts | roots: N, paths: N, ... |

### Schema Findings
| Severity | Table.Column | Issue | Recommendation |
|----------|-------------|-------|----------------|
| WARN | secrets.status | Missing CHECK constraint | Add CHECK (status IN (0,1,2,3)) |
| INFO | run_rules | Already WITHOUT ROWID | No action |

### Query Plan Analysis
| Query | Location | Plan | Issue | Fix |
|-------|----------|------|-------|-----|
| list_findings | query.rs:133 | SCAN occurrences | Missing composite index | Add idx_occ_root_path(root_pk, path_pk) |
| diff_runs | query.rs:215 | SEARCH observations USING PK | Optimal | None |

### Index Recommendations
| Action | Index | Rationale | Evidence |
|--------|-------|-----------|----------|
| ADD | idx_occ_root_path ON occurrences(root_pk, path_pk) | Covers list_findings filter+join | EXPLAIN shows SCAN → SEARCH |
| DROP | idx_foo | Never used in any query | No query plan in Phase 3 references it |

### PRAGMA Recommendations
| Pragma | Current | Proposed | Rationale | Evidence |
|--------|---------|----------|-----------|----------|
| mmap_size | 0 | 268435456 | Read queries 15% faster | .timer on before/after |

### Write Path Findings
| Severity | Location | Issue | Recommendation |
|----------|----------|-------|----------------|
| WARN | writer.rs:361 | Statement prepared per call, not cached | Use prepare_cached() |

### Summary
- **Critical**: N issues requiring immediate action
- **Warnings**: N issues to address before next release
- **Info**: N observations, no action needed
- **Estimated size impact**: ±X MB from index changes
```
Gotcha: hex(col) LIKE 'PREFIX%' cannot use indexes, because any index is on the raw BLOB column, not on its hex expansion. Flag it and suggest alternatives (a prefix table, a stored hex column, or a direct BLOB range scan, sketched below).
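A quick way to demonstrate the problem and one workaround; run_id as a raw BLOB and the literal bounds are illustrative:

```
sqlite3 <db_path> <<'SQL'
-- The hex() expression defeats any index on run_id: expect a SCAN.
EXPLAIN QUERY PLAN
SELECT * FROM runs WHERE hex(run_id) LIKE 'DEADBEEF%';
-- Range-scanning the BLOB directly can use an index on run_id, if one
-- exists; the upper bound is the prefix with its last byte incremented.
EXPLAIN QUERY PLAN
SELECT * FROM runs WHERE run_id >= x'deadbeef' AND run_id < x'deadbef0';
SQL
```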
## Related skills

- /performance-analyzer — Profile Rust code around DB operations
- /test-strategy — Choose between unit, property, and fuzz tests for the DB layer
- /security-reviewer — Audit SQL injection risks in dynamic query construction