// Expert guidance for designing, optimizing, and maintaining database schemas for SQL and NoSQL systems. Use when creating new databases, optimizing existing schemas, planning migrations, implementing security policies, or ensuring GDPR compliance. Covers normalization, indexing, data types, relationships, performance optimization, and audit logging.
| name | database-schema-architect |
| description | Expert guidance for designing, optimizing, and maintaining database schemas for SQL and NoSQL systems. Use when creating new databases, optimizing existing schemas, planning migrations, implementing security policies, or ensuring GDPR compliance. Covers normalization, indexing, data types, relationships, performance optimization, and audit logging. |
| license | Apache-2.0 |
This skill provides comprehensive guidance for database schema design, from initial planning through production deployment and ongoing maintenance. It covers both relational (SQL) and NoSQL databases with focus on scalability, performance, security, and compliance.
Before designing any schema:
Tables: Plural nouns in lowercase
customers, orders, order_itemsCustomer, order, OrderItemColumns: Singular nouns, descriptive
customer_id, email_address, created_atcustId, e_mail, datePrimary Keys: table_name_id format
customer_id, order_idForeign Keys: Reference the table they point to
customer_id, product_idIndexes: Prefix with idx_ followed by table and columns
idx_orders_customer_id, idx_products_category_statusConstraints: Descriptive of their purpose
fk_orders_customer_id, ck_price_positive, uq_users_emailStart with Third Normal Form (3NF):
When to Denormalize:
Use scripts/normalization_checker.py to validate normalization level
Example - E-commerce Schema:
-- Normalized approach
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
phone VARCHAR(20),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT uq_customers_email UNIQUE (email)
);
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
sku VARCHAR(50) UNIQUE NOT NULL,
name VARCHAR(200) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL,
inventory_count INTEGER NOT NULL DEFAULT 0,
category_id INTEGER NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT fk_products_category FOREIGN KEY (category_id)
REFERENCES categories(category_id),
CONSTRAINT ck_products_price_positive CHECK (price >= 0),
CONSTRAINT ck_products_inventory_non_negative CHECK (inventory_count >= 0)
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
subtotal DECIMAL(10, 2) NOT NULL,
tax_amount DECIMAL(10, 2) NOT NULL DEFAULT 0,
total_amount DECIMAL(10, 2) NOT NULL,
CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
REFERENCES customers(customer_id),
CONSTRAINT ck_orders_amounts_positive CHECK (
subtotal >= 0 AND
tax_amount >= 0 AND
total_amount >= 0
)
);
CREATE TABLE order_items (
order_item_id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
price_at_purchase DECIMAL(10, 2) NOT NULL,
CONSTRAINT fk_order_items_order FOREIGN KEY (order_id)
REFERENCES orders(order_id) ON DELETE CASCADE,
CONSTRAINT fk_order_items_product FOREIGN KEY (product_id)
REFERENCES products(product_id),
CONSTRAINT ck_order_items_quantity_positive CHECK (quantity > 0),
CONSTRAINT ck_order_items_price_positive CHECK (price_at_purchase >= 0)
);
Use scripts/datatype_optimizer.py to analyze and recommend optimal types
Key Principles:
Common Patterns:
See references/DATA_TYPES_REFERENCE.md for comprehensive guide
Critical Rule: Index foreign keys ALWAYS Secondary Rule: Index columns used in WHERE, JOIN, ORDER BY
Types of Indexes:
Create indexes for:
-- Foreign keys (CRITICAL)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
-- Frequent WHERE clauses
CREATE INDEX idx_products_category_id ON products(category_id);
CREATE INDEX idx_products_is_active ON products(is_active) WHERE is_active = TRUE; -- Partial index
-- Composite indexes for common query patterns
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
CREATE INDEX idx_products_category_active ON products(category_id, is_active) WHERE is_active = TRUE;
-- Covering index for specific query
CREATE INDEX idx_products_search ON products(category_id, is_active)
INCLUDE (name, price); -- PostgreSQL 11+
Index Trade-offs:
Use scripts/index_analyzer.py to analyze existing queries and suggest optimal indexes
Always Define:
Example Constraints:
-- Price validation
CONSTRAINT ck_price_positive CHECK (price >= 0)
-- Status validation (enum-like behavior)
CONSTRAINT ck_order_status CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))
-- Date logic validation
CONSTRAINT ck_dates_logical CHECK (end_date >= start_date)
-- Conditional constraints
CONSTRAINT ck_discount_valid CHECK (
(discount_type = 'percentage' AND discount_value BETWEEN 0 AND 100) OR
(discount_type = 'fixed' AND discount_value >= 0)
)
Enable for multi-tenant applications:
-- Enable RLS on table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Create policy for users to see only their orders
CREATE POLICY orders_user_policy ON orders
FOR SELECT
USING (customer_id = current_setting('app.current_user_id')::INTEGER);
-- Policy for admins to see all
CREATE POLICY orders_admin_policy ON orders
FOR ALL
USING (current_setting('app.user_role') = 'admin');
See references/SECURITY_BEST_PRACTICES.md for comprehensive RLS patterns
Standard Audit Log Table:
CREATE TABLE audit_logs (
audit_id BIGSERIAL PRIMARY KEY,
table_name VARCHAR(100) NOT NULL,
record_id VARCHAR(100) NOT NULL,
operation VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
user_id INTEGER NOT NULL,
changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
old_values JSONB,
new_values JSONB,
ip_address INET,
user_agent TEXT
);
CREATE INDEX idx_audit_logs_table_record ON audit_logs(table_name, record_id);
CREATE INDEX idx_audit_logs_user_id ON audit_logs(user_id);
CREATE INDEX idx_audit_logs_changed_at ON audit_logs(changed_at);
Implement via triggers or application-level logging See assets/audit_log_setup.sql for trigger-based implementation
-- Add deleted_at column for soft deletes
ALTER TABLE customers ADD COLUMN deleted_at TIMESTAMP;
-- Create retention policy (example: 7 years)
CREATE INDEX idx_customers_deleted_at ON customers(deleted_at) WHERE deleted_at IS NOT NULL;
-- Anonymization function
CREATE OR REPLACE FUNCTION anonymize_customer(cust_id INTEGER)
RETURNS VOID AS $$
BEGIN
UPDATE customers
SET
email = 'deleted_' || customer_id || '@anonymized.com',
first_name = 'DELETED',
last_name = 'DELETED',
phone = NULL,
deleted_at = CURRENT_TIMESTAMP
WHERE customer_id = cust_id;
END;
$$ LANGUAGE plpgsql;
CREATE TABLE user_consents (
consent_id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(user_id),
consent_type VARCHAR(50) NOT NULL, -- marketing, analytics, etc.
consented BOOLEAN NOT NULL,
consented_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
ip_address INET,
CONSTRAINT uq_user_consent UNIQUE (user_id, consent_type)
);
See references/GDPR_COMPLIANCE.md for complete patterns
Migration Script Template (see assets/migration_template.sql):
-- Migration: [DESCRIPTION]
-- Date: YYYY-MM-DD
-- Author: [NAME]
BEGIN;
-- Forward migration
-- Add your DDL statements here
-- Verify data integrity
DO $$
BEGIN
-- Add verification queries
IF NOT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_name = 'new_table') THEN
RAISE EXCEPTION 'Migration verification failed';
END IF;
END $$;
COMMIT;
Rollback Script Template:
-- Rollback for: [MIGRATION_NAME]
BEGIN;
-- Reverse operations in reverse order
-- DROP statements, ALTER TABLE drops, etc.
COMMIT;
Adding Columns (Safe):
-- Add column with default (can lock table on large tables)
ALTER TABLE orders ADD COLUMN tracking_number VARCHAR(100);
-- For large tables, add without default first
ALTER TABLE orders ADD COLUMN tracking_number VARCHAR(100);
-- Then update in batches
UPDATE orders SET tracking_number = generate_tracking_number() WHERE tracking_number IS NULL;
Renaming Columns:
-- Rename column
ALTER TABLE customers RENAME COLUMN phone TO phone_number;
-- Update dependent views and functions
Changing Data Types:
-- Safe: Expanding VARCHAR
ALTER TABLE customers ALTER COLUMN phone TYPE VARCHAR(30);
-- Risky: Reducing size (requires validation)
-- First check if data fits
SELECT MAX(LENGTH(phone)) FROM customers; -- Must be โค new size
-- Then alter
ALTER TABLE customers ALTER COLUMN phone TYPE VARCHAR(15);
Adding Indexes (Non-Blocking in PostgreSQL):
-- Use CONCURRENTLY to avoid locking
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);
Embed vs Reference:
// Embed when data is accessed together
{
_id: ObjectId("..."),
customer: {
name: "John Doe",
email: "john@example.com"
},
items: [
{ product: "Widget", quantity: 2, price: 19.99 },
{ product: "Gadget", quantity: 1, price: 29.99 }
],
total: 69.97
}
// Reference when data is large or frequently updated independently
{
_id: ObjectId("..."),
customer_id: ObjectId("..."), // Reference to customers collection
product_ids: [ObjectId("..."), ObjectId("...")], // References
status: "shipped"
}
Indexing in MongoDB:
// Single field index
db.customers.createIndex({ email: 1 });
// Compound index
db.orders.createIndex({ customer_id: 1, status: 1, created_at: -1 });
// Text index for full-text search
db.products.createIndex({ name: "text", description: "text" });
// Unique index
db.users.createIndex({ email: 1 }, { unique: true });
Use scripts/query_analyzer.py to automate analysis
Avoid N+1 Queries:
-- Bad: Separate queries in loop
SELECT * FROM customers WHERE customer_id = ?; -- Called N times
-- Good: Single query with JOIN
SELECT c.*, o.order_id, o.total_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
Use Appropriate JOIN Types:
Pagination:
-- Basic pagination (works well for small offsets)
SELECT * FROM products ORDER BY product_id LIMIT 20 OFFSET 40;
-- Keyset pagination (better for large offsets)
SELECT * FROM products
WHERE product_id > 1000
ORDER BY product_id
LIMIT 20;
Partial Indexes for Boolean Columns:
-- Index only TRUE values (if most are FALSE)
CREATE INDEX idx_products_active ON products(is_active) WHERE is_active = TRUE;
Use scripts/schema_validator.py to automate validation
This skill includes several helper scripts in the scripts/ directory:
Usage: Run scripts with python scripts/[script_name].py [arguments]
For detailed information, see:
User Request: "Design a database schema for a SaaS project management application"
Response:
This skill guides you through each step with specific examples and validation tools.