| name | database-schema-designer |
| description | Use this skill when designing database schemas for relational (SQL) or document (NoSQL) databases. Provides normalization guidelines, indexing strategies, migration patterns, and performance optimization techniques. Ensures scalable, maintainable, and performant data models. |
| version | 1.0.0 |
| author | AI Agent Hub |
| tags | ["database","schema-design","sql","nosql","performance","migrations"] |
Database Schema Designer
Overview
This skill provides comprehensive guidance for designing robust, scalable database schemas for both SQL and NoSQL databases. Whether building from scratch or evolving existing schemas, this framework ensures data integrity, performance, and maintainability.
When to use this skill:
- Designing new database schemas
- Refactoring or migrating existing schemas
- Optimizing database performance
- Choosing between SQL and NoSQL approaches
- Creating database migrations
- Establishing indexing strategies
- Modeling complex relationships
- Planning data archival and partitioning
Database Design Philosophy
Core Principles
1. Model the Domain, Not the UI
- Schema reflects business entities and relationships
- Don't let UI requirements drive data structure
- Separate presentation concerns from data model
2. Optimize for Reads or Writes (Not Both)
- OLTP (transactional): Normalized, optimized for writes
- OLAP (analytical): Denormalized, optimized for reads
- Choose based on access patterns
3. Plan for Scale From Day One
- Indexing strategy
- Partitioning approach
- Caching layer
- Read replicas
4. Data Integrity Over Performance
- Use constraints, foreign keys, validation
- Performance issues can be optimized later
- Data corruption is costly to fix
SQL Database Design
Normalization
Database normalization reduces redundancy and ensures data integrity.
1st Normal Form (1NF)
Rule: Each column contains atomic (indivisible) values, no repeating groups.
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
product_ids VARCHAR(255)
);
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT,
product_id INT,
FOREIGN KEY (order_id) REFERENCES orders(id)
);
2nd Normal Form (2NF)
Rule: Must be in 1NF + all non-key columns depend on the entire primary key.
CREATE TABLE order_items (
order_id INT,
product_id INT,
customer_id INT,
customer_name VARCHAR(100),
quantity INT,
PRIMARY KEY (order_id, product_id)
);
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100)
);
3rd Normal Form (3NF)
Rule: Must be in 2NF + no transitive dependencies (non-key columns depend only on primary key).
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
country VARCHAR(50)
);
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
FOREIGN KEY (postal_code) REFERENCES postal_codes(code)
);
CREATE TABLE postal_codes (
code VARCHAR(10) PRIMARY KEY,
country VARCHAR(50)
);
Denormalization (When to Break Rules)
Sometimes denormalization improves performance for read-heavy applications.
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
total_amount DECIMAL(10, 2),
item_count INT,
created_at TIMESTAMP
);
When to denormalize:
- Read-heavy applications (reporting, analytics)
- Frequently joined tables causing performance issues
- Pre-calculated aggregates (counts, sums, averages)
- Caching derived data to avoid complex joins
Data Types
Choose appropriate data types for efficiency and accuracy.
String Types
CHAR(10)
CHAR(2)
VARCHAR(255)
TEXT
email VARCHAR(255)
phone_number VARCHAR(20)
postal_code VARCHAR(10)
email VARCHAR(500)
description VARCHAR(50)
Numeric Types
TINYINT
SMALLINT
INT
BIGINT
DECIMAL(10, 2)
FLOAT
DOUBLE
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2)
);
price FLOAT
Date/Time Types
DATE
TIME
DATETIME
TIMESTAMP
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
Boolean
is_active BOOLEAN DEFAULT TRUE
is_active TINYINT(1) DEFAULT 1
Indexing Strategies
Indexes speed up reads but slow down writes. Use strategically.
When to Create Indexes
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
Index Types
B-Tree Index (Default)
CREATE INDEX idx_products_price ON products(price);
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE price BETWEEN 50 AND 150;
Hash Index
CREATE INDEX idx_users_email USING HASH ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
Full-Text Index
CREATE FULLTEXT INDEX idx_articles_content ON articles(title, content);
SELECT * FROM articles WHERE MATCH(title, content) AGAINST('database design');
Partial Index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;
Composite Indexes (Column Order Matters)
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE status = 'pending';
Rule of Thumb: Put most selective column first, or most frequently queried alone.
Constraints
Use constraints to enforce data integrity at the database level.
Primary Key
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE users (
id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
email VARCHAR(255) UNIQUE NOT NULL
);
Foreign Key
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
ON DELETE RESTRICT
ON DELETE SET NULL
ON DELETE NO ACTION
Unique Constraint
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL
);
CREATE TABLE enrollments (
student_id INT,
course_id INT,
UNIQUE (student_id, course_id)
);
Check Constraint
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2) CHECK (price >= 0),
stock INT CHECK (stock >= 0),
discount_percent INT CHECK (discount_percent BETWEEN 0 AND 100)
);
Not Null Constraint
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) NOT NULL,
name VARCHAR(100) NOT NULL,
bio TEXT
);
Common Schema Patterns
One-to-Many (Orders → Order Items)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
price DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
);
Many-to-Many (Students ↔ Courses)
CREATE TABLE students (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
id INT PRIMARY KEY,
title VARCHAR(200) NOT NULL
);
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
FOREIGN KEY (course_id) REFERENCES courses(id) ON DELETE CASCADE
);
Self-Referencing (Employees → Manager)
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(id)
);
Polymorphic Relationships (Comments on Posts/Photos)
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
post_id INT,
photo_id INT,
CHECK (
(post_id IS NOT NULL AND photo_id IS NULL) OR
(post_id IS NULL AND photo_id IS NOT NULL)
),
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (photo_id) REFERENCES photos(id) ON DELETE CASCADE
);
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
commentable_type VARCHAR(50) NOT NULL,
commentable_id INT NOT NULL
);
NoSQL Database Design
Document Databases (MongoDB)
When to use:
- Schema flexibility needed
- Rapid iteration
- Hierarchical data
- Read-heavy workloads
Embedding vs Referencing
Embedding (Denormalization)
{
"_id": "order_123",
"customer": {
"id": "cust_456",
"name": "Jane Smith",
"email": "jane@example.com"
},
"items": [
{ "product_id": "prod_789", "quantity": 2, "price": 29.99 },
{ "product_id": "prod_101", "quantity": 1, "price": 49.99 }
],
"total": 109.97,
"created_at": "December 2025T10:30:00Z"
}
When to embed:
- Data accessed together frequently
- 1:few relationships (few items)
- Child documents don't need independent existence
Referencing (Normalization)
{
"_id": "order_123",
"customer_id": "cust_456",
"item_ids": ["item_1", "item_2"],
"total": 109.97,
"created_at": "December 2025T10:30:00Z"
}
When to reference:
- Data accessed independently
- 1:many relationships (many items)
- Large documents (approaching 16MB limit)
- Frequently updated data
Indexing in MongoDB
db.users.createIndex({ email: 1 }, { unique: true });
db.orders.createIndex({ customer_id: 1, created_at: -1 });
db.articles.createIndex({ title: "text", content: "text" });
db.stores.createIndex({ location: "2dsphere" });
Database Migrations
Migration Best Practices
1. Always Reversible
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
ALTER TABLE users DROP COLUMN phone;
2. Backward Compatible
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50);
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50) NOT NULL;
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50);
UPDATE users SET middle_name = '' WHERE middle_name IS NULL;
ALTER TABLE users MODIFY COLUMN middle_name VARCHAR(50) NOT NULL;
3. Data Migrations Separate from Schema Changes
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';
UPDATE orders SET status = 'completed' WHERE completed_at IS NOT NULL;
4. Test Migrations on Production Copy
- Test on staging with production data snapshot
- Measure migration duration
- Plan for downtime (if needed)
Zero-Downtime Migrations
Adding a Column:
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
UPDATE users SET phone = old_phone WHERE phone IS NULL;
ALTER TABLE users MODIFY COLUMN phone VARCHAR(20) NOT NULL;
Renaming a Column:
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);
UPDATE users SET email_address = email;
ALTER TABLE users DROP COLUMN email;
Performance Optimization
Query Optimization
Use EXPLAIN to analyze queries:
EXPLAIN SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
Look for:
- Type: ALL (table scan - bad), index, ref, eq_ref
- Possible keys: Indexes available
- Key: Index actually used
- Rows: Estimated rows scanned
Optimization techniques:
- Add indexes on WHERE, ORDER BY, GROUP BY columns
- Avoid SELECT * (fetch only needed columns)
- Use LIMIT for pagination
- Denormalize for read-heavy queries
N+1 Query Problem
orders = db.query("SELECT * FROM orders")
for order in orders:
customer = db.query(f"SELECT * FROM customers WHERE id = {order.customer_id}")
print(f"{customer.name} ordered {order.total}")
results = db.query("""
SELECT orders.*, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.id
""")
for result in results:
print(f"{result.name} ordered {result.total}")
Integration with Agents
Backend System Architect
- Uses this skill when designing data models
- Applies normalization and indexing strategies
- Plans for scalability and performance
Code Quality Reviewer
- Validates schema design follows best practices
- Checks for missing indexes and constraints
- Reviews migration safety
AI/ML Engineer
- Uses denormalization patterns for analytics
- Designs data pipelines and aggregation tables
Quick Start Checklist
When designing a new schema:
Skill Version: 1.0.0
Last Updated: December 2025
Maintained by: AI Agent Hub Team