ワンクリックで
database-design
Design effective database schemas — normalization, indexing strategies, relationships, migrations, query optimization, and data modeling patterns for SQL and NoSQL.
メニュー
Design effective database schemas — normalization, indexing strategies, relationships, migrations, query optimization, and data modeling patterns for SQL and NoSQL.
Split any task into small, reviewable subtasks — atomic units of work that are easy to track, easy to review, and hard to get wrong.
Use the gh CLI to fetch issues, discussions, and linked issues — analyze the problem, find relevant code, and implement a fix with a PR.
Write and manage Architecture Decision Records — ADR format, status lifecycle, tradeoff analysis, templates, and integration with project workflow.
Write clear completion criteria before starting work — acceptance criteria formats, quality gates, scope boundaries, and the done checklist pattern.
Design and manage deployments — CI/CD pipelines, Docker, zero-downtime strategies, rollback plans, environment management, and release automation.
Systematically recover from failures — isolate, diagnose, fix, verify, prevent. The three-attempts rule, evidence gathering before guessing, and when to roll back vs push forward.
| name | Database Design |
| description | Design effective database schemas — normalization, indexing strategies, relationships, migrations, query optimization, and data modeling patterns for SQL and NoSQL. |
Design effective database schemas — normalization, indexing strategies, relationships, migrations, query optimization, and data modeling patterns for SQL and NoSQL.
Use this skill when designing database schemas, planning migrations, optimizing queries, or choosing between SQL and NoSQL for a use case.
| Form | Rule | Example violation |
|---|---|---|
| 1NF | Each cell contains a single atomic value | A column with comma-separated tags |
| 2NF | Every non-key column depends on the whole primary key | order_id, product_id, customer_name — customer depends only on order, not product |
| 3NF | Every non-key column depends only on the primary key | product_id, category_name, category_description — description depends on category, not product |
When to denormalize:
-- One-to-many (most common)
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id),
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INT NOT NULL
);
CREATE INDEX idx_order_items_order ON order_items(order_id);
-- Many-to-many
CREATE TABLE products_tags (
product_id BIGINT NOT NULL REFERENCES products(id),
tag_id BIGINT NOT NULL REFERENCES tags(id),
PRIMARY KEY (product_id, tag_id)
);
-- Polymorphic (use with caution)
CREATE TABLE comments (
id BIGSERIAL PRIMARY KEY,
author_id BIGINT NOT NULL,
commentable_type VARCHAR(50) NOT NULL, -- 'post' or 'video'
commentable_id BIGINT NOT NULL,
body TEXT NOT NULL
);
CREATE INDEX idx_comments_polymorphic ON comments(commentable_type, commentable_id);
| Index type | Best for | PostgreSQL syntax |
|---|---|---|
| B-tree (default) | Equality and range queries, sorting | CREATE INDEX ON users(email) |
| Hash | Equality only | CREATE INDEX ON users USING hash(email) |
| GiST | Full-text search, geometric data | CREATE INDEX ON documents USING gist(content) |
| GIN | Array/JSON containment checks | CREATE INDEX ON products USING gin(tags) |
| BRIN | Very large, naturally ordered tables | CREATE INDEX ON logs USING brin(created_at) |
| Composite | Multi-column queries | CREATE INDEX ON orders(user_id, created_at) |
-- Most selective column first (for composite indexes)
CREATE INDEX ON orders(user_id, status, created_at DESC);
-- Query: WHERE user_id = 123 AND status = 'active' ORDER BY created_at DESC
-- Covering index (includes data, avoids table lookup)
CREATE INDEX ON products(category_id) INCLUDE (name, price);
-- Partial index (only index relevant rows)
CREATE INDEX ON orders(created_at) WHERE status = 'pending';
-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX ON users(email) WHERE deleted_at IS NULL;
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
Look for:
-- Instead of:
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2024;
-- Use range query (sargable — uses index):
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- Instead of:
SELECT * FROM products WHERE CONCAT(name, ' ', description) LIKE '%search%';
-- Use full-text search:
CREATE INDEX ON products USING gin(to_tsvector('english', name || ' ' || description));
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'search');
Problem:
for each user in users:
orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)
→ N+1 queries
Fix:
users = db.query("SELECT * FROM users")
orders = db.query("SELECT * FROM orders WHERE user_id = ANY(?)", [u.id for u in users])
-- 1. Additive changes first (add columns/tables before using them)
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);
-- 2. Backfill data in a separate step
UPDATE users SET display_name = name WHERE display_name IS NULL;
-- 3. Start using new column in application code
-- 4. Remove old column (much later, after verifying no code uses it)
ALTER TABLE users DROP COLUMN name;
| Change type | Safe approach |
|---|---|
| Add column | Safe if nullable or has default |
| Add NOT NULL column | Add nullable → backfill → add NOT NULL constraint |
| Rename column | Add new column → dual-write → backfill → switch reads → drop old |
| Rename table | Create view with old name → migrate code → drop view |
| Add index | CREATE INDEX CONCURRENTLY (PostgreSQL) — doesn't lock table |
| Drop index | DROP INDEX CONCURRENTLY (PostgreSQL) |
| Add foreign key | ADD CONSTRAINT ... NOT VALID → VALIDATE CONSTRAINT |
-- Prefer these:
TIMESTAMPTZ -- Always store with timezone
UUID -- For distributed IDs, or
BIGSERIAL -- For auto-increment integers
TEXT -- Over VARCHAR(n) unless you need length checks
NUMERIC(10,2) -- For money (avoid FLOAT)
JSONB -- For unstructured data (over JSON)
BOOLEAN -- For flags
ENUM -- For fixed sets of values (or reference table)
| Use case | Document store (MongoDB) | Key-value (Redis) | Search (Elastic) |
|---|---|---|---|
| Flexible schema | ✅ Yes | ❌ | ❌ |
| Complex queries | ⚠️ Limited | ❌ | ✅ Full-text |
| Joins | ❌ Embed or app-level | ❌ | ❌ Denormalize |
| Transactions | ⚠️ Limited (≥4.0) | ❌ | ❌ |
| High-volume writes | ✅ | ✅ | ⚠️ |
| Caching | ❌ | ✅ Primary use | ❌ |
-- Tables: plural snake_case
users, order_items, product_categories
-- Columns: singular snake_case
id, user_id, created_at
-- Primary key: always `id BIGSERIAL PRIMARY KEY`
-- Foreign key: `referenced_table_singular_id`
-- Junction tables: `table1_table2` in alphabetical order
-- Indexes: `idx_table_column` or `idx_table_col1_col2`
-- Unique constraints: `uq_table_column`
uuid_generate_v7() in PostgreSQL.WHERE user_id = '123' may not use an index if the column is integer. Match types.UNION ALL can be faster.pg_stat_user_indexes.COUNT(*) on large tables: It's slow on PostgreSQL (MVCC). Use approximate counts or a separate counter table.