mit einem Klick
document-model-advisor
MongoDB schema design advisor focusing on embed vs reference decisions, relationship modeling, and performance optimization patterns
Menü
MongoDB schema design advisor focusing on embed vs reference decisions, relationship modeling, and performance optimization patterns
Generate conference talk proposals (CFPs), abstracts, and presentation outlines with slide structure and timing guidance
Generate workshop agendas and hands-on curriculum for customer developer days, technical training sessions, and field engagements
Generate scoring rubrics and constructive feedback for hackathon submissions with fair evaluation frameworks and actionable improvement suggestions
Generate Model Context Protocol (MCP) tool servers from API descriptions, enabling AI assistants to connect to external services
Add AI capabilities to a MongoDB app including LLM summarization, structured generation, RAG pipeline with Atlas Vector Search, Voyage AI embeddings, and usage tracking with cost estimation
Self-service Atlas cluster provisioning with HTTP Digest auth, Admin API v2 client, 9-step orchestration with rollback, status polling, and DevRel attribution tracking
| name | document-model-advisor |
| description | MongoDB schema design advisor focusing on embed vs reference decisions, relationship modeling, and performance optimization patterns |
| license | MIT |
| metadata | {"version":"1.0.0","author":"Michael Lynn [mlynn.org](https://mlynn.org)","category":"database-design","domain":"data-modeling","updated":"2026-03-01T00:00:00.000Z","python-tools":"relationship_analyzer.py, schema_optimizer.py, migration_planner.py","tech-stack":"mongodb, json-schema, python"} |
Use this skill when designing MongoDB schemas, migrating from relational databases, or optimizing existing document structures.
Trigger phrases:
The most common MongoDB question: "Should I embed or reference?" This skill analyzes your data relationships and provides concrete recommendations based on access patterns, data size, update frequency, and consistency requirements.
Not for: Generating boilerplate code. This is about design decisions, not code generation.
Analyze relationships:
python scripts/relationship_analyzer.py entities.json --output analysis.json
Get schema recommendations:
python scripts/schema_optimizer.py analysis.json --output schema.json
Plan migration (if from SQL):
python scripts/migration_planner.py sql-schema.sql --output migration-plan.json
scripts/relationship_analyzer.py — Analyze entity relationships and access patternsscripts/schema_optimizer.py — Recommend embed vs reference strategiesscripts/migration_planner.py — Plan SQL → MongoDB schema migrationreferences/embed-vs-reference-guide.md — The classic MongoDB design questionreferences/schema-patterns.md — Proven MongoDB schema patternsassets/entity-definition-template.json — Define entities and relationshipsassets/schema-examples.json — Sample schemas for common domainsIs the data always accessed together?
├─ YES → Consider embedding
│ ├─ Is the embedded data unbounded?
│ │ ├─ YES → Reference instead (16MB doc limit)
│ │ └─ NO → Embed
│ └─ Is the embedded data updated frequently?
│ ├─ YES → Reference (avoid large doc rewrites)
│ └─ NO → Embed
└─ NO → Consider referencing
├─ Is the data used in multiple contexts?
│ ├─ YES → Reference (avoid duplication)
│ └─ NO → Could embed
└─ Do you need atomic updates across entities?
├─ YES → Embed (single doc transactions are atomic)
└─ NO → Reference is fine
1. Access Patterns (most important)
2. Data Size
3. Update Frequency
4. Data Lifecycle
5. Consistency Requirements
6. Duplication Tolerance
Example: User addresses (1-3 addresses per user)
{
"_id": "user123",
"name": "John Doe",
"addresses": [
{ "type": "home", "street": "123 Main St", "city": "NYC" },
{ "type": "work", "street": "456 Park Ave", "city": "NYC" }
]
}
Why embed: Few addresses, always accessed with user, same lifecycle.
Example: Blog posts and comments (potentially thousands of comments)
// Post
{ "_id": "post123", "title": "...", "author": "user123" }
// Comments (separate collection)
{ "_id": "comment1", "postId": "post123", "text": "...", "author": "user456" }
{ "_id": "comment2", "postId": "post123", "text": "...", "author": "user789" }
Why reference: Unbounded comments, separate access patterns, independent updates.
Example: Students and courses
// Student
{ "_id": "student123", "name": "Alice", "courseIds": ["course1", "course2"] }
// Course
{ "_id": "course1", "name": "MongoDB 101", "studentIds": ["student123", "student456"] }
Why reference: Many-to-many, independent lifecycles, queried from both directions.
❌ Unbounded Arrays
{
"_id": "post123",
"comments": [/* thousands of comments */] // Will hit 16MB limit
}
❌ Massive Duplication
// Duplicating full user object in every post
{
"_id": "post123",
"author": {
"id": "user123",
"name": "John",
"email": "john@example.com",
"bio": "...", // Duplicated everywhere
"preferences": {...}
}
}
❌ Deep Nesting
{
"order": {
"customer": {
"address": {
"country": {
"region": {
"subregion": {...} // Too deep
}
}
}
}
}
}
Reference with denormalized frequently-accessed fields:
{
"_id": "post123",
"authorId": "user123", // Reference
"authorName": "John Doe", // Denormalized for display
"authorAvatar": "https://..." // Denormalized for display
}
Use when: Need reference for main data, but want to avoid lookup for display.
Embed a subset, reference for full data:
{
"_id": "product123",
"reviews": [
{ "rating": 5, "text": "Great!", "author": "Alice" }, // Recent 10
{ "rating": 4, "text": "Good", "author": "Bob" }
],
"totalReviews": 1547 // Total count, full reviews in separate collection
}
Use when: Want preview data embedded, full data on demand.
Group unbounded data into buckets:
// Time-series data bucketed by hour
{
"_id": "sensor123_2024-03-01-14",
"sensorId": "sensor123",
"hour": "2024-03-01T14:00:00Z",
"readings": [
{ "time": "14:00:00", "value": 72.3 },
{ "time": "14:01:00", "value": 72.5 },
// ... up to 60 readings
]
}
Use when: Unbounded time-series or event data.
Purpose: Analyze entity relationships and recommend embed vs reference.
Input: Entity definitions
{
"entities": [
{
"name": "User",
"fields": ["id", "name", "email"],
"relationships": [
{
"to": "Post",
"type": "one-to-many",
"accessPattern": "separate",
"typical_count": "unbounded"
}
]
},
{
"name": "Post",
"fields": ["id", "title", "content"],
"relationships": [
{
"to": "User",
"type": "many-to-one",
"accessPattern": "together",
"typical_count": 1
}
]
}
]
}
Usage:
python scripts/relationship_analyzer.py entities.json --output analysis.json
Output:
{
"recommendations": [
{
"relationship": "User → Post",
"decision": "reference",
"reasoning": [
"Unbounded (typical_count: unbounded)",
"Separate access pattern",
"Independent lifecycle"
],
"confidence": 0.95
},
{
"relationship": "Post → User",
"decision": "reference_with_denormalization",
"reasoning": [
"Always accessed together",
"Small, bounded (1 user per post)",
"Recommend: Store userId + authorName for display"
],
"confidence": 0.85
}
]
}
Purpose: Generate optimized MongoDB schema from analysis.
Usage:
python scripts/schema_optimizer.py analysis.json --output schema.json
Output:
{
"collections": [
{
"name": "users",
"schema": {
"_id": "ObjectId",
"name": "string",
"email": "string"
},
"indexes": [
{ "fields": ["email"], "unique": true }
]
},
{
"name": "posts",
"schema": {
"_id": "ObjectId",
"title": "string",
"content": "string",
"authorId": "ObjectId", // Reference
"authorName": "string" // Denormalized
},
"indexes": [
{ "fields": ["authorId"] }
]
}
],
"denormalization_rules": [
{
"from": "users.name",
"to": "posts.authorName",
"sync_strategy": "on_write",
"note": "Update posts when user.name changes"
}
]
}
Purpose: Plan SQL → MongoDB schema migration.
Input: SQL schema
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100) UNIQUE
);
CREATE TABLE posts (
id INT PRIMARY KEY,
title VARCHAR(200),
content TEXT,
user_id INT REFERENCES users(id)
);
Usage:
python scripts/migration_planner.py schema.sql --output migration-plan.json
Output:
{
"tables_to_collections": {
"users": "users",
"posts": "posts"
},
"relationship_mappings": [
{
"sql_fk": "posts.user_id → users.id",
"mongodb_approach": "reference",
"field": "posts.userId",
"reasoning": "One-to-many, typical SQL pattern maps to reference"
}
],
"migration_steps": [
"1. Export users table to JSON",
"2. Import to users collection",
"3. Export posts table to JSON",
"4. Transform posts.user_id → posts.userId",
"5. Import to posts collection",
"6. Create index on posts.userId"
],
"considerations": [
"Original SQL join: SELECT posts.*, users.name FROM posts JOIN users",
"MongoDB equivalent: Aggregation with $lookup or denormalize users.name into posts"
]
}
{
"_id": "user123",
"name": "John Doe",
"email": "john@example.com",
"profile": { // Embed (always accessed together)
"bio": "...",
"avatar": "...",
"preferences": {...}
},
"addresses": [ // Embed (few, bounded)
{ "type": "home", "street": "..." }
]
}
{
"_id": "order123",
"customerId": "user123", // Reference
"customerName": "John Doe", // Denormalized
"items": [ // Embed (part of order)
{
"productId": "prod456", // Reference
"productName": "Widget", // Denormalized
"quantity": 2,
"price": 29.99
}
],
"total": 59.98,
"status": "shipped"
}
Why: Order items are embedded (part of order), but product/customer are referenced (independent entities).
// Post document
{
"_id": "post123",
"title": "MongoDB Schema Design",
"content": "...",
"authorId": "user123", // Reference
"authorName": "John Doe", // Denormalized
"tags": ["mongodb", "schema"], // Embed (small, bounded)
"commentCount": 47, // Derived field
"recentComments": [ // Embed subset
{ "author": "Alice", "text": "Great post!" }
]
}
// Comments collection (separate)
{
"_id": "comment1",
"postId": "post123",
"author": "Alice",
"text": "Great post!",
"createdAt": ISODate("...")
}
✅ Embed when:
Examples:
✅ Reference when:
Examples:
✅ Denormalize when:
Pattern: Reference + denormalized fields
{
"postId": "post123",
"authorId": "user123", // Reference
"authorName": "John", // Denormalized
"authorAvatar": "..." // Denormalized
}
1. Identify relationships:
2. Consider access patterns:
3. Denormalize strategically:
SQL Schema:
users (id, name, email)
posts (id, title, content, user_id)
comments (id, post_id, user_id, text)
tags (id, name)
post_tags (post_id, tag_id)
MongoDB Schema:
// users collection
{ "_id": ObjectId, "name": "...", "email": "..." }
// posts collection
{
"_id": ObjectId,
"title": "...",
"content": "...",
"authorId": ObjectId, // Reference to users
"authorName": "...", // Denormalized
"tags": ["mongodb", "schema"], // Embedded (from post_tags join table)
"commentCount": 47
}
// comments collection
{
"_id": ObjectId,
"postId": ObjectId,
"authorId": ObjectId,
"authorName": "...", // Denormalized
"text": "..."
}
Embedded:
{ "user.addresses.city": 1 } // Can index embedded fields
Referenced:
{ "userId": 1 } // Need index on foreign key
Before finalizing schema:
Use document-model-advisor | Use other tools |
|---|---|
| Schema design decisions | Code generation |
| Embed vs reference questions | ORM setup |
| SQL → MongoDB migration planning | Query optimization |
| Relationship modeling | Index tuning |
references/embed-vs-reference-guide.mdreferences/schema-patterns.mdMichael Lynn — mlynn.org · @mlynn · LinkedIn · GitHub
Next steps after schema design: