| name | SQL Ecosystem |
| description | This skill should be used when working with SQL databases, "SELECT", "INSERT", "UPDATE", "DELETE", "CREATE TABLE", "JOIN", "INDEX", "EXPLAIN", transactions, or database migrations. Provides comprehensive SQL patterns across PostgreSQL, MySQL, and SQLite. |
| version | 2.0.0 |
Provide comprehensive patterns for SQL database operations, schema design, query optimization, transaction management, and migrations, grounded in the ANSI SQL standard with database-specific notes.
<sql_fundamentals>
<data_types>
<concept name="ansi_types">
<description>ANSI SQL standard data types supported across major databases</description>
<example>
-- Numeric types
INTEGER, SMALLINT, BIGINT
DECIMAL(precision, scale), NUMERIC(precision, scale)
REAL, DOUBLE PRECISION
-- String types
CHAR(n), VARCHAR(n), TEXT
-- Date/Time types
DATE, TIME, TIMESTAMP, INTERVAL
-- Boolean
BOOLEAN
</example>
</concept>
<concept name="database_specific_types">
<description>Useful types specific to each database</description>
<example>
-- PostgreSQL specific
UUID, JSONB, ARRAY, INET, CIDR, MACADDR
SERIAL, BIGSERIAL (auto-increment)
TSTZRANGE, DATERANGE (range types)
-- MySQL specific
TINYINT, MEDIUMINT
ENUM('value1', 'value2'), SET('a', 'b', 'c')
JSON (validated and stored in an optimized binary format)
-- SQLite specific
-- Uses type affinity: TEXT, INTEGER, REAL, BLOB, NULL
-- Any type name accepted but mapped to affinity
</example>
<note>Prefer ANSI types for portability; use DB-specific types when features are needed</note>
</concept>
</data_types>
</sql_fundamentals>
<ddl_patterns>
<pattern name="create_table">
<description>Table creation with constraints</description>
<example>
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
name VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
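-- (~* below is PostgreSQL's case-insensitive regex match operator)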
CONSTRAINT email_format CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL,
total DECIMAL(10, 2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
</example>
</pattern>
<pattern name="alter_table">
<description>Safe table modification patterns</description>
<example>
-- Add column (safe)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Add column with default (PostgreSQL 11+ is instant)
ALTER TABLE users ADD COLUMN active BOOLEAN DEFAULT true;
-- Rename column
ALTER TABLE users RENAME COLUMN name TO full_name;
-- Add constraint
ALTER TABLE users ADD CONSTRAINT unique_phone UNIQUE (phone);
-- Drop constraint
ALTER TABLE users DROP CONSTRAINT unique_phone;
</example>
</pattern>
<pattern name="create_index">
<description>Index creation patterns</description>
<example>
-- B-tree index (default, most common)
CREATE INDEX idx_users_email ON users(email);
-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
-- Composite index (order matters for query optimization)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
-- Expression index (PostgreSQL)
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
</example>
<decision_tree name="index_selection">
<question>What type of queries will use this index?</question>
<if_yes condition="Equality lookups">B-tree (default)</if_yes>
<if_yes condition="Range queries">B-tree</if_yes>
<if_yes condition="Full-text search">GIN with tsvector (PostgreSQL)</if_yes>
<if_yes condition="JSON containment">GIN (PostgreSQL)</if_yes>
<if_yes condition="Geospatial">GiST (PostgreSQL)</if_yes>
</decision_tree>
</pattern>
</ddl_patterns>
<dml_patterns>
<pattern name="select">
<description>Query patterns for data retrieval</description>
<example>
-- Basic select with filtering
SELECT id, email, name
FROM users
WHERE active = true
ORDER BY created_at DESC
LIMIT 10 OFFSET 0;
-- Aggregate with grouping
SELECT status, COUNT(*) as count, SUM(total) as revenue
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY status
HAVING COUNT(*) > 10
ORDER BY revenue DESC;
</example>
</pattern>
<pattern name="insert">
<description>Data insertion patterns</description>
<example>
-- Single insert
INSERT INTO users (email, name) VALUES ('user@example.com', 'John Doe');
-- Multi-row insert
INSERT INTO users (email, name) VALUES
('user1@example.com', 'User One'),
('user2@example.com', 'User Two'),
('user3@example.com', 'User Three');
-- Insert with returning (PostgreSQL)
INSERT INTO users (email, name)
VALUES ('new@example.com', 'New User')
RETURNING id, created_at;
-- Upsert (PostgreSQL)
INSERT INTO users (email, name)
VALUES ('user@example.com', 'Updated Name')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
-- Upsert (MySQL)
INSERT INTO users (email, name)
VALUES ('user@example.com', 'Updated Name')
ON DUPLICATE KEY UPDATE name = VALUES(name);
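-- MySQL 8.0.19+ row-alias form (VALUES() in this position is deprecated since 8.0.20):
-- INSERT ... VALUES (...) AS new ON DUPLICATE KEY UPDATE name = new.name;
-- Upsert (SQLite 3.24+, PostgreSQL-style syntax)
INSERT INTO users (email, name)
VALUES ('user@example.com', 'Updated Name')
ON CONFLICT (email) DO UPDATE SET name = excluded.name;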
</example>
</pattern>
<pattern name="update">
<description>Data modification patterns</description>
<example>
-- Basic update
UPDATE users SET name = 'New Name' WHERE id = 1;
-- Update with subquery
UPDATE orders
SET status = 'cancelled'
WHERE user_id IN (SELECT id FROM users WHERE active = false);
-- Update with join (PostgreSQL)
UPDATE orders o
SET status = 'vip'
FROM users u
WHERE o.user_id = u.id AND u.vip = true;
-- Update with returning (PostgreSQL)
UPDATE users SET active = false WHERE id = 1 RETURNING *;
</example>
</pattern>
<pattern name="delete">
<description>Data removal patterns</description>
<example>
-- Basic delete
DELETE FROM users WHERE id = 1;
-- Delete with subquery
DELETE FROM orders
WHERE user_id IN (SELECT id FROM users WHERE active = false);
-- Truncate (faster for all rows, resets sequences)
TRUNCATE TABLE logs;
TRUNCATE TABLE logs RESTART IDENTITY; -- PostgreSQL
-- Soft delete pattern (prefer this)
UPDATE users SET deleted_at = NOW() WHERE id = 1;
</example>
<note>Prefer soft deletes for audit trails; use hard deletes only for GDPR/compliance</note>
</pattern>
<pattern name="parameterized_queries">
<description>Safe query construction preventing SQL injection - ALWAYS use for user input</description>
<example>
-- PostgreSQL with psycopg2/psycopg3 (Python)
cursor.execute(
"SELECT * FROM users WHERE email = %s AND status = %s",
(user_email, status)
)
-- PostgreSQL with pg (Node.js)
client.query(
'SELECT * FROM users WHERE email = $1 AND status = $2',
[userEmail, status]
)
-- MySQL with mysql-connector (Python)
cursor.execute(
"SELECT * FROM users WHERE email = %s AND status = %s",
(user_email, status)
)
-- MySQL with mysql2 (Node.js)
connection.execute(
'SELECT * FROM users WHERE email = ? AND status = ?',
[userEmail, status]
)
-- SQLite with sqlite3 (Python)
cursor.execute(
"SELECT * FROM users WHERE email = ? AND status = ?",
(user_email, status)
)
-- Go with database/sql
db.Query(
"SELECT _ FROM users WHERE email = $1 AND status = $2",
userEmail, status
)
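-- Note: database/sql placeholder style is driver-specific ($1 with pq/pgx, ? with MySQL/SQLite drivers)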
</example>
<warning>NEVER use string concatenation or template literals with user input - this enables SQL injection attacks</warning>
<example>
-- DANGEROUS: SQL injection vulnerability
query = "SELECT _ FROM users WHERE email = '" + user_input + "'"
query = f"SELECT * FROM users WHERE email = '{user_input}'"
-- If user_input = "'; DROP TABLE users; --"
-- Executes: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'
</example>
</pattern>
<pattern name="safe_like_patterns">
<description>Prevent pattern injection in LIKE queries with user input</description>
<example>
-- VULNERABLE: User can inject wildcards
-- If user_input = "%", this returns ALL records
SELECT * FROM products WHERE name LIKE '%' || user_input || '%';
-- SAFE: Escape wildcards before using in LIKE
-- Python: escaped = user_input.replace('%', '\\%').replace('_', '\\_')
-- Then use parameterized query:
cursor.execute(
"SELECT * FROM products WHERE name LIKE %s",
('%' + escaped_input + '%',)
)
-- PostgreSQL: Use ESCAPE clause explicitly
SELECT * FROM products
WHERE name LIKE '%' || $1 || '%' ESCAPE '\';
-- Alternative: Use position() or strpos() for exact matching
SELECT * FROM products WHERE position($1 in name) > 0;
</example>
<warning>Wildcards % and _ in user input can bypass intended restrictions</warning>
</pattern>
<pattern name="dynamic_sql_safely">
<description>Safe dynamic SQL construction with whitelisting for identifiers</description>
<example>
-- DANGEROUS: Identifier injection
query = f"SELECT {column_name} FROM {table_name}"
-- SAFE: Whitelist allowed values (Python example)
ALLOWED_COLUMNS = {'id', 'name', 'email', 'created_at'}
ALLOWED_TABLES = {'users', 'products', 'orders'}
if column_name not in ALLOWED_COLUMNS:
raise ValueError(f"Invalid column: {column_name}")
if table_name not in ALLOWED_TABLES:
raise ValueError(f"Invalid table: {table_name}")
-- PostgreSQL (PL/pgSQL): build dynamic SQL with format(), using %I to quote identifiers
EXECUTE format('SELECT %I FROM %I', column_name, table_name);
-- Identifier quoting as additional protection after whitelisting
query = f'SELECT "{column_name}" FROM "{table_name}"'
</example>
<warning>Never use user input directly for table/column names; always validate against whitelist</warning>
</pattern>
</dml_patterns>
<concept name="constraints">
<description>Database constraint patterns for data integrity</description>
<example>
-- Primary Key
PRIMARY KEY (id)
PRIMARY KEY (user_id, product_id) -- composite
-- Foreign Key
FOREIGN KEY (user_id) REFERENCES users(id)
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
-- Unique
UNIQUE (email)
UNIQUE (user_id, product_id) -- composite unique
-- Check
CHECK (price > 0)
CHECK (status IN ('pending', 'active', 'completed'))
-- Not Null
NOT NULL
-- Default
DEFAULT CURRENT_TIMESTAMP
DEFAULT 'pending'
</example>
</concept>
<database_versions>
<supported_versions>18, 17, 16, 15, 14 (each major version receives five years of support; PostgreSQL has no LTS designation)</supported_versions>
<eol_versions>13 and earlier (13 reached EOL November 2025)</eol_versions>
<postgresql_18>
<description>PostgreSQL 18 (latest major version, released 2025)</description>
<latest_minor>Use the latest PostgreSQL 18.x minor available in the active package set</latest_minor>
<feature name="async_io">
<description>Asynchronous I/O (AIO) enables concurrent I/O tasks such as readahead and sequential scans, significantly improving performance for I/O-bound workloads.</description>
</feature>
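<feature name="aio_configuration">
<description>A minimal configuration sketch, assuming PG 18's io_method and io_workers settings</description>
<example>
SHOW io_method;   -- 'worker' (default), 'io_uring', or 'sync'
SHOW io_workers;  -- size of the I/O worker pool when io_method = 'worker'
</example>
</feature>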
</postgresql_18>
<postgresql_17>
<description>PostgreSQL 17</description>
<feature name="sql_json_functions">
<description>SQL/JSON standard functions: JSON_TABLE, JSON_QUERY, JSON_VALUE, JSON_EXISTS for standards-compliant JSON processing</description>
<example>
-- JSON_TABLE: convert JSON to relational rows (PG 17+)
SELECT jt.*
FROM orders AS o
, JSON_TABLE(o.metadata, '$.items[*]'
COLUMNS (
product_name TEXT PATH '$.name'
, quantity INTEGER PATH '$.qty'
, price NUMERIC PATH '$.price'
)
) AS jt;
-- JSON_QUERY: extract JSON sub-object
SELECT JSON_QUERY(metadata, '$.shipping') AS shipping_info
FROM orders;
</example>
</feature>
</postgresql_17>
<postgresql_15>
<feature name="merge_statement">
<description>MERGE statement (SQL standard UPSERT with full MATCHED/NOT MATCHED logic, since PG 15)</description>
<example>
MERGE INTO inventory AS target
USING new_shipment AS source
ON target.product_id = source.product_id
WHEN MATCHED THEN
UPDATE SET quantity = target.quantity + source.quantity
, updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
INSERT (product_id, quantity, updated_at)
VALUES (source.product_id, source.quantity, CURRENT_TIMESTAMP);
</example>
</feature>
</postgresql_15>
<performance_features>
<feature name="incremental_sort">
<description>Incremental sort (PG 13+, improved through PG 17/18): exploits existing partial sort order from indexes, reducing sort overhead for ORDER BY with multiple columns</description>
<example>
-- With index on (user_id), query ORDER BY user_id, created_at
-- PG uses incremental sort: sorts only created_at within each user_id group
EXPLAIN ANALYZE
SELECT * FROM orders ORDER BY user_id, created_at;
-- Look for "Incremental Sort" in plan output
</example>
</feature>
<feature name="logical_replication">
<description>Logical replication improvements (PG 15-18): failover slots (PG 17), parallel apply (PG 16), row filters and column lists (PG 15)</description>
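<example>
-- A sketch of a row filter with a column list (PG 15+ publication syntax):
CREATE PUBLICATION active_users_pub
FOR TABLE users (id, email) WHERE (active = true);
-- Failover-capable subscription (PG 17+):
-- CREATE SUBSCRIPTION ... WITH (failover = true);
</example>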
</feature>
</performance_features>
<monitoring>
<feature name="pg_stat_io">
<description>pg_stat_io view for I/O monitoring (since PG 16): tracks reads, writes, extends, hits, and evictions per backend type and I/O context</description>
<example>
SELECT backend_type, context, reads, writes, hits
FROM pg_stat_io
WHERE reads > 0
ORDER BY reads DESC;
</example>
</feature>
</monitoring>
<mysql>
<description>MySQL 8.4 LTS (long-term support) and MySQL 9.x Innovation releases</description>
</mysql>
<sqlite>
<description>SQLite 3.48+ with type affinity system; single-file database</description>
</sqlite>
</database_versions>
<query_patterns>
<pattern name="inner_join">
<description>Return only matching rows from both tables</description>
<example>
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed';
</example>
<use_case>When you need data from both tables and only care about matches</use_case>
</pattern>
<pattern name="left_join">
<description>Return all rows from left table, matching rows from right</description>
<example>
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
</example>
<use_case>When you need all rows from primary table even without matches</use_case>
</pattern>
<pattern name="right_join">
<description>Return all rows from right table, matching rows from left</description>
<example>
SELECT u.name, o.total
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;
</example>
<note>Often rewritten as LEFT JOIN by swapping table order for clarity</note>
</pattern>
<pattern name="full_outer_join">
<description>Return all rows from both tables</description>
<example>
SELECT u.name, o.total
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;
</example>
<note>Not supported in MySQL; use UNION of LEFT and RIGHT JOINs</note>
</pattern>
<pattern name="cross_join">
<description>Cartesian product of two tables</description>
<example>
SELECT u.name, p.name as product
FROM users u
CROSS JOIN products p;
</example>
<warning>Produces M*N rows; use carefully with large tables</warning>
</pattern>
<pattern name="self_join">
<description>Join table with itself</description>
<example>
-- Find employees and their managers
SELECT e.name as employee, m.name as manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
</example>
</pattern>
<pattern name="lateral_join">
<description>LATERAL join: subquery can reference columns from preceding FROM items (PostgreSQL, MySQL 8.0.14+)</description>
<example>
-- Top 3 orders per user (more efficient than window function approach)
SELECT u.name, top_orders.total, top_orders.created_at
FROM users AS u
LEFT JOIN LATERAL (
SELECT o.total, o.created_at
FROM orders AS o
WHERE o.user_id = u.id
ORDER BY o.total DESC
LIMIT 3
) AS top_orders ON true;
-- Use with set-returning functions
SELECT u.name, tag.value
FROM users AS u
, LATERAL unnest(u.tags) AS tag(value);
</example>
<use_case>Top-N per group, dependent subqueries in FROM, set-returning functions</use_case>
</pattern>
<pattern name="scalar_subquery">
<description>Subquery returning a single value</description>
<example>
SELECT name,
(SELECT AVG(total) FROM orders) as avg_order_total
FROM users;
</example>
</pattern>
<pattern name="in_subquery">
<description>Filter using subquery results</description>
<example>
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);
</example>
</pattern>
<pattern name="exists_subquery">
<description>Check for existence of related records</description>
<example>
-- More efficient than IN for large datasets
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.total > 1000
);
</example>
<note>EXISTS stops at first match; more efficient than IN for existence checks</note>
</pattern>
<pattern name="correlated_subquery">
<description>Subquery referencing outer query</description>
<example>
SELECT u.name,
(SELECT MAX(o.total) FROM orders o WHERE o.user_id = u.id) as max_order
FROM users u;
</example>
<warning>Executes once per outer row; consider JOIN for performance</warning>
</pattern>
<pattern name="derived_table">
<description>Subquery in FROM clause</description>
<example>
SELECT user_stats.name, user_stats.total_spent
FROM (
SELECT u.name, SUM(o.total) as total_spent
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
) AS user_stats
WHERE user_stats.total_spent > 10000;
</example>
</pattern>
<pattern name="basic_cte">
<description>Common Table Expression for readable queries</description>
<example>
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE active = true
)
SELECT au.name, COUNT(o.id) as order_count
FROM active_users au
LEFT JOIN orders o ON au.id = o.user_id
GROUP BY au.id, au.name;
</example>
<note>CTEs improve readability, but some databases materialize them, which has performance implications. Use MATERIALIZED / NOT MATERIALIZED hints (PG 12+) to control this behavior.</note>
</pattern>
<pattern name="cte_materialization_hints">
<description>Control CTE materialization in PostgreSQL 12+ for performance tuning</description>
<example>
-- Force materialization (useful when CTE is referenced multiple times)
WITH expensive_calc AS MATERIALIZED (
SELECT user_id, SUM(total) AS lifetime_value
FROM orders
GROUP BY user_id
)
SELECT * FROM expensive_calc WHERE lifetime_value > 1000;
-- Prevent materialization (allow optimizer to inline the CTE)
WITH simple_filter AS NOT MATERIALIZED (
SELECT id, name FROM users WHERE active = true
)
SELECT sf.name, COUNT(o.id)
FROM simple_filter AS sf
LEFT JOIN orders AS o ON sf.id = o.user_id
GROUP BY sf.id, sf.name;
</example>
<note>Default: PG inlines CTEs referenced once, materializes if referenced multiple times. Use hints to override.</note>
</pattern>
<pattern name="multiple_ctes">
<description>Chain multiple CTEs</description>
<example>
WITH
active_users AS (
SELECT id, name FROM users WHERE active = true
),
user_orders AS (
SELECT user_id, SUM(total) as total_spent
FROM orders
GROUP BY user_id
)
SELECT au.name, COALESCE(uo.total_spent, 0) as total_spent
FROM active_users au
LEFT JOIN user_orders uo ON au.id = uo.user_id
ORDER BY total_spent DESC;
</example>
</pattern>
<pattern name="recursive_cte">
<description>Recursive query for hierarchical data</description>
<example>
-- Traverse org hierarchy
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: subordinates
SELECT e.id, e.name, e.manager_id, ot.level + 1
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY level, name;
</example>
<use_case>Tree structures, bill of materials, path finding</use_case>
</pattern>
<window_functions>
<pattern name="row_number">
<description>Assign unique sequential numbers to rows</description>
<example>
SELECT
name,
total,
ROW_NUMBER() OVER (ORDER BY total DESC) as rank
FROM orders;
-- Partition by user
SELECT
user_id,
total,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as order_num
FROM orders;
</example>
<use_case>Pagination, deduplication, ranking</use_case>
</pattern>
<pattern name="rank_dense_rank">
<description>Ranking with tie handling</description>
<example>
SELECT
name,
score,
RANK() OVER (ORDER BY score DESC) as rank, -- gaps after ties
DENSE_RANK() OVER (ORDER BY score DESC) as dense -- no gaps
FROM players;
-- Score 100: RANK=1, DENSE_RANK=1
-- Score 100: RANK=1, DENSE_RANK=1
-- Score 90: RANK=3, DENSE_RANK=2
</example>
</pattern>
<pattern name="lag_lead">
<description>Access adjacent rows</description>
<example>
SELECT
date,
revenue,
LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
LEAD(revenue, 1) OVER (ORDER BY date) as next_day_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY date) as daily_change
FROM daily_sales;
</example>
<use_case>Time series analysis, trend detection</use_case>
</pattern>
<pattern name="running_aggregates">
<description>Cumulative calculations with window framing (ROWS, RANGE, GROUPS)</description>
<example>
SELECT
date,
revenue,
SUM(revenue) OVER (ORDER BY date) as cumulative_revenue,
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7d,
-- RANGE: based on value distance (e.g., all rows within 7 days)
SUM(revenue) OVER (ORDER BY date RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW) as rolling_7d_sum,
-- GROUPS: based on peer groups (rows with same ORDER BY value)
AVG(revenue) OVER (ORDER BY date GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as avg_adjacent_groups
FROM daily_sales;
</example>
<note>ROWS = physical rows, RANGE = logical value range, GROUPS = peer groups (PG 11+). Choose based on whether duplicates in ORDER BY column should be treated as one unit.</note>
</pattern>
<pattern name="first_last_value">
<description>Get first/last values in window</description>
<example>
SELECT
user_id,
order_date,
total,
FIRST_VALUE(total) OVER (PARTITION BY user_id ORDER BY order_date) as first_order,
LAST_VALUE(total) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as last_order
FROM orders;
</example>
<note>LAST_VALUE requires explicit frame; default frame ends at current row</note>
</pattern>
<pattern name="ntile">
<description>Divide rows into buckets</description>
<example>
SELECT
name,
score,
NTILE(4) OVER (ORDER BY score DESC) as quartile
FROM students;
</example>
<use_case>Percentile analysis, distribution grouping</use_case>
</pattern>
</window_functions>
<pattern name="group_by">
<description>Aggregate data by groups</description>
<example>
SELECT
status,
COUNT(*) as count,
SUM(total) as sum,
AVG(total) as avg,
MIN(total) as min,
MAX(total) as max
FROM orders
GROUP BY status;
</example>
</pattern>
<pattern name="having">
<description>Filter aggregated results</description>
<example>
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) >= 5;
</example>
<note>HAVING filters after aggregation; WHERE filters before</note>
</pattern>
<pattern name="grouping_sets">
<description>Multiple grouping levels in a single query (PostgreSQL; MySQL 8.x supports only WITH ROLLUP)</description>
<example>
SELECT
COALESCE(category, 'All Categories') as category,
COALESCE(region, 'All Regions') as region,
SUM(sales) as total_sales
FROM sales_data
GROUP BY GROUPING SETS (
(category, region),
(category),
(region),
()
);
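-- Note: COALESCE cannot distinguish real NULL values from rollup rows;
-- use GROUPING(category) = 1 to identify super-aggregate rows reliably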
</example>
</pattern>
<pattern name="rollup">
<description>Hierarchical aggregation</description>
<example>
SELECT
year,
quarter,
SUM(revenue) as revenue
FROM sales
GROUP BY ROLLUP (year, quarter);
-- Produces: (year, quarter), (year), ()
</example>
</pattern>
<pattern name="cube">
<description>All possible grouping combinations</description>
<example>
SELECT
category,
region,
SUM(sales) as total_sales
FROM sales_data
GROUP BY CUBE (category, region);
-- Produces: (category, region), (category), (region), ()
</example>
<note>CUBE produces 2^N grouping sets for N columns; use GROUPING SETS for selective combinations</note>
</pattern>
</query_patterns>
<schema_design>
<concept name="1nf">
<description>First Normal Form: atomic values, no repeating groups</description>
<example>
-- Violation: comma-separated values
CREATE TABLE bad_orders (
id INTEGER,
products TEXT -- 'apple,banana,orange'
);
-- 1NF compliant: separate rows
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER,
PRIMARY KEY (order_id, product_id)
);
</example>
</concept>
<concept name="2nf">
<description>Second Normal Form: No partial dependencies on composite key</description>
<example>
-- Violation: product_name depends only on product_id
CREATE TABLE bad_order_items (
order_id INTEGER,
product_id INTEGER,
product_name TEXT, -- partial dependency
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);
-- 2NF compliant: separate product table
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER REFERENCES products(id),
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);
</example>
</concept>
<concept name="3nf">
<description>Third Normal Form: No transitive dependencies</description>
<example>
-- Violation: city depends on zip_code, not directly on user
CREATE TABLE bad_users (
id INTEGER PRIMARY KEY,
name TEXT,
zip_code TEXT,
city TEXT -- transitive: user -> zip_code -> city
);
-- 3NF compliant: separate locations
CREATE TABLE locations (
zip_code TEXT PRIMARY KEY,
city TEXT
);
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name TEXT,
zip_code TEXT REFERENCES locations(zip_code)
);
</example>
</concept>
<decision_tree name="normalization_level">
<question>What are the priority requirements?</question>
<if_yes condition="Data integrity and minimal redundancy">Normalize to 3NF</if_yes>
<if_yes condition="Read performance critical">Consider denormalization for hot paths</if_yes>
<if_yes condition="Write-heavy with simple reads">Normalize fully</if_yes>
<if_yes condition="Reporting/analytics">Consider star schema denormalization</if_yes>
</decision_tree>
<pattern name="surrogate_keys">
<description>Use auto-generated IDs as primary keys</description>
<example>
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- PostgreSQL
-- id INT AUTO_INCREMENT PRIMARY KEY, -- MySQL
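-- id INT GENERATED ALWAYS AS IDENTITY, -- SQL-standard form (PostgreSQL 10+)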
email VARCHAR(255) UNIQUE NOT NULL
);
</example>
<note>Prefer surrogate keys for stability; natural keys can change</note>
</pattern>
<pattern name="soft_delete">
<description>Mark records as deleted instead of removing</description>
<example>
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
deleted_at TIMESTAMP NULL
);
-- Partial unique index: uniqueness enforced only for active rows (PostgreSQL);
-- an inline UNIQUE ... WHERE table constraint is not valid syntax
CREATE UNIQUE INDEX unique_active_email ON users(email) WHERE deleted_at IS NULL;
-- Query active records
SELECT * FROM users WHERE deleted_at IS NULL;
</example>
<use_case>Audit trails, data recovery, compliance</use_case>
</pattern>
<pattern name="audit_columns">
<description>Track record creation and modification</description>
<example>
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
-- business columns...
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by INTEGER REFERENCES users(id),
updated_by INTEGER REFERENCES users(id)
);
-- Auto-update trigger (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER orders_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION update_updated_at();
</example>
</pattern>
<pattern name="polymorphic_association">
<description>Single table references multiple entity types</description>
<example>
-- Comments can belong to posts or videos
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
content TEXT,
commentable_type VARCHAR(50) NOT NULL, -- 'post' or 'video'
commentable_id INTEGER NOT NULL
);
-- Inline INDEX syntax is MySQL-only; create the index separately for portability
CREATE INDEX idx_commentable ON comments(commentable_type, commentable_id);
</example>
<warning>Cannot enforce FK constraint; validate at application level</warning>
</pattern>
<pattern name="enum_table">
<description>Reference table for enumerated values</description>
<example>
CREATE TABLE order_statuses (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL,
description TEXT
);
INSERT INTO order_statuses (name) VALUES
('pending'), ('processing'), ('shipped'), ('delivered'), ('cancelled');
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status_id INTEGER REFERENCES order_statuses(id)
);
</example>
<note>Prefer over ENUM for flexibility; easier to add/modify values</note>
</pattern>
<pattern name="generated_columns">
<description>Computed columns stored or virtual (PG 12+, MySQL 5.7+, SQLite 3.31+)</description>
<example>
-- PostgreSQL: STORED generated columns (computed on write)
CREATE TABLE products (
id SERIAL PRIMARY KEY,
price NUMERIC(10, 2) NOT NULL,
tax_rate NUMERIC(4, 3) NOT NULL,
total_price NUMERIC(10, 2) GENERATED ALWAYS AS (price * (1 + tax_rate)) STORED
);
-- MySQL: supports both STORED and VIRTUAL
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
full_name VARCHAR(101) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) VIRTUAL
);
</example>
<note>Before v18, PostgreSQL supported only STORED; PG 18 adds VIRTUAL generated columns. MySQL/SQLite support both STORED and VIRTUAL. VIRTUAL is computed on read, STORED is persisted and indexable.</note>
</pattern>
<pattern name="junction_table">
<description>Many-to-many relationship</description>
<example>
CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE roles (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE user_roles (
user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
role_id INTEGER REFERENCES roles(id) ON DELETE CASCADE,
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (user_id, role_id)
);
</example>
</pattern>
</schema_design>
<query_optimization>
<explain_analysis>
<concept name="explain_basics">
<description>Understand query execution plans</description>
<example>
-- PostgreSQL
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;
-- MySQL
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com'; -- MySQL 8.0.18+
-- SQLite
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'test@example.com';
</example>
</concept>
<concept name="key_metrics">
<description>Important EXPLAIN output indicators</description>
<example>
-- PostgreSQL EXPLAIN output interpretation
Seq Scan -- Full table scan (often bad for large tables)
Index Scan -- Using index (good)
Index Only Scan -- Covering index, no heap access (best)
Bitmap Scan -- Multiple index conditions combined
Nested Loop -- Join method for small datasets
Hash Join -- Join method for larger datasets
Merge Join -- Join method for sorted data
-- Key metrics to watch
cost=startup..total -- Estimated cost units
rows=N -- Estimated row count
actual time=X..Y -- Real execution time (with ANALYZE)
loops=N -- Number of iterations
</example>
</concept>
</explain_analysis>
<index_strategies>
<pattern name="covering_index">
<description>Index contains all columns needed by the query</description>
<example>
-- Query only needs email and name
SELECT email, name FROM users WHERE email LIKE 'a%';
-- Covering index avoids table lookup
CREATE INDEX idx_users_email_name ON users(email, name);
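-- PostgreSQL 11+ alternative: INCLUDE stores non-key columns in the index leaf pages
CREATE INDEX idx_users_email_incl ON users(email) INCLUDE (name);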
</example>
</pattern>
<pattern name="composite_index_order">
<description>Order columns by selectivity and query patterns</description>
<example>
-- Query: WHERE status = ? AND user_id = ?
-- If status has few values (low cardinality), put user_id first
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Leftmost prefix rule: this index supports:
-- WHERE user_id = ?
-- WHERE user_id = ? AND status = ?
-- But NOT: WHERE status = ?
</example>
</pattern>
<pattern name="partial_index">
<description>Index subset of rows (PostgreSQL)</description>
<example>
-- Only index active users
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
-- Only index recent orders
CREATE INDEX idx_recent_orders ON orders(created_at)
WHERE created_at > '2024-01-01';
</example>
<use_case>When queries always filter by same condition</use_case>
</pattern>
</index_strategies>
<common_optimizations>
<pattern name="select_needed_columns">
<description>Select only needed columns</description>
<example>
-- Bad: fetches all columns
SELECT * FROM users WHERE id = 1;
-- Good: only needed columns
SELECT id, name, email FROM users WHERE id = 1;
</example>
</pattern>
<pattern name="use_exists_over_count">
<description>EXISTS is more efficient for existence checks</description>
<example>
-- Bad: counts all matching rows
SELECT CASE WHEN COUNT(*) > 0 THEN true ELSE false END
FROM orders WHERE user_id = 1;
-- Good: stops at first match
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);
</example>
</pattern>
<pattern name="batch_operations">
<description>Batch inserts and updates for better performance</description>
<example>
-- Bad: individual inserts
INSERT INTO logs (message) VALUES ('log1');
INSERT INTO logs (message) VALUES ('log2');
INSERT INTO logs (message) VALUES ('log3');
-- Good: batch insert
INSERT INTO logs (message) VALUES
('log1'), ('log2'), ('log3');
-- Good: batch update with CASE
UPDATE products
SET price = CASE id
WHEN 1 THEN 10.00
WHEN 2 THEN 20.00
WHEN 3 THEN 30.00
END
WHERE id IN (1, 2, 3);
</example>
</pattern>
<pattern name="pagination">
<description>Efficient pagination patterns</description>
<example>
-- Offset pagination (simple but slow for large offsets)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 1000;
-- Keyset pagination (efficient for large datasets)
SELECT * FROM orders
WHERE id > 1000 -- last seen ID
ORDER BY id
LIMIT 20;
-- Cursor-based with composite key
SELECT * FROM orders
WHERE (created_at, id) > ('2024-01-01', 1000)
ORDER BY created_at, id
LIMIT 20;
</example>
<note>Keyset pagination is O(1); offset pagination is O(n)</note>
</pattern>
<pattern name="avoid_or_on_different_columns">
<description>OR conditions on different columns prevent index usage</description>
<example>
-- Bad: can't use single index efficiently
SELECT * FROM users WHERE email = 'a@b.com' OR name = 'John';
-- Good: UNION allows index usage on each condition
SELECT * FROM users WHERE email = 'a@b.com'
UNION
SELECT * FROM users WHERE name = 'John';
</example>
</pattern>
</common_optimizations>
</query_optimization>
<transactions>
<acid_properties>
<concept name="atomicity">
<description>All operations succeed or all fail</description>
<example>
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- Both succeed or neither
</example>
</concept>
<concept name="consistency">
<description>Database remains in valid state after transaction</description>
<example>
-- Constraints ensure consistency
ALTER TABLE accounts ADD CONSTRAINT positive_balance CHECK (balance >= 0);
-- Transaction fails if constraint violated
BEGIN;
UPDATE accounts SET balance = balance - 1000 WHERE id = 1; -- Fails if balance < 1000
COMMIT;
</example>
</concept>
<concept name="isolation">
<description>Concurrent transactions don't interfere</description>
<example>
-- Set isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
-- Protected from concurrent modifications
SELECT balance FROM accounts WHERE id = 1;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
</example>
</concept>
<concept name="durability">
<description>Committed changes persist even after crashes</description>
<note>Handled by database engine through WAL (Write-Ahead Logging)</note>
</concept>
</acid_properties>
<isolation_levels>
<concept name="read_uncommitted">
<description>Lowest isolation; can read uncommitted changes</description>
<problems>Dirty reads, non-repeatable reads, phantom reads</problems>
<use_case>Rarely used; only for approximate counts/analytics</use_case>
<note>PostgreSQL accepts READ UNCOMMITTED but treats it as READ COMMITTED</note>
</concept>
<concept name="read_committed">
<description>Default in PostgreSQL; only reads committed data</description>
<problems>Non-repeatable reads, phantom reads</problems>
<use_case>Most OLTP applications</use_case>
<example>
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
</example>
</concept>
<concept name="repeatable_read">
<description>Default in MySQL; consistent reads within transaction</description>
<problems>Phantom reads (in standard SQL; PostgreSQL prevents these)</problems>
<use_case>Financial transactions, reporting</use_case>
<example>
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
</example>
</concept>
<concept name="serializable">
<description>Highest isolation; transactions appear sequential</description>
<problems>Lower concurrency, potential deadlocks</problems>
<use_case>Critical financial operations, inventory management</use_case>
<example>
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
</example>
</concept>
<decision_tree name="isolation_selection">
<question>What is the consistency requirement?</question>
<if_yes condition="Approximate data acceptable">READ UNCOMMITTED</if_yes>
<if_yes condition="Standard OLTP">READ COMMITTED (default)</if_yes>
<if_yes condition="Report consistency needed">REPEATABLE READ</if_yes>
<if_yes condition="Critical financial/inventory">SERIALIZABLE</if_yes>
</decision_tree>
</isolation_levels>
<locking_patterns>
<pattern name="row_locking">
<description>Lock specific rows for update</description>
<example>
-- PostgreSQL/MySQL
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Row is locked until COMMIT
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- NOWAIT: fail immediately if locked
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
-- SKIP LOCKED: skip locked rows (queue processing)
SELECT * FROM jobs WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 1;
</example>
</pattern>
<pattern name="advisory_lock">
<description>Application-level locks (PostgreSQL)</description>
<example>
-- Session-level lock
SELECT pg_advisory_lock(12345);
-- Do work...
SELECT pg_advisory_unlock(12345);
-- Transaction-level lock (auto-released on commit)
SELECT pg_advisory_xact_lock(12345);
-- Try lock (non-blocking)
SELECT pg_try_advisory_lock(12345); -- returns true/false
</example>
<use_case>Distributed locks, rate limiting, singleton processes</use_case>
</pattern>
<pattern name="optimistic_locking">
<description>Detect conflicts using version column</description>
<example>
-- Add version column
ALTER TABLE products ADD COLUMN version INTEGER DEFAULT 0;
-- Read with version
SELECT id, name, price, version FROM products WHERE id = 1;
-- version = 5
-- Update with version check
UPDATE products
SET price = 29.99, version = version + 1
WHERE id = 1 AND version = 5;
-- If rows affected = 0, conflict occurred -> retry or error
</example>
<use_case>Low-contention updates, web applications</use_case>
</pattern>
<pattern name="pessimistic_locking">
<description>Lock before reading to prevent conflicts</description>
<example>
BEGIN;
SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE;
-- Check quantity
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 1;
COMMIT;
</example>
<use_case>High-contention updates, inventory management</use_case>
</pattern>
</locking_patterns>
<deadlock_prevention>
<pattern name="consistent_lock_order">
<description>Always acquire locks in the same order</description>
<example>
-- Always lock lower ID first
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;
-- Transfer...
COMMIT;
</example>
</pattern>
<pattern name="lock_timeout">
<description>Set maximum wait time for locks</description>
<example>
-- PostgreSQL
SET lock_timeout = '5s';
-- MySQL
SET innodb_lock_wait_timeout = 5;
</example>
</pattern>
<pattern name="detect_and_retry">
<description>Handle deadlock with retry logic</description>
<example>
-- Application code pattern (pseudocode)
max_retries = 3
for attempt in range(max_retries):
try:
execute_transaction()
break
except DeadlockError:
if attempt == max_retries - 1:
raise
sleep(random_backoff())
</example>
</pattern>
</deadlock_prevention>
</transactions>
<migrations>
<pattern name="migration_naming">
<description>Migration file naming conventions</description>
<example>
-- Timestamp-based (recommended)
20240115120000_create_users_table.sql
20240115120100_add_email_to_users.sql
-- Sequential
001_create_users_table.sql
002_add_email_to_users.sql
</example>
<note>Timestamp-based prevents conflicts in team environments</note>
</pattern>
<pattern name="up_down_migrations">
<description>Include rollback capability</description>
<example>
-- 20240115120000_create_users_table.sql
-- +migrate Up
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL
);
-- +migrate Down
DROP TABLE users;
</example>
</pattern>
<pattern name="idempotent_migrations">
<description>Migrations that can run multiple times safely</description>
<example>
-- Use IF NOT EXISTS / IF EXISTS
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
ALTER TABLE users ADD COLUMN IF NOT EXISTS name VARCHAR(100);
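-- Note: ADD COLUMN IF NOT EXISTS is PostgreSQL syntax; MySQL 8.0 lacks it,
-- so guard that step in the migration tool instead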
</example>
</pattern>
<zero_downtime>
<pattern name="add_not_null_column">
<description>Add nullable column first, then populate</description>
<example>
-- Step 1: Add nullable column (instant in PostgreSQL 11+)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Step 2: Backfill data (in batches)
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 1 AND 1000;
-- Step 3: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
</example>
</pattern>
<pattern name="add_column_with_default">
<description>Add column with default (instant in PostgreSQL 11+)</description>
<example>
-- PostgreSQL 11+: instant, no table rewrite
ALTER TABLE users ADD COLUMN active BOOLEAN DEFAULT true NOT NULL;
-- Older versions: requires table rewrite
-- Use nullable + backfill + NOT NULL pattern instead
</example>
</pattern>
<pattern name="rename_column_safely">
<description>Multi-step column rename for zero downtime</description>
<example>
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(100);
-- Step 2: Copy data (in batches)
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- Step 3: Deploy code reading both columns
-- Step 4: Deploy code writing to both columns
-- Step 5: Deploy code reading only new column
-- Step 6: Drop old column
ALTER TABLE users DROP COLUMN name;
</example>
</pattern>
<pattern name="add_index_concurrently">
<description>Create index without locking table</description>
<example>
-- PostgreSQL: CONCURRENTLY prevents locking
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Note: Cannot run inside transaction
-- May take longer but allows concurrent reads/writes
</example>
<warning>CONCURRENTLY can fail; check index is valid after creation</warning>
</pattern>
<pattern name="drop_column_safely">
<description>Remove column without breaking application</description>
<example>
-- Step 1: Stop writing to column in application
-- Step 2: Deploy and wait for old code to drain
-- Step 3: Drop column
ALTER TABLE users DROP COLUMN old_column;
</example>
</pattern>
</zero_downtime>
<data_migration>
<pattern name="batched_backfill">
<description>Process large datasets in chunks</description>
<example>
-- Process in batches of 1000
DO
$$
DECLARE
batch_size INTEGER := 1000;
rows_updated INTEGER;
BEGIN
LOOP
UPDATE users
SET email_normalized = LOWER(email)
WHERE email_normalized IS NULL
AND id IN (
SELECT id FROM users
WHERE email_normalized IS NULL
LIMIT batch_size
);
GET DIAGNOSTICS rows_updated = ROW_COUNT;
EXIT WHEN rows_updated = 0;
COMMIT;
PERFORM pg_sleep(0.1); -- Reduce load
END LOOP;
END $$;
</example>
</pattern>
<pattern name="backfill_with_cursor">
<description>Use cursor for very large tables</description>
<example>
DECLARE batch_cursor CURSOR FOR
SELECT id FROM users WHERE new_column IS NULL;
FETCH 1000 FROM batch_cursor;
-- Process batch
-- Repeat until no more rows
</example>
</pattern>
</data_migration>
</migrations>
<context7_integration>
<description>Use Context7 MCP for up-to-date SQL documentation</description>
<sql_libraries>
</sql_libraries>
<usage_patterns>
resolve-library-id libraryName="postgresql"
Workflow guidance
Step completed
get-library-docs context7CompatibleLibraryID="/websites/postgresql" topic="window functions"
Workflow guidance
Step completed
<pattern name="mysql_docs">
<step order="1">
resolve-library-id libraryName="mysql"
Workflow guidance
Step completed
get-library-docs context7CompatibleLibraryID="/websites/dev_mysql_doc_refman_9_4_en" topic="JSON functions"
Workflow guidance
Step completed
<pattern name="sqlite_docs">
<step order="1">
resolve-library-id libraryName="sqlite"
Workflow guidance
Step completed
get-library-docs context7CompatibleLibraryID="/sqlite/sqlite" topic="query optimization"
Workflow guidance
Step completed
</usage_patterns>
</context7_integration>
<tools>
<tool name="pg_format">
<description>PostgreSQL SQL formatter (pg_format CLI); formats SQL with consistent indentation and keyword casing</description>
<example>pg_format --spaces 2 --keyword-case 2 input.sql</example>
</tool>
<tool name="sqlfluff">
<description>SQL linter and auto-fixer supporting multiple dialects (PostgreSQL, MySQL, SQLite, BigQuery, etc.)</description>
<example>sqlfluff lint --dialect postgres query.sql; sqlfluff fix --dialect postgres query.sql</example>
</tool>
<tool name="pgtap">
<description>Unit testing framework for PostgreSQL; write tests in SQL using the TAP protocol</description>
<example>
SELECT plan(2);
SELECT has_table('users', 'users table should exist');
SELECT has_column('users', 'email', 'users should have email column');
SELECT finish();
</example>
</tool>
</tools>
<best_practices>
Follow Emacs sql-indent style: right-align keywords (SELECT, FROM, WHERE, GROUP BY), use leading commas for column lists
Use snake_case for all identifiers (tables, columns, aliases); avoid abbreviations unless universally understood
Use UPPERCASE for SQL keywords (SELECT, FROM, WHERE, JOIN, AS, OVER); lowercase for identifiers
Comment WHY not WHAT; use -- for single-line, /* */ for multi-line comments
Use meaningful table aliases with explicit AS keyword (FROM orders AS o, not FROM orders t1)
Never use SELECT * in production; explicitly list required columns for clarity, performance, and stability
Use explicit JOIN syntax with ON clause; never use implicit comma-separated joins
Use IS NULL / IS NOT NULL for NULL comparisons; never = NULL or != NULL
Use COALESCE for default values (ANSI-standard); NULLIF to prevent division by zero (see the sketch after this list)
Use CTEs for complex multi-step queries; subqueries for simple IN/EXISTS checks
Prefer JOINs over correlated subqueries for performance
Create indexes on foreign key columns and frequently filtered/sorted columns
Avoid functions on indexed columns in WHERE clause; restructure queries instead
Use EXISTS instead of IN for large subqueries (EXISTS stops at first match)
Use keyset pagination (WHERE id > last_id) over OFFSET for large datasets
Batch large operations (10,000-100,000 rows per batch) to reduce lock contention
Analyze query plans with EXPLAIN/EXPLAIN ANALYZE before optimizing
Avoid leading wildcards in LIKE patterns (LIKE 'prefix%' is indexable, '%suffix' is not)
Use parameterized queries for ALL user input; NEVER concatenate strings into SQL
Escape wildcards (%, _) in LIKE patterns when using user input
Whitelist allowed values for dynamic table/column names; never use user input directly
Grant minimum required permissions; use roles instead of direct user grants
Encrypt sensitive data at rest (TDE) and in transit (SSL/TLS)
Use dynamic data masking for non-privileged access to PII/PHI
Separate admin accounts from regular application accounts
Use explicit transaction boundaries (BEGIN/COMMIT) for multi-statement operations
Name constraints explicitly (CONSTRAINT fk_user_id FOREIGN KEY...) for easier migration management
Store database migrations in version control with application code
Use appropriate isolation levels for transaction requirements (READ COMMITTED default, SERIALIZABLE for critical)
Implement soft deletes (deleted_at column) for audit trails where appropriate
Test migrations on production-like data volume before deployment
Document stored procedures with standard headers (author, date, parameters, change history)
Use views to encapsulate common query logic; refactor repeated CTEs into views
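<example>
-- A sketch of the COALESCE/NULLIF rule above (customer_stats is a hypothetical table)
SELECT COALESCE(nickname, 'anonymous') AS display_name
, total_spent / NULLIF(order_count, 0) AS avg_order_value -- NULL instead of a division error when order_count = 0
FROM customer_stats;
</example>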
<formatting_style name="emacs_sql_indent">
Emacs sql-indent compatible formatting conventions for SQL code
Right-align SQL keywords so content starts at consistent column (SELECT at col 1, FROM indented 2 spaces, WHERE indented 1 space)
Place commas at beginning of lines, aligned with first column name after SELECT
Indent subquery content by 4 spaces; closing parenthesis aligns with start of subquery block
Use UPPERCASE for all SQL keywords including AS, OVER, PARTITION BY, LAG, SUM, etc.
Place each column on its own line for readability and cleaner version control diffs
<good_example><![CDATA[
SELECT month
, monthly_sales
, LAG(monthly_sales) OVER (ORDER BY month) AS last_month_sales
, ROUND(monthly_sales / LAG(monthly_sales) OVER (ORDER BY month), 2) AS growth_rate
FROM (
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month
, SUM(amount) AS monthly_sales
FROM orders
GROUP BY month
) AS MonthlySummary;
SELECT c.customer_id
, c.customer_name
, COUNT(o.order_id) AS order_count
, COALESCE(SUM(o.total), 0) AS total_spent
FROM customers AS c
LEFT JOIN orders AS o
ON c.customer_id = o.customer_id
WHERE c.status = 'active'
AND c.created_at >= '2024-01-01'
GROUP BY c.customer_id
, c.customer_name
HAVING COUNT(o.order_id) > 5
ORDER BY total_spent DESC
LIMIT 100;
]]></good_example>
<bad_example><![CDATA[
select
month,
monthly_sales,
LAG(monthly_sales) OVER (order by month) as last_month_sales,
ROUND(monthly_sales / LAG(monthly_sales) OVER (order by month), 2) as growth_rate
from (
select
DATE_FORMAT(order_date, '%Y-%m') as month,
SUM(amount) AS monthly_sales
from orders
group by month
) as MonthlySummary;
SELECT c.customer_id, c.customer_name, COUNT(o.order_id) AS order_count,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.status = 'active' AND c.created_at >= '2024-01-01'
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(o.order_id) > 5
ORDER BY total_spent DESC LIMIT 100;
]]></bad_example>
</formatting_style>
</best_practices>
<anti_patterns>
<anti_pattern name="select_star">
<description>Using SELECT * in production queries</description>
<instead>Explicitly list required columns for performance and clarity</instead>
</anti_pattern>
<anti_pattern name="missing_indexes">
<description>Querying without appropriate indexes on filter/join columns</description>
<instead>Create indexes on columns used in WHERE, JOIN, ORDER BY</instead>
</anti_pattern>
<anti_pattern name="n_plus_one">
<description>Executing N+1 queries in a loop</description>
<example>
-- Bad: N+1 queries
for user in users:
    orders = query("SELECT * FROM orders WHERE user_id = ?", user.id)
</example>
<instead>Use JOIN or IN clause to fetch all data in a single query</instead>
</anti_pattern>
<anti_pattern name="string_concatenation">
<description>Building SQL with string concatenation (SQL injection risk)</description>
<instead>Use parameterized queries/prepared statements</instead>
</anti_pattern>
<anti_pattern name="mismatched_types">
<description>Comparing columns with mismatched types</description>
<example>
-- Bad: string comparison prevents index usage
SELECT * FROM users WHERE id = '123';
</example>
<instead>Use correct types; cast explicitly if needed</instead>
</anti_pattern>
<anti_pattern name="accidental_cross_join">
<description>Accidental cross joins from missing join conditions</description>
<example>
-- Bad: missing ON clause
SELECT * FROM users, orders;
</example>
<instead>Always use explicit JOIN with ON clause</instead>
</anti_pattern>
<anti_pattern name="over_normalization">
<description>Excessive normalization causing too many joins</description>
<instead>Denormalize for read-heavy queries; balance with write complexity</instead>
</anti_pattern>
<anti_pattern name="float_money">
<description>Storing monetary values as FLOAT or DOUBLE (introduces rounding errors)</description>
<example>
-- Bad: floating-point rounding errors
CREATE TABLE invoices (
id SERIAL PRIMARY KEY,
amount FLOAT -- 0.1 + 0.2 != 0.3
);
-- Good: exact decimal arithmetic
CREATE TABLE invoices (
id SERIAL PRIMARY KEY,
amount DECIMAL(10, 2) NOT NULL -- or NUMERIC(10, 2)
);
-- Alternative: store as integer cents
CREATE TABLE invoices (
id SERIAL PRIMARY KEY,
amount_cents INTEGER NOT NULL -- 1999 = $19.99
);
</example>
<instead>Use DECIMAL/NUMERIC for monetary values, or store as integer cents/minor units</instead>
</anti_pattern>
<anti_pattern name="unindexed_foreign_keys">
<description>Missing indexes on foreign key columns (causes slow joins and cascading deletes)</description>
<example>
-- Bad: FK without index (PostgreSQL does NOT auto-create indexes on FK columns)
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id)
);
-- Good: explicit index on FK column
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id)
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
</example>
<instead>Always create indexes on foreign key columns; MySQL does this automatically, PostgreSQL does not</instead>
</anti_pattern>
</anti_patterns>
<key_rules>
Use parameterized queries for ALL user input - NEVER use string concatenation
Create indexes on foreign key columns
Use explicit transaction boundaries for multi-statement operations
Escape wildcards in LIKE patterns when using user input
Analyze query plans with EXPLAIN before optimizing
Use appropriate isolation levels for transaction requirements
Implement soft deletes for audit trails
Name constraints explicitly for easier migration management
</key_rules>
<workflow>
<phase name="analyze">
<description>Understand database requirements</description>
<step order="1">Identify data model and relationships</step>
<step order="2">Determine query patterns and access frequency</step>
<step order="3">Review existing schema and indexes</step>
</phase>
<phase name="implement">
<description>Write efficient SQL</description>
<step order="1">Design normalized schema (3NF baseline)</step>
<step order="2">Write queries with appropriate indexes</step>
<step order="3">Use transactions for data integrity</step>
</phase>
<phase name="verify">
<description>Verify SQL correctness and performance</description>
<step order="1">Analyze with EXPLAIN</step>
<step order="2">Test with production-like data volume</step>
<step order="3">Verify transaction isolation</step>
</phase>
</workflow>
<error_escalation>
<escalation severity="low">
<condition>Missing index on infrequently queried column</condition>
<action>Note for future optimization, proceed</action>
</escalation>
<escalation severity="medium">
<condition>Query performance degradation under load</condition>
<action>Analyze EXPLAIN output, propose index or query optimization</action>
</escalation>
<escalation severity="high">
<condition>Deadlock or lock timeout in production</condition>
<action>Stop, analyze lock patterns, present resolution options</action>
</escalation>
<escalation severity="critical">
<condition>Data corruption or SQL injection vulnerability</condition>
<action>Block operation, require immediate remediation</action>
</escalation>
</error_escalation>
<checklist>
<always>
Use parameterized queries for all user input
Create indexes on foreign key columns
Use explicit transaction boundaries for multi-statement operations
Test migrations on non-production environment first
</always>
<never>
SELECT * in production queries
String concatenation for SQL construction
Long-running transactions holding locks
Offset pagination for large datasets
</never>
</checklist>
<related_skills>
<skill>Navigate database schema and find query patterns</skill>
<skill>Fetch PostgreSQL, MySQL, SQLite documentation</skill>
<skill>Debug query performance issues</skill>
</related_skills>