with one click
ggsql
// Write ggsql queries — a grammar of graphics for SQL. Use when the user wants to create, modify, or understand a ggsql visualization query.
// Write ggsql queries — a grammar of graphics for SQL. Use when the user wants to create, modify, or understand a ggsql visualization query.
| name | ggsql |
| description | Write ggsql queries — a grammar of graphics for SQL. Use when the user wants to create, modify, or understand a ggsql visualization query. |
| allowed-tools | Bash(ggsql:*) |
| argument-hint | [description of desired visualization] |
| metadata | {"author":"George Stagg (@georgestagg)","version":"1.0"} |
| license | MIT |
ggsql is a SQL extension for declarative data visualization based on Grammar of Graphics principles. It lets users combine SQL data queries with visualization specifications in a single, composable syntax.
When the user describes a visualization they want, write a valid ggsql query. Use ONLY syntax documented below. NEVER invent clauses, settings, aesthetics, or layer types.
A ggsql query has two parts:
VISUALISE (or VISUALIZE). Everything after this is the visualization query.There are two patterns for combining SQL with VISUALISE:
The last SQL statement is a SELECT. Data flows from its result set into VISUALISE, which has no FROM clause.
SELECT name, score_a, score_b FROM 'dataset.csv' WHERE value > 50
VISUALISE score_a AS x, score_b AS y
[DRAW / PLACE / SCALE / FACET / PROJECT / LABEL clauses]
Works with any SQL that ends in a SELECT: bare SELECT, WITH...SELECT, UNION/INTERSECT/EXCEPT.
VISUALISE provides its own data source via FROM. Use when referencing a table, file, CTE, or built-in dataset directly without a trailing SELECT.
VISUALISE score_a AS x, score_b AS y FROM 'dataset.csv'
DRAW point
WITH summary AS (SELECT category, COUNT(*) AS n FROM 'dataset.csv' GROUP BY category)
VISUALISE category AS x, n AS y FROM summary
DRAW bar
Data sources can appear in VISUALISE ... FROM or DRAW ... MAPPING ... FROM:
FROM sales, FROM my_cteFROM 'data.parquet', FROM 'data.csv'FROM ggsql:penguins, FROM ggsql:airqualityMarks the start of the visualization. Optionally defines global mappings inherited by all layers.
VISUALISE <mapping>, ... FROM <data-source>
column AS aesthetic — e.g. revenue AS ycolumn — column name must match aesthetic name, e.g. x maps to x* — all columns with names matching aesthetics are mapped'red' AS fill, 42 AS size — literal values mapped to aestheticVISUALISE bill_len AS x, bill_dep AS y, species AS fill FROM ggsql:penguins
VISUALISE * FROM my_table
VISUALISE FROM ggsql:penguins
Defines a layer. Multiple DRAW clauses stack layers (first = bottom, last = top).
DRAW <layer-type>
MAPPING <mapping>, ... FROM <data-source>
REMAPPING <stat-property> AS <aesthetic>, ...
SETTING <param> => <value>, ...
FILTER <condition>
PARTITION BY <column>, ...
ORDER BY <column>, ...
All subclauses are optional if VISUALISE provides global mappings and data.
Same syntax as VISUALISE mappings. Layer mappings merge with global mappings (layer takes precedence). Can include FROM for layer-specific data.
null to prevent inheriting a global mapping: MAPPING null AS colorFor statistical layers (histogram, density, boxplot, violin, smooth, bar without y). Maps calculated statistics to aesthetics. Each layer documents its available stats and default remapping.
DRAW histogram
MAPPING body_mass AS x
REMAPPING density AS y -- use density instead of default count
Set literal aesthetic values or layer parameters. Aesthetics set here bypass scales.
DRAW point
SETTING size => 5, opacity => 0.7, stroke => 'red'
Position adjustment is a special setting:
SETTING position => 'identity' -- no adjustment (default for most)
SETTING position => 'stack' -- stack (default for bar, histogram, area)
SETTING position => 'dodge' -- side by side (default for boxplot, violin)
SETTING position => 'jitter' -- random offset
Aggregate collapses each group to a single row, replacing every numeric mapping in place with its aggregated value. Groups = PARTITION BY columns + all discrete mappings. Supported by point, line, path, bar, area, ribbon, range, segment, rule, text, tile. Not supported by histogram, density, smooth, boxplot, violin (they have their own stats).
SETTING aggregate => '<spec>' -- single
SETTING aggregate => ('<spec>', '<spec>', …) -- list
Each <spec> is either:
'<func>'. Applies to every numeric mapping without an explicit target. With two untargeted defaults, the first applies to lower-side aesthetics (x/xmin/etc.) plus all non-range layers, the second to upper-side (xend/xmax). More than two untargeted defaults is an error.'<aes>:<func>'. Applies func to the named aesthetic only. Overrides any untargeted default for that aesthetic.Functions:
count, sum, prod, min, max, range (max−min), mid ((min+max)/2), mean, median, geomean, harmean, rms, sdev, var, iqr, se, p05–p95.ORDER BY for deterministic order): first, last, diff (last − first).<offset>±[<mult>]<expansion>, e.g. 'mean+1.96sdev', 'median-iqr'. Offsets: mean, median, geomean, harmean, rms, sum, prod, min, max, mid, p05–p95. Expansions: sdev, se, var, iqr, range.Explosion — targeting the same aesthetic with multiple functions emits one row per function per group. A synthetic aggregate column tags each row with the function name. Use REMAPPING aggregate AS <aes> to drive another aesthetic from it. When several aesthetics are exploded with the same length, they explode in lockstep (row 1 = each target's first function, row 2 = second, …); single-function targets are reused on every row. Mixing target lengths > 1 is an error.
-- min/max envelope as two lines per group, coloured by function
DRAW line
MAPPING Date AS x, Temp AS y
REMAPPING aggregate AS color
SETTING aggregate => ('y:min', 'y:max')
PARTITION BY Year
Scale interaction — for an aesthetic that is targeted by aggregate, SCALE BINNED <aes> runs after aggregation (otherwise the diff/mean/etc. would cancel within a bin). Untargeted SCALE BINNED still bins pre-aggregate so the bins can drive grouping. Continuous censoring (SCALE <aes> FROM (lo, hi)) and discrete OOB filtering defer to post-aggregate whenever the aesthetic is being aggregated (targeted or untargeted default).
SQL WHERE condition applied to layer data. Content is passed to the database:
DRAW point
FILTER sex = 'female' AND body_mass > 4000
Additional grouping columns beyond mapped discrete aesthetics:
DRAW line
MAPPING Day AS x, Temp AS y
PARTITION BY Month
Controls record order (important for path layers):
DRAW path
ORDER BY timestamp
Creates annotation layers with literal values only (no data mappings). Supports tuples for multiple annotations.
PLACE <layer-type>
SETTING <aesthetic/param> => <value>, ...
PLACE point SETTING x => 5, y => 10, color => 'red'
PLACE rule SETTING y => 70, linetype => 'dotted'
PLACE text SETTING x => (34, 44), y => (66, 49), label => ('Mean = 34', 'Mean = 44')
Controls how data values are translated to aesthetic values. Sensible defaults are always provided.
SCALE <type> <aesthetic> FROM <input-range> TO <output-range> VIA <transform>
SETTING <param> => <value>, ...
RENAMING <value> => <label>, ...
All parts except aesthetic are optional.
CONTINUOUS — continuous numeric/temporal dataDISCRETE — categorical/string dataBINNED — bin continuous data into discrete groups (never auto-selected, must be explicit)ORDINAL — ordered discrete data (never auto-selected, must be explicit)IDENTITY — pass data through unchanged (no legend created)If omitted, type is inferred from data.
Use the base name: x, y, fill, stroke, color (sets both fill and stroke), opacity, size, linewidth, linetype, shape, panel (facet), row, column.
For position families (xmin/xmax/xend/ymin/ymax/yend), scale with the base name: SCALE x ...
FROM (min, max) — use null to infer from data: FROM (0, null)FROM ('A', 'B', 'C') — controls order, omitted values are nulledFROM ('Torgersen', 'Biscoe', null)TO ('red', 'blue', 'green'), TO (1, 6)TO viridis, TO dark2, TO tableau10Continuous transforms: linear, log, log2, ln, exp10, exp2, exp, sqrt, square, asinh, pseudo_log, pseudo_log2, pseudo_ln, integer
Temporal transforms: date, datetime, time — automatically chosen for date/datetime/time columns.
Discrete transforms: string, bool
SCALE x VIA date -- treat x as temporal
SCALE y VIA log -- log transform
SCALE size VIA square -- scale by radius not area
Continuous/binned scales:
expand — expansion factor, scalar or (mult, add). Default 0.05. Only for x/y.oob — out-of-bounds: 'keep' (default for x/y), 'censor' (default for others), 'squish'breaks — integer count, array of values, or interval string for temporal (e.g. '2 months', 'week')pretty — boolean, default true. Use Wilkinson's algorithm for nice breaks.reverse — boolean, default false. Reverse scale direction.Binned scales additionally:
closed — 'left' (default) or 'right'Discrete/ordinal scales:
reverse — booleanSCALE x SETTING breaks => '2 months'
SCALE y FROM (0, 100) SETTING oob => 'squish'
SCALE BINNED x SETTING breaks => 10, pretty => false
Rename break labels. Direct renaming, wildcard formatting, or both (direct takes priority):
RENAMING 'Adelie' => 'Pygoscelis adeliae', 'adelie' => null -- direct / suppress
RENAMING * => '{} mm' -- string interpolation
RENAMING * => '{:Title}' -- formatters: Title, UPPER, lower, time %B %Y, num %.1f
Split data into small multiples.
FACET <column> BY <column>
SETTING <param> => <value>, ...
FACET region — wrap layout, aesthetic name is panelFACET region BY category — grid layout, aesthetics are row and columnfree — null (default/fixed), 'x', 'y', or ('x', 'y') for independent scalesmissing — 'repeat' (default, show layer in all panels) or 'null' (only show in null panel)ncol/nrow — layout dimensions for 1D faceting (only one allowed)Use SCALE on the facet aesthetic:
FACET region
SCALE panel
RENAMING 'N' => 'North', 'S' => 'South'
Use SCALE FROM to select which panels to show:
FACET island
SCALE panel FROM ('Biscoe', 'Dream')
Controls the coordinate system.
PROJECT <aesthetic>, ... TO <coord-type>
SETTING <param> => <value>, ...
cartesian (default) — horizontal x, vertical y
clip (boolean, default true), ratio (aspect ratio number or null)x, ypolar — angle + radius from center
clip, start (degrees, default 0 = 12 o'clock), end (degrees, default start+360), inner (0-1 proportion for donut hole, default 0)radius (primary), angle (secondary)Swap aesthetic order to flip axes: PROJECT y, x TO cartesian. If no PROJECT clause, coordinate type is inferred from mappings (x/y = cartesian, radius/angle = polar).
PROJECT TO polar SETTING inner => 0.5 -- donut chart
PROJECT TO polar SETTING start => -90, end => 90 -- half-circle gauge
Override default axis/legend labels and add titles.
LABEL
<aesthetic/title> => <string>, ...
Available labels:
title — main titlesubtitle — subtitle below titlecaption — text below the plotx, y, fill, color, etc.null to suppress a label: fill => nullLABEL
title => 'Sales by Region',
subtitle => 'Q4 2024 data',
x => 'Date',
y => 'Revenue (USD)',
fill => 'Region',
caption => 'Source: internal sales database'
Scatterplot. Required: x, y. Optional: size, colour, stroke, fill, opacity, shape.
Line plot sorted along primary axis. Required: x, y. Optional: colour/stroke, opacity, linewidth, linetype. Settings: position, orientation ('aligned'/'transposed').
Like line but connects points in data order (not sorted). Same aesthetics as line.
Bar chart. Auto-counts if y not provided. Optional: x (categories), y (height), fill, colour, stroke. Stats: count, proportion. Properties: weight. Settings: position (default 'stack'), width (0-1). Orientation inferred from mapping (categories on x = vertical, on y = horizontal).
DRAW bar MAPPING species AS x -- auto-count
DRAW bar MAPPING species AS x, total AS y -- pre-computed
DRAW bar MAPPING species AS x, sex AS fill -- stacked (default)
SETTING position => 'dodge' -- side by side
Bins continuous data. Required: x. Stats: count, density. Default remapping: count AS <secondary>. Settings: position (default 'stack'), bins (default 30), binwidth, closed ('left'/'right').
DRAW histogram MAPPING body_mass AS x SETTING binwidth => 100
DRAW histogram MAPPING body_mass AS x REMAPPING density AS y -- density instead of count
Kernel density estimation. Required: x. Stats: density, intensity. Settings: position (default 'identity'), bandwidth, adjust (default 1), kernel ('gaussian' default, 'epanechnikov', 'triangular', 'rectangular', 'biweight', 'cosine').
Five-number summary with outliers. Required: x (categorical), y (continuous). Stats: type, value. Settings: position (default 'dodge'), outliers (default true), coef (whisker IQR multiple, default 1.5), width (default 0.9).
Mirrored kernel density for groups. Required: x (categorical), y (continuous). Stats: density, intensity. Default remapping: density AS offset. Settings: position (default 'dodge'), bandwidth, adjust, kernel (same as density), width (default 0.9), side ('both'/'left'/'bottom'/'right'/'top'), tails (number or null, default 3).
Trendline. Required: x, y. Stats: intensity. Settings: method ('nw' default, 'ols', 'tls'), bandwidth, adjust, kernel (same as density, nw only).
Area chart anchored at zero. Required: x, y. Settings: position (default 'stack'), orientation, total (normalize stacks), center (boolean, for steamgraph).
Like area but with explicit ymin/ymax (unanchored). Required: x, ymin, ymax.
Line segments between two endpoints. Required: x, y, xend, yend. For axis-aligned intervals where one coordinate is shared between start and end, use range instead.
Reference lines spanning the full panel. Required: x or y. Optional: slope (for diagonal: y = a + slope * x).
Text labels. Required: x, y, label. Settings: offset (number or (h, v)), format (string interpolation like RENAMING). hjust: 'left'/'right'/'centre' or 0-1. vjust: 'top'/'bottom'/'middle' or 0-1.
Rectangles. Required: pick 2 per axis from center (x/y), min (xmin/ymin), max (xmax/ymax), width, height. Or just center (defaults width/height to 1).
Closed shapes from ordered coordinates. Required: x, y. Use PARTITION BY to separate distinct polygons.
Range/interval display between two values along the secondary axis. Required: x, ymin, ymax. Settings: width (hinge width in points, default 10, null to hide).
All layers accept common optional aesthetics (colour/stroke, fill, opacity, linewidth, linetype) and position setting where applicable.
ggsql10 (default), tableau10, category10, set1, set2, set3, dark2, paired, pastel1, pastel2, accent, kelly22sequential (default), viridis, plasma, magma, inferno, cividis, blues, greens, oranges, reds, purples, greys, ylgnbu, ylorbr, ylorrd, batlow, hawaii, lajolla, turku, and morevik/diverging, rdbu, rdylbu, rdylgn, spectral, brbg, prgn, piyg, puor, berlin, roma, and moreromao/cyclic, bamo, broco, corko, viko-- Pie chart
VISUALISE species AS fill FROM ggsql:penguins
DRAW bar
PROJECT TO polar
-- Horizontal bar chart
DRAW bar MAPPING species AS y
-- Multi-series line chart
VISUALISE Date AS x
DRAW line MAPPING Temp AS y, 'Temperature' AS color
DRAW line MAPPING Ozone AS y, 'Ozone' AS color
SCALE x VIA date
-- Lollipop chart
SELECT ROUND(bill_dep) AS bill_dep, COUNT(*) AS n FROM ggsql:penguins GROUP BY 1
VISUALISE bill_dep AS x
DRAW range MAPPING 0 AS ymin, n AS ymax SETTING width => null
DRAW point MAPPING n AS y
-- Ridgeline / joy plot
VISUALISE Temp AS x, Month AS y FROM ggsql:airquality
DRAW violin SETTING width => 4, side => 'top'
SCALE ORDINAL y
-- Bar labels
SELECT island, COUNT(*) AS n FROM ggsql:penguins GROUP BY island
VISUALISE island AS x, n AS y
DRAW bar
DRAW text MAPPING n AS label SETTING vjust => 'top', offset => (0, -11), fill => 'white'
-- CTEs with separate layer data
WITH temps AS (SELECT Date, Temp as value FROM ggsql:airquality),
ozone AS (SELECT Date, Ozone as value FROM ggsql:airquality WHERE Ozone IS NOT NULL)
VISUALISE
DRAW line MAPPING Date AS x, value AS y, 'Temperature' AS color FROM temps
DRAW point MAPPING Date AS x, value AS y, 'Ozone' AS color FROM ozone
SCALE x VIA date
-- Per-week summary: open/close range, weekly temperature change (binned post-aggregate)
VISUALISE Date AS x, Temp AS ymin, Temp AS ymax, Temp AS color
FROM ggsql:airquality
DRAW range
SETTING aggregate => ('x:first', 'ymin:first', 'ymax:last', 'color:diff'),
width => null
PARTITION BY Week
SCALE BINNED color
-- Mean ± 1.96·sdev band per group, drawn as a ribbon
VISUALISE Day AS x, Temp AS ymin, Temp AS ymax FROM ggsql:airquality
DRAW ribbon
SETTING aggregate => ('mean-1.96sdev', 'mean+1.96sdev')
PARTITION BY Month
The ggsql CLI should be on the PATH. Subcommands: exec <QUERY>, run <FILE>, validate <QUERY>, parse <QUERY>. Common options: --reader <URI> (default duckdb://memory), --writer <FORMAT> (default vegalite), --output <PATH>, -v (verbose).
ggsql validate "VISUALISE x, y FROM data DRAW point"
ggsql exec "VISUALISE bill_len AS x, bill_dep AS y FROM ggsql:penguins DRAW point" -v
ggsql run query.sql --output chart.vl.json
ggsql:penguins or ggsql:airquality as example data when no specific data is mentioned.ggsql validate "<query>". When the user wants to see the output, use ggsql exec "<query>" -v.