| name | sql-workflow |
| description | Use this skill before writing any SQL query. Covers: output shape inference (cardinality clues from the question), efficient schema exploration, iterative CTE-based query building, structured verification loop (row count, NULL audit, fan-out check, sample inspection), error recovery protocol, saving output to result.sql and result.csv, turn budget management, and common benchmark traps. |
| type | skill |
SQL Workflow Skill
1. Schema Exploration — Do This First
Before writing any SQL, understand the data:
- Read local schema files first (if schema/ directory exists in workdir):
schema/DDL.csv — all CREATE TABLE statements (if it exists)
schema/{table_name}.json — column names, types, descriptions, sample values
Reading these files costs zero tool calls and gives you table structure + sample data.
Only call MCP tools for information not in the local files (e.g., row counts, live data exploration).
- Call
list_tables to get all schemas and tables — only if no local schema files exist or you need row counts.
- Call
describe_table on the tables that seem relevant to the question (only if JSON files lack detail)
- Call
explore_column on categorical columns to see distinct values (for filtering/grouping)
- Call
find_join_path if you need to join tables and the relationship is unclear
Stop exploring after 3-5 tool calls. Write SQL based on what you've found.
2. Output Shape Inference — Before Writing SQL
Read the task question carefully for cardinality clues:
- "for each X" → GROUP BY X, one output row per X
- "top N" / "top 5" → LIMIT N or QUALIFY RANK() <= N
- "total / sum / average" → single row aggregate
- "list all" → detail rows, no aggregation
- "how many" → COUNT, result is 1 row 1 column
Write a comment at the top of your SQL:
Critical checks:
- If the question asks for a single number, the result MUST be 1 row × 1 column
- If the question says "how many", verify the CSV has exactly 1 row with a COUNT value
- If "top N" appears in the question, verify the CSV has at most N rows
3. Iterative Query Building — Build Bottom-Up
Do NOT write a 50-line query and run it all at once:
- Write the innermost subquery or first CTE first
- Run it standalone with
query_database — verify row count and sample values
- Add the next CTE, verify again
- Continue until the full query is built
Example incremental pattern:
SELECT COUNT(*) FROM orders WHERE status = 'completed';
SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM orders;
WITH order_totals AS (
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
)
SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM order_totals;
4. Execution and Structured Verification
mcp__signalpilot__query_database
connection_name="<task_connection_name>"
sql="SELECT ..."
After executing, run these checks IN ORDER before saving:
- Row count sanity: Does 0 rows make sense? Does 1M rows make sense for a "top 10" question?
- Column count: Does the result have the right number of columns for the question?
- NULL audit: For each key column — unexpected NULLs indicate wrong JOINs:
SELECT COUNT(*) - COUNT(col) AS nulls FROM (your_query) t
- Sample inspection: Look at 5 rows — are values in expected ranges? Do string columns have meaningful values (not join keys)?
- Fan-out check: If JOINing, compare
COUNT(*) vs COUNT(DISTINCT primary_key):
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT <pk>) AS unique_keys FROM (your_query) t;
If they differ, you have duplicate rows from a fan-out JOIN.
- Re-read the question: Does your output actually answer what was asked?
5. Error Recovery Protocol
6. Saving Output
Once you have the correct result:
-
Write final SQL to result.sql:
Write tool: path="result.sql", content="<your SQL query>"
-
Write the result as CSV to result.csv:
Write tool: path="result.csv", content="col1,col2,...\nval1,val2,..."
- Always include a header row with column names
- Use comma as delimiter
- Quote string values that contain commas or newlines
7. Turn Budget Management
- First 3 turns: Schema exploration only (
schema_overview, describe_table on 2-3 tables, explore_column on key categorical columns). STOP exploring.
- Turns 4 through (N-3): Write query iteratively — execute and verify each step.
- Last 3 turns: Finalize
result.sql and result.csv. If you have a working query, SAVE IT NOW — do not keep iterating.
If your query works and passes all verification checks, SAVE IMMEDIATELY — do not continue exploring "just in case".
8. Common Benchmark Traps
- Rounding: Do NOT round unless the question explicitly asks for rounded values. The evaluator uses tolerance-based comparison — full precision is always safer.
- Column naming: Match the question's phrasing exactly. If the question says "total revenue", name the column
total_revenue, not sum_revenue or revenue_total.
- CSV format: No trailing newline, no BOM, comma delimiter, double-quote strings containing commas.
- Empty result: If the correct answer is 0 or empty, write a CSV with just the header row (or header + "0").
- Date/time format in CSV: Use ISO 8601 (
YYYY-MM-DD) unless the question specifies otherwise.
- String case in CSV: Preserve the case from the database — do not uppercase/lowercase unless the question explicitly asks.
- Fan-out from JOINs: Always check
COUNT(*) vs COUNT(DISTINCT key) after every JOIN
- Wrong NULL handling: Use
IS NULL / IS NOT NULL, not = NULL
- Date format mismatch: Check the actual format stored in the column with
explore_column
- Case sensitivity: Use the correct case-insensitive function for your backend
- Interpretation errors: Before saving, re-read the original question. Verify:
- Filter conditions match domain values (check with explore_column if unsure)
- "Excluding X" means the right thing (NOT IN vs EXCEPT vs WHERE NOT)
- Metrics match domain definitions (e.g., "scored points" in F1 = points > 0, not just participated)