com um clique
com um clique
| name | csv-query |
| description | Run SQL queries against CSV/TSV/Excel files using Polars SQL engine |
| user-invocable | true |
| argument-hint | <file> [query] |
| allowed-tools | ["mcp__qsv__qsv_sniff","mcp__qsv__qsv_count","mcp__qsv__qsv_headers","mcp__qsv__qsv_index","mcp__qsv__qsv_stats","mcp__qsv__qsv_frequency","mcp__qsv__qsv_search","mcp__qsv__qsv_select","mcp__qsv__qsv_sqlp","mcp__qsv__qsv_command","mcp__qsv__qsv_to_parquet","mcp__qsv__qsv_list_files","mcp__qsv__qsv_search_tools","mcp__qsv__qsv_get_working_dir","mcp__qsv__qsv_set_working_dir"] |
Query tabular data files using SQL via the Polars-powered sqlp command.
Cowork note: If relative paths don't resolve, call
mcp__qsv__qsv_get_working_dirandmcp__qsv__qsv_set_working_dirto sync the working directory.
Is the query simple (single column filter, basic select)?
select + search for simpler operationssqlp for full SQL supportDoes the query involve joins, GROUP BY, window functions, or complex expressions?
sqlp (Polars SQL engine)Is the CSV file very large (> 10MB)?
mcp__qsv__qsv_to_parquet for faster repeated queries. Note: sqlp can also query CSV files of any size directly.Prepare the file: Run mcp__qsv__qsv_index and mcp__qsv__qsv_stats with cardinality: true, stats_jsonl: true to create index and stats cache.
Read the stats cache: Read <FILESTEM>.stats.csv (e.g., data.stats.csv for data.csv) to understand column metadata before writing SQL. This is the most important step for writing efficient queries.
Run frequency on key columns: For columns you plan to GROUP BY, filter on, or join on, run mcp__qsv__qsv_frequency to see actual value distributions. This reveals the best filter values and whether a GROUP BY will produce a manageable result set.
Write and run SQL: Use mcp__qsv__qsv_sqlp with the SQL query informed by stats and frequency data. The table name in SQL is the filename stem (e.g., data.csv -> SELECT * FROM data). For Parquet files, use read_parquet('data.parquet') as the table source instead.
Refine if needed: Check results and adjust the query.
After reading the .stats.csv cache, use these columns to inform your SQL:
| Stats Column | How to Use in SQL |
|---|---|
type | Use correct casts and comparisons — don't quote integers, use date functions for Date/DateTime columns |
min / max | Write precise WHERE clauses using actual data range (e.g., WHERE price BETWEEN 10.5 AND 999.99 instead of arbitrary bounds) |
cardinality | Estimate GROUP BY result size — low cardinality (< 100) is fast; high cardinality (> 10K) may need LIMIT or a different approach |
nullcount | Only add COALESCE or IS NOT NULL where nullcount > 0 — skip null handling for columns with zero nulls |
sort_order | Skip ORDER BY if data is already sorted on that column (sort_order = "Ascending"/"Descending") |
mean / stddev | Write outlier filters: WHERE col BETWEEN mean - 3*stddev AND mean + 3*stddev |
median / q1 / q3 | For skewed data (when mean and median diverge), use quartile-based ranges: WHERE col BETWEEN q1 AND q3 instead of mean ± stddev |
skewness | If skewness > 1 or < -1, prefer median/quartile-based filters over mean-based ones |
cv | High CV (> 100%) signals high relative variability — add LIMIT to GROUP BY queries and consider binning continuous values |
outliers_percentage | If > 5%, consider excluding outliers before aggregation: WHERE col BETWEEN lower_inner_fence AND upper_inner_fence |
sparsity | Columns with sparsity > 0.5 are mostly null — avoid using them as join keys or GROUP BY columns |
Run mcp__qsv__qsv_frequency with select: "col", limit: 20 before writing WHERE clauses on categorical columns:
frequency shows "active" has 90% of rows, filtering on WHERE status = 'active' is wasteful — filter on the rare values insteadWHERE category IN ('A','B','C'), check frequency first to confirm those values exist and see if you're missing anyThe sqlp command uses Polars SQL dialect:
-- Basic select
SELECT col1, col2 FROM data WHERE col1 > 100
-- Aggregation
SELECT category, COUNT(*) as cnt, AVG(price) as avg_price
FROM data GROUP BY category ORDER BY cnt DESC
-- Window functions
SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as rank
FROM employees
-- String operations
SELECT * FROM data WHERE col1 LIKE '%pattern%'
-- Date operations
SELECT *, EXTRACT(YEAR FROM date_col) as year FROM data
-- Multiple files (join)
SELECT a.*, b.name FROM file1 a JOIN file2 b ON a.id = b.id
-- CASE expressions
SELECT *, CASE WHEN amount > 1000 THEN 'high' ELSE 'low' END as tier FROM data
sales_2024.csv -> Table: sales_2024my-data.csv -> Table: "my-data" (quote if contains special chars)sqlp uses the Polars engine - some PostgreSQL-specific syntax may not be supported--output file.csv for large result setsLIMIT to preview large result sets before running full queriessqlp can query multiple CSV files in a single SQL statement (useful for joins)Standard workflow order, tool selection matrix, and composition patterns for qsv CSV data wrangling
Respond to all pending review comments on the current PR — fetch comments, apply fixes, verify accuracy, test, commit, and reply. Use when addressing Copilot reviews, GitHub PR reviews, or any batch of review feedback.
Prepare an MCP server and plugin release by bumping versions across all files and updating changelog
Clean a CSV/TSV/Excel file - fix headers, trim whitespace, remove duplicates, validate
Convert between CSV, TSV, Excel, JSONL, Parquet, and other tabular formats
Generate AI-powered Data Dictionary, Description, and Tags for a CSV/TSV/Excel file