explain-plan
Visualize and analyze DataFusion query execution plans. Shows logical and physical plans, identifies performance bottlenecks, and suggests optimizations. Supports EXPLAIN and EXPLAIN ANALYZE.
| name | explain-plan |
| description | Visualize and analyze DataFusion query execution plans. Shows logical and physical plans, identifies performance bottlenecks, and suggests optimizations. Supports EXPLAIN and EXPLAIN ANALYZE. |
| argument-hint | <SQL query> [--analyze] |
| allowed-tools | Bash |
You are helping the user understand and optimize query execution plans in Apache DataFusion.
Input: $@
command -v datafusion-cli
If not found, delegate to /datafusion-skills:install-datafusion.
STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
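The state lookup above has two details that are easy to miss: the per-project directory name keeps a leading dash, and an empty STATE_DIR later makes the `${STATE_DIR:+...}` expansion vanish instead of passing an empty `--file`. The sketch below illustrates both; `project_id_for`, `count_args`, and the example path are hypothetical names used only for this illustration.

```shell
# Hypothetical helper mirroring the PROJECT_ID line above.
project_id_for() { printf '%s\n' "$1" | tr '/' '-'; }

# Hypothetical helper: report how many arguments it received.
count_args() { printf '%s\n' "$#"; }

project_id_for "/home/alice/analytics"   # prints -home-alice-analytics

# No state file found: the expansion yields nothing, so only -c and its value remain.
STATE_DIR=""
count_args ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "EXPLAIN SELECT 1;"   # prints 2

# State file found: --file and its path are added as two separate arguments.
STATE_DIR=".datafusion-skills"
count_args ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "EXPLAIN SELECT 1;"   # prints 4
```

The expansion is intentionally left unquoted as a whole; quoting it would turn `--file` and the path into a single argument.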
If --analyze is present → use EXPLAIN ANALYZE (actually runs the query, shows real metrics)
Otherwise → use EXPLAIN (shows the plan without execution)
Extract the SQL query (remove the --analyze flag if present).
If the input is natural language, generate SQL first (see /datafusion-skills:query for SQL generation guidelines).
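One way to separate the flag from the query is sketched below. It assumes the input is already SQL and that `--analyze` appears as a standalone word; `parse_explain_input` and the `orders` table are hypothetical names, not part of the skill.

```shell
# Hypothetical helper: split the skill input into an EXPLAIN mode (MODE)
# and the bare SQL text (SQL).
parse_explain_input() {
  input="$*"
  case " $input " in
    *" --analyze "*) MODE="EXPLAIN ANALYZE" ;;
    *)               MODE="EXPLAIN" ;;
  esac
  # Remove the flag wherever it appears as a standalone word, then trim
  # surrounding spaces and one trailing semicolon.
  SQL=$(printf '%s\n' " $input " |
    sed -e 's/ --analyze / /g' -e 's/^ *//' -e 's/ *$//' -e 's/;$//')
}

parse_explain_input "SELECT count(*) FROM orders --analyze"
printf '%s %s;\n' "$MODE" "$SQL"   # prints EXPLAIN ANALYZE SELECT count(*) FROM orders;
```

The input is never expanded unquoted, so a `*` in the query is not treated as a filename glob. A string literal that itself contains ` --analyze ` would be misread; this sketch does not handle that case.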
Physical plan (default — shows the execution plan as a visual tree):
datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN $SQL;
"
Verbose plan (full optimizer trace — initial logical plan, each optimization pass, initial physical plan, final physical plan with stats and schema):
datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN VERBOSE $SQL;
"
With actual metrics (if --analyze) (runs the query, reports per-operator row counts, timing, memory, spill stats):
datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN ANALYZE $SQL;
"
Parse the execution plan output and provide insights:
Full table scans → Look for TableScan without pushdown predicates
  Consider adding WHERE clauses or partitioning
Sort operations → SortExec or SortPreservingMergeExec
Hash joins vs merge joins → HashJoinExec vs SortMergeJoinExec
Repartitioning → RepartitionExec
Projection pushdown → Check if only needed columns are read
Predicate pushdown → Check if filters are pushed to the scan level
  Look for predicate in TableScan nodes
Coalesce partitions → CoalescePartitionsExec
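A first pass over the plan text can be mechanical. The sketch below counts the plan lines that mention each operator named above. `flag_operators` is a hypothetical helper, and the sample plan is hand-written for illustration, not real DataFusion output.

```shell
# Hypothetical helper: read plan text on stdin and print one line per
# operator of interest that appears in it.
flag_operators() {
  plan=$(cat)
  for op in SortExec SortPreservingMergeExec HashJoinExec SortMergeJoinExec \
            RepartitionExec CoalescePartitionsExec; do
    # -w matches whole words only, so SortExec is not counted inside a
    # longer operator name. grep -c counts matching lines.
    count=$(printf '%s\n' "$plan" | grep -c -w "$op" || true)
    [ "$count" -gt 0 ] && printf '%s: %s line(s)\n' "$op" "$count"
  done
  return 0
}

# Hand-written sample plan, for illustration only.
plan='SortPreservingMergeExec: [total DESC]
  SortExec: expr=[total DESC]
    AggregateExec: mode=FinalPartitioned
      RepartitionExec: partitioning=Hash([customer_id], 8)
        AggregateExec: mode=Partial
          RepartitionExec: partitioning=RoundRobinBatch(8)'

printf '%s\n' "$plan" | flag_operators
```

The counts only show where to look. Whether a sort or repartition is actually a bottleneck still depends on the metrics reported by EXPLAIN ANALYZE.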
Structure the analysis as:
Brief description of what the plan does.
Present the plan as an indented tree (already DataFusion's default output format).
Actionable suggestions, such as:
If relevant, suggest DataFusion configuration changes:
-- Increase target partitions for more parallelism
SET datafusion.execution.target_partitions = 8;
-- Increase batch size for throughput
SET datafusion.execution.batch_size = 16384;
-- Enable/disable optimizations
SET datafusion.optimizer.enable_round_robin_repartition = true;
To explore these settings, try /datafusion-skills:datafusion-docs configuration options.
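To see whether a setting helps, it can be applied in the same invocation as the EXPLAIN ANALYZE, since SET affects only the current session. The sketch below builds such a script; `with_setting` is a hypothetical helper, and it assumes datafusion-cli runs multiple semicolon-separated statements from a single `-c` string in order.

```shell
# Hypothetical helper: prefix an EXPLAIN ANALYZE statement with one SET line.
# $1 = config key, $2 = value, $3 = SQL query without a trailing semicolon
with_setting() {
  printf 'SET %s = %s;\nEXPLAIN ANALYZE %s;\n' "$1" "$2" "$3"
}

script=$(with_setting datafusion.execution.target_partitions 8 "SELECT 1")
printf '%s\n' "$script"

# Run the script only when datafusion-cli is available.
# Assumption: one -c string may hold several statements.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli -c "$script" || true
fi
```

Running the same query with two different values and comparing the per-operator timings gives a rough indication of whether the change is worth keeping.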