explain-plan

Name: Explain Plan
Author: datafusion-contrib

Visualize and analyze DataFusion query execution plans. Shows logical and physical plans, identifies performance bottlenecks, and suggests optimizations. Supports EXPLAIN and EXPLAIN ANALYZE.

stars: 12

forks: 0

updated: March 21, 2026 at 07:59

SKILL.md

readonly

name	explain-plan
description	Visualize and analyze DataFusion query execution plans. Shows logical and physical plans, identifies performance bottlenecks, and suggests optimizations. Supports EXPLAIN and EXPLAIN ANALYZE.
argument-hint	<SQL query> [--analyze]
allowed-tools	Bash

You are helping the user understand and optimize query execution plans in Apache DataFusion.

Input: $@

Step 1 — Check datafusion-cli is installed

command -v datafusion-cli

If not found, delegate to /datafusion-skills:install-datafusion.
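The check above can be wrapped in a small reusable function. This is a minimal sketch; the name `require_cmd` is hypothetical and not part of the skill, and it does nothing beyond calling `command -v`.

```shell
# Hypothetical helper (not part of the skill): succeed if the named
# executable is on PATH, otherwise print a message to stderr and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 not found on PATH" >&2
  return 1
}
```

A caller could then write `require_cmd datafusion-cli || echo "delegate to /datafusion-skills:install-datafusion"`.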

Step 2 — Resolve state

STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
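The same lookup, sketched as a function that prints the resolved directory (or an empty line when no state file exists). The function name `resolve_state_dir` is an assumption made for illustration. Note the precedence that follows from the order of the two checks: when both files exist, the per-project state under `$HOME` wins over the one in the current directory.

```shell
# Hypothetical wrapper around the lookup above.
# Runs in a subshell so its variables do not leak into the caller.
resolve_state_dir() (
  state_dir=""
  # 1. State file in the current directory.
  test -f .datafusion-skills/state.sql && state_dir=".datafusion-skills"
  # 2. Per-project state under $HOME, keyed by the project root path
  #    with every '/' replaced by '-'. Falls back to $PWD outside git.
  project_root="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
  project_id="$(echo "$project_root" | tr '/' '-')"
  test -f "$HOME/.datafusion-skills/$project_id/state.sql" &&
    state_dir="$HOME/.datafusion-skills/$project_id"
  printf '%s\n' "$state_dir"
)
```

Usage: `STATE_DIR="$(resolve_state_dir)"`.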

Step 3 — Determine the mode

If --analyze is present → use EXPLAIN ANALYZE (actually runs the query, shows real metrics)
Otherwise → use EXPLAIN (shows the plan without execution)

Extract the SQL query (remove --analyze flag if present).

If the input is natural language, generate SQL first (see /datafusion-skills:query for SQL generation guidelines).
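One way the flag handling could look, as a sketch only. It assumes the input arrives as shell arguments and that the SQL contains no newlines; the function name `parse_explain_input` is hypothetical. Natural-language input is not handled here.

```shell
# Hypothetical helper: prints the EXPLAIN prefix on line 1 and the
# remaining arguments, joined by spaces, on line 2.
parse_explain_input() (
  mode="EXPLAIN"
  sql=""
  for arg in "$@"; do
    if [ "$arg" = "--analyze" ]; then
      mode="EXPLAIN ANALYZE"        # flag present: the query will really run
    else
      sql="${sql:+$sql }$arg"       # append, adding a space only when needed
    fi
  done
  printf '%s\n%s\n' "$mode" "$sql"
)
```

For example, `parse_explain_input "SELECT 1" --analyze` prints `EXPLAIN ANALYZE` followed by `SELECT 1`.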

Step 4 — Run EXPLAIN

Physical plan (default — shows the execution plan as a visual tree):

datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN $SQL;
"

Verbose plan (full optimizer trace — initial logical plan, each optimization pass, initial physical plan, final physical plan with stats and schema):

datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN VERBOSE $SQL;
"

With actual metrics, if --analyze is set (runs the query and reports per-operator row counts, timing, memory, and spill stats):

datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN ANALYZE $SQL;
"

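The three invocations above differ only in the statement prefix. The sketch below builds the statement text from a prefix and a query, trimming any trailing semicolon the user typed so that exactly one is appended. The helper name `build_explain` is hypothetical, and it assumes a single SQL statement.

```shell
# Hypothetical helper: build one EXPLAIN statement.
#   $1: EXPLAIN, EXPLAIN VERBOSE, or EXPLAIN ANALYZE
#   $2: the SQL query (single statement assumed)
build_explain() (
  prefix="$1"
  sql="$2"
  # Drop trailing whitespace and semicolons before appending our own ';'.
  sql="$(printf '%s' "$sql" | sed -e 's/[[:space:];]*$//')"
  printf '%s %s;\n' "$prefix" "$sql"
)
```

It would slot into the commands above as `datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "$(build_explain "EXPLAIN" "$SQL")"`.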
Step 5 — Analyze the plan

Parse the execution plan output and provide insights:

Key things to look for:

Full table scans → Look for TableScan (logical plan) or DataSourceExec (physical plan; ParquetExec or CsvExec in older releases) without pushdown predicates
- Suggest adding WHERE clauses or partitioning
- Check if filter pushdown is happening
Sort operations → SortExec or SortPreservingMergeExec
- Expensive for large datasets
- Suggest pre-sorting data or using sorted Parquet files
Hash joins vs merge joins → HashJoinExec vs SortMergeJoinExec
- Hash joins need memory for the build side
- Suggest which table should be the build side (smaller table)
Repartitioning → RepartitionExec
- Shows data shuffling between partitions
- Can be expensive for large datasets
Projection pushdown → Check if only needed columns are read
- DataFusion should push projections down to the scan
Predicate pushdown → Check if filters are pushed to the scan level
- Look for filters on TableScan nodes in the logical plan, or a predicate on the scan operator in the physical plan
Coalesce partitions → CoalescePartitionsExec
- Merging partitions back to single partition
- Expected at the top of the plan
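A first pass over the checklist above can be automated by searching the plan text for the operator names it mentions. The sketch below is only a starting point: the function name `flag_operators` is hypothetical, the notes are condensed from the list above, and it assumes operator names appear verbatim in whatever plan text is piped in. Pushdown checks still need a human read of the scan node.

```shell
# Hypothetical helper: read plan text on stdin and print one note for
# each operator family from the checklist that appears in it.
flag_operators() (
  plan="$(cat)"
  # Each table row is "operator name|note"; -F matches the name literally.
  while IFS='|' read -r op note; do
    if printf '%s\n' "$plan" | grep -qF "$op"; then
      printf '%s: %s\n' "$op" "$note"
    fi
  done <<'EOF'
SortExec|sort operation, expensive for large datasets
SortPreservingMergeExec|sort operation, expensive for large datasets
HashJoinExec|hash join, needs memory for the build side
SortMergeJoinExec|merge join
RepartitionExec|data shuffling between partitions
CoalescePartitionsExec|merging partitions back to a single partition
EOF
)
```

Usage: pipe the output of any of the Step 4 commands into `flag_operators`.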

For EXPLAIN ANALYZE, additionally check:

Row counts at each stage → identify data amplification or reduction
Execution time per operator → find the bottleneck
Memory usage → identify memory-intensive operations
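Row counts per stage can be pulled out of the EXPLAIN ANALYZE text for a quick comparison. This sketch assumes each operator line carries its metrics in the form `metrics=[output_rows=N, ...]` and that operator names end in `Exec`; verify both against the output of your DataFusion version. The function name `rows_per_operator` is hypothetical.

```shell
# Hypothetical helper: for every input line that has an output_rows
# metric, print "<rows> <first operator name on that line>".
rows_per_operator() {
  awk '
    match($0, /output_rows=[0-9]+/) {
      # "output_rows=" is 12 characters long; keep only the digits.
      rows = substr($0, RSTART + 12, RLENGTH - 12)
      if (match($0, /[A-Za-z]+Exec/)) {
        print rows, substr($0, RSTART, RLENGTH)
      }
    }'
}
```

Piping the result through `sort -rn` lists the operators that emit the most rows first, which helps spot where data is amplified or reduced.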

Step 6 — Present findings

Structure the analysis as:

Query Plan Summary

Brief description of what the plan does.

Plan Visualization

Present the plan as an indented tree (already DataFusion's default output format).

Performance Analysis

Bottlenecks: Operations that are likely slowest
Optimizations applied: Filter pushdown, projection pushdown, etc.
Opportunities: Suggestions for improving performance

Recommendations

Actionable suggestions, such as:

Sort data on frequently filtered or joined columns (DataFusion itself does not manage indexes)
Rewrite the query to enable better pushdown
Adjust DataFusion configuration options
Use partitioned data layout

Step 7 — Suggest configuration tuning

If relevant, suggest DataFusion configuration changes:

-- Increase target partitions for more parallelism
SET datafusion.execution.target_partitions = 8;

-- Increase batch size for throughput
SET datafusion.execution.batch_size = 16384;

-- Enable/disable optimizations
SET datafusion.optimizer.enable_round_robin_repartition = true;

To explore these settings, try /datafusion-skills:datafusion-docs configuration options.
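To see whether a setting actually changes the plan, the SET statements and the EXPLAIN can be sent in the same session. The sketch below only builds the statement text; the function name `tuned_explain` is hypothetical, and it assumes datafusion-cli executes several semicolon-separated statements from one `-c` string in order (writing them to a file and passing `--file` is an alternative if it does not).

```shell
# Hypothetical helper: emit one SET statement per key=value argument,
# followed by an EXPLAIN ANALYZE of the query given as $1.
tuned_explain() (
  sql="$1"
  shift
  for kv in "$@"; do
    # ${kv%%=*} is the text before the first '=', ${kv#*=} the text after.
    printf 'SET %s = %s;\n' "${kv%%=*}" "${kv#*=}"
  done
  printf 'EXPLAIN ANALYZE %s;\n' "$sql"
)
```

For example, `datafusion-cli -c "$(tuned_explain "$SQL" datafusion.execution.target_partitions=8)"` can be compared against a run with a different value.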

related-skills.json

Same repository

create-table.md

from "datafusion-contrib/datafusion-skills"

Register a data file as a persistent external table in the DataFusion session. Supports Parquet, CSV, JSON, Arrow IPC, and Avro files. Explores the schema and writes to the session state file for reuse across skills.

2026-03-21 · 12 stars

datafusion-docs.md

from "datafusion-contrib/datafusion-skills"

Search Apache DataFusion documentation, user guide, and API reference. Returns relevant documentation for a question or keyword. Searches the official DataFusion repository and website.

2026-03-21 · 12 stars

install-datafusion.md

from "datafusion-contrib/datafusion-skills"

Install or update datafusion-cli. Supports installation via cargo install, Homebrew, or pre-built binaries. Checks the current version and offers to upgrade if outdated.

2026-03-21 · 12 stars

materialized-view.md

from "datafusion-contrib/datafusion-skills"

Create and manage materialized views using DataFusion. Persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes. Powered by datafusion-cli's COPY TO.

2026-03-21 · 12 stars

query.md

from "datafusion-contrib/datafusion-skills"

Run SQL queries against registered tables or ad-hoc against files using datafusion-cli. Accepts raw SQL or natural language questions. Supports Parquet, CSV, JSON, and Arrow IPC files.

2026-03-21 · 12 stars

read-file.md

from "datafusion-contrib/datafusion-skills"

Read and explore data files (Parquet, CSV, JSON, Arrow IPC, Avro) locally or from S3/GCS. Auto-detects format by extension. Uses datafusion-cli for schema inspection and data preview.

2026-03-21 · 12 stars

package.json

"author": "datafusion-contrib"

"repository": "datafusion-contrib/datafusion-skills"
