create-table
Register a data file as a persistent external table in the DataFusion session. Supports Parquet, CSV, JSON, Arrow IPC, and Avro files. Explores the schema and writes to the session state file for reuse across skills.
Search Apache DataFusion documentation, user guide, and API reference. Returns relevant documentation for a question or keyword. Searches the official DataFusion repository and website.
Visualize and analyze DataFusion query execution plans. Shows logical and physical plans, identifies performance bottlenecks, and suggests optimizations. Supports EXPLAIN and EXPLAIN ANALYZE.
Install or update datafusion-cli. Supports installation via cargo install, Homebrew, or pre-built binaries. Checks the current version and offers to upgrade if outdated.
Create and manage materialized views using DataFusion. Persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes. Powered by datafusion-cli's COPY TO.
Run SQL queries against registered tables or ad-hoc against files using datafusion-cli. Accepts raw SQL or natural language questions. Supports Parquet, CSV, JSON, and Arrow IPC files.
Read and explore data files (Parquet, CSV, JSON, Arrow IPC, Avro) locally or from S3/GCS. Auto-detects format by extension. Uses datafusion-cli for schema inspection and data preview.
| Field | Value |
|---|---|
| name | create-table |
| description | Register a data file as a persistent external table in the DataFusion session. Supports Parquet, CSV, JSON, Arrow IPC, and Avro files. Explores the schema and writes to the session state file for reuse across skills. |
| argument-hint | `<path-to-file> [--name table_name] [--format csv\|parquet\|json\|arrow\|avro]` |
| allowed-tools | Bash |
You are helping the user register a data file as a persistent table in their DataFusion session.
File path given: $0
Additional arguments: ${1:-}
Follow these steps in order.
If `$0` is a relative path, resolve it to an absolute path:
```bash
RESOLVED_PATH="$(cd "$(dirname "$0")" 2>/dev/null && pwd)/$(basename "$0")"
```
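The one-liner above turns a path whose parent directory does not exist into `/<basename>`, and it would do the same to a remote URL such as `s3://bucket/x.parquet`. A slightly more defensive sketch follows. The helper name `resolve_path` is hypothetical, and the set of schemes passed through unchanged (`s3://`, `gs://`, `http://`, `https://`) is an assumption, not something the skill defines.

```bash
# Hypothetical helper: print an absolute path for a local file or directory,
# or echo a remote URL back unchanged (scheme list is an assumption).
resolve_path() {
  local parent
  case "$1" in
    s3://*|gs://*|http://*|https://*)
      printf '%s\n' "$1"   # remote location: nothing to resolve locally
      return 0
      ;;
  esac
  # Resolve the parent directory in a subshell; fail instead of printing "/<name>".
  parent="$(CDPATH= cd "$(dirname "$1")" 2>/dev/null && pwd)" || {
    echo "resolve_path: parent directory of '$1' not found" >&2
    return 1
  }
  # ${parent%/} avoids a doubled slash when the parent is "/".
  printf '%s/%s\n' "${parent%/}" "$(basename "$1")"
}
```

Usage: `RESOLVED_PATH="$(resolve_path "$0")" || exit 1`. Returning a non-zero status lets the caller stop before a bogus path reaches `CREATE EXTERNAL TABLE`.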
Check that the file exists (for local files):
```bash
test -f "$RESOLVED_PATH" || test -d "$RESOLVED_PATH"
```
For directories (partitioned data), use the directory path as-is.
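Since the existence check only applies to local files, a sketch that skips it for remote locations could look like this. `check_local_path` is a hypothetical name, and the list of remote schemes is an assumption. For remote locations, any errors are expected to surface later, when datafusion-cli tries to read them.

```bash
# Hypothetical helper: succeed if a local file or directory exists.
# Remote URLs (assumed schemes) are not checked here.
check_local_path() {
  case "$1" in
    s3://*|gs://*|http://*|https://*) return 0 ;;
  esac
  if [ -f "$1" ] || [ -d "$1" ]; then
    return 0
  fi
  echo "check_local_path: '$1' does not exist" >&2
  return 1
}
```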
Check that `datafusion-cli` is installed:
```bash
command -v datafusion-cli
```
If it is not found, delegate to `/datafusion-skills:install-datafusion`.
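As a reusable check, here is a minimal sketch that also reports the installed version. The helper name `require_datafusion_cli` is hypothetical, and it assumes the installed build supports the standard `--version` flag.

```bash
# Hypothetical helper: print the datafusion-cli version if it is on PATH,
# otherwise point the user at the install skill and fail.
require_datafusion_cli() {
  if command -v datafusion-cli >/dev/null 2>&1; then
    datafusion-cli --version
    return 0
  fi
  echo "datafusion-cli not found; run /datafusion-skills:install-datafusion" >&2
  return 1
}
```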
If `--format` was specified, use it. Otherwise, detect the format from the file extension:
| Extension | Format |
|---|---|
| .parquet, .pq | PARQUET |
| .csv, .tsv, .txt | CSV |
| .json, .jsonl, .ndjson | JSON |
| .arrow, .ipc, .feather | ARROW |
| .avro | AVRO |
| directory | PARQUET (default for partitioned data) |
If the extension is unknown, try Parquet first, then CSV.
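The extension table above maps directly onto a shell `case` statement. The sketch below uses a hypothetical `detect_format` helper. Matching extensions case-insensitively is an assumption the table does not state. On an unknown extension the helper fails, which leaves the caller to try Parquet and then CSV.

```bash
# Hypothetical helper mirroring the extension table.
# Prints PARQUET, CSV, JSON, ARROW or AVRO; fails for unknown extensions.
detect_format() {
  local ext
  if [ -d "$1" ]; then
    echo PARQUET            # directories: assume partitioned Parquet
    return 0
  fi
  # Assumption: compare extensions case-insensitively.
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    parquet|pq)        echo PARQUET ;;
    csv|tsv|txt)       echo CSV ;;
    json|jsonl|ndjson) echo JSON ;;
    arrow|ipc|feather) echo ARROW ;;
    avro)              echo AVRO ;;
    *)                 return 1 ;;  # unknown: caller falls back to Parquet, then CSV
  esac
}
```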
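If `--name` was specified, use it. Otherwise, derive the name from the filename: drop the extension, lowercase it, and replace spaces, hyphens, and other non-alphanumeric characters with underscores.
Example: `My-Data File.parquet` → `my_data_file`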
Confirm the name with the user.
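One way to implement this derivation in shell is the hypothetical `derive_table_name` helper below. It collapses repeated underscores and trims leading and trailing ones. Prefixing names that start with a digit with `t_` is an extra assumption, meant to keep the result a plain unquoted identifier.

```bash
# Hypothetical helper: derive a SQL-friendly table name from a file or directory path.
derive_table_name() {
  local base name
  base="$(basename "$1")"
  base="${base%.*}"                        # drop the last extension, if any
  name="$(printf '%s' "$base" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9_]/_/g' -e 's/__*/_/g' -e 's/^_//' -e 's/_$//')"
  case "$name" in
    [0-9]*) name="t_$name" ;;              # assumption: avoid a leading digit
  esac
  printf '%s\n' "$name"
}
```

For example, `derive_table_name "My-Data File.parquet"` prints `my_data_file`, matching the example above.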
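Look for an existing state file, first in the project directory and then in the per-project directory under `$HOME` (if both exist, the second check wins):
```bash
STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
```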
If no state directory exists, ask the user where to store state (same as other skills):
- In the project directory (`.datafusion-skills/`)
- In your home directory (`~/.datafusion-skills/<project-id>/`)
Set `STATE_DIR` to the chosen location, then create it with an empty state file:
```bash
mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"
```
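The user's choice and the directory setup can be combined in a minimal sketch. It uses the same `PROJECT_ROOT` and `PROJECT_ID` rules as the lookup above. The helper name `ensure_state_dir` and its `project`/`home` argument values are hypothetical.

```bash
# Hypothetical helper: create the state directory for the chosen location
# ("project" or "home"), ensure state.sql exists, and print the directory.
ensure_state_dir() {
  local root id dir
  root="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
  id="$(echo "$root" | tr '/' '-')"
  case "$1" in
    project) dir=".datafusion-skills" ;;
    home)    dir="$HOME/.datafusion-skills/$id" ;;
    *) echo "usage: ensure_state_dir project|home" >&2; return 1 ;;
  esac
  mkdir -p "$dir" && touch "$dir/state.sql" && printf '%s\n' "$dir"
}
```

Usage: `STATE_DIR="$(ensure_state_dir home)"`.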
Build the `CREATE EXTERNAL TABLE` statement:

For Parquet:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS PARQUET LOCATION '<RESOLVED_PATH>';
```
For CSV:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS CSV LOCATION '<RESOLVED_PATH>' OPTIONS ('has_header' 'true');
```
For JSON:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS JSON LOCATION '<RESOLVED_PATH>';
```
For Arrow IPC:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS ARROW LOCATION '<RESOLVED_PATH>';
```
For Avro:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS AVRO LOCATION '<RESOLVED_PATH>';
```
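The five templates differ only in the `STORED AS` keyword and the CSV header option, so one hypothetical helper can emit any of them. Doubling single quotes in the path, so that a name like `it's.csv` stays a valid SQL string literal, is an addition not covered by the templates.

```bash
# Hypothetical helper: print the CREATE EXTERNAL TABLE statement for one table.
# Arguments: table name, format (PARQUET|CSV|JSON|ARROW|AVRO), resolved path.
build_create_statement() {
  local name="$1" format="$2" location
  # Double any single quotes so the path is a valid SQL string literal.
  location="$(printf '%s' "$3" | sed "s/'/''/g")"
  case "$format" in
    CSV)
      printf "CREATE EXTERNAL TABLE IF NOT EXISTS %s STORED AS CSV LOCATION '%s' OPTIONS ('has_header' 'true');\n" \
        "$name" "$location" ;;
    PARQUET|JSON|ARROW|AVRO)
      printf "CREATE EXTERNAL TABLE IF NOT EXISTS %s STORED AS %s LOCATION '%s';\n" \
        "$name" "$format" "$location" ;;
    *)
      echo "build_create_statement: unsupported format '$format'" >&2
      return 1 ;;
  esac
}
```

Usage, with hypothetical variable names: `CREATE_STATEMENT="$(build_create_statement "$TABLE_NAME" "$FORMAT" "$RESOLVED_PATH")"`.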
Test it:
```bash
datafusion-cli --file "$STATE_DIR/state.sql" -c "
<CREATE_STATEMENT>
DESCRIBE <table_name>;
SELECT COUNT(*) AS row_count FROM <table_name>;
SELECT * FROM <table_name> LIMIT 5;
"
```
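A hypothetical wrapper around the same invocation makes the pass/fail outcome explicit, so a table that fails the smoke test is never appended to the state file. The name `test_table` and its argument order are assumptions. It passes `--file` and `-c` exactly as the skill's own test command does.

```bash
# Hypothetical helper: run the skill's smoke test for one table.
# Arguments: state file, CREATE statement, table name.
test_table() {
  local state="$1" create="$2" name="$3"
  # Same flags as the skill's test command: the saved state file plus the new statements.
  if ! datafusion-cli --file "$state" -c "$create
DESCRIBE $name;
SELECT COUNT(*) AS row_count FROM $name;
SELECT * FROM $name LIMIT 5;"; then
    echo "test_table: datafusion-cli failed for '$name'; do not append it to $state" >&2
    return 1
  fi
}
```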
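Check whether this table is already in the state file. Match the `-- Table:` comment line written below rather than the bare name, so that `sales` does not match `sales_2024`:
```bash
grep -qF -e "-- Table: <table_name> (" "$STATE_DIR/state.sql" 2>/dev/null
```
If it is not present, append:
```bash
cat >> "$STATE_DIR/state.sql" <<'SQL'
-- Table: <table_name> (<FORMAT> from <RESOLVED_PATH>)
<CREATE_STATEMENT>
SQL
```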
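The check and the append can be combined in a single hypothetical helper. Keying the duplicate check on the exact `-- Table: <name> (` marker means that an earlier `sales_2024` entry does not block registering `sales`. The helper name and argument order are assumptions.

```bash
# Hypothetical helper: append a table definition to the state file unless a
# "-- Table: <name> (" marker for that exact name is already present.
# Arguments: state file, table name, format, resolved path, CREATE statement.
append_table_to_state() {
  local state="$1" name="$2" format="$3" path="$4" create="$5"
  if grep -qF -e "-- Table: $name (" "$state" 2>/dev/null; then
    echo "Table '$name' is already registered in $state" >&2
    return 0
  fi
  # "--" stops printf from treating the leading "-" of the format as an option.
  printf -- '-- Table: %s (%s from %s)\n%s\n' "$name" "$format" "$path" "$create" >> "$state"
}
```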
Summarize for the user:

`<table_name>`: this table is now available in all `/datafusion-skills:query` sessions. Try:
`/datafusion-skills:query SELECT * FROM <table_name> LIMIT 10`