read-file
// Read and explore data files (Parquet, CSV, JSON, Arrow IPC, Avro) locally or from S3/GCS. Auto-detects format by extension. Uses datafusion-cli for schema inspection and data preview.
| name | read-file |
| description | Read and explore data files (Parquet, CSV, JSON, Arrow IPC, Avro) locally or from S3/GCS. Auto-detects format by extension. Uses datafusion-cli for schema inspection and data preview. |
| argument-hint | <filename or URL> [question about the data] |
| allowed-tools | Bash |
You are helping the user read and analyze a data file using Apache DataFusion.
Filename given: $0
Question: ${1:-describe the data}
Follow these steps in order, stopping and reporting clearly if any step fails.
Determine whether the input is local or remote:
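- S3 URI (`s3://...`) → remote
- GCS URI (`gs://...`) → remote
- Anything else → local file. Locate it with `find "$PWD" -name "$0" -not -path '*/.git/*' 2>/dev/null` and use the match as `RESOLVED_PATH`.

For remote files, use the URI/URL as-is for `RESOLVED_PATH`.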
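The local/remote decision above could be sketched as a small shell function. This is only an illustration: the name `resolve_input` is hypothetical, it assumes that only `s3://` and `gs://` count as remote, and it keeps the first match when `find` returns several.

```shell
# Hypothetical helper: classify the skill argument and resolve it to a path.
resolve_input() {
  case "$1" in
    s3://*|gs://*)
      # Remote object: use the URI as-is.
      printf '%s\n' "$1"
      ;;
    *)
      # Local file: search under the current directory, skipping .git,
      # and keep only the first match (an assumption of this sketch).
      find "$PWD" -name "$1" -not -path '*/.git/*' 2>/dev/null | head -n 1
      ;;
  esac
}

# Example: RESOLVED_PATH=$(resolve_input "sales.parquet")
```

An empty result for a local name means nothing was found, which is a natural point to stop and report.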
For S3 access, DataFusion uses environment variables:
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`
- `AWS_PROFILE` for profile-based credentials

Check if credentials are available:
test -n "$AWS_ACCESS_KEY_ID" || test -n "$AWS_PROFILE" || test -f "$HOME/.aws/credentials"
If not available, inform the user they need to configure AWS credentials.
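One way to wire this check into the flow is to guard only `s3://` paths and print a message when nothing is configured. The function names below are hypothetical, and the sketch only tests that a credential source exists, not that the credentials are valid.

```shell
# Hypothetical helper: succeed if any common AWS credential source is present.
have_aws_credentials() {
  test -n "${AWS_ACCESS_KEY_ID:-}" ||
    test -n "${AWS_PROFILE:-}" ||
    test -f "$HOME/.aws/credentials"
}

# Hypothetical helper: fail with a message if an S3 path has no credentials.
# Non-S3 paths always pass.
check_s3_access() {
  case "$1" in
    s3://*)
      if ! have_aws_credentials; then
        echo "AWS credentials not found: set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, set AWS_PROFILE, or create ~/.aws/credentials" >&2
        return 1
      fi
      ;;
  esac
  return 0
}

# Example: check_s3_access "$RESOLVED_PATH" || exit 1
```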
Check that datafusion-cli is installed:
command -v datafusion-cli
If not found, delegate to /datafusion-skills:install-datafusion and then continue.
Detect format from extension:
| Extension | Format | DataFusion support |
|---|---|---|
| `.parquet`, `.pq` | Parquet | Direct query: `SELECT * FROM 'file.parquet'` |
| `.csv`, `.tsv`, `.txt` | CSV | Direct query: `SELECT * FROM 'file.csv'` |
| `.json`, `.jsonl`, `.ndjson` | JSON | Direct query: `SELECT * FROM 'file.json'` |
| `.arrow`, `.ipc`, `.feather` | Arrow IPC | `CREATE EXTERNAL TABLE` with `STORED AS ARROW` |
| `.avro` | Avro | `CREATE EXTERNAL TABLE` with `STORED AS AVRO` |
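The extension table maps directly onto a `case` statement. The sketch below uses a hypothetical function name, `detect_format`, and assumes that matching should be case-insensitive, which the table itself does not state.

```shell
# Hypothetical helper: map a file name or URI to a format label by extension.
detect_format() {
  # Lowercase the name so DATA.CSV and data.csv are treated alike
  # (case-insensitive matching is an assumption of this sketch).
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.parquet|*.pq)          echo parquet ;;
    *.csv|*.tsv|*.txt)       echo csv ;;
    *.json|*.jsonl|*.ndjson) echo json ;;
    *.arrow|*.ipc|*.feather) echo arrow ;;
    *.avro)                  echo avro ;;
    *)                       echo unknown ;;
  esac
}

# Example: FORMAT=$(detect_format "$RESOLVED_PATH")
```

The `parquet`, `csv`, and `json` labels correspond to the direct-query path, while `arrow` and `avro` need `CREATE EXTERNAL TABLE`.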
Important: datafusion-cli -c only accepts one SQL statement per flag. Use multiple
-c flags for multiple statements, or write a .sql file and use --file.
DataFusion v44+ supports direct queries on Parquet, CSV, and JSON files by path:
datafusion-cli -c "DESCRIBE 'RESOLVED_PATH';"
datafusion-cli -c "SELECT COUNT(*) AS row_count FROM 'RESOLVED_PATH';"
datafusion-cli -c "SELECT * FROM 'RESOLVED_PATH' LIMIT 10;"
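Since each `-c` flag takes one statement, the three preview queries can also go through a single invocation with three `-c` flags. The wrapper below is a sketch: `preview_inline` and the `DATAFUSION_CLI` override variable are hypothetical names introduced here (datafusion-cli itself does not read that variable), and the path is assumed not to contain single quotes.

```shell
# Hypothetical helper: run the three preview statements in one invocation,
# passing each statement through its own -c flag.
# DATAFUSION_CLI is a made-up override so the binary can be swapped or stubbed.
preview_inline() {
  path=$1
  "${DATAFUSION_CLI:-datafusion-cli}" \
    -c "DESCRIBE '$path';" \
    -c "SELECT COUNT(*) AS row_count FROM '$path';" \
    -c "SELECT * FROM '$path' LIMIT 10;"
}

# Example: preview_inline "$RESOLVED_PATH"
```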
For CSV files with non-standard delimiters or no header, fall back to CREATE EXTERNAL TABLE
using a .sql file:
cat > /tmp/_df_preview.sql << 'SQL'
CREATE EXTERNAL TABLE _preview STORED AS CSV LOCATION 'RESOLVED_PATH' OPTIONS ('has_header' 'false', 'delimiter' '\t');
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
datafusion-cli --file /tmp/_df_preview.sql
For Arrow IPC files, create an external table with STORED AS ARROW:
cat > /tmp/_df_preview.sql << 'SQL'
CREATE EXTERNAL TABLE _preview STORED AS ARROW LOCATION 'RESOLVED_PATH';
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
datafusion-cli --file /tmp/_df_preview.sql
For Avro files, create an external table with STORED AS AVRO:
cat > /tmp/_df_preview.sql << 'SQL'
CREATE EXTERNAL TABLE _preview STORED AS AVRO LOCATION 'RESOLVED_PATH';
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
datafusion-cli --file /tmp/_df_preview.sql
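The CSV, Arrow IPC, and Avro scripts above differ only in the `STORED AS` keyword and the optional `OPTIONS` clause, so they could be generated by one function. The sketch below is an illustration with a hypothetical name, `write_preview_sql`; it writes to a `mktemp` file instead of the fixed `/tmp/_df_preview.sql` and assumes the path contains no single quotes.

```shell
# Hypothetical helper: write the preview script for a given STORED AS format
# and print the name of the generated file.
# $1 = format keyword (CSV, ARROW, AVRO)
# $2 = path or URI
# $3 = optional OPTIONS clause, e.g. "OPTIONS ('has_header' 'false')"
write_preview_sql() {
  sql_file=$(mktemp "${TMPDIR:-/tmp}/df_preview.XXXXXX") || return 1
  # Unquoted heredoc delimiter so that $1, $2 and $3 are expanded.
  cat > "$sql_file" << SQL
CREATE EXTERNAL TABLE _preview STORED AS $1 LOCATION '$2' ${3:-};
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
  printf '%s\n' "$sql_file"
}

# Example: datafusion-cli --file "$(write_preview_sql AVRO "$RESOLVED_PATH")"
```

A per-run temporary file avoids clobbering a script left behind by an earlier preview; remove it once the run finishes.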
If the extension doesn't match any known format:

Common errors:
- `datafusion-cli: command not found` → invoke /datafusion-skills:install-datafusion and retry
- CSV parsed with the wrong header or delimiter → retry with `OPTIONS ('has_header' 'false')`, or `OPTIONS ('delimiter' '\t')` for TSV
- Other unclear errors → run /datafusion-skills:datafusion-docs <error keywords> for help

Using the schema, row count, and sample rows gathered above, answer:
${1:-describe the data: summarize column types, row count, and any notable patterns.}
Be concise but thorough — mention:
After answering, suggest relevant follow-ups:
To query this data further — filter, aggregate, join — use
/datafusion-skills:query.
If the file is useful for repeated access:
To register this as a persistent table, run
/datafusion-skills:create-table RESOLVED_PATH.
If the data is large and the user might want to materialize a summary:
To persist a summary as a Parquet file, try
/datafusion-skills:materialized-view.
Keep suggestions brief and show them only once.
- /datafusion-skills:query for further exploration
- /datafusion-skills:create-table for persistent access
- /datafusion-skills:datafusion-docs for unclear errors