read-file

Name: Read File
Author: datafusion-contrib

// Read and explore data files (Parquet, CSV, JSON, Arrow IPC, Avro) locally or from S3/GCS. Auto-detects format by extension. Uses datafusion-cli for schema inspection and data preview.

stars:12

forks:0

updated:2026-03-21 07:59

SKILL.md

name	read-file
description	Read and explore data files (Parquet, CSV, JSON, Arrow IPC, Avro) locally or from S3/GCS. Auto-detects format by extension. Uses datafusion-cli for schema inspection and data preview.
argument-hint	<filename or URL> [question about the data]
allowed-tools	Bash

You are helping the user read and analyze a data file using Apache DataFusion.

Filename given: $0
Question: ${1:-describe the data}

Follow these steps in order, stopping and reporting clearly if any step fails.

Step 1 — Classify and resolve the path

Determine whether the input is local or remote:

S3 URI (s3://...) → remote
GCS URI (gs://...) → remote
HTTPS/HTTP URL → remote (DataFusion supports HTTP via object_store)
Otherwise → local file
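
The classification above can be sketched as a single case statement. This is a minimal illustration, not part of the skill itself; the function name classify_input is hypothetical.

```shell
# Hypothetical helper: prints "remote" or "local" for a given input string.
classify_input() {
  case "$1" in
    s3://*|gs://*|http://*|https://*) echo remote ;;
    *) echo local ;;
  esac
}

classify_input "s3://bucket/data.parquet"   # prints: remote
classify_input "sales.csv"                  # prints: local
```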

Local files

find "$PWD" -name "$0" -not -path '*/.git/*' 2>/dev/null

Zero results → tell the user the file was not found and stop.
More than one result → list all matches, ask the user to re-run with a fuller path, and stop.
Exactly one result → use that full path (RESOLVED_PATH).
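
One way to turn the three outcomes above into a script is sketched below. The function name resolve_local and its exit codes are assumptions made for illustration, and the match counting assumes filenames contain no newlines.

```shell
# Hypothetical helper: resolves a bare filename under a root directory.
# Exit status: 0 = exactly one match (printed), 1 = no match, 2 = several (all printed).
resolve_local() {
  rl_name=$1
  rl_root=${2:-$PWD}
  rl_matches=$(find "$rl_root" -name "$rl_name" -not -path '*/.git/*' 2>/dev/null)
  if [ -z "$rl_matches" ]; then
    echo "File not found: $rl_name" >&2
    return 1
  fi
  # tr strips the padding some wc implementations add around the count.
  rl_count=$(printf '%s\n' "$rl_matches" | wc -l | tr -d ' ')
  if [ "$rl_count" -gt 1 ]; then
    echo "Multiple matches; re-run with a fuller path:" >&2
    printf '%s\n' "$rl_matches"
    return 2
  fi
  printf '%s\n' "$rl_matches"
}

# Usage (nothing runs until called):
#   RESOLVED_PATH=$(resolve_local "sales.parquet") || exit 1
```

Note that find -name matches only the final path component, so an input that already contains a slash would need to be checked directly instead.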

Remote files

Use the URI/URL as-is for RESOLVED_PATH.

For S3 access, DataFusion uses environment variables:

AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION
Or AWS_PROFILE for profile-based credentials

Check if credentials are available:

test -n "$AWS_ACCESS_KEY_ID" || test -n "$AWS_PROFILE" || test -f "$HOME/.aws/credentials"

If not available, inform the user they need to configure AWS credentials.
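
A small wrapper shows how the check above might be wired to that message. The function name have_aws_credentials is hypothetical, and the wording of the message is only a suggestion.

```shell
# Hypothetical wrapper around the check above; returns non-zero when nothing is configured.
have_aws_credentials() {
  test -n "${AWS_ACCESS_KEY_ID:-}" || test -n "${AWS_PROFILE:-}" || test -f "$HOME/.aws/credentials"
}

# Only S3 URIs need the AWS check; other inputs fall through silently.
case "${RESOLVED_PATH:-}" in
  s3://*)
    if ! have_aws_credentials; then
      echo "No AWS credentials found: set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or AWS_PROFILE." >&2
    fi
    ;;
esac
```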

Step 2 — Check datafusion-cli is installed

command -v datafusion-cli

If not found, delegate to /datafusion-skills:install-datafusion and then continue.
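
As a sketch, the check can be wrapped so that later steps stop early with a clear hint. The function name require_datafusion_cli is hypothetical.

```shell
# Returns 0 when datafusion-cli is on PATH; otherwise prints a hint and returns 1.
require_datafusion_cli() {
  if command -v datafusion-cli >/dev/null 2>&1; then
    return 0
  fi
  echo "datafusion-cli not found; run /datafusion-skills:install-datafusion first." >&2
  return 1
}
```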

Step 3 — Detect file format and read

Detect format from extension:

Extension	Format	DataFusion support
`.parquet`, `.pq`	Parquet	Direct query: `SELECT * FROM 'file.parquet'`
`.csv`, `.tsv`, `.txt`	CSV	Direct query: `SELECT * FROM 'file.csv'`
`.json`, `.jsonl`, `.ndjson`	JSON	Direct query: `SELECT * FROM 'file.json'`
`.arrow`, `.ipc`, `.feather`	Arrow IPC	`CREATE EXTERNAL TABLE` with `STORED AS ARROW`
`.avro`	Avro	`CREATE EXTERNAL TABLE` with `STORED AS AVRO`
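
The table above maps naturally onto a case statement. The sketch below is illustrative only: the function name detect_format is hypothetical, and lowercasing the input first is an added assumption so that upper-case extensions are treated the same way. URLs with a query string after the extension are not handled.

```shell
# Hypothetical helper: prints the format keyword for a path, or "unknown".
detect_format() {
  # Lowercase the input so ".PARQUET" and ".parquet" are treated alike.
  df_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$df_lower" in
    *.parquet|*.pq)            echo parquet ;;
    *.csv|*.tsv|*.txt)         echo csv ;;
    *.json|*.jsonl|*.ndjson)   echo json ;;
    *.arrow|*.ipc|*.feather)   echo arrow ;;
    *.avro)                    echo avro ;;
    *)                         echo unknown ;;
  esac
}
```

The parquet, csv, and json results correspond to the direct-query path; arrow and avro correspond to the CREATE EXTERNAL TABLE path.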

Important: datafusion-cli -c only accepts one SQL statement per flag. Use multiple -c flags for multiple statements, or write a .sql file and use --file.

For Parquet, CSV, and JSON files (direct query):

DataFusion v44+ supports direct queries on Parquet, CSV, and JSON files by path:

datafusion-cli -c "DESCRIBE 'RESOLVED_PATH';"

datafusion-cli -c "SELECT COUNT(*) AS row_count FROM 'RESOLVED_PATH';"

datafusion-cli -c "SELECT * FROM 'RESOLVED_PATH' LIMIT 10;"
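
Following the note about one statement per -c flag, the three commands above could also be issued in a single invocation with repeated -c flags. This is a sketch; the function name preview_direct is hypothetical, and the path is interpolated as-is, so it assumes the path contains no single quotes.

```shell
# Hypothetical helper: schema, row count, and a 10-row sample in one invocation.
# Wrapped in a function so nothing runs until it is called.
preview_direct() {
  datafusion-cli \
    -c "DESCRIBE '$1';" \
    -c "SELECT COUNT(*) AS row_count FROM '$1';" \
    -c "SELECT * FROM '$1' LIMIT 10;"
}

# Usage: preview_direct "$RESOLVED_PATH"
```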

For CSV files with non-standard delimiters or no header, fall back to CREATE EXTERNAL TABLE using a .sql file:

cat > /tmp/_df_preview.sql << 'SQL'
CREATE EXTERNAL TABLE _preview STORED AS CSV LOCATION 'RESOLVED_PATH' OPTIONS ('has_header' 'false', 'delimiter' '\t');
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
datafusion-cli --file /tmp/_df_preview.sql

For Arrow IPC files:

cat > /tmp/_df_preview.sql << 'SQL'
CREATE EXTERNAL TABLE _preview STORED AS ARROW LOCATION 'RESOLVED_PATH';
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
datafusion-cli --file /tmp/_df_preview.sql

For Avro files:

cat > /tmp/_df_preview.sql << 'SQL'
CREATE EXTERNAL TABLE _preview STORED AS AVRO LOCATION 'RESOLVED_PATH';
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
datafusion-cli --file /tmp/_df_preview.sql
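
The Arrow IPC and Avro scripts above differ only in the STORED AS keyword, so they could be generated by one helper. The sketch below is an illustration under stated assumptions: the names preview_sql and run_preview are hypothetical, and it swaps the fixed /tmp/_df_preview.sql for a mktemp file so concurrent runs do not overwrite each other.

```shell
# Hypothetical helper: prints the preview SQL.
# $1 = format keyword (ARROW or AVRO), $2 = resolved path.
preview_sql() {
  # Double any single quotes so the path stays a valid SQL string literal.
  ps_escaped=$(printf '%s' "$2" | sed "s/'/''/g")
  cat << SQL
CREATE EXTERNAL TABLE _preview STORED AS $1 LOCATION '$ps_escaped';
DESCRIBE _preview;
SELECT COUNT(*) AS row_count FROM _preview;
SELECT * FROM _preview LIMIT 10;
SQL
}

# Writes the SQL to a temporary file, runs it, and cleans up.
run_preview() {
  rp_file=$(mktemp) || return 1
  preview_sql "$1" "$2" > "$rp_file"
  datafusion-cli --file "$rp_file"
  rp_status=$?
  rm -f "$rp_file"
  return "$rp_status"
}

# Usage: run_preview AVRO "$RESOLVED_PATH"
```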

Unknown format

If the extension doesn't match any known format:

Try Parquet first (most common in data engineering)
Then try CSV with auto-detection
Report the error and suggest the user specify the format
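
The fallback order above might be sketched as a loop that forces each format in turn. This is only an illustration: the function name try_unknown_format and the table name _probe are hypothetical, and it assumes datafusion-cli exits with a non-zero status when a -c statement fails, which should be verified against the installed version.

```shell
# Hypothetical fallback: try Parquet, then CSV, by forcing the format explicitly.
try_unknown_format() {
  for tu_fmt in PARQUET CSV; do
    # Assumption: a failed statement makes datafusion-cli exit non-zero.
    if datafusion-cli \
        -c "CREATE EXTERNAL TABLE _probe STORED AS $tu_fmt LOCATION '$1';" \
        -c "SELECT * FROM _probe LIMIT 10;" 2>/dev/null; then
      echo "Read as $tu_fmt" >&2
      return 0
    fi
  done
  echo "Could not read '$1' as Parquet or CSV; please specify the format." >&2
  return 1
}
```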

Step 4 — Handle errors

datafusion-cli: command not found → invoke /datafusion-skills:install-datafusion and retry
File not found → double-check the path and suggest using an absolute path
Parse error on CSV → try different options: OPTIONS ('has_header' 'false'), or OPTIONS ('delimiter' '\t') for TSV
S3 access denied → remind user to configure AWS credentials
Persistent error → use /datafusion-skills:datafusion-docs <error keywords> for help

Step 5 — Answer the question

Using the schema, row count, and sample rows gathered above, answer:

${1:-describe the data: summarize column types, row count, and any notable patterns.}

Be concise but thorough — mention:

Number of columns and their types
Row count
Any notable patterns in the sample (nulls, date ranges, value distributions)

Step 6 — Suggest next steps

After answering, suggest relevant follow-ups:

To query this data further — filter, aggregate, join — use /datafusion-skills:query.

If the file is useful for repeated access:

To register this as a persistent table, run /datafusion-skills:create-table RESOLVED_PATH.

If the data is large and the user might want to materialize a summary:

To persist a summary as a Parquet file, try /datafusion-skills:materialized-view.

Keep suggestions brief and show them only once.

Cross-skill integration

Query follow-ups: Suggest /datafusion-skills:query for further exploration
Table registration: Suggest /datafusion-skills:create-table for persistent access
Error troubleshooting: Use /datafusion-skills:datafusion-docs for unclear errors

related-skills.json

Same repository

create-table.md

from "datafusion-contrib/datafusion-skills"

Register a data file as a persistent external table in the DataFusion session. Supports Parquet, CSV, JSON, Arrow IPC, and Avro files. Explores the schema and writes to the session state file for reuse across skills.

2026-03-21 · 12

datafusion-docs.md

from "datafusion-contrib/datafusion-skills"

Search Apache DataFusion documentation, user guide, and API reference. Returns relevant documentation for a question or keyword. Searches the official DataFusion repository and website.

2026-03-21 · 12

explain-plan.md

from "datafusion-contrib/datafusion-skills"

Visualize and analyze DataFusion query execution plans. Shows logical and physical plans, identifies performance bottlenecks, and suggests optimizations. Supports EXPLAIN and EXPLAIN ANALYZE.

2026-03-21 · 12

install-datafusion.md

from "datafusion-contrib/datafusion-skills"

Install or update datafusion-cli. Supports installation via cargo install, Homebrew, or pre-built binaries. Checks the current version and offers to upgrade if outdated.

2026-03-21 · 12

materialized-view.md

from "datafusion-contrib/datafusion-skills"

Create and manage materialized views using DataFusion. Persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes. Powered by datafusion-cli's COPY TO.

2026-03-21 · 12

query.md

from "datafusion-contrib/datafusion-skills"

Run SQL queries against registered tables or ad-hoc against files using datafusion-cli. Accepts raw SQL or natural language questions. Supports Parquet, CSV, JSON, and Arrow IPC files.

2026-03-21 · 12

package.json

"author": "datafusion-contrib"

"repository": "datafusion-contrib/datafusion-skills"
