datafusion-docs
Search Apache DataFusion documentation, user guide, and API reference. Returns relevant documentation for a question or keyword. Searches the official DataFusion repository and website.
| Field | Value |
|---|---|
| name | datafusion-docs |
| description | Search Apache DataFusion documentation, user guide, and API reference. Returns relevant documentation for a question or keyword. Searches the official DataFusion repository and website. |
| argument-hint | <question or keyword> |
| allowed-tools | Bash |
You are helping the user find relevant Apache DataFusion documentation.
Query: $@
Follow these steps in order.
If the input is a natural language question (e.g. "how do I create an external table"), extract the key technical terms: nouns, function names, SQL keywords. Drop stop words.
If the input is already a function name or technical term (e.g. APPROX_PERCENTILE_CONT, CREATE EXTERNAL TABLE), use it as-is.
Use the extracted terms as SEARCH_QUERY in the next steps.
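The term extraction described above can be sketched in shell. This is a minimal illustration, not part of the skill itself: the helper name `extract_terms` and the stop-word list are assumptions chosen for the example.

```shell
# Hypothetical helper: reduce a natural-language question to search keywords.
# The stop-word list is an illustrative assumption; extend it as needed.
extract_terms() {
  stop_words=" a an the how do does i to in of for is are can what with on from my "
  out=""
  # Replace punctuation with spaces (underscores are kept so that names such as
  # APPROX_PERCENTILE_CONT survive), then split on whitespace.
  for word in $(printf '%s' "$1" | tr -c '[:alnum:]_\n' ' '); do
    lower=$(printf '%s' "$word" | tr '[:upper:]' '[:lower:]')
    case "$stop_words" in
      *" $lower "*) ;;                 # drop stop words
      *) out="${out:+$out }$word" ;;   # keep the word with its original casing
    esac
  done
  printf '%s\n' "$out"
}

SEARCH_QUERY=$(extract_terms "how do I create an external table")
echo "$SEARCH_QUERY"
```

A single technical term passes through unchanged, so the same helper can be applied to both kinds of input.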
The DataFusion user guide is in the GitHub repo under docs/. Search it using gh:
Important: Do NOT quote multi-word search terms as a single string. Pass each word
as a separate token so gh search code matches broadly. For example, use
EXTERNAL TABLE not "EXTERNAL TABLE".
gh search code $SEARCH_QUERY --repo apache/datafusion --language markdown --limit 10
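The quoting rule above comes down to shell word splitting: an unquoted variable expands to one argument per word, while a quoted one stays a single argument. The small demonstration below uses a hypothetical `count_args` function in place of gh so that it runs without network access.

```shell
# Hypothetical stand-in for a command: reports how many arguments it received.
count_args() { echo "$#"; }

SEARCH_QUERY="EXTERNAL TABLE"

count_args $SEARCH_QUERY      # unquoted: split into two separate tokens
count_args "$SEARCH_QUERY"    # quoted: passed as one multi-word string
```

The first call reports two arguments and the second reports one, which is why the search command above leaves `$SEARCH_QUERY` unquoted.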
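If the gh search code subcommand is not available, fall back to the GitHub search API through gh api (join multi-word terms with + so the URL stays valid):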
gh api "search/code?q=$SEARCH_QUERY+repo:apache/datafusion+extension:md&per_page=10" --jq '.items[:10][] | "\(.path)"'
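Because the API call embeds the query in a URL, multi-word terms need to be joined with `+` rather than spaces. A minimal sketch, assuming a hypothetical helper named `join_terms` and terms that contain no characters needing percent-encoding:

```shell
# Hypothetical helper: join space-separated terms with "+" for the q= parameter.
# Assumes the terms contain only characters that are safe in a URL.
join_terms() {
  printf '%s' "$1" | tr -s ' ' '+'
}

SEARCH_QUERY="EXTERNAL TABLE"
echo "search/code?q=$(join_terms "$SEARCH_QUERY")+repo:apache/datafusion+extension:md&per_page=10"
```

The printed string can then be passed to `gh api` in place of the inline URL.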
DataFusion's built-in functions are documented in docs/source/user-guide/sql/. Check specifically:
gh search code $SEARCH_QUERY --repo apache/datafusion --language markdown --limit 5 -- path:docs/source/user-guide/sql/
Also list the available SQL doc files so you can fetch the most relevant one directly:
gh api "repos/apache/datafusion/contents/docs/source/user-guide/sql" --jq '.[].name' 2>/dev/null
If the query is about API usage or implementation patterns, search Rust source code:
gh search code $SEARCH_QUERY --repo apache/datafusion --language rust --limit 5
For the most relevant results (top 2-3), fetch the actual content:
gh api "repos/apache/datafusion/contents/<path>" --jq '.content' | base64 -d
If the file is too large, fetch just the relevant section. Look for the search terms in the content and extract the surrounding context (heading + content under that heading).
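One way to pull out just the relevant section is to buffer the file one Markdown section at a time and print only the sections that mention the term. The sketch below assumes a hypothetical helper named `extract_section` that reads the decoded file on standard input. It treats any line starting with `#` and a space as a heading, so a `# comment` inside a code block would also split a section.

```shell
# Hypothetical helper: print each Markdown section (heading plus body) that
# contains the search term anywhere in it, matched case-insensitively.
extract_section() {
  awk -v term="$1" '
    function emit() { if (hit) printf "%s", buf; buf = ""; hit = 0 }
    /^#+ / { emit() }                        # a heading closes the previous section
    {
      buf = buf $0 "\n"                      # accumulate the current section
      if (index(tolower($0), tolower(term))) hit = 1
    }
    END { emit() }                           # flush the final section
  '
}

printf '%s\n' '# DDL' 'intro' '## CREATE EXTERNAL TABLE' 'Registers a file.' \
  '## DROP TABLE' 'Removes it.' | extract_section "external table"
```

In practice the input would come from the `base64 -d` pipeline above rather than from `printf`.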
Organize the results by relevance:
For each result, provide:
If the search didn't find exactly what the user needed:
You can also check the DataFusion user guide at https://datafusion.apache.org/user-guide/ or the API docs at https://docs.rs/datafusion/latest/datafusion/
If the query is about a specific SQL function:
Try running datafusion-cli -c "SELECT * FROM information_schema.df_settings WHERE name LIKE '%<keyword>%';" to see related configuration options.
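Since the keyword is substituted into a SQL string, it helps to build the statement in one place and strip characters that could break the quoting. A minimal sketch, assuming a hypothetical helper named `settings_sql`. The allowed character set (letters, digits, underscore, dot) is an assumption based on typical setting names.

```shell
# Hypothetical helper: build the df_settings lookup for a keyword.
# Characters outside [A-Za-z0-9_.] are removed so a stray quote cannot
# terminate the SQL string literal early.
settings_sql() {
  keyword=$(printf '%s' "$1" | tr -cd '[:alnum:]_.')
  printf "SELECT * FROM information_schema.df_settings WHERE name LIKE '%%%s%%';\n" "$keyword"
}

settings_sql "batch_size"
# To run the lookup: datafusion-cli -c "$(settings_sql batch_size)"
```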
For faster lookups, here are paths to key documentation sections:
| Topic | Path in repo |
|---|---|
| SQL Reference | docs/source/user-guide/sql/ |
| Scalar Functions | docs/source/user-guide/sql/scalar_functions.md |
| Aggregate Functions | docs/source/user-guide/sql/aggregate_functions.md |
| Window Functions | docs/source/user-guide/sql/window_functions.md |
| CREATE EXTERNAL TABLE | docs/source/user-guide/sql/ddl.md |
| Data Types | docs/source/user-guide/sql/data_types.md |
| Configuration | docs/source/user-guide/configs.md |
| Python Bindings | docs/source/user-guide/python/ |
| Library Usage | docs/source/library-user-guide/ |
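The table above can be turned into a small lookup so that common topics skip the search step. This is a sketch only: the function name `doc_path_for` and the keyword patterns are assumptions, while the paths are copied from the table.

```shell
# Hypothetical helper: map a topic keyword to a known documentation path.
# Returns a non-zero status when the topic is not in the table.
doc_path_for() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *aggregate*)        echo "docs/source/user-guide/sql/aggregate_functions.md" ;;
    *window*)           echo "docs/source/user-guide/sql/window_functions.md" ;;
    *scalar*)           echo "docs/source/user-guide/sql/scalar_functions.md" ;;
    *"external table"*) echo "docs/source/user-guide/sql/ddl.md" ;;
    *"data type"*)      echo "docs/source/user-guide/sql/data_types.md" ;;
    *config*)           echo "docs/source/user-guide/configs.md" ;;
    *python*)           echo "docs/source/user-guide/python/" ;;
    *library*)          echo "docs/source/library-user-guide/" ;;
    *)                  return 1 ;;   # unknown topic: fall back to searching
  esac
}

doc_path_for "CREATE EXTERNAL TABLE"
```

When the helper returns a path, it can be passed straight to the content-fetching command. Otherwise the search steps apply as usual.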