| name | materialized-view |
| description | Create and manage materialized views using DataFusion. Persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes. Powered by datafusion-cli's COPY TO.
|
| argument-hint | <"create" SQL|"refresh" name|"status"|"list"> |
| allowed-tools | Bash |
You are helping the user create and manage materialized views using Apache DataFusion.
Input: $@
A materialized view in this context is a SQL query whose results are persisted as a Parquet file, registered as an external table, and tracked for refresh. This is a lightweight, file-based approach — no database server needed.
Step 1 — Check datafusion-cli is installed
command -v datafusion-cli
If not found, delegate to /datafusion-skills:install-datafusion and then continue.
Step 2 — Resolve state directory
Look for an existing state file:
STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
If no state directory exists, create one. Ask the user:
Where would you like to store DataFusion session state?
- In the project directory (
.datafusion-skills/) — colocated with the project
- In your home directory (
~/.datafusion-skills/<project-id>/) — keeps the repo clean
Create the chosen directory and initialize state.sql if it doesn't exist:
mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"
Also create the materialized views directory and manifest:
mkdir -p "$STATE_DIR/views"
test -f "$STATE_DIR/views/manifest.json" || echo '{"views":{}}' > "$STATE_DIR/views/manifest.json"
Step 3 — Parse the command
Determine what the user wants:
create <SQL or natural language> → create a new materialized view
refresh <name> → refresh an existing materialized view
status → show status of all materialized views
list → list all materialized views
drop <name> → remove a materialized view
If the input is just SQL or a natural language description, treat it as create.
Command: CREATE
3a — Generate SQL if needed
If the input is natural language, generate SQL. Use the session state to understand available tables:
datafusion-cli --file "$STATE_DIR/state.sql" -c "SHOW TABLES;" 2>/dev/null
3b — Derive a view name
Generate a descriptive snake_case name from the SQL (e.g., daily_trades_by_symbol). Ask the user to confirm or rename.
3c — Validate the query
Test that the SQL runs successfully:
datafusion-cli --file "$STATE_DIR/state.sql" -c "
$SQL LIMIT 1;
"
If it fails, diagnose the error and fix the SQL.
3d — Materialize as Parquet
datafusion-cli --file "$STATE_DIR/state.sql" -c "
COPY ($SQL) TO '$STATE_DIR/views/<view_name>.parquet' STORED AS PARQUET;
"
3e — Register as external table
Append to the state file so the view is available in future sessions:
cat >> "$STATE_DIR/state.sql" <<SQL
-- Materialized view: <view_name>
-- Source SQL: $SQL
CREATE EXTERNAL TABLE IF NOT EXISTS <view_name> STORED AS PARQUET LOCATION '$STATE_DIR/views/<view_name>.parquet';
SQL
3f — Update the manifest
Update $STATE_DIR/views/manifest.json to track the view:
python3 -c "
import json, datetime
manifest_path = '$STATE_DIR/views/manifest.json'
with open(manifest_path) as f:
manifest = json.load(f)
manifest['views']['<view_name>'] = {
'sql': '''$SQL''',
'parquet_path': '$STATE_DIR/views/<view_name>.parquet',
'created_at': datetime.datetime.now().isoformat(),
'last_refreshed': datetime.datetime.now().isoformat(),
'source_tables': [] # extracted from SQL
}
with open(manifest_path, 'w') as f:
json.dump(manifest, f, indent=2)
"
3g — Verify
datafusion-cli --file "$STATE_DIR/state.sql" -c "
SELECT COUNT(*) AS row_count FROM <view_name>;
DESCRIBE <view_name>;
"
Report success with row count and schema.
Command: REFRESH
Refresh a specific view
Read the view's SQL from the manifest:
python3 -c "
import json
with open('$STATE_DIR/views/manifest.json') as f:
manifest = json.load(f)
view = manifest['views'].get('<view_name>')
if view:
print(view['sql'])
else:
print('ERROR: view not found')
"
Then re-materialize:
datafusion-cli --file "$STATE_DIR/state.sql" -c "
COPY ($SQL) TO '$STATE_DIR/views/<view_name>.parquet' STORED AS PARQUET;
"
Update the last_refreshed timestamp in the manifest.
Report the new row count.
Command: STATUS
Show the status of all materialized views:
python3 -c "
import json, os, datetime
with open('$STATE_DIR/views/manifest.json') as f:
manifest = json.load(f)
for name, view in manifest['views'].items():
parquet = view['parquet_path']
size = os.path.getsize(parquet) if os.path.exists(parquet) else 0
size_mb = size / (1024 * 1024)
print(f\"{name}:\")
print(f\" SQL: {view['sql'][:80]}...\")
print(f\" Last refreshed: {view['last_refreshed']}\")
print(f\" File size: {size_mb:.2f} MB\")
print(f\" Path: {parquet}\")
print()
"
Also verify each view is queryable:
datafusion-cli --file "$STATE_DIR/state.sql" -c "SELECT COUNT(*) AS row_count FROM <view_name>;"
Command: LIST
python3 -c "
import json
with open('$STATE_DIR/views/manifest.json') as f:
manifest = json.load(f)
for name in sorted(manifest['views']):
view = manifest['views'][name]
print(f\" {name} — refreshed {view['last_refreshed'][:10]}\")
"
Command: DROP
Remove a materialized view:
- Delete the Parquet file
- Remove from manifest.json
- Remove the CREATE EXTERNAL TABLE line from state.sql
rm -f "$STATE_DIR/views/<view_name>.parquet"
python3 -c "
import json
with open('$STATE_DIR/views/manifest.json') as f:
manifest = json.load(f)
manifest['views'].pop('<view_name>', None)
with open('$STATE_DIR/views/manifest.json', 'w') as f:
json.dump(manifest, f, indent=2)
"
Remove the corresponding lines from state.sql:
sed -i '' '/-- Materialized view: <view_name>/,/^$/d' "$STATE_DIR/state.sql"
Error handling
- Query fails → check if source tables exist, suggest
/datafusion-skills:create-table to register them
- Disk space → warn if the Parquet file is > 1GB
- Manifest corrupt → regenerate from existing Parquet files in the views directory
Suggest next steps
After creating a view:
Your materialized view <view_name> is now available as a table. Query it with /datafusion-skills:query SELECT * FROM <view_name> LIMIT 10.
To refresh it when source data changes, run /datafusion-skills:materialized-view refresh <view_name>.