indexing-semantic-layer
Build or update the semantic layer for a Dinobase source: checks what's already annotated, then fills gaps and rebuilds only what's missing or incomplete.
| name | indexing-semantic-layer |
| description | Build or update the semantic layer for a Dinobase source — checks what's already annotated, then fills gaps and rebuilds only what's missing or incomplete. |
| argument-hint | <source_name> |
The source to index: $ARGUMENTS
Launch a general-purpose subagent to audit existing annotations and fill every gap.
Provide this prompt to the subagent:
You are building the semantic layer for the Dinobase source $ARGUMENTS.
Your job is to audit what's already annotated, then fill every gap — table descriptions, column docs, PII flags, and relationships. Don't overwrite annotations that are already good. Only add or fix what's missing or wrong.
uv run dinobase annotate --input-schema
This shows the exact JSON format accepted by the annotate command.
Run all four checks and note what's missing:
# Tables with no description
uv run dinobase query "SELECT table_name, row_count FROM _dinobase.tables WHERE schema_name = '$ARGUMENTS' AND description IS NULL ORDER BY row_count DESC"
# Tables that already have descriptions (skip these unless wrong)
uv run dinobase query "SELECT table_name, description FROM _dinobase.tables WHERE schema_name = '$ARGUMENTS' AND description IS NOT NULL"
# Columns already annotated
uv run dinobase query "SELECT table_name, column_name, description FROM _dinobase.columns WHERE schema_name = '$ARGUMENTS' AND description IS NOT NULL ORDER BY table_name, column_name"
# Existing relationships
uv run dinobase query "SELECT from_table, from_column, to_table, to_column, cardinality, description FROM _dinobase.relationships WHERE from_schema = '$ARGUMENTS' OR to_schema = '$ARGUMENTS'"
# Existing KV metadata (pii, deprecated, owner, etc.)
uv run dinobase query "SELECT table_name, column_name, key, value FROM _dinobase.metadata WHERE schema_name = '$ARGUMENTS' ORDER BY table_name, column_name, key"
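The audit queries above differ only in the source name, so they can be generated rather than typed by hand. The following is a minimal sketch in Python that assumes the `_dinobase.*` tables and columns exactly as they appear in the commands above. The `audit_queries` helper, its labels, and its name check are hypothetical and not part of Dinobase; `stripe` is only an example source name.

```python
import re
import shlex


def audit_queries(source: str) -> dict:
    """Return the audit SQL from the checks above, keyed by a short label."""
    # Hypothetical guard: accept only simple identifiers so the name can be
    # embedded in a single-quoted SQL literal without escaping.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", source):
        raise ValueError(f"unexpected source name: {source!r}")
    return {
        "tables_without_description": (
            "SELECT table_name, row_count FROM _dinobase.tables "
            f"WHERE schema_name = '{source}' AND description IS NULL "
            "ORDER BY row_count DESC"
        ),
        "tables_with_description": (
            "SELECT table_name, description FROM _dinobase.tables "
            f"WHERE schema_name = '{source}' AND description IS NOT NULL"
        ),
        "annotated_columns": (
            "SELECT table_name, column_name, description FROM _dinobase.columns "
            f"WHERE schema_name = '{source}' AND description IS NOT NULL "
            "ORDER BY table_name, column_name"
        ),
        "relationships": (
            "SELECT from_table, from_column, to_table, to_column, cardinality, "
            "description FROM _dinobase.relationships "
            f"WHERE from_schema = '{source}' OR to_schema = '{source}'"
        ),
        "metadata": (
            "SELECT table_name, column_name, key, value FROM _dinobase.metadata "
            f"WHERE schema_name = '{source}' "
            "ORDER BY table_name, column_name, key"
        ),
    }


if __name__ == "__main__":
    # Print each check as a ready-to-paste shell command.
    for label, sql in audit_queries("stripe").items():
        print(f"# {label}")
        print("uv run dinobase query", shlex.quote(sql))
```

The sketch only prints the commands; it does not run them, because the output format of `dinobase query` is not described here.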
For every table that is missing a description, or any table that has unannotated columns worth documenting, explore its schema and sample data:
uv run dinobase query "SELECT column_name, data_type FROM information_schema.columns WHERE table_schema = '$ARGUMENTS' AND table_name = '<table>' ORDER BY ordinal_position"
uv run dinobase query "SELECT * FROM \"$ARGUMENTS\".\"<table>\" LIMIT 3"
Understand:
- Which columns are link or bookkeeping fields (`*_url`, `node_id`, `_dlt_*`)?
- Which columns are keys that point to other rows or tables (`*_id`, `*__id`, `_dlt_parent_id`, `_dlt_root_id`)?
After exploring, write a single `dinobase annotate` call with a JSON array containing only the gaps; skip anything that already has a correct annotation. You can mix annotation and relationship items in the same array:
uv run dinobase annotate '[
{"target": "$ARGUMENTS.<table>", "key": "description", "value": "What this table contains"},
{"target": "$ARGUMENTS.<table>.<column>", "key": "description", "value": "What this column means"},
{"target": "$ARGUMENTS.<table>.<column>", "key": "pii", "value": "true"},
{"from_table": "$ARGUMENTS.<table>", "from_column": "<col>", "to_table": "$ARGUMENTS.<other_table>", "to_column": "id", "cardinality": "one_to_many", "description": "Each X belongs to one Y"}
]'
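When there are many gaps, hand-writing the JSON array is error-prone, especially if a description contains an apostrophe that would end the single-quoted shell argument. The following is a rough sketch that builds the same array in Python, assuming only the item shapes shown in the example above. The `build_annotations` helper and the sample table and column names are hypothetical.

```python
import json
import shlex


def build_annotations(source, table_docs, column_docs, pii_columns, relationships):
    """Assemble annotate items in the shapes shown in the example above."""
    items = []
    for table, text in table_docs.items():
        items.append(
            {"target": f"{source}.{table}", "key": "description", "value": text}
        )
    for (table, column), text in column_docs.items():
        items.append(
            {"target": f"{source}.{table}.{column}", "key": "description", "value": text}
        )
    for table, column in pii_columns:
        items.append(
            {"target": f"{source}.{table}.{column}", "key": "pii", "value": "true"}
        )
    # Relationship items are passed through unchanged.
    items.extend(relationships)
    return items


if __name__ == "__main__":
    items = build_annotations(
        "stripe",  # example source name
        {"customers": "One row per customer"},
        {("customers", "email"): "Customer's billing email"},
        [("customers", "email")],
        [],
    )
    # shlex.quote keeps the apostrophe in "Customer's" from breaking the shell quoting.
    print("uv run dinobase annotate", shlex.quote(json.dumps(items)))
```

Quoting the payload with `shlex.quote` instead of wrapping it in literal single quotes is the main design choice here; everything else mirrors the hand-written call.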
Check that no gaps remain:
# Tables still missing descriptions
uv run dinobase query "SELECT table_name FROM _dinobase.tables WHERE schema_name = '$ARGUMENTS' AND description IS NULL"
# Spot-check the main table — should have description, related_tables, annotated columns
uv run dinobase describe $ARGUMENTS.<main_table> 2>&1 | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('description:', d.get('description'))
print('related_tables:', len(d.get('related_tables', [])))
print('annotated columns:', sum(1 for c in d['columns'] if c.get('description')), '/', len(d['columns']))
"
Report back what was added and what (if anything) couldn't be determined from the data.
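The spot-check in the verification step can also be written as a small function, which makes it easier to reuse across several tables. This sketch assumes the `describe` output is a JSON object with the `description`, `related_tables`, and `columns` fields that the one-liner above reads. The `summarize` helper and the sample payload are hypothetical.

```python
import json


def summarize(describe_output: dict) -> dict:
    """Summarize annotation coverage from a parsed describe payload."""
    # Unlike the one-liner, a missing "columns" key is treated as an empty list.
    columns = describe_output.get("columns", [])
    annotated = sum(1 for c in columns if c.get("description"))
    return {
        "has_description": bool(describe_output.get("description")),
        "related_tables": len(describe_output.get("related_tables", [])),
        "annotated_columns": annotated,
        "total_columns": len(columns),
    }


if __name__ == "__main__":
    # Hypothetical payload, shaped like the fields used in the spot-check.
    sample = json.loads(
        '{"description": "One row per customer",'
        ' "related_tables": ["invoices"],'
        ' "columns": [{"name": "id", "description": "Primary key"},'
        ' {"name": "email"}]}'
    )
    print(summarize(sample))
```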
Column name patterns: `*_url`, `node_id`, `_dlt_load_id`, `_dlt_id`, `_dlt_list_idx`
Example column descriptions: `_dlt_parent_id`: "Join key to X._dlt_id"; FK columns like `customer_id`: "References customers.id"
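The column name patterns above can be turned into a first-pass classifier. The following is a minimal sketch. It assumes the first group of patterns marks bookkeeping columns and the second group marks join or foreign keys, which the text does not state outright. The group labels, the `classify` and `suggest_description` helpers, and the naive "add an s" pluralization are all hypothetical.

```python
from fnmatch import fnmatchcase

# Checked first, because names like node_id and _dlt_id also match "*_id".
BOOKKEEPING_PATTERNS = ("*_url", "node_id", "_dlt_load_id", "_dlt_id", "_dlt_list_idx")
KEY_PATTERNS = ("_dlt_parent_id", "_dlt_root_id", "*__id", "*_id")


def classify(column: str) -> str:
    """Return 'bookkeeping', 'key', or 'other' for a column name."""
    if any(fnmatchcase(column, p) for p in BOOKKEEPING_PATTERNS):
        return "bookkeeping"
    if any(fnmatchcase(column, p) for p in KEY_PATTERNS):
        return "key"
    return "other"


def suggest_description(column: str, parent_table: str = "X"):
    """Draft a description for key columns, following the two examples above."""
    if column == "_dlt_parent_id":
        return f"Join key to {parent_table}._dlt_id"
    if column.startswith("_dlt_") or classify(column) != "key":
        return None
    # Naive guess: customer_id -> customers.id; always review before annotating.
    entity = column[: -len("_id")].rstrip("_")
    return f"References {entity}s.id"


if __name__ == "__main__":
    for name in ("avatar_url", "customer_id", "_dlt_parent_id", "amount"):
        print(name, classify(name), suggest_description(name))
```

The drafted text is only a starting point; the sample rows from the exploration step should confirm which table a key really points to.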