| name | ktx-ai-data-agents |
| description | Context layer for AI data agents - query warehouses accurately with semantic layers, metrics, and wiki knowledge through MCP |
| triggers | ["set up ktx for my data warehouse","configure ktx semantic layer for AI agents","use ktx to query my database with Claude","integrate ktx with my analytics workflow","build context from dbt and warehouse metadata","connect ktx to Snowflake/BigQuery/Postgres","search ktx wiki and semantic layer","configure ktx MCP server for agents"] |
ktx AI Data Agents Skill
Skill by ara.so ā AI Agent Skills collection.
Overview
ktx is an executable context layer that teaches AI agents how to query data warehouses accurately. It automatically builds and maintains:
- Semantic layer with approved metric definitions, join graphs, and fan/chasm trap resolution
- Wiki knowledge from Notion, dbt docs, BI tools, and team documentation
- Warehouse metadata including table schemas, joinable columns, and usage patterns
- MCP integration for Claude Code, Codex, Cursor, and other AI agents
Unlike general-purpose agents that reinvent SQL logic on every query, ktx provides agents with canonical definitions and business context through a searchable interface.
Installation
npm install -g @kaelio/ktx
npx @kaelio/ktx setup
Requirements:
- Node.js 18+
- Access to a SQL warehouse (PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, or SQLite)
- LLM API key (Anthropic, Google Vertex AI) or Claude Pro/Max subscription
Quick Start
Initial Setup
ktx setup
ktx status
The ktx setup wizard will:
- Create
ktx.yaml configuration
- Configure LLM and embedding providers
- Set up database connections
- Configure context sources (dbt, Looker, Metabase, Notion)
- Build initial context
- Install agent integration (MCP)
Project Structure
my-project/
āāā ktx.yaml # Project configuration
āāā semantic-layer/<connection-id>/ # YAML metric/dimension definitions
āāā wiki/global/ # Shared business knowledge
āāā wiki/user/<user-id>/ # User-scoped notes
āāā raw-sources/<connection-id>/ # Ingest artifacts and reports
āāā .ktx/ # Local state (git-ignored)
Commit: ktx.yaml, semantic-layer/, wiki/
Ignore: .ktx/
Configuration
ktx.yaml Structure
version: 1.0
name: my-analytics-project
llm:
provider: anthropic
model: claude-sonnet-4-6
embeddings:
provider: openai
model: text-embedding-3-small
connections:
- id: warehouse
type: postgres
host: ${DATABASE_HOST}
port: 5432
database: analytics
user: ${DATABASE_USER}
password: ${DATABASE_PASSWORD}
ssl: true
context_sources:
- id: dbt_main
type: dbt
connection_id: warehouse
manifest_path: ./target/manifest.json
catalog_path: ./target/catalog.json
- id: notion_docs
type: notion
token: ${NOTION_TOKEN}
page_ids:
- 3fa85f64-5717-4562-b3fc-2c963f66afa6
Environment Variables
export ANTHROPIC_API_KEY=your-key-here
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
export OPENAI_API_KEY=your-key-here
export DATABASE_HOST=warehouse.example.com
export DATABASE_USER=readonly_user
export DATABASE_PASSWORD=secure-password
export NOTION_TOKEN=secret_notion_token
export LOOKER_API_TOKEN=looker-token
Supported Databases
PostgreSQL:
connections:
- id: postgres_warehouse
type: postgres
host: ${PG_HOST}
port: 5432
database: analytics
user: ${PG_USER}
password: ${PG_PASSWORD}
ssl: true
Snowflake:
connections:
- id: snowflake_warehouse
type: snowflake
account: ${SNOWFLAKE_ACCOUNT}
user: ${SNOWFLAKE_USER}
password: ${SNOWFLAKE_PASSWORD}
warehouse: COMPUTE_WH
database: ANALYTICS
schema: PUBLIC
BigQuery:
connections:
- id: bigquery_warehouse
type: bigquery
project_id: ${GCP_PROJECT_ID}
dataset: analytics
credentials_path: ${GOOGLE_APPLICATION_CREDENTIALS}
Core Commands
Context Management
ktx ingest
ktx ingest --connection warehouse
ktx ingest --force
ktx ingest --dry-run
Search and Query
ktx sl "revenue"
ktx sl "customer churn rate"
ktx wiki "refund policy"
ktx wiki "data retention rules"
ktx describe metric monthly_recurring_revenue
ktx describe dimension customer_segment
MCP Server
ktx mcp start
ktx mcp start --project-dir /path/to/project
ktx mcp status
ktx mcp stop
Project Management
ktx validate
ktx status
ktx connections list
ktx connections test warehouse
Semantic Layer Usage
Defining Metrics
Create semantic-layer/warehouse/metrics.yaml:
metrics:
- name: monthly_recurring_revenue
label: Monthly Recurring Revenue
description: Sum of all active subscription values normalized to monthly
type: sum
sql: |
CASE
WHEN billing_period = 'monthly' THEN amount
WHEN billing_period = 'annual' THEN amount / 12
END
table: subscriptions
filters:
- column: status
operator: equals
value: 'active'
dimensions:
- customer_segment
- plan_type
timestamp_column: created_at
- name: customer_count
label: Active Customers
description: Count of distinct active customer IDs
type: count_distinct
sql: customer_id
table: subscriptions
filters:
- column: status
operator: equals
value: 'active'
Defining Dimensions
Create semantic-layer/warehouse/dimensions.yaml:
dimensions:
- name: customer_segment
label: Customer Segment
description: Business vs. enterprise customer classification
type: categorical
sql: |
CASE
WHEN annual_revenue > 100000 THEN 'Enterprise'
WHEN annual_revenue > 10000 THEN 'Business'
ELSE 'Startup'
END
table: customers
- name: signup_date
label: Signup Date
description: Date customer first signed up
type: time
sql: DATE(created_at)
table: customers
granularities:
- day
- week
- month
- quarter
- year
Join Configuration
Define table relationships in semantic-layer/warehouse/joins.yaml:
joins:
- left_table: subscriptions
right_table: customers
type: left
conditions:
- left_column: customer_id
right_column: id
- left_table: subscriptions
right_table: plans
type: left
conditions:
- left_column: plan_id
right_column: id
Wiki Management
Creating Wiki Pages
cat > wiki/global/refund-policy.md << 'EOF'
Customers can request refunds within 30 days of purchase.
- Full refund: < 7 days
- Prorated refund: 7-30 days
- No refund: > 30 days
Refunds update `transactions.status` to 'refunded' and create
negative entries in `revenue_events`.
EOF
mkdir -p wiki/user/$(whoami)
cat > wiki/user/$(whoami)/analysis-notes.md << 'EOF'
Found discrepancy in EMEA revenue - missing Stripe events.
Tracked in JIRA-1234.
EOF
Ingesting from Notion
context_sources:
- id: product_docs
type: notion
token: ${NOTION_TOKEN}
page_ids:
- 3fa85f64-5717-4562-b3fc-2c963f66afa6
- 7c9e6679-7425-40de-944b-e07fc1f90ae7
recursive: true
Run ktx ingest to sync Notion content into wiki/global/.
Agent Integration (MCP)
Starting MCP Server
ktx mcp start
ktx mcp start --project-dir ~/my-analytics
Claude Desktop Configuration
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"ktx": {
"command": "ktx",
"args": ["mcp", "start", "--project-dir", "/absolute/path/to/project"]
}
}
}
Restart Claude Desktop to load the MCP server.
Using ktx from Claude
Once configured, you can prompt Claude:
What was our MRR last month by customer segment?
Claude will use ktx MCP tools to:
- Search semantic layer for
monthly_recurring_revenue metric
- Find relevant dimensions (
customer_segment)
- Retrieve approved SQL definitions
- Execute query using canonical metric logic
Available MCP Tools
ktx_search_semantic_layer - Search metrics, dimensions, tables
ktx_search_wiki - Search business knowledge and documentation
ktx_describe_entity - Get detailed entity information
ktx_list_connections - List available database connections
ktx_get_context_summary - Get project context overview
Common Patterns
Initial Project Setup Workflow
cd ~/projects/analytics
ktx setup
ktx ingest
ktx status
ktx sl "revenue"
ktx mcp start
Incremental Context Updates
dbt run
dbt docs generate
ktx ingest --connection warehouse
ktx ingest --source notion_docs
ktx validate
Searching Before Agent Queries
ktx sl "churn"
ktx describe metric customer_churn_rate
Multi-Warehouse Setup
connections:
- id: production
type: snowflake
account: ${SNOWFLAKE_PROD_ACCOUNT}
- id: staging
type: snowflake
account: ${SNOWFLAKE_STAGING_ACCOUNT}
context_sources:
- id: dbt_prod
type: dbt
connection_id: production
manifest_path: ./prod/target/manifest.json
- id: dbt_staging
type: dbt
connection_id: staging
manifest_path: ./staging/target/manifest.json
ktx ingest --connection production
ktx sl "revenue" --connection production
Troubleshooting
MCP Server Not Starting
Check project directory:
ktx status
Verify ktx.yaml exists:
ls ktx.yaml
Check Claude Desktop config:
cat ~/Library/Application\ Support/Claude/claude_desktop_config.json
Database Connection Issues
ktx connections test warehouse
Context Ingestion Failures
ktx ingest --verbose
ktx ingest --source dbt_main --verbose
ktx validate
Semantic Layer Errors
Undefined metric:
ktx sl "*" --type metric
cat semantic-layer/warehouse/metrics.yaml
ktx validate
Join graph errors:
cat semantic-layer/warehouse/joins.yaml
Agent Not Finding Context
Restart MCP server:
ktx mcp stop
ktx mcp start
Rebuild context:
ktx ingest --force
Check search results:
ktx sl "your search term"
ktx wiki "your search term"
TypeScript API Usage
For programmatic usage in Node.js:
import { KtxProject } from '@kaelio/ktx';
const project = await KtxProject.load('/path/to/project');
const metrics = await project.searchSemanticLayer('revenue', {
type: 'metric',
limit: 10
});
const wikiPages = await project.searchWiki('refund policy', {
scope: 'global',
limit: 5
});
const metric = await project.describeEntity('metric', 'monthly_recurring_revenue');
await project.ingest({
connectionId: 'warehouse',
force: false
});
Best Practices
- Version control semantic layer: Commit
semantic-layer/ and wiki/global/ to track metric definitions
- Use environment variables: Never hardcode credentials in
ktx.yaml
- Regular ingestion: Run
ktx ingest after dbt runs or schema changes
- Descriptive names: Use clear metric/dimension names that match business language
- Document assumptions: Add descriptions to all semantic layer entities
- Test connections: Use
ktx connections test before ingestion
- User wiki for ephemeral notes: Keep temporary analysis in
wiki/user/
- Review ingest logs: Check for warnings about duplicate or conflicting definitions
Advanced Configuration
Custom LLM Configuration
llm:
provider: vertex
project_id: ${GCP_PROJECT_ID}
location: us-central1
model: claude-3-5-sonnet@20241022
max_tokens: 8192
temperature: 0.0
Embedding Configuration
embeddings:
provider: openai
model: text-embedding-3-large
dimensions: 1536
batch_size: 100
Ingest Scheduling
0 2 * * * cd /path/to/project && ktx ingest --connection warehouse >> /var/log/ktx.log 2>&1
Custom Python Queries
ktx includes a Python semantic layer query planner:
from ktx_sl import SemanticLayer
sl = SemanticLayer.from_project("/path/to/project")
query = sl.query(
metrics=["monthly_recurring_revenue"],
dimensions=["customer_segment", "signup_month"],
filters=[
{"dimension": "signup_date", "operator": ">=", "value": "2025-01-01"}
],
order_by=[{"metric": "monthly_recurring_revenue", "desc": True}]
)
sql = query.to_sql()
print(sql)
Resources