| name | ktx-ai-data-agents-context |
| description | Context layer for AI data agents - teach Claude Code, Codex, and AI agents to query data warehouses accurately with semantic layer, wiki knowledge, and MCP tools |
| triggers | ["set up ktx for AI data agents","configure ktx semantic layer for my warehouse","teach my agent to query the data warehouse correctly","install ktx context layer for data analysis","configure ktx with dbt and warehouse connections","use ktx to build data agent context","integrate ktx MCP server with my AI agent","help my agent understand our analytics metrics"] |
ktx AI Data Agents Context
Skill by ara.so — AI Agent Skills collection.
ktx is an executable context layer that teaches AI agents (Claude Code, Codex, Cursor, OpenCode) how to query your data warehouse accurately. It automatically builds and maintains a semantic layer from your warehouse metadata, dbt projects, LookML, Looker, Metabase, and wiki content - then serves it to agents through CLI and MCP tools with full-text and semantic search.
What ktx Does
- Learns from company knowledge: Ingests wiki content, dbt docs, Looker, Metabase - organizes it, removes duplicates, flags contradictions
- Maps the data stack: Samples tables, detects joinable columns, annotates sources so agents write better queries
- Builds a semantic layer: Combines raw tables and metrics through a join graph that resolves chasm and fan traps automatically
- Serves agents at execution: Exposes CLI and MCP tools for Claude Code, Codex, Cursor, OpenCode to query context
Installation
Global Installation
npm install -g @kaelio/ktx
Project-Specific Installation
npm install --save-dev @kaelio/ktx
Or with pnpm:
pnpm add -D @kaelio/ktx
Initial Setup
Interactive Setup (Recommended)
ktx setup
This command:
- Creates or resumes a local ktx project
- Configures LLM and embedding providers
- Configures database connections
- Configures context sources (dbt, Looker, Metabase, Notion)
- Builds initial context
- Installs agent integration (MCP or Skills)
Check Project Status
ktx status
Example output:
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
Configuration
Project Structure
my-project/
├── ktx.yaml # Project configuration
├── semantic-layer/<connection-id>/ # YAML semantic sources
├── wiki/global/ # Shared business context
├── wiki/user/<user-id>/ # User-scoped notes
├── raw-sources/<connection-id>/ # Ingest artifacts and reports
└── .ktx/ # Local state and secrets (git-ignored)
ktx.yaml Example
version: 1
project_id: analytics-warehouse
llm:
provider: anthropic
model: claude-sonnet-4-6
embeddings:
provider: openai
model: text-embedding-3-small
databases:
warehouse:
type: postgres
host: localhost
port: 5432
database: analytics
user: readonly_user
context_sources:
dbt_main:
type: dbt
project_dir: ./dbt
profiles_dir: ~/.dbt
notion_docs:
type: notion
Environment Variables
ktx respects these environment variables:
export ANTHROPIC_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here
export DB_PASSWORD=your_db_password
export NOTION_TOKEN=your_notion_token
export DBT_PROFILES_DIR=~/.dbt
export KTX_PROJECT_DIR=/path/to/project
Key Commands
Building Context
ktx ingest
ktx ingest --connection warehouse
ktx ingest --rebuild
Searching Context
ktx sl "revenue"
ktx sl "customer lifetime value"
ktx wiki "refund policy"
ktx wiki "revenue recognition rules"
ktx search "monthly recurring revenue"
MCP Server (for AI Agents)
ktx mcp start
ktx mcp start --project-dir /path/to/project
ktx mcp tools
Managing Semantic Layer
ktx sl validate
ktx sl list
ktx sl show monthly_revenue
Database Connection Examples
PostgreSQL
databases:
warehouse:
type: postgres
host: localhost
port: 5432
database: analytics
user: readonly_user
ssl: true
Snowflake
databases:
snowflake_prod:
type: snowflake
account: xy12345.us-east-1
warehouse: COMPUTE_WH
database: ANALYTICS
schema: PUBLIC
user: readonly_user
role: ANALYST
BigQuery
databases:
bigquery_prod:
type: bigquery
project_id: my-gcp-project
dataset: analytics
credentials_file: /path/to/service-account.json
ClickHouse
databases:
clickhouse:
type: clickhouse
host: localhost
port: 9000
database: analytics
user: readonly
Context Source Examples
dbt
context_sources:
dbt_main:
type: dbt
project_dir: ./dbt
profiles_dir: ~/.dbt
target: prod
LookML
context_sources:
looker:
type: lookml
project_dir: ./looker-project
models:
- sales
- marketing
Metabase
context_sources:
metabase:
type: metabase
url: https://metabase.company.com
collections:
- Sales Dashboards
- Marketing Analytics
Notion
context_sources:
notion_wiki:
type: notion
page_ids:
- abc123def456
- ghi789jkl012
Semantic Layer YAML
Defining a Metric
type: metric
name: monthly_revenue
description: Total revenue aggregated by month
sql: |
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(amount) as revenue
FROM orders
WHERE status = 'completed'
GROUP BY 1
dimensions:
- month
measures:
- revenue
tags:
- finance
- reporting
Defining a Dimension
type: dimension
name: customer_segment
description: Customer segmentation based on lifetime value
sql: |
CASE
WHEN lifetime_value >= 10000 THEN 'Enterprise'
WHEN lifetime_value >= 1000 THEN 'Mid-Market'
ELSE 'SMB'
END
source_table: customers
Defining a Joinable Source
type: source
name: orders
description: All customer orders
table: public.orders
primary_key: order_id
joins:
- to: customers
type: many_to_one
on: orders.customer_id = customers.customer_id
columns:
- name: order_id
type: integer
description: Unique order identifier
- name: customer_id
type: integer
description: Foreign key to customers
- name: amount
type: decimal
description: Order total in USD
- name: order_date
type: timestamp
description: When order was placed
Agent Integration
Using with Claude Code
After running ktx setup, ktx installs MCP integration automatically. The MCP server provides these tools:
ktx_search_semantic_layer: Search metrics, dimensions, sources
ktx_search_wiki: Search wiki content
ktx_get_metric: Get full metric definition
ktx_query_preview: Preview SQL for a semantic query
In Claude Code:
Use ktx to find our revenue metrics
What wiki pages discuss refund policies?
Show me the SQL for monthly_recurring_revenue metric
Using with Codex
npx skills add Kaelio/ktx --skill ktx
ktx setup
Codex can now use ktx tools:
Query our warehouse for Q1 2024 revenue by customer segment
Find documentation about our churn calculation
Using with Cursor or OpenCode
Start the MCP server manually:
ktx mcp start
Then configure your agent's MCP settings to connect to the ktx server.
Common Workflows
Setting Up a New Analytics Project
npm install -g @kaelio/ktx
cd /path/to/analytics-project
ktx setup
ktx status
ktx ingest
ktx sl "revenue"
ktx wiki "metric definitions"
Updating Context After Schema Changes
ktx ingest --connection warehouse --rebuild
ktx ingest --rebuild
Adding a New Metric
cat > semantic-layer/warehouse/metrics/arr.yaml <<EOF
type: metric
name: annual_recurring_revenue
description: ARR calculated from active subscriptions
sql: |
SELECT
DATE_TRUNC('year', subscription_start) as year,
SUM(monthly_amount * 12) as arr
FROM subscriptions
WHERE status = 'active'
GROUP BY 1
dimensions:
- year
measures:
- arr
tags:
- finance
- saas
EOF
ktx sl validate
ktx ingest --connection warehouse
ktx sl "annual recurring revenue"
Writing Wiki Documentation
mkdir -p wiki/global/metrics
cat > wiki/global/metrics/revenue-recognition.md <<EOF
# Revenue Recognition
## Overview
Our revenue recognition follows ASC 606 guidelines.
## Key Rules
1. Revenue is recognized when service is delivered
2. Refunds are deducted in the month issued
3. Deferred revenue is amortized monthly
## Related Metrics
- monthly_revenue
- deferred_revenue
- recognized_revenue
EOF
ktx ingest
ktx wiki "revenue recognition"
Troubleshooting
ktx status shows "LLM ready: no"
Configure LLM provider:
ktx setup
Or set environment variable:
export ANTHROPIC_API_KEY=your_key_here
ktx status
Database connection fails
Test connection:
ktx ingest --connection warehouse --verbose
Common fixes:
- Verify user has SELECT permissions on all tables
- Check firewall/network access to database host
- Ensure SSL settings match your database requirements
- For Snowflake: verify warehouse is running
Semantic layer validation errors
ktx sl validate semantic-layer/warehouse/metrics/revenue.yaml
MCP server won't start
ktx status
ktx ingest
ktx mcp start --verbose
ktx mcp start --port 3001
Search returns no results
ktx ingest --rebuild
cat ktx.yaml
ls -la semantic-layer/
ls -la wiki/global/
Agent can't access ktx tools
- Verify MCP server is running:
ktx status
- Check agent's MCP configuration points to correct ktx server
- Restart agent after starting MCP server
- Check MCP server logs:
ktx mcp start --verbose
Best Practices
1. Organize Semantic Layer by Domain
semantic-layer/
warehouse/
metrics/
finance/
revenue.yaml
arr.yaml
product/
dau.yaml
retention.yaml
dimensions/
time.yaml
geography.yaml
sources/
orders.yaml
customers.yaml
2. Document Metric Business Logic in Wiki
Link semantic layer metrics to wiki documentation:
type: metric
name: monthly_revenue
description: See wiki/global/metrics/revenue-recognition.md for full details
3. Use Tags for Discovery
type: metric
name: customer_churn_rate
tags:
- saas
- retention
- executive-dashboard
- monthly-reporting
4. Version Control Everything
.ktx/secrets.yaml
.ktx/cache/
.ktx/*.db
Commit:
ktx.yaml
semantic-layer/
wiki/global/
5. Set Up Read-Only Database Users
ktx never writes to your database, but use dedicated read-only credentials:
CREATE USER ktx_readonly WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE analytics TO ktx_readonly;
GRANT USAGE ON SCHEMA public TO ktx_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO ktx_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO ktx_readonly;
6. Regular Context Updates
Set up a cron job or CI workflow:
#!/bin/bash
cd /path/to/project
ktx ingest --connection warehouse
ktx sl validate
Advanced Usage
Custom LLM Configuration
llm:
provider: vertex
model: claude-3-5-sonnet-v2@20250219
project_id: my-gcp-project
region: us-central1
temperature: 0.1
max_tokens: 4096
Multiple Database Connections
databases:
prod_warehouse:
type: snowflake
account: prod.us-east-1
database: ANALYTICS_PROD
staging_warehouse:
type: snowflake
account: staging.us-east-1
database: ANALYTICS_STAGING
postgres_app:
type: postgres
host: app-db.internal
database: application
Programmatic Usage (TypeScript)
import { KtxContext } from '@kaelio/ktx';
const ctx = await KtxContext.load('/path/to/project');
const results = await ctx.searchSemanticLayer('revenue');
console.log(results);
const metric = await ctx.getMetric('monthly_revenue');
console.log(metric.sql);
const wikiResults = await ctx.searchWiki('refund policy');
console.log(wikiResults);
Resources