| name | ktx-ai-data-agents-context |
| description | Executable context layer for AI data agents to query warehouses accurately through MCP with skills and memory |
| triggers | ["set up ktx for data agent queries","configure ktx semantic layer for my warehouse","help me build context with ktx","integrate ktx with claude code for data analysis","create ktx semantic sources from dbt","query my warehouse using ktx skills","troubleshoot ktx context ingestion","configure ktx mcp server for agents"] |
ktx AI Data Agents Context Skill
Skill by ara.so ā MCP Skills collection.
ktx is a self-improving context layer that teaches AI agents how to query your data warehouse accurately. It combines approved metric definitions, joinable columns, wiki knowledge, and dbt/Looker metadata into one searchable surface for agents like Claude Code, Codex, Cursor, and OpenCode.
What ktx Does
- Auto-learns warehouse structure: Samples tables, detects joinable columns, resolves fan/chasm traps
- Ingests company knowledge: Combines dbt, MetricFlow, LookML, Looker, Metabase, and Notion content
- Builds semantic layer: Creates reusable metric definitions with automatic join resolution
- Serves agents via MCP: Exposes CLI and Model Context Protocol tools for agent execution
- Flags contradictions: Identifies conflicts across wiki pages and metric definitions
- Read-only by design: Never writes to your warehouse
Installation
Global Installation
npm install -g @kaelio/ktx
Project-local Installation
npm install --save-dev @kaelio/ktx
Initial Setup
Interactive Setup
ktx setup
This command:
- Creates or resumes a ktx project
- Configures LLM provider (Anthropic API, Google Vertex, AI Gateway, or Claude Code session)
- Sets up embeddings provider (OpenAI, Google, or AI Gateway)
- Connects to databases (PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite)
- Configures context sources (dbt, MetricFlow, LookML, Looker, Metabase, Notion)
- Builds initial context
- Installs agent integration
Check Project Status
ktx status
Example output:
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
Project Structure
my-project/
āāā ktx.yaml # Project configuration
āāā semantic-layer/<connection-id>/ # YAML semantic sources
āāā wiki/global/ # Shared business context
āāā wiki/user/<user-id>/ # User-scoped notes
āāā raw-sources/<connection-id>/ # Ingest artifacts and reports
āāā .ktx/ # Local state and secrets (git-ignored)
Commit: ktx.yaml, semantic-layer/, wiki/
Ignore: .ktx/
Configuration
ktx.yaml Example
version: 1
project:
name: analytics
description: Company analytics warehouse
llm:
provider: anthropic
model: claude-sonnet-4-6
embeddings:
provider: openai
model: text-embedding-3-small
databases:
warehouse:
type: postgres
host: localhost
port: 5432
database: analytics
contextSources:
dbt_main:
type: dbt
path: ./dbt-project
target: prod
Secrets Management
Never commit secrets. Store in .ktx/secrets.yaml (auto git-ignored) or use environment variables:
export ANTHROPIC_API_KEY=your_key_here
export OPENAI_API_KEY=your_key_here
export WAREHOUSE_PASSWORD=your_password_here
Key CLI Commands
Context Building
ktx ingest
ktx ingest --connection warehouse
ktx ingest --source dbt_main
ktx ingest --force
Search Commands
ktx sl "revenue"
ktx sl "customer count by region"
ktx wiki "refund policy"
ktx wiki "how we calculate churn"
ktx sl "orders" --limit 5
MCP Server
ktx mcp start
ktx mcp start --project-dir /path/to/project
ktx status
Project Management
ktx init
ktx validate
ktx info
Real Usage Examples
Example 1: Setting Up ktx for a dbt + Snowflake Project
cd ~/analytics
ktx setup
ktx status
ktx ingest
ktx sl "monthly recurring revenue"
Example 2: Creating a Semantic Source (YAML)
File: semantic-layer/warehouse/metrics.yaml
version: 1
type: semantic-source
connection: warehouse
entities:
- name: orders
type: table
sql_table: analytics.orders
description: All customer orders
columns:
- name: order_id
type: primary_key
- name: customer_id
type: foreign_key
references: customers.customer_id
- name: order_date
type: timestamp
- name: total_amount
type: number
- name: customers
type: table
sql_table: analytics.customers
description: Customer dimension
columns:
- name: customer_id
type: primary_key
- name: email
type: string
- name: created_at
type: timestamp
metrics:
- name: total_revenue
type: simple
sql: "SUM(${orders.total_amount})"
description: Sum of all order amounts
- name: customer_count
type: simple
sql: "COUNT(DISTINCT ${customers.customer_id})"
description: Total unique customers
- name: average_order_value
type: derived
sql: "${total_revenue} / COUNT(DISTINCT ${orders.order_id})"
description: Average revenue per order
After creating/editing semantic sources:
ktx ingest --connection warehouse
Example 3: Adding Wiki Knowledge
File: wiki/global/refund-policy.md
# Refund Policy
Our refund policy states that customers can request a full refund within 30 days of purchase.
## Business Rules
- Refunds are processed within 5-7 business days
- Partial refunds are not supported
- Refunded orders are marked with `status = 'refunded'` in the orders table
## Related Metrics
- **refund_rate**: `COUNT(refunded_orders) / COUNT(total_orders)`
- Exclude refunded orders from revenue calculations using `WHERE status != 'refunded'`
After adding wiki content:
ktx ingest
ktx wiki "refund policy"
Example 4: Agent Integration with Claude Code
npx skills add Kaelio/ktx --skill ktx
ktx mcp start
Example 5: TypeScript API Usage (Programmatic)
import { KtxClient } from '@kaelio/ktx';
const client = new KtxClient({
projectDir: '/path/to/project',
});
const metrics = await client.searchSemanticLayer('revenue', {
limit: 10,
type: 'metric',
});
console.log(metrics);
const wikiResults = await client.searchWiki('refund policy');
console.log(wikiResults);
const result = await client.executeMetric('total_revenue', {
filters: {
order_date: { gte: '2024-01-01' }
},
dimensions: ['region']
});
console.log(result);
Common Patterns
Pattern 1: dbt Integration
ktx setup
Pattern 2: Multi-Database Setup
databases:
warehouse:
type: snowflake
account: xy12345
database: analytics
schema: public
events:
type: clickhouse
host: localhost
port: 9000
database: events
ktx ingest
ktx ingest --connection events
Pattern 3: User-scoped Wiki Pages
ktx wiki "my notes"
ktx wiki "glossary"
Pattern 4: Continuous Context Updates
cd /path/to/analytics
ktx ingest --force
if git diff --cached --name-only | grep -q "^semantic-layer/"; then
ktx validate
fi
Troubleshooting
MCP Server Not Starting
Symptom: Agent can't find ktx tools
ktx status
ktx mcp start --project-dir /path/to/project
ps aux | grep "ktx mcp"
LLM Provider Not Configured
Symptom: ktx ingest fails with "LLM provider not configured"
ktx status
ktx setup
export ANTHROPIC_API_KEY=sk-ant-...
Embeddings Provider Issues
Symptom: Search returns no results or fails
ktx status
ktx setup
export OPENAI_API_KEY=sk-...
Database Connection Failures
Symptom: ktx ingest fails to connect to warehouse
ktx validate --connection warehouse
cat .ktx/secrets.yaml
export WAREHOUSE_PASSWORD=your_password
export WAREHOUSE_USER=your_user
Ingest Hangs or Takes Too Long
Symptom: ktx ingest runs for hours
ls -lh raw-sources/warehouse/
databases:
warehouse:
exclude_tables:
- large_logs_table
- raw_events
ktx ingest --force
Semantic Layer Not Found
Symptom: ktx sl "metric" returns no results
ls -la semantic-layer/
ktx ingest
Wiki Search Returns Nothing
Symptom: ktx wiki "query" returns empty
ls -la wiki/global/
ls -la wiki/user/$USER/
ktx ingest
Contradictions Detected
Symptom: ktx ingest reports contradictions
Contradiction detected:
- wiki/global/revenue.md defines revenue as SUM(amount)
- semantic-layer/warehouse/metrics.yaml defines revenue as SUM(total)
Action required: Review and resolve conflict
Resolution:
- Review both sources
- Update one to match the other
- Re-run
ktx ingest
Project Directory Not Found
Symptom: ktx: command not found or Project not found
npm list -g @kaelio/ktx
npm install -g @kaelio/ktx
ktx status --project-dir /path/to/project
export KTX_PROJECT_DIR=/path/to/project
ktx status
Advanced Configuration
Custom LLM Configuration
llm:
provider: anthropic
model: claude-sonnet-4-6
temperature: 0.3
max_tokens: 4096
Selective Context Ingestion
contextSources:
dbt_main:
type: dbt
path: ./dbt-project
target: prod
include:
- models/marts/**
- models/staging/**
exclude:
- models/staging/raw_*
Join Graph Customization
version: 1
type: join-graph
connection: warehouse
joins:
- left: orders
right: customers
on: orders.customer_id = customers.customer_id
type: left
- left: orders
right: products
on: orders.product_id = products.product_id
type: left
relationship: many_to_one
Best Practices
- Commit semantic layer and wiki: Always version control
semantic-layer/ and wiki/global/
- Use environment variables for secrets: Never commit
.ktx/secrets.yaml
- Re-ingest after schema changes: Run
ktx ingest when warehouse schema evolves
- Document metrics in wiki: Create wiki pages explaining complex metric logic
- Validate before commit: Add
ktx validate to pre-commit hooks
- Start MCP server before agent use: Always run
ktx mcp start before opening Claude Code/Codex
- Use descriptive entity names: Name semantic sources after business concepts, not table names
Resources