| name | ktx-ai-data-agents |
| description | Use ktx to build an executable context layer for AI data agents, enabling accurate warehouse queries through semantic layer, skills, and memory |
| triggers | ["set up ktx for data agent queries","configure ktx semantic layer for my warehouse","install ktx to let agents query my database","build ktx context from dbt and warehouse metadata","create ktx wiki pages for business metrics","integrate ktx with Claude Code for data analysis","search ktx semantic layer for revenue metrics","troubleshoot ktx context ingestion"] |
ktx AI Data Agents
Skill by ara.so — MCP Skills collection.
ktx is a self-improving context layer that teaches AI agents how to query your data warehouse accurately. It combines approved metric definitions, joinable columns, business knowledge from wikis and dbt, and a semantic layer into one searchable surface. Agents use ktx through MCP tools to fetch canonical SQL and metric logic instead of inventing queries from scratch.
What ktx Does
- Learns from company knowledge: Ingests dbt, Looker, Metabase, Notion, and wiki content; organizes and deduplicates it; flags contradictions
- Maps the data stack: Samples tables, captures metadata, detects joinable columns, resolves fan/chasm traps
- Builds a semantic layer: Combines raw tables and high-level metrics through a join graph
- Serves agents: Exposes CLI and MCP tools with full-text and semantic search across wiki and semantic entities
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite.
Installation
Global CLI
npm install -g @kaelio/ktx
Project-local
npm install @kaelio/ktx
npx ktx setup
Quick Setup
ktx setup
This interactive command:
- Creates or resumes a local ktx project (
ktx.yaml)
- Configures LLM provider (Anthropic API, Vertex AI, or Claude Code SDK)
- Configures embedding provider (OpenAI, Voyage AI, etc.)
- Adds database connections (warehouse credentials)
- Adds context sources (dbt projects, Looker instances, Notion pages)
- Builds initial context via
ktx ingest
- Installs agent integration (MCP for Claude Code, Codex, Cursor, OpenCode)
Example output:
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
Key Commands
Check project status
ktx status
Shows readiness of LLM, embeddings, databases, context sources, and agent integration.
Build context
ktx ingest
ktx ingest --connection-id warehouse
ktx ingest --source-id dbt_main
Search semantic layer
ktx sl "revenue"
ktx sl "monthly active users" --limit 10
Returns semantic sources (metrics, dimensions, entities) matching the query.
Search wiki
ktx wiki "refund policy"
ktx wiki "customer lifecycle" --scope global
ktx wiki "my notes on churn" --scope user
Returns wiki pages from wiki/global/ or wiki/user/<user-id>/.
Start MCP server
ktx mcp start
ktx mcp start --project-dir /path/to/project
Required for agent clients (Claude Code, Codex, Cursor, OpenCode) to use ktx tools.
Validate configuration
ktx validate
ktx validate --connection-id warehouse
Tests database connections and context source access.
Configuration
ktx.yaml Structure
version: 1
project:
name: analytics
llm:
provider: anthropic
model: claude-sonnet-4-6
embeddings:
provider: openai
model: text-embedding-3-small
connections:
- id: warehouse
type: postgres
config:
host: ${POSTGRES_HOST}
port: 5432
database: analytics
user: ${POSTGRES_USER}
password: ${POSTGRES_PASSWORD}
schema: public
options:
sampleRows: 1000
maxTables: 500
contextSources:
- id: dbt_main
type: dbt
config:
profilesDir: ~/.dbt
projectDir: ./dbt
profile: analytics
target: prod
- id: notion_docs
type: notion
config:
token: ${NOTION_TOKEN}
databaseId: ${NOTION_DATABASE_ID}
Secrets go in .ktx/secrets.env:
POSTGRES_HOST=warehouse.example.com
POSTGRES_USER=readonly_user
POSTGRES_PASSWORD=secret123
NOTION_TOKEN=secret_xyz
NOTION_DATABASE_ID=abc123
LLM Providers
Anthropic API
llm:
provider: anthropic
model: claude-sonnet-4-6
apiKey: ${ANTHROPIC_API_KEY}
Google Vertex AI
llm:
provider: vertex
model: claude-sonnet-4-6
config:
projectId: ${GCP_PROJECT_ID}
region: us-central1
Claude Code SDK (local session)
llm:
provider: claude-code
model: claude-sonnet-4-6
Embedding Providers
OpenAI
embeddings:
provider: openai
model: text-embedding-3-small
apiKey: ${OPENAI_API_KEY}
Voyage AI
embeddings:
provider: voyage
model: voyage-3
apiKey: ${VOYAGE_API_KEY}
Project Layout
my-project/
├── ktx.yaml # Project configuration
├── semantic-layer/warehouse/ # YAML semantic sources per connection
│ ├── metrics/
│ │ └── revenue.yaml
│ ├── dimensions/
│ │ └── customer_segment.yaml
│ └── entities/
│ └── customer.yaml
├── wiki/global/ # Shared business context
│ ├── refund-policy.md
│ └── metric-definitions.md
├── wiki/user/alice/ # User-scoped notes
│ └── analysis-notes.md
├── raw-sources/warehouse/ # Ingest artifacts (git-ignored)
│ ├── tables.json
│ ├── columns.json
│ └── sample-data.parquet
└── .ktx/ # Local state and secrets (git-ignored)
├── secrets.env
├── embeddings.db
└── mcp-state.json
Commit: ktx.yaml, semantic-layer/, wiki/global/
Ignore: .ktx/, raw-sources/
Code Examples
TypeScript: Programmatic ktx Usage
import { KtxProject } from '@kaelio/ktx';
const project = await KtxProject.load('/path/to/project');
const results = await project.semanticLayer.search('revenue', { limit: 5 });
for (const result of results) {
console.log(`${result.type}: ${result.name}`);
console.log(`SQL: ${result.sql}`);
}
const wikiPages = await project.wiki.search('refund policy', { scope: 'global' });
for (const page of wikiPages) {
console.log(`${page.title}: ${page.path}`);
}
await project.ingest({ connectionId: 'warehouse' });
TypeScript: Custom Context Source Connector
import { ContextSource, ContextSourceConfig } from '@kaelio/ktx';
interface CustomSourceConfig extends ContextSourceConfig {
apiUrl: string;
apiKey: string;
}
class CustomContextSource extends ContextSource<CustomSourceConfig> {
async validate(): Promise<void> {
const response = await fetch(`${this.config.apiUrl}/health`, {
headers: { Authorization: `Bearer ${this.config.apiKey}` }
});
if (!response.ok) throw new Error('Invalid API key or URL');
}
async ingest(project: KtxProject): Promise<void> {
const response = await fetch(`${this.config.apiUrl}/metrics`, {
headers: { Authorization: `Bearer ${this.config.apiKey}` }
});
const metrics = await response.json();
for (const metric of metrics) {
await project.semanticLayer.upsertMetric({
id: metric.id,
name: metric.name,
sql: metric.sql,
description: metric.description,
connectionId: 'warehouse'
});
}
}
}
Python: Semantic Layer Query Planning
ktx includes a Python semantic-layer query planner (ktx-sl):
from ktx_sl import SemanticLayer, MetricQuery
sl = SemanticLayer.load('/path/to/project/semantic-layer/warehouse')
query = MetricQuery(
metrics=['revenue', 'order_count'],
dimensions=['customer_segment', 'region'],
filters={'order_date': {'gte': '2024-01-01'}}
)
plan = sl.plan(query)
print(plan.sql)
Common Patterns
Adding a Wiki Page
Create wiki/global/refund-policy.md:
# Refund Policy
Refunds are issued within 30 days of purchase for orders under $500.
Orders over $500 require manager approval.
Refund metric: `COUNT(CASE WHEN refund_issued THEN 1 END)`
Then rebuild embeddings:
ktx ingest --source-id wiki_global
Defining a Semantic Metric
Create semantic-layer/warehouse/metrics/revenue.yaml:
id: revenue
name: Revenue
type: metric
sql: SUM(orders.amount)
description: Total order revenue in USD
entity: order
connectionId: warehouse
tags:
- finance
- kpi
Detecting Joinable Columns
ktx auto-detects joins during ingestion. To manually annotate:
id: customer
name: Customer
type: entity
primaryKey: customer_id
connectionId: warehouse
joinableWith:
- entity: order
foreignKey: customer_id
Fan Trap Resolution
ktx resolves fan traps automatically. If a query joins customers -> orders -> line_items, ktx generates CTEs to prevent row multiplication:
WITH orders_agg AS (
SELECT customer_id, SUM(amount) AS revenue
FROM orders
GROUP BY customer_id
)
SELECT c.customer_id, o.revenue
FROM customers c
LEFT JOIN orders_agg o ON c.customer_id = o.customer_id;
Agent Integration via MCP
In your Claude Code / Codex project settings:
{
"mcpServers": {
"ktx": {
"command": "ktx",
"args": ["mcp", "start", "--project-dir", "/path/to/project"]
}
}
}
Or auto-install via ktx setup agent integration step.
MCP Tools Exposed
When ktx mcp start is running, agents have access to:
ktx_search_semantic_layer: Search metrics, dimensions, entities
ktx_search_wiki: Search wiki pages
ktx_get_metric_sql: Fetch canonical SQL for a metric
ktx_get_table_schema: Get table DDL and sample rows
ktx_list_connections: List configured database connections
ktx_validate_query: Validate a SQL query against the semantic layer
Example agent prompt:
Use ktx_search_semantic_layer to find the "monthly active users" metric,
then use ktx_get_metric_sql to show me the canonical SQL.
Troubleshooting
"LLM not configured"
Run ktx setup and select an LLM provider. Ensure API keys are in .ktx/secrets.env:
ANTHROPIC_API_KEY=sk-ant-...
Or use Claude Code SDK (no key required):
llm:
provider: claude-code
model: claude-sonnet-4-6
"Database connection failed"
Test connection:
ktx validate --connection-id warehouse
Verify credentials in .ktx/secrets.env. For read-only access, grant SELECT on all schemas:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_user;
"Context not built"
Run ingestion:
ktx ingest
Check raw-sources/<connection-id>/ for ingestion logs. If tables are missing, verify schema permissions.
"MCP server not responding"
Restart the server:
ktx mcp stop
ktx mcp start --project-dir /path/to/project
Ensure agent client is configured to call ktx mcp start with the correct --project-dir.
"Semantic search returns no results"
Rebuild embeddings:
rm .ktx/embeddings.db
ktx ingest
Verify embedding provider is configured:
embeddings:
provider: openai
model: text-embedding-3-small
apiKey: ${OPENAI_API_KEY}
"Contradictions detected" warning
ktx flags contradictions when the same metric is defined differently across sources (e.g., dbt vs. Looker). Review raw-sources/<connection-id>/contradictions.json and reconcile definitions manually.
Advanced: Custom Database Connector
import { DatabaseConnector, TableMetadata } from '@kaelio/ktx';
class CustomDatabaseConnector extends DatabaseConnector {
async connect(): Promise<void> {
}
async listTables(schema: string): Promise<string[]> {
}
async getTableMetadata(table: string): Promise<TableMetadata> {
}
async sampleRows(table: string, limit: number): Promise<Record<string, any>[]> {
}
async executeQuery(sql: string): Promise<Record<string, any>[]> {
}
async disconnect(): Promise<void> {
}
}
Register in ktx.yaml:
connections:
- id: custom_db
type: custom
connector: ./connectors/custom-db.ts
config:
host: ${CUSTOM_DB_HOST}
apiKey: ${CUSTOM_DB_KEY}
Resources