| name | ktx-data-agent-context-layer |
| description | An executable context layer that teaches AI agents to query data warehouses accurately through MCP with skills, memory, and a semantic layer |
| triggers | ["set up ktx for data analysis","configure ktx semantic layer","integrate ktx with this agent","build context from my data warehouse","search metrics using ktx","ingest dbt models into ktx","query warehouse through ktx","add ktx wiki documentation"] |
ktx Data Agent Context Layer
Skill by ara.so — AI Agent Skills collection.
ktx is a self-improving context layer that teaches AI agents how to query data warehouses accurately. It automatically builds semantic layers from your warehouse metadata, dbt models, BI tools, and company wikis, then exposes that context through MCP (Model Context Protocol) and CLI tools. This enables agents to fetch approved metrics and write better SQL instead of reinventing logic on every query.
What ktx Does
- Builds warehouse context automatically — samples tables, captures metadata, detects joinable columns
- Ingests company knowledge — from dbt, LookML, Looker, Metabase, Notion, and wiki content
- Creates a semantic layer — with join graphs that resolve fan/chasm traps
- Flags contradictions — across different knowledge sources
- Serves agents via MCP — with full-text and semantic search across wiki and metrics
- Read-only by design — never writes to your database
Supports PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite.
Installation
Global Installation
npm install -g @kaelio/ktx
Project-Specific Installation
npm install --save-dev @kaelio/ktx
First-Time Setup
ktx setup
This command:
- Creates or resumes a local ktx project
- Configures LLM and embedding providers
- Sets up database connections
- Configures context sources (dbt, wikis, etc.)
- Builds initial context
- Installs agent integration
Project Structure
ktx creates the following structure in your project:
my-project/
├── ktx.yaml # Project configuration
├── semantic-layer/<connection-id>/ # YAML semantic sources (commit)
├── wiki/global/ # Shared business context (commit)
├── wiki/user/<user-id>/ # User-scoped notes (commit)
├── raw-sources/<connection-id>/ # Ingest artifacts and reports
└── .ktx/ # Local state and secrets (git-ignore)
Important: Commit ktx.yaml, semantic-layer/, and wiki/. Add .ktx/ to .gitignore.
Core Commands
Project Management
ktx status
ktx setup
ktx init --project-dir ./analytics
ktx config validate
Building Context
ktx ingest
ktx ingest --connection warehouse
ktx ingest --source dbt_main
ktx ingest --force
Searching Context
ktx sl "revenue"
ktx sl "monthly active users"
ktx wiki "refund policy"
ktx wiki "customer churn definition"
ktx sl --list
ktx sl "arr" --show-yaml
MCP Server
ktx mcp start
ktx mcp start --project-dir ./analytics
ktx mcp status
Configuration
ktx.yaml Example
project:
name: analytics
version: 1.0.0
llm:
provider: anthropic
model: claude-sonnet-4-6
embeddings:
provider: openai
model: text-embedding-3-small
connections:
- id: warehouse
type: postgres
host: localhost
port: 5432
database: analytics
user: analyst
schemas:
- public
- analytics
- id: snowflake_prod
type: snowflake
account: xy12345.us-east-1
warehouse: compute_wh
database: prod
context_sources:
- id: dbt_main
type: dbt
connection: warehouse
manifest_path: ./target/manifest.json
catalog_path: ./target/catalog.json
- id: company_wiki
type: notion
database_id: abc123def456
- id: looker_prod
type: looker
base_url: https://company.looker.com
Environment Variables
Store secrets in environment variables:
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
export WAREHOUSE_PASSWORD=...
export SNOWFLAKE_USER=...
export SNOWFLAKE_PASSWORD=...
export NOTION_TOKEN=secret_...
export LOOKER_CLIENT_ID=...
export LOOKER_CLIENT_SECRET=...
export KTX_PROJECT_DIR=/path/to/analytics
TypeScript Integration
Using ktx Programmatically
import { KtxProject } from '@kaelio/ktx';
const project = await KtxProject.load({ projectDir: './analytics' });
const metrics = await project.searchSemanticLayer('revenue', {
limit: 10,
threshold: 0.7
});
for (const result of metrics) {
console.log(`${result.name}: ${result.description}`);
console.log(`Type: ${result.type}`);
console.log(`Score: ${result.score}`);
}
const wikiPages = await project.searchWiki('customer churn', {
limit: 5
});
for (const page of wikiPages) {
console.log(`${page.title}: ${page.content.substring(0, 100)}...`);
}
Working with Semantic Sources
import { SemanticLayer } from '@kaelio/ktx';
const sl = await SemanticLayer.load({
projectDir: './analytics',
connectionId: 'warehouse'
});
const metric = await sl.getMetric('arr');
console.log(metric.sql);
console.log(metric.dimensions);
const entities = await sl.listEntities();
for (const entity of entities) {
console.log(`${entity.name} (${entity.table})`);
}
const graph = await sl.getJoinGraph();
console.log(graph.paths);
Common Patterns
Integrating dbt Models
cd /path/to/dbt
dbt compile
dbt docs generate
ktx ingest --source dbt_main
Adding Wiki Documentation
mkdir -p wiki/global
cat > wiki/global/revenue-definitions.md << 'EOF'
Sum of all active subscription values normalized to annual amounts.
Excludes one-time fees and professional services.
ARR divided by 12. Updated daily based on active subscriptions.
EOF
ktx ingest --source wiki
Querying Through MCP in Claude Code
After running ktx mcp start, use in Claude Code:
Use the ktx skill to search for metrics related to "revenue"
Use ktx to find our definition of "customer churn"
Use ktx to get the SQL for calculating ARR
Creating Custom Metrics
metrics:
- name: arr
description: Annual Recurring Revenue
type: metric
sql: |
SELECT
DATE_TRUNC('month', subscription_start_date) as period,
SUM(CASE
WHEN billing_period = 'annual' THEN amount
WHEN billing_period = 'monthly' THEN amount * 12
WHEN billing_period = 'quarterly' THEN amount * 4
END) as arr
FROM subscriptions
WHERE status = 'active'
GROUP BY 1
dimensions:
- period
- customer_segment
- plan_type
entity: subscription
Detecting Join Paths
import { JoinGraphBuilder } from '@kaelio/ktx';
const builder = new JoinGraphBuilder({
connection: 'warehouse',
projectDir: './analytics'
});
await builder.buildFromIntrospection();
const paths = await builder.findPaths('customer', 'order');
for (const path of paths) {
console.log(`Path: ${path.tables.join(' -> ')}`);
console.log(`Join keys: ${path.joinKeys.join(', ')}`);
console.log(`Cardinality: ${path.cardinality}`);
}
Agent Integration
Claude Code Integration
ktx integrates automatically with Claude Code via MCP:
- Run
ktx setup and select Claude Code integration
- ktx updates your Claude Code config at
~/.claude/claude_desktop_config.json
- Restart Claude Code
- Use natural language to query: "Show me our revenue metrics"
Codex/Cursor Integration
For Codex or Cursor, manually configure MCP:
{
"mcpServers": {
"ktx": {
"command": "ktx",
"args": ["mcp", "start", "--project-dir", "/path/to/project"]
}
}
}
Troubleshooting
"ktx mcp start --project-dir ..." in Status
If ktx status prints a command to run:
ktx mcp start --project-dir /path/to/project
This starts the MCP server. Keep it running while using agent clients.
Connection Failures
ktx config validate --connection warehouse
echo $WAREHOUSE_PASSWORD
echo $ANTHROPIC_API_KEY
ktx setup --verbose
Ingestion Issues
ktx ingest --force --source dbt_main
cat raw-sources/warehouse/ingestion.log
ktx config validate --source dbt_main
Semantic Search Not Working
ktx config validate
ktx ingest --rebuild-index
ktx sl "test query" --debug
MCP Server Not Responding
ktx mcp status
ktx mcp stop
ktx mcp start
ktx mcp logs
Missing Metrics After dbt Update
cd /path/to/dbt
dbt compile
dbt docs generate
ktx ingest --source dbt_main --force
Advanced Usage
Custom LLM Configuration
llm:
provider: claude_code_session
llm:
provider: vertex_ai
model: claude-3-5-sonnet-20241022
project_id: my-gcp-project
region: us-central1
llm:
provider: ai_gateway
api_url: https://gateway.company.com/v1
model: claude-sonnet-4-6
Multi-Connection Setup
connections:
- id: prod
type: snowflake
account: prod.snowflakecomputing.com
warehouse: prod_wh
database: analytics
- id: staging
type: postgres
host: staging-db.company.com
database: analytics_staging
context_sources:
- id: prod_dbt
type: dbt
connection: prod
manifest_path: ./dbt/prod/target/manifest.json
- id: staging_dbt
type: dbt
connection: staging
manifest_path: ./dbt/staging/target/manifest.json
Scripting with ktx
#!/bin/bash
set -e
export KTX_PROJECT_DIR=/home/analytics/ktx-project
cd /home/dbt
dbt compile
dbt docs generate
ktx ingest --all --force
ktx sl --list > /tmp/metrics-catalog.txt
ktx wiki --list > /tmp/wiki-index.txt
echo "Context refreshed: $(date)" | slack-cli send analytics
Best Practices
- Commit semantic layer and wiki: These are your source of truth
- Gitignore
.ktx/: Contains secrets and local cache
- Use environment variables: Never commit API keys or passwords
- Run
ktx ingest regularly: Keep context fresh (daily or after dbt runs)
- Review contradiction reports: ktx flags conflicts between sources
- Start MCP server before agent: Required for MCP integration
- Document in wiki/: Add business context agents can search
- Validate after changes: Run
ktx config validate after editing ktx.yaml
Resources