| name | ktx-ai-data-context-layer |
| description | Expert in ktx - the executable context layer for data and analytics agents with skills, memory and semantic layer |
| triggers | ["set up ktx for my data warehouse","configure ktx semantic layer for my database","how do I use ktx with Claude Code","help me ingest data sources into ktx","configure ktx to read my dbt models","troubleshoot ktx MCP server connection","search ktx semantic layer and wiki","build ktx context from my warehouse"] |
ktx AI Data Context Layer Skill
Skill by ara.so — MCP Skills collection.
What is ktx?
ktx is a self-improving context layer that teaches AI agents how to query your data warehouse accurately. It automatically:
- Learns from company knowledge - ingests wiki content, organizes it, removes duplicates, flags contradictions
- Maps the data stack - samples tables, captures metadata, detects joinable columns
- Builds a semantic layer - combines raw tables and metrics through a join graph that resolves chasm and fan traps
- Serves agents at execution - exposes CLI and MCP tools with combined full-text and semantic search
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.
Installation
Global CLI Installation
npm install -g @kaelio/ktx
Project-Specific Installation
npm install @kaelio/ktx
Quick Setup
ktx setup
This interactive command:
- Creates or resumes a local ktx project
- Configures LLM and embedding providers
- Sets up database connections
- Configures context sources (dbt, Looker, etc.)
- Builds initial context
- Installs agent integration
Project Structure
my-project/
├── ktx.yaml # Project configuration
├── semantic-layer/<connection-id>/ # YAML semantic sources
├── wiki/global/ # Shared business context
├── wiki/user/<user-id>/ # User-scoped notes
├── raw-sources/<connection-id>/ # Ingest artifacts and reports
└── .ktx/ # Local state and secrets (git-ignored)
Important: Commit ktx.yaml, semantic-layer/, and wiki/. Keep .ktx/ local and git-ignored.
Core Commands
Check Project Status
ktx status
Example output:
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
Build Context
ktx ingest
ktx ingest --connection warehouse
ktx ingest --source dbt_main
Search Semantic Layer
ktx sl "revenue"
ktx sl "customer lifetime value" --json
Search Wiki
ktx wiki "refund policy"
ktx wiki "how do we calculate churn"
MCP Server
ktx mcp start
ktx mcp start --project-dir /path/to/project
ktx mcp status
Configuration
ktx.yaml Structure
version: "1"
project:
name: "analytics"
description: "Company analytics warehouse"
llm:
provider: "anthropic"
model: "claude-sonnet-4-6"
apiKeyEnvVar: "ANTHROPIC_API_KEY"
embeddings:
provider: "openai"
model: "text-embedding-3-small"
apiKeyEnvVar: "OPENAI_API_KEY"
connections:
warehouse:
type: "postgres"
host: "localhost"
port: 5432
database: "analytics"
user: "readonly_user"
passwordEnvVar: "DB_PASSWORD"
ssl: false
sources:
dbt_main:
type: "dbt"
connection: "warehouse"
manifestPath: "./target/manifest.json"
catalogPath: "./target/catalog.json"
Environment Variables
Create a .env file in your project root:
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
DB_PASSWORD=your_db_password_here
KTX_PROJECT_DIR=/path/to/project
LLM Provider Configuration
Anthropic API
llm:
provider: "anthropic"
model: "claude-sonnet-4-6"
apiKeyEnvVar: "ANTHROPIC_API_KEY"
Google Vertex AI
llm:
provider: "vertex"
model: "claude-sonnet-4-6"
projectId: "my-gcp-project"
region: "us-central1"
credentialsEnvVar: "GOOGLE_APPLICATION_CREDENTIALS"
Claude Code Session (Local)
llm:
provider: "claude-agent-sdk"
Database Connection Examples
PostgreSQL
connections:
warehouse:
type: "postgres"
host: "db.example.com"
port: 5432
database: "analytics"
user: "readonly"
passwordEnvVar: "POSTGRES_PASSWORD"
ssl: true
Snowflake
connections:
snowflake:
type: "snowflake"
account: "xy12345.us-east-1"
warehouse: "COMPUTE_WH"
database: "ANALYTICS"
schema: "PUBLIC"
user: "ktx_user"
passwordEnvVar: "SNOWFLAKE_PASSWORD"
BigQuery
connections:
bigquery:
type: "bigquery"
projectId: "my-project"
dataset: "analytics"
credentialsEnvVar: "GOOGLE_APPLICATION_CREDENTIALS"
Context Source Configuration
dbt
sources:
dbt_main:
type: "dbt"
connection: "warehouse"
manifestPath: "./target/manifest.json"
catalogPath: "./target/catalog.json"
docsPath: "./target/index.html"
Looker
sources:
looker:
type: "looker"
connection: "warehouse"
projectPath: "./looker-models"
Metabase
sources:
metabase:
type: "metabase"
connection: "warehouse"
apiUrl: "https://metabase.example.com"
apiKeyEnvVar: "METABASE_API_KEY"
Notion
sources:
notion_wiki:
type: "notion"
apiKeyEnvVar: "NOTION_API_KEY"
databaseIds:
- "abc123def456"
- "789ghi012jkl"
Agent Integration
Claude Code
After running ktx setup, the integration is automatic. From your project directory:
What is our total revenue this quarter?
Claude Code will use ktx's semantic layer to query accurately.
Codex
npx skills add Kaelio/ktx --skill ktx
Cursor / OpenCode
Configure MCP in your editor settings:
{
"mcpServers": {
"ktx": {
"command": "ktx",
"args": ["mcp", "start", "--project-dir", "/path/to/project"]
}
}
}
Semantic Layer Usage
Defining Metrics
Create YAML files in semantic-layer/<connection-id>/:
version: "1"
type: "metric"
name: "total_revenue"
description: "Sum of all order amounts"
sql: "SUM(orders.amount)"
dimensions:
- "customer_id"
- "order_date"
filters:
- "orders.status = 'completed'"
source_table: "orders"
Defining Dimensions
version: "1"
type: "dimension"
name: "customer_segment"
description: "Customer segment based on lifetime value"
sql: |
CASE
WHEN total_spent > 10000 THEN 'enterprise'
WHEN total_spent > 1000 THEN 'mid-market'
ELSE 'smb'
END
source_table: "customers"
Join Graph
ktx automatically detects joinable columns. You can override in ktx.yaml:
semantic_layer:
joins:
- left_table: "orders"
right_table: "customers"
left_column: "customer_id"
right_column: "id"
type: "inner"
Wiki Management
Adding Wiki Pages
mkdir -p wiki/global
cat > wiki/global/refund-policy.md <<EOF
# Refund Policy
Customers can request refunds within 30 days.
Full refunds issued if:
- Product not as described
- Technical issues unresolved
Partial refunds (50%) if:
- Customer changed mind
- Alternative solution offered
EOF
User-Scoped Notes
mkdir -p wiki/user/alice
cat > wiki/user/alice/analysis-notes.md <<EOF
# Q1 Analysis Notes
Revenue spike in March due to new product launch.
Check customer_acquisition_source for details.
EOF
Ingesting Wiki Content
ktx ingest
ktx wiki "refund timeline"
Common Patterns
Initial Project Setup
import { execSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
const projectDir = process.cwd();
const config = {
version: "1",
project: {
name: path.basename(projectDir),
description: "Analytics warehouse"
},
llm: {
provider: "anthropic",
model: "claude-sonnet-4-6",
apiKeyEnvVar: "ANTHROPIC_API_KEY"
},
embeddings: {
provider: "openai",
model: "text-embedding-3-small",
apiKeyEnvVar: "OPENAI_API_KEY"
},
connections: {
warehouse: {
type: "postgres",
host: process.env.DB_HOST || "localhost",
port: parseInt(process.env.DB_PORT || "5432"),
database: process.env.DB_NAME || "analytics",
user: process.env.DB_USER || "readonly",
passwordEnvVar: "DB_PASSWORD"
}
}
};
fs.writeFileSync(
path.join(projectDir, 'ktx.yaml'),
JSON.stringify(config, null, 2)
);
execSync('ktx setup', { stdio: 'inherit' });
Programmatic Ingestion
import { execSync } from 'child_process';
async function runDailyIngest() {
console.log('Starting daily ktx ingestion...');
try {
execSync('ktx ingest', {
stdio: 'inherit',
env: { ...process.env, KTX_PROJECT_DIR: '/path/to/project' }
});
console.log('Ingestion complete');
} catch (error) {
console.error('Ingestion failed:', error);
process.exit(1);
}
}
runDailyIngest();
Custom Metric Definition Workflow
import * as fs from 'fs';
import * as path from 'path';
import * as yaml from 'yaml';
interface MetricDefinition {
version: string;
type: 'metric';
name: string;
description: string;
sql: string;
dimensions?: string[];
filters?: string[];
source_table: string;
}
function addMetric(
connectionId: string,
metric: Omit<MetricDefinition, 'version' | 'type'>
) {
const metricDef: MetricDefinition = {
version: "1",
type: "metric",
...metric
};
const dir = path.join(
process.cwd(),
'semantic-layer',
connectionId
);
fs.mkdirSync(dir, { recursive: true });
const filename = `${metric.name}.yaml`;
const filepath = path.join(dir, filename);
fs.writeFileSync(
filepath,
yaml.stringify(metricDef)
);
console.log(`Created metric: ${filepath}`);
}
addMetric('warehouse', {
name: 'daily_active_users',
description: 'Count of unique users per day',
sql: 'COUNT(DISTINCT user_id)',
dimensions: ['event_date'],
filters: ['event_type = \'login\''],
source_table: 'user_events'
});
Searching Programmatically
import { execSync } from 'child_process';
function searchSemanticLayer(query: string): any {
const result = execSync(`ktx sl "${query}" --json`, {
encoding: 'utf-8',
env: { ...process.env, KTX_PROJECT_DIR: '/path/to/project' }
});
return JSON.parse(result);
}
function searchWiki(query: string): any {
const result = execSync(`ktx wiki "${query}" --json`, {
encoding: 'utf-8',
env: { ...process.env, KTX_PROJECT_DIR: '/path/to/project' }
});
return JSON.parse(result);
}
const revenueMetrics = searchSemanticLayer('revenue');
console.log('Revenue metrics:', revenueMetrics);
const policies = searchWiki('refund policy');
console.log('Policies:', policies);
MCP Integration Details
Available MCP Tools
When ktx MCP server is running, agents have access to:
- search_semantic_layer - Search metrics, dimensions, and tables
- search_wiki - Search wiki pages and documentation
- get_metric_definition - Get full metric SQL and metadata
- list_connections - List available database connections
- get_table_schema - Get table column details
- get_join_paths - Find join paths between tables
Example MCP Usage from Agent
const result = await useMcpTool('ktx', 'search_semantic_layer', {
query: 'total revenue by customer segment'
});
const metricDef = await useMcpTool('ktx', 'get_metric_definition', {
metric_name: 'total_revenue'
});
const joinPath = await useMcpTool('ktx', 'get_join_paths', {
from_table: 'orders',
to_table: 'customers'
});
Troubleshooting
ktx status shows "Project ready: no"
ls -la ktx.yaml
ktx setup
echo $KTX_PROJECT_DIR
LLM provider not configured
env | grep ANTHROPIC_API_KEY
env | grep OPENAI_API_KEY
echo "ANTHROPIC_API_KEY=your_key" >> .env
echo "OPENAI_API_KEY=your_key" >> .env
ktx setup
Database connection fails
psql -h localhost -U readonly -d analytics
cat ktx.yaml | grep -A 10 connections
env | grep DB_PASSWORD
ktx ingest --project-dir /path/to/project
MCP server won't start
ktx mcp status
pkill -f "ktx mcp"
ktx mcp start --verbose
tail -f ~/.ktx/logs/mcp.log
Context ingestion fails
ktx ingest --verbose
ktx ingest --source dbt_main --verbose
ls -la target/manifest.json
ls -la target/catalog.json
ktx test-connection warehouse
Search returns no results
ktx ingest
ls -la semantic-layer/
ls -la wiki/
ktx sl "revenue" --verbose
ktx wiki "policy" --verbose
Permission errors
ls -la ktx.yaml
ls -la .ktx/
chown -R $USER:$USER .ktx/
rm -rf .ktx/
ktx setup
Agent can't find ktx
ktx mcp status
ktx mcp start --project-dir $(pwd)
cat ~/.config/claude-code/mcp.json
Telemetry Opt-Out
export KTX_TELEMETRY_DISABLED=1
echo "KTX_TELEMETRY_DISABLED=1" >> .env
ktx status
Advanced Usage
Custom Join Logic
semantic_layer:
joins:
- left_table: "orders"
right_table: "customers"
left_column: "customer_id"
right_column: "id"
type: "left"
- left_table: "orders"
right_table: "products"
left_column: "product_id"
right_column: "id"
type: "inner"
cardinality: "many_to_one"
Multi-Database Setup
connections:
warehouse:
type: "postgres"
host: "warehouse.example.com"
database: "analytics"
production:
type: "postgres"
host: "prod.example.com"
database: "app_db"
sources:
dbt_warehouse:
type: "dbt"
connection: "warehouse"
manifestPath: "./warehouse/target/manifest.json"
dbt_production:
type: "dbt"
connection: "production"
manifestPath: "./production/target/manifest.json"
CI/CD Integration
name: ktx Context Build
on:
push:
branches: [main]
schedule:
- cron: '0 2 * * *'
jobs:
build-context:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install ktx
run: npm install -g @kaelio/ktx
- name: Build context
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
run: |
ktx ingest
- name: Commit updated context
run: |
git config user.name "ktx Bot"
git config user.email "bot@example.com"
git add semantic-layer/ wiki/
git diff --quiet || git commit -m "Update ktx context"
git push
Best Practices
- Version Control: Commit
ktx.yaml, semantic-layer/, and wiki/ but git-ignore .ktx/
- Read-Only Access: Configure database connections with read-only users
- Regular Ingestion: Run
ktx ingest daily or on data model changes
- Metric Naming: Use clear, consistent names (e.g.,
total_revenue, not rev)
- Documentation: Document business logic in wiki pages, not just metrics
- Environment Variables: Never commit secrets; use env vars for all credentials
- Testing: Test new metrics and joins before committing to version control
- MCP Management: Keep MCP server running for active agent sessions