| name | ktx-data-agent-context-layer |
| description | Build and query a self-improving context layer for AI data agents with ktx - combines warehouse metadata, metrics, and wiki knowledge |
| triggers | ["set up ktx for data agent context","configure ktx semantic layer","build ktx warehouse context","query data using ktx MCP server","ingest dbt models into ktx","search ktx wiki or metrics","troubleshoot ktx agent integration","add database connection to ktx"] |
ktx Data Agent Context Layer
Skill by ara.so — MCP Skills collection.
ktx is a self-improving context layer that teaches AI agents how to query your data warehouse accurately. It automatically ingests metadata from databases, dbt, LookML, Looker, Metabase, and Notion, builds a semantic layer with approved metric definitions, and exposes everything via CLI and MCP for agent execution.
What ktx Does
- Learns company knowledge: Ingests wiki content, organizes it, removes duplicates, flags contradictions
- Maps the data stack: Samples tables, captures metadata, detects joinable columns, annotates sources
- Builds semantic layer: Combines raw tables and metrics through a join graph that resolves fan/chasm traps
- Serves agents: CLI and MCP tools with semantic search across wiki and semantic-layer entities
Key benefits for agents:
- Query with approved metric definitions instead of inventing SQL every time
- Reuse canonical business logic across questions
- Get context from scattered sources (dbt, Looker, Notion) in one searchable surface
Installation
Global CLI Install
npm install -g @kaelio/ktx
Project-Scoped Install
npm install --save-dev @kaelio/ktx
npx ktx setup
As an MCP Skill
From Claude Code, Codex, Cursor, or OpenCode:
Run npx skills add Kaelio/ktx --skill ktx
Quick Start
ktx setup
ktx status
ktx ingest
ktx sl "revenue"
ktx wiki "refund policy"
ktx mcp start
Project Structure
A ktx project follows this layout:
my-project/
├── ktx.yaml # Project configuration
├── semantic-layer/<connection-id>/ # YAML semantic sources
├── wiki/global/ # Shared business context
├── wiki/user/<user-id>/ # User-scoped notes
├── raw-sources/<connection-id>/ # Ingest artifacts and reports
└── .ktx/ # Local state and secrets (git-ignored)
Version control: Commit ktx.yaml, semantic-layer/, and wiki/. Keep .ktx/ local.
Configuration
ktx.yaml Example
version: 1
project:
name: analytics
description: Data warehouse context for product analytics
llm:
provider: anthropic
model: claude-sonnet-4-6
embeddings:
provider: openai
model: text-embedding-3-small
databases:
warehouse:
type: postgres
host: localhost
port: 5432
database: analytics
context_sources:
dbt_main:
type: dbt
project_path: ./dbt
profiles_dir: ~/.dbt
target: prod
notion_docs:
type: notion
page_ids:
- a1b2c3d4e5f6
- f6e5d4c3b2a1
Secrets Management
Store secrets in .ktx/secrets.yaml or environment variables:
llm:
anthropic_api_key: sk-ant-...
embeddings:
openai_api_key: sk-...
databases:
warehouse:
user: readonly
password: ...
context_sources:
notion_docs:
token: secret_...
Environment variable names follow KTX_<SECTION>_<KEY> pattern:
KTX_ANTHROPIC_API_KEY
KTX_OPENAI_API_KEY
KTX_DATABASE_WAREHOUSE_USER
KTX_DATABASE_WAREHOUSE_PASSWORD
KTX_CONTEXT_SOURCE_NOTION_DOCS_TOKEN
Key Commands
Setup & Status
ktx setup
ktx status
Building Context
ktx ingest
ktx ingest --connection warehouse
ktx ingest --context-source dbt_main
ktx ingest --force
Searching Context
ktx sl "monthly recurring revenue"
ktx sl "user signup date"
ktx wiki "customer refund policy"
ktx wiki "metric calculation rules"
ktx sl "revenue" --json
ktx wiki "refunds" --json
MCP Server
ktx mcp start
ktx mcp start --project-dir /path/to/project
ktx mcp status
Semantic Layer Query
ktx query "revenue by month for 2024"
ktx raw-query "SELECT * FROM users WHERE created_at > '2024-01-01'"
Agent Integration Patterns
Claude Code / Codex
After running ktx setup, the MCP server is automatically configured. From your agent:
Use ktx to find the definition of monthly recurring revenue
Search the wiki for our refund policy
Query revenue by product category for Q4 2024 using ktx
Programmatic MCP Client
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
const transport = new StdioClientTransport({
command: "ktx",
args: ["mcp", "start", "--project-dir", "/path/to/project"],
});
const client = new Client(
{
name: "my-agent",
version: "1.0.0",
},
{
capabilities: {
tools: {},
},
}
);
await client.connect(transport);
const slResults = await client.callTool({
name: "ktx_sl_search",
arguments: {
query: "revenue",
},
});
const wikiResults = await client.callTool({
name: "ktx_wiki_search",
arguments: {
query: "refund policy",
},
});
const queryResults = await client.callTool({
name: "ktx_query",
arguments: {
query: "revenue by month for 2024",
connection_id: "warehouse",
},
});
Semantic Layer YAML
ktx builds semantic sources from ingestion and stores them as YAML:
type: source
name: users
connection_id: warehouse
table: public.users
description: User account records with signup and subscription data
columns:
- name: id
type: integer
primary_key: true
- name: email
type: varchar
description: User email address
- name: created_at
type: timestamp
description: Account creation timestamp
- name: plan_id
type: integer
foreign_key:
table: plans
column: id
dimensions:
- name: signup_date
column: created_at
type: time
grain: day
measures:
- name: user_count
aggregation: count
description: Total number of users
type: metric
name: monthly_recurring_revenue
connection_id: warehouse
description: Total MRR from active subscriptions
sql: |
SELECT
DATE_TRUNC('month', s.start_date) AS month,
SUM(p.price) AS mrr
FROM subscriptions s
JOIN plans p ON s.plan_id = p.id
WHERE s.status = 'active'
GROUP BY 1
dimensions:
- name: month
type: time
grain: month
measures:
- name: mrr
aggregation: sum
type: currency
Wiki Pages
ktx organizes wiki content in markdown:
<!-- wiki/global/refund-policy.md -->
---
title: Customer Refund Policy
tags: [policy, customer-service, finance]
---
# Customer Refund Policy
## Eligibility
Customers can request refunds within 30 days of purchase if:
- Product defect
- Service unavailability > 24 hours
- Accidental duplicate purchase
## Processing
Refunds are processed within 5-7 business days.
Finance team approval required for amounts > $500.
## Metric Impact
Refunds reduce `net_revenue` but not `gross_revenue`.
Track via `refund_rate` metric in semantic layer.
Database Connectors
Supported databases:
| Type | Configuration |
|---|
| PostgreSQL | type: postgres |
| Snowflake | type: snowflake |
| BigQuery | type: bigquery |
| ClickHouse | type: clickhouse |
| MySQL | type: mysql |
| SQL Server | type: mssql |
| SQLite | type: sqlite |
Example PostgreSQL configuration:
databases:
warehouse:
type: postgres
host: db.example.com
port: 5432
database: analytics
schema: public
Example Snowflake configuration:
databases:
snowflake_prod:
type: snowflake
account: xy12345.us-east-1
warehouse: COMPUTE_WH
database: ANALYTICS
schema: PUBLIC
role: READONLY
Context Source Integrations
dbt
context_sources:
dbt_main:
type: dbt
project_path: ./dbt
profiles_dir: ~/.dbt
target: prod
Ingests:
- Model definitions and lineage
- Column descriptions
- Metric definitions (dbt Metrics or MetricFlow)
- Tests and constraints
Looker
context_sources:
looker_prod:
type: looker
api_url: https://looker.example.com:19999
LookML
context_sources:
lookml_repo:
type: lookml
repo_path: ./lookml
Metabase
context_sources:
metabase:
type: metabase
url: https://metabase.example.com
Notion
context_sources:
notion_docs:
type: notion
page_ids:
- root-page-id-1
- root-page-id-2
Common Workflows
Initial Setup for a New Project
npm install -g @kaelio/ktx
cd ~/analytics-project
ktx setup
ktx ingest
ktx status
ktx sl "revenue"
ktx wiki "business rules"
Adding a New Database Connection
databases:
new_warehouse:
type: postgres
host: new-db.example.com
port: 5432
database: prod
export KTX_DATABASE_NEW_WAREHOUSE_USER=readonly
export KTX_DATABASE_NEW_WAREHOUSE_PASSWORD=secret
ktx ingest --connection new_warehouse
ktx status
Updating Context After Schema Changes
ktx ingest --force
ktx ingest --connection warehouse --force
ktx ingest --context-source dbt_main --force
Using with Claude Code
ktx status
Search ktx for the definition of customer lifetime value
Use ktx to query monthly active users for the last 6 months
Check the ktx wiki for our data retention policy
Troubleshooting
"Project ready: no"
ktx status
ktx setup
"LLM ready: no"
echo $KTX_ANTHROPIC_API_KEY
cat > .ktx/secrets.yaml << EOF
llm:
anthropic_api_key: sk-ant-...
EOF
"Databases configured: no"
cat ktx.yaml | grep -A 5 databases
export KTX_DATABASE_WAREHOUSE_USER=readonly
export KTX_DATABASE_WAREHOUSE_PASSWORD=secret
ktx ingest --connection warehouse
Ingest Fails
ktx ingest --connection warehouse --verbose
echo $KTX_CONTEXT_SOURCE_LOOKER_PROD_CLIENT_ID
echo $KTX_CONTEXT_SOURCE_LOOKER_PROD_CLIENT_SECRET
MCP Server Not Found by Agent
ktx status
ktx mcp start --project-dir /path/to/project
Search Returns No Results
ktx ingest
ls -la semantic-layer/*/
ls -la wiki/global/
ktx sl "revenue"
ktx sl "revenue" --json | jq .
Read-Only Database Error
ktx only reads from your database. If you see write errors:
Telemetry Opt-Out
ktx collects anonymous usage data. To disable:
export KTX_TELEMETRY_DISABLED=1
telemetry:
enabled: false
Advanced Usage
Custom Project Directory
ktx status --project-dir /path/to/project
ktx ingest --project-dir /path/to/project
ktx sl "revenue" --project-dir /path/to/project
export KTX_PROJECT_DIR=/path/to/project
ktx status
Multiple Projects
cd ~/project-a
ktx setup
ktx ingest
cd ~/project-b
ktx setup
ktx ingest
ktx sl "revenue" --project-dir ~/project-a
ktx wiki "policy" --project-dir ~/project-b
Programmatic Query Execution
import { execSync } from 'child_process';
function ktxSearch(query: string, type: 'sl' | 'wiki'): any {
const cmd = type === 'sl'
? `ktx sl "${query}" --json`
: `ktx wiki "${query}" --json`;
const output = execSync(cmd, { encoding: 'utf-8' });
return JSON.parse(output);
}
const metrics = ktxSearch('revenue', 'sl');
const policies = ktxSearch('refund', 'wiki');
console.log('Found metrics:', metrics.length);
console.log('Found wiki pages:', policies.length);
Project Resolution
ktx finds your project in this order:
--project-dir flag
KTX_PROJECT_DIR environment variable
- Nearest
ktx.yaml in current or parent directories
- Current working directory
Always use --project-dir or KTX_PROJECT_DIR in scripts for reliability.