sf-knowledge-repo-development
// How to develop, extend, and maintain the SF Documentation Knowledge System
| name | sf-knowledge-repo-development |
| description | How to develop, extend, and maintain the SF Documentation Knowledge System |
This is a TypeScript (ESM) project that collects Salesforce documentation from
developer.salesforce.com, processes it into structured knowledge, and serves it
to LLM agents via Context Engineering (curated Markdown files) and
MCP Server (on-demand tool access).
No traditional RAG — no embeddings, no vector stores, no blind chunking.
See docs/architecture.md for the full system design.
src/
├── config/ # Domain definitions & release schedule
├── collectors/ # Fetch raw docs from Salesforce
├── processors/ # Transform raw HTML → tagged Markdown
├── generators/ # Produce knowledge files, skills, docs
├── mcp/ # MCP Server (Model Context Protocol)
├── cli/ # CLI entry points
└── utils/ # Logger, hash, shared utilities
scripts/ # Utility scripts, test runners, and bulk collection tools
knowledge/ # Git-tracked output (curated .md files)
skills/ # Generated per-domain SKILL.md files
data/ # Intermediate data (gitignored)
Each Salesforce documentation area (CLI, Revenue Cloud, Apex, etc.) is a "domain"
defined in src/config/domains.ts. Each domain has:
- id (e.g. cli-commands)
- atlas deliverable identifier for the SF docs API
- priority (P0, P1, P2)
The pipeline runs in four steps:
Discover (index API) → Collect (fetch raw) → Process (HTML→MD + tag) → Generate (knowledge files + graph)
Each step is a separate CLI command. Use --discover on collect/process/generate to automatically handle all 121+ SF deliverables.
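As a rough illustration of how the per-step commands line up, the sketch below builds the command sequence for one domain or for a full `--discover` run. The step names and flags are taken from the commands in this document; `pipelineCommands` itself is a hypothetical helper, not something the repository is known to contain.

```typescript
// Hypothetical helper: step names and flags come from this document's command
// list; the function is illustrative and not part of the repo.
type PipelineStep = "collect" | "process" | "generate";

const PIPELINE_STEPS: readonly PipelineStep[] = ["collect", "process", "generate"];

/**
 * Build the npm commands for a single domain, or for every discovered
 * deliverable when no domain id is given.
 */
function pipelineCommands(domain?: string): string[] {
  const target = domain ? `--domain ${domain}` : "--discover";
  return PIPELINE_STEPS.map((step) => `npm run ${step} -- ${target}`);
}

// Print the three commands for one domain, in the order they should run.
console.log(pipelineCommands("cli-commands").join("\n"));
```

Keeping the steps in one ordered list makes it harder to run `generate` before `process` by accident.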
Output files:
- knowledge/current/<domain>/_index.md: deduplicated routing table for LLMs (one row per file, with descriptions)
- knowledge/current/<domain>/<topic>.md: self-contained topic with full page-level documentation
- knowledge/current/graph.json: semantic knowledge graph (53k+ nodes, 450k+ edges)
Metadata fields: title, domain, topic, apiVersion, release, docType, namespace, estimatedTokens, keywords
The graph connects documents with semantic edges:
- references: doc → doc cross-references (52k+ edges)
- belongs_to_namespace: doc → Apex namespace (System, ConnectApi, etc.)
- belongs_to_service: domain → service category (analytics, commerce, etc.)
- is_type: doc → docType (api-reference, developer-guide, concept, etc.)
- tagged_with: doc → keyword
To add a new domain, add the domain config to src/config/domains.ts:
{
id: 'my-domain',
name: 'My Domain',
priority: 'P1',
atlas: 'atlas.en-us.my_domain_guide',
description: 'What this covers',
tags: ['my-tag'],
}
Collect and verify:
npm run build
npm run collect -- --domain my-domain
Process and generate:
npm run process -- --domain my-domain
npm run generate -- --domain my-domain
Check output in knowledge/current/my-domain/
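Before running the pipeline on a new domain, it can help to sanity-check the config entry. The sketch below is only an illustration: the `DomainConfig` shape is inferred from the example entry above, and `validateDomain`, the kebab-case rule, and the `atlas.` prefix check are assumptions rather than rules the repository is known to enforce.

```typescript
// Assumed shape, inferred from the example entry above; the real type in
// src/config/domains.ts may differ.
interface DomainConfig {
  id: string;
  name: string;
  priority: "P0" | "P1" | "P2";
  atlas: string;
  description: string;
  tags: string[];
}

/** Return human-readable problems; an empty array means nothing was flagged. */
function validateDomain(domain: DomainConfig, existingIds: ReadonlySet<string>): string[] {
  const problems: string[] = [];
  // Assumption: ids are lowercase kebab-case, like "cli-commands".
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(domain.id)) {
    problems.push(`id "${domain.id}" is not kebab-case`);
  }
  if (existingIds.has(domain.id)) {
    problems.push(`id "${domain.id}" is already defined`);
  }
  // Assumption: atlas identifiers start with "atlas.", as in the example.
  if (!domain.atlas.startsWith("atlas.")) {
    problems.push(`atlas "${domain.atlas}" does not start with "atlas."`);
  }
  return problems;
}

const myDomain: DomainConfig = {
  id: "my-domain",
  name: "My Domain",
  priority: "P1",
  atlas: "atlas.en-us.my_domain_guide",
  description: "What this covers",
  tags: ["my-tag"],
};

// Check the new entry against the ids that already exist.
console.log(validateDomain(myDomain, new Set(["cli-commands"])));
```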
To add a new MCP tool, create a tool file in src/mcp/tools/:
export const myTool = {
name: "my_tool",
description: "What this tool does",
parameters: {
/* JSON Schema */
},
handler: async (params: Record<string, unknown>) => {
/* return result */
},
};
Register in src/mcp/server.ts
Test: npm run mcp:dev
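The registration step is not spelled out here, so the following is a minimal sketch of one way a server could keep track of tools and dispatch calls. The tool shape mirrors the example above; `ToolDefinition`, `registerTool`, `callTool`, and the `Map`-based registry are hypothetical names and may not match what src/mcp/server.ts actually does.

```typescript
// Tool shape mirrors the example above; the registry is a hypothetical
// stand-in for the real registration logic in src/mcp/server.ts.
type ToolParams = Record<string, unknown>;

interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema describing the input
  handler: (params: ToolParams) => Promise<unknown>;
}

const registry = new Map<string, ToolDefinition>();

/** Add a tool, refusing duplicate names so one tool cannot shadow another. */
function registerTool(tool: ToolDefinition): void {
  if (registry.has(tool.name)) {
    throw new Error(`Tool "${tool.name}" is already registered`);
  }
  registry.set(tool.name, tool);
}

/** Look up a tool by name and run its handler. */
async function callTool(name: string, params: ToolParams): Promise<unknown> {
  const tool = registry.get(name);
  if (!tool) {
    throw new Error(`Unknown tool "${name}"`);
  }
  return tool.handler(params);
}

registerTool({
  name: "my_tool",
  description: "Echoes its input",
  parameters: { type: "object", properties: { text: { type: "string" } } },
  handler: async (params) => ({ echoed: params["text"] }),
});

callTool("my_tool", { text: "hello" }).then((result) => console.log(result));
```

Rejecting duplicate names at registration time surfaces naming collisions when the server starts rather than at call time.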
# Build
npm run build # Compile TypeScript
# Discovery
npm run discover # List all available SF documentation deliverables
# Pipeline (single domain)
npm run collect -- -d cli # Collect specific domain
npm run process -- -d cli # Process specific domain
npm run generate -- -d cli # Generate specific domain
# Pipeline (full — all 129 domains)
npm run collect -- --discover # Auto-discover + collect all
npm run process -- --discover # Process all collected
npm run generate -- --discover # Generate all + rebuild graph
# Graph
npm run graph:stats # Print graph statistics
# MCP Server
npm run mcp:dev # Start MCP server (dev)
# Testing
npm test # Run all tests
npm run test:watch # Watch mode
The MCP server supports restricting all tools to specific documentation domains, reducing noise when working on a specific Salesforce product area.
Setting the active domains:
- SF_ACTIVE_DOMAINS=revenue-cloud,apex-guide (environment variable, comma-separated)
- sf_set_active_domains(domains: ["revenue-cloud", "apex-guide"]) restricts all tools to the listed domains
- sf_suggest_domains("contract lifecycle management") suggests relevant domain IDs
- sf_set_active_domains(clear: true) removes all restrictions
Implementation:
- activeDomains is a mutable Set<string> | null in src/mcp/server.ts
- resolveEffectiveDomains(perCallDomain?) merges global + per-call filters
- filterResultsByActiveDomains() post-filters graph results by nodeId prefix
- gq.clearCache() is called when active domains change (LRU cache: 200 entries, 5-min TTL)
- domains?: string[] in where clause for efficient pre-filtering
Tool-specific behavior:
- sf_apex_lookup and sf_object_reference are hardcoded to specific domains; they warn if those domains are not in the active set
- sf_read_topic shows a gentle note but still allows reads outside the active set
- sf_limits uses hardcoded data and is not affected by domain restrictions
Key files:
- src/mcp/server.ts: main server (env var parsing, helpers, tool integrations)
- src/utils/graph-query.ts: SearchOptions.domains, clearCache()
- src/utils/search-engine.ts: SearchQuery.domains for Orama
- src/mcp/code-index.ts: CodeIndex.search() domains option
- docs/domains.md: comprehensive domain reference by service category
Conventions:
- ESM ("type": "module")
- No console.log in library code
- Use .js extensions in imports (ESM requirement)
- Intermediate data in data/ (gitignored), knowledge output in knowledge/ (tracked)
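Returning to the active-domain filter: the sketch below shows one plausible way the pieces could fit together. The names activeDomains, resolveEffectiveDomains, and filterResultsByActiveDomains come from this document, but every function body is an assumption, as are the `parseActiveDomains` helper, the `<domain>/<topic>` nodeId format, and the choice to return an empty list when a per-call domain falls outside the active set.

```typescript
// Names follow this document; all bodies are assumptions, not the repo's code.
let activeDomains: Set<string> | null = null;

/** Parse a comma-separated value such as SF_ACTIVE_DOMAINS; empty means unrestricted. */
function parseActiveDomains(raw: string | undefined): Set<string> | null {
  const ids = (raw ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return ids.length > 0 ? new Set(ids) : null;
}

/** Merge the global restriction with an optional per-call domain. */
function resolveEffectiveDomains(perCallDomain?: string): string[] | undefined {
  if (perCallDomain === undefined) {
    return activeDomains ? Array.from(activeDomains) : undefined;
  }
  if (activeDomains === null || activeDomains.has(perCallDomain)) {
    return [perCallDomain];
  }
  // Assumption: a per-call domain outside the active set matches nothing.
  return [];
}

interface GraphResult {
  nodeId: string; // Assumption: formatted as "<domain>/<topic>"
}

/** Keep only results whose nodeId starts with an active domain prefix. */
function filterResultsByActiveDomains<T extends GraphResult>(results: T[]): T[] {
  if (activeDomains === null) return results;
  const prefixes = Array.from(activeDomains).map((d) => `${d}/`);
  return results.filter((r) => prefixes.some((p) => r.nodeId.startsWith(p)));
}

activeDomains = parseActiveDomains("revenue-cloud, apex-guide");
const hits = filterResultsByActiveDomains([
  { nodeId: "revenue-cloud/pricing" },
  { nodeId: "cli-commands/org" },
]);
console.log(hits.map((h) => h.nodeId));
```

Using `null` rather than an empty set for "no restriction" keeps the unrestricted case distinct from a restriction that happens to match nothing.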