| name | tree-sitter-language-pack |
| description | Parse and extract code intelligence from 306 programming languages using tree-sitter grammars. Use when writing code that parses source, extracts structure/imports/exports/symbols/docstrings/comments, detects a language, runs syntax diagnostics, or produces syntax-aware chunks for LLMs — in Rust, Python, Node.js/TypeScript, or the ts-pack CLI. Covers installation, the CLI surface, the SDK surface, and parser-cache management. |
| license | MIT |
| metadata | {"author":"kreuzberg-dev","version":"0.1.0","repository":"https://github.com/xberg-io/tree-sitter-language-pack"} |
Tree-Sitter Language Pack
tree-sitter-language-pack is a polyglot code parsing and analysis library
with a high-performance Rust core and native bindings for Python,
Node.js/TypeScript, Ruby, Go, Java, C#, PHP, Elixir, and WebAssembly. It
compiles 306 tree-sitter grammars into efficient parsers and exposes code
intelligence on top of them.
Capabilities
- Parse 306 languages into concrete syntax trees (s-expression or JSON).
- Extract structure — functions, classes, methods, modules with line
and byte spans, parent nesting, and visibility.
- Extract imports and exports — statements, sources, exported kinds.
- Extract symbols — all identifiers, for search and indexing.
- Extract docstrings and comments — attached to definitions or standalone.
- Language detection — from a file path, an extension, or content.
- Syntax diagnostics — error nodes and syntax errors with positions.
- Syntax-aware chunking — split source on syntactic boundaries for LLM
context windows, not on arbitrary byte offsets.
- Custom tree-sitter queries — run your own query patterns over a tree
(SDK only).
- Parser cache management — download, list, inspect, and clean the
on-demand parser cache for offline/CI use.
- Distribution — 10 language bindings plus a WebAssembly build and the
ts-pack CLI.
Use this skill when writing code that parses source in any supported
language, extracts code metadata, chunks code for an LLM, detects a
language, or validates syntax.
When to use
Reach for tree-sitter-language-pack when you need a real syntax tree or
structured code metadata — building a code index, feeding code to an LLM
in semantically coherent chunks, linting for syntax errors across many
languages, or detecting a file's language. For plain text extraction from
documents (PDF, Office, HTML), use Kreuzberg instead.
CLI vs SDK
- CLI (
ts-pack) — quick one-shot parsing and extraction over files,
parser-cache management, and shell/CI pipelines. parse and process
print to stdout (process always emits JSON). Auto-detects the language
from the file extension.
- SDK — embed parsing in an application, run custom tree-sitter
queries, detect a language from raw content (not just a path), or hold
parsers across many calls. Use the Rust core or the Python/Node bindings.
- MCP server (
ts-pack mcp) — exposes parse/process/detect as agent
tools, so an MCP client can parse and analyze code directly with no shell-out.
This plugin auto-registers it; see the using-the-mcp-server skill.
Prefer the CLI for ad-hoc work in an agent session; prefer the SDK when the
result feeds back into a larger program; prefer the MCP server when the agent
should call parsing directly as a tool.
Installation
CLI
brew install xberg-io/tap/ts-pack
npx @kreuzberg/ts-pack-cli --help
uvx --from ts-pack-cli ts-pack --help
cargo install --git https://github.com/xberg-io/tree-sitter-language-pack ts-pack-cli
Python
pip install tree-sitter-language-pack
Node.js / TypeScript
npm install @kreuzberg/tree-sitter-language-pack
Rust
[dependencies]
tree-sitter-language-pack = { version = "1", features = ["download"] }
Other bindings: Ruby (gem install tree_sitter_language_pack), Go, Java,
C#, PHP, Elixir, and WebAssembly
(npm install @kreuzberg/tree-sitter-language-pack-wasm). See
https://github.com/xberg-io/tree-sitter-language-pack.
CLI surface
ts-pack has these subcommands:
| Command | Purpose |
|---|
parse <file> | Parse a file into a syntax tree (`--format sexp |
process <file> | Run the code-intelligence pipeline, emit JSON. |
detect* | No standalone command — parse/process auto-detect from the extension. Use the SDK for content detection. |
list | List available languages (--downloaded, --manifest, --filter). |
info <language> | Show whether a language is known and cached. |
download [langs...] | Download parser libraries (--all, --groups, --fresh). |
clean | Remove all cached parser libraries (--force). |
cache-dir | Print the effective cache directory. |
init | Write a language-pack.toml config (--languages, --cache-dir). |
completions <shell> | Generate shell completions. |
* Language detection is implicit in parse/process. There is no
detect or validate subcommand; diagnostics come from
process --diagnostics.
Parse
ts-pack parse src/main.rs
ts-pack parse src/main.rs --format json
ts-pack parse - --language python
Process (code intelligence)
ts-pack process src/app.ts
ts-pack process src/app.ts --structure --imports --symbols --docstrings
ts-pack process src/app.ts --all
ts-pack process src/app.ts --chunk-size 2000
ts-pack process src/app.ts --diagnostics
process always prints JSON. Top-level keys: language, metrics,
structure, imports, exports, comments, docstrings, symbols,
diagnostics, and chunks (when --chunk-size is set).
SDK quick start
from tree_sitter_language_pack import process, ProcessConfig, detect_language_from_path
config = ProcessConfig("python").all()
result = process(source_code, config)
for item in result["structure"]:
print(item["kind"], item["name"], item["start_line"])
lang = detect_language_from_path("src/app.py")
import { process, detectLanguageFromPath } from "@kreuzberg/tree-sitter-language-pack";
const lang = detectLanguageFromPath("src/app.ts");
const result = process(source, { language: lang ?? "typescript", structure: true, imports: true });
Parser cache
Parsers download on demand the first time a language is used and are cached
locally. For offline or CI runs, prefetch:
ts-pack download python rust typescript
ts-pack download --all
ts-pack download --groups web,systems
ts-pack list --downloaded
ts-pack cache-dir
ts-pack clean --force
See the managing-parsers skill for the full cache workflow.
Related skills
parsing-source — parse a file into a syntax tree / s-expression.
extracting-code-structure — process with structure/imports/exports/
symbols/docstrings.
chunking-for-llms — --chunk-size syntax-aware splitting.
detecting-languages — detect by path, extension, or content.
managing-parsers — download/clean/list/info parser-cache management.
using-the-mcp-server — call parse/process/detect over MCP instead of
the CLI.
Full documentation: https://github.com/xberg-io/tree-sitter-language-pack