
chunking-for-llms

Updated June 20, 2026 at 10:09

Use when the user wants to split source code into chunks for an LLM context window without breaking syntax mid-construct. Covers `ts-pack process --chunk-size`, why syntax-aware splits beat fixed-byte splits, picking a size, and the chunk JSON shape.

Source: xberg-io/plugins

SKILL.md

name	chunking-for-llms
description	Use when the user wants to split source code into chunks for an LLM context window without breaking syntax mid-construct. Covers `ts-pack process --chunk-size`, why syntax-aware splits beat fixed-byte splits, picking a size, and the chunk JSON shape.

Syntax-aware chunking for LLMs

Splitting code on a fixed byte or line count cuts functions in half and strips away the context a model needs to read them. ts-pack process <file> --chunk-size <bytes> instead splits on syntactic boundaries (whole functions, classes, blocks), so each chunk is a coherent unit, and emits the chunks in the chunks array of its JSON output.
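A quick way to see the problem with fixed-size splits: the sketch below cuts a small Python snippet into 50-byte slices and checks each slice with the standard-library ast.parse. It only illustrates the failure mode and does not use ts-pack; the sample source, the 50-byte width, and the helper names are arbitrary choices for the illustration.

```python
import ast

SOURCE = '''def add(a, b):
    return a + b


def greet(name):
    message = "Hello, " + name
    return message
'''


def fixed_byte_split(text: str, size: int) -> list[str]:
    """Cut text every `size` bytes with no regard for syntax."""
    data = text.encode("utf-8")
    # errors="replace" avoids a crash if a cut lands inside a multi-byte character.
    return [data[i:i + size].decode("utf-8", errors="replace") for i in range(0, len(data), size)]


def parses(snippet: str) -> bool:
    """Return True if the snippet is valid Python on its own."""
    try:
        ast.parse(snippet)
    except SyntaxError:
        return False
    return True


for index, piece in enumerate(fixed_byte_split(SOURCE, 50)):
    print(index, parses(piece), repr(piece))
```

The first slice ends right after `def greet(name):` with no body, and the second starts on an indented line, so neither parses on its own even though the whole file does.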

Quick recipe

# ~2 KB chunks aligned to syntax boundaries
ts-pack process src/app.ts --chunk-size 2000

--chunk-size is a maximum size in bytes. The splitter packs whole syntactic units up to that bound; an oversized single construct becomes its own chunk rather than being cut. Chunks are added to the normal process JSON output under chunks.
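The packing rule above can be sketched in a few lines. This version uses Python's standard-library ast module in place of tree-sitter and treats each top-level statement as one syntactic unit; syntactic_units and pack are hypothetical helper names, and ts-pack's real boundary choices may differ. For brevity, blank lines and comments between top-level statements are dropped. Requires Python 3.9+.

```python
import ast


def syntactic_units(source: str) -> list[str]:
    """Return the source text of each top-level statement, decorators included."""
    lines = source.splitlines(keepends=True)
    units = []
    for node in ast.parse(source).body:
        # Decorators sit above node.lineno, so start at the earliest one if present.
        start = min([node.lineno] + [d.lineno for d in getattr(node, "decorator_list", [])])
        units.append("".join(lines[start - 1:node.end_lineno]))
    return units


def pack(units: list[str], max_size: int) -> list[str]:
    """Greedily pack whole units into chunks of at most max_size bytes.

    A unit that is larger than max_size on its own becomes a chunk by itself
    instead of being cut.
    """
    chunks: list[str] = []
    current = ""
    for unit in units:
        # Flush the current chunk if adding this unit would exceed the bound.
        if current and len((current + unit).encode("utf-8")) > max_size:
            chunks.append(current)
            current = ""
        current += unit
    if current:
        chunks.append(current)
    return chunks


SAMPLE = """import os


def small():
    return 1


def also_small():
    return 2
"""

for chunk in pack(syntactic_units(SAMPLE), max_size=40):
    print(len(chunk.encode("utf-8")), repr(chunk))
```

With max_size=40, the import and the first function share a chunk, and the second function starts a new one because adding it would exceed the bound.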

Picking a size

Match the downstream model's token budget. A rough rule for code is bytes ÷ 4 ≈ tokens, so --chunk-size 4000 is roughly 1k tokens.
Larger chunks preserve more local context but fit fewer per request.
Leave headroom for the prompt, the surrounding messages, and the response; do not size chunks to the full context window.
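The sizing arithmetic is easy to script. This sketch applies the bytes ÷ 4 rule from above; the helper names are hypothetical, and the 8,192-token window, 2,500 reserved tokens, and two chunks per request are made-up example numbers. Real tokenizers vary, so treat the result as an upper bound.

```python
BYTES_PER_TOKEN = 4  # rough rule of thumb for code; real tokenizers vary


def estimate_tokens(num_bytes: int) -> int:
    """Approximate token count for a span of source code."""
    return num_bytes // BYTES_PER_TOKEN


def max_chunk_size(context_tokens: int, reserved_tokens: int, chunks_per_request: int = 1) -> int:
    """Largest --chunk-size (bytes) that fits after reserving room for prompt, messages, and response."""
    available = context_tokens - reserved_tokens
    if available <= 0 or chunks_per_request < 1:
        raise ValueError("no room left for chunks")
    return (available // chunks_per_request) * BYTES_PER_TOKEN


print(estimate_tokens(4000))          # the 4000-byte example from the text
print(max_chunk_size(8192, 2500, 2))  # hypothetical 8k window, 2.5k reserved, 2 chunks per request
```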

Combining with extraction

Chunking composes with the other process features, so you can attach structure metadata to each request:

ts-pack process src/service.py --structure --chunk-size 3000 \
  | jq '{chunks: (.chunks | length), functions: (.structure | length)}'
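If jq is not available, the same summary takes a few lines of Python. The sketch relies only on the top-level chunks and structure keys that the jq filter above already reads; summarize and main are hypothetical names.

```python
import json
import sys


def summarize(doc: dict) -> dict:
    """Same counts as the jq filter: number of chunks and number of structure entries."""
    return {
        "chunks": len(doc.get("chunks") or []),
        "functions": len(doc.get("structure") or []),
    }


def main() -> None:
    """Read process JSON from stdin and print the summary as one JSON object."""
    print(json.dumps(summarize(json.load(sys.stdin))))
```

Saved in a script that calls main(), it can stand in for jq at the end of the same ts-pack process ... --structure --chunk-size 3000 pipeline.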

Chunk output

chunks is a list of code-chunk objects in the process JSON. Each chunk carries its source text plus span information (line/byte offsets), so you can cite or re-locate a chunk back in the original file. Iterate the array to feed an LLM one coherent unit at a time:

ts-pack process big_module.py --chunk-size 2500 \
  | jq -c '.chunks[]'
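Because each chunk carries a span, you can map it back to the file, for example to cite file:line ranges next to a model's answer. The sketch below assumes the chunk's text and byte offsets live under keys named content, start_byte, and end_byte, that offsets index the file's UTF-8 bytes, and that the end offset is exclusive. All three key names are hypothetical placeholders, so check the JSON your ts-pack version emits and adjust them.

```python
# Hypothetical key names: inspect the JSON from your ts-pack version and
# set these to the keys that actually hold a chunk's text and byte span.
TEXT_KEY, START_KEY, END_KEY = "content", "start_byte", "end_byte"


def relocate(source: bytes, chunk: dict) -> str:
    """Return the original file text covered by the chunk's byte span."""
    return source[chunk[START_KEY]:chunk[END_KEY]].decode("utf-8")


def matches_source(source: bytes, chunk: dict) -> bool:
    """Check that the chunk's stored text is exactly what its span points at."""
    return relocate(source, chunk) == chunk[TEXT_KEY]


def line_of(source: bytes, byte_offset: int) -> int:
    """1-based line number containing byte_offset."""
    return source.count(b"\n", 0, byte_offset) + 1


def cite(path: str, source: bytes, chunk: dict) -> str:
    """Format a chunk as a path:start-end line citation."""
    start, end = chunk[START_KEY], chunk[END_KEY]
    last = max(start, end - 1)  # end is assumed to be exclusive
    return f"{path}:{line_of(source, start)}-{line_of(source, last)}"


source = b"a = 1\nb = 2\nc = 3\n"
chunk = {"content": "b = 2\n", "start_byte": 6, "end_byte": 12}  # made-up chunk
print(matches_source(source, chunk), cite("module.py", source, chunk))
```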

SDK equivalent

The SDK exposes chunking through the process config. Use with_chunking to set the maximum chunk size; the config field chunk_max_size is read-only (the CLI equivalent is --chunk-size):

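from pathlib import Path
from tree_sitter_language_pack import process, ProcessConfig

def send_to_llm(chunk):
    # Placeholder: replace with a call to your model client.
    print(chunk)

source_code = Path("big_module.py").read_text(encoding="utf-8")
config = ProcessConfig("python").with_chunking(2500)  # max chunk size in bytes
result = process(source_code, config)
for chunk in result["chunks"]:  # one syntactically coherent unit per call
    send_to_llm(chunk)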

When not to chunk

For a single small file that already fits the context window, skip chunking and pass the file whole. Reach for chunking when a file is large, when you are batching many files into a RAG index, or when you need stable, syntactically coherent units to cite.
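The "small enough to pass whole" check can be made explicit with the same bytes ÷ 4 estimate. Here budget_tokens means whatever is left after prompt and response headroom; the helper names are hypothetical, and because the estimate is rough, leave a margin.

```python
import os

BYTES_PER_TOKEN = 4  # same rough rule of thumb as in "Picking a size"


def needs_chunking(num_bytes: int, budget_tokens: int) -> bool:
    """True when a file of num_bytes is unlikely to fit in budget_tokens."""
    return num_bytes > budget_tokens * BYTES_PER_TOKEN


def file_needs_chunking(path: str, budget_tokens: int) -> bool:
    """Apply needs_chunking to a file on disk, using its size in bytes."""
    return needs_chunking(os.path.getsize(path), budget_tokens)
```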

More from this repository

same repository

automating-the-browser

xberg-io/plugins

Use when extracting a page needs scripted interaction first — click, type, press a key, scroll, wait, screenshot, or run JS before capturing the DOM. Covers `crawlberg interact <url> --actions` with the real action schema, result shape, limits, and external-CDP options.

2026-06-255

crawlberg

xberg-io/plugins

Crawl, scrape, and convert websites to Markdown using the local crawlberg CLI and its MCP server. Use when the user wants to fetch a page, follow links across a domain, enumerate URLs, or drive a real browser. Covers installation, the subcommands (scrape, crawl, map, interact, mcp, serve), output formats (JSON + Markdown), browser fallback, and when to prefer the MCP server over shelling out.

2026-06-255

crawling-a-site

xberg-io/plugins

Use when the user wants to follow links across a domain and capture every reachable page as Markdown. Covers `crawlberg crawl` with depth, page caps, concurrency, rate limiting, domain scoping, robots, and output selection.

2026-06-255

headless-fallback

xberg-io/plugins

Use when a static fetch returns nothing useful and the page needs a real browser. Covers `--browser-mode auto|always|never`, external CDP via `--browser-endpoint`, symptoms of JS-only pages and WAF blocks, and the performance cost.

2026-06-255

mapping-urls

xberg-io/plugins

Use when the user wants the list of URLs on a site rather than the page content — sitemap analysis, link planning, or seeding another tool. Covers `crawlberg map <url>` with `--limit`, `--search`, robots, output, and how it differs from a full crawl.

2026-06-255

scraping-html-to-markdown

xberg-io/plugins

Use when the user wants a single page rendered as clean Markdown plus structured metadata. Covers `crawlberg scrape <url>`, JSON vs Markdown output, what metadata is returned, and how to handle JS-heavy pages.

2026-06-255
