| name | rlm |
| description | Process files too large to fit in context (>100KB, >2000 lines). Uses Python REPL for structural analysis, LLM queries for semantic reasoning, and subagents for final synthesis. Triggers - large file, big document, massive log, full codebase, entire repo, long transcript, context window exceeded. |
rlm (Recursive Language Model)
Core Principle
Large content stays in the REPL environment, not in your context.
The REPL holds the full file in memory. You write Python to analyze it. Only your print() output returns to your context, never the raw file content.
When to Use
- Files >100KB or >2000 lines
- Need to query the same large file multiple times in a session
- Structural or semantic analysis of logs, transcripts, codebases
Quick Start
python3 ~/skills/rlm/scripts/rlm_repl.py init <file>
python3 ~/skills/rlm/scripts/rlm_repl.py --state <state_path> exec -c "
hits = grep('ERROR')
print(f'Found {count(hits)} errors')
for item in expand(hits, limit=5):
    print(item['snippet'])
"
The Escalation Ladder
Level 1: REPL Analysis (Default)
Use for: Pattern matching, structure extraction, aggregation, JSON parsing.
Context cost: Only your print() output returns.
python3 ~/skills/rlm/scripts/rlm_repl.py --state <path> exec <<'PY'
import json
lines = content.split('\n')
print(f'Total lines: {len(lines)}')
hits = grep_raw('error|exception', max_matches=50)
print(f'Found {len(hits)} error lines')
for i, line in enumerate(lines[:10]):
    if line.strip():
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        print(f"Line {i}: type={data.get('type')}")
sizes = [(len(line), i) for i, line in enumerate(lines)]
sizes.sort(reverse=True)
print(f'Biggest lines: {sizes[:5]}')
PY
When Level 1 is sufficient:
- Finding all occurrences of a pattern
- Counting/sizing content
- Extracting fields from structured data (JSON, logs)
- Computing statistics
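As an illustration of the "computing statistics" case, here is a minimal sketch that tallies log levels and prints only a compact summary. The sample `content` string stands in for the REPL's `content` variable, and the level names and `level_counts` helper are assumptions made for this example, not part of rlm.

```python
import re
from collections import Counter

# Stand-in for the REPL's `content` variable (sample data, not real logs).
content = """2024-01-01 00:00:01 INFO service started
2024-01-01 00:00:02 ERROR db timeout
2024-01-01 00:00:03 WARN slow query
2024-01-01 00:00:04 ERROR db timeout
2024-01-01 00:00:05 INFO request served"""

# Assumed level names; adjust to whatever the file actually uses.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN|ERROR)\b")


def level_counts(text: str) -> Counter:
    """Count the first log-level token found on each line."""
    counts = Counter()
    for line in text.splitlines():
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = level_counts(content)
# Print a compact summary only; the raw lines stay in the REPL.
print(dict(counts.most_common()))
```

The point of the pattern is that the printed summary stays a few dozen bytes regardless of how large the file is.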
Level 2: REPL + llm_query()
Use for: Semantic reasoning where you need LLM judgment, but want results to stay in the REPL.
Context cost: Only your print() output returns. The LLM call happens in a subprocess.
python3 ~/skills/rlm/scripts/rlm_repl.py --state <path> exec <<'PY'
errors = grep_raw('ERROR', max_matches=10)
for err in errors[:3]:
    snippet = err['snippet'][:2000]
    result = llm_query(f"Classify this error as critical/warning/info:\n{snippet}")
    print(f"Line {err['line_num']}: {result}")
    add_buffer(result)
PY
Batch processing:
python3 ~/skills/rlm/scripts/rlm_repl.py --state <path> exec <<'PY'
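from pathlib import Path
# sorted() makes chunks[:10] the first ten chunks by name; glob order is not guaranteed
chunks = sorted(Path(session_dir / 'chunks').glob('chunk_*.txt'))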
prompts = [f"Summarize:\n{c.read_text()[:50000]}" for c in chunks[:10]]
results, failures = llm_query_batch(prompts, concurrency=5)
for i, result in enumerate(results):
    if "[ERROR:" not in result:
        add_buffer(f"Chunk {i}: {result}")
        print(f"Chunk {i}: {result[:100]}...")
PY
When to use Level 2:
- Classifying or categorizing content
- Summarizing sections
- Semantic search ("find discussions about X")
- Any task requiring judgment, not just pattern matching
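One way to keep Level 2 output small is to tally the LLM's answers and print only the tally. The sketch below assumes a callable that takes a prompt and returns a string, which matches how llm_query(prompt) is described in this document. The `classify_snippets` helper, the `fake_llm` stand-in, and the "unparsed" bucket are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Callable, Iterable

LABELS = ("critical", "warning", "info")


def classify_snippets(snippets: Iterable[str], llm: Callable[[str], str],
                      max_chars: int = 2000) -> Counter:
    """Classify each snippet with `llm` and return a tally of labels."""
    tally = Counter()
    for snippet in snippets:
        prompt = ("Classify this error as critical/warning/info. "
                  "Answer with one word.\n" + snippet[:max_chars])
        answer = llm(prompt).strip().lower()
        # Anything outside the expected labels is bucketed, not trusted.
        tally[answer if answer in LABELS else "unparsed"] += 1
    return tally


# Fake model for demonstration only; in the REPL you would pass llm_query.
def fake_llm(prompt: str) -> str:
    return "critical" if "timeout" in prompt else "info"


tally = classify_snippets(["db timeout on shard 3", "cache warmed"], fake_llm)
print(dict(tally))
```

Passing the model as a parameter keeps the helper testable outside the REPL. Inside an exec block, `classify_snippets(snippets, llm_query)` would be the equivalent call.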
Level 3: Subagent Synthesis
Use for:
- Final answer generation after accumulating findings
- Protecting main context when you'll query the same file many times
- When synthesis itself is complex enough to warrant a fresh context
Context cost: ~5KB max per subagent (enforced by max-output-chars).
{
"agent": "rlm-subcall",
"task": "Query: <user's question>\nChunk file: /absolute/path/to/chunk_0001.txt"
}
For final synthesis of accumulated buffers:
python3 ~/skills/rlm/scripts/rlm_repl.py --state <path> export-buffers > findings.txt
{
"agent": "rlm-subcall",
"task": "Synthesize these findings into a structured report:\n$(cat findings.txt)"
}
Parallel chunk analysis (when Level 2 isn't sufficient):
{
"tasks": [
{"agent": "rlm-subcall", "task": "Query: Find security issues\nChunk file: /path/chunk_0000.txt"},
{"agent": "rlm-subcall", "task": "Query: Find security issues\nChunk file: /path/chunk_0001.txt"}
]
}
Limits:
- Max 8 parallel tasks per batch
- Expected output: ~2KB per chunk (JSON)
- Total subagent returns should stay under 400KB
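The 8-task limit can be respected by building the task payloads in fixed-size groups. This is a minimal sketch: the `task_batches` helper and the chunk paths are hypothetical, and the payload shape simply mirrors the JSON shown above.

```python
import json
from typing import Iterator

MAX_PARALLEL = 8  # limit stated in this section


def task_batches(query: str, chunk_paths: list[str],
                 batch_size: int = MAX_PARALLEL) -> Iterator[dict]:
    """Yield {"tasks": [...]} payloads with at most `batch_size` tasks each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(chunk_paths), batch_size):
        group = chunk_paths[start:start + batch_size]
        yield {
            "tasks": [
                {"agent": "rlm-subcall",
                 "task": f"Query: {query}\nChunk file: {path}"}
                for path in group
            ]
        }


# Hypothetical chunk paths, for illustration only.
paths = [f"/tmp/session/chunks/chunk_{i:04d}.txt" for i in range(19)]
batches = list(task_batches("Find security issues", paths))
print([len(b["tasks"]) for b in batches])
print(json.dumps(batches[0]["tasks"][0]))
```

Dispatching one batch at a time also gives you a natural point to check total returned size before sending the next one.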
Decision Tree
Is this a structural query? (find X, count Y, extract fields, parse JSON)
└─ YES → Level 1: REPL
└─ NO ↓
Do I need LLM judgment? (classify, summarize, interpret meaning)
└─ YES → Does it need to return to my context immediately?
         └─ NO → Level 2: llm_query() in REPL
         └─ YES → Level 3: Subagent
└─ NO → Level 1: REPL
Am I synthesizing final results from accumulated findings?
└─ YES → Level 3: Subagent (protects your context for future queries)
Will I query this file multiple times in this session?
└─ YES → Prefer Levels 1-2 (keep main context free for multiple queries)
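The tree can be read as a small function. This is a sketch of one interpretation, not part of rlm: `choose_level` and its parameter names are hypothetical, and checking final synthesis first is a choice made here. The last question, about repeated queries, is a preference rather than a branch, so it is not encoded.

```python
def choose_level(structural: bool, needs_judgment: bool,
                 needs_immediate_return: bool = False,
                 final_synthesis: bool = False) -> int:
    """Return the escalation level (1, 2, or 3) suggested by the tree."""
    if final_synthesis:
        return 3  # synthesis of accumulated findings goes to a subagent
    if structural or not needs_judgment:
        return 1  # plain REPL analysis is enough
    # LLM judgment is needed: stay in the REPL unless the result
    # must come back to the main context right away.
    return 3 if needs_immediate_return else 2


print(choose_level(structural=True, needs_judgment=False))
print(choose_level(structural=False, needs_judgment=True))
print(choose_level(structural=False, needs_judgment=True,
                   needs_immediate_return=True))
```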
REPL Reference
Initialization
python3 ~/skills/rlm/scripts/rlm_repl.py init <context_path>
python3 ~/skills/rlm/scripts/rlm_repl.py --state <path> status
python3 ~/skills/rlm/scripts/rlm_repl.py --state <path> reset
Environment Variables (available in exec)
content - Full file content as string
state_path - Path to state.pkl
session_dir - Path to session directory
buffers - List of accumulated text
Content Exploration
| Function | Returns | Description |
|---|---|---|
| peek(start, end) | str | View slice of raw content |
| grep(pattern) | handle | Regex search, returns handle stub |
| grep_raw(pattern) | list[dict] | Raw results with line_num, snippet |
| write_chunks(out_dir) | list[str] | Write chunks to disk |
| add_buffer(text) | None | Accumulate text for synthesis |
Handle System
| Function | Returns | Description |
|---|---|---|
| count(handle) | int | Count items |
| expand(handle, limit) | list | Materialize items |
| filter_handle(handle, pattern) | handle | Filter results |
| last_handle() | str | Most recent handle name |
LLM Queries (Level 2)
| Function | Returns | Description |
|---|---|---|
| llm_query(prompt) | str | Single LLM call in subprocess |
| llm_query_batch(prompts) | (list, dict) | Parallel calls (max 5 concurrent) |
Finalization
| Function | Description |
|---|---|
| set_final_answer(value) | Mark JSON-serializable result |
| export-buffers (CLI) | Dump accumulated buffers |
Chunking (when needed)
For very large files where you need to process sections:
python3 ~/skills/rlm/scripts/rlm_repl.py --state <path> exec <<'PY'
paths = write_chunks(str(session_dir / 'chunks'), size=200000)
print(f"Created {len(paths)} chunks")
PY
cat <session_dir>/chunks/manifest.json
Use manifest hints to skip irrelevant chunks before processing.
Context Protection
Budget: Assume 200K tokens (~800KB). Reserve:
- 50K for system prompt
- 50K for your reasoning
- 100K for tool returns (~400KB)
Warning signs:
- Single subagent returned >10KB → it misbehaved
- Total returns >400KB → stop and synthesize
If overwhelmed:
- Stop dispatching more subagents
- Synthesize from what you have
- Use smaller batches (4 instead of 8)
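The warning signs above can be tracked mechanically. Below is a minimal sketch of a running byte budget. The `ReturnBudget` class is hypothetical, and treating 1KB as 1024 bytes is an assumption; the thresholds themselves come from this section.

```python
from dataclasses import dataclass, field

SINGLE_RETURN_WARN = 10 * 1024   # >10KB from one subagent is a warning sign
TOTAL_RETURN_LIMIT = 400 * 1024  # stop and synthesize past ~400KB


@dataclass
class ReturnBudget:
    """Running total of bytes returned to the main context."""
    total_bytes: int = 0
    oversized: list = field(default_factory=list)

    def record(self, name: str, text: str) -> None:
        """Add one return to the total and flag it if it is too large."""
        size = len(text.encode("utf-8"))
        self.total_bytes += size
        if size > SINGLE_RETURN_WARN:
            self.oversized.append(name)

    def should_stop(self) -> bool:
        """True once total returns exceed the overall limit."""
        return self.total_bytes > TOTAL_RETURN_LIMIT


budget = ReturnBudget()
budget.record("chunk_0000", "x" * 2048)   # a normal ~2KB return
budget.record("chunk_0001", "x" * 12000)  # larger than the 10KB warning line
print(budget.total_bytes, budget.oversized, budget.should_stop())
```

Checking `should_stop()` between batches is one way to decide when to switch from dispatching to synthesis.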
Anti-Patterns
❌ Reading full chunks into your context with read tool
❌ Jumping straight to subagents for structural queries
❌ Dispatching subagents before exploring with REPL
❌ Ignoring manifest hints (processing all chunks blindly)
✅ Always start with REPL exploration
✅ Use grep to find relevant sections first
✅ Escalate only when semantic reasoning is needed
✅ Use subagents for synthesis, not initial analysis