# llmx-guide
// Critical gotchas when calling llmx from Python or Bash. Non-obvious bugs and incompatibilities. Use when writing code that calls llmx, debugging llmx failures, or choosing llmx model/provider options.
| name | llmx-guide |
| description | Critical gotchas when calling llmx from Python or Bash. Non-obvious bugs and incompatibilities. Use when writing code that calls llmx, debugging llmx failures, or choosing llmx model/provider options. |
| user-invocable | true |
| argument-hint | [model name or issue description] |
| effort | medium |
Most agents should not call llmx directly for normal repo automation. Use the shared wrapper first:

```bash
uv run python3 ~/Projects/skills/scripts/llm-dispatch.py \
  --profile fast_extract \
  --context context.md \
  --prompt "Analyze this" \
  --output result.md
```

If the context was built by the shared packet layer, pass its manifest too:

```bash
uv run python3 ~/Projects/skills/scripts/llm-dispatch.py \
  --profile fast_extract \
  --context context.md \
  --context-manifest context.manifest.json \
  --prompt "Analyze this" \
  --output result.md
```
Use this skill when writing code that calls llmx (CLI or `llmx.api.chat()`), debugging llmx failures, or choosing llmx model/provider options.

> Agent note: the repo hook blocks raw llmx chat-style Bash automation. The CLI examples below are for manual terminal debugging or maintainer reference, not for normal agent execution through the Bash tool.
Detail files in `references/`: models.md | error-codes.md | transport-routing.md | codex-dispatch.md | subcommands.md
## Top Gotchas

- **Model names use hyphens:** `claude-sonnet-4-6`, not `claude-sonnet-4.6`.
- **Timeouts:** use `--timeout 600` or `--stream`. Max allowed: 900s. If dispatching from an agent shell, set the outer shell timeout above this (for Claude Code, use at least 1200000 ms).
- **`shell=True`?** Don't — parentheses in prompts break it. Use list args + `input=`.
- **`-o FILE`?** Never use `> file` shell redirects — they buffer until exit.
- **Gemini names are bare:** `gemini-3.1-pro-preview`, not `gemini/gemini-3.1-pro-preview`.
- **Transport:** `google` prefers the Gemini CLI (free). Gemini falls back to API for `--schema`, `--search`, `--stream`, `--max-tokens`. Codex CLI also falls back for `--search` and `--stream`, but can keep `--schema` via `codex exec --output-schema`. GPT goes direct to API unless you explicitly force `-p codex-cli`.
- **`-p` is PROVIDER.** `llmx chat -m gpt-5.4 -f context.md "Analyze this"` — the prompt goes LAST as a bare string. `-p` means `--provider` (`openai`, `google`, `codex-cli`), NOT prompt. Using `-p "long text..."` sends the text as a provider name → "Unknown provider" error. Context goes in `-f`, the system message in `-s`. Two `-f` flags with no positional prompt = the model invents a task from the context. (Evidence: 2026-04-05 — Gemini hallucinated; 2026-04-12 — 4 consecutive failures from `-p` misuse.)
- **Multiple `-f` flags** have recurring failure modes with Gemini/CLI transport, including silently dropping earlier files. Pre-concatenate first, but preserve file boundaries in the combined file.
- **Never swap to a weaker model as a "fix."** The problem is the dispatch, not the model.
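The flag rules above can be encoded in a small argv builder. This is a sketch only; `build_llmx_argv` is a hypothetical helper name, not part of llmx:

```python
import shlex

def build_llmx_argv(model, prompt, provider=None, context_files=(),
                    system=None, output=None, timeout=None):
    """Assemble an llmx chat argv following the rules above: -p is the
    provider, context goes in -f, and the prompt is the LAST positional
    argument. Hypothetical helper, not shipped with llmx."""
    argv = ["llmx", "chat", "-m", model]
    if provider:
        argv += ["-p", provider]      # provider name only, never prompt text
    for f in context_files:
        argv += ["-f", f]
    if system:
        argv += ["-s", system]
    if output:
        argv += ["-o", output]        # llmx writes the file; no > redirect
    if timeout:
        argv += ["--timeout", str(min(int(timeout), 900))]  # 900s hard cap
    argv.append(prompt)               # bare prompt goes last
    return argv

# Pass to subprocess.run(argv, ...) — never shell=True.
print(shlex.join(build_llmx_argv("gpt-5.4", "Analyze this",
                                 context_files=["context.md"], timeout=600)))
```

Keeping argv construction in one place makes the "`-p` is provider, prompt goes last" rule impossible to violate by accident.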
## Troubleshooting

- First step: rerun with `--debug` on a small prompt.
- On timeouts: raise `--timeout`, add `--stream`, reduce context, check the API key.
- Run a `--debug` smoke test before assuming CLI vs API routing from docs or memory.

See error-codes.md for the full exit code table and Python patterns.
## Gemini Routing

No `--stream` needed for Gemini. Without it, llmx routes through the CLI (free tier). Add `--stream` only if the CLI hits rate limits (forces paid API fallback).

```bash
# FREE — routes through Gemini CLI:
llmx chat -m gemini-3.1-pro-preview -f context.md --timeout 300 "Review this"

# FORCES API (costs money) — only use if CLI rate-limited:
llmx chat -m gemini-3.1-pro-preview -f context.md --timeout 300 --stream "Review this"
```

What still forces API: `--max-tokens` (CLI caps at 8K), `--schema`, `--search`, `--stream`.
### `-f` Is Not Reliable Enough For Critical Review Flows

If the task is high-stakes or review-oriented, do this:

```bash
awk 'FNR==1{print "\n# File: " FILENAME "\n"}1' overview.md diff.md touched-files.md > combined-context.md
llmx chat -m gemini-3.1-pro-preview -f combined-context.md --timeout 300 "Review this"
```

Do not assume this is equivalent:

```bash
llmx chat -m gemini-3.1-pro-preview -f overview.md -f diff.md -f touched-files.md --timeout 300 "Review this"
```

Known failure mode: earlier `-f` files may be silently dropped or incompletely forwarded. This is acceptable for casual exploration, not for plan-close or adversarial review.
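For wrappers that avoid shelling out, the same pre-concatenation can be done in Python. The `# File:` marker format mirrors the awk one-liner; `combine_context` is a hypothetical name:

```python
import tempfile
from pathlib import Path

def combine_context(paths, out_path):
    """Concatenate context files into one, preserving per-file
    boundaries with '# File: NAME' markers (same format the awk
    one-liner emits). Hypothetical helper, not part of llmx."""
    parts = []
    for p in paths:
        parts.append(f"\n# File: {p}\n")
        parts.append(Path(p).read_text())
    Path(out_path).write_text("".join(parts))

# Throwaway demo with temp files:
with tempfile.TemporaryDirectory() as d:
    a, b = Path(d, "overview.md"), Path(d, "diff.md")
    a.write_text("overview text\n")
    b.write_text("diff text\n")
    out = Path(d, "combined-context.md")
    combine_context([a, b], out)
    assert out.read_text().count("# File:") == 2
```

Pass the combined file as a single `-f combined-context.md` argument.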
## GPT-5.4 Reasoning and Timeouts

GPT-5.4 with reasoning burns time BEFORE producing output. Non-streaming holds the connection idle during reasoning. Default timeout: 300s. Max: 900s (hard cap). GPT-5.4 xhigh on domain-heavy prompts can exceed 900s; for those, chunk the task, stream, or switch to an async/batch path if available. Do not punt operational work to a GUI tool.

`max_completion_tokens` includes reasoning tokens. If you set `--max-tokens 4096` on GPT-5.4 with reasoning, the model may exhaust the budget on thinking. Use 16K+ for reasoning models.
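A minimal guard for this budget rule, assuming the 16K guideline above; `effective_max_tokens` is a hypothetical helper, not an llmx API:

```python
REASONING_TOKEN_FLOOR = 16384  # the 16K+ guideline above, not an llmx constant

def effective_max_tokens(requested, reasoning_model):
    """Floor --max-tokens for reasoning models, since
    max_completion_tokens also pays for hidden reasoning tokens."""
    if reasoning_model:
        return max(requested, REASONING_TOKEN_FLOOR)
    return requested

print(effective_max_tokens(4096, reasoning_model=True))   # floored to 16384
print(effective_max_tokens(4096, reasoning_model=False))  # left at 4096
```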
### Use `-o FILE`, Never `> file`

```bash
# CORRECT — llmx writes the output file itself:
llmx -m gpt-5.4 -f context.md --timeout 600 -o output.md "query"

# BROKEN — 0 bytes until exit:
llmx -m gpt-5.4 "query" > output.md
```
`-o` does not imply `--stream`. Current llmx preserves the requested transport and writes the returned result itself when needed. If the file is still 0 bytes, llmx emits `[llmx:WARN]` to stderr.

Agent background mode: Claude Code's `run_in_background` captures stdout in its own task file. Shell redirects (`> file`) produce 0 bytes in background mode. Always use `-o` for background llmx calls. Read the `-o` file after the task-complete notification, not before.
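One way to read the `-o` file only once it is actually populated is a small poll loop. This is a sketch; `wait_for_output` is a hypothetical name:

```python
import os
import time

def wait_for_output(path, timeout_s=60.0, poll_s=2.0):
    """Poll until an llmx -o output file exists and is non-empty.
    Returns False on deadline; a still-empty file at that point
    matches the [llmx:WARN] empty-output case described above."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(poll_s)
    return False
```

Size the `timeout_s` above llmx's own `--timeout`, for the same reason the Bash tool timeout must exceed it.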
For GPT specifically:

- `llmx -m gpt-5.4` routes to the OpenAI API in current llmx.
- `-o` preserves that transport; it does not force a transport switch.
- With `-p codex-cli`, diagnose any failure from stderr and output size, not shell exit alone.

If you need to verify the actual route, run:

```bash
llmx chat -p codex-cli -m gpt-5.4 --debug -o /tmp/probe.txt "Reply with exactly OK."
```

Then inspect the debug line for transport.
These are bad diagnostic patterns:

```bash
llmx chat -m gpt-5.4 "query" 2>/dev/null | head -200
llmx chat -m gpt-5.4 "query" | sed -n '1,80p'
```

Why:

- `2>/dev/null` discards llmx's real diagnostics.
- Without `set -o pipefail`, the shell returns the last consumer's exit code (`head`, `sed`), not llmx's.

Safer pattern:

```bash
set -o pipefail
llmx chat -m gpt-5.4 --debug -o /tmp/review.md "query" 2> /tmp/review.err
echo $?
tail -n 200 /tmp/review.err
sed -n '1,80p' /tmp/review.md
```
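The same discipline translates to Python: keep stderr in its own file and judge success from the exit code plus the output file size, never from a piped consumer. `run_llmx_checked` is a hypothetical wrapper, demonstrated with a stand-in command so it runs without llmx:

```python
import os
import subprocess

def run_llmx_checked(argv, out_path, err_path):
    """Run an argv whose command writes out_path itself (llmx -o style),
    preserving stderr instead of discarding it. Success = exit code 0
    AND a non-empty output file. Hypothetical wrapper, not part of llmx."""
    with open(err_path, "w") as err:
        rc = subprocess.run(argv, stderr=err).returncode
    size = os.path.getsize(out_path) if os.path.exists(out_path) else 0
    return rc == 0 and size > 0, rc, size
```

On failure, read `err_path` before retrying; it holds the diagnostics the bad pipelines above throw away.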
From Claude Code: set Bash tool timeout: 1200000 (20 min) — it must exceed llmx's --timeout.
```python
import subprocess

# BREAKS if prompt has ():
subprocess.run(f'echo {repr(prompt)} | llmx ...', shell=True)

# CORRECT — always use list args:
subprocess.run(['llmx', '--provider', 'google'],
               input=prompt, capture_output=True, text=True)
```
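To see why list args are safe, here is a runnable round-trip with a stand-in command in place of llmx; with no shell in the path, metacharacters reach the child process untouched:

```python
import subprocess
import sys

# Stand-in for llmx (echoes stdin back) so this runs anywhere; the point
# is that list args pass shell metacharacters through verbatim.
echo_stdin = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read())"]
prompt = 'Summarize f(x) = (a + b) * "c" && exit'
proc = subprocess.run(echo_stdin, input=prompt,
                      capture_output=True, text=True)
assert proc.stdout == prompt  # parentheses, quotes, and && all survive
```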
## Common Wrong Model Names

- `gemini-3-flash` — missing `-preview`
- `gemini-flash-3` — wrong order
- `gpt-5.3` — needs `-chat-latest` suffix
- `claude-sonnet-4.6` — dots, needs hyphens
- `grok-4.20-reasoning` — needs the full `-0309-` snapshot suffix: `grok-4.20-0309-reasoning`. Not in llmx's `_RECOMMENDED_MODELS` — pass the full name explicitly via `-m`.

See models.md for the full model table, token limits, and reasoning effort values.
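These wrong-name patterns can be caught before dispatch with a small guard. `check_model_name` and `KNOWN_BAD` are hypothetical; llmx ships no such check:

```python
import re

# Hypothetical guard encoding the wrong-name list above.
KNOWN_BAD = {
    "gemini-3-flash": "missing -preview",
    "gemini-flash-3": "wrong order",
    "gpt-5.3": "needs -chat-latest suffix",
    "claude-sonnet-4.6": "dots, use hyphens: claude-sonnet-4-6",
    "grok-4.20-reasoning": "needs snapshot suffix: grok-4.20-0309-reasoning",
}

def check_model_name(name):
    """Raise on model names the table above flags as wrong."""
    if name in KNOWN_BAD:
        raise ValueError(f"{name}: {KNOWN_BAD[name]}")
    if name.startswith("claude") and re.search(r"\d\.\d", name):
        raise ValueError(f"{name}: claude names use hyphens, not dots")
    return name
```

Call it in any wrapper right before building the `-m` argument, so a bad name fails locally instead of after a full dispatch.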
## xAI / Grok

Use `-p xai` (env: `XAI_API_KEY` or `GROK_API_KEY`). xAI is OpenAI-SDK-compatible at https://api.x.ai/v1.

- `--reasoning-effort` errors out on `grok-4.20-0309-reasoning`. The model reasons automatically. Strip the flag from any wrapper before dispatching to xAI.
- `reasoning.effort` — on `grok-4.20-multi-agent-0309`, `effort=low|medium` → 4 agents, `effort=high|xhigh` → 16 agents. It selects agent count, not depth. Cost scales with agent count.
- `logprobs` is silently ignored. xAI web search is not yet supported via the OpenAI SDK (llmx provider config explicitly warns and ignores `--search` for xai) — use Exa/Perplexity/Brave for grounding instead.

```bash
# Correct invocation (as of 2026-04-16)
llmx chat -p xai -m grok-4.20-0309-reasoning -f context.md --timeout 600 -o out.md "Verify these claims"

# WRONG — errors:
llmx chat -p xai -m grok-4.20-0309-reasoning --reasoning-effort high "..."

# WRONG — uses obsolete default model:
llmx chat -p xai "..."  # llmx default is still `grok-4`, superseded by the 4.20 family
```
| Provider | Default transport | Forces API fallback |
|---|---|---|
| `google` | Gemini CLI (free) | `--schema`, `--search`, `--stream`, `--max-tokens` |
| `openai` | OpenAI API | explicit `-p codex-cli` if you want Codex CLI instead |
| `claude` | Claude CLI | v0.6.0+, non-nested contexts only |
Both CLIs ignore explicit `--reasoning-effort` — they use their own defaults. See transport-routing.md for the CLI vs API decision table, context budget, and piping patterns.
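For wrappers that want to predict the route (and the cost) before dispatch, the routing table can be sketched as a pure function. Illustrative only; the authoritative logic lives inside llmx:

```python
def pick_transport(provider, flags=(), force_codex_cli=False):
    """Sketch of the routing table above: which transport a call takes
    for a given provider and flag set. Not llmx's actual code."""
    flags = set(flags)
    if provider == "google":
        forces_api = {"--schema", "--search", "--stream", "--max-tokens"}
        return "api" if flags & forces_api else "gemini-cli"
    if provider == "openai":
        if force_codex_cli:
            # Codex CLI keeps --schema (codex exec --output-schema) but
            # still falls back to the API for --search and --stream.
            return "api" if flags & {"--search", "--stream"} else "codex-cli"
        return "api"
    if provider == "claude":
        return "claude-cli"
    raise ValueError(f"unknown provider: {provider}")
```

Confirm the prediction with a `--debug` smoke test rather than trusting the sketch.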
## `-o` Gotcha (Parallel Dispatch)

`-o FILE` captures the agent's last text message only. If the agent spends all turns on tool calls with no final text response, `-o` writes 0 bytes. The prompt must include: "End with a COMPLETE markdown report as your final message." Without this, ~50% produce empty output.
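A wrapper can enforce the required final-message instruction mechanically; `with_report_suffix` is a hypothetical helper:

```python
REPORT_SUFFIX = "End with a COMPLETE markdown report as your final message."

def with_report_suffix(prompt):
    """Append the final-report instruction exactly once, so -o never
    captures an all-tool-calls run as 0 bytes. Hypothetical helper."""
    if REPORT_SUFFIX in prompt:
        return prompt
    return prompt.rstrip() + "\n\n" + REPORT_SUFFIX

print(with_report_suffix("Audit the diff against the plan."))
```

Being idempotent, it is safe to apply at every dispatch site without double-appending.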
See codex-dispatch.md for full parallel dispatch pattern, Brave contention, Perplexity quota.
Other subcommands: `llmx research`, `llmx image`, `llmx vision`, `llmx svg`. Flags: `--fast` (Flash+low), `--use-old`, `--no-thinking`. See subcommands.md.
$ARGUMENTS