| name | consult-llm |
| description | How to invoke the consult-llm CLI. Canonical reference for the invocation contract, flags, stdin/stdout format, and multi-turn. Load this before calling consult-llm from any workflow skill (/consult, /collab, /debate, /collab-vs, /debate-vs). |
| allowed-tools | Bash |
Reference for invoking the consult-llm CLI. Workflow skills delegate here for mechanics; they focus on orchestration.
Invocation
Run consult-llm with the prompt on stdin, using a quoted heredoc.
cat <<'__CONSULT_LLM_END__' | consult-llm -m <selector> -f src/foo.rs -f src/bar.rs
<prompt body>
__CONSULT_LLM_END__
Rules:
- Run Bash in the foreground (synchronous, no
run_in_background). Only background the call when the caller explicitly passes --background. Always set timeout: 600000 (10 minutes) — LLM calls routinely exceed the 2-minute default.
- ALWAYS use
<<'__CONSULT_LLM_END__' (quoted, with this exact terminator). The single quotes prevent shell expansion of $var, backticks, and escapes. The specific terminator __CONSULT_LLM_END__ is chosen because it won't appear in model responses — never use EOF or PROMPT which commonly appear in code samples and would silently truncate the prompt.
- Fallback to
--prompt-file <path> if the prompt contains __CONSULT_LLM_END__, or on Windows/PowerShell. Write the prompt to a temp file with $(mktemp), then pass it via consult-llm --prompt-file "$f" ….
- Stdout layout. First line is
[model:<id>] [thread_id:<id>], then a blank line, then the response body. In --web mode the prefix is just [model:<id>] (no thread).
- Multi-turn. Read
[thread_id:xxx] from line 1 and pass it back with -t <id> on the next call. Thread IDs are opaque strings — don't modify them. Not portable across backends.
- Stderr carries progress/spinner output. Ignore it.
- Exit codes.
0 success, 1 backend/network error (includes thread-not-found), 2 usage error, 3 configuration error (missing API key, unsupported backend).
Models
Selectors and allowed models resolvable in this environment (availability depends on which API keys are configured):
!`consult-llm models`
Pass a selector or an exact model ID to -m. Only enabled selectors are listed — anything not shown has no available model. For workflow skills that fan out to multiple models, use the ordered Default models list from consult-llm models when the user did not pass explicit model flags; duplicates in that list are intentional and must be preserved. For same-prompt calls, translate that list to repeated -m <model> args (the Default -m args: line is a convenience). For --run, create one --run model=<model>,prompt-file=<path> entry per default model instead; do not paste -m args into a --run invocation. For ordinary single-response CLI use, omit -m to use default_model/fallback. -m is ignored when --web is used.
Multi-model: repeat -m to consult multiple model positions in parallel (e.g. -m gemini -m openai, max 5 total runs). You may repeat the same selector/model (e.g. -m openai -m openai) to get independent calls with the same prompt. The response is a group format: first line is [thread_id:group_xxx], each model's answer under a ## Model: <id> header preceded by [model:<id>] [thread_id:<per-model-id>]. When the same resolved model appears more than once, only those duplicate sections use ## Model: <id>#K and [model:<id>#K] labels. Pass -t group_xxx to resume all group positions together on the next turn; pass an individual per-model thread ID with a single -m <model> to resume just that model outside the group context.
Task modes
Pick a --task mode based on the kind of question. Omit for neutral general-purpose.
| Mode | When to use |
|---|
general (default) | Neutral prompt. Defers to instructions in the prompt body. Use for open questions. |
review | Critical code reviewer — bugs, security issues, quality problems. |
debug | Root-cause troubleshooter from errors/logs/stack traces. Ignores style. |
plan | Constructive architect — explore trade-offs, design solutions. Always ends with a recommendation. |
create | Generative writer for docs, content, or design output. |
Web mode
--web copies the formatted prompt (system prompt + user prompt + file context) to the clipboard and exits 0 instead of calling an LLM. Only use when the user specifically asks for browser/web mode. After invoking, wait for the user to paste the external LLM's response back — do not continue implementation on your own. -m is ignored in this mode.
Prompt authoring
Ask neutral, open-ended questions. Do not suggest specific solutions in the prompt body — that biases the analysis. Let the LLM form its own view.
Flags
| Flag | Purpose |
|---|
-m, --model <selector|id> | See "Models" above. Usually omit. |
-f, --file <path> | Repeatable. File context — path + code block. |
-t, --thread-id <id> | Resume a multi-turn conversation. See "Multi-turn". |
--task <mode> | Persona. See "Task modes" above. |
--web | Clipboard mode. See "Web mode" above. |
--prompt-file <path> | Read prompt from file instead of stdin. |
--diff-files <path> | Repeatable. Include git diff for this file as context. |
--diff-base <ref> | Base ref for diff (default HEAD — shows uncommitted changes). |
--diff-repo <path> | Repo path (default cwd). |
--run <spec> | Per-model run. See "Per-model runs" below. |
Run consult-llm --help for the authoritative flag list.
File context (-f) best practices
The consulted LLM has no access to your conversation history. Anything
it needs - source files, logs, command output, traces, timelines,
error messages - must be attached with -f.
- Include conversation artifacts. If the current session already
produced diagnostic output relevant to the question (log excerpts,
traces, reproduction steps, command output), attach it as a temp
file. Prefer raw evidence over prose summaries when both exist.
- Re-run the original command piping to a temp file
(
cmd > /tmp/artifact.txt) instead of writing output from memory.
This is cheaper, faster, and preserves the exact output.
- Source files and diagnostic artifacts are both first-class
-f
inputs. Do not limit context gathering to source code.
Per-model runs
Use --run when a workflow needs to query multiple models in parallel with different prompt bodies. Do not use it for ordinary multi-model calls where the same prompt goes to every model — repeat -m for that.
GEMINI_PROMPT=$(mktemp)
CODEX_PROMPT=$(mktemp)
cat <<'__CONSULT_LLM_END__' >| "$GEMINI_PROMPT"
[prompt for Gemini]
__CONSULT_LLM_END__
cat <<'__CONSULT_LLM_END__' >| "$CODEX_PROMPT"
[prompt for Codex]
__CONSULT_LLM_END__
consult-llm \
--run "model=gemini,prompt-file=$GEMINI_PROMPT" \
--run "model=openai,prompt-file=$CODEX_PROMPT"
consult-llm \
--run "model=gemini,thread=$GEMINI_THREAD,prompt-file=$GEMINI_PROMPT" \
--run "model=openai,thread=$CODEX_THREAD,prompt-file=$CODEX_PROMPT"
consult-llm \
--run "model=openai,prompt-file=$PROMPT_A" \
--run "model=openai,prompt-file=$PROMPT_B"
Each --run value accepts model=<selector-or-id>, prompt-file=<path>, and optionally thread=<id>. Use mktemp for temporary prompt files and always use __CONSULT_LLM_END__ as the heredoc terminator. Use >| to overwrite temp files in zsh (avoids noclobber errors).
Constraints: max 5 total runs, cannot combine with -m/-t/--prompt-file/--web, duplicate resolved models are allowed, duplicate explicit thread=<id> values are rejected, thread=group_* is rejected because --run uses per-run thread IDs, shared -f and --diff-* context applies to every run, prompt-file paths with commas are unsupported.
Output is the same group format as multi-model -m calls. Extract per-run thread IDs from each section header for subsequent --run thread=... turns.