Run any Skill in Manus with one click

Get Started

llm-council

Query multiple LLM models in parallel from CodeAct and cross-reference their responses

Run Skill in Manus

Overview

Query multiple LLM models in parallel from CodeAct and cross-reference their responses

Install command

npx skills add https://github.com/nearai/ironclaw --skill llm-council

Copy and paste this command into Claude Code to install the skill

Source

nearai/ironclaw

Stars12,395

Forks1,443

UpdatedApril 11, 2026 at 06:23

SKILL.md

readonly

name	llm-council
version	1.0.0
description	Query multiple LLM models in parallel from CodeAct and cross-reference their responses
activation	{"keywords":["council","compare models","multiple models","cross-reference","second opinion","opinions","consensus","different models","model comparison","diverse perspectives","ask several","ask multiple","vote"],"exclude_keywords":["routine","schedule"],"patterns":["(?i)(ask\|query\|consult\|compare)\\s.(models\|llms\|ais)","(?i)(what do\|how do)\\s.(different\|other\|multiple)\\s.(models\|llms\|ais)\\s(think\|say)","(?i)council","(?i)cross[- ]?referenc"],"tags":["llm","analysis","research"],"max_context_tokens":1200}

LLM Council

You can query multiple LLM models with the same prompt directly from CodeAct using the built-in llm_query() and llm_query_batched() functions. Both accept a model= (or models=) keyword that overrides the configured model for that call.

Per-request model override support varies by backend:

Backend	Honors `model=`?	Cross-vendor routing?
NEAR AI	Yes	Yes (aggregator — hosts models from many vendors)
Anthropic OAuth	Yes	No (Anthropic models only)
GitHub Copilot	Yes	No (Copilot-exposed models only)
Bedrock	No	— (model fixed at construction)
OpenAI / Ollama / Tinfoil via rig	No (silent fallback with warning log)	—

A genuine cross-vendor council (Anthropic + Google + OpenAI in one batch) therefore only works on an aggregator backend like NEAR AI. On single-vendor backends, use a lineup of models available within that vendor.

When to use a council

The user wants diverse perspectives on a question or analysis
Cross-referencing answers to increase confidence
Comparing reasoning approaches across models
Getting a "second opinion" from different AI models
Research or evaluation tasks that benefit from multiple viewpoints

Default council line-up

Check the configured backend first (e.g. from LLM_BACKEND or the user's settings) before picking a lineup. Unless the user requests specific models, use the matching default below.

NEAR AI (aggregator — default council):

COUNCIL = [
    "anthropic/claude-opus-4-6",
    "google/gemini-3-pro",
    "zai-org/GLM-latest",
    "openai/gpt-5.4",
]

This 4-model lineup spans the major frontier providers and reasoning styles. It only works on NEAR AI (or another aggregator) — the prefixed model names route inside NEAR AI to the respective vendors.

Anthropic OAuth (Anthropic-only, no cross-vendor routing):

COUNCIL = [
    "claude-opus-4-6",
    "claude-sonnet-4-6",
    "claude-haiku-4-5",
]

Use different Anthropic tiers for diversity of reasoning depth vs. speed.

GitHub Copilot (whatever Copilot exposes):

COUNCIL = ["gpt-5.4", "claude-opus-4-6", "gemini-3-pro"]

Copilot's available models shift over time — call llm_query with the user's configured default if you're unsure which are reachable.

Bedrock / OpenAI / Ollama / Tinfoil: per-request model= is not honored by these backends. A council is not possible without switching backends — tell the user and fall back to a single-model answer.

If the user names specific models, always use those instead of the defaults.

API

Single call with a model override

answer = llm_query(
    prompt="What is X?",
    context="Optional background",     # optional
    model="anthropic/claude-opus-4-6", # optional per-call override
)

Parallel council (same prompt, many models)

COUNCIL = [
    "anthropic/claude-opus-4-6",
    "google/gemini-3-pro",
    "zai-org/GLM-latest",
    "openai/gpt-5.4",
]
responses = llm_query_batched(
    prompts=["What are the main risks of X?"] * len(COUNCIL),
    models=COUNCIL,                    # parallel array, length must match prompts
    context="Answer in 3-5 bullet points.",
)
# `responses` is a list of strings in the same order as `models`.
# If a specific model is unavailable, that slot returns "Error: ..." —
# the rest of the batch still completes.

Single model applied to many prompts

results = llm_query_batched(
    prompts=["Q1", "Q2", "Q3"],
    model="anthropic/claude-opus-4-6", # singular: applies to every prompt
)

Mixing `models=` slots with `None`

A None slot inside models=[...] means "no override for this prompt" — that call uses the configured default model. The singular model= kwarg does NOT backfill None slots; it is only used when models= is omitted entirely.

After collecting responses

Identify consensus — note where models agree
Flag disagreements — analyze where they diverge and why
Synthesize — produce a unified answer that accounts for all perspectives
Cite — reference which model contributed each insight

Full example

COUNCIL = [
    "anthropic/claude-opus-4-6",
    "google/gemini-3-pro",
    "zai-org/GLM-latest",
    "openai/gpt-5.4",
]
question = "What are the main risks of relying on a single LLM provider?"

responses = llm_query_batched(
    prompts=[question] * len(COUNCIL),
    models=COUNCIL,
    context="Answer concisely in 3-5 bullet points.",
)

# Build a synthesis prompt
labelled = "\n\n---\n\n".join(
    f"**{model}**:\n{resp}" for model, resp in zip(COUNCIL, responses)
)

synthesis = llm_query(
    prompt=f"Synthesize a balanced answer from these {len(COUNCIL)} expert opinions:\n\n{labelled}",
    context="Identify consensus, flag disagreements, and produce a unified answer.",
)

FINAL(synthesis)

Notes

llm_query_batched() runs all calls in parallel — total latency is roughly the slowest model.
If models= is provided, it must be the same length as prompts.
Use model= (singular) when you want one model applied to every prompt; use models= (plural list) for the council pattern.
Errors from individual models (unavailable model, provider timeout, etc.) surface as "Error: ..." strings in the result list, so a single bad model does not fail the whole batch. Argument validation errors — wrong types, or models/prompts length mismatch — still raise exceptions.

LLM Council

Per-request model override support varies by backend:

Backend	Honors `model=`?	Cross-vendor routing?
NEAR AI	Yes	Yes (aggregator — hosts models from many vendors)
Anthropic OAuth	Yes	No (Anthropic models only)
GitHub Copilot	Yes	No (Copilot-exposed models only)
Bedrock	No	— (model fixed at construction)
OpenAI / Ollama / Tinfoil via rig	No (silent fallback with warning log)	—

When to use a council

The user wants diverse perspectives on a question or analysis
Cross-referencing answers to increase confidence
Comparing reasoning approaches across models
Getting a "second opinion" from different AI models
Research or evaluation tasks that benefit from multiple viewpoints

Default council line-up

Check the configured backend first (e.g. from LLM_BACKEND or the user's settings) before picking a lineup. Unless the user requests specific models, use the matching default below.

NEAR AI (aggregator — default council):

COUNCIL = [
    "anthropic/claude-opus-4-6",
    "google/gemini-3-pro",
    "zai-org/GLM-latest",
    "openai/gpt-5.4",
]

Anthropic OAuth (Anthropic-only, no cross-vendor routing):

COUNCIL = [
    "claude-opus-4-6",
    "claude-sonnet-4-6",
    "claude-haiku-4-5",
]

Use different Anthropic tiers for diversity of reasoning depth vs. speed.

GitHub Copilot (whatever Copilot exposes):

COUNCIL = ["gpt-5.4", "claude-opus-4-6", "gemini-3-pro"]

Copilot's available models shift over time — call llm_query with the user's configured default if you're unsure which are reachable.

If the user names specific models, always use those instead of the defaults.

API

Single call with a model override

answer = llm_query(
    prompt="What is X?",
    context="Optional background",     # optional
    model="anthropic/claude-opus-4-6", # optional per-call override
)

Parallel council (same prompt, many models)

COUNCIL = [
    "anthropic/claude-opus-4-6",
    "google/gemini-3-pro",
    "zai-org/GLM-latest",
    "openai/gpt-5.4",
]
responses = llm_query_batched(
    prompts=["What are the main risks of X?"] * len(COUNCIL),
    models=COUNCIL,                    # parallel array, length must match prompts
    context="Answer in 3-5 bullet points.",
)
# `responses` is a list of strings in the same order as `models`.
# If a specific model is unavailable, that slot returns "Error: ..." —
# the rest of the batch still completes.

Single model applied to many prompts

results = llm_query_batched(
    prompts=["Q1", "Q2", "Q3"],
    model="anthropic/claude-opus-4-6", # singular: applies to every prompt
)

Mixing `models=` slots with `None`

After collecting responses

Identify consensus — note where models agree
Flag disagreements — analyze where they diverge and why
Synthesize — produce a unified answer that accounts for all perspectives
Cite — reference which model contributed each insight

Full example

COUNCIL = [
    "anthropic/claude-opus-4-6",
    "google/gemini-3-pro",
    "zai-org/GLM-latest",
    "openai/gpt-5.4",
]
question = "What are the main risks of relying on a single LLM provider?"

responses = llm_query_batched(
    prompts=[question] * len(COUNCIL),
    models=COUNCIL,
    context="Answer concisely in 3-5 bullet points.",
)

# Build a synthesis prompt
labelled = "\n\n---\n\n".join(
    f"**{model}**:\n{resp}" for model, resp in zip(COUNCIL, responses)
)

synthesis = llm_query(
    prompt=f"Synthesize a balanced answer from these {len(COUNCIL)} expert opinions:\n\n{labelled}",
    context="Identify consensus, flag disagreements, and produce a unified answer.",
)

FINAL(synthesis)

Notes

llm_query_batched() runs all calls in parallel — total latency is roughly the slowest model.
If models= is provided, it must be the same length as prompts.
Use model= (singular) when you want one model applied to every prompt; use models= (plural list) for the council pattern.
Errors from individual models (unavailable model, provider timeout, etc.) surface as "Error: ..." strings in the result list, so a single bad model does not fail the whole batch. Argument validation errors — wrong types, or models/prompts length mismatch — still raise exceptions.

llm-council

LLM Council

When to use a council

Default council line-up

API

Single call with a model override

Parallel council (same prompt, many models)

Single model applied to many prompts

Mixing models= slots with None

After collecting responses

Full example

Notes

More from this repository

More from this repository

LLM Council

When to use a council

Default council line-up

API

Single call with a model override

Parallel council (same prompt, many models)

Single model applied to many prompts

Mixing models= slots with None

After collecting responses

Full example

Notes

Mixing `models=` slots with `None`

Mixing `models=` slots with `None`