update-llm-model-list
// Audit and update the supported LLM model list in assets.py against litellm's registry (models.litellm.ai). Use when adding new models, pruning outdated ones, or verifying the list is correct.
| name | update-llm-model-list |
| description | Audit and update the supported LLM model list in assets.py against litellm's registry (models.litellm.ai). Use when adding new models, pruning outdated ones, or verifying the list is correct. |
The canonical model list lives in sdks/python/agenta/sdk/utils/assets.py → supported_llm_models.
It drives the model dropdown in the playground, cost metadata, and the model_to_provider_mapping.
The authoritative external source is litellm.model_cost (2 600+ entries), which mirrors
https://models.litellm.ai/.
A pytest guard lives at:
sdks/python/oss/tests/pytest/unit/test_supported_llm_models.py
It checks that every listed model resolves in litellm.model_cost (direct key, or with provider prefix stripped).
- anthropic/claude-* → litellm stores as claude-* (prefix is intentional for routing, stripped for cost lookup)
- cohere/command-* → litellm stores as command-*
- Other providers keep their prefix in litellm (gemini/, groq/, together_ai/)
- Provider keys ("anthropic", "gemini", …) must match the Secrets API enum in api/oss/src/core/secrets/enums.py (StandardProviderKind).
Run this with uvx (no local install needed):
cat > /tmp/check_agenta_models.py << 'SCRIPT'
# /// script
# requires-python = ">=3.11"
# dependencies = ["litellm"]
# ///
import litellm, sys
# paste supported_llm_models here or import it
from agenta.sdk.utils.assets import supported_llm_models
mc = set(litellm.model_cost.keys())

def exists(m):
    # Direct key match, e.g. "gpt-4o" or a "gemini/..." key.
    if m in mc: return True
    # Fallback: strip the provider prefix (anthropic/claude-* → claude-*).
    if "/" in m and m.split("/", 1)[1] in mc: return True
    return False

fails = []
for provider, models in supported_llm_models.items():
    for model in models:
        if not exists(model):
            fails.append((provider, model))

total = sum(len(v) for v in supported_llm_models.values())
print(f"Total models checked: {total}")
if fails:
    for p, m in fails:
        print(f" MISSING [{p}] {m}")
    sys.exit(1)  # non-zero exit so the check can gate a commit
else:
    print("All models valid ✓")
SCRIPT
uvx --with litellm python /tmp/check_agenta_models.py 2>/dev/null
Alternatively, run the pytest unit test directly (requires agenta installed):
pytest sdks/python/oss/tests/pytest/unit/test_supported_llm_models.py -v
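The lookup rule used by the check (direct key first, then the name with its provider prefix stripped) can be tried without litellm installed. The sketch below is illustrative only: `resolve_cost_key` and `FAKE_COST_KEYS` are hypothetical names, and the model names are placeholders, not real registry entries.

```python
from typing import Optional

# Hypothetical stand-in for the keys of litellm.model_cost (placeholder names).
FAKE_COST_KEYS = {
    "claude-example-1",
    "command-example",
    "gemini/gemini-example",
    "gpt-example",
}


def resolve_cost_key(model: str, cost_keys: set[str]) -> Optional[str]:
    """Return the registry key that matches `model`, or None if none does."""
    # 1. Direct match: the Agenta name is itself a registry key.
    if model in cost_keys:
        return model
    # 2. Prefix-stripped match: drop everything up to the first slash.
    if "/" in model:
        bare = model.split("/", 1)[1]
        if bare in cost_keys:
            return bare
    return None
```

Returning the matched key, rather than a boolean, makes it easy to see which of the two rules a given model relied on.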
This script finds models in litellm that Agenta doesn't list yet, filtered to remove noise (audio, video, embeddings, codex, snapshots):
cat > /tmp/find_missing.py << 'SCRIPT'
# /// script
# requires-python = ">=3.11"
# dependencies = ["litellm"]
# ///
import litellm, re

AGENTA_ANTHROPIC = set()  # fill from assets.py (bare names, no prefix)
AGENTA_OPENAI = set()     # fill from assets.py
AGENTA_GEMINI = set()     # fill from assets.py (with gemini/ prefix)

mc = set(litellm.model_cost.keys())

NOISE = [
    "audio","tts","speech","whisper","transcri","realtime","diarize",
    "dall-e","image","video","veo","embed","moderat","search",
    "babbage","davinci","ada","instruct","codex","computer-use",
    "robotics","learnlm","gemma","live","v1:0",
]
KEEP = {"gpt-4o","gpt-4o-mini"}
DATED = re.compile(r"-\d{4}-\d{2}-\d{2}$")
EXP = re.compile(r"exp-\d{4}|\d{2}-\d{2}$")

def noise(m):
    if m in KEEP: return False
    return any(kw in m.lower() for kw in NOISE)

def dated(m):
    return bool(DATED.search(m)) or bool(EXP.search(m))

def report(label, candidates, known, prefix=""):
    print(f"\n=== {label} ===")
    for m in sorted(candidates):
        bare = m[len(prefix):] if prefix else m
        if bare in known or m in known: continue
        tag = "[dated/exp]" if dated(m) else "[alias]" if m.endswith("-latest") else "*** MISSING ***"
        print(f" {m} {tag}")

# Anthropic
report("ANTHROPIC", [m for m in mc if m.startswith("claude-") and not noise(m)],
       AGENTA_ANTHROPIC)

# OpenAI (no slash, starts with gpt- / o1 / o3 / o4 / chatgpt)
OAI = [m for m in mc if any(m.startswith(p) for p in ("gpt-","o1","o3","o4","chatgpt"))
       and "/" not in m and not noise(m)]
report("OPENAI", OAI, AGENTA_OPENAI)

# Gemini
report("GEMINI", [m for m in mc if m.startswith("gemini/") and not noise(m)],
       AGENTA_GEMINI, prefix="gemini/")
SCRIPT
uvx --with litellm python /tmp/find_missing.py 2>/dev/null
Fill in the AGENTA_* sets from the current assets.py before running.
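Filling the sets by hand is error-prone, so one option is to derive them from the provider-to-models mapping. This is a minimal sketch, assuming supported_llm_models maps each provider key to a list of model names (as the check script's loop implies). `bare_names` and `SAMPLE` are hypothetical, and the model names are placeholders.

```python
def bare_names(models: list[str], prefix: str = "") -> set[str]:
    """Return model names as a set, dropping `prefix` where a name carries it."""
    return {
        m[len(prefix):] if prefix and m.startswith(prefix) else m
        for m in models
    }


# Hypothetical excerpt shaped like supported_llm_models (placeholder names).
SAMPLE = {
    "anthropic": ["anthropic/claude-example-1", "anthropic/claude-example-2"],
    "openai": ["gpt-example"],
    "gemini": ["gemini/gemini-example"],
}

# Anthropic is compared by bare name, so the routing prefix is stripped.
AGENTA_ANTHROPIC = bare_names(SAMPLE["anthropic"], prefix="anthropic/")
# OpenAI names carry no prefix in either list.
AGENTA_OPENAI = bare_names(SAMPLE["openai"])
# Gemini keeps its gemini/ prefix, matching the litellm keys.
AGENTA_GEMINI = bare_names(SAMPLE["gemini"])
```

In the real script, `SAMPLE` would be replaced by the imported supported_llm_models, which requires agenta to be importable.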
assets.py
File: sdks/python/agenta/sdk/utils/assets.py
- […] "gemini".
- Variants (o1-pro, o3-pro): add after their base model.
- Groq: check litellm.groq_models; Groq rotates its model catalogue frequently.
- DeepInfra / Together AI: check litellm.deepinfra_models / litellm.together_ai_models for current names.

| Provider key | Agenta prefix | litellm cost key prefix |
|---|---|---|
| anthropic | anthropic/ | claude- (no prefix) |
| cohere | cohere/ | command- (no prefix) |
| gemini | gemini/ | gemini/ |
| groq | groq/ | groq/ |
| mistral | mistral/ | mistral/ |
| openai | (none) | (none) |
| openrouter | openrouter/ | openrouter/ |
| perplexityai | perplexity/ | perplexity/ |
| together_ai | together_ai/ | together_ai/ |
| deepinfra | deepinfra/ | deepinfra/ |
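The prefix table can also be turned into a quick consistency check before editing the list. The sketch below is an illustration, not part of the repository: `AGENTA_PREFIX` copies the "Agenta prefix" column, `misprefixed` is a hypothetical helper, and it assumes that OpenAI names (no prefix) never contain a slash.

```python
# "Agenta prefix" column of the table; an empty string means no prefix.
AGENTA_PREFIX = {
    "anthropic": "anthropic/",
    "cohere": "cohere/",
    "gemini": "gemini/",
    "groq": "groq/",
    "mistral": "mistral/",
    "openai": "",
    "openrouter": "openrouter/",
    "perplexityai": "perplexity/",
    "together_ai": "together_ai/",
    "deepinfra": "deepinfra/",
}


def misprefixed(models_by_provider: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (provider, model) pairs whose name does not fit the table."""
    bad = []
    for provider, models in models_by_provider.items():
        prefix = AGENTA_PREFIX.get(provider)
        for model in models:
            if prefix is None:
                # Provider key is not in the table at all.
                bad.append((provider, model))
            elif prefix and not model.startswith(prefix):
                bad.append((provider, model))
            elif not prefix and "/" in model:
                # Assumption: unprefixed providers use slash-free names.
                bad.append((provider, model))
    return bad
```

An empty result means every name carries the prefix the table expects; it says nothing about whether the model exists in litellm, which the validation script covers.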
# Format + lint
uvx --from ruff==0.14.0 ruff format sdks/python/agenta/sdk/utils/assets.py
uvx --from ruff==0.14.0 ruff check --fix sdks/python/agenta/sdk/utils/assets.py
# Validate all models against litellm (no agenta install needed)
uvx --with litellm python /tmp/check_agenta_models.py 2>/dev/null
All checks must pass before committing.
| File | Purpose |
|---|---|
| sdks/python/agenta/sdk/utils/assets.py | Canonical model list + cost metadata builder |
| sdks/python/oss/tests/pytest/unit/test_supported_llm_models.py | Pytest guard (parametrized per model) |
| api/oss/src/core/secrets/enums.py | Provider keys (must stay in sync) |
| api/oss/src/resources/evaluators/evaluators.py | Separate (shorter) model list for evaluator dropdown |
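Because the provider keys must stay in sync with the Secrets API enum, a small set comparison can catch a key that was added in one place only. The members of StandardProviderKind are not shown here, so `ProviderKindStandIn` below is a hypothetical stand-in with only a few illustrative values, and `unknown_provider_keys` is an assumed helper name.

```python
from enum import Enum


class ProviderKindStandIn(str, Enum):
    """Hypothetical stand-in for StandardProviderKind (real members not shown)."""
    ANTHROPIC = "anthropic"
    GEMINI = "gemini"
    OPENAI = "openai"


def unknown_provider_keys(provider_keys, enum_cls) -> list[str]:
    """Return provider keys that have no matching enum value, sorted."""
    allowed = {member.value for member in enum_cls}
    return sorted(set(provider_keys) - allowed)
```

With the real enum importable, one would pass the keys of supported_llm_models and StandardProviderKind; an empty list means no provider key is missing from the enum.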