with one click
hermes-model-switching
Quick provider/model switching for Hermes CLI — aliases, fallbacks, task routing matrix, and utilization audit pattern.
Menu
Quick provider/model switching for Hermes CLI — aliases, fallbacks, task routing matrix, and utilization audit pattern.
Adversarially review the last 24h of multi-agent work by combining git history, GitHub issue state, generated analysis artifacts, governance tests, and duplicate-checked follow-up issue creation.
Analyze session quality trends — identify high-churn patterns, report waste, flag sessions exceeding 500 tool calls
Build self-contained interactive web applications as single HTML files. Use for creating demos, prototypes, interactive tools, and standalone web experiences that work without external servers.
Quick provider/model switching for Hermes CLI — aliases, fallbacks, task routing matrix, and utilization audit pattern.
Build complex AI systems with declarative programming, optimize prompts automatically, create modular RAG systems and agents with DSPy - Stanford NLP's framework for systematic LM programming
Mandatory planning workflow for ALL GitHub issues — plan, review, approve, then implement.
| name | hermes-model-switching |
| description | Quick provider/model switching for Hermes CLI — aliases, fallbacks, task routing matrix, and utilization audit pattern. |
| version | 2.0.0 |
| tags | ["hermes","model-switching","multi-provider","gemini","openai","Codex","routing"] |
| related_skills | ["agent-usage-optimizer","overnight-parallel-agent-prompts","multi-machine-ai-readiness-and-issue-triage"] |
Installed in ~/.bash_aliases (updated 2026-04-09).
h # hermes chat (default = config-driven; keep Codex-first)
h-codex # gpt-5.4 via openai-codex
h-gpt # same as h-codex
h-mini # gpt-5.4-mini via openai-codex
h-batch # gpt-5.2 via openai-codex
h-opus # Codex-opus-4-6 via Anthropic (explicit use)
h-sonnet # Codex-sonnet-4-6 via Anthropic (explicit use)
h-gemini # gemini-2.5-pro via Google AI Studio direct (explicit use)
h-gemini-flash # gemini-2.5-flash via Google AI Studio direct (explicit use)
h-router-gemini # google/gemini-2.5-pro via OpenRouter (explicit use)
h-router # qwen/qwen3.6-plus:free via OpenRouter (explicit use)
h-qwen # same as h-router
h-nemotron # nvidia/nemotron-3-super-120b-a12b:free via OpenRouter
h-copilot # Codex-sonnet-4.6 via Copilot (explicit use)
h-copilot-gemini # gemini-2.5-pro via Copilot (explicit use)
h-copilot-gpt # gpt-5.4 via Copilot (explicit use)
h-o3 # o3 via OpenAI
h-which # show current model block from ~/.hermes/config.yaml
h-quick # show quick_commands block from ~/.hermes/config.yaml
Policy note: Gemini and Copilot remain available, but they are explicit-use only and should not be described or treated as automatic fallback/default routing.
NOTE: h-gemini and h-gemini-flash aliases exist but use --provider google which is NOT a valid Hermes CLI provider. Use h-copilot-gemini for Gemini via Copilot quota, OR use the direct AI Studio path below to route through your $20/mo Gemini sub instead.
Google AI Studio exposes an OpenAI-compatible endpoint. Hermes can reach it via the
openrouter provider with a custom --base-url. GEMINI_API_KEY in ~/.hermes/.env (scanner-allow:hermes_env_access — doc reference, not an instruction to read secrets)
is NOT auto-detected by any provider — it must be passed explicitly.
Test first:
hermes chat --provider openrouter \
--api-key $GEMINI_API_KEY \
--base-url https://generativelanguage.googleapis.com/v1beta/openai/ \
-m gemini-2.5-pro \
-e "say hello in 5 words"
If that works, add aliases to ~/.bash_aliases:
alias h-gemini='hermes chat --provider openrouter --api-key $GEMINI_API_KEY --base-url https://generativelanguage.googleapis.com/v1beta/openai/ -m gemini-2.5-pro'
alias h-gemini-flash='hermes chat --provider openrouter --api-key $GEMINI_API_KEY --base-url https://generativelanguage.googleapis.com/v1beta/openai/ -m gemini-2.5-flash'
Then reload: source ~/.bash_aliases
Pitfall: GEMINI_API_KEY must be set in the shell environment (loaded from .env) for
the $GEMINI_API_KEY expansion to work in the alias. Verify with: echo $GEMINI_API_KEY
/model deepseek:deepseek-chat or /model copilot:gemini-2.5-prohermes config set model.default <model> && hermes config set model.provider <provider>hermes chat -m <model> --provider <provider>hermes model (TUI picker — slowest method)Valid --provider values: auto, openrouter, nous, openai-codex, copilot-acp, copilot, anthropic, huggingface, zai, kimi-coding, minimax, minimax-cn, kilocode, deepseek
DeepSeek was missing from the hardcoded argparse choices in hermes_cli/main.py despite being in the auth registry. Patched on 2026-04-04 (line ~4041). If Hermes updates overwrite this, re-patch or use /model deepseek:deepseek-chat in-session instead.
Gemini is now a valid Hermes CLI provider (hermes chat --provider gemini ...). However, observed on 2026-04-09 that direct Gemini calls returned HTTP 429 free-tier quota exhaustion (generate_content_free_tier_requests) and Hermes then auto-fell back to Copilot, which was also exhausted. So Gemini should not be used as Hermes automatic cheap/default routing until quota behavior is stabilized.
| Task Type | Primary | Fallback | Rationale |
|---|---|---|---|
| Code implementation | Codex (h-opus) | Codex gpt-5.4 (h-gpt) | Deepest context, best code quality |
| Code review (adversarial) | Codex + Gemini | Codex | Independent eyes, cheaper |
| Research & literature | Gemini (h-gemini) | Codex | 1M+ context window, web grounding |
| Large document analysis | Gemini (h-gemini) | Codex | Long-context advantage |
| Data analysis / viz | Gemini (h-gemini) | Codex | Good at structured data |
| Plan review (adversarial) | All three | — | Policy already mandates this |
| Quick lookups / triage | Gemini Flash (h-gemini-flash) | Codex | Preserve Codex quota |
| Overnight batch prompts | Split evenly | — | Distribute load across all 3 |
Config claims vs actual usage — routing must be actively enforced:
When asked about utilization, check these sources in order:
| Source | Path | Freshness Pattern |
|---|---|---|
| cost-tracking.jsonl | .Codex/state/session-signals/ | Usually fresh (246K records, Codex-only) |
| weekly-trends.jsonl | .Codex/state/trends/ | Usually fresh (5 weeks) but tracks violations, not credits |
| agent-quota.json | ~/.cache/ | Often stale (no cron refreshes it) |
| config/*_usage.json | config/ | Usually DEAD placeholders (all zeros, 2025 dates) |
| Codex history.jsonl | config/agents/codex/state-snapshots/ | Active raw session logs |
| check_claude_usage.sh | scripts/operations/monitoring/ | Script exists but log file often never created |
Critical gap: No automated Codex or Gemini spend tracking. No cron job refreshes any quota data. See #1855.
REASON: repeated Gemini 429 quota exhaustion and Copilot fallback exhaustion caused retry waste and stale-session confusion. Current baseline is Codex-first with smart routing disabled and automatic fallback disabled.
| Model | Description | Use Case |
|---|---|---|
| gpt-5.4 | Latest frontier agentic coding model | Default — general coding, implementation |
| gpt-5.4-mini | Smaller frontier agentic coding model | Quick tasks, cost-conscious |
| gpt-5.3-codex | Frontier Codex-optimized agentic coding model | Deep code generation, Codex-native tasks |
| gpt-5.2 | Optimized for professional work and long-running agents | Overnight batch, long context |
model:
default: gpt-5.4
provider: openai-codex
base_url: https://chatgpt.com/backend-api/codex
fallback_providers: []
fallback_model: {}
credential_pool_strategies:
openai-codex: round_robin
anthropic: fallback
smart_model_routing:
enabled: false
cheap_model:
provider: openai-codex
model: gpt-5.4-mini
quick_commands:
research: { provider: openai-codex, model: gpt-5.4-mini }
code: { provider: openai-codex, model: gpt-5.4 }
review: { provider: openai-codex, model: gpt-5.4 }
quick: { provider: openai-codex, model: gpt-5.4-mini }
data: { provider: openai-codex, model: gpt-5.4-mini }
batch: { provider: openai-codex, model: gpt-5.2 }
delegation:
model: gpt-5.4
provider: openai-codex
DEFAULT: openai-codex/gpt-5.4
QUICK: openai-codex/gpt-5.4-mini
BATCH: openai-codex/gpt-5.2
FALLBACK: disabled
SMART ROUTING: disabled
Important runtime nuance:
run_agent.py restores the original primary runtime at the start of each new turn in long-lived sessionsPrimary runtime restored for new turn: gemini..., restart the long-lived Hermes process/session; config alone is not enoughTo use Codex explicitly: h-opus, h-sonnet, h-copilot To use Gemini explicitly: h-copilot-gemini, h-gemini, h-router-gemini
anthropic: 2 creds (hermes_pkce + Codex OAuth)
openai-codex: 2 creds (both $20/mo subs pooled, round-robin)
copilot: 1 cred (GH_TOKEN from gh auth token)
deepseek: 1 cred (DEEPSEEK_API_KEY in .env)
openrouter: 1 cred (OPENROUTER_API_KEY in .env — added 2026-04-04)
gemini-direct: n/a (GEMINI_API_KEY in .env — used via openrouter provider + custom base-url)
Add more Codex accounts: hermes auth add openai-codex → OAuth device flow.
View pool: hermes auth list. Reset exhaustion: hermes auth reset <provider>.
If default provider is openai-codex but delegation.model is Codex-sonnet-4-6, subagents try Codex on Codex endpoint → fails. Always set delegation.provider explicitly.
Similarly, if default is openai-codex and you run hermes chat --provider nous, it sends the default model name gpt-5.4 to Nous — which Nous routes through their paid gateway, burning Nous credits instead of using a free model. Always specify BOTH --provider AND -m when switching.
If default provider is openai-codex but delegation.model is Codex-sonnet-4-6, subagents will try to run a Codex model on the Codex endpoint — which fails. Always set delegation.provider explicitly when it differs from the default.
ALL providers use subscription billing — NOT pay-per-token API:
Hermes authenticates to Codex via read_claude_code_credentials() which reads the Codex OAuth token. No separate API key needed — the ANTHROPIC_API_KEY in ~/.hermes/.env is empty/unused. (scanner-allow:hermes_env_access — doc reference, not an instruction to read secrets)
Optimization is about maximizing VALUE from fixed spend, not reducing cost per token. Every unused Gemini query slot is capacity already paid for but wasted.
Gemini should run FIRST to prepare context for Codex/Codex coding:
Simple dispatch pattern:
# Gemini preps, Codex implements:
h-gemini "Research [topic]. Write findings to /tmp/prep-notes.md"
h-opus "Read /tmp/prep-notes.md. Implement [feature] based on the research."
In ~/.hermes/config.yaml, set delegation model to Sonnet (applied 2026-04-04):
delegation:
model: 'Codex-sonnet-4-6' # was empty (inherited Opus)
provider: '' # inherits parent provider
This stops subagents from burning Opus quota on research/triage work.
Stronger option: Route delegation through Copilot to fully bypass Anthropic OAuth:
delegation:
model: 'Codex-sonnet-4.6'
provider: 'copilot'
This routes ALL subagent work through $9/mo Copilot sub instead of $200/mo Codex OAuth. Use if Anthropic starts billing Hermes as "extra usage."
Within a single session, route subagent work to cheapest capable model:
Gemini limitation: Google/Gemini is NOT a native Hermes provider. Cannot route subagents to Gemini. Use separate terminal sessions (h-gemini) or OpenRouter.
Quantified shift opportunity from cross-agent audit:
Decision: Don't over-engineer routing automation now. The priority order is:
Every agent should always have 5+ tasks queued. File: notes/agent-work-queue.md
Anthropic announced that starting Apr 4, 2026, third-party harnesses using Codex subscription OAuth will draw from "extra usage" (costs extra) instead of subscription quota.
Hermes IS affected. It spoofs Codex identity:
~/.Codex/.credentials.jsonCodex-20250219 beta headersMitigations:
--provider copilot as alternative Codex access path (GitHub auth, not Anthropic OAuth)Monitor: Check ~/.hermes/logs/errors.log for new error types: extra_usage_required, billing-related 402/403. Check https://console.anthropic.com/settings/billing.
As of Apr 4 12:37pm: Hermes still working on subscription. No extra charges detected yet. But OpenAI Plus (one of two subs) is maxed out until Apr 8.
Gemini is most cost-effective when running 5-6 related research tasks per session (~2 min, 26 tool calls, $0 from $20/mo sub).
h-router-gemini -t terminal,file,web -q "$(cat /tmp/gemini-batch-N.txt)"
| Batch | Tasks | Duration | Tool Calls | Issues Closed |
|---|---|---|---|---|
| 1 | Hydro mapping + standards gaps (5 tasks) | 5m 17s | 88 | 6 (#1823, #1821, #1822, #1819, #1860, #1864) |
| 2 | Subdomains + research tools (6 tasks) | 2m 3s | 26 | 3 (#1397, #1624, #152) |
| 3 | Research pipeline (5 tasks) | 2m 3s | 26 | 3 (#182, #120, #55) |
# 4 batches spaced 75 min apart through the night
# Each runs once, uses gemini-2.5-pro via openrouter provider
- name: gemini-overnight-batch-1
schedule: 00:30 # starts at midnight
model: google/gemini-2.5-pro
provider: openrouter
- name: gemini-overnight-batch-2
schedule: 01:30 # 60 min later
model: google/gemini-2.5-pro
provider: openrouter
Gemini hermes sessions run in isolated sandboxes. Files created during the session are destroyed when it ends EXCEPT:
h-router-gemini # WORKS: hermes chat --provider openrouter -m google/gemini-2.5-pro
h-gemini # BROKEN: uses custom base-url that Hermes CLI doesn't support
h-copilot-gemini # WORKS: gemini via Copilot sub (uses Copilot quota, not $20 Gemini)
Key finding: h-router-gemini works perfectly for batch research. Use this one, not h-gemini.
Hermes supports 18 providers.
Hermes supports 18 providers. Several offer FREE open-source models:
| Provider | Command | Free Models | Auth |
|---|---|---|---|
| Nous Portal | --provider nous | qwen/qwen3.6-plus:free, nvidia/nemotron-3-super-120b:free | Free account |
| DeepSeek | --provider deepseek | deepseek-chat, deepseek-reasoner | Free tier API key |
| HuggingFace | --provider huggingface | Qwen3.5-397B, DeepSeek-V3.2, Kimi-K2.5 | Free HF token |
| GitHub Copilot | --provider copilot | Codex, Gemini, GPT, Grok — all models! | $9/mo (already paid) |
Copilot is the hidden gem: Your $9/mo annual sub gives Hermes access to Codex Opus/Sonnet + Gemini 2.5 Pro + GPT-5.4 through GitHub's auth (separate from Anthropic OAuth).
Copilot aliases (READY — GH_TOKEN configured Apr 4, 2026):
h-copilot='hermes chat --provider copilot -m Codex-sonnet-4.6'
h-copilot-gemini='hermes chat --provider copilot -m gemini-2.5-pro'
h-copilot-gpt='hermes chat --provider copilot -m gpt-5.4'
DeepSeek aliases (WORKING — DEEPSEEK_API_KEY configured, CLI patched):
h-deepseek='hermes chat --provider deepseek -m deepseek-chat'
h-deepseek-reason='hermes chat --provider deepseek -m deepseek-reasoner'
Nous Portal gotcha: :free models are free to RUN, but creating an API key requires a funded account ("out of funds" error at portal.nousresearch.com). Bot detection blocks headless browser signup. Alternative: hermes setup model → Nous → OAuth device flow.
DeepSeek CLI gotcha: deepseek was missing from Hermes CLI --provider argparse choices. Patched in ~/.hermes/hermes-agent/hermes_cli/main.py line ~4041. Hermes updates may overwrite this — re-check after updates.
Local inference NOT viable: dev-primary has GTX 750 Ti (2GB VRAM). Use cloud-hosted OSS instead.
Copilot is the best multi-model hedge: $9/mo sub gives Codex+Gemini+GPT via GitHub auth, independent of Anthropic OAuth billing. GH_TOKEN auto-generated from gh auth token.
When a session says the config is fixed, do not stop at codex --version or TOML inspection. Verify an actual model invocation.
Use this exact smoke test:
codex exec --skip-git-repo-check --sandbox read-only -m gpt-5.4 \
'Reply with exactly CODEX_GPT54_OK and nothing else.'
Expected success signals:
model: gpt-5.4CODEX_GPT54_OKObserved on 2026-04-09:
~/.codex/config.tomlprovider: openai in the exec banner even though the subscription workflow is the Codex CLI pathImportant CLI gotcha:
codex auth status is NOT a valid subcommand in codex-cli 0.118.0codex --help, codex login --help, and especially the codex exec ... smoke test above instead of trying statusCheck for billing/auth issues: grep -i '401\|403\|429\|billing\|extra.usage\|rate.limit' ~/.hermes/logs/errors.log
Known error patterns:
~/.hermes/.env for bad overrides like GEMINI_BASE_URL=h-which. A malformed env base URL can leak into active model state and then into config.yaml. (scanner-allow:hermes_env_access — doc reference, not an instruction to read secrets)_model_flow_api_key_provider() ignores invalid existing env base URLs and invalid typed base URL overrides unless they start with http:// or https://..env, also inspect ~/.hermes/auth.json credential_pool entries for stale provider base_url residue. We observed Gemini still carrying base_url: h-which in auth metadata even after .env was corrected.hermes_cli/codex_quota.py for a complete solution. The critical check is whether update_codex_credential_usage(...) is actually called from live runtime paths. If it is not wired, /quota and startup warnings are mostly structural/stale.GITHUB_TOKEN while the actual machine used GH_TOKEN).source ~/.bash_aliases in new terminals for aliases to take effecthermes model / command-window provider selection can prompt for Base URL [...]. If a nonsense token appears there, leave it blank; Hermes now ignores invalid values, but the safest operator habit is still blank unless you intend an advanced override.128k context message for Codex, treat it as likely generic metadata fallback, not proof of the real Codex endpoint limit.deepseek provider was missing from CLI argparse choices — patched in main.py; may need re-patching after Hermes updates--provider nous without -m sends the default model (e.g. gpt-5.4) to Nous gateway, burning Nous credits — always specify both provider AND modelhermes --h-deepseek does NOT work — aliases are shell commands, type h-deepseek directly (no hermes prefix)auth.json error fields, ~/.cache/agent-quota.json synthetic entries, and codex_quota.py state. As of 2026-04-10, codex_quota.py was patched with: (1) last_error_reset_at recovery check, (2) last_error_message-based detection, (3) 12h staleness expiry for exhausted_* cache entries. If false warnings recur, clear all three layers — see #2107.python3 -c "import json,time; [code to clear auth.json and agent-quota.json]" — or use hermes auth reset openai-codex followed by query-quota.sh --refresh.