workstation-aware-provider-orchestration
Plan and operate a Hermes-led control plane that routes AI provider work across workstations using quota urgency, machine readiness, GitHub issue gates, and a dispatch ledger.
| name | workstation-aware-provider-orchestration |
| description | Plan and operate a Hermes-led control plane that routes AI provider work across workstations using quota urgency, machine readiness, GitHub issue gates, and a dispatch ledger. |
| version | 1.1.0 |
| author | Hermes Agent |
| tags | ["hermes","provider-routing","workstation-orchestration","quota-management","dispatch-ledger","github-issues"] |
| related_skills | ["agent-usage-optimizer","gh-work-planning","licensed-machine-prompt-orchestration","ace-linux-1-control-surface"] |
Use this skill when the user wants Hermes to coordinate AI provider/model usage across multiple machines or workstations, especially when provider quota/credits are time-sensitive and work must still respect GitHub plan/approval gates.
Design or operate a central AI workflow control plane that combines provider quota urgency, GitHub issue readiness, workstation availability, and safe dispatch prompts/ledgers.
- agent:*, machine:*, status:*, priority:*, cat:*, or domain:* labels.
- ace-linux-1 is the primary Hermes/operator control-plane workstation for almost all AI workflow orchestration: provider usage decisions, queue review, prompt generation, dispatch ledger updates, GitHub state changes, and cross-workstation reconciliation.
- ace-linux-1 is also the continuous user-facing control surface: this is where approvals, plan decisions, work approvals, and morning reconciliation happen even when background lanes continue elsewhere.
- ace-linux-2 is the first overflow/execution worker node, not an equal peer control plane unless failover is explicitly chosen.

Before doing client/company ecosystem work, check whether that company has a designated dispatch surface and worker-machine route. If the work belongs to a specific company channel, keep intake/status there and route execution to the named workstation rather than defaulting to the current Hermes host. Example pattern from Doris: the doris Telegram channel is the Doris company-admin dispatch surface, and Doris ecosystem execution should route to ace-linux-2; the current Hermes session remains the control/coordination surface only when explicitly asked.
If the worker route is blocked by auth or reachability, preserve the work as repo-owned artifacts and a runnable handoff script, then report the exact block and the command to run on the correct workstation. Do not silently substitute local execution for routed execution.
When the user asks to decide machine roles or tier-1 repo placement, do not turn the stream into recurring repo-placement, memory-layout, skill-layout, artifact-format, output-format, or cross-repo file-structure governance. Treat those as canonical infrastructure unless a narrow enforcement defect is explicitly in scope.
For workstation planning, the decision surface should be throughput-oriented:
For per-machine throughput-lane issues, prefer titles of the form "throughput(workstations): activate <machine> provider/machine lane" over broad governance titles. The body should define provider fit, workload class, readiness probe, first approved batch candidate, and proof-of-throughput metric. Add labels such as cat:ai-orchestration, cat:operations, domain:ai-orchestration, domain:workstations, domain:agent-cost-tracking, machine:<host>, and a lifecycle label. See references/per-machine-throughput-lane-issues.md for a concise example pattern from the May 2026 correction.
If the user explicitly asks to decide which tier-1 repos should live on each machine, treat that as a separate repo-placement decision issue class, not a throughput activation issue. Use github-issues → references/machine-repo-placement-decision-issues.md: create/reuse one decision issue per machine in the requested order, separate recommendations from implementation, and tier evidence by live/remote/registry verification.
When the user asks to make these decisions interactively, do not batch assumptions across machines. Finish the current machine's repo-placement decision surface before moving to the next. For each machine, separate repo clone placement from large data residency: a worker may need a local tier-1 repo clone for code/tests/agents while the repo's large raw data remains on a canonical data host and is mounted read-only/read-mostly or staged as bounded subsets. Do not recommend running development, tests, agents, commits, or package installs against another machine's live working tree; use local clones per machine and reconcile through GitHub. Detailed pattern: references/machine-repo-placement-data-boundary.md.
Open or update a GitHub issue first
- gh-work-planning and github-issues.
- status:plan-approved is present unless the task is planning/review-only.

Refresh provider telemetry
bash scripts/cron/provider-utilization-refresh.sh
# or at minimum:
bash scripts/ai/assessment/query-quota.sh --refresh --json
Then inspect:
- docs/reports/provider-utilization-weekly.md
- docs/reports/provider-routing-scorecard.md
- docs/reports/provider-work-queue.md

Reconcile telemetry with user-visible state
- unavailable, estimated, stale, or contradictory.

Verify control-plane readiness from ace-linux-1
Check or plan checks for:
Rank issue candidates
Prefer:
- status:plan-approved
- agent:* labels

Map provider + workstation together
A dispatch decision should consider both:
- ace-linux-1 control plane, ace-linux-2 overflow)

Emit a dispatch ledger or Kanban execution board
Minimum fields for either artifact:
When generating tier-1 Kanban boards, every issue row should carry enough provider + machine routing information that it can be converted into a safe dispatch ledger without reclassifying from scratch. Keep user-decision and plan-review lanes visible as first-class queues; do not hide them inside generic backlog.
Get approval before long-running execution
Present the operator with the shortlist and launch plan before starting cross-machine or high-credit-burn batches.
When a provider credit expires within about 24 hours:
- ace-linux-1; use ace-linux-2 for overflow only if readiness and zero-git-contention are verified.

Use this overlay when the goal is not just throughput, but steady movement toward outreach.
- docs/gtm/, issue comments, job-market outputs, or research notes.
- ace-linux-1 control surface: approvals, queue selection, dispatch ledger, morning synthesis, outreach packaging decisions.
- ace-linux-1 local lanes: Codex/Gemini/Claude planning, synthesis, bounded implementation, or packaging work with repo-owned prompts.
- ace-linux-2 overflow lanes: isolated implementation/review worktrees only after readiness checks.
- docs/gtm/*.md, demo reports, website drafts, outreach templates, issue comments, and evidence summaries.
- ace-linux-1, not inside a remote worker lane.

Before delegating work to any worker workstation, especially ace-linux-2, run and record a reviewable readiness probe. The worker may be repo-ready but still unsafe for AI-provider execution.
Minimum checks:
Host reachability
getent hosts <host> || true
ping -c 1 -W 2 <host> || true
ssh -o BatchMode=yes -o ConnectTimeout=8 <host> 'hostname; uname -a; pwd'
Canonical workspace root
- /mnt/local-analysis/workspace-hub for Linux workers unless evidence says otherwise.
- /home/vamsee/workspace-hub) contain the tier-1 repo clones.

Tier-1 repo readiness
For each target repo, capture:
ssh <host> 'cd /mnt/local-analysis/workspace-hub/<repo> && \
git branch --show-current && \
git rev-parse --short HEAD && \
git remote get-url origin && \
git status --short && \
git rev-list --left-right --count @{u}...HEAD 2>/dev/null || true && \
test -f pyproject.toml && echo pyproject=yes || echo pyproject=no && \
test -f uv.lock && echo uv_lock=yes || echo uv_lock=no && \
test -d .venv && echo venv=yes || echo venv=no'
Treat root/workspace-hub dirty state separately from child repo cleanliness; root dirt can still block workspace-hub-root work.
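The `git rev-list --left-right --count @{u}...HEAD` line in the probe above prints two whitespace-separated counts: commits only on the upstream, then commits only on the local HEAD. A minimal sketch of turning that pair into a label for the readiness report might look like the following. The function name `classify_sync` and the label strings are illustrative assumptions, not part of the workspace-hub scripts.

```shell
# Hypothetical helper: classify the "<behind> <ahead>" pair printed by
#   git rev-list --left-right --count @{u}...HEAD
# The labels (in-sync, ahead, behind, diverged, no-upstream) are assumptions.
classify_sync() {
  # Unquoted on purpose: word splitting handles a tab or a space separator.
  set -- $1
  if [ "$#" -lt 2 ]; then
    # Empty output: the probe swallowed an error, e.g. no upstream configured.
    echo "no-upstream"
    return 0
  fi
  behind=$1
  ahead=$2
  if [ "$behind" -eq 0 ] && [ "$ahead" -eq 0 ]; then
    echo "in-sync"
  elif [ "$behind" -eq 0 ]; then
    echo "ahead"
  elif [ "$ahead" -eq 0 ]; then
    echo "behind"
  else
    echo "diverged"
  fi
}
```

Anything other than `in-sync` is a reason to pause and reconcile through GitHub before assigning the repo to a lane.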
GitHub auth readiness
ssh <host> 'gh auth status 2>&1'
If invalid, the worker cannot safely mutate GitHub state, create PRs, or push via gh until re-authenticated.
AI provider runtime readiness
Always check the worker's login shell as well as plain SSH. User-level installs may live in ~/.local/bin or ~/.npm-global/bin and be invisible in non-login SSH.
ssh <host> 'for c in hermes claude codex gemini; do command -v "$c" && "$c" --version 2>&1 | head -3 || echo "$c:not-found"; done'
ssh <host> 'bash -lc '\''for c in hermes claude codex gemini; do command -v "$c" && "$c" --version 2>&1 | head -3 || echo "$c:not-found"; done; hermes config 2>/dev/null | grep -Ei "provider|model|base_url|gpt" | head -20'\'''
Do not route expiring provider-credit work to a worker unless the relevant CLI/auth path exists in the launch environment and is known to consume the intended account/credit. If only the login shell exposes the tools, dispatch with ssh <host> 'bash -lc "<command>"' or explicitly source the user's environment.
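Hand-escaping nested quotes for the `bash -lc` wrapper is error-prone, so one option is to single-quote the command programmatically. This is only a sketch: `sq` and `login_shell_cmd` are hypothetical helper names, and the quoting uses the common POSIX idiom of closing the quote, emitting an escaped quote, and reopening.

```shell
# Hypothetical helper: wrap a string in single quotes so a remote shell
# treats it as one literal argument. Embedded ' becomes '\''.
sq() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical helper: build the remote command line that runs "$1"
# inside a login shell, so ~/.local/bin and similar are on PATH.
login_shell_cmd() {
  printf 'bash -lc %s' "$(sq "$1")"
}
```

A dispatch then reads `ssh <host> "$(login_shell_cmd 'command -v hermes')"`, and the remote shell receives `bash -lc 'command -v hermes'`.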
For Codex specifically, CLI presence and ~/.codex/ files are only a weak signal. Before assigning real Codex burn work to a remote/overflow machine, run a tiny real codex exec smoke through the exact login-shell/tmux path you will use for the lane and confirm it does not fail with 401 Unauthorized or Failed to refresh token: refresh token was already used. If that smoke fails, mark the host Codex-blocked and use it only for Claude fallback/validation until codex login is refreshed.
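The smoke verdict can be derived mechanically from the captured output and exit status. The sketch below assumes the smoke run's combined output was saved to a log. The function name `codex_smoke_verdict` and its three verdict strings are hypothetical, while the two error patterns are the ones quoted above.

```shell
# Hypothetical helper: read a Codex smoke log on stdin, take the smoke
# command's exit status as $1, and print a verdict for the dispatch ledger.
codex_smoke_verdict() {
  status=$1
  if grep -Eq '401 Unauthorized|Failed to refresh token: refresh token was already used'; then
    # One of the known auth failures: treat the host as Codex-blocked.
    echo "codex-blocked"
  elif [ "$status" -ne 0 ]; then
    # Failed for another reason (missing CLI, network, ...): needs a look.
    echo "codex-failed-other"
  else
    echo "codex-ok"
  fi
}
```

For example, `codex_smoke_verdict "$smoke_status" < smoke.log`, where `smoke.log` and `smoke_status` are placeholders for whatever the lane's launcher recorded.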
Engineering software readiness Check both package/command presence and a task-appropriate smoke test. Presence alone is not enough. Useful Linux engineering probes:
ssh <host> 'command -v openfoam-selector && openfoam-selector --list || true'
ssh <host> 'command -v gmsh && gmsh --version || true'
ssh <host> 'command -v freecad || command -v FreeCAD || true'
ssh <host> 'command -v blender && blender --background --version 2>&1 | head -5 || true'
ssh <host> 'command -v pvbatch && pvbatch --version 2>&1 | head -5 || true'
ssh <host> 'command -v ccx && ccx 2>&1 | head -5 || true'
ssh <host> 'command -v qgis && qgis --version 2>&1 | head -3 || true'
ssh <host> 'command -v gdalinfo && gdalinfo --version || true'
GUI Qt tools may fail over SSH without a display; prefer headless modes (--background, pvbatch) and record display/GPU caveats.
GPU/display caveat
ssh <host> 'nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader 2>/dev/null || true'
Do not assign GPU or GUI-dependent work unless driver/display/headless readiness is explicitly validated.
Dispatch ledger evidence
Store the probe result in a durable report (for example docs/reports/YYYY-MM-DD-issue-NNN-<host>-readiness-probe.md) and link it from the GitHub issue before delegating.
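To keep report names consistent with the example pattern, the path can be generated rather than typed. This is a small sketch under assumptions: `probe_report_path` is a hypothetical name, and the issue number and date in the usage line are made-up placeholders.

```shell
# Hypothetical helper: build the readiness-probe report path.
#   $1 = GitHub issue number, $2 = host, $3 = date (defaults to today)
probe_report_path() {
  issue=$1
  host=$2
  day=${3:-$(date +%F)}   # %F expands to YYYY-MM-DD
  printf 'docs/reports/%s-issue-%s-%s-readiness-probe.md\n' "$day" "$issue" "$host"
}
```

For instance, `probe_report_path 123 ace-linux-2 2026-05-01` yields `docs/reports/2026-05-01-issue-123-ace-linux-2-readiness-probe.md`.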
Use this as a starting hypothesis, not a substitute for a fresh probe:
- ace-linux-2 and canonical repo root was /mnt/local-analysis/workspace-hub.
- digitalmodel, worldenergydata, assetutilities, and teamresumes existed and were clean on main; teamresumes lacked .venv.
- workspace-hub root itself was dirty, so root-level work needed a separate dirty-state decision.
- openfoam2312, Gmsh, FreeCAD, Blender, ParaView/pvbatch, CalculiX, QGIS, and GDAL/OGR.
- hermes/codex, but a login shell did: bash -lc found /home/vamsee/.local/bin/hermes and /home/vamsee/.npm-global/bin/codex.
- ace-linux-2 reported default provider/model openai-codex / gpt-5.5 with base URL https://chatgpt.com/backend-api/codex; Codex auth files existed under ~/.codex/.
- gh auth was invalid, so keep GitHub mutation authority on ace-linux-1 unless gh is repaired on ace-linux-2.
- ace-linux-2 is repo-ready and Hermes/Codex-runtime-ready when launched through a login shell, but not ready for local GitHub mutation via gh.

When the user asks to execute work on another workstation (not just prepare a prompt), use real remote process orchestration rather than delegate_task:
scp local-worker-prompt.md ace-linux-2:/tmp/worker-prompt.md
Start a tmux session over SSH from a login shell so user-level CLIs are on PATH:
ssh ace-linux-2 "bash -lc 'mkdir -p /mnt/local-analysis/ace2-worker-logs /mnt/local-analysis/ace2-worker-reports; \
SESSION=ace2-overflow-$(date +%Y%m%d); \
tmux kill-session -t \$SESSION 2>/dev/null || true; \
tmux new-session -d -s \$SESSION -c /mnt/local-analysis/workspace-hub \
\"bash -lc \\\"claude --print --dangerously-skip-permissions < /tmp/worker-prompt.md 2>&1 | tee /mnt/local-analysis/ace2-worker-logs/\$SESSION.log\\\"\"; \
tmux list-sessions | grep \$SESSION'"
ssh ace-linux-2 "bash -lc 'tmux capture-pane -t ace2-overflow-YYYYMMDD -p -S -80; find /mnt/local-analysis/ace2-worker-reports -maxdepth 1 -type f -printf \"%f %s bytes\\n\"'"
Do not use shell-level nohup ... & wrappers through the Hermes foreground terminal; Hermes blocks that pattern. Use Hermes terminal(background=true) for local tracked processes, or remote tmux for SSH-launched workers.
Use this when a control-plane workstation reboots or a context handoff indicates in-flight Hermes/Claude/Codex/tmux work may have survived, stalled, or been partially landed. Work in this order:
Salvage current work first
- todo, process tables, tmux sessions, logs, and GitHub issue labels/state.
- session_id when available; do not assume an empty process list means the run is gone.
- ps PIDs/PGIDs before killing anything. Avoid pkill -f; it can self-match and terminate the orchestrator.

Research/restart ongoing work second
- ace-linux-2, rerun readiness (scripts/operations/agent-execution/ace2-readiness.sh when available) and keep it report-only if gh auth status is invalid.
- /tmp prompts whenever they exist.

Set off future work last
- docs/plans/machine-prompts/<date>/... and scripts/operations/agent-execution/, then validate (bash -n, --help, dry-run) before committing.

For workspace-hub orchestration, prefer committed scripts over ad hoc /tmp launch commands when they exist:
bash scripts/operations/agent-execution/ace2-readiness.sh
bash scripts/operations/agent-execution/launch-ace1-control-plane.sh --dry-run
bash scripts/operations/agent-execution/launch-ace2-overflow-worker.sh --dry-run
bash scripts/operations/agent-execution/launch-2518-finalizer.sh --dry-run
These scripts encode the current safety defaults: login-shell PATH on ace-linux-2, tmux-based remote execution, repo-owned prompts, explicit logs/reports, and dry-run/help validation.
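The `bash -n` part of the validation step can be batched over every script about to be committed. The following is a sketch, not an existing workspace-hub tool: `syntax_check` is a hypothetical name, and it covers only parse errors, so `--help` and `--dry-run` still need to be run per script.

```shell
# Hypothetical pre-commit check: run `bash -n` (parse only, nothing is
# executed) over each given script and report the ones that fail to parse.
syntax_check() {
  failed=0
  for script in "$@"; do
    if bash -n "$script" 2>/dev/null; then
      echo "ok $script"
    else
      echo "FAIL $script"
      failed=1
    fi
  done
  # Non-zero exit if any script failed, so callers can gate a commit on it.
  return "$failed"
}
```

A typical call would be `syntax_check scripts/operations/agent-execution/*.sh`, followed by the dry-run commands listed above.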
Use this when the user wants to keep reviewing decisions in the current chat while a separate Hermes session performs read-only orchestration readiness inspection.
- docs/plans/machine-prompts/<date>/execution/orchestration-readiness-interactive-handoff.md
SESSION=orch-readiness-$(date +%Y%m%d)
PROMPT=/mnt/local-analysis/workspace-hub/docs/plans/machine-prompts/$(date +%F)/execution/orchestration-readiness-interactive-handoff.md
LOG=/mnt/local-analysis/workspace-hub/docs/plans/machine-prompts/$(date +%F)/execution/orchestration-readiness-interactive-session.log
tmux new-session -d -s "$SESSION" -c /mnt/local-analysis/workspace-hub \
"bash -lc 'hermes --pass-session-id 2>&1 | tee -a $LOG'"
- tmux capture-pane.
tmux load-buffer -b orch_prompt "$PROMPT"
tmux paste-buffer -t "$SESSION" -b orch_prompt
tmux send-keys -t "$SESSION" Enter
When the user asks to keep work lanes going until a reset/expiry time, update or create a Hermes cron job rather than launching uncontrolled duplicate agents.
Recommended guardrails for the cron prompt:
- RUNNING, READY_FOR_REVIEW, STALLED_NO_OUTPUT, or BLOCKED.
- status:working autonomously.
- ace-linux-1 as control plane and avoid ace-linux-2 GitHub mutations unless fresh auth/readiness proves safe.

Use cronjob(action='update') to retarget an existing burn/controller job when one already exists, instead of creating overlapping cron jobs. Set enabled_toolsets narrowly (usually terminal,file) and a repeat count that covers the reset window with a small buffer.
- status:plan-approved.
- ace-linux-2 become an untracked peer control plane instead of a worker/overflow node.

This skill intentionally overlaps with agent-usage-optimizer for provider quota routing, but adds the workstation/control-plane layer. Future consolidation could merge this workstation section into agent-usage-optimizer if external skill write access is available.