autocontext
// Use when a Hermes agent needs to evaluate agent behavior, run Autocontext scenarios, inspect Hermes curator state, export reusable knowledge, or prepare local MLX/CUDA training data through the autoctx CLI.
| name | autocontext |
| description | Use when a Hermes agent needs to evaluate agent behavior, run Autocontext scenarios, inspect Hermes curator state, export reusable knowledge, or prepare local MLX/CUDA training data through the autoctx CLI. |
| version | 1.0.0 |
| author | Autocontext |
| license | Apache-2.0 |
| metadata | {"hermes":{"tags":["autocontext","evaluation","traces","cli","curator","mlx","cuda","skills"],"related_skills":["native-mcp","hermes-agent-skill-authoring","axolotl"]}} |
Autocontext is a control plane for evaluating agent behavior, preserving useful run artifacts, exporting training data, and distilling stable behavior into local runtimes. In Hermes, use this skill when the work calls for measurement, replay, datasets, local MLX/CUDA training, or read-only analysis of Hermes skill curation.
Hermes Curator owns Hermes skill mutation. Autocontext should inspect, evaluate, replay, export, and recommend. Do not use Autocontext as a replacement for Hermes Curator, and do not edit Hermes skills directly unless the user explicitly asks for that operation.
Do not use this skill for normal Hermes memory updates, direct skill consolidation, or user-local skill deletion. Those are Hermes Curator responsibilities.
Use the CLI first. The autoctx CLI is the default surface because Hermes agents can run it with normal terminal tools, see stdout and stderr, preserve logs, and debug failures without special host configuration.
MCP is optional. Use MCP when the environment already has Autocontext MCP configured and the task benefits from typed schemas, constrained invocation, or tool discovery. Do not require MCP just to wrap a command that the CLI already exposes cleanly.
Use a native Hermes runtime or OpenAI-compatible gateway when Autocontext is calling Hermes as an agent provider. Use a Hermes plugin emitter only when the user specifically needs high-fidelity live traces beyond read-only import of existing Hermes artifacts.
From a checkout of Autocontext:
cd autocontext
uv run autoctx --help
Inspect Hermes skill and curator state without modifying Hermes:
uv run autoctx hermes inspect --json
For a custom profile or test fixture:
uv run autoctx hermes inspect --home "$HERMES_HOME" --json
Install or refresh this skill into a Hermes profile:
uv run autoctx hermes export-skill --output ~/.hermes/skills/autocontext/SKILL.md --json
If the file already exists and the user wants to replace it:
uv run autoctx hermes export-skill --output ~/.hermes/skills/autocontext/SKILL.md --force --json
Use --json whenever Hermes needs to parse the result.
RUN_ID="hermes_$(date +%s)"
uv run autoctx run --scenario grid_ctf --gens 3 --run-id "$RUN_ID" --json
uv run autoctx status "$RUN_ID" --json
uv run autoctx replay "$RUN_ID" --generation 1
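The run, status, and replay commands above can be chained in a small wrapper. This is a minimal sketch, not part of the autoctx CLI: `autoctx_cli`, `make_run_id`, `run_and_review`, and the `AUTOCTX_CMD` override are hypothetical names introduced here so the flow can be exercised with `echo` before a real run.

```shell
# Hypothetical wrapper: runs the real CLI unless AUTOCTX_CMD overrides it
# (for example AUTOCTX_CMD=echo prints the commands instead of running them).
autoctx_cli() {
  # Left unquoted on purpose so "uv run autoctx" splits into separate words.
  ${AUTOCTX_CMD:-uv run autoctx} "$@"
}

# Build a run id in the same hermes_<epoch seconds> shape used above.
make_run_id() {
  printf 'hermes_%s\n' "$(date +%s)"
}

# Run a scenario, check its status, then replay generation 1.
# Stops at the first failing step so a broken run is not replayed.
run_and_review() {
  scenario="$1"
  gens="$2"
  run_id="$3"
  autoctx_cli run --scenario "$scenario" --gens "$gens" --run-id "$run_id" --json || return 1
  autoctx_cli status "$run_id" --json || return 1
  autoctx_cli replay "$run_id" --generation 1
}

# Example (commented out so sourcing this block has no side effects):
# run_and_review grid_ctf 3 "$(make_run_id)"
```

Setting `AUTOCTX_CMD=echo` first shows exactly which three commands would run, which is useful when Hermes needs to log its plan before spending generations.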
For a plain-language task:
uv run autoctx solve --description "Improve the support-triage response policy." --gens 3 --json
For one-shot judgment or improvement:
uv run autoctx judge --task-prompt "..." --output "..." --rubric "..." --json
uv run autoctx improve --task-prompt "..." --rubric "..." --rounds 3 --json
When Autocontext should call a Hermes-served model through an OpenAI-compatible gateway:
export AUTOCONTEXT_AGENT_PROVIDER=openai-compatible
export AUTOCONTEXT_AGENT_BASE_URL=http://localhost:8080/v1
export AUTOCONTEXT_AGENT_API_KEY=no-key
export AUTOCONTEXT_AGENT_DEFAULT_MODEL=hermes-3-llama-3.1-8b
uv run autoctx solve --description "..." --gens 3 --json
Keep provider configuration outside the skill when possible. The user or profile should own secrets, base URLs, and model names.
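One way to keep provider settings outside the skill is to load them from a file the user owns and only verify that they are present. The sketch below assumes the four `AUTOCONTEXT_AGENT_*` variables from the gateway example; `missing_provider_vars` and the `provider.env` path are hypothetical and not something autoctx provides.

```shell
# Names of the provider variables used in the gateway example above.
PROVIDER_VARS="AUTOCONTEXT_AGENT_PROVIDER AUTOCONTEXT_AGENT_BASE_URL AUTOCONTEXT_AGENT_API_KEY AUTOCONTEXT_AGENT_DEFAULT_MODEL"

# Print the name of each unset or empty provider variable.
# Returns 1 if any is missing, 0 otherwise. Values are never printed,
# so secrets stay out of logs.
missing_provider_vars() {
  rc=0
  for name in $PROVIDER_VARS; do
    # Indirect lookup of the variable called $name (names come from the fixed list above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Example, assuming the user keeps settings in a private file (hypothetical path):
#   . "$HOME/.config/autocontext/provider.env"
#   missing_provider_vars || echo "set the variables listed above before running autoctx" >&2
```

The check reports only variable names, so it is safe to run in a session whose output is preserved.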
Hermes v0.12 writes each Curator report to ~/.hermes/logs/curator/<timestamp>/ as run.json and REPORT.md. It tracks skill usage in ~/.hermes/skills/.usage.json and protects bundled or hub-installed skills through .bundled_manifest and .hub/lock.json.
Use:
uv run autoctx hermes inspect --json
Read the output as an inventory:
- agent_created_skill_count means Curator-eligible user or agent skills.
- bundled_skill_count and hub_skill_count are upstream-owned skills and should not be pruned by Autocontext.
- pinned_skill_count identifies skills Curator and agents should not modify.
- curator.latest.counts summarizes the latest consolidation, pruning, and archive activity.

Autocontext can use these signals for reports, datasets, and recommendations. Hermes Curator remains the writer for Hermes skill lifecycle changes.
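The inventory fields above can be condensed into a few lines for a report. This is a rough sketch that assumes `jq` is installed and that the count fields sit at the top level of the JSON, with consolidation activity nested under `curator.latest.counts`; the real payload layout may differ, and `summarize_inventory` is a hypothetical helper.

```shell
# Summarize an inspect payload read from stdin.
# Assumed layout: *_skill_count fields at the top level and
# activity counts under .curator.latest.counts. Missing fields fall back to 0 or {}.
summarize_inventory() {
  jq -r '
    "curator_eligible=\(.agent_created_skill_count // 0)",
    "upstream_owned=\((.bundled_skill_count // 0) + (.hub_skill_count // 0))",
    "pinned=\(.pinned_skill_count // 0)",
    "latest_counts=\((.curator.latest.counts // {}) | tojson)"
  '
}

# Example (read-only; commented out so sourcing this block runs nothing):
# uv run autoctx hermes inspect --json | summarize_inventory
```

Grouping bundled and hub skills into one upstream-owned figure mirrors the rule that Autocontext should not prune either kind.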
Curator decision reports are decision metadata and safe to import without redaction. Session and trajectory imports are different: they contain raw model prompts and responses, which may include secrets, tokens, or content the operator did not intend for external storage.
Before recommending or running autoctx hermes ingest-sessions or autoctx hermes ingest-trajectories, explain the privacy tradeoff: the importer is read-only against ~/.hermes, but the output JSONL contains the same content unless redaction is applied. The default is --redact standard, which covers Anthropic/OpenAI keys, bearer tokens, emails, IPs, env values, paths, and high-risk file refs. --redact strict adds user-defined regexes. --redact off writes raw content, and the importer surfaces an explicit opt-in marker. Sessions in particular live in a SQLite store, so an unwarranted ingest creates a new copy of every prompt and response. Prefer --dry-run first when the operator is unsure of the blast radius.
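The dry-run-first advice can be wrapped in a guard that rejects unknown redaction modes and warns before raw output. This is a sketch under assumptions: it uses only the subcommands and flags named above, assumes `--redact` and `--dry-run` can be combined in one call, and introduces the hypothetical helpers `autoctx_cli`, `valid_redact_mode`, and `preview_ingest` plus the `AUTOCTX_CMD` override.

```shell
# Hypothetical wrapper: runs the real CLI unless AUTOCTX_CMD overrides it
# (for example AUTOCTX_CMD=echo prints the command instead of running it).
autoctx_cli() {
  # Left unquoted on purpose so "uv run autoctx" splits into separate words.
  ${AUTOCTX_CMD:-uv run autoctx} "$@"
}

# Accept only the three redaction modes described in the text above.
valid_redact_mode() {
  case "$1" in
    standard|strict|off) return 0 ;;
    *) return 1 ;;
  esac
}

# Preview an ingest with --dry-run before anything is written.
# $1 is "sessions" or "trajectories"; $2 is the redaction mode (default: standard).
preview_ingest() {
  kind="$1"
  mode="${2:-standard}"
  case "$kind" in
    sessions|trajectories) ;;
    *) echo "unknown ingest kind: $kind" >&2; return 2 ;;
  esac
  if ! valid_redact_mode "$mode"; then
    echo "unknown redact mode: $mode" >&2
    return 2
  fi
  if [ "$mode" = off ]; then
    echo "warning: --redact off writes raw prompts and responses" >&2
  fi
  autoctx_cli hermes "ingest-$kind" --redact "$mode" --dry-run
}

# Example (commented out so sourcing this block runs nothing):
# preview_ingest sessions standard
```

Defaulting the mode to standard keeps the guard aligned with the importer's own default, so an operator has to ask for raw output explicitly.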
For Autocontext-owned runs, export training data and train locally:
uv run autoctx export-training-data --scenario grid_ctf --all-runs --output training/grid_ctf.jsonl
uv run autoctx train --scenario grid_ctf --data training/grid_ctf.jsonl --backend mlx --time-budget 300 --json
uv run autoctx train --scenario grid_ctf --data training/grid_ctf.jsonl --backend cuda --time-budget 300 --json
Use MLX on Apple Silicon hosts. Use CUDA on Linux GPU hosts with a CUDA-enabled PyTorch install. Do not run host-GPU training inside a sandbox unless the user has already provided a host bridge or direct GPU access.
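The host rule above can be turned into a simple backend picker. This is a minimal sketch, not autoctx behavior: `pick_backend` and the `NVIDIA_SMI` override are hypothetical, and finding `nvidia-smi` on the PATH is only a rough proxy for a GPU host. It does not confirm a CUDA-enabled PyTorch install or sandbox GPU access, which still need to be checked separately.

```shell
# Map a host to a --backend value: mlx, cuda, or none.
# $1 and $2 default to `uname -s` and `uname -m`; passing them explicitly
# makes the function easy to try out for other hosts.
pick_backend() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Darwin/arm64)
      # Apple Silicon host.
      echo mlx
      ;;
    Linux/*)
      # Rough GPU check only; NVIDIA_SMI is a hypothetical override for testing.
      if command -v "${NVIDIA_SMI:-nvidia-smi}" >/dev/null 2>&1; then
        echo cuda
      else
        echo none
      fi
      ;;
    *)
      echo none
      ;;
  esac
}

# Example (commented out so sourcing this block runs nothing):
# backend="$(pick_backend)"
# [ "$backend" != none ] && uv run autoctx train --scenario grid_ctf \
#   --data training/grid_ctf.jsonl --backend "$backend" --time-budget 300 --json
```

Returning none instead of guessing lets the agent stop and ask the user rather than start training on an unsupported host.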
For Hermes Curator artifacts, start with read-only inspection and dataset design before training. Curator reports are decision traces; they are best suited for advisor/ranker/classifier training, not full autonomous skill mutation.
MCP is optional. If the user has already configured Autocontext MCP, prefer it for structured tool calls that are easier or safer than shell commands. Otherwise, stay with the CLI.
Check the local integration guide before inventing tool names:
uv run autoctx mcp-serve --help
Use MCP only when it adds value beyond the CLI: stable schemas, lower parsing burden, managed tool discovery, or a host policy that disallows shell access.
- Do not modify ~/.hermes/skills after inspection. autoctx hermes inspect is read-only; keep it that way during analysis.
- Use --json when Hermes needs to parse command output.
- Run autoctx hermes inspect --json before making claims about local Hermes skill state.

Progressive-disclosure docs are available alongside this skill. Load them only when relevant.
- references/hermes-curator.md: How Hermes Curator and autocontext cooperate; who owns what; the read-only-first rule.
- references/cli-workflows.md: Exact autoctx commands for inventory, curator ingest, dataset export, judging, replay.
- references/mcp-workflows.md: MCP server setup, CLI-to-MCP tool name mapping, when to prefer MCP over CLI.
- references/local-training.md: How autocontext-exported datasets feed local MLX/CUDA advisor training; what the advisor predicts; expected scope.

Operators can write all references next to this skill via autoctx hermes export-skill --with-references --output <dir>/SKILL.md.