server
| name | server |
| description | Load ops context before any server work in this workspace. MCP-only — all access goes through the ops MCP server on onyx, never direct SSH. |
This workspace talks to servers exclusively through the ops MCP on onyx. Direct SSH and Bash are blocked by .claude/hooks/block-ssh.py. All mcp__ops__* tools are allowlisted — no per-call prompts.
When this skill is loaded, immediately read all three files in parallel. No exceptions, no "on demand":
- ~/Documents/Vaults/Server/Server/AI-Remediation-Agent.md: what the remediation agent can and cannot do
- ~/Documents/Vaults/Server/Server/Open-Tasks.md: open task backlog; what is already done vs pending
- ~/Documents/Vaults/Server/Reference/Audit-Framework.md: the five audit layers and where checks belong

Why: Presenting already-completed work as a gap (e.g. "install Livepatch" when it was already active) wastes the user's time and erodes trust. These files tell you what is already in place. Read them first.
These are distilled from repeated failures. Violating them causes the exact thrashing the user keeps calling out.
Service inventory must stay current:
- Every deployed stack needs server-config/edumusik-1/srv/<stack>/, a deploy-map.yaml entry, a services.yaml entry, and a row in services.md.
- Compare the compose_dir values from list_containers(host="main") against the contents of server-config/edumusik-1/srv/. Any mismatch = fix it now.

Never assume local docs are current. services.md is manually maintained. If the live server shows a container whose stack isn't in server-config, that's a gap: fix it immediately, don't work around it.
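The inventory comparison above can be sketched as a small set difference. This is only an illustration: it assumes list_containers returns dictionaries with a compose_dir key, and that server-config/edumusik-1/srv/<stack>/ mirrors /srv/<stack>/ on the host. Neither assumption is confirmed by this document, and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def inventory_mismatches(containers, repo_stacks, srv_root="/srv"):
    """Compare live compose_dir values with the stack names tracked in the repo.

    containers: iterable of dicts with a "compose_dir" key (assumed shape).
    repo_stacks: stack directory names found under server-config/edumusik-1/srv/.
    srv_root: assumed host-side root that the repo directory mirrors.
    """
    live = set()
    for container in containers:
        compose_dir = container.get("compose_dir")
        if not compose_dir:
            continue  # container was not started from a compose stack
        try:
            relative = PurePosixPath(compose_dir).relative_to(srv_root)
        except ValueError:
            continue  # stack lives outside srv_root; out of scope for this check
        if relative.parts:
            live.add(relative.parts[0])  # first component below srv_root is the stack
    tracked = set(repo_stacks)
    return {
        "untracked": sorted(live - tracked),    # running, but missing from server-config
        "not_running": sorted(tracked - live),  # in server-config, but not seen live
    }
```

Anything in "untracked" is the gap this rule asks you to fix immediately; "not_running" may just be a stopped stack and needs a human look.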
Read your feedback before acting. The memory index (MEMORY.md) lists rules learned from past failures. The most critical ones are embedded here. If you find yourself doing something that matches a past incident pattern, stop and re-read the relevant feedback file.
SSH edumusik-admin — break-glass only. Direct SSH is blocked by hook
(.claude/hooks/block-ssh.py) and is NOT the
normal path for any operational work. Route everything through mcp__ops__*
tools (read_file, tail_logs, run_hardening, compose_up, …). The block-ssh
kill switch (~/.ops-mcp/block-ssh-off) and the ssh -t edumusik-admin "sudo <cmd>" form exist only for genuine break-glass recovery (MCP itself
broken, no other path). Do not propose ssh edumusik-admin … as a routine
diagnostic — extend the relevant MCP tool or YAML allowlist instead.
The kill switches (~/.ops-mcp/block-ssh-off, ~/.ops-mcp/budget-off)
disable their hooks fully and have no TTL — remove them as soon as the
blocking task ends. A stale switch is the failure mode.
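Because the switches have no TTL, a periodic local check for leftover switch files can help. The sketch below uses the two file names given above; the one-hour threshold and the function name are assumptions made for illustration, not part of the ops tooling.

```python
import time
from pathlib import Path

# File names taken from the paths above; both live under ~/.ops-mcp/.
KILL_SWITCHES = ("block-ssh-off", "budget-off")


def stale_kill_switches(base_dir, max_age_seconds=3600, now=None):
    """Return (name, is_stale) for each kill switch file that currently exists.

    max_age_seconds is an arbitrary illustrative threshold; the switches
    themselves never expire.
    """
    now = time.time() if now is None else now
    found = []
    for name in KILL_SWITCHES:
        path = Path(base_dir) / name
        if path.exists():
            age = now - path.stat().st_mtime
            found.append((name, age > max_age_seconds))
    return found
```

A call such as stale_kill_switches(Path.home() / ".ops-mcp") returns an empty list when no switch is active, which is the expected state outside break-glass work.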
Sensitive hardening tasks run a reachability preflight. Tasks in
server/tasks.yaml flagged requires_reachability_check: true (cis-sshd,
cis-sysctl, cis-sudo-log, cis-timesyncd, onyx-docker-setup) probe the
host with true over SSH before any step. A failed probe aborts the task
with reachability_preflight_failed before any mutation. This catches
pre-existing reachability loss; it does not stop a step from breaking SSH
after it runs.

mariadb_exec is mutating and policy-gated like write_file.
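The reachability preflight described above can be sketched as follows. This is a guess at the control flow only, not the ops MCP implementation, and it would run server-side rather than in this workspace. The function and exception names are hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options.

```python
import subprocess


class ReachabilityPreflightFailed(RuntimeError):
    """Raised before any step runs when the host cannot be reached."""


def ssh_probe(host, timeout=10):
    """Run `true` on the host over SSH and report whether it exited with 0."""
    command = [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",
        host,
        "true",
    ]
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout + 5)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def run_task(task, host, steps, probe=ssh_probe):
    """Abort before the first step if a flagged task cannot reach its host."""
    if task.get("requires_reachability_check") and not probe(host):
        raise ReachabilityPreflightFailed("reachability_preflight_failed")
    return [step(host) for step in steps]
```

The probe runs once, before the first step, which is why it catches only pre-existing loss: a step that breaks sshd halfway through is not detected by this check.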
Before touching any CVE, kernel module, sysctl, or security advisory:
- memory/reference_monitoring_scripts.md: inventory of every automated monitoring/security script on onyx. The automated system may already handle it.
- KNOWN_MITIGATIONS in server-config/onyx/usr/local/bin/usn-kernel-cve-check.py: if the CVE is already there, verify it ran rather than re-deploying manually.
- cis-modules in server/tasks.yaml: module blacklists live there.

Do not manually apply a kernel/module/sysctl fix until you have confirmed automated coverage is absent. Presenting manual remediation theater when the automated system already handles it wastes the user's time.
When the prompt contains a Telegram alert with Source: run-cron and a FAIL/TIMEOUT:
1. Look up <alert-name> in tests/monitoring-coverage.yaml (local file) → get the log: path.
2. read_file(host, log_path): read that log and find items with "status": "fail".

Do NOT guess paths. Do NOT call CronList. Do NOT read other files unless a failure entry's fix: field points there.
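Once read_file has returned the log text, the failing entries can be pulled out mechanically. The log layout is not specified here, so this sketch assumes the log is either one JSON document or JSON Lines, and it walks the structure looking for any object with "status": "fail". Both function names are hypothetical.

```python
import json


def parse_log(text):
    """Parse the log as one JSON document, falling back to JSON Lines."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        documents = []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                documents.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip non-JSON lines such as plain log banners
        return documents


def failed_items(node):
    """Recursively collect every object whose "status" is "fail"."""
    found = []
    if isinstance(node, dict):
        if node.get("status") == "fail":
            found.append(node)
        for value in node.values():
            found.extend(failed_items(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(failed_items(value))
    return found
```

Each returned entry can then be checked for a fix field, which is the only pointer that justifies reading a further file.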
Do this before the first operational mcp__ops__* action, not automatically
for local-only documentation or test work:
- lookup_runbook("<intended action or symptom>"): required before any operational action and enforced server-side.
- fleet_status()
- server_status(host)
- tail_logs(host, container) or list_containers(host)
- read_file(host, path)
- read_doc("ops-map")

Only load these when the task calls for them:
- read_doc("rules"): MCP-served behavioural rules.
- read_doc("guard-rules"): destructive-command patterns.
- .claude/skills/server/services.md: service map (URL, compose path, MCP client, UI fallback). Resolve this path relative to this skill directory. Use it when answering "how do I talk to service X" or when the knowledge-gap rule below triggers.

No local memory file reads by default. MCP-served docs are synced from onyx and are canonical for operational state, but they are still loaded on demand.
Source of truth: 1Password vault Edumusik. Always add/update there first.
Working copy (canonical, lives in this project): ~/.claude/projects/-Users-stephan-Documents-ops-agent/memory/reference_all_api_keys.md — every Cloudflare, Hetzner, Mailgun, Stripe, Gamma, DB, SSH, and third-party credential. Rebuilt from 1Password 2026-04-18.
Do NOT store credentials in the edumusik-net (course) workspace, or in /opt/ops-mcp/.env — that file only holds the ops-MCP's own runtime creds (Hetzner, Cloudflare, Anthropic) and the canonical list here is the source clients should read.
These live outside the ops-agent memory dir and are not auto-loaded. Read them when the task calls for them — do not copy them here (source-of-truth drift).
Behavioural / rules:
- ~/.claude/projects/-Users-stephan/memory/feedback_consolidated.md: global behavioural rules (pipeline, safety, WP ops, workflow). Ops-agent memory has its own feedback files that layer on top of these.

Infrastructure reference:
- ~/.claude/projects/-Users-stephan/memory/reference_operations_map.md: containers, ports, compose paths, cron, restart cmds. Stale by design; always verify live via MCP. read_doc("ops-map") is the MCP-served equivalent and is preferred when available.
- ~/.claude/projects/-Users-stephan/memory/reference_cloudflare_zones.md: DNS zone IDs, WAF rules, SSL state. Load before any cloudflare_dns(...) work.
- ~/.claude/projects/-Users-stephan/memory/reference_alert_inventory.md: alert functions, HC pings, credential sources. Load before touching alerting.

Narrative server docs, Server vault (~/Documents/Vaults/Server/, MASTER):
- Server/Open-Tasks.md: open server/infra task backlog (update at end of session)
- Server/Decisions/Tiered-Access-Architecture.md: claude-ops restricted SSH + server-ops-gate modes
- Server/WP-Panel.md: wp-panel service paths, deployment, KSM quirks
- Server/Security-Tab-Coverage.md: InSpec + security-audit.sh + Wazuh SCA matrix
- Server/AI-Remediation-Agent.md: agent design notes
- Reference/Audit-Framework.md, Reference/Backup-Architecture.md, Reference/Wazuh-Self-Monitoring.md
- Reference/Sites/{KSM,Edumusik-Net,Edumusik-Blog-Production,Other-Sites}-Plugins.md

Product / strategy context:
- ~/.claude/projects/-Users-stephan/memory/project_edumusik.md: LearnDash toolkit, strategic docs.

Out of scope for ops-agent (ignore unless the task explicitly requires them): course catalogue/backlog, Bricks docs, Figma MCP/API rules, YubiKey setup, weekly cron heartbeats.
| Need | Tool |
|---|---|
| Fleet overview in one call | fleet_status() |
| One host snapshot | server_status(host) |
| All containers on a host | list_containers(host) |
| Log digest (level counts + samples) | tail_logs(host, container) |
| Read a safe-path file on a host | read_file(host, path) |
| Topology + restart allowlist | describe_server(host) |
| Live Hetzner firewall | hetzner_firewall(server) |
| Live Cloudflare DNS | cloudflare_dns(zone) |
| Runbook lookup | lookup_runbook(problem_description) |
| AI cost breakdown | ai_cost_summary() |
~/.ops-mcp/state.db)

| Tool | Structural safety |
|---|---|
| safe_restart(host, container) | Per-host allowlist in /opt/ops-mcp/hosts.yaml |
| compose_up(host, stack) | Rejects traefik / shared-infra / monitoring |
| systemctl_restart(host, unit) | Excludes docker / nftables / ssh |
| wp_cli(host, container, cmd, write=True) | Blocks shuffle-salts, plugin install/delete, user delete, db drop |
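As an illustration of what a structural block for wp_cli could look like, the sketch below rejects the verbs named in the table. It is not the server's code: the function name is hypothetical, and mapping "shuffle-salts" to the WP-CLI command `wp config shuffle-salts` is an assumption about what the table means.

```python
import shlex

# (command, subcommand) pairs derived from the wp_cli row in the table above.
BLOCKED_VERBS = {
    ("config", "shuffle-salts"),
    ("plugin", "install"),
    ("plugin", "delete"),
    ("user", "delete"),
    ("db", "drop"),
}


def is_blocked_wp_command(cmd):
    """Return True when the first two positional words match a blocked pair."""
    # Drop flags such as --path=/var/www so they cannot hide the verb.
    words = [word for word in shlex.split(cmd) if not word.startswith("-")]
    if words and words[0] == "wp":
        words = words[1:]  # tolerate a leading "wp"
    return tuple(words[:2]) in BLOCKED_VERBS
```

The filter only understands flags written as --name=value. A flag with a separate value (--path /var/www) would shift the positional words, so a real guard would need proper argument parsing.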
This skill is a reference. It does NOT auto-call MCP tools or pre-load docs.
For local-only work, no fleet calls are needed. For operational work, do the
minimal bootstrap above before the first mcp__ops__* call.
Routine policy/allowlist/inventory changes are YAML + git_sync, no restart:
| Concern | File |
|---|---|
| Service inventory | server/services.yaml |
| Hardening tasks (run_hardening) | server/tasks.yaml |
| Policy categories | server/policy.yaml |
| write_file allowlist | server/write-allowlist.yaml |
| wp_cli verbs | server/wp-verbs.yaml |
| Deploy file set | server/sync-manifest.txt |
| server-config deploy paths | server-config/deploy-map.yaml |
service_version_info() reports python_drift (restart needed) vs
config_drift (next call picks it up) separately.
- Use rg and heading indexes before loading large local docs; load only the section needed for the current task.
- If any guidance says to ssh edumusik-admin "..." or run deploy.py, that guidance is wrong here. Route it through mcp__ops__* tools instead.
- If read_file/write_file rejects a legitimate path, extend _READ_FILE_PREFIXES / _WRITE_FILE_PREFIXES in server/files.py; do not work around it via SSH bypass, Playwright, or copying into an allowlisted prefix. Pair any new prefix addition with explicit blocks for sensitive subpaths (e.g. adding /etc/ requires blocking /etc/shadow, /etc/letsencrypt/keys|archive/, /etc/ssl/private/, /etc/wazuh/telegram.env). The fix requires a Claude Code restart to take effect, so plan accordingly.
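The pairing of an allowlisted prefix with explicit blocks for sensitive subpaths can be sketched like this. It is not the contents of server/files.py: the prefix tuple is illustrative, _BLOCKED_SUBPATHS and is_readable are hypothetical names, and "keys|archive/" is read as two directories.

```python
import posixpath

# Illustrative values only; the real prefixes live in server/files.py.
_READ_FILE_PREFIXES = ("/srv/", "/etc/")

# Sensitive subpaths that stay blocked even though /etc/ is allowlisted.
_BLOCKED_SUBPATHS = (
    "/etc/shadow",
    "/etc/letsencrypt/keys",
    "/etc/letsencrypt/archive",
    "/etc/ssl/private",
    "/etc/wazuh/telegram.env",
)


def _is_under(path, root):
    """True when path equals root or sits below it as a directory."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def is_readable(path):
    """Check blocks first, then require an allowlisted prefix."""
    if not path.startswith("/"):
        return False  # relative paths are rejected outright
    # Collapse "..", "." and repeated slashes before any comparison.
    normalized = "/" + posixpath.normpath(path).lstrip("/")
    if any(_is_under(normalized, blocked) for blocked in _BLOCKED_SUBPATHS):
        return False
    return any(_is_under(normalized, prefix) for prefix in _READ_FILE_PREFIXES)
```

The normalization is lexical only, so it defeats /etc/../etc/shadow but not a symlink that points into a blocked directory. A production check would also need to resolve the path on the target host.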