تشغيل أي مهارة في Manus بنقرة واحدة

hops

النجوم١٣

التفرعات٠

آخر تحديث١٧ مايو ٢٠٢٦ في ١٦:٣٨

Use when adding, modifying, debugging, refactoring, or reviewing `hops` CLI commands and domain modules in `scripts/hops/` and `scripts/hops.sh`; creating new subcommands, click groups, or output formatters; changing core helpers (`core/runner.py`, `core/format.py`, `core/nodes.py`, `core/workload.py`, `core/time.py`, `core/resolve.py`, `core/helm.py`); extending cluster introspection coverage (node, storage, app, flux, query, debug, dns, backup, validate). Triggers on phrases like "add a hops command", "fix hops output", "new hops domain", "extend hops", the `hops` escape hatch in AGENTS.md, or any edit to files under `scripts/hops/`. Do NOT use for simply running existing `hops` commands during diagnosis (no skill needed) or for non-cluster/app-specific scripts (e.g., `hass.sh`).

التثبيت

التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.

تشغيل في Manus

المصدر

rcdailey

rcdailey/home-ops

فتح مستودع GitHub عرض مستودعات المنشئ

تنزيل

تشغيل في Manus

المهن ذات الصلةSOC

استنادا إلى تصنيف SOC المهني

مطوّرو البرمجياتمهن الحاسوب والرياضيات·SOC 15-1252

SKILL.md

readonly

المزيد من هذا المستودع

نفس المستودع

paperless-classify

rcdailey/home-ops

Use when classifying, triaging, or organizing Paperless documents; reviewing the paperless inbox; assigning document metadata; discussing taxonomy conventions; or running `./scripts/paperless.sh classify`. Triggers on phrases like "classify documents", "triage paperless", "check the inbox", "what needs classification", "paperless taxonomy". Do NOT use for uploading documents, managing users/groups/workflows, or general paperless CRUD.

2026-05-1813

home-assistant

rcdailey/home-ops

Use when querying or mutating Home Assistant via `./scripts/hass.sh` (entity states, attributes, templates, history, logbook, areas, energy dashboard, Lovelace dashboards, automations, scripts, repairs, registry entries); authoring, editing, or debugging HA automation/script YAML or Jinja templates; inspecting entities, devices, integrations, or areas on the HA instance at `home.${SECRET_DOMAIN}`; firing events or calling services. Triggers on phrases like "check HA", "Home Assistant entity", "trigger this automation", "what's the state of sensor.X", "run the script", "HA template", or any edit under `scripts/hass/`. Do NOT use for unrelated smart-home/IoT platforms.

2026-05-1713

dns-debug

rcdailey/home-ops

Use when diagnosing DNS resolution failures, investigating blocked domains, debugging website or app connectivity problems reported by users, querying Blocky DNS query logs, or editing Blocky allowlist/denylist configuration in `kubernetes/apps/dns-private/blocky/`. Covers `./scripts/hops.sh dns` subcommands (`search`, `logs`, `blocked`, `test`) and the `log_entries` table in the Blocky CNPG cluster. Triggers on phrases like "site X is broken", "DNS isn't working", "why is this domain blocked", "check Blocky logs", "allowlist this domain", or any edit to `blocky/data/config.yaml`. Do NOT use for authoritative/external-dns troubleshooting (different providers, different tooling).

2026-05-1713

outline-cli

rcdailey/home-ops

Use when searching, reading, creating, updating, moving, archiving, or deleting Outline wiki documents and collections via the `ol` CLI (`@doist/outline-cli`, installed via mise); managing the household knowledge base at `wiki.${SECRET_DOMAIN}`; scripting bulk document operations with `--json`/`--ndjson`; authenticating or updating the CLI. Triggers on phrases like "search the wiki", "create an Outline doc", "update the Outline page", "list collections", "move this doc", or any invocation of the `ol` command. Do NOT use for content that belongs in the repo `docs/` directory (ADRs, investigations, runbooks, AGENTS.md directives) per the Outline vs docs/ boundary.

2026-05-1713

name

hops

description

hops CLI Development

hops is an LLM-optimized cluster operations CLI at scripts/hops/. It wraps kubectl, talosctl, flux, ceph, and other cluster tools into domain-oriented commands that produce compact, pre-filtered output designed for LLM context windows.

Inclusion Litmus Test

A command belongs in hops only if it relates to cluster infrastructure: kubectl, talosctl, helm, flux, ceph, Prometheus/VictoriaMetrics, VictoriaLogs, Blocky DNS, or similar infrastructure tooling.

Commands that fail the test (stay standalone):

App-specific utilities that happen to use kubectl exec (Home Assistant API, Recyclarr config)
Pure utility scripts (icon search, YAML annotation, git hooks)
Dev tooling (Vector testing, pre-commit hooks)

Ceph passes (storage infrastructure). Blocky DNS passes (cluster DNS infrastructure). hass fails (application-level automation).

Architecture

Run ./scripts/hops.sh <domain> --help for command details. Do not maintain a parallel command list in documentation; the CLI is authoritative.

scripts/hops.sh          Shell wrapper (invokes uv run)
scripts/hops/
  hops/                  Package root (nested layout)
    cli.py               Root group with auto-discovery
  core/                  Shared helpers (domains import from here, never from each other)
    runner.py            Subprocess runner (JSON, JSONL, kubectl, tools_curl, ceph_json)
    format.py            Tables, key-value, truncation, age_str, human_bytes
    time.py              TimeRange dataclass, time_options decorator
    nodes.py             Node name/IP resolution (cached per session)
    workload.py          Workload resolution (cascading match strategies)
    resolve.py           Unified resolver registry (Workload, Gateway, Pod)
    helm.py              Helm chart resolution and YAML value helpers
  app/                   Package domain (auto-discovered via __init__.py cli attribute)
  flux/                  Package domain
  dns/                   Package domain
  query/                 Package domain (nested subgroups for logs)
  node.py, storage.py, debug.py, db.py, backup.py, validate.py  Flat domains

Domains are either flat files (node.py) or package directories (app/). Both expose a cli click Group. Package domains split commands across submodules that register on the shared group.

Auto-Discovery

cli.py scans the package for modules and packages exposing a cli attribute (a click Group or Command). Modules starting with _ are skipped. Adding a new domain means creating a .py file or a package directory with a cli click group in its __init__.py; no registration needed. For package-based domains (app/, flux/, dns/), the __init__.py defines the click group and imports submodules that register commands on it.

Module Structure

File Size

All files MUST stay under 400 lines. When a module approaches the limit, split into a package directory or extract logic into a sibling module.

Splitting signals: a file has grown past 300 lines, a single function exceeds 80 lines, or two functions in the same file share no imports or data flow. Flat modules (single .py file) that cross 400 lines MUST be converted to a package directory.

Core Layer (`core/`)

Shared logic lives in core/. Domain modules import from core.*, never from each other. Before writing a new utility function, check whether one already exists:

Concern	Module	Examples
Formatting, display	`core.format`	`table`, `kv`, `age_str`, `format_timestamp`, `human_bytes`, `truncate`
Subprocess, HTTP	`core.runner`	`run`, `run_json`, `kubectl_json`, `tools_curl`, `ceph_json`
Time ranges	`core.time`	`TimeRange`, `time_options`
Workload resolution	`core.workload`	`resolve_app`, `resolve_pods`, `find_running_pod`, `pick_pod_for_logs`
Unified resolution	`core.resolve`	`resolve`, `ResolvedTarget`, `TargetKind`, resolver registry
Node resolution	`core.nodes`	`get_all`, `resolve_ip`, `resolve_name`
Gateway introspection	`app.gateway`	`find_httproute`, `fetch_gateway`, `fetch_envoy_proxy`
Helm chart resolution	`core.helm`	`resolve_hr`, `helm_chart_args`
VM API	`query._vm`	`query_vm`, `query_vmalert`, `is_ignored_alert`
VL API	`query._client`	`VictoriaLogsClient`

MUST NOT duplicate logic that already exists in a helper. Common violations to watch for:

Timestamp-to-age conversion: use core.format.age_str, not a local _age_str
Bytes-to-human formatting: use core.format.human_bytes, not a local format_memory
In-cluster HTTP: use core.runner.tools_curl, not hand-rolled kubectl exec ... curl commands
Workload-to-pod resolution: use core.workload.resolve_pods, not local pod-fetching logic
Target resolution: use core.resolve.resolve() for the unified resolver, not ad-hoc fallback chains

Unified Resolver (`core/resolve.py`)

The resolver eliminates ad-hoc fallback chains for target resolution. It tries three resolvers in priority order (Workload, Gateway, Pod) and returns a ResolvedTarget dataclass with the matched kind, name, namespace, optional workload, and pods.

To add a new resource category (e.g., a new operator-managed resource), create a class implementing the Resolver protocol with a try_resolve method and append it to _REGISTRY.

The --explain flag on app diagnose prints the resolver trace, showing which resolvers were tried and what matched.

In-Cluster HTTP

All HTTP requests to in-cluster services (VictoriaMetrics, VictoriaLogs, future services) MUST use core.runner.tools_curl. This function executes curl via the rook-ceph-tools pod with standardized connection timeout, error classification (unreachable vs. failed), and one-line error output. Callers parse the returned string into JSON or use it raw.

Do not construct kubectl exec curl commands directly in domain modules.

Domain Module Boundaries

One click group per domain package or module. Package domains (app/, flux/, dns/) split commands across submodules that all register on the shared cli group from __init__.py.
Click decorators and argument wiring stay in command modules. Implementation bodies exceeding ~30 lines SHOULD move to a sibling module (see app.pod_detail for pod diagnostics).
When multiple click commands share the same resolve-then-exec pattern, extract a shared helper (see app.commands._exec_in_pod for ls/cat/du).
Commands that share 80%+ of their body MUST extract a shared implementation parameterized by the difference (see dns.render.query_dns_logs for logs/blocked).

Data Fetching Discipline

MUST NOT fetch the same Kubernetes resource twice in one command. Store the result and reuse it.
When a command correlates multiple resources (e.g., HTTPRoute + Gateway + policies), fetch each once and pass the data dict to helper functions rather than re-fetching inside helpers.
Tab-delimited or structured CLI output (psql, talosctl) SHOULD use a generic parser parameterized by field names, not per-query parser functions.

SQL Safety

The dns/psql.py module constructs SQL from user input for psql queries. All user-provided values (client names, domain patterns, timestamps) MUST be escaped via sql_escape before interpolation into SQL strings.

Alias and Redirect Commands

Do not create commands that just delegate to another command without adding value. If hops storage disks would do exactly what hops node disks does, the command should not exist. Every command MUST add correlation, heuristics, or context beyond what the target already provides.

Design Principles

Workflow, Not Passthrough

hops commands MUST embody an investigative workflow; they MUST NOT be thin reformatters around a single upstream command. The entire reason hops exists is that raw CLIs force multi-step drill-downs that waste LLM context. A command that just prettifies one kubectl get call has no reason to live here; use kubectl directly.

A command earns its place when it does at least one of:

Correlates multiple data sources in a single call (e.g., pod state + containers + events + previous logs for crash diagnosis)
Applies heuristics so the caller does not have to drill down (e.g., auto-fetch --previous logs when restartCount > 0; pick most recent Succeeded pod when no Running pod exists)
Resolves inputs flexibly across failure modes (e.g., workload name, app label, pod name prefix, orphan pods whose parent workload was deleted)
Hides transient/edge-case noise that the caller does not need to handle (e.g., terminated pods, missing parent controllers, empty event lists) unless the situation is genuinely blocking

Design check when adding or editing a command: what sequence of kubectl/talosctl/flux/helm invocations would an investigator run to answer this question end-to-end? Fold that sequence into the single hops command, in the order the investigator needs it. If the answer is "one kubectl invocation," either the command adds no value or the workflow has not been fully identified yet.

Anti-patterns to reject in code review:

A command whose body is effectively kubectl get X -o json | reformat
Returning early on edge cases the caller could handle transparently (terminated pods, deleted workloads, missing optional fields)
Requiring a follow-up command to fetch context that every caller of the primary command needs
Resolver that exits with not found when a less strict match (pod name, label selector, fuzzy suffix) would have succeeded

Reference implementation: hops app pod (implemented in app/pod_detail.py). One call resolves the target (workload or orphan pod), emits pod summary, container state machine, previous-termination table, auto-fetched --previous logs for each restarted container, and pod-scoped events. That is the bar.

Stewardship / Audit Obligation

Loading this skill carries an obligation beyond the immediate task: passthrough drift in scripts/hops/ MUST be audited and fixed even when unrelated to the reason the skill was loaded. The no-passthrough rule is a stewardship discipline, not a code-review checkbox; without active maintenance it decays into an aspirational comment while the codebase fills with thin wrappers.

When this skill is loaded for any reason, MUST run this audit pass in the current session:

MUST scan the domain module being edited plus one adjacent module in scripts/hops/ for the signals in the AGENTS.md hops stewardship obligation (thin wrappers, missing correlation, strict resolvers rejecting reasonable edge cases, repeated call patterns that should be one command).
MUST fix bounded drift inline (single function, single module, clear intent).
Drift requiring broader refactor MUST be surfaced explicitly in the session response so the user can prioritize. MUST NOT silently widen scope. MUST NOT defer with a "TODO for later" that will never be noticed.
MUST cap audit-driven refactors at two per session. The goal is steady erosion of drift, not a single-session rewrite that destabilizes the tool.

A passthrough caught during an unrelated session is more valuable than a perfect refactor delayed until someone happens to notice. Err toward acting.

The following rationalizations for skipping the audit MUST be rejected:

"The user did not ask for it." The AGENTS.md directive does; that is the mandate.
"It is out of scope for this change." Stewardship is cross-cutting by design.
"It is only a small passthrough; it is fine." Small passthroughs are how the rule dies.
"I will open a follow-up." Follow-ups without user action die in the backlog.

Read-Only by Design

hops never mutates cluster state. No kubectl apply, no helm upgrade. Two controlled exceptions exist:

Ephemeral debug pods (hops debug): creates a pod, captures output, deletes in try/finally
Flux suspend/resume (hops flux suspend/resume): reversible state toggle for maintenance (storage migrations, immutable field changes). Finds Kustomization + HelmRelease namespaces automatically and handles both in one call.

Output Standards

All output is plain text optimized for LLM token efficiency:

Fixed-width tables with space-aligned columns, no borders, no decorators
Short header abbreviations (cp not control-plane)
Key-value format for single-resource summaries
One line per entity in tables
Omit healthy/normal items when showing problems
No ANSI color codes, no unicode symbols, no emoji
Truncate long messages (120 chars default)

Click Conventions

All Groups default to no_args_is_help=True (shows help without subcommand)
Use @click.group() for domain modules, @cli.command() for leaf commands
Common patterns: -n/--namespace, --json for raw output, --limit
Time options via the time_options() decorator factory in core.time

Dependencies

Only external dependency: click (in pyproject.toml; auto-installed by uv)
Shell out to cluster tools; parse their -o json output in Python
core.runner handles subprocess execution, JSON/JSONL parsing, error handling, tools_curl (in-cluster HTTP via rook-ceph-tools pod)
core.nodes caches node name/IP mapping per process
core.workload provides cascading workload resolution
core.resolve provides the unified resolver registry

Error Handling

One-line error messages to stderr, then sys.exit(1)
No stack traces in normal operation
Failed subprocess: show first line of stderr
Missing tool: "error: <tool> not found in PATH"

Adding a New Command

Identify the investigative workflow: enumerate the sequence of raw CLI calls a human or LLM would run to fully answer the question. If the list has one item, reconsider whether the command belongs in hops.
Decide which domain module the command belongs to (or create a new one).
Check the core layer (see Module Structure) for existing utilities before writing new ones. Common needs: core.workload.resolve_app for app resolution, core.runner.tools_curl for in-cluster HTTP, core.format.age_str for timestamp display, core.time.TimeRange for time range options, core.resolve.resolve for unified target resolution.
Add a click command function with appropriate arguments and options.
Use core.runner.run_json() or core.runner.kubectl_json() for data fetching.
Fold the full workflow (correlation, heuristics, flexible resolution, auto-fetch of downstream context) into the single command; see "Workflow, Not Passthrough" above.
If the implementation exceeds ~30 lines, extract it to a sibling module. If a flat domain module would exceed 400 lines, convert to a package directory.
Use core.format.table() or core.format.kv() for output.
Test against live cluster: ./scripts/hops.sh <domain> <command> <args>
Measure token count: ./scripts/hops.sh <command> 2>&1 | ttok

Testing

The test suite at scripts/hops/tests/ is the primary validation mechanism. All changes to hops MUST maintain, update, or extend the test suite. Run tests via:

# All tests (unit + integration against live cluster)
uv run --project scripts/hops pytest scripts/hops/tests/ -v

# Unit tests only (no cluster access needed)
uv run --project scripts/hops pytest scripts/hops/tests/ -v -m "not integration"

Test Categories

Unit tests (test_format.py, test_time.py, test_dns_psql.py): Pure function tests for core.format, core.time, and dns.psql. No cluster access. These test formatting, parsing, and escaping logic that has caused regressions.

Resolver integration tests (test_resolver.py): Call core.resolve.resolve() directly against the live cluster. Cover every resource category: Deployment, StatefulSet, DaemonSet, CronJob, CNPG pods, subchart naming. These are the tests that would have caught the 4 resolver fix commits.

Command integration tests (test_commands.py): Invoke ./scripts/hops.sh via subprocess and assert on exit codes and output structure (header presence, section markers). Cover all domains.

Test Obligations

When modifying hops code, MUST:

Run the full test suite before considering the change complete
Fix any test failures caused by the change (update assertions, not delete tests)
Add tests for new commands (at minimum: exit 0 for valid input, non-zero for invalid)
Add tests for new resolver strategies or resource categories
Add unit tests for new pure functions in core/
Update test fixtures when cluster apps are added or removed

When a test fails due to a removed or renamed cluster app, update the fixture data in the test to reference a current app. Do not delete the test.

Test Patterns

Integration tests are marked @pytest.mark.integration
Use conftest.run_hops() for subprocess invocations
Assert on output structure (headers, section markers), not exact values
Assert on app names and namespaces (stable); skip pod suffixes and timestamps (volatile)
Test both success and failure paths (valid app, nonexistent app)

Verification

# Verify a specific command works
./scripts/hops.sh <domain> <command>

# Compare token usage vs raw equivalent
./scripts/hops.sh node list 2>&1 | ttok
kubectl get nodes -o wide 2>&1 | ttok

hops

المزيد من هذا المستودع

المزيد من هذا المستودع

hops CLI Development

Inclusion Litmus Test

Architecture

Auto-Discovery

Module Structure

File Size

Core Layer (core/)

Unified Resolver (core/resolve.py)

In-Cluster HTTP

Domain Module Boundaries

Data Fetching Discipline

SQL Safety

Alias and Redirect Commands

Design Principles

Workflow, Not Passthrough

Stewardship / Audit Obligation

Read-Only by Design

Output Standards

Click Conventions

Dependencies

Error Handling

Adding a New Command

Testing

Test Categories

Test Obligations

Test Patterns

Verification

hops CLI Development

Inclusion Litmus Test

Architecture

Auto-Discovery

Module Structure

File Size

Core Layer (core/)

Unified Resolver (core/resolve.py)

In-Cluster HTTP

Domain Module Boundaries

Data Fetching Discipline

SQL Safety

Alias and Redirect Commands

Design Principles

Workflow, Not Passthrough

Stewardship / Audit Obligation

Read-Only by Design

Output Standards

Click Conventions

Dependencies

Error Handling

Adding a New Command

Testing

Test Categories

Test Obligations

Test Patterns

Verification

Core Layer (`core/`)

Unified Resolver (`core/resolve.py`)

Core Layer (`core/`)

Unified Resolver (`core/resolve.py`)