refresh-subsystem
// Guide for safely modifying the refresh/streaming subsystem — covers the full domain lifecycle, registration points, and known fragility areas
| name | refresh-subsystem |
| description | Guide for safely modifying the refresh/streaming subsystem — covers the full domain lifecycle, registration points, and known fragility areas |
| user-invocable | false |
This subsystem is fragile. Changes historically break things. Read this before touching any refresh, streaming, snapshot, or domain code.
The refresh subsystem manages how Kubernetes resource data flows from clusters to the UI. Each connected cluster gets its own independent subsystem (manager, registry, informers, permission checker). An aggregate layer multiplexes across clusters for the HTTP API.
Kubernetes API
↓ (informers / polling)
Per-Cluster Subsystem (manager, registry, informers, permissions)
↓ (snapshots, SSE, WebSocket)
Aggregate Mux (routes requests to correct cluster)
↓ (HTTP API on loopback)
Frontend RefreshManager + RefreshOrchestrator
↓ (per-cluster runtimes, stream managers, store writes)
React UI
The frontend has one global coordinator for app lifecycle concerns, with per-cluster runtimes underneath it. Each runtime owns enabled scopes, in-flight work, stream health, metrics freshness, and streaming cleanup for exactly one cluster. Refresh domains are single-cluster by contract; background cluster refresh fans out as separate per-cluster requests instead of using multi-cluster refresh scopes.
Global coordinator
↓
ClusterRefreshRuntime(cluster-a) ClusterRefreshRuntime(cluster-b)
↓ ↓
single-cluster snapshots/streams single-cluster snapshots/streams
↓ (callbacks, store updates)
React UI
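The fan-out rule above can be sketched as a small planning step. This is a minimal illustration, not the project's runtime API: `SingleClusterRequest` and `planBackgroundRefresh` are hypothetical names, and the scope string is treated as opaque.

```typescript
// Hypothetical request shape; the real runtime API is not shown in this guide.
interface SingleClusterRequest {
  clusterId: string;
  domain: string;
  scope: string;
}

// Build one single-cluster request per cluster instead of one request whose
// scope spans several clusters. Duplicate cluster ids are collapsed.
function planBackgroundRefresh(
  clusterIds: readonly string[],
  domain: string,
  scope: string
): SingleClusterRequest[] {
  const unique = Array.from(new Set(clusterIds));
  return unique.map((clusterId) => ({ clusterId, domain, scope }));
}
```

Each planned request would then be handed to the runtime that owns that cluster, so in-flight work and stream health stay per-cluster.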
Order matters. Don't rearrange.
Key files:
- backend/app_refresh_setup.go: orchestrates steps 1-5
- backend/app_refresh_update.go: updates active per-cluster subsystems without restarting the HTTP server
- backend/app_refresh_subsystems.go: replaces aggregate subsystem state and shared handlers
- backend/app_refresh_recovery.go: teardown, auth recovery, transport rebuild
- backend/refresh/system/manager.go: per-cluster subsystem creation

Domains are registered in a fixed order in backend/refresh/system/registrations.go. Order matters: some domains depend on others (e.g., cluster-crds before cluster-custom).
Three registration kinds:
| Kind | Permission Gate | Fallback |
|---|---|---|
| direct | None; always registers | None |
| list | Checks list permission for required resources | Skips if denied |
| listWatch | Checks list + watch permissions | Can fall back to list-only |
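The table can be read as a small decision function. The sketch below models it in TypeScript for illustration only; the real gate lives in the Go backend, and `DomainPermissions`, `RegistrationOutcome`, and `resolveRegistration` are hypothetical names. It also assumes that a listWatch domain without list permission is skipped, which the table does not state.

```typescript
type RegistrationKind = "direct" | "list" | "listWatch";
type RegistrationOutcome = "registered" | "registered-list-only" | "skipped";

// Hypothetical permission result for a domain's required resources.
interface DomainPermissions {
  canList: boolean;
  canWatch: boolean;
}

function resolveRegistration(
  kind: RegistrationKind,
  perms: DomainPermissions
): RegistrationOutcome {
  switch (kind) {
    case "direct":
      // No permission gate: always registers.
      return "registered";
    case "list":
      return perms.canList ? "registered" : "skipped";
    case "listWatch":
      if (perms.canList && perms.canWatch) return "registered";
      // Assumption: the list-only fallback still requires list permission.
      return perms.canList ? "registered-list-only" : "skipped";
  }
}
```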
Two-layer permission checking:
- defaultPermissionChecks() in backend/refresh/snapshot/permission_checks.go

To register a new domain, add it to domainRegistrations() in registrations.go. Consider:

- defaultPermissionChecks() in permission_checks.go
- backend/refresh/informer/factory.go

Domain metadata is authored in
backend/refresh/domain/refresh-domain-contract.json. It owns domain category,
frontend refresher name, timing, diagnostics stream, orchestrator kind, backend
registration kind, permission policy, and resource-stream participation.
Keep behavior explicit in backend registration functions and frontend stream
managers. frontend/src/core/refresh/domainRegistry.ts imports the contract
directly and derives metadata maps from it; the contract removes duplicate
metadata, not real behavior.
Every backend domain has a frontend counterpart:
| File | What to update |
|---|---|
| frontend/src/core/refresh/types.ts | Add to RefreshDomain union + DomainPayloadMap |
| frontend/src/core/refresh/refresherTypes.ts | Add refresher name + map view to refresher |
| frontend/src/core/refresh/domainRegistrations.ts | Register the explicit orchestrator/stream wiring |
| backend/refresh/domain/refresh-domain-contract.json | Add shared metadata consumed by backend tests and frontend registry |
These must stay synchronized through the contract tests. A backend domain without a frontend mapping breaks diagnostics. A frontend refresher without a backend domain gets empty snapshots.
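One way to picture the RefreshDomain union and DomainPayloadMap pairing is the pattern below, where the compiler rejects a domain that lacks a payload type or a refresher name. The payload shapes and refresher names here are invented for illustration; only the domain names `namespaces` and `cluster-overview` appear in this guide.

```typescript
// Illustrative subset of domains; payload shapes are hypothetical.
type RefreshDomain = "namespaces" | "cluster-overview";

interface DomainPayloadMap {
  namespaces: { names: string[] };
  "cluster-overview": { nodeCount: number };
}

// Indexing DomainPayloadMap with RefreshDomain only type-checks when every
// domain in the union has a payload entry.
type PayloadOf<D extends RefreshDomain> = DomainPayloadMap[D];

// Record<RefreshDomain, ...> fails to compile when a domain has no refresher name.
// The names below are hypothetical.
const refresherNameByDomain: Record<RefreshDomain, string> = {
  namespaces: "namespaces-refresher",
  "cluster-overview": "cluster-overview-refresher",
};

function describeDomain<D extends RefreshDomain>(domain: D, payload: PayloadOf<D>): string {
  return `${refresherNameByDomain[domain]}: ${JSON.stringify(payload)}`;
}
```

Adding a third member to the union without touching the other two declarations would produce compile errors at both, which is the kind of drift the contract tests are meant to catch at runtime.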
Resource WebSocket domains also require:
| File | What to update |
|---|---|
| frontend/src/core/refresh/streaming/resourceStreamDomains.ts | Scope kind, row collection, row identity, sort, drift keys, metric preservation |
| frontend/src/core/refresh/streaming/resourceStreamRows.ts | Pure row replacement, deletion, stable reuse, and metrics-preserving merge logic |
| frontend/src/core/refresh/streaming/resourceStreamConnection.ts | WebSocket connection lifecycle, queued sends, reconnect, pause/resume |
| frontend/src/core/refresh/streaming/resourceStreamSubscriptions.ts | Single-cluster scope resolution, subscription state, unsubscribe debounce, resume tokens |
| backend/refresh/resourcestream/domains.go | Supported streamed refresh domain list |
| backend/refresh/resourcestream/stream_registration_*.go | Informer registration and lister/indexer setup |
| backend/refresh/resourcestream/update_helpers_test.go and manager tests | Stream envelope metadata and row-shape parity |
Resource stream descriptors describe row behavior only. Domain descriptors must not reintroduce multi-cluster capability flags; cross-cluster UI should derive from separate per-cluster domain state above the refresh store.
ResourceStreamManager should remain responsible for refresh-store mutation,
snapshot resync, drift detection, health, telemetry, and fallback decisions.
Keep connection lifecycle in ResourceStreamConnection, subscription mechanics
in ResourceStreamSubscriptionStore, and pure row math in
resourceStreamRows.ts. Ready/resync/error store status transitions should use
one domain-id path; do not add copied branches per streamed domain. Terminal
stream error notification should use streamErrorNotifier.ts.
Resource stream row updates and deletes carry identity only through the
top-level ref (resourcemodel.ResourceRef). Legacy top-level identity fields
(uid, name, namespace, kind, apiGroup, apiVersion) have been
removed from the wire payload; clusterId / clusterName remain as envelope
routing metadata. Do not add new key logic that guesses GVK from kind/name.
COMPLETE is scope-level resync, not targeted row invalidation — any ref on
COMPLETE is diagnostic context only.
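The identity rules above can be sketched as a pure reducer over a row map. Everything except the COMPLETE semantics is an assumption made for illustration: the fields of `ResourceRef`, the `UPDATE` and `DELETE` message names, and the `refKey` format are hypothetical and do not describe the actual wire payload.

```typescript
// Hypothetical ref fields; the guide only says identity travels in a top-level `ref`.
interface ResourceRef {
  group: string;
  version: string;
  kind: string;
  namespace: string;
  name: string;
}

// Hypothetical message shapes. Only COMPLETE is named in this guide.
type StreamMessage<Row> =
  | { type: "UPDATE"; ref: ResourceRef; row: Row }
  | { type: "DELETE"; ref: ResourceRef }
  | { type: "COMPLETE"; ref?: ResourceRef };

interface ApplyResult<Row> {
  rows: Map<string, Row>;
  needsResync: boolean;
}

// The key is derived from the full ref, never guessed from kind/name alone.
function refKey(ref: ResourceRef): string {
  return [ref.group, ref.version, ref.kind, ref.namespace, ref.name].join("/");
}

function applyMessage<Row>(rows: Map<string, Row>, msg: StreamMessage<Row>): ApplyResult<Row> {
  switch (msg.type) {
    case "UPDATE": {
      const next = new Map(rows); // copy so the input map is not mutated
      next.set(refKey(msg.ref), msg.row);
      return { rows: next, needsResync: false };
    }
    case "DELETE": {
      const next = new Map(rows);
      next.delete(refKey(msg.ref));
      return { rows: next, needsResync: false };
    }
    case "COMPLETE":
      // Scope-level resync: any ref is diagnostic only and is not used to target a row.
      return { rows, needsResync: true };
  }
}
```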
Stream selectors are typed (resourcestream.StreamSelector). Validate and
canonicalize transport scope strings at the WebSocket boundary via
ParseStreamSelector; the canonical selector string remains the subscription
key. Convert selectors to concrete ResourceRef values only when resolving a
specific affected row.
Snapshot vs stream row parity is enforced by
backend/refresh/snapshot/parity_test.go. When you add a streamed domain you
must add a parity case (or, for COMPLETE-only contracts like
namespace-helm, an explicit excluded entry in
TestSnapshotStreamRowParityCoversAllSupportedDomains). When you add a field
to a *Summary struct, add an assertion in either an existing
TestBuild*SummaryPopulatesAllFields test or the parity case so a missed
population fails CI rather than silently dropping the field on stream rows.
Per-domain stream metadata (scope kind, primary/related resources, metrics
dependency) is authored once in the resourceStream.domains block of
backend/refresh/domain/refresh-domain-contract.json. Backend
(TestResourceStreamDomainsMatchProjectionDescriptors) and frontend
(resource stream domain descriptors > matches the backend-authored projection contract) tests both lock that JSON to their respective descriptor tables.
Metric-bearing projectors accept the latest usage maps as parameters; they do
not reach into metrics.Provider themselves. Use
Manager.podMetricsSnapshot() / Manager.nodeMetricsSnapshot() at the call
site and pass the maps in, so per-row construction stays deterministic for
tests and parity comparisons.
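The parameter-passing rule is easiest to see in a tiny projector. The real projectors are Go functions in the backend; the TypeScript sketch below only illustrates the pattern, and `PodUsage`, `PodRow`, and `projectPodRow` are hypothetical names.

```typescript
// Hypothetical shapes, used only to illustrate the pattern.
interface PodUsage {
  cpuMillicores: number;
  memoryBytes: number;
}

interface PodRow {
  key: string;
  cpuMillicores: number | null;
  memoryBytes: number | null;
}

// The usage map is an argument, so the same inputs always produce the same row.
// The projector never reaches into a metrics provider on its own.
function projectPodRow(
  namespace: string,
  name: string,
  usageByPod: ReadonlyMap<string, PodUsage>
): PodRow {
  const key = `${namespace}/${name}`;
  const usage = usageByPod.get(key);
  return {
    key,
    cpuMillicores: usage ? usage.cpuMillicores : null,
    memoryBytes: usage ? usage.memoryBytes : null,
  };
}
```

Because the map is captured once at the call site, a snapshot build and a stream update that receive the same map produce identical rows, which is what the parity comparison relies on.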
File: backend/refresh/snapshot/service.go
- :bypass to key to isolate from normal requests

Four stream types use the refresh HTTP server, with different transports:
| Stream | Transport | Backend | Frontend |
|---|---|---|---|
| Events | SSE (EventSource) | backend/refresh/eventstream/ | frontend/src/core/refresh/streaming/eventStreamManager.ts |
| Resources | WebSocket | backend/refresh/resourcestream/ | frontend/src/core/refresh/streaming/resourceStreamManager.ts |
| Catalog | SSE (EventSource) | backend/refresh/snapshot/catalog_stream.go | frontend/src/core/refresh/streaming/catalogStreamManager.ts |
| Container logs | SSE (EventSource) | backend/refresh/containerlogsstream/ | frontend/src/core/refresh/streaming/containerLogsStreamManager.ts |
Frontend SSE managers share frontend/src/core/refresh/streaming/sseStreamTransport.ts
for EventSource URL creation and listener cleanup. Reconnect delay calculation
lives in frontend/src/core/refresh/streaming/streamTiming.ts, and visibility
suspend/resume lives in
frontend/src/core/refresh/streaming/streamVisibilityController.ts. Stream
error notification and kubeconfig-change suppression live in
frontend/src/core/refresh/streaming/streamErrorNotifier.ts. The resource
WebSocket manager also uses the shared timing, visibility, and terminal-error
notification helpers. Keep event, catalog, log, and resource reducers separate
unless tests prove their state semantics are identical.
Event stream resume: Backend buffers recent events in a circular buffer per scope. On reconnect, frontend sends ?since=<sequence> to resume. If the buffer overflowed, resume returns empty and the client must re-snapshot. Resume is not guaranteed.
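The resume behavior can be modeled with a bounded buffer and a sequence check. This is a sketch under stated assumptions, not the backend implementation: `EventRingBuffer` and its methods are hypothetical, sequences are assumed to start at 1 and increase by 1, and the overflow case is reported with an explicit `resumed: false` flag where the text above describes an empty response.

```typescript
interface BufferedEvent {
  sequence: number;
  payload: string;
}

type ResumeResult =
  | { resumed: true; events: BufferedEvent[] }
  | { resumed: false }; // gap detected: the caller must re-snapshot

class EventRingBuffer {
  private readonly capacity: number;
  private readonly events: BufferedEvent[] = [];
  private nextSequence = 1;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(payload: string): number {
    const sequence = this.nextSequence++;
    this.events.push({ sequence, payload });
    // Drop the oldest event once the buffer is full.
    if (this.events.length > this.capacity) this.events.shift();
    return sequence;
  }

  resumeSince(since: number): ResumeResult {
    const first = this.events[0];
    const oldest = first ? first.sequence : this.nextSequence;
    // If the event right after `since` has already been evicted, the client
    // missed data and cannot resume from the buffer.
    if (since + 1 < oldest) return { resumed: false };
    return { resumed: true, events: this.events.filter((e) => e.sequence > since) };
  }
}
```

A client that sees `resumed: false` would discard its local state for that scope and fetch a fresh snapshot before reattaching to the stream.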
Resource stream resume: Resource WebSocket subscriptions are keyed by a single cluster, domain, and normalized scope. The frontend sends resume tokens per subscription; expired buffers trigger RESET and a snapshot resync. Multi-cluster resource stream scopes are rejected on both the frontend subscription path and backend stream mux path, matching the broader single-cluster refresh-domain contract.
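A subscription key that enforces the single-cluster rule might look like the sketch below. The key format, the use of `trim()` as scope normalization, and the `subscriptionKey` name are all assumptions for illustration; the actual normalization and key layout are defined by the subscription code and the backend selector parser.

```typescript
// Hypothetical key builder: exactly one cluster, one domain, one normalized scope.
function subscriptionKey(
  clusterIds: readonly string[],
  domain: string,
  scope: string
): string {
  const unique = Array.from(new Set(clusterIds));
  const only = unique[0];
  if (unique.length !== 1 || only === undefined) {
    // Multi-cluster (or empty) scopes are rejected instead of being merged.
    throw new Error("resource stream subscriptions must target exactly one cluster");
  }
  // Assumption: normalization is reduced to trimming whitespace in this sketch.
  return `${only}|${domain}|${scope.trim()}`;
}
```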
Stream endpoints:
- /api/v2/stream/events
- /api/v2/stream/resources
- /api/v2/stream/catalog
- /api/v2/stream/container-logs

File: frontend/src/core/refresh/RefreshManager.ts
Lifecycle per refresher: idle → refreshing → cooldown → idle
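The cooldown step of this lifecycle uses the backoff formula given in this section, cooldown * 2^(errorCount-1) capped at 60s. A minimal sketch of that arithmetic follows; `cooldownWithBackoff` is a hypothetical name, and returning the base cooldown unchanged when there are no errors is an assumption.

```typescript
const MAX_COOLDOWN_MS = 60000; // the 60s cap

function cooldownWithBackoff(baseCooldownMs: number, errorCount: number): number {
  // Assumption: with no consecutive errors the base cooldown applies unchanged.
  if (errorCount <= 0) return baseCooldownMs;
  const delay = baseCooldownMs * Math.pow(2, errorCount - 1);
  return Math.min(delay, MAX_COOLDOWN_MS);
}
```

With a 1s base cooldown, the first error keeps the delay at 1s, the third raises it to 4s, and from the seventh error on the 60s cap applies.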
Key behaviors:
- Promise.allSettled: one failure doesn't kill others, but the refresh is marked failed
- cooldown * 2^(errorCount-1), capped at 60s

Backend resource stream registration is split by behavior:
| File | Purpose |
|---|---|
| stream_registration_helpers.go | Permission checks and Add/Update/Delete event mapping |
| stream_registration_direct.go | Direct object-to-stream handlers without manager listers/indexers |
| stream_registration_network.go | Network and Gateway API handlers, including service/route/policy listers |
| stream_registration_related.go | Pod/node/workload registrations that seed related-object lookup state |
| domains.go | Supported resource stream domain list used for parity guardrails |
Keep permission checks before lazy informer creation. Do not replace these files with a large descriptor table if the behavior-specific split is clearer.
Ordinary object updates may use shared newObjectUpdate/newObjectRowUpdate
helpers, but keep pods, endpoint slices, workloads, custom resources,
node-derived updates, and Helm resync signals explicit.
Do not assign Update.Row in stream handlers; add or reuse projection helpers
so snapshot and stream rows are built by the same canonical constructor path.
Resource-stream permission resources live in
backend/refresh/resourcestream/permission_contract.go and are checked against
snapshot runtime permissions by
TestDomainPermissionContractsJoinExpectedRequirementSources.
Permission gate ordering — Preflight must run before domain registration. Domain registration order is fixed. Moving things around causes cascading failures where later domains can't find data from earlier ones.
Metrics polling — Can be disabled for two different reasons (permissions vs discovery) with different UI messages. Getting the disabled reason wrong makes diagnostics confusing.
Multi-cluster add/remove — Aggregate handlers must be updated via the update path, not just init. They route requests to per-cluster subsystems; they must not merge multiple clusters into one refresh-domain result.
Refresh scope ownership — Refresh domains must target exactly one cluster. Do not pass multi-cluster scopes to snapshot, manual refresh, or resource stream domains; fan out to per-cluster runtimes instead.
Stream reconnection — Event/resource buffer overflow means resume fails and the frontend must fall back to full re-snapshot. If this detection is wrong, the UI shows stale data with no indication.
Rapid context changes — Switching namespaces/clusters quickly can leave refreshers in undefined state. The abort→retrigger path has race conditions if context updates arrive faster than abort completes.
Informer shutdown — Shutdown() clears references but doesn't stop informers (context cancellation does that). If the context isn't cancelled before shutdown, informers leak.
namespaces and cluster-overview remain ordinary per-cluster domains, not aggregate-domain exceptions