| Field | Value |
|---|---|
| name | project-create-topology |
| description | Developer workflow for creating or updating Netdata topology producers and topology Function payloads using the production netdata.topology.v1 schema. Use when adding or migrating topology:network-connections, topology:streaming, topology:snmp, vSphere topology, correlation rules, graph presentation, drilldowns, direction semantics, telemetry overlays, or Cloud topology aggregation fixtures. |
| type | project |
# Create Netdata Topologies

## What This Skill Is
This is a developer skill for assistants working in this repository. It is not
an end-user/operator skill. Use it when changing topology producers, schema
fixtures, validation, topology developer documentation, or Cloud/frontend
handoff artifacts.
## Required References

Read these before designing or changing topology payloads:

| File | Purpose |
|---|---|
| `src/plugins.d/FUNCTION_TOPOLOGY_SCHEMA.json` | JSON Schema for production topology payloads |
| `src/plugins.d/FUNCTION_TOPOLOGY_DEVELOPER_GUIDE.md` | Human-readable topology schema contract and producer guidance |
| `src/plugins.d/FUNCTION_TOPOLOGY_IMPLEMENTATION_SCOPE.md` | Backend/frontend/aggregator migration scope |
| `.agents/sow/specs/topology-function-schema.md` | Durable project spec for topology semantics |
| `.agents/sow/specs/topology-modes-correlation-aggregation.md` | Mode, correlation, aggregation, and actor modal identification contract |
| `.agents/skills/project-writing-collectors/SKILL.md` | Collector quality, Function, validation, and cardinality rules |

For transport-level Function behavior, also read:

- `src/plugins.d/FUNCTION_UI_REFERENCE.md`
- `src/plugins.d/FUNCTION_UI_DEVELOPER_GUIDE.md`
## Developer How-Tos

The how-to catalog lives under `how-tos/`. These recipes are developer-facing
and must stay in this project skill, not under `docs/netdata-ai/skills/`.
## Core Rules

- Production payloads carry canonical topology facts for the aggregator and UI.
- Test-only projection code may reconstruct compatibility payload shapes to
  prove parity.
- Never add compatibility reconstruction fields, old-schema adapter names, or
  duplicated display strings to production payloads.
- Keep display composition in type-level and graph-level presentation metadata,
  not in high-cardinality rows.
- Keep raw sensitive payload captures under `.local/` only.
## Workflow

1. Define the topology purpose and scale target.
   - Identify the graph users need: nodes, processes, containers, L2 devices,
     vSphere inventory, streaming parents, or another domain.
   - Estimate actor count, graph-link count, evidence-row count, and payload
     size on realistic data.
2. Pick actors.
   - Use stable identities.
   - Keep display names separate from identity.
   - Declare `identity`, `merge_identity`, and `parent_identity` in actor
     types.
   - Prepare aggregation scopes such as node, process name, PID, container,
     Kubernetes workload, SNMP device/interface, or vSphere object.
3. Pick graph links.
   - Graph links are renderable relationship groups.
   - Keep graph links compact.
   - Put one-to-many observation detail in evidence sections.
   - Define direction semantics in link types.
   - Use distinct semantic link types for ownership, local/resolved links,
     correlation links, inferred links, and partial links when their meaning
     or layout behavior differs.
4. Pick evidence rows.
   - Evidence is the lossless relationship proof.
   - For sockets, preserve the exact matching tuple.
   - For SNMP/L2, preserve LLDP/CDP/FDB/ARP/STP facts according to role.
   - For streaming, keep relationship facts separate from actor-owned path data.
   - For vSphere, preserve inventory/relationship facts using stable object IDs.
5. Classify detail tables.
   - `actor_detail`: custom actor state, not generally aggregatable.
   - `actor_inventory`: actor-owned inventory data.
   - `relationship_evidence`: exact relationship rows.
   - `relationship_summary`: derived summaries.
   - Use `json` columns only for custom actor/detail cells that must preserve
     nested producer-owned values; avoid them for high-cardinality evidence.
   - Use a compact actor-owned `actor_labels` table for modal labels:
     `actor`, `key`, `value`, optional `source`, optional `kind`, and optional
     `value_index`.
   - Expose complete host/node labels when available.
   - Expose useful non-node actor labels and metadata, while keeping identity,
     correlation, grouping, sorting, filtering, and aggregation facts as typed
     canonical columns.
6. Define telemetry overlays.
   - Use overlay templates once per payload or type.
   - Links and actors carry compact refs and parameters only.
   - Do not put full metric query payloads on every row.
7. Define correlation semantics when actors can be resolved across payloads.
   - Declare whether the topology needs loose-side resolution, actor
     replacement, actor enrichment, or visible correlation actors.
   - Do not hide correlation state as flags on real actors.
   - Define `data.correlation.rules` with declarative key templates,
     priorities, class, `absorb` or `link` actions, point actor types when
     visible correlation actors exist, optional claim actor types, correlation
     link types, and output link types.
   - Emit compact `data.correlation.points` rows for visible correlation
     actors when the input graph has them, and `data.correlation.claims` rows
     for real actors that can satisfy keys.
   - For high-cardinality exact observations, prefer loose relationship-side
     facts plus declared materialization policy over creating one actor per
     ephemeral endpoint.
   - Use `absorb` only for exact matches that should remove correlation
     actors or loose-side placeholders from the aggregated output.
   - Use `link` for broader or partial matches that should keep the
     correlation actor or materialized partial actor visible.
   - Use `replace_actor` semantics for weaker placeholder actors that should
     be replaced by stronger managed actors.
   - Use `merge_enrich_actor` semantics when multiple payloads provide
     complementary facts for the same actor identity.
   - Keep NAT or alias information as additional point/claim rows, not as
     mutation of the original observation.
8. Define graph presentation.
   - Put actor presentation in `types.actor_types.<id>.presentation`.
   - Put link presentation in `types.link_types.<id>.presentation`.
   - Put graph port-bullet presentation in
     `types.port_types.<id>.presentation`.
   - Put legend, actor-click highlight behavior, port fields, and scale keys in
     `data.presentation`.
   - Use `__topology_mode` for detailed vs aggregated topology requests when a
     producer has a real mode difference. Do not expose a mode selector for
     mode-invariant topologies.
   - Use UI-owned color/icon/line/width/opacity/layout tokens only.
   - Define `label_policy.columns` with safe scalar display columns; never let
     canonical identity arrays become actor names.
   - Define `search.columns[]` and/or `search.label_keys[]` for searchable
     actors. Set `search.enabled: false` for helper actors that should not
     appear in graph search. Do not rely on UI-hardcoded `details`, `match`,
     or `attributes` paths.
   - Define `presentation.size.scale` when an actor type needs fixed visual
     emphasis, and `presentation.layout.repulsion` when an actor type needs
     relative force-graph separation. Do not emit raw force numbers.
   - Define `link_types.<id>.semantic_role` when behavior depends on link
     meaning, such as discovery, ownership, traffic, correlation, or control.
     Do not make the UI infer this from link type names or protocol strings.
   - Keep `presentation.arrow` authoritative for arrows. An omitted or `auto`
     arrow derives no arrow for `undirected`, `observed_bidirectional`, `none`,
     or `observation`, and derives `forward` for directed flow/dependency and
     hierarchical ownership. Use explicit `reverse` or `both` when needed.
     `direction_role` is required; never rely on `orientation: directed` alone
     to infer arrows.
   - Define `ports.sources[]` whenever an actor type sets
     `ports.show_bullets: true`.
   - Use scalar display columns for `ports.sources[].name_column`; do not use
     refs, arrays, or JSON as graph bullet labels.
   - Use a numeric `ports.sources[].value_column` when one compact row
     represents multiple observations and the UI should size or count bullets
     by the sum.
   - Use at most one variable visual channel per link type, keyed by
     `variable.scale_key` and sourced from one raw numeric `value_column`.
   - Use `presentation.layout.strength` tokens `weakest`, `weaker`, `normal`,
     `stronger`, `strongest`, and `presentation.layout.distance` tokens
     `closest`, `closer`, `normal`, `farther`, `farthest`; do not emit numeric
     force values.
   - Current producer tuning keeps `presentation.layout.strength` at `normal`
     and varies only `presentation.layout.distance` where semantic separation
     is needed. Do not emit non-normal strength tokens for graph polish unless
     a later product decision explicitly re-enables force-strength tuning.
   - Use only closed icon tokens. Do not emit raw SVG or depend on frontend
     capability-string icon inference; add a schema/UI icon token first.
   - Missing v1 `size.scale`, `layout.repulsion`, and `search` use neutral
     defaults. Do not expect the UI to preserve legacy self/device/SNMP/endpoint
     heuristics for v1.
9. Define modal/table composition.
   - Put actor modal recipes in `types.actor_types.<id>.presentation.modal`.
   - Put link modal recipes in `types.link_types.<id>.presentation.modal`.
   - Put reusable table defaults in `types.table_types.<id>.presentation`.
   - Use `modal.labels.identification.fields[]` to choose the small set of
     actor labels that should appear in the actor modal identification/header
     area. The full `actor_labels` table remains the Labels tab.
   - Modal sections must select from existing `actors`, `links`, `evidence`,
     `actor_table`, or `relationship_table` sources.
   - Do not duplicate evidence or actor metadata only to populate a modal.
   - Use projections for display: direct column, actor-ref label, opposite
     actor, formatted endpoint, selected-side endpoint, label lookup,
     coalesce, const, or explicit scalar JSON path.
   - For `selected_side_endpoint`, include source/destination actor-ref
     columns and both endpoint sides in the projection so the UI can choose
     the side from the selected actor without hardcoded table knowledge.
   - For `label_lookup`, provide `label_key`; provide `actor_column` only when
     the lookup should read labels for an actor referenced by the source row
     instead of the selected modal actor.
   - For `json_path`, provide both the JSON column and scalar path.
   - Use cell types: `text`, `number`, `badge`, `actor_link`, `timestamp`,
     `duration`, `endpoint`, `array_count`, or `debug_json`.
   - Use visibility values: `table`, `expanded`, `hidden`, or `debug`.
   - Raw `json` is debug-only unless a schema-declared scalar projection gives
     the UI/aggregator semantics.
   - Treat Function `info` responses as metadata only. Validate full topology
     responses against `FUNCTION_TOPOLOGY_SCHEMA.json`; do not require
     metadata-only `info` responses to carry `data`.
10. Encode large sections as compact tables.
    - Use `const` for constant columns.
    - Use `dict` for low/medium-cardinality repeated values.
    - Use `values` only when values are high-cardinality.
    - Prefer dictionary references for strings.
    - For Go producers, use the `src/go/pkg/topology/v1` compact-table helpers
      instead of hand-building table JSON.
11. Validate and measure.
    - Validate JSON with `src/plugins.d/FUNCTION_TOPOLOGY_SCHEMA.json`.
    - Add semantic validation fixtures.
    - Measure raw and gzip size on realistic data.
    - Fail explicitly on size/row limits; never silently truncate.
    - For topology row limits, count rows as `max(actor rows, link rows)` so
      valid actor-only payloads are not rejected.
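
The compact-table encoding choice and the row-limit rule above can be sketched in Go. This is a minimal standalone illustration, not the real implementation. The `Column` type, its field names, the `maxDict` threshold and the `checkRowLimit` helper are all hypothetical. None of them reflect the wire shape in `FUNCTION_TOPOLOGY_SCHEMA.json` or the API of `src/go/pkg/topology/v1`, which real producers should use instead.

```go
package main

import (
	"errors"
	"fmt"
)

// Column is a hypothetical in-memory view of one compact-table column.
// It is NOT the schema wire shape; real Go producers should use the
// src/go/pkg/topology/v1 compact-table helpers.
type Column struct {
	Encoding string   // "const", "dict", or "values"
	Const    string   // the single value when Encoding == "const"
	Dict     []string // distinct values when Encoding == "dict"
	Refs     []int    // per-row indexes into Dict
	Values   []string // raw per-row values when Encoding == "values"
}

// encodeColumn picks const when every row holds the same value, dict when the
// number of distinct values is at most maxDict, and values otherwise.
func encodeColumn(rows []string, maxDict int) Column {
	index := map[string]int{}
	var dict []string
	refs := make([]int, 0, len(rows))
	for _, v := range rows {
		i, ok := index[v]
		if !ok {
			i = len(dict)
			index[v] = i
			dict = append(dict, v)
		}
		refs = append(refs, i)
	}
	switch {
	case len(dict) == 1:
		return Column{Encoding: "const", Const: dict[0]}
	case len(dict) <= maxDict:
		return Column{Encoding: "dict", Dict: dict, Refs: refs}
	default:
		return Column{Encoding: "values", Values: rows}
	}
}

var errRowLimit = errors.New("topology row limit exceeded")

// checkRowLimit counts rows as max(actor rows, link rows), so actor-only
// payloads are not rejected, and fails explicitly instead of truncating.
func checkRowLimit(actorRows, linkRows, limit int) error {
	rows := actorRows
	if linkRows > rows {
		rows = linkRows
	}
	if rows > limit {
		return fmt.Errorf("%w: %d rows, limit %d", errRowLimit, rows, limit)
	}
	return nil
}

func main() {
	fmt.Println(encodeColumn([]string{"tcp", "tcp", "tcp"}, 16).Encoding) // const
	fmt.Println(encodeColumn([]string{"tcp", "udp", "tcp"}, 16).Refs)     // [0 1 0]
	fmt.Println(encodeColumn([]string{"a", "b", "c"}, 2).Encoding)        // values
	fmt.Println(checkRowLimit(800, 0, 1000))                              // <nil>
	fmt.Println(checkRowLimit(10, 2000, 1000))                            // topology row limit exceeded: 2000 rows, limit 1000
}
```

Returning an error past the limit, rather than dropping the excess rows, is what "fail explicitly; never silently truncate" asks for.
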
## Direction Rules

- `directed` + `flow`: sockets, traffic, request dependencies.
- `directed` + `dependency`: logical dependency direction.
- `hierarchical` + `ownership`: parent/child, host/VM, cluster/host.
- `undirected` + `none`: physical adjacency with no direction.
- `observed_bidirectional` + `observation`: discovery saw one or both sides,
  but direction is not a user-facing dependency.

If direction is noise, mark it so the aggregator can merge links
independently of direction.
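
The following sketch shows how a consumer might apply these direction pairs together with the `presentation.arrow` rules in the Workflow section. It is illustrative only. The `LinkType` struct is an assumption, as is the `"none"` return value meaning "draw no arrow". The `mergeKey` format is also an assumption. The authoritative tokens live in `FUNCTION_TOPOLOGY_SCHEMA.json`.

```go
package main

import (
	"errors"
	"fmt"
)

// LinkType is a hypothetical subset of a link type declaration.
type LinkType struct {
	Orientation   string // directed, hierarchical, undirected, observed_bidirectional
	DirectionRole string // flow, dependency, ownership, none, observation
	Arrow         string // presentation.arrow; "" means omitted
}

// resolveArrow returns the arrow to draw. Explicit arrows win; an omitted or
// "auto" arrow yields "forward" only for directed flow/dependency and
// hierarchical ownership, and "none" (no arrow; token name assumed) otherwise.
func resolveArrow(lt LinkType) (string, error) {
	if lt.DirectionRole == "" {
		// Orientation alone is never enough to infer arrows.
		return "", errors.New("direction_role is required")
	}
	if lt.Arrow != "" && lt.Arrow != "auto" {
		return lt.Arrow, nil
	}
	switch {
	case lt.Orientation == "directed" && (lt.DirectionRole == "flow" || lt.DirectionRole == "dependency"):
		return "forward", nil
	case lt.Orientation == "hierarchical" && lt.DirectionRole == "ownership":
		return "forward", nil
	default:
		return "none", nil
	}
}

// mergeKey groups links for aggregation. When the producer marked direction
// as noise, endpoints are ordered so A->B and B->A land in the same group.
func mergeKey(linkType, src, dst string, directionIsNoise bool) string {
	if directionIsNoise && dst < src {
		src, dst = dst, src
	}
	return linkType + "|" + src + "|" + dst
}

func main() {
	a, _ := resolveArrow(LinkType{Orientation: "directed", DirectionRole: "dependency"})
	fmt.Println(a) // forward
	a, _ = resolveArrow(LinkType{Orientation: "undirected", DirectionRole: "none", Arrow: "auto"})
	fmt.Println(a) // none
	_, err := resolveArrow(LinkType{Orientation: "directed"})
	fmt.Println(err)                                  // direction_role is required
	fmt.Println(mergeKey("lldp", "sw2", "sw1", true)) // lldp|sw1|sw2
}
```
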
## Network-Connections Correlation Shape

Network-connections uses three graph-link families:

- node-to-process ownership links;
- resolved process-to-process socket links;
- process-to-endpoint socket relationships for unresolved or cross-node
  endpoint tuples.

Network-connections dependency direction is client-to-server. Use
`direction_role: "dependency"` for socket dependency link types. Emit
`src_actor` as the client/dependant and `dst_actor` as the server/dependency
target. Do not expose `local` as a topology socket direction; same-node sockets
still become `inbound` or `outbound` dependency rows based on which side is the
client.
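
Here is a minimal sketch of the client-to-server rule. It assumes a hypothetical `SocketObs` record that already knows whether the local process is the accepting (server) side, and the actor names are illustrative. It shows that a same-node socket needs no special `local` direction: both observations of one socket yield the same `src_actor`/`dst_actor` pair, tagged `outbound` or `inbound` from each process's point of view.

```go
package main

import "fmt"

// SocketObs is a hypothetical socket observation from one process's view.
type SocketObs struct {
	LocalActor    string // process that owns the observed socket
	RemoteActor   string // peer process, or an endpoint placeholder actor
	LocalIsServer bool   // true when the local side accepted the connection
}

// DependencyRow is a compact link row oriented client -> server.
type DependencyRow struct {
	SrcActor  string // client / dependant
	DstActor  string // server / dependency target
	Direction string // "outbound" or "inbound", relative to LocalActor
}

// toDependency orients a socket by client/server role, never by which node
// it was observed on.
func toDependency(o SocketObs) DependencyRow {
	if o.LocalIsServer {
		return DependencyRow{SrcActor: o.RemoteActor, DstActor: o.LocalActor, Direction: "inbound"}
	}
	return DependencyRow{SrcActor: o.LocalActor, DstActor: o.RemoteActor, Direction: "outbound"}
}

func main() {
	// Both ends of one same-node socket, seen from each process.
	fmt.Println(toDependency(SocketObs{"proc:nginx", "proc:php-fpm", false})) // {proc:nginx proc:php-fpm outbound}
	fmt.Println(toDependency(SocketObs{"proc:php-fpm", "proc:nginx", true}))  // {proc:nginx proc:php-fpm inbound}
}
```
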
Use distinct presentation for each family:

- `endpoint_socket`: solid, colored, thin, normal-strength, normal-distance
  unresolved endpoint dependency links;
- `correlated_socket`: solid, colored, thin, normal-strength, farthest
  aggregator output links after exact endpoint absorption;
- `socket`: gray, thin, normal-strength, normal-distance local process links,
  optionally variable by `socket_count`;
- `ownership`: dotted, faded/dim, thin, normal-strength, normal-distance
  graph-coherence links.
In aggregated mode, do not enable process port bullets from detailed socket
evidence. Emit a compact actor inventory table such as `socket_ports` with
`actor`, `port`, and numeric `socket_count`, point the process actor
`ports.sources[]` at it with `value_column: "socket_count"`, and size process
actors with `size.mode: "metric"` over the actor row `socket_count`.
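
The aggregated-mode shape can be sketched as a small Go reduction. It turns detailed socket evidence into one `socket_ports` row per actor and port. The `SocketEvidence` and `PortRow` types are hypothetical. The per-actor total stands in for the actor row `socket_count` that `size.mode: "metric"` reads.

```go
package main

import (
	"fmt"
	"sort"
)

// SocketEvidence is a hypothetical detailed evidence row (one per socket).
type SocketEvidence struct {
	Actor string
	Port  int
}

// PortRow is one row of the compact socket_ports actor inventory table.
type PortRow struct {
	Actor       string
	Port        int
	SocketCount int
}

// buildSocketPorts collapses evidence into one row per (actor, port) with a
// numeric socket_count, and returns per-actor totals for size.mode "metric".
func buildSocketPorts(ev []SocketEvidence) ([]PortRow, map[string]int) {
	type key struct {
		actor string
		port  int
	}
	counts := map[key]int{}
	totals := map[string]int{}
	for _, e := range ev {
		counts[key{e.Actor, e.Port}]++
		totals[e.Actor]++
	}
	rows := make([]PortRow, 0, len(counts))
	for k, n := range counts {
		rows = append(rows, PortRow{Actor: k.actor, Port: k.port, SocketCount: n})
	}
	// Sort for deterministic output; Go map iteration order is random.
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Actor != rows[j].Actor {
			return rows[i].Actor < rows[j].Actor
		}
		return rows[i].Port < rows[j].Port
	})
	return rows, totals
}

func main() {
	rows, totals := buildSocketPorts([]SocketEvidence{
		{Actor: "proc:nginx", Port: 443},
		{Actor: "proc:nginx", Port: 443},
		{Actor: "proc:nginx", Port: 80},
		{Actor: "proc:redis", Port: 6379},
	})
	for _, r := range rows {
		fmt.Println(r.Actor, r.Port, r.SocketCount)
	}
	fmt.Println(totals["proc:nginx"], totals["proc:redis"]) // 3 1
}
```

Each port appears once with its socket sum, so bullets can be sized through `value_column` rather than by repeating evidence rows.
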
For network-connections actor modals:

- self/node actors show a Processes section from `links` filtered to
  `type == ownership`;
- non-node actors show Dependencies where the selected actor is `src_actor`
  and Dependants where the selected actor is `dst_actor`;
- aggregated mode uses `tables.relationship.connections`;
- detailed mode uses `evidence.socket`;
- `socket_ports` stays an actor inventory for graph port bullets, not a normal
  modal tab;
- secondary socket metrics belong in `visibility: "expanded"` columns instead
  of separate duplicate sections.
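
This sketch shows how a modal renderer might build these sections by filtering rows that already exist, rather than duplicating data. `ConnRow`, `sectionSource`, `buildModal` and the section-name map keys are hypothetical. The sketch also assumes ownership links are emitted node-to-process (parent-to-child).

```go
package main

import "fmt"

// ConnRow is a hypothetical link or connection row with actor refs.
type ConnRow struct {
	Type     string
	SrcActor string
	DstActor string
}

// sectionSource picks the table backing Dependencies/Dependants by mode.
func sectionSource(mode string) string {
	if mode == "aggregated" {
		return "tables.relationship.connections"
	}
	return "evidence.socket"
}

// buildModal filters existing rows for the selected actor. Ownership links are
// assumed to be emitted node -> process (parent -> child).
func buildModal(selected string, isNode bool, links, conns []ConnRow) map[string][]ConnRow {
	sections := map[string][]ConnRow{}
	if isNode {
		for _, l := range links {
			if l.Type == "ownership" && l.SrcActor == selected {
				sections["Processes"] = append(sections["Processes"], l)
			}
		}
		return sections
	}
	for _, c := range conns {
		if c.SrcActor == selected { // selected actor is the client
			sections["Dependencies"] = append(sections["Dependencies"], c)
		}
		if c.DstActor == selected { // selected actor is the server
			sections["Dependants"] = append(sections["Dependants"], c)
		}
	}
	return sections
}

func main() {
	links := []ConnRow{{"ownership", "node:web1", "proc:nginx"}}
	conns := []ConnRow{
		{"socket", "proc:nginx", "proc:php-fpm"},
		{"socket", "proc:curl", "proc:nginx"},
	}
	fmt.Println(sectionSource("aggregated")) // tables.relationship.connections
	m := buildModal("proc:nginx", false, links, conns)
	fmt.Println(len(m["Dependencies"]), len(m["Dependants"]))                   // 1 1
	fmt.Println(len(buildModal("node:web1", true, links, conns)["Processes"])) // 1
}
```
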
For socket correlation:

- process actors emit claim rows for the socket tuple they own: client tuple
  for outbound observations, server tuple for inbound observations;
- visible endpoint/correlation actors emit point rows when the producer
  materializes them;
- the `socket_exact` rule uses `class: resolve_loose_side` and
  `action: absorb`;
- the key is declarative, typically protocol + address space + IP + port;
- `endpoint_socket` links are normal-strength/normal-distance visible links
  before aggregation;
- `correlated_socket` is the farthest output link type after exact
  absorption.
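
The `socket_exact` flow can be sketched as a key-based lookup. Claims index real process actors by tuple key. Each loose endpoint tuple that matches a key exactly is absorbed into a `correlated_socket` link. All types, the `|`-joined key format and the `endpoint:` placeholder naming are hypothetical. The sketch ignores rule priorities by assuming each key has at most one claim. It resolves only the destination side of `endpoint_socket` links, whereas an aggregator applies the declared rule to whichever side is loose.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Tuple is the exact socket tuple used as the correlation key.
type Tuple struct {
	Protocol     string // e.g. "tcp"
	AddressSpace string // hypothetical name for the address-space scope
	IP           string
	Port         int
}

// Key renders protocol + address space + IP + port as one string.
func (t Tuple) Key() string {
	return strings.Join([]string{t.Protocol, t.AddressSpace, t.IP, strconv.Itoa(t.Port)}, "|")
}

// Claim says a real process actor owns this tuple.
type Claim struct {
	Actor string
	Tuple Tuple
}

// LooseLink is an endpoint_socket link whose destination is still a tuple.
type LooseLink struct {
	SrcActor string
	Dst      Tuple
}

// Link is an output link row.
type Link struct {
	Type, SrcActor, DstActor string
}

// absorbExact applies class resolve_loose_side with action absorb: an exact
// key match replaces the loose endpoint with the claiming actor; anything
// else stays a visible endpoint_socket link to a placeholder endpoint.
func absorbExact(claims []Claim, loose []LooseLink) []Link {
	owner := make(map[string]string, len(claims))
	for _, c := range claims {
		owner[c.Tuple.Key()] = c.Actor
	}
	out := make([]Link, 0, len(loose))
	for _, l := range loose {
		if actor, ok := owner[l.Dst.Key()]; ok {
			out = append(out, Link{Type: "correlated_socket", SrcActor: l.SrcActor, DstActor: actor})
			continue
		}
		out = append(out, Link{Type: "endpoint_socket", SrcActor: l.SrcActor, DstActor: "endpoint:" + l.Dst.Key()})
	}
	return out
}

func main() {
	db := Tuple{"tcp", "default", "10.0.0.5", 5432}
	// Server tuple claimed by the process that observed it inbound.
	claims := []Claim{{Actor: "node-b/proc:postgres", Tuple: db}}
	loose := []LooseLink{
		{SrcActor: "node-a/proc:api", Dst: db},
		{SrcActor: "node-a/proc:api", Dst: Tuple{"tcp", "default", "198.51.100.7", 443}},
	}
	for _, l := range absorbExact(claims, loose) {
		fmt.Println(l.Type, l.SrcActor, "->", l.DstActor)
	}
}
```
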
## Streaming Modal Rules

For `topology:streaming` actor modals:

- Size parent actors from the actor row `retained_node_count` metric, not from
  graph degree or direct child count. Emit `presentation.size.mode: "metric"`
  and `presentation.size.metric_column: "retained_node_count"` for the parent
  actor type. This count represents nodes for which the parent has retained
  data, including self, virtual nodes, stale nodes, and transit descendants
  when they have DB retention state.
- Attach parent graph bullets to the parent side of incoming streaming links.
  For graph-link sources this means `ports.sources[].actor_column: "dst_actor"`
  and a scalar child/node display `name_column`, such as `port_name`.
- Keep `actor_labels`, `stream_path`, `retention`, `inbound`, and `outbound`
  as the single source of truth. Do not duplicate rows only to populate modal
  sections.
- Put important node identity/status facts in
  `modal.labels.identification.fields[]`, backed by `actor_labels`. Typical
  host-like keys are hostname, node type, health, stream, ingest, OS, OS
  version, kernel, architecture, CPU, cores, RAM, virtualization, container,
  cloud placement, and Agent version. Parent actors also include retained-node
  count and direct child count.
  Vnode actors should use inventory/device labels such as vnode type, vendor,
  model, address, location, sys object id, LLDP name, and status. Keep long
  stable identifiers such as machine GUID and node id in the full Labels tab by
  default.
- Show Stream path from `stream_path` filtered by `actor`, ordered by
  `path_index`. This is only the selected actor's own path; child and virtual
  node paths belong to their own actors. Do not emit blank `since` or
  `first_time` values for synthetic path rows when those timestamps can be
  derived from adjacent path, ingest, or DB status.
- Show Retained nodes from the `retention` table filtered by
  `observer_actor`; this answers which nodes' data the selected actor
  maintains. Include self, virtual nodes, direct children, transit descendants,
  and stale/archived hosts when present in the Agent root index. Preserve
  `db_from` and `db_to` whenever the DB status knows the range.
- Show Received nodes from `inbound` filtered by `parent_actor`; this table
  represents children, virtual nodes, stale nodes, and descendants received by
  or transiting through the selected parent. Populate `source_actor` whenever
  the immediate sending actor is known; for direct local receipt, use the child
  or virtual-node actor instead of leaving the cell empty.
- Show Outbound streams from `outbound` filtered by the sending parent actor.
  This table must list every node payload the selected parent streams
  upstream, including self, virtual nodes, direct children, and transit
  descendants. Rows need at least the streamed node actor, the destination
  actor when known, status, age, hops, TLS, compression, and useful
  counts/replication metrics when available.
- Do not show the old Retention for node default section in the current modal
  contract. Keep `actor` and `observer_actor` in the canonical `retention`
  table so Cloud aggregation can preserve multiple retaining parents and a
  future explicitly named Retained by section can be added without changing
  facts.
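
The following sketch covers the Stream path and Retained nodes sections, using hypothetical `PathRow` and `RetentionRow` types. It also shows one plausible way to derive `retained_node_count`: count the distinct retained actors per observer. That derivation is an assumption, since the producer may compute the count directly from DB retention state.

```go
package main

import (
	"fmt"
	"sort"
)

// PathRow is a hypothetical stream_path row owned by one actor.
type PathRow struct {
	Actor     string
	PathIndex int
	Hop       string
}

// RetentionRow is a hypothetical retention row: ObserverActor keeps data for Actor.
type RetentionRow struct {
	Actor         string
	ObserverActor string
	DBFrom, DBTo  int64 // epoch seconds; keep them whenever DB status knows the range
}

// streamPath returns only the selected actor's own path, ordered by path_index.
func streamPath(selected string, rows []PathRow) []PathRow {
	var out []PathRow
	for _, r := range rows {
		if r.Actor == selected {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].PathIndex < out[j].PathIndex })
	return out
}

// retainedNodes returns the Retained nodes rows for the selected actor and the
// number of distinct nodes it retains (self included when present).
func retainedNodes(selected string, rows []RetentionRow) ([]RetentionRow, int) {
	var out []RetentionRow
	distinct := map[string]bool{}
	for _, r := range rows {
		if r.ObserverActor == selected {
			out = append(out, r)
			distinct[r.Actor] = true
		}
	}
	return out, len(distinct)
}

func main() {
	path := []PathRow{
		{Actor: "child", PathIndex: 1, Hop: "parent"},
		{Actor: "child", PathIndex: 0, Hop: "child"},
		{Actor: "parent", PathIndex: 0, Hop: "parent"},
	}
	for _, p := range streamPath("child", path) {
		fmt.Println(p.PathIndex, p.Hop)
	}
	ret := []RetentionRow{
		{Actor: "parent", ObserverActor: "parent"},
		{Actor: "child", ObserverActor: "parent"},
		{Actor: "vnode-switch", ObserverActor: "parent"},
		{Actor: "child", ObserverActor: "child"},
	}
	_, n := retainedNodes("parent", ret)
	fmt.Println(n) // 3
}
```
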
## SNMP/L2 Modal Rules

For SNMP/L2 managed device actor modals:

- Treat the device as a collection of ports. The primary section is Ports over
  `actor_ports`.
- Put important device facts in `modal.labels.identification.fields[]`,
  backed by `actor_labels`. Typical keys are display name, management IP,
  vendor, model, port counts, and LLDP/CDP neighbor counts.
- Expose real port identity as typed `actor_ports` columns: SNMP `if_index` as
  the visible numeric port ID when known, source `port_id`, display name,
  `if_name`, `if_descr`, `if_alias`, MAC, speed, status, mode, role, VLAN, FDB,
  link, and neighbor counts.
- Do not fabricate numeric port IDs. Do not derive port identity from row order
  or any generated sequence; `if_index` must come from device/SNMP facts.
- Include compact expanded-row neighbor columns such as nullable
  `neighbor_actor` and `neighbor_port_name` when graph-link facts can align the
  port to a remote actor.
- Use an actor-owned `actor_port_links` modal index for Port Neighbors when
  the device modal needs remote actor, remote port, link type, evidence count,
  confidence, inference, attachment mode, or timestamps.
  `actor_port_links` may carry compact side-specific refs and scalar facts, but
  must not duplicate raw LLDP/CDP/FDB/ARP/STP evidence JSON.
- Keep generic graph-link Links sections only for endpoint, segment, or custom
  actors that do not own port inventory.
- Build link endpoint port labels only from real port fields: `port_name`,
  `if_name`, `if_descr`, or source `port_id`. Never use actor labels such as
  `display_name` or `sys_name` as port-name fallbacks.
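
The port-label and port-ID rules can be sketched with a hypothetical `Port` struct that mirrors some `actor_ports` columns. The preference order among the four real port fields is an assumption. SNMP `ifIndex` values are greater than zero, so zero is used here to mean "not reported".

```go
package main

import "fmt"

// Port is a hypothetical actor_ports row; empty strings mean unknown.
type Port struct {
	PortName string
	IfName   string
	IfDescr  string
	PortID   string // source port_id
	IfIndex  int    // SNMP ifIndex; 0 means the device did not report one
}

// portLabel uses only real port fields. Actor labels such as display_name or
// sys_name are not inputs, so they can never become port-name fallbacks.
func portLabel(p Port) (string, bool) {
	for _, v := range []string{p.PortName, p.IfName, p.IfDescr, p.PortID} {
		if v != "" {
			return v, true
		}
	}
	return "", false
}

// visiblePortID returns if_index only when the device reported it; it never
// falls back to row order or a generated sequence.
func visiblePortID(p Port) (int, bool) {
	return p.IfIndex, p.IfIndex > 0
}

func main() {
	fmt.Println(portLabel(Port{IfDescr: "GigabitEthernet0/1", IfIndex: 10101})) // GigabitEthernet0/1 true
	fmt.Println(visiblePortID(Port{PortID: "Gi0/1"}))                            // 0 false
}
```
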
## Validation Checklist

- JSON validates against the topology schema.
- Semantic validation covers references, compact-table row counts,
  dictionaries, correlation rules, layout tokens, and schema-token parity.
- Actor identities are documented and tested.
- Link direction policy is documented and tested.
- Correlation points, claims, rules, priorities, actions, and output link types
  are documented and tested when cross-payload resolution applies.
- Evidence rows can reproduce required drilldown tables.
- Custom actor tables have correct roles and aggregation policy.
- Actor labels are emitted through `actor_labels` when the producer has labels
  or actor metadata to show.
  `actor_labels.key`, `actor_labels.value`, `actor_labels.source`, and
  `actor_labels.kind` are logical string fields. Accept `string` and
  `string_ref` encodings as equivalent when validating, aggregating, or
  rendering topology payloads.
- Treat `actor_labels` as sensitive topology Function data. Preserve the source
  Function's access-control assumptions when forwarding, aggregating, testing,
  or documenting labels.
- Modal sections are recipes over existing facts and do not duplicate
  high-cardinality evidence rows.
- Raw JSON columns are hidden/debug-only unless a schema-declared projection
  renders a scalar value.
- Payload size is measured on realistic or captured data.
- Raw sensitive captures remain under `.local/`.
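
To treat `string` and `string_ref` encodings as equivalent, a consumer can resolve every label cell to its logical string before comparing or aggregating. The sketch below assumes a hypothetical `LabelCell` that holds either an inline string or an index into a payload dictionary. The actual encodings are defined by `FUNCTION_TOPOLOGY_SCHEMA.json`.

```go
package main

import (
	"errors"
	"fmt"
)

// LabelCell is a hypothetical decoded cell: exactly one of Str or Ref is set.
type LabelCell struct {
	Str *string // inline "string" encoding
	Ref *int    // "string_ref" encoding: index into the payload dictionary
}

// resolveLabel returns the logical string value regardless of encoding.
func resolveLabel(c LabelCell, dict []string) (string, error) {
	switch {
	case c.Str != nil:
		return *c.Str, nil
	case c.Ref != nil:
		if *c.Ref < 0 || *c.Ref >= len(dict) {
			return "", fmt.Errorf("string_ref %d out of range (dictionary size %d)", *c.Ref, len(dict))
		}
		return dict[*c.Ref], nil
	default:
		return "", errors.New("label cell has neither string nor string_ref")
	}
}

func main() {
	dict := []string{"hostname", "linux"}
	inline, ref := "linux", 1
	a, _ := resolveLabel(LabelCell{Str: &inline}, dict)
	b, _ := resolveLabel(LabelCell{Ref: &ref}, dict)
	fmt.Println(a == b) // true
}
```

An out-of-range reference returns an error instead of an empty label, so a malformed payload fails during validation rather than passing unnoticed.
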
Before considering `cloud-topology-service` ready, verify service-level
fixtures for all topology kinds covered by the schema. Network-connections is
the required high-cardinality benchmark, but it is not enough by itself.
## vSphere Coordination
The vSphere topology producer lives in a separate PR worktree. Do not edit that
worktree before telling the user, because another agent may be working there.