| name | staged-discovery |
| description | Add staged discovery support to a provider. Use when the user wants to implement staged/phased discovery, break down discovery into stages, add OptionStagedDiscovery support, or optimize a provider's memory usage during discovery. Triggers on requests like "add staged discovery to gcp", "implement staged discovery for aws", "break down discovery for <provider>", or "optimize <provider> discovery". |
| argument-hint | <provider-name> (e.g., aws, gcp, k8s, azure) |
Add Staged Discovery to a Provider
Implement staged (multi-phase) discovery for a provider so that AssetExplorer can traverse the provider's resource hierarchy one level at a time, releasing memory after each scope is closed.
Background: See docs/adr/002-staged-discovery.md for the full design rationale and docs/adr/001-asset-explorer-lazy-discovery.md for how AssetExplorer drives the traversal.
Reference implementation: The K8s provider in providers/k8s/resources/discovery.go — study Discover(), discoverClusterStage(), and discoverNamespaceStage() as the canonical example.
Prerequisites
Before starting, understand the provider's resource hierarchy. Every provider has a natural tree:
- K8s: cluster → namespaces → workloads (pods, deployments, etc.)
- GCP: organization → projects → services → resources
- AWS: organization → accounts → regions → resources
- Azure: tenant → subscriptions → resource groups → resources
Each level of this tree becomes a discovery stage. Ask the user to confirm the hierarchy if it's not obvious.
Step-by-Step Implementation
Step 1: Identify the discovery entry point
Find the provider's Discover() function. It is typically in providers/<name>/resources/discovery.go or called from providers/<name>/provider/provider.go during connection setup.
grep -rn "func.*Discover" providers/<name>/resources/ providers/<name>/provider/
Read the existing discovery logic thoroughly. Understand:
- What assets are currently returned (the full set)
- How platform IDs are constructed
- How connection configs are set on child assets
- Whether
WithParentConnectionId is used and where
Step 2: Add the staged discovery router
Modify the Discover() function to check for OptionStagedDiscovery and route to stage-specific functions. The legacy path MUST remain unchanged — older clients that don't set the flag must continue working.
import "go.mondoo.com/mql/v13/providers-sdk/v1/plugin"
func Discover(runtime *plugin.Runtime, ...) (*inventory.Inventory, error) {
conn := runtime.Connection.(YourConnection)
invConfig := conn.InventoryConfig()
if _, ok := invConfig.Options[plugin.OptionStagedDiscovery]; ok {
if invConfig.Options["your-scope-option"] != "" {
return discoverScopedStage(runtime, conn, invConfig)
}
return discoverRootStage(runtime, conn, invConfig)
}
return discoverLegacy(runtime, conn, invConfig)
}
Important: Rename the existing discovery function to discoverLegacy (or similar) and add a TODO(v15) comment. Do not delete it.
Step 3: Implement Stage 1 (root/top-level scope)
Stage 1 discovers the top-level asset and its immediate children. Children are returned as assets with connection configs that trigger Stage 2 when connected.
Critical rules:
- Child assets that represent a new scope (e.g., namespaces, projects, regions) must NOT use
WithParentConnectionId. They need their own independent runtime so their MQL resource cache is isolated and released when the scope is closed.
- Clone the parent's connection config for each child, adding a scope option that identifies the next stage.
- The
OptionStagedDiscovery flag is propagated automatically by Clone().
func discoverRootStage(runtime *plugin.Runtime, conn YourConnection, invConfig *inventory.Config) (*inventory.Inventory, error) {
in := &inventory.Inventory{Spec: &inventory.InventorySpec{
Assets: []*inventory.Asset{},
}}
rootAsset := &inventory.Asset{
PlatformIds: []string{rootPlatformId},
Name: conn.Name(),
Platform: conn.Platform(),
Connections: []*inventory.Config{invConfig.Clone(inventory.WithoutDiscovery())},
}
in.Spec.Assets = append(in.Spec.Assets, rootAsset)
children, err := listChildScopes(conn)
if err != nil {
return nil, err
}
for _, child := range children {
childConfig := invConfig.Clone()
childConfig.Options["your-scope-option"] = child.ID
childAsset := &inventory.Asset{
PlatformIds: []string{child.PlatformId},
Name: child.Name,
Platform: child.Platform,
Connections: []*inventory.Config{childConfig},
}
in.Spec.Assets = append(in.Spec.Assets, childAsset)
}
return in, nil
}
Step 4: Implement Stage 2+ (scoped discovery)
Each subsequent stage reads its scope from the connection config, discovers resources within that scope, and optionally emits further children for deeper stages.
Leaf assets within a scope SHOULD use WithParentConnectionId to share the scope's API client cache. This avoids redundant API calls while keeping the cache scoped to the parent (not the root).
func discoverScopedStage(runtime *plugin.Runtime, conn YourConnection, invConfig *inventory.Config) (*inventory.Inventory, error) {
scopeId := invConfig.Options["your-scope-option"]
in := &inventory.Inventory{Spec: &inventory.InventorySpec{
Assets: []*inventory.Asset{},
}}
resources, err := listResourcesInScope(conn, scopeId)
if err != nil {
return nil, err
}
for _, res := range resources {
resAsset := &inventory.Asset{
PlatformIds: []string{res.PlatformId},
Name: res.Name,
Platform: res.Platform,
Connections: []*inventory.Config{invConfig.Clone(
inventory.WithParentConnectionId(invConfig.Id),
)},
}
in.Spec.Assets = append(in.Spec.Assets, resAsset)
}
return in, nil
}
Step 5: Strip platform IDs from non-targeted intermediate assets
When users specify discovery targets (e.g., --discover pods), intermediate assets that don't match the targets must still be emitted (so AssetExplorer connects to them and discovers their children), but should NOT be scanned. The simplest way: strip their platform IDs. The existing "no platform IDs → skip" logic in AssetExplorer and the scanner already handles this — no caller-side changes needed.
isScopeTargeted := isTargeted(invConfig.Discover.Targets, YourScopeDiscoveryTarget)
for _, child := range children {
childConfig := invConfig.Clone()
childConfig.Options["your-scope-option"] = child.ID
childAsset := &inventory.Asset{
Name: child.Name,
Platform: child.Platform,
Connections: []*inventory.Config{childConfig},
}
if isScopeTargeted {
childAsset.PlatformIds = []string{child.PlatformId}
}
in.Spec.Assets = append(in.Spec.Assets, childAsset)
}
Key rules:
DiscoveryAuto and DiscoveryAll targets mean everything is scannable — always keep platform IDs
- Mixed targets (e.g.,
--discover pods,namespaces) — if the intermediate level IS a target, keep its platform IDs
- Leaf assets at the bottom of the hierarchy always keep their platform IDs if they match targets
Step 6: Gate resource methods at higher scopes (if needed)
When the root scope is scanned, resource methods that load lower-scope data should return empty results to avoid loading everything into the root's cache. This is optional but important for large providers.
func isRootScopedConnection(r *plugin.Runtime) bool {
conn := r.Connection.(YourConnection)
cfg := conn.InventoryConfig()
if _, ok := cfg.Options[plugin.OptionStagedDiscovery]; !ok {
return false
}
return cfg.Options["your-scope-option"] == ""
}
func (r *mqlYourProvider) childScopedResources() ([]interface{}, error) {
if isRootScopedConnection(r.MqlRuntime) {
return []interface{}{}, nil
}
}
Step 7: Verify both paths and discovery targets
Both the legacy and staged paths must discover the same final set of assets (same platform IDs, same names). They differ only in how discovery is chunked. Also verify that discovery targets correctly filter scannable assets.
make providers/build/<name> && make providers/install/<name>
mql shell <provider-args>
mql shell <provider-args>
mql shell <provider-args> --discover <specific-target>
go test ./providers/<name>/...
Step 8: Update .lr.versions if new resources were added
If you added any new resources or fields to support staged discovery, update the .lr.versions file:
make providers/mqlr
./mqlr generate providers/<name>/resources/<name>.lr --dist providers/<name>/resources
Checklist