discover-components
| Field | Value |
|---|---|
| name | discover-components |
| description | Discover platform components by exploring breadcrumbs (installers, operators, dependencies) in checkouts directory. Outputs component-map.json for platforms without manifest scripts. |
| user-invocable | true |
| allowed-tools | Read, Glob, Grep, Write, Task, Bash(ls *), Bash(find *), Bash(cat *), Bash(grep *), Bash(python *) |
Discover which repositories in a checkouts directory are actual platform components (shipped in the product) vs. side projects, tools, or helpers.
Use this for platforms that lack a central manifest script such as ODH/RHOAI's get_all_manifests.sh. Instead, we explore "breadcrumbs" (installers, operators, dependencies) to build a component map.
Required:
- `--platform=<name>` - Platform identifier (e.g., "aap", "ansible")
- `--checkouts-dir=<path>` - Directory containing cloned repos

Optional:

- `--entry-repo=<name>` - Starting point repo (e.g., "installer", "operator")
- `--architecture-dir=<path>` - Output directory (default: architecture)
- `--exclude=<pattern>` - Additional repos to exclude (comma-separated)

List all subdirectories in the checkouts directory:
ls -1 {checkouts_dir}/
This gives you the universe of possible components.
Exclude obvious non-components:
- Names starting with `.` (hidden)
- `*-docs`, `*-documentation`
- `*-ci`, `*-tools`, `*-testing`, `*-test`
- `must-gather`, `additional-images`
- Build infrastructure (`RHOAI-Build-Config`, `konflux-central`)

Do NOT exclude odh-cli: it is a shipped component as of RHOAI 3.3.
Create an initial list of candidate repos.
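The listing and filtering steps above can be sketched in Python. This is a minimal illustration, not part of the skill: the function names are hypothetical, and the patterns are copied from the exclusion list above.

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns copied from the exclusion list above.
EXCLUDE_PATTERNS = [
    ".*", "*-docs", "*-documentation", "*-ci", "*-tools", "*-testing", "*-test",
    "must-gather", "additional-images", "RHOAI-Build-Config", "konflux-central",
]


def is_candidate(name, extra_excludes=()):
    """Return True if a repo name survives the initial exclusion filter."""
    patterns = list(EXCLUDE_PATTERNS) + list(extra_excludes)
    return not any(fnmatch(name, pattern) for pattern in patterns)


def candidate_repos(checkouts_dir, extra_excludes=()):
    """List subdirectories of checkouts_dir that pass the filter, sorted by name."""
    root = Path(checkouts_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and is_candidate(entry.name, extra_excludes)
    )
```

Note that `odh-cli` passes this filter because `*-ci` only matches names ending in `-ci`. The `extra_excludes` argument stands in for the comma-separated `--exclude` value after splitting.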
ODH and RHOAI use a DataScienceCluster (DSC) CR to manage platform components. The operator's DSC spec defines which components are user-togglable (can be set to Managed or Removed). This is the authoritative source for tier classification.
Step 1: Parse the DSC spec to find managed components.
Look in the operator repo for the DataScienceCluster types:
grep "json:" {operator_repo}/api/datasciencecluster/v2/datasciencecluster_types.go | grep -v "//"
This lists all component fields in the Components struct. Each field corresponds to a sub-operator or controller managed through the DSC. Most are user-togglable and therefore optional_platform; the always-on exceptions are listed under Step 3.
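For illustration, the same extraction can be done in Python when the field names are needed as a list rather than as grep output. This is a sketch under assumptions: the function name is hypothetical, and unlike `grep -v "//"` it skips only lines that start with a comment, so fields with trailing comments are kept.

```python
import re

# Matches Go struct tags such as `json:"kserve,omitempty"` and captures the field name.
JSON_TAG = re.compile(r'json:"([^",]+)')


def dsc_field_names(go_source):
    """Return JSON field names from struct tags, skipping commented-out lines."""
    names = []
    for line in go_source.splitlines():
        stripped = line.strip()
        if stripped.startswith("//"):
            continue
        match = JSON_TAG.search(stripped)
        # A tag of "-" means the field is not serialized, so it is not a DSC field.
        if match and match.group(1) != "-":
            names.append(match.group(1))
    return names
```

Like the grep command, this reads every struct in the file, so the result may include spec or status fields that are not components and need to be filtered by hand.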
Step 2: Map DSC field names to repos. The DSC field names (e.g., dashboard, kserve, ray) map to component controller directories:
ls {operator_repo}/internal/controller/components/
Each directory name maps to a DSC field. Cross-reference with the RELATED_IMAGE mappings (Step 5.1a) and operator bundle to determine which repo each controller deploys.
Step 3: Apply tier classification.
| Tier | Criteria | Examples |
|---|---|---|
| core_platform | The meta-operator itself AND components that are always deployed (not togglable via DSC) | rhods-operator, opendatahub-operator, odh-dashboard, notebooks, odh-model-controller |
| optional_platform | Components with a DSC field; user can set managementState: Managed/Removed | kserve, data-science-pipelines-operator, codeflare-operator, kuberay, kueue, trustyai-service-operator, model-registry-operator, modelmesh-serving, trainer, training-operator, spark-operator, feast, llama-stack-k8s-operator, mlflow-operator, models-as-a-service, workload-variant-autoscaler |
| payload_component | Shipped containers/libraries deployed BY core or optional components; not independently togglable | vllm, data-science-pipelines, model-registry, codeflare-sdk, modelmesh, rest-proxy |
Key distinction: optional_platform components are the ones a cluster admin toggles on/off. payload_component repos provide the container images or libraries that optional_platform operators deploy — they don't have their own DSC toggle.
Dashboard, notebooks, and odh-model-controller are core_platform even though they appear as DSC fields — they are always-on components required for the platform to function. The DSC fields for these exist for configuration, not for enable/disable.
Set discovery_method: "breadcrumb" in metadata.
Record the tier for each repo. This tiering drives the rest of the discovery process:
- core_platform + optional_platform → full breadcrumb exploration in Steps 3-5
- payload_component → include as component, lighter exploration
- ecosystem → skip breadcrumb exploration, go directly to excluded (can be pulled back in by dependency analysis in Step 5a/5b)

Limit candidate repos to those in the core_platform, optional_platform, and payload_component tiers. Do NOT treat every operator-shaped repo as an entry point.
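The tier-to-exploration routing above can be expressed as a small lookup. This is only a sketch: the mode labels (`full`, `light`, `skip`) and the function name are hypothetical, and treating an unknown tier as `skip` is an assumption.

```python
# Exploration depth per tier, following the routing rules above.
EXPLORATION_BY_TIER = {
    "core_platform": "full",
    "optional_platform": "full",
    "payload_component": "light",
    "ecosystem": "skip",
}


def plan_exploration(tiers):
    """Group repos by exploration mode; unknown tiers fall back to 'skip' (assumption)."""
    plan = {"full": [], "light": [], "skip": []}
    for repo, tier in sorted(tiers.items()):
        plan[EXPLORATION_BY_TIER.get(tier, "skip")].append(repo)
    return plan
```

Repos in the `skip` group go straight to the excluded list unless later dependency analysis pulls them back in.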
If --entry-repo specified, start there. Otherwise, search for common entry points:
Operator repos (high-value entry points):
- Contain `bundle/`, `config/manager/`, `operator.yaml`
- Named `*-operator`, `operator`

Installer repos:

- Contain `install.yml`, `site.yml`, `playbooks/`
- Named `installer`, `*-installer`, `deployment`

Platform repos:

- Named `platform`, `automation-platform`, `*-platform`

List discovered entry points and pick the best one (or use all).
For each entry point, look for references to other repos:
Search for container image references:
grep -r "image:" {entry_repo}/config/ {entry_repo}/manifests/ {entry_repo}/bundle/
Extract repo names from image paths like:
- `quay.io/ansible/awx-operator:latest` → `awx-operator`
- `registry.redhat.io/ansible/eda-server:1.0` → `eda-server`

Search for role/collection references:
grep -r "role:" {entry_repo}/
grep -r "collection:" {entry_repo}/
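As a sketch of the image-path extraction described above, the following Python takes the last path segment of each image reference and drops the tag or digest. The function names and the line-matching regex are assumptions made for illustration; real manifests may need a YAML parser instead.

```python
import re

# Matches YAML lines such as "image: quay.io/org/name:tag" or "- image: ...".
IMAGE_LINE = re.compile(r"^\s*(?:-\s*)?image:[ \t]*(\S+)", re.MULTILINE)


def repo_name_from_image(image_ref):
    """Return the last path segment of an image reference, without tag or digest."""
    ref = image_ref.strip().strip("\"'")
    ref = ref.split("@", 1)[0]            # drop a digest such as @sha256:...
    last_segment = ref.rsplit("/", 1)[-1]
    return last_segment.split(":", 1)[0]  # drop a tag such as :latest


def repo_names_in_text(text):
    """Collect the distinct repo names referenced by image: lines in a manifest."""
    return sorted({repo_name_from_image(ref) for ref in IMAGE_LINE.findall(text)})
```

Splitting off the last segment before removing the tag keeps registry ports (for example `localhost:5000/foo`) from being mistaken for tags.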
Python (requirements.txt, pyproject.toml):
find {entry_repo} -name "requirements*.txt" -o -name "pyproject.toml"
cat {found_files}
Look for patterns like:
- `django-ansible-base>=1.0.0` - First-party package (matches repo name)
- `-e git+https://github.com/ansible/django-ansible-base.git` - Editable install from git
- `file:///path/to/local/repo` - Local dependency

Go (go.mod):
find {entry_repo} -name "go.mod"
cat {found_files}
Look for:
- `github.com/ansible/common-lib v1.0.0` - First-party module
- `replace github.com/ansible/foo => ../foo` - Local replacement

Key insight: If a dependency name matches a repo in the checkouts directory, it's likely a first-party shared library!
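The name-matching idea can be sketched for Python requirement lines as follows. This is a best-effort illustration with hypothetical function names; it does not implement the full pip requirements grammar, and it treats `_` and `-` as equivalent when comparing names (an assumption).

```python
import re


def dependency_name(requirement_line):
    """Best-effort package or repo name from one requirements.txt line."""
    line = requirement_line.strip()
    if line.startswith("-e "):
        line = line[3:].strip()
    if not line or line.startswith(("#", "-")):
        return None  # blank line, comment, or another pip option such as -r
    if "://" in line:
        # URL or file path: use the last path segment, minus fragment, ref, and ".git".
        path = line.split("#", 1)[0].rstrip("/")
        name = path.rsplit("/", 1)[-1].split("@", 1)[0]
        return name[:-4] if name.endswith(".git") else name
    # Plain requirement: the name ends at the first version, extras, or marker character.
    return re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0] or None


def first_party_dependencies(requirement_lines, checkout_repos):
    """Return checkout repos whose names match a dependency name."""
    repos = {repo.lower().replace("_", "-"): repo for repo in checkout_repos}
    found = set()
    for line in requirement_lines:
        name = dependency_name(line)
        normalized = name.lower().replace("_", "-") if name else None
        if normalized in repos:
            found.add(repos[normalized])
    return sorted(found)
```

A match here is a hint rather than proof: a public package can share a name with a checkout, so each hit still deserves a quick look.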
cat {entry_repo}/.gitmodules
find {entry_repo} -path "*/.github/workflows/*.yml" -o -path "*/.gitlab-ci.yml"
cat {found_files}
Look for:
After tier classification (Step 2a) and entry point exploration (Steps 3-4), discover additional components via operand mappings and dependency analysis.
5.1a: Discover operands via RELATED_IMAGE mappings. The operator deploys sub-components via RELATED_IMAGE_* environment variable mappings defined in *_support.go files.
Run the helper script to parse these mappings and match them to repos:
python ${CLAUDE_SKILL_DIR}/scripts/parse_related_images.py {operator_repo} {checkouts_dir1} {checkouts_dir2} ...
The script scans internal/controller/components/*/*_support.go for imageParamMap entries, normalizes the image keys, and matches them against repos in the checkouts directories. Output is JSON with repos (matched) and unmatched sections.
For each matched repo in the output, if not already in the component list, add as discovered_via: "operator_operand", referenced_by: ["{operator-name}"]. Use the tier already assigned in Step 2a if the repo was classified there; otherwise default to tier: "payload_component".
Binding rule: Any repo matched by parse_related_images.py MUST be included as a component. Do NOT override these matches by reclassifying the repo as excluded. The script identifies operands that the operator ships — if the operator references a container image built from a repo, that repo is a shipped component regardless of whether it looks like "infrastructure" or a "utility." The only exception is build infrastructure repos like RHOAI-Build-Config itself.
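The merge rule above (add only repos not already present, keep an existing tier, default to payload_component) might look like this in Python. It is a sketch under assumptions: `merge_matches` is a hypothetical helper, and `matched_repos` is assumed to be an iterable of repo names taken from the script's `repos` section, whose exact shape is not specified here.

```python
def merge_matches(components, matched_repos, discovered_via, referenced_by, tiers=None):
    """Add matched repos that are missing from components; return the names added."""
    tiers = tiers or {}
    added = []
    for repo in matched_repos:
        if repo in components:
            continue  # already discovered; keep its existing tier and provenance
        components[repo] = {
            "key": repo,
            # Use the Step 2a tier when one was assigned, else the documented default.
            "tier": tiers.get(repo, "payload_component"),
            "discovered_via": discovered_via,
            "referenced_by": list(referenced_by),
        }
        added.append(repo)
    return added
```

For the RELATED_IMAGE step the call would pass `discovered_via="operator_operand"` and the operator name in `referenced_by`. Because matched repos are only ever added, never removed, the binding rule holds by construction.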
5.1b: Discover operands via OLM catalog relatedImages (RHOAI). RHOAI has a build config repo (RHOAI-Build-Config) containing OLM catalog YAML with relatedImages sections — the authoritative list of every container image shipped in each version.
Run the helper script to parse the catalog and match images to repos:
python ${CLAUDE_SKILL_DIR}/scripts/parse_catalog_images.py --find-catalog {checkouts_dir1} {checkouts_dir2} ...
Or to target a specific version:
python ${CLAUDE_SKILL_DIR}/scripts/parse_catalog_images.py --find-catalog --version rhoai-3.4 {checkouts_dir1} {checkouts_dir2} ...
The script auto-finds the best RHOAI-Build-Config/catalog/ directory, extracts unique images from relatedImages sections, and matches them to repos using multi-step normalization (strip odh- prefix, -rhel[0-9] suffix, hardware variant suffixes) plus known name mappings (e.g., ml-pipelines-* → data-science-pipelines, dashboard → odh-dashboard).
The catalog directories are versioned by RHOAI release, not by checkout branch — RHOAI-Build-Config is typically checked out at head but contains catalogs for all historical versions. If no catalog directory exactly matches the target platform version, the script uses the latest available version as the best approximation.
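To make the multi-step normalization concrete, here is a rough Python sketch. It is not the code of parse_catalog_images.py: the function names, the order of steps, and the hardware suffix list are hypothetical. Only the `odh-` prefix, the `-rhel[0-9]` suffix, and the two name mappings come from the description above.

```python
import re

KNOWN_NAMES = {"dashboard": "odh-dashboard"}                  # mapping named in the text
KNOWN_PREFIXES = {"ml-pipelines-": "data-science-pipelines"}  # mapping named in the text
HARDWARE_SUFFIXES = ("-cuda", "-rocm", "-cpu")                # hypothetical examples


def normalize_image_name(image_name):
    """Return candidate repo names for an image, most specific first."""
    name = re.sub(r"-rhel[0-9]+$", "", image_name)
    candidates = [image_name, name]
    for suffix in HARDWARE_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            candidates.append(name)
            break
    if name.startswith("odh-"):
        name = name[len("odh-"):]
        candidates.append(name)
    if name in KNOWN_NAMES:
        candidates.append(KNOWN_NAMES[name])
    for prefix, repo in KNOWN_PREFIXES.items():
        if name.startswith(prefix):
            candidates.append(repo)
    return list(dict.fromkeys(candidates))  # deduplicate, preserving order


def match_repo(image_name, checkout_repos):
    """Return the first candidate that is a checkout repo, or None."""
    repos = set(checkout_repos)
    for candidate in normalize_image_name(image_name):
        if candidate in repos:
            return candidate
    return None
```

Trying the most specific candidate first means a repo named exactly like the image wins over a looser match.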
Output is JSON with:
- repos: matched repos with image names and match method
- unmatched: images with no repo match (third-party base images, internal tools)
- variant_groups: images that are build variants of the same component

For each matched repo, if not already in the component list, add as discovered_via: "container_image", referenced_by: ["rhods-operator"]. Use the tier already assigned in Step 2a if the repo was classified there; otherwise default to tier: "payload_component". Do NOT add RHOAI-Build-Config itself as a component: it is build infrastructure, not a shipped component. Unmatched images that are clearly third-party (ubi-*, ose-*, postgresql-*, etcd) should be ignored. Other unmatched images may represent components without source repos in the checkouts; note them but don't block on them.
Binding rule: Any repo matched by parse_catalog_images.py MUST be included as a component. Do NOT override these matches by reclassifying the repo as excluded. The OLM catalog's relatedImages is the authoritative list of container images shipped in the product — if a repo's image is in the catalog, the repo is a shipped component regardless of whether it looks like "infrastructure," a "utility," or "covered by" another component. The only exception is build infrastructure repos like RHOAI-Build-Config itself.
5.2: Scan go.mod for shared libraries. Scan go.mod (or equivalent) of each discovered core_platform and optional_platform component. Look for first-party dependencies (same GitHub org) that match repos in the checkouts directory. This is how shared libraries like library-go, api, client-go get discovered.
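A minimal sketch of the go.mod scan, assuming the organization is identified by a module path prefix such as `github.com/ansible` (taken from the earlier examples). The function names and regexes are illustrative; `go mod edit -json` would be a more robust source where the Go toolchain is available.

```python
import re

# "require" entries, either single-line or inside a require ( ... ) block.
REQUIRE_LINE = re.compile(r"^\s*(?:require\s+)?([\w.\-]+(?:/[\w.\-]+)+)\s+v\S+")
# "replace" entries: old path, optional version, =>, new path.
REPLACE_LINE = re.compile(r"^\s*(?:replace\s+)?(\S+)(?:\s+v\S+)?\s+=>\s+(\S+)")


def repo_for_target(target, org_prefix):
    """Map a module path or local replacement path to a repo name, or None."""
    if target.startswith(("./", "../")):
        return target.rstrip("/").rsplit("/", 1)[-1]
    prefix = org_prefix.rstrip("/") + "/"
    if target.startswith(prefix):
        return target[len(prefix):].split("/", 1)[0]
    return None


def first_party_modules(go_mod_text, org_prefix, checkout_repos):
    """Return checkout repos referenced as same-org modules or local replacements."""
    repos = set(checkout_repos)
    found = set()
    for line in go_mod_text.splitlines():
        line = line.split("//", 1)[0]  # drop comments such as "// indirect"
        replace = REPLACE_LINE.match(line)
        if replace:
            targets = [replace.group(1), replace.group(2)]
        else:
            require = REQUIRE_LINE.match(line)
            targets = [require.group(1)] if require else []
        for target in targets:
            name = repo_for_target(target, org_prefix)
            if name in repos:
                found.add(name)
    return sorted(found)
```

Taking the first path segment after the org prefix means sub-module paths and major-version suffixes (for example `.../repo/v2`) still resolve to the repo name.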
As you discover references:
- Record discovered_via and referenced_by
- Set shipped: true if deployed directly
- Update dependency_graph; even in signal mode, record which components depend on which

Track the dependency graph:
{
"kserve": ["kubeflow", "gateway-api-inference-extension"],
"data-science-pipelines-operator": ["data-science-pipelines", "ml-metadata", "argo-workflows"],
"training-operator": ["kubeflow"],
...
}
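Since the graph maps each component to what it depends on, the referenced_by lists can be derived by inverting it. A small sketch, with a hypothetical function name:

```python
def referenced_by(dependency_graph):
    """Invert {component: [dependencies]} into {dependency: [components that use it]}."""
    reverse = {}
    for component, dependencies in dependency_graph.items():
        for dependency in dependencies:
            reverse.setdefault(dependency, []).append(component)
    return {name: sorted(users) for name, users in sorted(reverse.items())}
```

With the example graph above, `kubeflow` would come back with both `kserve` and `training-operator` as referrers.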
See classification heuristics for:
Repos not discovered via DSC spec, RELATED_IMAGE mappings, OLM catalog, or dependency analysis are ecosystem tier and should be excluded unless they were pulled in as a shared library (Step 5a) or API specification (Step 5b).
Definitely not shipped (exclude):
- Build infrastructure (`konflux-central`, `RHOAI-Build-Config`)
- `must-gather`, `rhoai-additional-images`

See consensus review procedure for the full multi-reviewer consensus process: when to trigger it, the 3 reviewer prompts (structural, relational, functional), vote aggregation rules, and how to record consensus results in the component map.
For each discovered component, check if GENERATED_ARCHITECTURE.md exists:
ls {checkouts_dir}/{repo_name}/GENERATED_ARCHITECTURE.md
Set has_architecture: true/false accordingly.
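The same check can be applied to the whole component map at once. This sketch assumes each dict key in `components` is also the repo's directory name under the checkouts directory; the function name is hypothetical.

```python
from pathlib import Path


def mark_architecture(components, checkouts_dir):
    """Set has_architecture on every component based on the file's presence."""
    root = Path(checkouts_dir)
    for repo_name, entry in components.items():
        summary = root / repo_name / "GENERATED_ARCHITECTURE.md"
        entry["has_architecture"] = summary.is_file()
    return components
```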
See output schema for the full component-map.json schema including metadata, component fields, dependency_graph, and excluded sections.
Important structural requirements:
- components must be a dict keyed by component key, not a list
- excluded must be a dict keyed by repo name, not a list
- Each component's key field must match its dict key

Write to architecture/{platform}/component-map.json.
Always overwrite the existing file if one is present — the user is re-running discovery to get updated results. Do NOT skip writing because the file already exists.
After writing, run the validation script to catch schema errors before reporting success:
python ${CLAUDE_SKILL_DIR}/scripts/validate_component_map.py architecture/{platform}/component-map.json
If validation fails, read the errors, fix the JSON, re-write the file, and re-validate. Do not proceed to Step 10 until validation passes.
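The three structural requirements can also be checked in memory before writing the file. The sketch below is not validate_component_map.py and covers only those three rules; the function name and error wording are hypothetical.

```python
def structural_errors(component_map):
    """Return a list of violations of the three structural requirements."""
    errors = []
    components = component_map.get("components")
    if not isinstance(components, dict):
        errors.append("components must be a dict keyed by component key")
    else:
        for dict_key, entry in components.items():
            if not isinstance(entry, dict) or entry.get("key") != dict_key:
                errors.append(f"components[{dict_key!r}]: key field must match its dict key")
    if not isinstance(component_map.get("excluded"), dict):
        errors.append("excluded must be a dict keyed by repo name")
    return errors
```

An empty list means the map passes these checks; the validation script remains the authority on the full schema.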
See output schema for the full report summary template. Output includes platform info, discovery method, component counts, tiered component lists, consensus-reviewed repos, and next steps.
See classification heuristics for full include/exclude criteria, confidence levels, shared library detection methods, and special cases.
component-map.json after generation

See common mistakes for the 5 most frequent classification errors and how to avoid them.