---
name: Prow Job Analyze Metal Install Failure
description: Analyze OpenShift bare metal installation failures in Prow CI jobs using dev-scripts artifacts. Use for jobs with "metal" in name, for debugging Metal3/Ironic provisioning, installation, or dev-scripts setup failures. You may also use the prow-job-analyze-install-failure skill with this one.
---
## Overview

This skill helps debug OpenShift bare metal installation failures in CI jobs by analyzing dev-scripts logs, libvirt console logs, sosreports, and other metal-specific artifacts.
Use this skill when:

- The Prow job has "metal" in its name
- Debugging Metal3/Ironic provisioning, installation, or dev-scripts setup failures

This skill is invoked by the main prow-job-analyze-install-failure skill when it detects a metal job.
## Background

Metal IPI jobs use dev-scripts (https://github.com/openshift-metal3/dev-scripts) with Metal3 and Ironic to install OpenShift. The installation process has multiple layers, and failures can occur at any of them, so analysis must check all layers.
IMPORTANT: When analyzing failures in "metal-ipi-ovn-ipv6" jobs, remember that the term "disconnected" refers to the cluster nodes, NOT the hypervisor.
## Prerequisites

### gcloud CLI installation

Verify the CLI is installed:

```bash
which gcloud
```

### gcloud authentication (optional)

The test-platform-results bucket is publicly accessible, so authentication is normally not required.

## Inputs

The user will provide the Prow job details used in the paths below ({original-url}, {bucket-path}, {build_id}).
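The prerequisite check above can be scripted as a small pre-flight step. This is an illustrative sketch, not part of the skill itself:

```shell
# Pre-flight check: confirm the gcloud CLI is on PATH before any downloads.
if command -v gcloud >/dev/null 2>&1; then
  status="gcloud found at $(command -v gcloud)"
else
  status="gcloud not found - install the Google Cloud SDK first"
fi
echo "$status"
```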
## Key artifacts

Metal jobs produce several diagnostic archives:

- `{target}/ofcir-acquire/build-log.txt`: Log showing pool, provider, and host details
- `artifacts/junit_metal_setup.xml`: JUnit file with the test `[sig-metal] should get working host from infra provider`
- `{target}/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/`: dev-scripts setup logs; installer logs (`.openshift_install*.log`) will also be present in the devscripts folders
- `{target}/baremetalds-devscripts-gather/artifacts/`: log bundle containing:
  - `bootstrap/journals/ironic.log` and `bootstrap/journals/metal3-baremetal-operator.log`
  - `control-plane/{node-ip}/containers/metal3-ironic-*.log` and `control-plane/{node-ip}/containers/metal3-baremetal-operator-*.log`

## Download artifacts

### Download OFCIR logs
```bash
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/ofcir-acquire/build-log.txt .work/prow-job-analyze-install-failure/{build_id}/logs/ofcir-build-log.txt --no-user-output-enabled 2>&1 || echo "OFCIR build log not found"
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/ofcir-acquire/artifacts/junit_metal_setup.xml .work/prow-job-analyze-install-failure/{build_id}/logs/junit_metal_setup.xml --no-user-output-enabled 2>&1 || echo "OFCIR JUnit not found"
```
### Check junit_metal_setup.xml for acquisition failure

Look for a failure of the test `[sig-metal] should get working host from infra provider`.

### Extract OFCIR details from build-log.txt

- `pool`: The OFCIR pool name
- `provider`: The infrastructure provider
- `name`: The host name allocated

If OFCIR acquisition failed, the installation never started; report the acquisition failure rather than continuing the install analysis.
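The field extraction can be sketched as below. The `key: value` line format and the sample values are assumptions for illustration; adjust the patterns to the real build-log.txt:

```shell
# Extract pool/provider/name from a (fabricated) OFCIR build log sample.
log=ofcir-build-log-sample.txt
cat > "$log" <<'EOF'
acquiring host from infra provider
pool: cipool-small
provider: example-provider
name: host-42
EOF
details=$(grep -E "^(pool|provider|name):" "$log")
echo "$details"
rm -f "$log"
```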
### Download dev-scripts logs directory

```bash
gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/ .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/ --no-user-output-enabled
```
Handle missing dev-scripts logs gracefully: if the directory does not exist, note it and continue with the remaining artifacts.
### Find and download libvirt-logs.tar

```bash
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "libvirt-logs\.tar$"
gcloud storage cp {full-gcs-path-to-libvirt-logs.tar} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
```
### Extract libvirt logs

```bash
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/libvirt-logs.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
```
### Download sosreport (optional)

```bash
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "sosreport.*\.tar\.xz$"
gcloud storage cp {full-gcs-path-to-sosreport} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/sosreport-{name}.tar.xz -C .work/prow-job-analyze-install-failure/{build_id}/logs/
```
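Once extracted, the sosreport's system log can be scanned for hypervisor-level problems (libvirt errors, OOM kills, disk full). A minimal sketch with a fabricated sample log:

```shell
# Scan a sosreport-style var/log/messages for hypervisor issues.
mkdir -p sosreport-demo/var/log
cat > sosreport-demo/var/log/messages <<'EOF'
Jan 1 00:00:01 host libvirtd: internal error: qemu unexpectedly closed the monitor
Jan 1 00:00:02 host kernel: Out of memory: Killed process 1234 (qemu-kvm)
Jan 1 00:00:03 host systemd: Started session 1 of user root.
EOF
hits=$(grep -iE "libvirt.*(error|fail)|out of memory|no space left" \
  sosreport-demo/var/log/messages)
echo "$hits"
rm -rf sosreport-demo
```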
### Download squid-logs (optional, for IPv6/disconnected jobs)

```bash
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "squid-logs.*\.tar$"
gcloud storage cp {full-gcs-path-to-squid-logs} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/squid-logs-{name}.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
```
## Analyze the artifacts

### Analyze dev-scripts logs

Check dev-scripts logs FIRST - they show what happened during setup and installation.

Read the dev-scripts logs in order, including the `.openshift_install*.log` files in the devscripts directories. Key errors to look for: setup failures (host configuration, Ironic setup, installer build) and installer errors in the `.openshift_install*.log` files present in the devscripts folders.

Important distinction: if dev-scripts failed, the problem is in the setup process; if dev-scripts succeeded, the problem is in the cluster installation itself.

Save dev-scripts analysis to `.work/prow-job-analyze-install-failure/{build_id}/analysis/devscripts-summary.txt`.

### Analyze Ironic logs

CRITICAL: Check the RIGHT Ironic logs based on what failed.
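A quick way to surface installer errors from the dev-scripts logs is to grep for error-level lines. The sample `.openshift_install` log content below is fabricated for illustration; real logs sit under `logs/devscripts/`:

```shell
# Pull error/fatal lines out of installer logs in a devscripts directory.
mkdir -p devscripts-demo
cat > devscripts-demo/.openshift_install-demo.log <<'EOF'
time="T1" level=debug msg="still waiting for bootstrap"
time="T2" level=error msg="Bootstrap failed to complete: timed out"
EOF
errors=$(grep -hE "level=(error|fatal)" devscripts-demo/.openshift_install*.log)
echo "$errors"
rm -rf devscripts-demo
```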
The log bundle contains TWO sets of Ironic logs in different locations. Which logs to check:

- `bootstrap/journals/ironic.log` - bootstrap Ironic, used for master provisioning
- `control-plane/{ip}/containers/metal3-ironic-*.log` - control-plane Ironic, used for worker provisioning

### Download and extract log bundle
```bash
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "log-bundle.*\.tar$"
gcloud storage cp {full-gcs-path-to-log-bundle.tar} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/log-bundle-*.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
```
### Find ALL Ironic logs

```bash
# Bootstrap Ironic (master provisioning)
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/bootstrap/journals/ironic.log"
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/bootstrap/journals/metal3-baremetal-operator.log"

# Control-plane Ironic (worker provisioning) - CRITICAL for worker failures
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/control-plane/*/containers/metal3-ironic-*.log"
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/control-plane/*/containers/metal3-baremetal-operator-*.log"
```
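The find commands above can be combined into one pass that lists which Ironic logs actually contain errors. This sketch builds a fabricated directory tree mirroring the log-bundle layout:

```shell
# List Ironic log files (both locations) that contain ERROR lines.
mkdir -p bundle/bootstrap/journals bundle/control-plane/10.0.0.5/containers
echo "ERROR ironic.conductor: timeout waiting for node b7fa5b83" \
  > bundle/bootstrap/journals/ironic.log
echo "ERROR deploy step failed for node abc-123" \
  > bundle/control-plane/10.0.0.5/containers/metal3-ironic-demo.log
files_with_errors=$(find bundle \
  \( -path "*/journals/ironic.log" -o -name "metal3-ironic-*.log" \) \
  -exec grep -l "ERROR" {} +)
echo "$files_with_errors"
rm -rf bundle
```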
Analyze the Ironic logs:

- For master provisioning issues, check the bootstrap logs: `bootstrap/journals/ironic.log` and `bootstrap/journals/metal3-baremetal-operator.log`.
- For worker provisioning issues, check the control-plane logs: `control-plane/{node-ip}/containers/metal3-ironic-*.log`.

Map node UUIDs to BareMetalHost names: Ironic refers to nodes by UUID (e.g. `b7fa5b83-91d0-46ee-acd2-e4b33e9ac983`), so translate these to BareMetalHost names when reporting.

Save Ironic analysis to `.work/prow-job-analyze-install-failure/{build_id}/analysis/ironic-summary.txt`.

### Analyze console logs

Console logs are CRITICAL for metal failures during cluster creation.
Find console logs:

```bash
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -name "*console*.log"
```

Expect files such as `{cluster-name}-bootstrap_console.log` and `{cluster-name}-master-{N}_console.log`.

Analyze the console logs for boot/provisioning issues: they show the complete boot sequence (kernel boot, Ignition, network configuration).
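A simple pattern scan catches the most common boot-sequence failures. The console excerpt below is fabricated for illustration:

```shell
# Flag Ignition failures, kernel panics, and emergency mode in a console log.
cat > demo-master-0_console.log <<'EOF'
[    1.234567] Booting Linux kernel
ignition[712]: GET error: timed out fetching config
EOF
boot_errors=$(grep -iE "ignition.*(error|fail)|kernel panic|emergency mode" \
  demo-master-0_console.log)
echo "$boot_errors"
rm -f demo-master-0_console.log
```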
Save console log analysis to `.work/prow-job-analyze-install-failure/{build_id}/analysis/console-summary.txt`.

### Check sosreport (hypervisor diagnostics)

Only needed for hypervisor-level issues.
Check the sosreport for hypervisor diagnostics:

- `var/log/messages` - Hypervisor system logs
- `sos_commands/` - Output of diagnostic commands
- `etc/libvirt/` - Libvirt configuration

Look for hypervisor-level issues such as resource constraints and libvirt errors.
### Check squid proxy logs

Important for debugging CI access to the cluster. Note that squid logs show CI access to the cluster, not the cluster's registry access. Common issues include failed CI connections to the cluster and proxy misconfiguration.
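Denied or failing CI connections stand out in squid's access log. The sample lines below follow squid's default access.log layout but are fabricated:

```shell
# Find denied or error responses in a squid access log sample.
cat > access-sample.log <<'EOF'
1700000000.123     45 ::1 TCP_TUNNEL/200 4512 CONNECT api.example:6443 - HIER_DIRECT/fd2e::5 -
1700000001.456      1 ::1 TCP_DENIED/403 3617 CONNECT 10.0.0.9:6443 - HIER_NONE/- text/html
EOF
denied=$(grep -E "TCP_DENIED|/(403|407|503) " access-sample.log)
echo "$denied"
rm -f access-sample.log
```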
## Generate the report

Create a comprehensive metal analysis report:

```
Metal Installation Failure Analysis
====================================

Job: {job-name}
Build ID: {build_id}
Prow URL: {original-url}
Installation Method: dev-scripts + Metal3 + Ironic

OFCIR Host Acquisition
----------------------
Pool: {pool name from OFCIR build log}
Provider: {provider from OFCIR build log}
Host: {host name from OFCIR build log}
Status: {Success or Failure}
{If OFCIR acquisition failed, note that installation never started}

Dev-Scripts Analysis
--------------------
{Summary of dev-scripts logs}

Key Findings:
- {First error in dev-scripts setup}
- {Related errors}

If dev-scripts failed: The problem is in the setup process (host config, Ironic, installer build)
If dev-scripts succeeded: The problem is in cluster installation (see main analysis)

Console Logs Analysis
---------------------
{Summary of VM/node console logs}

Bootstrap Node:
- {Boot sequence status}
- {Ignition status}
- {Network configuration}
- {Key errors}

Master Nodes:
- {Status for each master}
- {Key errors}

Hypervisor Diagnostics (sosreport)
-----------------------------------
{Summary of sosreport findings, if applicable}

Proxy Logs (squid)
------------------
{Summary of proxy logs, if applicable}
Note: Squid logs show CI access to the cluster, not cluster's registry access

Metal-Specific Recommended Steps
---------------------------------
Based on the failure:

For dev-scripts setup failures:
- Review host configuration (networking, DNS, storage)
- Check Ironic/Metal3 setup logs for BMC/provisioning issues
- Verify installer build completed successfully
- Check installer logs in devscripts folders

For console boot failures:
- Check Ignition configuration and network connectivity
- Review kernel boot messages for hardware issues
- Verify network configuration (DHCP, DNS, routing)

For CI access issues:
- Check squid proxy logs for failed CI connections to cluster
- Verify network routing between CI and cluster
- Check proxy configuration

Artifacts Location
------------------
Dev-scripts logs: .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/
Console logs: .work/prow-job-analyze-install-failure/{build_id}/logs/
sosreport: .work/prow-job-analyze-install-failure/{build_id}/logs/sosreport-*/
squid logs: .work/prow-job-analyze-install-failure/{build_id}/logs/squid-logs-*/
```

Save the report to `.work/prow-job-analyze-install-failure/{build_id}/analysis/metal-analysis.txt`.

## Common metal failure patterns

| Issue | Symptoms | Where to Look |
|---|---|---|
| Dev-scripts host config | Early failure before cluster creation | Dev-scripts logs (host configuration step) |
| Ironic/Metal3 setup | Provisioning failures, BMC errors | Dev-scripts logs (Ironic setup) |
| BMC communication | BareMetalHost stuck registering, power state failures | Ironic logs (in log-bundle), BareMetalHost status |
| Node boot failure | VMs/nodes won't boot | Console logs (kernel, boot sequence) |
| Ignition failure | Nodes boot but don't provision | Console logs (Ignition messages) |
| Network config | DHCP failures, DNS issues | Console logs (network messages), dev-scripts host config |
| CI access issues | Tests can't connect to cluster | squid logs (proxy logs for CI → cluster access) |
| Hypervisor issues | Resource constraints, libvirt errors | sosreport (system logs, libvirt config) |