
COS-3013: overlay node image before bootstrapping if necessary #8742

Merged: 3 commits into openshift:main on Jan 18, 2025

Conversation

@jlebon (Member) commented Jul 16, 2024

As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio binaries for example, which bootstrapping obviously relies on.

Instead, now we change things up so that early on when booting the bootstrap node, we pull down the node image, unencapsulate it (this just means convert it back to an OSTree commit), then mount over its /usr, and import new /etc content.

This is done by isolating to a different systemd target to only bring up the minimum number of services to do the pivot and then carry on with bootstrapping.
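Mechanically, "isolating to a different systemd target" means the bootstrap Ignition config ships a minimal target plus an overlay service, and early boot runs `systemctl isolate` to it. A hypothetical sketch of what such units could look like (the names `node-image-overlay.target` and `node-image-overlay.service` appear later in this PR; all other details here are assumptions, not the PR's actual units):

```ini
# node-image-overlay.target (sketch, not the PR's actual unit)
[Unit]
Description=Minimal target for overlaying the node image
Requires=basic.target network-online.target node-image-overlay.service
After=basic.target network-online.target

# node-image-overlay.service (sketch, not the PR's actual unit)
[Unit]
Description=Overlay the node image over the bootimage userspace
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/node-image-overlay.sh   # hypothetical script path
```

Once the overlay service succeeds, a follow-up unit (split out later in this PR) isolates back to multi-user.target so normal bootstrapping resumes.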

This does not incur additional reboots and should be compatible with AI/ABI/SNO. But it is of course, a huge conceptual shift in how bootstrapping works. With this, we would now always be sure that we're using the same binaries as the target version as part of bootstrapping, which should alleviate some issues such as AI late-binding (see e.g. https://issues.redhat.com/browse/MGMT-16705).

The big exception of course being the kernel. Relatedly, note we do persist /usr/lib/modules from the booted system so that loading kernel modules still works.
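Concretely, the overlay flow described above amounts to something like the following dry-run sketch. The image pullspec, repository path, commit name, and exact command lines are illustrative assumptions, not the PR's actual scripts; `run` prints each command instead of executing it, since the real steps need a bootstrap host:

```shell
#!/bin/sh
# Dry-run sketch of the node-image overlay flow (assumptions noted above).
run() { echo "+ $*"; }   # print commands rather than executing them

NODE_IMAGE="quay.io/example/node-image:latest"   # hypothetical pullspec
REPO=/run/ostree-container/repo
KVER="5.14.0-example"                            # stands in for $(uname -r)

# Keep a copy of the booted kernel's modules before /usr is replaced,
# so that loading kernel modules keeps working afterwards.
run cp -a "/usr/lib/modules/$KVER" /run/modules-backup

# Pull the node image and unencapsulate it back into an OSTree commit.
run ostree container image pull "$REPO" "ostree-unverified-registry:$NODE_IMAGE"

# Check out the commit and mount its /usr over the running system's /usr.
run ostree checkout --repo "$REPO" example-commit /run/node-image
run mount --bind /run/node-image/usr /usr

# Restore the booted kernel's modules over the overlaid /usr.
run mount --bind /run/modules-backup "/usr/lib/modules/$KVER"

# Import new /etc content from the node image.
run cp -a /run/node-image/etc/. /etc/
```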

To be conservative, the new logic only kicks in when using bootimages which do not have oc. This will allow us to ratchet this in more easily.
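Per the review discussion further down, the "does the bootimage have oc" gate checks for the `openshift-clients` RPM. A small sketch of that logic (the helper-function structure is an assumption for illustration; the real script calls `rpm -q openshift-clients` directly):

```shell
#!/bin/sh
# Sketch of the conservative gate: only do the overlay when the bootimage
# does not already ship `oc`. The real script checks `rpm -q openshift-clients`;
# the check is parameterized here so the logic can run without rpm.
needs_overlay() {
  has_openshift_clients="$1"   # "yes" or "no"
  [ "$has_openshift_clients" = "no" ]
}

if needs_overlay "no"; then
  echo "bootimage lacks oc: overlaying node image before bootstrapping"
fi
```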

Down the line, we should be able to replace some of this with bootc apply-live once that's available (and also works in a live environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2024
openshift-ci bot (Contributor) commented Jul 16, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jlebon (Member Author) commented Jul 16, 2024

/test e2e-aws-ovn

@jlebon (Member Author) commented Jul 17, 2024

/retest

@jlebon (Member Author) commented Jul 17, 2024

✔️ ci/prow/e2e-aws-ovn — Job succeeded
✔️ ci/prow/e2e-openstack-ovn — Job succeeded

Sweet! 🎉

Looks like e2e-aws-ovn-heterogeneous failed on a different issue.

@jlebon (Member Author) commented Jul 17, 2024

/retest

@jlebon (Member Author) commented Jul 17, 2024

/test e2e-agent-compact-ipv4
/test e2e-agent-sno-ipv4-pxe

@zaneb (Member) commented Jul 17, 2024

/test ?

openshift-ci bot (Contributor) commented Jul 17, 2024

@zaneb: The following commands are available to trigger required jobs:

  • /test agent-integration-tests
  • /test altinfra-images
  • /test aro-unit
  • /test e2e-agent-compact-ipv4
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-edge-zones-manifest-validation
  • /test e2e-aws-ovn-upi
  • /test e2e-azure-ovn
  • /test e2e-azure-ovn-upi
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-upi
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack-ovn
  • /test e2e-vsphere-ovn
  • /test e2e-vsphere-ovn-upi
  • /test gofmt
  • /test golint
  • /test govet
  • /test images
  • /test okd-images
  • /test okd-unit
  • /test okd-verify-codegen
  • /test openstack-manifests
  • /test shellcheck
  • /test terraform-images
  • /test terraform-verify-vendor
  • /test tf-lint
  • /test unit
  • /test verify-codegen
  • /test verify-vendor
  • /test yaml-lint

The following commands are available to trigger optional jobs:

  • /test altinfra-e2e-aws-custom-security-groups
  • /test altinfra-e2e-aws-ovn
  • /test altinfra-e2e-aws-ovn-fips
  • /test altinfra-e2e-aws-ovn-imdsv2
  • /test altinfra-e2e-aws-ovn-localzones
  • /test altinfra-e2e-aws-ovn-proxy
  • /test altinfra-e2e-aws-ovn-public-ipv4-pool
  • /test altinfra-e2e-aws-ovn-shared-vpc
  • /test altinfra-e2e-aws-ovn-shared-vpc-local-zones
  • /test altinfra-e2e-aws-ovn-shared-vpc-wavelength-zones
  • /test altinfra-e2e-aws-ovn-single-node
  • /test altinfra-e2e-aws-ovn-wavelengthzones
  • /test altinfra-e2e-azure-capi-ovn
  • /test altinfra-e2e-azure-ovn-shared-vpc
  • /test altinfra-e2e-gcp-capi-ovn
  • /test altinfra-e2e-gcp-ovn-byo-network-capi
  • /test altinfra-e2e-gcp-ovn-secureboot-capi
  • /test altinfra-e2e-gcp-ovn-xpn-capi
  • /test altinfra-e2e-ibmcloud-capi-ovn
  • /test altinfra-e2e-nutanix-capi-ovn
  • /test altinfra-e2e-openstack-capi-ccpmso
  • /test altinfra-e2e-openstack-capi-ccpmso-zone
  • /test altinfra-e2e-openstack-capi-dualstack
  • /test altinfra-e2e-openstack-capi-dualstack-upi
  • /test altinfra-e2e-openstack-capi-dualstack-v6primary
  • /test altinfra-e2e-openstack-capi-externallb
  • /test altinfra-e2e-openstack-capi-nfv-intel
  • /test altinfra-e2e-openstack-capi-ovn
  • /test altinfra-e2e-openstack-capi-proxy
  • /test altinfra-e2e-powervs-capi-ovn
  • /test altinfra-e2e-vsphere-capi-multi-vcenter-ovn
  • /test altinfra-e2e-vsphere-capi-ovn
  • /test altinfra-e2e-vsphere-capi-static-ovn
  • /test altinfra-e2e-vsphere-capi-zones
  • /test azure-ovn-marketplace-images
  • /test e2e-agent-compact-ipv4-appliance-diskimage
  • /test e2e-agent-compact-ipv4-none-platform
  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv4-pxe
  • /test e2e-agent-sno-ipv6
  • /test e2e-aws-overlay-mtu-ovn-1200
  • /test e2e-aws-ovn-edge-zones
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-heterogeneous
  • /test e2e-aws-ovn-imdsv2
  • /test e2e-aws-ovn-proxy
  • /test e2e-aws-ovn-public-subnets
  • /test e2e-aws-ovn-shared-vpc-custom-security-groups
  • /test e2e-aws-ovn-shared-vpc-edge-zones
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-upi-proxy
  • /test e2e-azure-ovn-resourcegroup
  • /test e2e-azure-ovn-shared-vpc
  • /test e2e-azurestack
  • /test e2e-azurestack-upi
  • /test e2e-crc
  • /test e2e-external-aws
  • /test e2e-external-aws-ccm
  • /test e2e-gcp-ovn-byo-vpc
  • /test e2e-gcp-ovn-xpn
  • /test e2e-gcp-secureboot
  • /test e2e-gcp-upgrade
  • /test e2e-gcp-upi-xpn
  • /test e2e-ibmcloud-ovn
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi-ovn
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-swapped-hosts
  • /test e2e-metal-ipi-ovn-virtualmedia
  • /test e2e-metal-single-node-live-iso
  • /test e2e-nutanix-ovn
  • /test e2e-openstack-ccpmso
  • /test e2e-openstack-ccpmso-zone
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-dualstack-upi
  • /test e2e-openstack-externallb
  • /test e2e-openstack-nfv-intel
  • /test e2e-openstack-proxy
  • /test e2e-vsphere-ovn-techpreview
  • /test e2e-vsphere-ovn-upi-zones
  • /test e2e-vsphere-ovn-zones
  • /test e2e-vsphere-ovn-zones-techpreview
  • /test e2e-vsphere-static-ovn
  • /test okd-e2e-agent-compact-ipv4
  • /test okd-e2e-agent-ha-dualstack
  • /test okd-e2e-agent-sno-ipv6
  • /test okd-e2e-aws-ovn
  • /test okd-e2e-aws-ovn-upgrade
  • /test okd-e2e-gcp
  • /test okd-e2e-gcp-ovn-upgrade
  • /test okd-e2e-vsphere
  • /test okd-scos-e2e-aws-ovn
  • /test okd-scos-images
  • /test tf-fmt

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-installer-master-altinfra-images
  • pull-ci-openshift-installer-master-aro-unit
  • pull-ci-openshift-installer-master-e2e-aws-ovn
  • pull-ci-openshift-installer-master-e2e-aws-ovn-heterogeneous
  • pull-ci-openshift-installer-master-e2e-openstack-ovn
  • pull-ci-openshift-installer-master-gofmt
  • pull-ci-openshift-installer-master-golint
  • pull-ci-openshift-installer-master-govet
  • pull-ci-openshift-installer-master-images
  • pull-ci-openshift-installer-master-okd-unit
  • pull-ci-openshift-installer-master-okd-verify-codegen
  • pull-ci-openshift-installer-master-shellcheck
  • pull-ci-openshift-installer-master-tf-fmt
  • pull-ci-openshift-installer-master-tf-lint
  • pull-ci-openshift-installer-master-unit
  • pull-ci-openshift-installer-master-verify-codegen
  • pull-ci-openshift-installer-master-verify-vendor
  • pull-ci-openshift-installer-master-yaml-lint

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@zaneb (Member) commented Jul 17, 2024

/test e2e-metal-assisted

@jlebon (Member Author) commented Jul 18, 2024

Just surfacing some insights from internal discussions here.

The AI test here is not using the ISO from rhcos.json but instead the one hardcoded in https://github.com/openshift/assisted-service/blob/master/data/default_os_images.json.

The ABI tests here are not using the ISO from the rhcos.json but instead the ISO baked in the release payload within machine-os-images.

@jlebon (Member Author) commented Jul 18, 2024

The ABI tests here are not using the ISO from the rhcos.json but instead the ISO baked in the release payload within machine-os-images.

It's odd to me that machine-os-images is not more tightly coupled to rhcos.json. Was that by design? ISTM like machine-os-images should basically just be output from the installer repo too instead and always match what's in rhcos.json. (Though longer term obviously related is discussions around having managed bootimages, e.g. openshift/enhancements#201.)

@zaneb (Member) commented Jul 19, 2024

It's odd to me that machine-os-images is not more tightly coupled to rhcos.json. Was that by design? ISTM like machine-os-images should basically just be output from the installer repo too instead and always match what's in rhcos.json.

It's tightly coupled in ART builds because machine-os-images depends on the installer image. But upstream CI doesn't have any notion of dependencies between container images. The CI payload just contains the latest master build of every container image. So machine-os-images only gets rebuilt when a PR merges to it, regardless of any changes to the installer repo (let alone unmerged PRs in the installer repo).

Arguably we chose the wrong ISO here, and in the event of a conflict between what rhcos.json says and what's in the payload, we should go with downloading the former so that we can CI changes like this. The assumption is that the payload is canonical, but that's not really true in CI, and CI is theoretically the only place they can differ. (I wonder if this would break disconnected CI tests though - @bfournie do you know? It'd also mean we'd be executing different code paths in CI than users are actually getting in reality.)

@r4f4 (Contributor) commented Jul 19, 2024

Looks like e2e-aws-ovn-heterogeneous failed on a different issue.

This job is not required and expected to fail until #8698 merges.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
@jlebon (Member Author) commented Sep 6, 2024

OK, updated! Didn't rebase to make the diff easier, but also because I don't expect this to be merged before the associated enhancement is merged.

But changes now include:

  • I changed the nomenclature from "pivot" to "overlay"
  • now with 100% less setenforce 0
  • we no longer use Restart=on-failure because it complicates dependency handling (as discussed in #8742 (comment)); only the small section related to networking needs to be retried
  • I've split out the systemctl isolate back to multi-user.target into a separate unit; this is part of making this work with the Assisted Installer
  • we now persist /usr/lib/modules
  • we now use ostree container image pull to pull the image

This works with Assisted Installer as well but requires a patch there. I'll open that one shortly.

@jlebon (Member Author) commented Sep 6, 2024

/test e2e-aws-ovn
/test e2e-openstack-ovn

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
@jlebon (Member Author) commented Sep 6, 2024

OK, I ended up doing a full rebase to try to get CI going.

/test e2e-aws-ovn
/test e2e-openstack-ovn

@cgwalters (Member) left a comment

Looks sane to me

jlebon added a commit to jlebon/assisted-installer that referenced this pull request Sep 6, 2024
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
`node-image-overlay.service` in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
`node-image-overlay.service` before starting e.g. `kubelet.service` and
`bootkube.service`.

See also: openshift/installer#8742
jlebon added a commit to jlebon/assisted-installer that referenced this pull request Sep 6, 2024
(same commit message as above)
@jlebon (Member Author) commented Jan 15, 2025

Just to surface some conversations we had with @mhrivnak about this. To summarize a few points:

  • This approach definitely has warts. After overlaying, we're going to still be running software from the bootimage, notably the network stack and systemd and obviously the kernel. The cleanest workaround for this would be to do a systemd soft-reboot (though in the ABI case, we do need to keep the agent alive across the transition). Obviously we can manually restart individual userspace components if the need arises.
  • It's important to note that the bootimage and node images are both specified by the release payload (i.e. we still have and respect rhcos.json). So the skew is known and can be tested by CI, and it's the same skew that customers would run with.
  • This change makes the release payload more "hermetic"; there's no longer two kubelet and cri-o versions to think about, only one. The contents of the bootimage still matter, but a lot less. This should in theory reduce bootimage churn as well (similarly to how e.g. traditional RHEL/Fedora install media is actually updated quite rarely after GA -- it just needs to work well enough to install and get updates).
  • A middle-ground approach is to still produce bootimages with OCP content whenever needed. The CoreOS pipeline would still maintain mainly the RHEL X.Y-based streams, but we'd have a separate flow to build bootimages from the OCI node images using cosa/b-i-b whenever a bootimage bump is needed. This avoids having to change bootstrapping, but adds release engineering complexity and reduces turnaround speed.
  • A lot of the gnarly bash code here should eventually become Rust code in bootc. The live application of updates is a desirable feature for RHEL Image Mode in general and something we'll want to support.
  • @jlebon may be completely wrong and we may see a sea of red in CI dashboards (it wouldn't be after this merges, but after the bootimage bump where we switch to the new bootimages). If that happens, based on what fails and how, we may revert this change and go back to the drawing board.

@mhrivnak (Member):

I'll just add that bootstrap issues can be difficult to troubleshoot even today. With this new complexity, we should ensure that there's a clear and documented way to reproduce a customer's bootstrap environment in a lab. That should include:

  • a clear way to know which coreos image and which node image were used.
  • a clear log message marking when the node image overlay happened (so when examining other log messages and events, we can know whether they happened with the old or new userspace)

@jlebon (Member Author) commented Jan 16, 2025

a clear way to know which coreos image and which node image were used.

I mentioned this in #8742 (comment), but the bootimage is specified by the payload as today. Of course, in some scenarios it's users that control the bootimage. The booted RHCOS version is visible in the journal logs but it's definitely not obvious. I've added a more direct journal log entry with that info now.

a clear log message marking when the node image overlay happened (so when examining other log messages and events, we can know whether they happened with the old or new userspace)

Thanks, this is good feedback. Added more logging in general from the new scripts. Let me know if that matches what you were thinking!

openshift/release#60534 is enabling unconditional collection of bootstrap logs, which should make all this easier to see in CI too but we'll of course only see it as part of the PR that changes the bootimages.

Edit: OK, just to give a better idea, here's a truncated bootstrap log to see what it looks like currently: bootstrap.log

@mhrivnak (Member):

It's common for assisted-installer to boot a live ISO on systems before knowing which openshift version will get installed on them. That running live ISO will then be used for bootstrap-in-place once an installation is initiated.

@eranco74 @carbonin do we need anything else to accommodate that, or are we all set? Will we grab these scripts and unit files out of the bootstrap ignition?

@jlebon the main point otherwise is that we'll need flexibility to do the overlay on a variety of different versions of the coreos image.

@jlebon (Member Author) commented Jan 16, 2025

It's common for assisted-installer to boot a live ISO on systems before knowing which openshift version will get installed on them. That running live ISO will then be used for bootstrap-in-place once an installation is initiated.

Sorry yeah, you're right. I actually knew this (I even referenced the related late-binding problem in the commit message), but forgot it when writing my previous comment. That's actually a perfect example of "there's no longer two kubelet and cri-o versions to think about, only one".

I do have concerns though with not putting bounds on that skew to a range we can test and document. So I'd still argue for some version check as discussed in this ticket.

@eranco74 @carbonin do we need anything else to accommodate that, or are we all set? Will we grab these scripts and unit files out of the bootstrap ignition?

Yes. This is done by openshift/assisted-installer#899.


```shell
UNIT_DIR="${1:-/tmp}"

if ! rpm -q openshift-clients &>/dev/null; then
```
Review comment (Member) on this snippet:
@eranco74 @carbonin this is the proxy test for whether the current filesystem needs all of the "node layer" stuff to be layered on. Any concerns for the assisted-installer live ISO? We don't install this, right?

Reply (Member):
No, we don't install anything additional on the rootfs

@mhrivnak (Member):

@jlebon ah yes, sorry I forgot you did that for us! Thanks for that.

The logging looks good to me. I think it'll help.

@andfasano (Contributor):

/test e2e-agent-compact-ipv4
/test e2e-agent-sno-ipv6
/test e2e-agent-ha-dualstack

@andfasano (Contributor):

I've tested locally via dev-scripts, also using openshift/assisted-installer#899 (and the related live boot ISO), and it went fine.

@jlebon (Member Author) commented Jan 17, 2025

/retest

@jlebon jlebon changed the title overlay node image before bootstrapping if necessary COS-3013: overlay node image before bootstrapping if necessary Jan 17, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 17, 2025
@openshift-ci-robot (Contributor) commented Jan 17, 2025

@jlebon: This pull request references COS-3013 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

In response to this:

(the PR description, quoted in full)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

(The first commit's message repeats the PR description above.)
golint was complaining about:

```
pkg/asset/ignition/bootstrap/common.go:406:2: ifElseChain: rewrite if-else to switch statement (gocritic)
	if parentDir == "bin" || parentDir == "dispatcher.d" || parentDir == "system-generators" {
	^
```
This is obsoleted now by `node-image-overlay.target` and its associated units, which do a similar thing but apply to both OKD and OCP.
@jlebon (Member Author) commented Jan 17, 2025

Updated this now! I've confirmed that containers/common#1758 is in RHEL 9.6, so we can drop the workaround for it. Also switched from bootc status to rpm-ostree status for now to work around containers/bootc#1043.

Tested both a standard flow and an ABI flow with this.

jlebon added a commit to jlebon/assisted-installer that referenced this pull request Jan 17, 2025
(same commit message as above)
@patrickdillon (Contributor):

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2025
openshift-ci bot (Contributor) commented Jan 18, 2025

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-heterogeneous | 9f5f3eec1dd5f241bebc69fe1862b092e9484ee2 | false | /test e2e-aws-ovn-heterogeneous |
| ci/prow/e2e-vsphere-ovn | 4c29e7f | true | /test e2e-vsphere-ovn |
| ci/prow/integration-tests | 4c29e7f | true | /test integration-tests |
| ci/prow/e2e-aws-ovn-edge-zones-manifest-validation | 4c29e7f | true | /test e2e-aws-ovn-edge-zones-manifest-validation |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 4c29e7f | true | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-azure-ovn | 4c29e7f | true | /test e2e-azure-ovn |
| ci/prow/e2e-gcp-ovn-upi | 4c29e7f | true | /test e2e-gcp-ovn-upi |
| ci/prow/e2e-aws-ovn-upi | 4c29e7f | true | /test e2e-aws-ovn-upi |
| ci/prow/e2e-gcp-ovn | 4c29e7f | true | /test e2e-gcp-ovn |
| ci/prow/terraform-verify-vendor | 4c29e7f | true | /test terraform-verify-vendor |
| ci/prow/e2e-vsphere-ovn-upi | 4c29e7f | true | /test e2e-vsphere-ovn-upi |
| ci/prow/integration-tests-nodejoiner | 4c29e7f | true | /test integration-tests-nodejoiner |
| ci/prow/e2e-azure-ovn-upi | 4c29e7f | true | /test e2e-azure-ovn-upi |
| ci/prow/e2e-openstack-ovn | 751e6ae | true | /test e2e-openstack-ovn |
| ci/prow/okd-scos-e2e-aws-ovn | ef7700e | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-vsphere-host-groups-ovn-custom-no-upgrade | ef7700e | false | /test e2e-vsphere-host-groups-ovn-custom-no-upgrade |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit cdd054f into openshift:main Jan 18, 2025
17 of 19 checks passed
@openshift-bot (Contributor):

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-altinfra
This PR has been included in build ose-installer-altinfra-container-v4.19.0-202501180437.p0.gcdd054f.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot (Contributor):

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-terraform-providers
This PR has been included in build ose-installer-terraform-providers-container-v4.19.0-202501180437.p0.gcdd054f.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot (Contributor):

[ART PR BUILD NOTIFIER]

Distgit: ose-baremetal-installer
This PR has been included in build ose-baremetal-installer-container-v4.19.0-202501180437.p0.gcdd054f.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot (Contributor):

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-artifacts
This PR has been included in build ose-installer-artifacts-container-v4.19.0-202501180437.p0.gcdd054f.assembly.stream.el9.
All builds following this will include this PR.

jlebon added a commit to jlebon/assisted-installer that referenced this pull request Jan 20, 2025
(same commit message as above)
openshift-merge-bot bot pushed a commit to openshift/assisted-installer that referenced this pull request Jan 21, 2025
* ops: add new FileExists method

Prep for next patch. Also use that in one spot where we were manually
calling `stat`.

* overlay node image before bootstrapping if necessary

(same commit message as above)
Labels
  • acknowledge-critical-fixes-only: Indicates if the issuer of the label is OK with the policy.
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.