Split RHCOS into layers #1637
Conversation
Skipping CI for Draft Pull Request.
Force-pushed from 067ece5 to f79684b.
Awesome work on this!
openshift/kubernetes has a specific workflow where jobs will build a new kubelet to use during the job run. This helps with rebase work and validating new kubernetes versions coming into OpenShift. We should preserve this workflow when migrating to RHCOS layering. /cc @soltysh

I don't expect any issues there. That workflow should keep working as is.
Force-pushed from f79684b to a6a7438.
/cc @cybertron @andfasano

I believe this was the pre-req work done in openshift/kubernetes#1805, which ensured we won't have problems in o/k.
OK, so let's resume the bootstrapping issue. Restating some of the things from above and from researching further:
What I'm playing with now is basically to have a special systemd target for doing the pivot. This is in effect like a more aggressive version of the WIP in openshift/installer#8742.
@jlebon That sounds like it might work. Where will the kubelet be coming from? An OpenShift-built image?
Won't doing …
From the node image (i.e. for OCP, the `rhel-coreos` payload image).
No. The system boots into …

Via a generator overriding …
These images are built as part of the CoreOS pipeline. They will be used as bases for building the node images containing OCP-versioned content for CI. Part of openshift/enhancements#1637.
As part of openshift/enhancements#1637, we want to start building the node image as a layered build on top of an RHCOS base image. For now, promote this image as `node`. In the future, when we're ready to switch CI over to the node image, it'll take the place of `rhel-coreos`.
As part of openshift/enhancements#1637, we want to start building the node image as a layered build on top of an RHCOS base image. For now, don't promote this image. In the future, when we're ready to switch CI over, it'll take the place of `rhel-coreos`.
* openshift/os: start building node image

  As part of openshift/enhancements#1637, we want to start building the node image as a layered build on top of an RHCOS base image. For now, don't promote this image. In the future, when we're ready to switch CI over, it'll take the place of `rhel-coreos`.

* openshift/os: add an e2e-aws test

  Now that we're building the node image in CI, we can run cluster tests with it. Let's start simple for now with just the standard e2e-aws test. Note that it doesn't run by default. This means that we can request it on specific PRs only using `/test`.
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries, for example, which bootstrapping obviously relies on.

Instead, we now change things up so that early on when booting the bootstrap node, we pull down the node image, unencapsulate it (this just means converting it back to an OSTree commit), mount over its `/usr`, and import new `/etc` content. This is done by isolating to a different systemd target to only bring up the minimum number of services needed to do the pivot, and then carrying on with bootstrapping. This does not incur additional reboots and should be compatible with AI/ABI/SNO. But it is, of course, a huge conceptual shift in how bootstrapping works.

With this, we would now always be sure that we're using the same binaries as the target version as part of bootstrapping, which should alleviate some issues such as AI late-binding (see e.g. https://issues.redhat.com/browse/MGMT-16705). The big exception, of course, is the kernel. Relatedly, note we do persist `/usr/lib/modules` from the booted system so that loading kernel modules still works.

To be conservative, the new logic only kicks in when using bootimages which do not have `oc`. This will allow us to ratchet this in more easily. Down the line, we should be able to replace some of this with `bootc apply-live` once that's available (and also works in a live environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.
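For illustration, here's a minimal sketch of the pivot using plain bind mounts (the real implementation unencapsulates to an OSTree commit and is driven by systemd units; the pullspec and paths here are placeholders, not the installer's actual logic):

```bash
#!/bin/bash
# Illustrative sketch only -- not the actual installer implementation.
set -euo pipefail

NODE_IMAGE=$1  # node image pullspec (placeholder)

# Keep the booted kernel's modules reachable so module loading still
# works after /usr is replaced.
mkdir -p /run/booted-modules
mount --bind /usr/lib/modules /run/booted-modules

# Pull the node image and mount its root filesystem.
podman pull "$NODE_IMAGE"
mnt=$(podman image mount "$NODE_IMAGE")

# Overlay the node image's /usr over the booted /usr (no reboot), then
# put the booted kernel's modules back on top.
mount --bind "$mnt/usr" /usr
mount --bind /run/booted-modules /usr/lib/modules

# Import new /etc content, without clobbering existing files.
cp -an "$mnt/etc/." /etc/
```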
As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries, for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`.

See also: openshift/installer#8742
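As a sketch of the ordering change: the unit names come from the commit message, but the drop-in path and contents below are illustrative assumptions, not the actual assisted-installer change (which adjusts ordering in its own code):

```bash
# Ensure kubelet only starts once the node image overlay has completed;
# an analogous drop-in would apply to bootkube.service.
mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/10-node-image-overlay.conf <<'EOF'
[Unit]
Wants=node-image-overlay.service
After=node-image-overlay.service
EOF
systemctl daemon-reload
```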
* ops: add new FileExists method

  Prep for next patch. Also use that in one spot where we were manually calling `stat`.

* overlay node image before bootstrapping if necessary

  As per openshift/enhancements#1637, we're trying to get rid of all OpenShift-versioned components from the bootimages. This means that there will no longer be `oc`, `kubelet`, or `crio` binaries, for example, which bootstrapping obviously relies on. To adapt to this, the OpenShift installer now ships a new `node-image-overlay.service` in its bootstrap Ignition config. This service takes care of pulling down the node image and overlaying it, effectively updating the system to the node image version. Here, we accordingly also adapt assisted-installer so that we run `node-image-overlay.service` before starting e.g. `kubelet.service` and `bootkube.service`. See also: openshift/installer#8742
As part of openshift/enhancements#1637, the version string for RHCOS will change from being OCP+RHEL-based (e.g. 419.96...) to being purely RHEL-based (e.g. 9.6...). Adapt the logic to this new scheme so that it links to the right stream. We should eventually clean this up, though, so that the stream name is available more directly to the release controller and it doesn't need to do any guessing.
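A sketch of the kind of guess involved, assuming version strings like the examples above (the patterns are assumptions, not the release controller's actual code):

```bash
ver=9.6.20250523-0  # hypothetical RHEL-only RHCOS version string
if [[ $ver =~ ^[0-9]{3}\. ]]; then
  # e.g. 419.96...: first component encodes the OCP version (4.19)
  echo "legacy OCP+RHEL scheme: derive stream from the OCP version"
else
  # e.g. 9.6...: version is purely RHEL-based
  echo "new RHEL-only scheme: derive stream from the RHEL version"
fi
```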
As part of openshift/enhancements#1637, the CoreOS pipeline now only builds a RHEL-only RHCOS base image, and later on, a node image is built on top of this base image to add all the OCP-specific packages. As a result, the RHCOS release browser will only display the diff of the _base_ image content, and will not have any OCP content.

Often, that's sufficient. E.g. if you're just interested in kernel or systemd changes, the RHCOS release browser is enough. However, users can be confused by the lack of OCP packages in the list.

Let's add an info box to the changelog page with instructions to generate a full diff, but only display it if one of the RHCOS versions is of the new RHEL-only kind. As a bonus, these instructions also conveniently serve as a way to get any diff at all without VPN access.

Of course, being able to generate this diff ourselves and render it would be useful. And such a mechanism need not be specific to the CoreOS image; any of the many OCP images we ship which contain RPMs would benefit from being able to view package diffs. The most likely candidate for implementing this would be in `oc adm release info`, but downloading images to generate diffs is a much more expensive operation than the git changelog-based one, so a caching service might be better instead. (That said, it's possible with Konflux that we'll end up storing RPM lockfiles in git, in which case an RPM diff _is_ a git diff, which matches nicely the existing semantics.)
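The instructions could look something like the following sketch, assuming the rpmdb is queryable at its usual location inside the image (`$IMG1`/`$IMG2` stand in for the two node-image pullspecs; the info box's actual wording may differ):

```bash
# Dump each node image's package list and diff them.
for img in "$IMG1" "$IMG2"; do
  sudo podman pull "$img"
  mnt=$(sudo podman image mount "$img")
  # On OSTree-based images, /var/lib/rpm typically points at
  # /usr/share/rpm, so a plain --root query finds the rpmdb.
  sudo rpm -qa --root="$mnt" | sort > "/tmp/pkgs-$(basename "$img" | tr ':' '-')"
  sudo podman image umount "$img"
done
diff /tmp/pkgs-*
```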
It's often useful when looking up release images to know the list of RPM packages that shipped in the node image. Add new switches for this:

- `oc adm release info --rpmdb $IMG` will list all the packages in the node image for the given release image payload
- `oc adm release info --rpmdb-diff $IMG1 $IMG2` will diff the set of packages in the node image for the given release image payloads

The code is generic over the actual target image. By default, the node image is used, but `--rpmdb-image` can be used to select a different one.

The primary motivation for this is openshift/enhancements#1637, in which the node image will no longer be built within the CoreOS pipeline as a base image. Instead, it will be a layered image built in OpenShift CI/Konflux. As a result, layered packages will not show up in the CoreOS release browser differ. With this functionality, the release controller will be able to render RPM diffs in the web UI, greatly de-emphasizing the CoreOS differ and effectively dropping the requirement for having VPN access.

Some notes on the implementation:

- The rpmdb for a given image is cached, keyed by the image digest.
- We don't try to be smart here and e.g. only download some layers; there are some issues with doing that. We literally do download the full image, _but_ we only cache the rpmdb content and throw away the rest. That said, the high cost isn't an issue in practice because the release controller can nicely represent operations which take time, so it didn't feel worth the effort to optimize this further.

Once we have SBOMs available for all our images, they should be a much cheaper way to query RPM contents. Additionally/alternatively, for the node image specifically, if we ever end up with lockfiles in the git repo, this would effectively mean that the git changelog _is_ the RPM changelog too, meshing nicely with the existing infrastructure around that.
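For example (the switches are the ones proposed above; the release pullspecs, and `machine-config-operator` as a `--rpmdb-image` value, are placeholders):

```bash
# List the node image's packages for one release:
oc adm release info --rpmdb \
  quay.io/openshift-release-dev/ocp-release:4.19.0-x86_64

# Diff the node image's packages between two releases:
oc adm release info --rpmdb-diff \
  quay.io/openshift-release-dev/ocp-release:4.18.9-x86_64 \
  quay.io/openshift-release-dev/ocp-release:4.19.0-x86_64

# Target a different payload image instead of the node image:
oc adm release info --rpmdb --rpmdb-image=machine-config-operator \
  quay.io/openshift-release-dev/ocp-release:4.19.0-x86_64
```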
As part of openshift/enhancements#1637, we've moved OCP 4.19 to use bootimages with RHEL content only. This means that the bootimages built with OCP content will never be used in practice. Add a `skip_disk_images` knob to disable building them.

We still generate the QEMU image, both so kola tests can run and because it's useful for debugging, but we drop everything else. (Actually, we could also skip the QEMU image and sanity-check the OCI image with `kola run --oscontainer`, but that requires more rewiring.) We don't generate live media since the test coverage from that isn't meaningfully different from the RHEL-only variants, given that the additional OCP packages are unrelated.
This enhancement describes improvements to the way RHEL CoreOS (RHCOS) is built so that it will better align with image mode for RHEL, while also providing benefits on the OpenShift side. Currently, RHCOS is built as a single layer that includes both RHEL and OCP content. This enhancement proposes splitting it into three layers. Going from bottom to top:

1. the bootc layer: the base image, shared with image mode for RHEL
2. the CoreOS layer: adding the CoreOS-specific configuration and packages on top
3. the node layer: adding the OpenShift-versioned components (e.g. `kubelet`, `crio`, `oc`)
The terms "bootc layer", "CoreOS layer", and "node layer" will be used throughout this enhancement to refer to these.
The details of this enhancement focus on doing the first split: creating the node layer as distinct from the CoreOS layer (which will not yet be rebased on top of a bootc layer). The two changes involved which most affect OCP are:

1. the `rhel-coreos` payload image will be built in Prow/Konflux (as any other)
2. the bootimages will ship RHEL content only, with no OpenShift-versioned components (e.g. `oc`, `kubelet`, `crio`)

Tracked at: https://issues.redhat.com/browse/OCPSTRAT-1190