
Split RHCOS into layers #1637

Merged

Conversation

@jlebon (Member) commented Jun 7, 2024

This enhancement describes improvements to the way RHEL CoreOS (RHCOS) is built so that it better aligns with image mode for RHEL, while also providing benefits on the OpenShift side. Currently, RHCOS is built as a single layer that includes both RHEL and OCP content. This enhancement proposes splitting it into three layers. Going from bottom to top:

  1. the (RHEL-versioned) bootc layer (i.e. the base rhel-bootc image shared with image mode for RHEL)
  2. the (RHEL-versioned) CoreOS layer (i.e. coreos-installer, ignition, afterburn, scripts, etc...)
  3. the (OCP-versioned) node layer (i.e. kubelet, cri-o, etc...)

The terms "bootc layer", "CoreOS layer", and "node layer" will be used throughout this enhancement to refer to these.

The details of this enhancement focus on doing the first split: creating the node layer as distinct from the CoreOS layer (which will not yet be rebased on top of a bootc layer). The two changes involved which most affect OCP are:

  1. bootimages will no longer contain OCP components (e.g. kubelet, cri-o, etc...)
  2. the rhel-coreos payload image will be built in Prow/Konflux (as any other)

Tracked at: https://issues.redhat.com/browse/OCPSTRAT-1190
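
To make the proposed layering concrete, here is a rough sketch of what the node-layer build could look like as a plain container build on top of the CoreOS layer. The image references and package names below are illustrative only, not the actual pullspecs or package set used by the pipeline or Prow/Konflux:

```bash
# Hypothetical sketch: build the OCP-versioned node layer on top of the
# RHEL-versioned CoreOS layer. Pullspecs and package names are illustrative.
podman build -t quay.io/example/node:4.19 -f - . <<'EOF'
# The RHEL-versioned CoreOS layer built by the CoreOS pipeline (no OCP content).
FROM quay.io/example/rhel-coreos-base:9.6

# Layer on the OCP-versioned components that used to be baked into RHCOS.
RUN rpm-ostree install openshift-clients openshift-hyperkube cri-o && \
    rpm-ostree cleanup -m && \
    ostree container commit
EOF
```

The key point is that this is an ordinary container build, which is what allows it to move into Prow/Konflux like any other payload image.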

openshift-ci bot added the do-not-merge/work-in-progress label Jun 7, 2024
openshift-ci bot commented Jun 7, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@cgwalters (Member) left a comment

Awesome work on this!

@rphillips (Contributor) commented

openshift/kubernetes has a specific workflow where jobs will build a new kubelet to use during the job run. This helps with rebase work and validating new kubernetes versions coming into OpenShift. We should preserve this workflow when migrating to RHCOS layering.

/cc @soltysh

openshift-ci bot requested a review from soltysh June 12, 2024 16:00
@jlebon (Member, Author) commented Jun 12, 2024

openshift/kubernetes has a specific workflow where jobs will build a new kubelet to use during the job run. This helps with rebase work and validating new kubernetes versions coming into OpenShift. We should preserve this workflow when migrating to RHCOS layering.

/cc @soltysh

I don't expect any issues there. That workflow should keep working as is.

jlebon force-pushed the pr/split-rhcos-into-layers branch from f79684b to a6a7438 on June 20, 2024 21:15
@zaneb (Member) commented Jun 24, 2024

/cc @cybertron @andfasano

openshift-ci bot requested review from andfasano and cybertron June 24, 2024 11:59
@soltysh commented Jun 26, 2024

I don't expect any issues there. That workflow should keep working as is.

I believe this was the pre-req work done in openshift/kubernetes#1805, which ensured we won't have problems in o/k.

@jlebon (Member, Author) commented Jul 16, 2024

OK, let's get back to the bootstrapping issue. Restating some points from the discussion above and from further research:

  • We can't run the kubelet in a container because that's no longer supported.
  • The delta between the kubelet and `podman play` is too large to make the latter a feasible replacement.
  • `systemctl soft-reboot` is not available in RHEL 9.
  • In the AI/ABI/SNO cases, bootstrapping happens in the live environment, where e.g. rebooting is not possible.
  • I considered cobbling something together around kexec, but there are potential issues with kexec and hardware reliability, as well as with how it meshes with Secure Boot.

What I'm playing with now is basically to have a special node-image-pivot.target that the node isolates to first. There, we pull the node image, unencapsulate it, check out its contents, and then mount over /usr and do a rough 3-way /etc merge. We then isolate back to multi-user.target to continue with the bootstrapping process.

This is in effect like a more aggressive bootc/rpm-ostree apply-live, though that doesn't currently work in live environments. (Even in the non-live case, there are some issues that would need to be resolved.) It's close to what OKD does today when using FCOS live media, though using the ostree stack and isolating targets should make this more robust.
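
Roughly, the steps inside that target could look something like the following. This is a conceptual sketch only: unit and path names are made up here, and the exact ostree invocation and output handling depend on the ostree version available.

```bash
#!/bin/bash
# Conceptual sketch of the live pivot described above; paths, refs, and the
# exact ostree-ext invocation are illustrative, not the real implementation.
set -euo pipefail

NODE_IMAGE=quay.io/example/node:4.19   # hypothetical; really comes from the release payload

# 1. Pull the node image and "unencapsulate" it back into an OSTree commit
#    in a throwaway repo.
ostree --repo=/run/nodeimg/repo init --mode=bare-user
commit=$(ostree container unencapsulate --repo=/run/nodeimg/repo \
    "ostree-unverified-image:docker://${NODE_IMAGE}" | tail -n1)

# 2. Check out its contents and mount them read-only over /usr.
ostree --repo=/run/nodeimg/repo checkout "${commit}" /run/nodeimg/rootfs
mount --bind /run/nodeimg/rootfs/usr /usr
mount -o remount,bind,ro /usr

# 3. Crude stand-in for the 3-way /etc merge: bring in new default config
#    files without clobbering anything Ignition already wrote.
cp -an /run/nodeimg/rootfs/etc/. /etc/

# 4. Hand control back to the normal boot flow.
systemctl daemon-reload
systemctl isolate --no-block multi-user.target
```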

WIP for this in openshift/installer#8742.

@rphillips (Contributor) commented

@jlebon That sounds like it might work. Where will the Kubelet be coming from? An OpenShift built image?

@zaneb (Member) commented Jul 17, 2024

Won't doing systemctl isolate node-image-pivot.target have the effect of stopping the assisted/agent services that we need to avoid stopping?

@jlebon (Member, Author) commented Jul 17, 2024

@jlebon That sounds like it might work. Where will the Kubelet be coming from? An OpenShift built image?

From the node image (i.e. for OCP, the rhel-coreos image in the release payload).

Won't doing systemctl isolate node-image-pivot.target have the effect of stopping the assisted/agent services that we need to avoid stopping?

No. The system boots into node-image-pivot.target first. Any other services hooked into multi-user.target aren't started until after we've finished the live pivot.
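
A minimal sketch of how that ordering could be expressed in unit files, with hypothetical unit names and contents, just to illustrate the "boot into the pivot target, overlay, then continue to multi-user" flow:

```bash
# Hypothetical unit wiring for the early pivot target; names and contents
# are illustrative, not the actual installer assets.
cat > /etc/systemd/system/node-image-pivot.target <<'EOF'
[Unit]
Description=Pivot to the node image before normal bootstrapping
Requires=node-image-overlay.service
After=node-image-overlay.service
EOF

cat > /etc/systemd/system/node-image-overlay.service <<'EOF'
[Unit]
Description=Pull the node image and overlay it over /usr and /etc
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/node-image-overlay.sh
# Only after the overlay is in place do we continue to multi-user.target,
# which is where kubelet, bootkube, and the assisted/agent services hook in.
ExecStartPost=/usr/bin/systemctl --no-block isolate multi-user.target
EOF
```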

@cgwalters (Member) commented

The system boots into node-image-pivot.target first.

Via a generator overriding default.target?
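
For context on what's being asked: a systemd generator can redirect default.target very early in boot. A hypothetical sketch of that mechanism, not necessarily what the implementation ends up doing:

```bash
#!/bin/bash
# Hypothetical /usr/lib/systemd/system-generators/node-image-pivot-generator:
# point default.target at the pivot target so the system boots into it first.
# Generators receive three output directories; the first ("early") one has
# the highest priority in the unit search path.
early_dir="$1"
mkdir -p "${early_dir}"
ln -sf /etc/systemd/system/node-image-pivot.target "${early_dir}/default.target"
```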

yingzhanredhat pushed a commit to yingzhanredhat/release that referenced this pull request Dec 18, 2024
These images are built as part of the CoreOS pipeline. They will be used
as bases for building the node images containing OCP-versioned content
for CI.

Part of openshift/enhancements#1637.
jlebon added a commit to jlebon/release that referenced this pull request Dec 19, 2024
As part of openshift/enhancements#1637, we want
to start building the node image as a layered build on top of an RHCOS
base image.

For now, promote this image as `node`. In the future, when we're
ready to switch CI over to the node image, it'll take the place of
`rhel-coreos`.
jlebon added a commit to jlebon/release that referenced this pull request Dec 19, 2024
As part of openshift/enhancements#1637, we want
to start building the node image as a layered build on top of an RHCOS
base image.

For now, don't promote this image. In the future, when we're ready to
switch CI over, it'll take the place of `rhel-coreos`.
openshift-merge-bot bot pushed a commit to openshift/release that referenced this pull request Dec 19, 2024
* openshift/os: start building node image

As part of openshift/enhancements#1637, we want
to start building the node image as a layered build on top of an RHCOS
base image.

For now, don't promote this image. In the future, when we're ready to
switch CI over, it'll take the place of `rhel-coreos`.

* openshift/os: add an e2e-aws test

Now that we're building the node image in CI, we can run cluster tests
with it. Let's start simple for now with just the standard e2e-aws test.
Note that it doesn't run by default. This means that we can request it
on specific PRs only using `/test`.
jlebon added a commit to jlebon/installer that referenced this pull request Dec 20, 2024
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be `oc`, `kubelet`, or `crio`
binaries for example, which bootstrapping obviously relies on.

Instead, now we change things up so that early on when booting the
bootstrap node, we pull down the node image, unencapsulate it (this just
means convert it back to an OSTree commit), then mount over its `/usr`,
and import new `/etc` content.

This is done by isolating to a different systemd target to only bring
up the minimum number of services to do the pivot and then carry on
with bootstrapping.

This does not incur additional reboots and should be compatible
with AI/ABI/SNO. But it is of course, a huge conceptual shift in how
bootstrapping works. With this, we would now always be sure that we're
using the same binaries as the target version as part of bootstrapping,
which should alleviate some issues such as AI late-binding (see e.g.
https://issues.redhat.com/browse/MGMT-16705).

The big exception of course being the kernel. Relatedly, note we do
persist `/usr/lib/modules` from the booted system so that loading kernel
modules still works.

To be conservative, the new logic only kicks in when using bootimages
which do not have `oc`. This will allow us to ratchet this in more
easily.

Down the line, we should be able to replace some of this with
`bootc apply-live` once that's available (and also works in a live
environment). (See containers/bootc#76.)

For full context, see the linked enhancement and discussions there.
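
Two details from the message above, sketched out in illustrative shell only, reusing the hypothetical paths from the earlier sketch:

```bash
# Ratchet: only overlay when the bootimage itself carries no OCP content;
# older bootimages that still ship `oc` keep the existing flow.
if command -v oc >/dev/null 2>&1; then
    echo "bootimage already has OCP content; skipping node-image overlay"
    exit 0
fi

# Keep the booted kernel's modules visible: the node image may carry a
# different kernel, but we stay on the bootimage kernel, so bind the booted
# /usr/lib/modules into the checked-out rootfs before it is mounted over /usr.
mount --bind /usr/lib/modules /run/nodeimg/rootfs/usr/lib/modules
```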
jlebon added a commit to jlebon/assisted-installer that referenced this pull request Dec 21, 2024
As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
`node-image-overlay.service` in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
`node-image-overlay.service` before starting e.g. `kubelet.service` and
`bootkube.service`.

See also: openshift/installer#8742
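
In other words, on the assisted-installer side the ordering becomes roughly the following (conceptual sketch only; the actual change is in the assisted-installer Go code):

```bash
# Overlay the node image first, then start the services that need its binaries.
systemctl start node-image-overlay.service
systemctl start kubelet.service
systemctl start bootkube.service
```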
openshift-merge-bot bot pushed a commit to openshift/assisted-installer that referenced this pull request Jan 21, 2025
* ops: add new FileExists method

Prep for next patch. Also use that in one spot where we were manually
calling `stat`.

* overlay node image before bootstrapping if necessary

As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
`node-image-overlay.service` in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
`node-image-overlay.service` before starting e.g. `kubelet.service` and
`bootkube.service`.

See also: openshift/installer#8742
jlebon added a commit to jlebon/release-controller that referenced this pull request Jan 27, 2025
As part of openshift/enhancements#1637, the
version string for RHCOS will change from being OCP+RHEL-based (e.g.
419.96...) to being purely RHEL-based (e.g. 9.6...).

Adapt the logic for this new scheme so that it links to the right
stream.

We should eventually clean this up, though, so that the stream name is available more directly to the release controller and it doesn't need to do any guessing.
jlebon added a commit to jlebon/release-controller that referenced this pull request Jan 27, 2025
As part of openshift/enhancements#1637, the
CoreOS pipeline now only builds a RHEL-only RHCOS base image and later
on, a node image is built on top of this base image to add all the
OCP-specific packages.

As a result, the RHCOS release browser will only display the diff of the
_base_ image content, and will not have any OCP content.

Often, that's sufficient. E.g. if you're just interested in kernel
or systemd changes, the RHCOS release browser is enough. However, users
can be confused by the lack of OCP packages in the list.

Let's add an info box to the changelog page with instructions to
generate a full diff. But only display it if one of the RHCOS versions
is of the new RHEL-only kind.

As a bonus, these instructions also conveniently serve as a way to get
any diff at all without VPN access.

Of course, being able to generate this diff ourselves and rendering it
would be useful. And such a mechanism need not be specific to the CoreOS
image; any of the many OCP images we ship which contain RPMs would
benefit from being able to view package diffs.

The most likely candidate for implementing this would be in
`oc adm release info`, but downloading images to generate diffs is a
much more expensive operation than the git changelog-based one. So a
caching service might be better instead.

(That said, it's possible with Konflux that we'll end up storing RPM
lockfiles in git, in which case an RPM diff _is_ a git diff which
matches nicely the existing semantics.)
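
The "instructions to generate a full diff" could look something like this (hypothetical release pullspecs; only public tooling is assumed, and registry authentication is omitted):

```bash
# Compare the full RPM set of the node (rhel-coreos) image between two
# release payloads; pullspecs are hypothetical.
OLD=quay.io/openshift-release-dev/ocp-release:4.19.0-ec.1-x86_64
NEW=quay.io/openshift-release-dev/ocp-release:4.19.0-ec.2-x86_64

for rel in "$OLD" "$NEW"; do
    # Resolve the node image from the release payload, then list its packages.
    img=$(oc adm release info --image-for=rhel-coreos "$rel")
    podman run --rm "$img" rpm -qa | sort > "/tmp/$(basename "$rel").pkgs"
done

diff -u "/tmp/$(basename "$OLD").pkgs" "/tmp/$(basename "$NEW").pkgs"
```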
jlebon added a commit to jlebon/oc that referenced this pull request Jan 28, 2025
It's often useful when looking up release images to know the list of RPM
packages that shipped in the node image. Add new switches for this:
- `oc adm release info --rpmdb $IMG` will list all the packages in the
  node image for the given release image payload
- `oc adm release info --rpmdb-diff $IMG1 $IMG2` will diff the set of
  packages in the node image for the given release image payloads

The code is generic over the actual target image. By default, the node
image is used, but `--rpmdb-image` can be used to select a different
one.

The primary motivation for this is
openshift/enhancements#1637, in which the
node image will no longer be built within the CoreOS pipeline as a
base image. Instead, it will be a layered image built in OpenShift
CI/Konflux. As a result, all layered packages will not show up in the
CoreOS release browser differ.

With this functionality, the release controller will be able to render
RPM diffs in the web UI, greatly de-emphasizing the CoreOS differ and
effectively dropping the requirement for VPN access.

Some notes on the implementation:
- The rpmdb for a given image is cached, keyed by the image digest.
- We don't try to be smart here and e.g. only download some layers.
  There are some issues with doing that. We literally do download the
  full image, _but_ we only cache the rpmdb content and throw away the
  rest. That said, the high cost isn't an issue in practice because the
  release controller can nicely represent operations which take time so
  it didn't feel worth the effort of trying to optimize this further.

Once we have SBOMs available for all our images, those should be a much
cheaper way to query RPM contents. Additionally/alternatively, for
the node image specifically, if we ever end up with lockfiles in the git
repo, this would effectively mean that the git changelog _is_ the RPM
changelog also, meshing nicely with the existing infrastructure around
that.
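
Usage as described in the commit message above, assuming the switches land with these names; the release pullspecs below are hypothetical:

```bash
# List all RPM packages in the node image of a release payload.
oc adm release info --rpmdb quay.io/openshift-release-dev/ocp-release:4.19.0-ec.2-x86_64

# Diff the node-image package sets of two release payloads.
oc adm release info --rpmdb-diff \
    quay.io/openshift-release-dev/ocp-release:4.19.0-ec.1-x86_64 \
    quay.io/openshift-release-dev/ocp-release:4.19.0-ec.2-x86_64

# Target a payload image other than the default node image.
oc adm release info --rpmdb --rpmdb-image=machine-config-operator \
    quay.io/openshift-release-dev/ocp-release:4.19.0-ec.2-x86_64
```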
jlebon added a commit to jlebon/fedora-coreos-pipeline that referenced this pull request Feb 4, 2025
As part of openshift/enhancements#1637, we've
moved OCP 4.19 to use bootimages with RHEL content only. This means that
the bootimages built with OCP content will never be used in practice.

Add a `skip_disk_images` knob to disable building them.

We still generate the QEMU image for kola tests to run and because
they're useful to debug, but we drop everything else. (Actually, we
could also not generate the QEMU image either and sanity-check the OCI
image with `kola run --oscontainer`, but that requires more rewiring.)

We don't generate live media since I don't think the test coverage from
that would be meaningfully different from the RHEL-only variants, given
that the additional OCP packages are unrelated.