Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix boot modes and ESP partitions for RHEL 8 and 9 (RHEL-69628) #1080

Merged
merged 6 commits into from
Dec 3, 2024

Conversation

ondrejbudai
Copy link
Member

RHEL got support for UEFI on AWS in 8.9 and 9.3. However, we introduced some bugs in how we treat old distributions:

  • the ami image type for older RHEL 8 versions was incorrectly marked as hybrid bootable, but the base partition table didn't have /boot/efi rendering the image unbootable on UEFI.
  • The same is true for RHEL 9. This situation is even trickier here: the ec2* images were marked as legacy-only for older releases, but they actually contained an ESP partition, which is not needed.

I fixed all bugs, and added 2 tests to all distribution:

  1. TestESP checks whether all images marked as UEFI or hybrid bootable have an ESP partition in their base partition table. The opposite is checked as well: legacy-only images shall not have an ESP partition.
  2. TestAMIHybridBoot checks the boot mode of AMIs for RHEL 9 and RHEL 10.

There's a quite a lot of code duplication here, I'm happy to hear you ideas about how to deduplicate it. The use of internal structures/methods is making this a bit tricky.

@achilleas-k
Copy link
Member

achilleas-k commented Dec 2, 2024

I fixed all bugs, and added 2 tests to all distribution:

  1. TestESP checks whether all images marked as UEFI or hybrid bootable have an ESP partition in their base partition table. The opposite is checked as well: legacy-only images shall not have an ESP partition.
  2. TestAMIHybridBoot checks the boot mode of AMIs for RHEL 9 and RHEL 10.

There's a quite a lot of code duplication here, I'm happy to hear you ideas about how to deduplicate it. The use of internal structures/methods is making this a bit tricky.

I think with the new partition table creation we can get rid of a lot of these kinds of issues. Instead of having a boot mode parameter (which we set on each image type) and a partition table (which we statically encode) that have to match, and then have a test that makes sure they match, it would probably make our lives easier if the base partition tables were simpler and then the final bits were added dynamically based on image type properties.

Most of our static partition tables define a root partition of a given size and then everything else is determined by the architecture and boot mode. Perhaps we could get rid of all the static partition tables and instead define the root partition size, a boot mode, and a partition table type and let NewCustomPartitionTable() take care of the rest.

These new tests would then be redundant. We would only need the tests that we already have that check whether an ESP is created when boot mode is UEFI or hybrid, etc.

@ondrejbudai
Copy link
Member Author

Agreed, let's put it on the plan when we are finished with all the required stuff?

@thozza
Copy link
Member

thozza commented Dec 2, 2024

I think with the new partition table creation we can get rid of a lot of these kinds of issues. Instead of having a boot mode parameter (which we set on each image type) and a partition table (which we statically encode) that have to match, and then have a test that makes sure they match, it would probably make our lives easier if the base partition tables were simpler and then the final bits were added dynamically based on image type properties.

With one correction, the boot mode is not set on each image type; instead, it is determined by the Platform that it is associated with. Unfortunately, the rest is the same - the partitioning table must match the boot mode of the Platform.

achilleas-k
achilleas-k previously approved these changes Dec 2, 2024
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

It looks like the TestESP test could be defined in only one place if the ImageType interface had a GetPartitionTable() method or similar, so that the tests could run from anywhere.

Something to consider for a follow-up I think.

Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at 3eb5779 and the intention (and it looks that the implementation matched) was for AMIs build in the service or on-prem after RHEL-8.9 to be all hybrid-boot for any RHEL-8 version. And the boot mode was supposed to stay as it was only for EC2 images built in Brew.

Then 7acf235 removed /boot/efi for all RHEL-8 AMI/EC2 images and made the same partition table to be used for RHEL EC2 and AMI.

Then I fixed the missing /boot/efi by fa5aec4.

And later reverted the /boot/efi for EC2 images from before 8.9 by 35215e6, but due to previous changes, the same partitioning table is used for RHEL AMIs. The Platform for AMIs and EC2 images stayed different and this is from where the inconsistency is coming from.

I'll ask if we don't want to get to the initial version and use hybrid boot mode for RHEL-8 and RHEL-9 AMIs in the SaaS version and on-prem, regardless of the minor version? The initial idea was that the SaaS version builds the latest "version" of RHEL images. But we didn't want to change old images built in Brew when rebuilt.

Lastly, I like the tests that you've added. I'd suggest replacing the four copies by a
single implementation in pkg/distro/distro_test_common/distro_test_common.go and then calling it from each distro.

Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking to have the discussion first.

@ondrejbudai
Copy link
Member Author

Yeah, I vaguely remember that we discussed this before. I wonder if we should just keep it simple, and treat the ec2 and ami images as close to each other as posslble. It seems like there was quite a lot of confusion about this in the past, so making it dead simple might just be better.

No hard feelings, though. :)

@thozza
Copy link
Member

thozza commented Dec 3, 2024

I also don't insist on moving back to the original idea and don't have a strong preference for using hybrid boot mode for older RHEL minor versions (shameless plug: we should not be building them anyway in SaaS and on-prem without EUS repos 😇 ).

It would be good to call this out explicitly in the code. Maybe something like the following comment could replace the existing comments and/or be added where appropriate:

// RHEL-internal EC2 images and RHEL AMI images are kept intentionally in sync
// with regard to not supporting hybrid boot mode before RHEL version X.Y.
// The partitioning table for these reflects that and is also intentionally in sync.

Would you have the time to factor out the new tests so that you have just a single copy? I'm happy to approve the PR afterward.

@ondrejbudai
Copy link
Member Author

Working on it, I'm fighting with import cycles! 🔪➿

@ondrejbudai
Copy link
Member Author

okay, I deduped it as much as possible. If we made GetPartitionTable a part of the distro.ImageType interface, we can even make it fully generic, but I don't like making methods public just for the sake of tests. But I can be convinced pretty easily. Maybe we can do this after #1077 lands.

I also added the comment suggested by @thozza, I couldn't figure out a better place, suggestions are welcome.

Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. This looks nice.

The addition of "github.com/osbuild/images/pkg/platform" to pkg/distro/distro_test_common/distro_test_common.go is in the wrong commit.

pkg/distro/rhel/rhel8/distro.go Outdated Show resolved Hide resolved
pkg/distro/rhel/rhel9/distro.go Outdated Show resolved Hide resolved
It's not needed anywhere else, and it removes the dependency of
distro_test_common on all distros (via repogistry) which helps with
the subsequent commits. The function was also made private, because
it's used in just one package.
TestESP checks whether all images marked as UEFI or hybrid bootable have
an ESP partition in their base partition table. The opposite is
checked as well: legacy-only images shall not have an ESP partition.

The code is deduplicated as much as possible. It meant adding a proxy
function argument to the shared TestESP function because partition
tables are not a public interface, but it's imho good enough.

Maybe we can go one step further once the GetPartitionTable interface
is unified across rhel and fedora, and just make it public. Maybe.
TestAMIHybridBoot verifies that rhel 8.9 and 9.3 are the first RHEL
versions that implemented hybrid boot for the ami and ec2* image types.
RHEL 8.8 and older doesn't support UEFI in AWS. The code correctly set
it for EC2 images, but not for ami. This commit fixes that.
    RHEL 9.2 and older doesn't support UEFI in AWS. The code correctly set
    it for EC2 images, but not for ami. This commit fixes that.
RHEL 9.2 and older don't support UEFI on AWS. Let's remove also the ESP
from there, as it's absolutely useless there.
@ondrejbudai
Copy link
Member Author

I think I managed to fix the rebase fails, and I fixed the X.Y placeholders. :)

Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple of suggestions/nitpicks, nothing to block on. Except Tomáš comment about the import change being in the wrong commit. That one should get fixed :)

// It also checks the opposite, i.e. that legacy images don't have an ESP. This test is only
// performed on image types with a partition table, thus it doesn't run on e.g. installers and
// ostree commits.
// distros is a list of distros to test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// distros is a list of distros to test
// distros is a list of distros to test.


func TestESP(t *testing.T) {
var distros []distro.Distro
for _, distroName := range []string{"fedora-40", "fedora-41", "fedora-42"} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could extend this to fedora-99 and not have to worry about keeping the list up-to-date. Or maybe we can add a follow-up item to iterate over all known distro versions based on test/data/repositories/.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, and it creates a hard-to-break import loop. Let's figure this in a separate PR. :)

func TestESP(t *testing.T) {
var distros []distro.Distro
distroFactory := distrofactory.NewDefault()
for _, distroName := range []string{"rhel-7.9", "rhel-8.8", "rhel-8.9", "rhel-8.10", "centos-8", "rhel-9.0", "rhel-9.2", "rhel-9.4", "centos-9", "rhel-10.0", "centos-10"} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for Fedora.

@ondrejbudai ondrejbudai added this pull request to the merge queue Dec 3, 2024
Merged via the queue into osbuild:main with commit 7600a5a Dec 3, 2024
19 checks passed
@ondrejbudai ondrejbudai deleted the fix-bootmode branch December 3, 2024 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants