Publish the initrd and rootfs images separately for PXE booting #390
Comments
Thanks for the report. I'm having a bit of a hard time grasping your setup, so I'd be happy if you could expand a couple of details:
|
I can confirm very slow TFTP transfers, at least with both PXE and destination servers on the same ESXi environment. |
This may be one reason Anaconda uses a separate "stage2" (the rootfs) rather than embedding it in the initramfs. This also intersects heavily with #352: the fulliso build, which exists but is not shipped by FCOS, added code to find the rootfs. We could extend that code to fetch the rootfs over HTTP from the initramfs, using a URL given on the kernel command line, the same way as Anaconda's inst.stage2 argument. |
One strawman here:
|
Hum...you're proposing that the rootfs is a cpio that contains a single squashfs file? And in the HTTP case we just unwrap the cpio? That's a really clever hack if it works! |
Yup, exactly! Implementation-wise, cosa would just spit out the two CPIOs separately anyway and in the live ISO case we'd just append it in |
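To make the "cpio wrapping a single squashfs file" idea concrete, here is a minimal sketch of wrapping and unwrapping such an archive. It assumes the uncompressed "newc" cpio format that the kernel accepts for initramfs images; the file name `root.squashfs` and the function names are hypothetical and do not come from cosa or the FCOS initramfs.

```python
NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits each
TRAILER = "TRAILER!!!"


def _pad4(n):
    """Number of NUL bytes needed to round n up to a multiple of 4."""
    return (4 - n % 4) % 4


def _entry(name, data, mode):
    """Serialize one newc entry: header + name, then data, each 4-byte aligned."""
    raw_name = name.encode() + b"\0"
    # Field order: ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(raw_name), 0]
    out = NEWC_MAGIC + b"".join(b"%08X" % f for f in fields) + raw_name
    out += b"\0" * _pad4(len(out))
    return out + data + b"\0" * _pad4(len(data))


def wrap_in_cpio(name, payload):
    """Build a cpio archive holding exactly one regular file."""
    return _entry(name, payload, 0o100644) + _entry(TRAILER, b"", 0)


def unwrap_cpio(archive):
    """Return {name: data} for every file in an uncompressed newc archive."""
    files = {}
    pos = 0
    while True:
        header = archive[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[:6] != NEWC_MAGIC:
            raise ValueError(f"bad newc header at offset {pos}")
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + HEADER_LEN
        name = archive[name_start:name_start + namesize - 1].decode()
        data_start = name_start + namesize
        data_start += _pad4(data_start - pos)
        if name == TRAILER:
            return files
        files[name] = archive[data_start:data_start + filesize]
        pos = data_start + filesize + _pad4(filesize)


# Stand-in payload; a real image would be the squashfs bytes.
archive = wrap_in_cpio("root.squashfs", b"hsqs-demo-payload")
recovered = unwrap_cpio(archive)["root.squashfs"]
```

Because the wrapper is an ordinary cpio, the same file could either be concatenated after the initramfs or fetched on its own and unwrapped, which is what makes the trick attractive. The parser above handles only a single uncompressed archive and ignores everything except regular file contents.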
We discussed this in the meeting today:
|
Sorry for the delay @lucab, here are the answers to your questions:
I was actually downloading the initramfs via https from my boot server using iPXE, and iPXE seems to take a long time to download large files. TFTP is similar though (haven't used it for CoreOS PXE booting though), as a lot of machines cannot use a block size other than 512 bytes. I found that this incurred a large performance penalty. This is based on my poor memory though, as it was a couple of years ago when I last did this.
I know this is unsupported from CoreOS but I supply a secondary initramfs (similar to what @jlebon suggested) which sets up a systemd unit that If I just write it directly to
I actually did this when testing that I was splitting the Here's the code that I use to extract the |
This was discussed recently in the context of #352. We are considering supporting a stage 2 mechanism. This could also enable easier factory reset. At the artifact level, we're hesitant to offer the root squashfs separately. Some ideas on how to offer this functionality (not necessarily mutually exclusive):
Thinking more on this, the offset idea is neat, though it's still a kind of "artifact" that users will come to rely on. It would also be a bit awkward to present in the stream metadata. We might be OK with that though. Having just a tool leaves open the possibility of changing implementation details later on, so we could start with that at least. |
If a "root squashfs" artefact is produced, it makes it much easier to use standard caching middlewares for PXE booting. If it isn't, a custom application/processor has to be built to split the fat initrd up. |
You are correct, it does make that flow harder. Right now we feel there is value in not providing many different options that users have to choose from, so we're thinking of taking a middle-ground approach: still provide the "fat" initramfs as the only PXE artifact, but provide a tool or publish data that will allow people who really need to trim out the root squashfs to do so and serve it a different way. If we get more and more community members reporting that this is a problem and the current setup is not ideal for them, then we'll definitely reconsider. |
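As a rough illustration of what such a trimming tool could look like, the sketch below splits a fat initramfs in two at a byte offset. It assumes the rootfs portion starts at a known offset and runs to the end of the file; how that offset would be published is not settled in this thread, and the function name and file names are hypothetical.

```python
import shutil
from pathlib import Path


def split_at_offset(fat_initrd, offset, initrd_out, rootfs_out):
    """Write bytes [0, offset) to initrd_out and the remainder to rootfs_out."""
    fat_initrd = Path(fat_initrd)
    size = fat_initrd.stat().st_size
    if not 0 < offset < size:
        raise ValueError(f"offset {offset} is outside the file (size {size})")
    with fat_initrd.open("rb") as src:
        with Path(initrd_out).open("wb") as dst:
            remaining = offset
            while remaining:
                # Copy in 1 MiB chunks so large images are not held in memory.
                chunk = src.read(min(remaining, 1 << 20))
                if not chunk:
                    raise EOFError("file ended before the offset was reached")
                dst.write(chunk)
                remaining -= len(chunk)
        with Path(rootfs_out).open("wb") as dst:
            # src is now positioned at the offset; copy everything that is left.
            shutil.copyfileobj(src, dst)
```

A caching proxy or boot server could run something like this once per release and then serve the two pieces separately.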
iPXE on Packet seems to take ~5 minutes to download the initrd from our builds bucket over HTTPS. |
Sounds like an argument for delivering a thin initramfs? |
We discussed this in the meeting today.
We'd like to know whether what @bgilbert was seeing is because the remote server actually takes a long time to serve the artifact, or whether it is genuinely down to the performance of iPXE. If it turns out it is indeed iPXE, we'll have to evaluate our current stance again, because the Packet/iPXE flow is something that we don't consider to be a corner case. |
I'm not sure about packet, but in my original issue message I actually meant iPXE. It took 5 minutes to download the initrd through iPXE and less than 1 minute if I ran a curl. I don't have the exact numbers to hand anymore though. |
I was having the same issue of iPXE taking at least 5 minutes to download the initrd from a local repo behind Nginx. The download progress went instantly from 0 to 99% and remained there until completion. I noticed that it got much faster (<30sec) after removing some old sub_filter rules, similar to ones that were used years ago and left over. Using curl to download the file took around 30 seconds with or without those rules. Looking further into it with traffic captures, I noticed that the Nginx HTTP response includes these 2 headers when not using sub_filter:
When having sub_filter in the config, Nginx response includes It seems to be a known issue with the iPXE downloader, if no When having the server including this header, the download of the initrd should be even faster than with tftp. |
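For anyone wanting to check their own boot server, here is a small sketch that reports how a response announces its body length. The header names in the comment above did not survive; this assumes, based on the follow-up comment mentioning the "content length header", that the relevant one is `Content-Length`. The function names are hypothetical.

```python
from urllib.request import Request, urlopen


def transfer_framing(headers):
    """Classify how a response announces its body length."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if "content-length" in lowered:
        return f"content-length={lowered['content-length']}"
    if "chunked" in lowered.get("transfer-encoding", "").lower():
        return "chunked (no Content-Length)"
    return "unknown length (body ends when the connection closes)"


def check_url(url, timeout=10.0):
    """Issue a GET and inspect only the response headers."""
    # GET rather than HEAD, since a server may answer the two differently.
    # The body is not read; the connection is closed when the block exits.
    with urlopen(Request(url, method="GET"), timeout=timeout) as resp:
        return transfer_framing(resp.headers)
```

Calling `check_url()` on the initrd URL, with and without the suspect server rules in place, should show whether the length header is being dropped.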
Just to confirm, that was HTTP, not HTTPS, right? In my case, the content length header is present but the artefact is being served over HTTPS. |
HTTP as my iPXE version doesn't support https. |
@SerialVelocity Are you able to try fetching over HTTP as a datapoint? |
Unfortunately I am not able to try fetching over HTTP currently. |
Another thing related to this we discussed in the community meeting was making the live ISO ship the stage2 CPIO instead once we have that so that it contains everything needed to PXE boot (though I think @bgilbert was hesitant on that suggestion). This would also allow it to leverage the same stage2 mechanism as PXE, which is nice. The downside though is that it would increase memory consumption (because we'd have to extract the CPIO into RAM), which is an issue today (#407). It'd probably also make the live ISO boot slower because of the extraction step. So it might not be worth it in the end if we can't resolve those issues easily. |
Proposal
Implementation plan
Step 2 can be deferred if needed.
Test cases
|
Awesome summary! This sounds great to me! |
Any reason for this short window? One month is just 2 releases. WDYT about 4 or 5 releases? |
2 might be too short, but maybe not all the way to 5. I think if people haven't changed things by 3 or 4 they probably missed or forgot about the pending change. It would be nice to get this in FCOS and battle tested a bit sooner rather than later. I also wonder if we could drop |
It would be nice if we would start adding Grub/UEFI testing to the mix:
|
This looks really great @bgilbert! |
Love the detailed write-up! Thanks to all who contributed to the discussion. Could we add something to the plan about documenting these changes and how users would use the multiple artifacts? |
Updates
Proposed deprecation schedule
That gives 2 weeks' migration time for |
- If the separate rootfs image was appended as a second initrd, make sure the initramfs and rootfs are from the same build,
- else if we were asked to fetch the rootfs over HTTP(S), do so,
- else if we're shipping the legacy initramfs image during the deprecation window, add a MOTD,
- else fail.
If we see the karg to fetch the rootfs, automatically enable network.
See coreos/fedora-coreos-tracker#390.
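The decision order in that commit message can be modelled roughly as follows. This is a sketch of the logic only, not the actual initramfs code: the class and function names are hypothetical, the karg name `coreos.live.rootfs_url` is an assumption since the thread does not spell it out, and treating a build mismatch as a failure is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

# Assumption: the thread does not name the karg; adjust if yours differs.
ROOTFS_URL_KARG = "coreos.live.rootfs_url"


class Action(Enum):
    USE_APPENDED = "use the rootfs appended as a second initrd"
    FETCH = "fetch the rootfs over HTTP(S)"
    LEGACY_MOTD = "use the legacy embedded rootfs and add a MOTD"
    FAIL = "fail"


@dataclass
class BootInputs:
    initramfs_build: str
    appended_rootfs_build: Optional[str] = None  # None if nothing was appended
    cmdline: str = ""
    legacy_rootfs_embedded: bool = False


def rootfs_url(cmdline: str) -> Optional[str]:
    """Return the value of the rootfs URL karg, if present and non-empty."""
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == ROOTFS_URL_KARG and sep and value:
            return value
    return None


def decide(inputs: BootInputs) -> Tuple[Action, bool]:
    """Return (action, enable_network) following the order in the commit message."""
    url = rootfs_url(inputs.cmdline)
    enable_network = url is not None  # karg seen: bring up networking
    if inputs.appended_rootfs_build is not None:
        # Assumption: a build mismatch is treated as a hard failure here.
        if inputs.appended_rootfs_build != inputs.initramfs_build:
            return Action.FAIL, enable_network
        return Action.USE_APPENDED, enable_network
    if url is not None:
        return Action.FETCH, enable_network
    if inputs.legacy_rootfs_embedded:
        return Action.LEGACY_MOTD, enable_network
    return Action.FAIL, enable_network
```

For example, `decide(BootInputs("32.20200923.3.0", cmdline="coreos.live.rootfs_url=http://example.invalid/rootfs.img"))` falls through to the fetch branch and requests networking.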
This week's stable release will switch to the separate rootfs image, completing the migration. I'll close this out. |
The fix for this went into stable stream release Thanks @bgilbert for the hard work on this one! |
* Fedora CoreOS stable (after Oct 6) ships separate initramfs and rootfs images, used as initrds
* Update profiles to match the Matchbox examples, which have already switched to the new profile, and remove the unused kernel args
* Requires a Fedora CoreOS version which ships rootfs images (e.g. stable 32.20200923.3.0 or later)
Rel:
* coreos/fedora-coreos-tracker#390 (comment)
* poseidon/matchbox@da0df01#diff-4541f7b7c174f6ae6270135942c1c65ed9e09ebe81239709f5a9fb34e858ddcf
Supersedes #888
The way that upstream has chosen doesn't work with the HP servers, even though it works fine on the master desktop nodes and the development cluster. See https://docs.fedoraproject.org/en-US/fedora-coreos/live-booting-ipxe/#_pxe_images and coreos/fedora-coreos-tracker#390.
When PXE booting hosts, it's useful to be able to push the initrd over the network without the rootfs image included. We do this for two reasons:
curl
.Is it possible to publish these images to https://builds.coreos.fedoraproject.org? The initrd used to be published as the installer-initramfs and the rootfs had to be extracted from the live-initramfs