s390x: build native runner binaries in a stage #67

Merged: theihor merged 3 commits into kernel-patches:main on Feb 3, 2025

theihor

@theihor theihor commented Jan 29, 2025

See commit messages for details.

@theihor theihor marked this pull request as draft January 30, 2025 01:06

theihor commented Jan 30, 2025

Unfortunately, this has to be parked for now...

I tested the build on a local box and on an s390x host. I then modified bpf_ci_runner_s390x_02 to use the new runner image to test it. It worked without issues.

However, the s390x-native build of actions runner binaries via the docker-build-push action (which uses qemu emulation) on github-hosted runners fails with a segfault in .NET.

This may or may not be related to the issue in emulation with .NET 8.

So, for now I am going to use the same approach I used for the v2.321.0 upgrade: make a "release" tag in my fork and upload s390x binaries that I know are working. Then, update the runner image to use those binaries.

On the next actions runner release, I'll test this PR again.

cc: @chantra


theihor commented Jan 30, 2025

Oh, btw I briefly tried cross-compiling s390x binaries on x64, but didn't get too far. I looked into that on the off chance it's easy (.NET is supposed to be cross-platform!!!), but it wasn't. No point in spending lots of time on that.

Remove now unused s390x.Dockerfile, then rename
s390x-native.Dockerfile to s390x.Dockerfile

Signed-off-by: Ihor Solodrai <[email protected]>

Since v2.321.0 of GitHub Actions runner we have switched to native
s390x binaries, built externally [1].

The external build was executed locally using scripts from the gaplib
[2], the binaries then were published [3], so that they could be
downloaded during s390x image build.

However, there is no technical reason to externalize s390x build
process. It was done this way mostly due to the fact that officially
released pre-built binaries are downloaded, and not built.

This change updates s390x Dockerfile adding a stage for building s390x
actions runner binaries, which are immediately used in the runner
docker image for BPF CI.

This will simplify s390x native runner maintenance, and will help to
automate uneventful version bumps.

The current actions runner version is v2.322.0

[1] kernel-patches#64
[2] https://github.com/anup-kodlekere/gaplib
[3] https://github.com/theihor/s390x-actions-runner/releases/tag/v2.321.0

Signed-off-by: Ihor Solodrai <[email protected]>

Rename docker build ARG according to the recent changes, and change
wording in PR/commit message.

Signed-off-by: Ihor Solodrai <[email protected]>

theihor commented Jan 31, 2025

cc: @chantra @anakryiko

Here is a TL;DR picture describing the situation:

This week I wanted to update s390x runners to the new version (v2.322.0), and while at it I modified our dockerfiles such that simple bumps are mostly automated. This is what this PR is about.

But that ended up segfaulting on .NET build, as I described in a previous comment.

"Bummer!", I thought. And proceeded to update the runners semi-manually. But I got blocked, because it turned out that our docker build workflows have been failing with obscure errors for a while now.

Well, that had to be fixed. It took a while, but in the end I found a fix (#69, nice) thanks to the good people on the internet.

The fix came down to forcing upstream actions to use a more recent QEMU. And that gave me pause: maybe the reason for the .NET segfault was also QEMU?

Of course it was! Which is why the pipeline is green on this PR now.

And then I remembered that last week libbpf/libbpf builds on non-x64 archs started segfaulting when compiling with cc. Andrii's take was "let's wait, it might go away", and he's probably right, although the timeline is unclear.

But first I wanted to confirm that the run-on-arch action uses dependencies similar to docker build-push. It turned out to be a bit worse. The run-on-arch action is basically a wrapper around docker that uses JavaScript to execute a bash script to run docker commands. It also hasn't been updated in a couple of months, and it uses the old-ish docker build instead of BuildKit. So yeah...

Anyway, the important point here is that if you're using docker to run things on different architectures, you have to remember that it's not magic. It's QEMU. What actually happens is that the qemu-${arch}-static installed on the VM executes all your non-native programs. And sometimes this may cause problems, whether due to bugs in qemu, in docker, or in the interaction with the host system.
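
One way to see this on a Linux host is to look at the binfmt_misc entry the kernel uses to hand foreign binaries over to QEMU. Below is a minimal Python sketch, not something taken from this PR. It assumes binfmt_misc is mounted at the usual `/proc/sys/fs/binfmt_misc` and that the handler is registered under the name `qemu-s390x` (the naming qemu-user-static setups commonly use; other setups may differ). The helper names `parse_binfmt_entry` and `qemu_handler_for` are hypothetical.

```python
from pathlib import Path

# Usual binfmt_misc mount point on Linux (assumption for this sketch).
BINFMT_DIR = Path("/proc/sys/fs/binfmt_misc")


def parse_binfmt_entry(text):
    """Parse the text of one binfmt_misc entry into a small dict."""
    info = {"enabled": False, "interpreter": None, "flags": ""}
    for line in text.splitlines():
        line = line.strip()
        if line == "enabled":
            info["enabled"] = True
        elif line.startswith("interpreter "):
            # e.g. "interpreter /usr/bin/qemu-s390x-static"
            info["interpreter"] = line.split(" ", 1)[1]
        elif line.startswith("flags:"):
            info["flags"] = line.split(":", 1)[1].strip()
    return info


def qemu_handler_for(arch):
    """Return the parsed qemu-<arch> entry, or None if it cannot be read."""
    entry = BINFMT_DIR / f"qemu-{arch}"  # entry name is an assumption
    try:
        return parse_binfmt_entry(entry.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(qemu_handler_for("s390x"))
```

If this prints `None`, no such handler is registered under that name, and non-native containers would likely fail to start rather than segfault. The interpreter path it reports tells you which QEMU binary is actually doing the work.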

For libbpf, I first tried to force run-on-arch to use a newer qemu. That didn't work, so I decided to run docker in the workflow directly (it is set up on github-hosted runners after all). And that resolved the weird segfault.
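
To make "run docker directly" concrete, here is a rough Python sketch that builds such a `docker run` command line. It is not the actual libbpf workflow: the function name `docker_run_cmd`, the image `ubuntu:24.04`, the `/work` mount point, and the script are all placeholders, and it assumes QEMU binfmt handlers are already registered on the host.

```python
import shlex


def docker_run_cmd(arch, image, script, host_dir="."):
    """Build a `docker run` argv that runs `script` in a linux/<arch> container."""
    return [
        "docker", "run", "--rm",
        "--platform", f"linux/{arch}",  # ask docker for the non-native platform
        "-v", f"{host_dir}:/work",      # mount the checkout into the container
        "-w", "/work",
        image,
        "bash", "-c", script,
    ]


if __name__ == "__main__":
    # Placeholder image and script, for illustration only.
    cmd = docker_run_cmd("s390x", "ubuntu:24.04", "uname -m")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, or the printed line can be pasted into a workflow step. Either way, the programs inside the container are still executed through QEMU.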

Phew!

@theihor theihor marked this pull request as ready for review January 31, 2025 20:59
@theihor theihor requested a review from chantra January 31, 2025 20:59

@chantra chantra left a comment

LGTM, thanks for cleaning this up.


chantra commented Feb 1, 2025

I wonder if this could eventually be stable enough to make it into myoung34/docker-github-actions-runner#248 ?

@theihor theihor merged commit e382cf7 into kernel-patches:main Feb 3, 2025
3 checks passed

theihor commented Feb 3, 2025

> I wonder if this could eventually be stable enough to make it into myoung34/docker-github-actions-runner#248 ?

I'll look into it.
