-
Notifications
You must be signed in to change notification settings - Fork 716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ksym buffer overflow on 32 bit platforms #1660
Conversation
Using uintptr to allocate the backing slice is definitely wrong: https://github.com/torvalds/linux/blob/2014c95afecee3e76ca4a56956a936e23283f05b/kernel/bpf/syscall.c#L4840-L4846 Key question is whether |
Yes, this is exactly what On my machine those addresses (as shown in /proc/kallsyms) all start with As casting those to 32-bit discards those, I guess this is not the right thing to do, as the resulting address won't be valid. |
@kolyshkin you're running an i386 binary on an amd64 kernel? |
Interesting. Since this array is allocated as []ulong on the kernel side, I guess we shouldn't allocate a slice of a variable-sized type on the user space side, that looks wrong indeed. However, I think exposing this as []uintptr through the API was the right call. @lmb ulong should also be 32 bits on i386, right? I don't think the user needs to be running an i386 binary on x64 for this to happen, the kernel-side BPF API should be consistent across arches, esp. since ulong gets extended to u64 before I think we need to do the unfortunately straightforward thing and allocate both a []uintptr and a []uint64 and copy+truncate. Edit: interesting that /proc/kallsyms displays 64-bit pointers with 0xff in the upper 32 bits. @kolyshkin could you share a few lines of that kallsyms output? I'd argue the opposite; |
Yes (in runc ci, as linked to in this PR description). We only do that to check if our code is 32-bit clean, but apparently it also helped to find this issue. |
I think it's the same on any x86_64 machine. The issue here is, we're using i386 binary on x86_64. Most probably it will be the same when running e.g. an armhf binary on arm64. [kir@kir-tp1 linux]$ uname -a
Linux kir-tp1 6.12.0+ #11 SMP PREEMPT_DYNAMIC Mon Jan 27 13:45:51 PST 2025 x86_64 GNU/Linux
[kir@kir-tp1 linux]$ cat /etc/fedora-release
Fedora release 41 (Forty One)
[kir@kir-tp1 linux]$ sudo grep bpf_prog_ /proc/kallsyms | head -10
[sudo] password for kir:
000000000002ff60 A bpf_prog_active
ffffffffae2cdda0 t __pfx___bpf_prog_ret0_warn
ffffffffae2cddb0 t __bpf_prog_ret0_warn
ffffffffae2cddd0 t __pfx___bpf_prog_ret1
ffffffffae2cdde0 t __bpf_prog_ret1
ffffffffae2ce5e0 t __pfx___bpf_prog_array_free_sleepable_cb
ffffffffae2ce5f0 t __bpf_prog_array_free_sleepable_cb
ffffffffae2d01b0 T __pfx_bpf_prog_free
ffffffffae2d01c0 T bpf_prog_free
ffffffffae2d0750 T __pfx_bpf_prog_alloc_no_stats |
Okay, so:
I think I'm okay with 32bit user space on 64 bit kernel not working fully: it's not something we've ever planned to support and is by now probably a miniscule % of installations. TL;DR: I agree with Timo that uint64 in the syscall and uintptr in the API is the way to go. Thanks for debugging / bisecting this! |
I've got a draft PR which should make issues like this less likely in the future: #1670 It already includes a fix for the issue here, but to me it would make sense to have two separate commits. It also turns out that the kernel takes |
3cfde93
to
b9e7fb7
Compare
@lmb I've updated the patch here to avoid API breakage (similar to what you did in #1670, or how it was done before commit 78074c5), except I've added a way to check if a kernel address fits into The check is based on the fact that on most architectures the kernel addresses have I'm not a big fan of this code (and I'm not sure it's 100% correct for all architectures and user/kernel combinations), but IMHO it's better than just returning truncated/wrong addresses. |
Commit 78074c5 ("info: expose more prog jited info"), which made its way into v0.17.0, resulted in random runc CI failures when running i386 binary on an amd64 kernel (see [1]). Apparently [2], the kernel always returns 64-bit pointers, so uint64 (rather than uintptr) should be used for ksyms slice regardless of the platform to avoid the buffer overrun. Now, to keep the public API of (*ProgramInfo).JitedKsymAddrs intact, convert those addresses back to uintptr, as it was done before commit 78074c5. Except, if the kernel address won't fit into an uintptr (as it is the case when running i386 binary on an amd64 kernel), return an empty slice and a false, rather than incorrect addresses. [1]: opencontainers/runc#4594 [2]: https://github.com/torvalds/linux/blob/2014c95afecee3e76ca4a56956a936e23283f05b/kernel/bpf/syscall.c#L4840-L4846 Signed-off-by: Kir Kolyshkin <[email protected]>
b9e7fb7
to
1f761f6
Compare
Commit 78074c5 ("info: expose more prog jited info", PR #1598), which made its
way into v0.17.0, resulted in random runc CI failures when running i386
binary on an amd64 kernel (see 1).
Apparently 2, the kernel always returns 64-bit pointers, so uint64
(rather than uintptr) should be used for ksyms slice regardless of
the platform to avoid the buffer overrun.
Now, to keep the public API of
(*ProgramInfo).JitedKsymAddrs
intact,convert those addresses back to
uintptr
, as it was done beforecommit 78074c5. Except, if the kernel address won't fit into an
uintptr
(as it is the case when running i386 binary on an amd64 kernel), return
an empty slice and a false, rather than incorrect addresses.