Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ksym buffer overflow on 32 bit platforms #1660

Merged
merged 1 commit into from
Feb 11, 2025
Merged

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jan 31, 2025

Commit 78074c5 ("info: expose more prog jited info", PR #1598), which made its
way into v0.17.0, resulted in random runc CI failures when running i386
binary on an amd64 kernel (see 1).

Apparently 2, the kernel always returns 64-bit pointers, so uint64
(rather than uintptr) should be used for ksyms slice regardless of
the platform to avoid the buffer overrun.

Now, to keep the public API of (*ProgramInfo).JitedKsymAddrs intact,
convert those addresses back to uintptr, as it was done before
commit 78074c5. Except, if the kernel address won't fit into an uintptr
(as it is the case when running i386 binary on an amd64 kernel), return
an empty slice and a false, rather than incorrect addresses.

@lmb
Copy link
Collaborator

lmb commented Feb 3, 2025

Using uintptr to allocate the backing slice is definitely wrong: https://github.com/torvalds/linux/blob/2014c95afecee3e76ca4a56956a936e23283f05b/kernel/bpf/syscall.c#L4840-L4846 Key question is whether (unsigned long) ever exceeds Go uintptr (32 bit) on i386. If it doesn't we can keep the existing API and simply copy + cast the slice.

@lmb lmb self-requested a review February 3, 2025 16:43
@kolyshkin
Copy link
Contributor Author

Using uintptr to allocate the backing slice is definitely wrong: https://github.com/torvalds/linux/blob/2014c95afecee3e76ca4a56956a936e23283f05b/kernel/bpf/syscall.c#L4840-L4846 Key question is whether (unsigned long) ever exceeds Go uintptr (32 bit) on i386. If it doesn't we can keep the existing API and simply copy + cast the slice.

Yes, this is exactly what KsymAddrs did before 78074c5.

On my machine those addresses (as shown in /proc/kallsyms) all start with ffffffff. This correlates well with what I saw in runc issue -- opencontainers/runc#4594 (comment) -- four bytes were overwritten with 0xff.

As casting those to 32-bit discards those, I guess this is not the right thing to do, as the resulting address won't be valid.

@lmb
Copy link
Collaborator

lmb commented Feb 4, 2025

@kolyshkin you're running an i386 binary on an amd64 kernel?

@ti-mo
Copy link
Collaborator

ti-mo commented Feb 4, 2025

Interesting. Since this array is allocated as []ulong on the kernel side, I guess we shouldn't allocate a slice of a variable-sized type on the user space side, that looks wrong indeed. However, I think exposing this as []uintptr through the API was the right call.

@lmb ulong should also be 32 bits on i386, right? I don't think the user needs to be running an i386 binary on x64 for this to happen, the kernel-side BPF API should be consistent across arches, esp. since ulong gets extended to u64 before put_user().

I think we need to do the unfortunately straightforward thing and allocate both a []uintptr and a []uint64 and copy+truncate.

Edit: interesting that /proc/kallsyms displays 64-bit pointers with 0xff in the upper 32 bits. @kolyshkin could you share a few lines of that kallsyms output? I'd argue the opposite; 0xffffffff00000001 is clearly not a 32-bit pointer, truncating that to 0x1 looks better.

@kolyshkin
Copy link
Contributor Author

@kolyshkin you're running an i386 binary on an amd64 kernel?

Yes (in runc ci, as linked to in this PR description). We only do that to check if our code is 32-bit clean, but apparently it also helped to find this issue.

@kolyshkin
Copy link
Contributor Author

Edit: interesting that /proc/kallsyms displays 64-bit pointers with 0xff in the upper 32 bits. @kolyshkin could you share a few lines of that kallsyms output?

I think it's the same on any x86_64 machine. The issue here is, we're using i386 binary on x86_64. Most probably it will be the same when running e.g. an armhf binary on arm64.

[kir@kir-tp1 linux]$ uname -a
Linux kir-tp1 6.12.0+ #11 SMP PREEMPT_DYNAMIC Mon Jan 27 13:45:51 PST 2025 x86_64 GNU/Linux

[kir@kir-tp1 linux]$ cat /etc/fedora-release 
Fedora release 41 (Forty One)

[kir@kir-tp1 linux]$ sudo grep bpf_prog_ /proc/kallsyms | head -10
[sudo] password for kir: 
000000000002ff60 A bpf_prog_active
ffffffffae2cdda0 t __pfx___bpf_prog_ret0_warn
ffffffffae2cddb0 t __bpf_prog_ret0_warn
ffffffffae2cddd0 t __pfx___bpf_prog_ret1
ffffffffae2cdde0 t __bpf_prog_ret1
ffffffffae2ce5e0 t __pfx___bpf_prog_array_free_sleepable_cb
ffffffffae2ce5f0 t __bpf_prog_array_free_sleepable_cb
ffffffffae2d01b0 T __pfx_bpf_prog_free
ffffffffae2d01c0 T bpf_prog_free
ffffffffae2d0750 T __pfx_bpf_prog_alloc_no_stats

@lmb
Copy link
Collaborator

lmb commented Feb 5, 2025

Okay, so:

  • Kernel code uses put_user with a u64* which leads to writing the unsigned long (32 bit on i386, 64bit on amd64) as a 64bit value regardless of "native" word size
  • On 64 bit arches, Go uintptr is 64bit and therefore things are ok
  • On 32 bit raches, Go uintptr is 32 bit and things break
  • For the special case of running a 32bit program on a 64bit kernel things are even more awkward. User space uintptr is not large enough to represent a kernel address.

I think I'm okay with 32bit user space on 64 bit kernel not working fully: it's not something we've ever planned to support and is by now probably a miniscule % of installations.

TL;DR: I agree with Timo that uint64 in the syscall and uintptr in the API is the way to go.

Thanks for debugging / bisecting this!

@lmb
Copy link
Collaborator

lmb commented Feb 6, 2025

I've got a draft PR which should make issues like this less likely in the future: #1670 It already includes a fix for the issue here, but to me it would make sense to have two separate commits.

It also turns out that the kernel takes unsigned long as an input parameter for kprobe_multi attach: https://github.com/cilium/ebpf/pull/1670/files#diff-1d6277e4d95629eacfddd4c8a640d7dd3eaece13cadb7100ca38ba88db8ff45dR910 Another reason to explicitly not support 32 bit userspace on 64bit kernel.

@github-actions github-actions bot removed the breaking-change Changes exported API label Feb 6, 2025
@kolyshkin
Copy link
Contributor Author

@lmb I've updated the patch here to avoid API breakage (similar to what you did in #1670, or how it was done before commit 78074c5), except I've added a way to check if a kernel address fits into uintptr to avoid returning truncated/wrong addresses.

The check is based on the fact that on most architectures the kernel addresses have 1 for one or few MSBs, thus I only check the first address, not all of them.

I'm not a big fan of this code (and I'm not sure it's 100% correct for all architectures and user/kernel combinations), but IMHO it's better than just returning truncated/wrong addresses.

@kolyshkin kolyshkin changed the title Fix ksym buffer overflow on i386 Fix ksym buffer overflow on 32 bit platforms Feb 6, 2025
Commit 78074c5 ("info: expose more prog jited info"), which made its
way into v0.17.0, resulted in random runc CI failures when running i386
binary on an amd64 kernel (see [1]).

Apparently [2], the kernel always returns 64-bit pointers, so uint64
(rather than uintptr) should be used for ksyms slice regardless of
the platform to avoid the buffer overrun.

Now, to keep the public API of (*ProgramInfo).JitedKsymAddrs intact,
convert those addresses back to uintptr, as it was done before
commit 78074c5. Except, if the kernel address won't fit into an
uintptr (as it is the case when running i386 binary on an amd64 kernel),
return an empty slice and a false, rather than incorrect addresses.

[1]: opencontainers/runc#4594
[2]: https://github.com/torvalds/linux/blob/2014c95afecee3e76ca4a56956a936e23283f05b/kernel/bpf/syscall.c#L4840-L4846

Signed-off-by: Kir Kolyshkin <[email protected]>
info.go Show resolved Hide resolved
@lmb lmb merged commit 159dff1 into cilium:main Feb 11, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants