Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NonNull<bpf_object> cannot be shared between threads safely #1027

Closed
telbizov opened this issue Dec 4, 2024 · 12 comments
Closed

NonNull<bpf_object> cannot be shared between threads safely #1027

telbizov opened this issue Dec 4, 2024 · 12 comments

Comments

@telbizov
Copy link

telbizov commented Dec 4, 2024

Hi @danielocfb,

As a follow up from #1017, I wanted to open this issue separately. To summarize here again.
I've been using 0.23.3 and I am currently in the processing of trying to upgrade to 0.24.8 (latest stable as of now).

I have my own structure which then I share across multiple threads like this:

pub static BPF: LazyLock<Bpf> = LazyLock::new(|| Bpf::build().unwrap());

The structure, with the latest modification of embedding the storage in it looks like this now:

pub struct Bpf {
    pub skel: skel::MyOwnSkel<'static>,
    storage: Box<MaybeUninit<OpenObject>>,
}

Now I get the following compilation error:

22  | pub static BPF: LazyLock<Bpf> = LazyLock::new(|| Bpf::build().unwrap());
    |                 ^^^^^^^^^^^^^ `NonNull<bpf_object>` cannot be shared between threads safely
    |
    = help: within `MaybeUninit<OpenObject>`, the trait `Sync` is not implemented for `NonNull<bpf_object>`, which is required by `LazyLock<Bpf>: Sync`

Now, I am not entirely sure where the issue is rooted in. The OpenObject currently (https://github.com/libbpf/libbpf-rs/blob/v0.24.8/libbpf-rs/src/object.rs#L222) is:

#[derive(Debug)]
#[repr(transparent)]
pub struct OpenObject {
    ptr: NonNull<libbpf_sys::bpf_object>,
}

NonNull itself has the following:

/// `NonNull` pointers are not `Send` because the data they reference may be aliased.
// N.B., this impl is unnecessary, but should provide better error messages.
#[stable(feature = "nonnull", since = "1.25.0")]
impl<T: ?Sized> !Send for NonNull<T> {}

/// `NonNull` pointers are not `Sync` because the data they reference may be aliased.
// N.B., this impl is unnecessary, but should provide better error messages.
#[stable(feature = "nonnull", since = "1.25.0")]
impl<T: ?Sized> !Sync for NonNull<T> {}

So that makes sense to be non-Sync. However, what I don't know yet is what changed exactly. Previously, in 0.23.3, OpenObject was (https://github.com/libbpf/libbpf-rs/blob/v0.23.3/libbpf-rs/src/object.rs#L153):

#[derive(Debug)]
pub struct OpenObject {
    ptr: NonNull<libbpf_sys::bpf_object>,
    maps: HashMap<String, OpenMap>,
    progs: HashMap<String, OpenProgram>,
}

So again, it contains ptr: NonNull<libbpf_sys::bpf_object> which should make it !Sync as well. Then how was I able to use it previously?

I am not a Rust expert and I'm sure I am missing something here. I'll continue looking into it, but any help is appreciated.

Thanks

@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 4, 2024

Can you provide the full reproducible example?

@telbizov
Copy link
Author

telbizov commented Dec 4, 2024

Hi @d-e-s-o,

I'm still trying to wrap my head around completely. It has to do with the modifications on the Bpf structure itself due to the addition of the storage in it because of the 0.24 requirements.

I just tested the following back with 0.23.3:

pub static AAA: LazyLock<Box<MaybeUninit<OpenObject>>> =
    LazyLock::new(|| Box::new(MaybeUninit::uninit()));

This gives the same error. Just as I expected. So it's not the NonNull itself. That has always been !Sync

However, my own code before was different, since I didn't have to keep a separate storage OpenObject in it.

Previously I had:

pub struct Bpf<'a> {
    pub skel: skel::MyOwnSkel<'a>,
}

So this must not have had any NonNull pointers in it and thus was Sync. Now I changed it to:

pub struct Bpf {
    pub skel: skel::MyOwnSkel<'static>,
    storage: Box<MaybeUninit<OpenObject>>,
}

and thus the error.

@telbizov
Copy link
Author

telbizov commented Dec 4, 2024

Generated code in 0.23.3:

    pub struct MyOwnSkel<'a> {
        pub obj: libbpf_rs::Object,
        struct_ops: myown_types::struct_ops,
        skel_config: libbpf_rs::__internal_skel::ObjectSkeletonConfig<'a>,
        pub links: MyOwnLinks,
    }

    unsafe impl Send for MyOwnSkel<'_> {}
    unsafe impl Sync for MyOwnSkel<'_> {}

Generated code in 0.24.8:

pub struct MyOwnSkel<'obj> {
        obj: OwnedRef<'obj, libbpf_rs::Object>,
        pub maps: MyOwnMaps<'obj>,
        pub progs: MyOwnProgs<'obj>,
        struct_ops: StructOps,
        skel_config: libbpf_rs::__internal_skel::ObjectSkeletonConfig<'obj>,
        pub links: MyOwnLinks,
    }

    unsafe impl Send for MyOwnSkel<'_> {}
    unsafe impl Sync for MyOwnSkel<'_> {}

So in both cases libbpf-cargo implements Send and Sync on the skeleton.

@telbizov
Copy link
Author

telbizov commented Dec 4, 2024

So, if my understanding is correct, it is the "bundling" of the storage: Box<MaybeUninit<OpenObject>> in my own structure which makes it !Send. Previously, there was no separate storage that the user needed to pass and the skeleton itself was already marked as Send and Sync

@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 4, 2024

So implement Sync for Bpf then? Problem solved?

@telbizov
Copy link
Author

telbizov commented Dec 4, 2024

So implement Sync for Bpf then? Problem solved?

Thanks for that. It worked, but with caveats.

unsafe impl Send for Bpf {}
unsafe impl Sync for Bpf {}

I tried that before but I also got this clippy (nursery) warning message:

some fields in `Bpf` are not safe to be sent to another thread
use a thread-safe type that implements `Send`
for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#non_send_fields_in_send_ty
`-W clippy::non-send-fields-in-send-ty` implied by `-W clippy::nursery`
to override `-W clippy::nursery` add `#[allow(clippy::non_send_fields_in_send_ty)]`
mod.rs(31, 5): it is not safe to send field `_storage` to another thread

This lint warns about a Send implementation for a type that contains fields that are not safe to be sent across threads. It tries to detect fields that can cause a soundness issue when sent to another thread (e.g., Rc) while allowing !Send fields that are expected to exist in a Send type, such as raw pointers.

This warning made sense to me and I thought that I should not continue with this approach, because I am indeed wrapping a !Send and !Sync field inside. Now that I actually ran the code it works fine.

Is it really safe to do so though in all cases? I didn't read in full the previous (0.23.3) implementation of the skeleton storage, but if it was already part of the skeleton structure itself and the skeleton itself was marked as Send and Sync then I suppose it would have been equivalent to what I am doing now. Was that the case?

Thanks again Daniel

@telbizov telbizov closed this as completed Dec 4, 2024
@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 4, 2024

Is it really safe to do so though in all cases? I didn't read in full the previous (0.23.3) implementation of the skeleton storage, but if it was already part of the skeleton structure itself and the skeleton itself was marked as Send and Sync then I suppose it would have been equivalent to what I am doing now. Was that the case?

There was no concept of "storage" per se in earlier versions from what I recall, as it wasn't necessary. But my understanding is that it is safe to implement both Sync and Send for Bpf as you have it: from what I gather there is nothing that could get us in trouble when moving an object to another thread (e.g., no thread local storage) and, similarly, there is no issue accessing an object in a shared manner from multiple threads (e.g., no unsynchronized interior mutability).

My guess is that Clippy warns because OpenObject isn't actually Sync (or Send). We should probably add such an implementation and then, hopefully, the warning would vanish.

@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 4, 2024

We should probably add such an implementation and then, hopefully, the warning would vanish.

And actually when we do that I guess you wouldn't need the impl on your end at all.

@telbizov
Copy link
Author

telbizov commented Dec 4, 2024

Right. Let me take a look here and pay back a tiny bit with a PR! Reopening this now.

@telbizov telbizov reopened this Dec 4, 2024
@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Dec 4, 2024

Sounds good. FWIW, I was just reminded of this prior discussion/analysis on the matter: #245 (comment)

That being said, by now Object and OpenObject just depend on and wrap C state. I think we should be good to mark them Send & Sync. I realize that is a somewhat stronger statement than what I mentioned in the above discussion, but I think my understanding of Sync has changed slight (I don't know if for the better, though :P). Should there be violations (such as the now fixed unsynchronized mutable global access) these would be bugs in libbpf itself, which we should address there.

@telbizov
Copy link
Author

telbizov commented Dec 4, 2024

Submitted a PR after testing against my code (after I removed my own implementations).
Just saw your comment. Let me update the PR to include Send+Sync for Object too

@telbizov
Copy link
Author

telbizov commented Dec 4, 2024

PR has been updated with implementations on both Object and OpenObject, the rest of the diff is rust formatting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants