Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential undefined behavior from cyclic references between bz_stream and EState/DState. #94

Open
icmccorm opened this issue Apr 26, 2023 · 2 comments

Comments

@icmccorm
Copy link

I've noticed the following pattern is used in allocating a bz_stream object from Rust within src/mem.rs:

let mut raw = Box::new(mem::zeroed());
assert_eq!(
    ffi::BZ2_bzCompressInit(&mut *raw, lvl.level() as c_int, 0, work_factor as c_int),
    0
);

Then, the unique borrow created by &mut *raw is used to create a cyclic data structure between the initialized bz_stream and its EState member, state. For example, in BZ2_bzCompressInit in bzlib.c, you have the following:

int BZ_API(BZ2_bzCompressInit) 
                    ( bz_stream* strm, 
                     ... )
{
   ...
   EState* s;
   ...
   s->strm = strm;
   ...
   strm->state = s;
   ...
}

Every time the allocated bz_stream object is passed from Rust to C, it's done with &mut *raw. In Stacked Borrows, it seems like this would invalidate the SharedRW permission associated with the original pointer stored by s->strm = strm;. In Tree Borrows, it would initially transition the permission for s->strm to being Reserved the next time a &mut *raw is passed, so reading through it would still be allowed.

But, the root bz_stream allocation is both read and written to through its state member, so this would still lead to violations of the model eventually. For instance, in copy_input_until_stop (also in bzlib.c), you have both reads and writes from the EState pointer s to its parent bz_stream:

 ADD_CHAR_TO_BLOCK ( s, (UInt32)(*((UChar*)(s->strm->next_in))) ); 
 s->strm->next_in++;

It seems like the fix for this would be to convert the Box into a raw pointer with Box::into_raw and pass this raw pointer directly to the FFI. Then, it would be re-wrapped it for deallocation in the implementation of Drop for Stream. However, I'm totally unsure if this would be a situation where violating the Stacked Borrows/Tree Borrows models would be problematic. What do you think?

@bjorn3
Copy link
Collaborator

bjorn3 commented Dec 11, 2024

If/when rust-lang/rfcs#3712 lands, this will officially no longer be UB. And with the rust bzip2 implementation miri already accepts this code. (miri rejects it with the original libbz2 as backend only because it can't call into C afaict)

@icmccorm
Copy link
Author

icmccorm commented Dec 11, 2024

This error doesn't seem to be caused by the noalias added to Box. I think it would still happen even if we just grabbed a chunk of memory directly from the global allocator. We're still creating a new, unique mutable reference (&mut bz_stream) and copying it into the foreign heap, where it is used again by C after its lifetime has ended. That would still lead to undefined behavior under Stacked and Tree Borrows, even if the location we're borrowing is treated as identical to WellFormed<T> in the proposal, right? Or is there something I'm missing?

miri rejects it... only because it can't call into C afaict

We found this bug using a custom version of Miri that has limited support for foreign function calls. I can try replacing Box with a call to alloc to check if the bug is still present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants