Skip to content

CString being noalias is a footgun #136770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
theemathas opened this issue Feb 9, 2025 · 9 comments
Open

CString being noalias is a footgun #136770

theemathas opened this issue Feb 9, 2025 · 9 comments
Labels
A-box Area: Our favorite opsem complication A-FFI Area: Foreign function interface (FFI) C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team

Comments

@theemathas
Copy link
Contributor

Internally, String is implemented as a Vec<u8>. However, CString is implemented as a Box<[u8]>.

This means that CString is currently treated as noalias, like Box (but not Vec) is. (See rust-lang/unsafe-code-guidelines#326.) This means that there are additional rules of how pointers pointing into the CString buffer can be used, and violating these rules lead to UB. As far as I can tell, the fact that this applies to CString is not documented anywhere.

I don't know if this has produced any issues in practice, but I believe that this is an undesirable footgun that should be eliminated (e.g., by changing CString to use NonNull<[u8]> instead), especially since CString is intended to be turned into a pointer and passed to FFI, potentially resulting in subtle UB that Miri can't detect.

For example, consider the following code, which is a stand-in for how a program with C FFI might use CString:

use std::ffi::{CString, c_char, CStr};

fn main() {
    let my_string = CString::new("abc").unwrap();
    let ptr: *const c_char = my_string.as_ptr();
    // Suppose that `ptr` is given to some C FFI function here
    let _moved_my_string = my_string;
    // Suppose that `ptr` is re-obtained from C FFI here
    let _my_str = unsafe { CStr::from_ptr(ptr) };
}

This code might seem intuitively fine to many Rust users, but it causes UB according to Miri.

Error from Miri
error: Undefined Behavior: attempting a read access using <2513> at alloc1005[0x0], but that tag does not exist in the borrow stack for this location
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ffi/c_str.rs:740:22
    |
740 |             unsafe { strlen(s) }
    |                      ^^^^^^^^^
    |                      |
    |                      attempting a read access using <2513> at alloc1005[0x0], but that tag does not exist in the borrow stack for this location
    |                      this error occurs as part of an access at alloc1005[0x0..0x1]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2513> was created by a SharedReadOnly retag at offsets [0x0..0x4]
   --> src/main.rs:5:30
    |
5   |     let ptr: *const c_char = my_string.as_ptr();
    |                              ^^^^^^^^^^^^^^^^^^
help: <2513> was later invalidated at offsets [0x0..0x4] by a Unique retag (of a reference/box inside this compound value)
   --> src/main.rs:7:28
    |
7   |     let _moved_my_string = my_string;
    |                            ^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `core::ffi::c_str::strlen::runtime` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ffi/c_str.rs:740:22: 740:31
    = note: inside `core::ffi::c_str::strlen` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics/mod.rs:3886:9: 3886:61
    = note: inside `std::ffi::CStr::from_ptr::<'_>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ffi/c_str.rs:267:28: 267:39
note: inside `main`
   --> src/main.rs:9:28
    |
9   |     let _my_str = unsafe { CStr::from_ptr(ptr) };
    |                            ^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

(It's debatable whether CString not having extra capacity is good or not, but that's a separate issue.)

@rustbot labels +A-FFI +A-box +T-libs-api +T-opsem

Meta

The Miri error was from running the code on the playground with rust version 1.86.0-nightly (2025-02-08 43ca9d18e333797f0aa3).

@theemathas theemathas added the C-bug Category: This is a bug. label Feb 9, 2025
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-box Area: Our favorite opsem complication A-FFI Area: Foreign function interface (FFI) T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team labels Feb 9, 2025
@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 9, 2025
@the8472
Copy link
Member

the8472 commented Feb 9, 2025

This code might seem intuitively fine to many Rust users, but it causes UB according to Miri.

Only by making assumptions about as_ptr(). If you used slice.as_ptr() and then tried to move the thing the slice was borrowing from that'd be incorrect. So one could equally say "you moved the thing you were borrowing from, of course that's UB". This is the general rule. Vec::as_ptr is the exception and explicitly documented so.

If you base your assumptions on the exceptional case instead of the general case then you'll run into problems... so that intuition seems wrong.

@theemathas
Copy link
Contributor Author

@the8472 Vec::as_ptr's documentation says:

This method guarantees that for the purpose of the aliasing model, this method does not materialize a reference to the underlying slice, and thus the returned pointer will remain valid when mixed with other calls to as_ptr, as_mut_ptr, and as_non_null.

This doesn't actually say anything about noalias. In particular, it doesn't say whether you're allowed to move the Vec or not.

@the8472
Copy link
Member

the8472 commented Feb 9, 2025

Hrm right, I'm not sure if the extra grace that Vec provides is guaranteed or just a "let's not gratuitously break people who relied on this". But my point stands, the norm is that moving invalidates and you need an exception to do this. If your intuition is that this is generally ok then the intuition needs fixing.

I think a better argument might be saying that this is useful and asking for a guarantee. But that'd need a better demonstration what this is useful for, why does it get moved?

@hanna-kruppe
Copy link
Contributor

See also rust-lang/rfcs#3712

@the8472
Copy link
Member

the8472 commented Feb 9, 2025

At least CString does explicitly document that you get the ptr through deref, which means it's derived from a borrow.

CString implements an as_ptr method through the Deref trait.

@joboet
Copy link
Member

joboet commented Feb 12, 2025

For what it's worth, I've found an instance of std code that is unsound because of this. The Command implementation uses this code to generate an environment array for the spawned process:

self.ptrs[l - 1] = item.as_ptr();
self.ptrs.push(ptr::null());
self.items.push(item);

Because the CString is moved, the provenance of the pointers in the array will be invalidated. Unfortunately, this wasn't caught by miri since it doesn't support process spawning...

@theemathas
Copy link
Contributor Author

I've created a new Zulip topic on this issue.

@traviscross
Copy link
Contributor

traviscross commented Feb 13, 2025

For what it's worth, I've found an instance of std code that is unsound because of this. The Command implementation uses this code to generate an environment array for the spawned process:

self.ptrs[l - 1] = item.as_ptr();
self.ptrs.push(ptr::null());
self.items.push(item);

Because the CString is moved, the provenance of the pointers in the array will be invalidated.

For what it's worth, this fails Stacked Borrows but is not necessarily UB as the exact model for Rust is still undetermined.
This is, e.g., accepted by Tree Borrows.

Playground link

cc @RalfJung

@RalfJung
Copy link
Member

Good catch! We have generally strived to keep std compatible with Stacked Borrows. Well, I have strived to do so, and so far my patches were accepted, and we are gating PRs on Miri running the std test suite with Stacked Borrows. But Miri can't run Command so this case got missed.

Cc @rust-lang/opsem

@traviscross traviscross added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-box Area: Our favorite opsem complication A-FFI Area: Foreign function interface (FFI) C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

No branches or pull requests

8 participants