|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Changes to `u128`/`i128` layout in 1.77 and 1.78" |
| 4 | +author: Trevor Gross |
| 5 | +team: The Rust Lang Team <https://www.rust-lang.org/governance/teams/lang> |
| 6 | +--- |
| 7 | + |
| 8 | +Rust has long had an inconsistency with C regarding the alignment of 128-bit integers |
| 9 | +on the x86-32 and x86-64 architectures. This problem has recently been resolved, but |
| 10 | +the fix comes with some effects that are worth being aware of. |
| 11 | + |
| 12 | +As a user, you most likely do not need to worry about these changes unless you are: |
| 13 | + |
| 14 | +1. Assuming the alignment of `i128`/`u128` rather than using `align_of` |
| 15 | +1. Ignoring the `improper_ctypes*` lints and using these types in FFI |
| 16 | + |
| 17 | +There are also no changes to architectures other than x86-32 and x86-64. If your |
| 18 | +code makes heavy use of 128-bit integers, you may notice runtime performance increases |
| 19 | +at a possible cost of additional memory use. |
| 20 | + |
| 21 | +This post documents what the problem was, what changed to fix it, and what to expect |
| 22 | +with the changes. If you are already familiar with the problem and only looking for a |
| 23 | +compatibility matrix, jump to the [Compatibility](#compatibility) section. |
| 24 | + |
| 25 | +# Background |
| 26 | + |
| 27 | +Data types have two intrinsic values that relate to how they can be arranged in memory: |
| 28 | +size and alignment. A type's size is the amount of space it takes up in memory, and its |
| 29 | +alignment specifies which addresses it is allowed to be placed at. |
| 30 | + |
| 31 | +The size of simple types like primitives is usually unambiguous, being the exact size of |
| 32 | +the data they represent with no padding (unused space). For example, an `i64` always has |
| 33 | +a size of 64 bits or 8 bytes. |
| 34 | + |
| 35 | +Alignment, however, can vary. An 8-byte integer _could_ be stored at any memory address |
| 36 | +(1-byte aligned), but most 64-bit computers will get the best performance if it is |
| 37 | +instead stored at a multiple of 8 (8-byte aligned). So, like in other languages, |
| 38 | +primitives in Rust have this most efficient alignment by default. The effects of this |
| 39 | +can be seen when creating composite types ([playground link][composite-playground]): |
| 40 | + |
| 41 | +```rust |
| 42 | +use core::mem::{align_of, offset_of}; |
| 43 | + |
| 44 | +#[repr(C)] |
| 45 | +struct Foo { |
| 46 | + a: u8, // 1-byte aligned |
| 47 | + b: u16, // 2-byte aligned |
| 48 | +} |
| 49 | + |
| 50 | +#[repr(C)] |
| 51 | +struct Bar { |
| 52 | + a: u8, // 1-byte aligned |
| 53 | + b: u64, // 8-byte aligned |
| 54 | +} |
| 55 | + |
| 56 | +println!("Offset of b (u16) in Foo: {}", offset_of!(Foo, b)); |
| 57 | +println!("Alignment of Foo: {}", align_of::<Foo>()); |
| 58 | +println!("Offset of b (u64) in Bar: {}", offset_of!(Bar, b)); |
| 59 | +println!("Alignment of Bar: {}", align_of::<Bar>()); |
| 60 | +``` |
| 61 | + |
| 62 | +Output: |
| 63 | + |
| 64 | +```text |
| 65 | +Offset of b (u16) in Foo: 2 |
| 66 | +Alignment of Foo: 2 |
| 67 | +Offset of b (u64) in Bar: 8 |
| 68 | +Alignment of Bar: 8 |
| 69 | +``` |
| 70 | + |
| 71 | +We see that within a struct, a type will always be placed such that its offset is a |
| 72 | +multiple of its alignment - even if this means unused space (Rust minimizes this by |
| 73 | +default when `repr(C)` is not used). |
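
The note about `repr(C)` can be checked directly. The sketch below is illustrative only: the struct names `Ordered` and `Reordered` are made up for this example, and the default layout is unspecified, so the second number is whatever the compiler chooses rather than a guarantee.

```rust
use core::mem::size_of;

// Hypothetical types for illustration; neither appears elsewhere in this post.

/// Fields are laid out in declaration order, as a C compiler would do.
#[allow(dead_code)]
#[repr(C)]
struct Ordered {
    a: u8,  // offset 0, followed by padding up to `b`'s alignment
    b: u64, // placed at a multiple of its alignment
    c: u8,  // followed by tail padding up to the struct's alignment
}

/// Same fields with the default representation; the compiler may reorder them.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u64,
    c: u8,
}

fn main() {
    println!("size of Ordered (repr(C)): {}", size_of::<Ordered>());
    println!("size of Reordered (default): {}", size_of::<Reordered>());
}
```

On x86-64, where `u64` is 8-byte aligned, `Ordered` takes 24 bytes. Current compilers typically fit `Reordered` into 16 by moving the two `u8` fields next to each other, though that is not something to rely on.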
| 74 | + |
| 75 | +These numbers are not arbitrary; the application binary interface (ABI) says what they |
| 76 | +should be. In the x86-64 [psABI] (processor-specific ABI) for System V (Unix & Linux), |
| 77 | +_Figure 3.1: Scalar Types_ tells us exactly how primitives should be represented: |
| 78 | + |
| 79 | +| C type | Rust equivalent | `sizeof` | Alignment (bytes) | |
| 80 | +| -------------------- | --------------- | -------- | ----------------- | |
| 81 | +| `char` | `i8` | 1 | 1 | |
| 82 | +| `unsigned char` | `u8` | 1 | 1 | |
| 83 | +| `short` | `i16` | 2 | 2 | |
| 84 | +| **`unsigned short`** | **`u16`** | **2** | **2** | |
| 85 | +| `long` | `i64` | 8 | 8 | |
| 86 | +| **`unsigned long`** | **`u64`** | **8** | **8** | |
| 87 | + |
| 88 | +The ABI only specifies C types, but Rust follows the same definitions both for |
| 89 | +compatibility and for the performance benefits. |
| 90 | + |
| 91 | +# The Incorrect Alignment Problem |
| 92 | + |
| 93 | +If two implementations disagree on the alignment of a data type, they cannot reliably |
| 94 | +share data containing that type. Rust had inconsistent alignment for 128-bit types: |
| 95 | + |
| 96 | +```rust |
| 97 | +println!("alignment of i128: {}", align_of::<i128>()); |
| 98 | +``` |
| 99 | + |
| 100 | +```text |
| 101 | +// rustc 1.76.0 |
| 102 | +alignment of i128: 8 |
| 103 | +``` |
| 104 | + |
| 105 | +```c |
| 106 | +printf("alignment of __int128: %zu\n", _Alignof(__int128)); |
| 107 | +``` |
| 108 | +
|
| 109 | +```text |
| 110 | +// gcc 13.2 |
| 111 | +alignment of __int128: 16 |
| 112 | +
|
| 113 | +// clang 17.0.1 |
| 114 | +alignment of __int128: 16 |
| 115 | +``` |
| 116 | + |
| 117 | +([Godbolt link][align-godbolt]) Looking back at the [psABI], we can see that Rust has |
| 118 | +the wrong alignment here: |
| 119 | + |
| 120 | +| C type | Rust equivalent | `sizeof` | Alignment (bytes) | |
| 121 | +| ------------------- | --------------- | -------- | ----------------- | |
| 122 | +| `__int128` | `i128` | 16 | 16 | |
| 123 | +| `unsigned __int128` | `u128` | 16 | 16 | |
| 124 | + |
| 125 | +It turns out this isn't because of something that Rust is actively doing incorrectly: |
| 126 | +layout of primitives comes from the LLVM codegen backend used by both Rust and Clang, |
| 127 | +among other languages, and it has the alignment for `i128` hardcoded to 8 bytes. |
| 128 | + |
| 129 | +Clang uses the correct alignment only because of a workaround, where the alignment is |
| 130 | +manually set to 16 bytes before handing the type to LLVM. This fixes the layout issue |
| 131 | +but has been the source of some other minor problems.[^f128-segfault][^va-segfault] |
| 132 | +Rust does no such manual adjustment, hence the issue reported at |
| 133 | +<https://github.com/rust-lang/rust/issues/54341>. |
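
To make the consequence concrete, the layout rules from the Background section can be applied by hand to a `repr(C)`-style struct holding a `u8` followed by a 128-bit integer. This is a minimal sketch: `round_up` and `layout_with_int128_align` are hypothetical helpers written for this illustration, not part of any library, and they only model this one two-field struct.

```rust
/// Round `offset` up to the next multiple of `align` (which must be a power of two).
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Layout of `struct { tag: u8, value: <128-bit integer> }` under C rules.
/// Returns (offset of `value`, total size of the struct).
fn layout_with_int128_align(int128_align: usize) -> (usize, usize) {
    // `tag` occupies byte 0, so `value` starts at the first aligned offset >= 1.
    let value_offset = round_up(1, int128_align);
    // The struct's alignment equals that of its most-aligned field, and its
    // size is rounded up to that alignment.
    let size = round_up(value_offset + 16, int128_align);
    (value_offset, size)
}

fn main() {
    let (offset, size) = layout_with_int128_align(8);
    println!("8-byte aligned:  offset {offset}, size {size}"); // offset 8, size 24
    let (offset, size) = layout_with_int128_align(16);
    println!("16-byte aligned: offset {offset}, size {size}"); // offset 16, size 32
}
```

A field that Rust 1.76 wrote at offset 8 would be read by C at offset 16, which is why the two could not reliably exchange such a struct.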
| 134 | + |
| 135 | +# The Calling Convention Problem |
| 136 | + |
| 137 | +There is an additional problem: LLVM does not always do the correct thing when passing |
| 138 | +128-bit integers as function arguments. This was a [known issue in LLVM] before its |
| 139 | +[relevance to Rust was discovered]. |
| 140 | + |
| 141 | +When calling a function, the arguments get passed in registers (special storage |
| 142 | +locations within the CPU) until there are no more slots, then they get "spilled" to |
| 143 | +the stack (the program's memory). The ABI tells us what to do here as well, in the |
| 144 | +section _3.2.3 Parameter Passing_: |
| 145 | + |
| 146 | +> Arguments of type `__int128` offer the same operations as INTEGERs, yet they do not |
| 147 | +> fit into one general purpose register but require two registers. For classification |
| 148 | +> purposes `__int128` is treated as if it were implemented as: |
| 149 | +> |
| 150 | +> ```c |
| 151 | +> typedef struct { |
| 152 | +> long low, high; |
| 153 | +> } __int128; |
| 154 | +> ``` |
| 155 | +> |
| 156 | +> with the exception that arguments of type `__int128` that are stored in memory must be |
| 157 | +> aligned on a 16-byte boundary. |
| 158 | +
|
| 159 | +We can try this out by implementing the calling convention manually. In the below C |
| 160 | +example, inline assembly is used to call `foo(0xaf, val, val, val)` with `val` as |
| 161 | +`0x11223344556677889900aabbccddeeff`. |
| 162 | +
|
| 163 | +x86-64 uses the registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, and `r9` to pass function |
| 164 | +arguments, in that order (you guessed it, this is also in the ABI). Each register |
| 165 | +fits a word (64 bits), and anything that doesn't fit gets `push`ed to the stack. |
| 166 | +
|
| 167 | +```c |
| 168 | +/* full example at <https://godbolt.org/z/5c8cb5cxs> */ |
| 169 | +
|
| 170 | +/* to see the issue, we need a padding value to "mess up" argument alignment */ |
| 171 | +void foo(char pad, __int128 a, __int128 b, __int128 c) { |
| 172 | + printf("%#x\n", pad & 0xff); |
| 173 | + print_i128(a); |
| 174 | + print_i128(b); |
| 175 | + print_i128(c); |
| 176 | +} |
| 177 | +
|
| 178 | +int main() { |
| 179 | + asm( |
| 180 | + /* load arguments that fit in registers */ |
| 181 | + "movl $0xaf, %edi \n\t" /* 1st slot (edi): padding char (`edi` is the |
| 182 | + * same as `rdi`, just a smaller access size) */ |
| 183 | + "movq $0x9900aabbccddeeff, %rsi \n\t" /* 2nd slot (rsi): lower half of `a` */ |
| 184 | + "movq $0x1122334455667788, %rdx \n\t" /* 3rd slot (rdx): upper half of `a` */ |
| 185 | + "movq $0x9900aabbccddeeff, %rcx \n\t" /* 4th slot (rcx): lower half of `b` */ |
| 186 | + "movq $0x1122334455667788, %r8 \n\t" /* 5th slot (r8): upper half of `b` */ |
| 187 | + "movq $0xdeadbeef4c0ffee0, %r9 \n\t" /* 6th slot (r9): should be unused, but |
| 188 | + * let's trick clang! */ |
| 189 | +
|
| 190 | + /* reuse our stored registers to load the stack */ |
| 191 | + "pushq %rdx \n\t" /* upper half of `c` gets passed on the stack */ |
| 192 | + "pushq %rsi \n\t" /* lower half of `c` gets passed on the stack */ |
| 193 | +
|
| 194 | + "call foo \n\t" /* call the function */ |
| 195 | + "addq $16, %rsp \n\t" /* reset the stack */ |
| 196 | + ); |
| 197 | +} |
| 198 | +``` |
| 199 | +
|
| 200 | +Running the above with GCC prints the following expected output: |
| 201 | +
|
| 202 | +```text |
| 203 | +0xaf |
| 204 | +0x11223344556677889900aabbccddeeff |
| 205 | +0x11223344556677889900aabbccddeeff |
| 206 | +0x11223344556677889900aabbccddeeff |
| 207 | +``` |
| 208 | +
|
| 209 | +But running with Clang 17 prints: |
| 210 | +
|
| 211 | +```text |
| 212 | +0xaf |
| 213 | +0x11223344556677889900aabbccddeeff |
| 214 | +0x11223344556677889900aabbccddeeff |
| 215 | +0x9900aabbccddeeffdeadbeef4c0ffee0 |
| 216 | +//^^^^^^^^^^^^^^^^ this should be the lower half |
| 217 | +// ^^^^^^^^^^^^^^^^ look familiar? |
| 218 | +``` |
| 219 | +
|
| 220 | +Surprise! |
| 221 | +
|
| 222 | +This illustrates the second problem: LLVM expects an `i128` to be passed half in a |
| 223 | +register and half on the stack when possible, but this is not allowed by the ABI. |
| 224 | +
|
| 225 | +Since the behavior comes from LLVM and has no reasonable workaround, this is a |
| 226 | +problem in both Clang and Rust. |
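
The difference between the two behaviors can be modelled in a few lines. The sketch below is a deliberately simplified toy, not LLVM's or any compiler's actual algorithm: it only handles integer arguments, ignores stack alignment, and the names `Slot`, `INT_REGS`, and `assign_slots` are hypothetical.

```rust
/// Where one 64-bit half of an argument ends up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Slot {
    Reg(&'static str),
    Stack,
}

/// Integer argument registers on x86-64 System V, in order.
const INT_REGS: [&str; 6] = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];

/// Assign a slot to every 64-bit word of every argument. Each entry of
/// `words_per_arg` is 1 for types up to `long` and 2 for `__int128`.
///
/// With `allow_split == false`, an argument that does not fit entirely in the
/// remaining registers goes entirely to the stack, as the psABI requires.
/// With `allow_split == true`, it takes whatever registers are left and spills
/// the rest, which models the behavior shown by Clang 17 above.
fn assign_slots(words_per_arg: &[usize], allow_split: bool) -> Vec<Vec<Slot>> {
    let mut next = 0; // index of the next free register
    let mut out = Vec::new();
    for &words in words_per_arg {
        let free = INT_REGS.len() - next;
        let in_regs = if words <= free {
            words
        } else if allow_split {
            free
        } else {
            0
        };
        let mut slots = Vec::new();
        for _ in 0..in_regs {
            slots.push(Slot::Reg(INT_REGS[next]));
            next += 1;
        }
        for _ in in_regs..words {
            slots.push(Slot::Stack);
        }
        out.push(slots);
    }
    out
}

fn main() {
    // foo(char pad, __int128 a, __int128 b, __int128 c)
    let args = [1, 2, 2, 2];
    println!("psABI:      {:?}", assign_slots(&args, false));
    println!("with split: {:?}", assign_slots(&args, true));
}
```

For the `foo` call above, the first mode places both halves of `c` on the stack and leaves `r9` unused, while the second puts one half of `c` in `r9`. That is how the `0xdeadbeef4c0ffee0` value ended up in the Clang output.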
| 227 | +
|
| 228 | +# Solutions |
| 229 | +
|
| 230 | +Getting these problems resolved was a lengthy effort by many people, starting with a |
| 231 | +patch by compiler team member Simonas Kazlauskas in 2017: [D28990]. Unfortunately, |
| 232 | +this wound up reverted. It was later attempted again in [D86310] by LLVM contributor |
| 233 | +Harald van Dijk, which is the version that finally landed in October 2023. |
| 234 | +
|
| 235 | +Around the same time, Nikita Popov fixed the calling convention issue with [D158169]. |
| 236 | +Both of these changes made it into LLVM 18, meaning all relevant ABI issues will be |
| 237 | +resolved in any Clang or Rust release built on this version (Clang 18 and Rust 1.78 when |
| 238 | +using the bundled LLVM). |
| 239 | +
|
| 240 | +However, `rustc` can also use the version of LLVM installed on the system rather than a |
| 241 | +bundled version, which may be older. To mitigate the chance of problems from differing |
| 242 | +alignment with the same `rustc` version, [a proposal] was introduced to manually |
| 243 | +correct the alignment like Clang has been doing. This was implemented by Matthew Maurer |
| 244 | +in [#116672][#11672]. |
| 245 | +
|
| 246 | +Since these changes, Rust now produces the correct alignment: |
| 247 | +
|
| 248 | +```rust |
| 249 | +println!("alignment of i128: {}", align_of::<i128>()); |
| 250 | +``` |
| 251 | +
|
| 252 | +```text |
| 253 | +// rustc 1.77.0 |
| 254 | +alignment of i128: 16 |
| 255 | +``` |
| 256 | +
|
| 257 | +As mentioned above, part of the reason for an ABI to specify the alignment of a datatype |
| 258 | +is that it is more efficient on that architecture. We actually got to see that |
| 259 | +firsthand: the [initial performance run] with the manual alignment change showed |
| 260 | +nontrivial improvements to compiler performance (which relies heavily on 128-bit |
| 261 | +integers to work with integer literals). The downside of increasing alignment is that |
| 262 | +composite types do not always fit together as nicely in memory, leading to an increase |
| 263 | +in memory usage. Unfortunately, this meant some of the performance wins needed to be sacrificed |
| 264 | +to avoid an increased memory footprint. |
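
The memory cost is easy to observe on a small composite type. In the sketch below, `Natural` and `Capped` are hypothetical structs made up for this illustration. `packed(8)` caps field alignment at 8 bytes, which only approximates the pre-1.77 layout on x86-64 and is not how the old compiler actually worked.

```rust
use core::mem::{align_of, size_of};

// Hypothetical types for illustration only.

/// Uses the target's natural alignment for `u128`.
#[allow(dead_code)]
#[repr(C)]
struct Natural {
    tag: u8,
    value: u128,
}

/// Same fields, but with field alignment capped at 8 bytes.
#[allow(dead_code)]
#[repr(C, packed(8))]
struct Capped {
    tag: u8,
    value: u128,
}

fn main() {
    println!("alignment of u128: {}", align_of::<u128>());
    println!("size of Natural: {}", size_of::<Natural>());
    println!("size of Capped: {}", size_of::<Capped>());
}
```

On x86-64 with Rust 1.77 or later, `Natural` should report 32 bytes against 24 for `Capped`: the extra 8 bytes are padding between `tag` and `value`.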
| 265 | +
|
| 266 | +[a proposal]: https://github.com/rust-lang/compiler-team/issues/683 |
| 267 | +[#11672]: https://github.com/rust-lang/rust/pull/116672/ |
| 268 | +[D158169]: https://reviews.llvm.org/D158169 |
| 269 | +[D28990]: https://reviews.llvm.org/D28990 |
| 270 | +[D86310]: https://reviews.llvm.org/D86310 |
| 271 | +
|
| 272 | +# Compatibility |
| 273 | +
|
| 274 | +The most important question is how compatibility changed as a result of these fixes. In |
| 275 | +short, `i128` and `u128` with Rust using LLVM 18 (the default version starting with |
| 276 | +1.78) will be completely compatible with any version of GCC, as well as Clang 18 and |
| 277 | +above (released March 2024). All other combinations have some incompatible cases, which |
| 278 | +are summarized in the table below: |
| 279 | +
|
| 280 | +| Compiler 1 | Compiler 2 | Status |
| 281 | +| ---------------------------------- | ------------------- | ----------------------------------- | |
| 282 | +| Rust ≥ 1.78 with bundled LLVM (18) | GCC (any version) | Fully compatible | |
| 283 | +| Rust ≥ 1.78 with bundled LLVM (18) | Clang ≥ 18 | Fully compatible | |
| 284 | +| Rust ≥ 1.77 with LLVM ≥ 18 | GCC (any version) | Fully compatible | |
| 285 | +| Rust ≥ 1.77 with LLVM ≥ 18 | Clang ≥ 18 | Fully compatible | |
| 286 | +| Rust ≥ 1.77 with LLVM ≥ 18 | Clang \< 18 | Storage compatible, has calling bug | |
| 287 | +| Rust ≥ 1.77 with LLVM \< 18 | GCC (any version) | Storage compatible, has calling bug | |
| 288 | +| Rust ≥ 1.77 with LLVM \< 18 | Clang (any version) | Storage compatible, has calling bug | |
| 289 | +| Rust \< 1.77[^l] | GCC (any version) | Incompatible | |
| 290 | +| Rust \< 1.77[^l] | Clang (any version) | Incompatible | |
| 291 | +| GCC (any version) | Clang ≥ 18 | Fully compatible | |
| 292 | +| GCC (any version) | Clang \< 18 | Storage compatible, has calling bug |
| 293 | +
|
| 294 | +[^l]: Rust < 1.77 with LLVM 18 will have some degree of compatibility; this is just |
| 295 | + an uncommon combination. |
| 296 | +
|
| 297 | +# Effects & Future Steps |
| 298 | +
|
| 299 | +As mentioned in the introduction, most users will notice no effects of this change |
| 300 | +unless they are already doing something questionable with these types. |
| 301 | +
|
| 302 | +Starting with Rust 1.77, it will be reasonably safe to start experimenting with |
| 303 | +128-bit integers in FFI, with some more certainty coming with the LLVM update |
| 304 | +in 1.78. There is [ongoing discussion] about lifting the lint in an upcoming |
| 305 | +version, but we want to be cautious and avoid introducing silent breakage for users |
| 306 | +whose Rust compiler may be built with an older LLVM. |
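
For code that does experiment with 128-bit integers in FFI, one way to guard against a mismatched layout is to state the alignment assumption explicitly, so that a toolchain with the old alignment fails to compile instead of silently misbehaving. This is a minimal sketch, assuming x86-64 and the 16-byte alignment from the psABI table earlier. `add_one` is a hypothetical function standing in for something exported to C.

```rust
use core::mem::align_of;

// Refuse to build on x86-64 if the compiler still uses the old 8-byte alignment.
#[cfg(target_arch = "x86_64")]
const _: () = assert!(align_of::<i128>() == 16 && align_of::<u128>() == 16);

/// Hypothetical FFI entry point. The `improper_ctypes*` lints may still flag
/// `i128` here, so the lint is allowed explicitly.
#[allow(improper_ctypes_definitions)]
pub extern "C" fn add_one(x: i128) -> i128 {
    x.wrapping_add(1)
}

fn main() {
    println!("{}", add_one(41)); // prints 42
}
```

This only checks the storage layout. It cannot detect the calling convention bug that remains when either side is built with an LLVM older than 18.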
| 307 | +
|
| 308 | +[relevance to Rust was discovered]: https://github.com/rust-lang/rust/issues/54341#issuecomment-1064729606 |
| 309 | +[initial performance run]: https://github.com/rust-lang/rust/pull/116672/#issuecomment-1858600381 |
| 310 | +[known issue in llvm]: https://github.com/llvm/llvm-project/issues/41784 |
| 311 | +[psabi]: https://www.uclibc.org/docs/psABI-x86_64.pdf |
| 312 | +[ongoing discussion]: https://github.com/rust-lang/lang-team/issues/255 |
| 313 | +[align-godbolt]: https://godbolt.org/z/h94Ge1vMW |
| 314 | +[composite-playground]: https://play.rust-lang.org/?version=beta&mode=debug&edition=2021&gist=52f349bdea92bf724bc453f37dbd32ea |
| 315 | +[^va-segfault]: https://github.com/llvm/llvm-project/issues/20283 |
| 316 | +[^f128-segfault]: https://bugs.llvm.org/show_bug.cgi?id=50198 |