[u128; N] array equality test is unproportionally slow #120839
Comments
This does seem like particularly bad performance, but in general, u128 is not where you go if you need something fast.
So, for me, it makes sense that comparing a As for why removing the
I still don't get it. Yes, I do understand that However, in this particular case we're essentially comparing two pairs: In both cases these structures have the same size (32) and alignment (8), so I expect them to be virtually identical in terms of comparison speed. Moreover, the documentation for the intrinsic clearly states:
rust/library/core/src/intrinsics.rs
Lines 2397 to 2420 in 972452c
If I understood correctly, in both cases this should yield the same byte-string comparison (memcmp), which should take the same amount of time given the same byte representation and alignment. But apparently it doesn't.
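To make the expectation in the comment above concrete, here is a minimal sketch (not code from the issue) that prints the size and alignment of both array types and checks that array equality agrees with equality of the underlying bytes. The helper names `bytes_u64x4` and `bytes_u128x2` are hypothetical. The alignment is printed rather than asserted, because the alignment of u128 depends on the compiler version and target.

```rust
use std::mem::{align_of, size_of};

/// Flattens a `[u64; 4]` into its 32 native-endian bytes.
fn bytes_u64x4(a: &[u64; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (chunk, word) in out.chunks_exact_mut(8).zip(a) {
        chunk.copy_from_slice(&word.to_ne_bytes());
    }
    out
}

/// Flattens a `[u128; 2]` into its 32 native-endian bytes.
fn bytes_u128x2(a: &[u128; 2]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (chunk, word) in out.chunks_exact_mut(16).zip(a) {
        chunk.copy_from_slice(&word.to_ne_bytes());
    }
    out
}

fn main() {
    // Both types occupy 32 bytes; the alignment is target-dependent,
    // so it is only reported here, never assumed.
    println!(
        "[u64; 4]:  size {}, align {}",
        size_of::<[u64; 4]>(),
        align_of::<[u64; 4]>()
    );
    println!(
        "[u128; 2]: size {}, align {}",
        size_of::<[u128; 2]>(),
        align_of::<[u128; 2]>()
    );

    // Array equality should agree with equality of the raw bytes.
    let (a, b) = ([1u64, 2, 3, 4], [1u64, 2, 3, 5]);
    assert_eq!(a == b, bytes_u64x4(&a) == bytes_u64x4(&b));

    let (c, d) = ([1u128, 2], [1u128, 2]);
    assert_eq!(c == d, bytes_u128x2(&c) == bytes_u128x2(&d));
}
```

This only shows that the two comparisons are semantically the same byte comparison; it says nothing about which instructions the compiler emits for each type.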
Does
u128 has 16-byte alignment on nightly (#116672), but on stable a simple example at least compiles to the same code for me: https://rust.godbolt.org/z/zj4rEWoxf
For larger lengths, array comparisons compile to a bcmp call, so I'd not expect any differences at all from a Rust perspective (the underlying implementation of bcmp may have different behavior for different alignments, I guess). Can you provide a minimal, complete example on e.g. godbolt for the slow & fast cases?
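A hypothetical sketch of what such a minimal pair could look like is given below. It is not the reporter's reproduction, and the names `eq_u64x4`, `eq_u128x2`, and `count_true` are made up for illustration. The two `#[inline(never)]` functions can be inspected separately in a compiler explorer, and the timing loop gives only a rough indication, since it measures call overhead as well as the comparison itself.

```rust
use std::hint::black_box;
use std::time::Instant;

// Kept out-of-line so each comparison shows up as its own symbol.
#[inline(never)]
fn eq_u64x4(a: &[u64; 4], b: &[u64; 4]) -> bool {
    a == b
}

#[inline(never)]
fn eq_u128x2(a: &[u128; 2], b: &[u128; 2]) -> bool {
    a == b
}

/// Calls `f` `iters` times and returns how many calls returned `true`.
fn count_true(iters: u32, mut f: impl FnMut() -> bool) -> u32 {
    let mut hits = 0;
    for _ in 0..iters {
        if f() {
            hits += 1;
        }
    }
    hits
}

fn main() {
    let iters = 10_000_000;
    let (a64, b64) = ([1u64, 2, 3, 4], [1u64, 2, 3, 4]);
    let (a128, b128) = ([1u128, 2], [1u128, 2]);

    // black_box keeps the optimizer from folding the comparison away.
    let t = Instant::now();
    let h64 = count_true(iters, || eq_u64x4(black_box(&a64), black_box(&b64)));
    println!("[u64; 4]:  {h64} equal, {:?}", t.elapsed());

    let t = Instant::now();
    let h128 = count_true(iters, || eq_u128x2(black_box(&a128), black_box(&b128)));
    println!("[u128; 2]: {h128} equal, {:?}", t.elapsed());
}
```

Build with optimizations (for example `cargo run --release`) before reading anything into the timings; a debug build will not reflect the code generation discussed here.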
On nightly only, adding a newtype wrapper around
IIUC, the difference is precisely because nightly aligns My initial guess was that the slowdown is caused by a misaligned load of Now trying to reproduce that with my setup.
Hello, for my ML research I wrote a generic newtype wrapper around primitive types that acts as a fixed-size bit vector.
When profiling my code I noticed a really strange performance difference depending on which underlying type was used.
It appeared that on the same workload BitVector<[u128; 2]> is twice as slow as BitVector<[u64; 4]> (see profiles below). Notice that the bit width in both cases is equal: 128 * 2 = 64 * 4 = 256.
I then wrote a minimal reproducible variant of the issue:
Depending on which type is used, on my machine it consistently runs at around 4481 iterations per second on [u64; 4] and 1750 on [u128; 2] (2.5 times slower!). Interestingly enough, if I remove the & code part, then both versions perform more or less the same.
The profiles look even stranger (profile.tar.gz).
The u64 version seems normal:
...whereas on u128 you can quickly notice that SpecArrayEq::spec_eq's time skyrockets and even dwarfs BTreeSet iteration!
It appears that the unsafe call to crate::intrinsics::raw_eq is the culprit:
rust/library/core/src/array/equality.rs
Lines 146 to 156 in c29082f
Meta
I built everything with just cargo test --release. No LTO, no target-cpu=native.
rustc --version --verbose:
lscpu: