Creating an array can be made 2x faster #139875
Labels
A-LLVM
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-optimization
Category: An issue highlighting optimization opportunities or PRs implementing such
S-waiting-on-LLVM
Status: the compiler-dragon is eepy, can someone get it some tea?
Consider this simple function:
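A minimal sketch of such a function, assuming a hypothetical name `fill_naive` and a hypothetical length of 1000 (neither is taken from the issue):

```rust
// Hypothetical array length, chosen only for illustration.
const N: usize = 1000;

// Builds an array where every element is 2u64.
// The bytes of 2u64 are not all identical (02 00 00 00 00 00 00 00),
// so a byte-wise memset cannot express this fill.
pub fn fill_naive() -> [u64; N] {
    [2u64; N]
}
```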
Because `2u64` doesn't have the same bytes throughout, the compiler can't call `memset` and instead creates a vectorized loop.
However, from my testing, using the `rep stosq` instruction is over twice as fast for large arrays (more than a few hundred elements). Here is a faster version of the same function:
Benchmarking both with Criterion:
Compare both of them on Godbolt.
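As a rough illustration of the `rep stosq` approach described above, the following is a sketch using Rust inline assembly on x86_64. The function name `fill_rep_stosq`, the length `N`, and the fallback for other targets are assumptions made for this example, not the issue author's code, and any speedup will depend on the CPU.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;
#[cfg(target_arch = "x86_64")]
use std::mem::MaybeUninit;

// Hypothetical array length, chosen only for illustration.
const N: usize = 1000;

// Fills the array with 2u64 using `rep stosq`, which stores RAX to [RDI]
// RCX times, advancing RDI by 8 bytes after each store.
#[cfg(target_arch = "x86_64")]
pub fn fill_rep_stosq() -> [u64; N] {
    let mut arr = MaybeUninit::<[u64; N]>::uninit();
    unsafe {
        asm!(
            "rep stosq",
            inout("rcx") N => _,                // element count, clobbered
            inout("rdi") arr.as_mut_ptr() => _, // destination, clobbered
            in("rax") 2u64,                     // value stored per element
            options(nostack, preserves_flags),
        );
        // Sound because the instruction above wrote all N elements.
        arr.assume_init()
    }
}

// Assumed fallback so the sketch still compiles on non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn fill_rep_stosq() -> [u64; N] {
    [2u64; N]
}
```

The `nomem` option is deliberately left out, since the instruction writes to memory through `rdi`.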