Releases: kimwalisch/libpopcnt
libpopcnt-3.1
This release improves the AVX512 popcount algorithm. libpopcnt-3.1 is fully backwards compatible with libpopcnt-3.0 and libpopcnt-2.*.
- Improve AVX512 popcount algorithm for trailing 64 bytes.
- AVX512 algorithm does not require AVX512-BITALG extension anymore.
libpopcnt-3.0
libpopcnt-3.0 is a major new release with many improvements, but it is still backwards compatible with libpopcnt-2.*!
The two main new features of libpopcnt-3.0 are the new ARM SVE popcount algorithm, which is up to 3x faster than the ARM NEON popcount algorithm, and the new AVX512 VPOPCNT algorithm, which is up to 35% faster than the old AVX512 Harley-Seal popcount algorithm. Unlike the old AVX512 algorithm, the new AVX512 VPOPCNT algorithm is also fast for short arrays of ≥ 48 bytes.
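The Harley-Seal algorithm mentioned above sums input words with carry-save adders (CSAs), so it only has to popcount one accumulator per 8 input words. As a rough illustration, here is a minimal scalar C sketch on plain `uint64_t` words. It is not libpopcnt's code (which applies the same idea to AVX512 vectors), and the names `csa`, `popcnt64_swar` and `popcnt_harley_seal` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable SWAR popcount of one 64-bit word (hypothetical helper). */
static uint64_t popcnt64_swar(uint64_t x)
{
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return (x * 0x0101010101010101ULL) >> 56;
}

/* Carry-save adder: for every bit position, a + b + c == 2*h + l. */
static void csa(uint64_t* h, uint64_t* l, uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t u = a ^ b;
  *h = (a & b) | (u & c);
  *l = u ^ c;
}

/* Scalar Harley-Seal sketch: popcount of n 64-bit words. */
uint64_t popcnt_harley_seal(const uint64_t* data, size_t n)
{
  uint64_t total = 0, ones = 0, twos = 0, fours = 0, eights = 0;
  uint64_t twosA, twosB, foursA, foursB;
  size_t i = 0;

  /* Main loop: 8 words per iteration, a single popcount on "eights". */
  for (; i + 8 <= n; i += 8) {
    csa(&twosA, &ones, ones, data[i + 0], data[i + 1]);
    csa(&twosB, &ones, ones, data[i + 2], data[i + 3]);
    csa(&foursA, &twos, twos, twosA, twosB);
    csa(&twosA, &ones, ones, data[i + 4], data[i + 5]);
    csa(&twosB, &ones, ones, data[i + 6], data[i + 7]);
    csa(&foursB, &twos, twos, twosA, twosB);
    csa(&eights, &fours, fours, foursA, foursB);
    total += popcnt64_swar(eights);
  }

  /* Combine the accumulators with their weights 8, 4, 2, 1. */
  total = 8 * total + 4 * popcnt64_swar(fours) +
          2 * popcnt64_swar(twos) + popcnt64_swar(ones);

  /* Count the remaining (< 8) words one at a time. */
  for (; i < n; i++)
    total += popcnt64_swar(data[i]);

  return total;
}
```

Each set bit in `ones`, `twos`, `fours` and `eights` stands for 1, 2, 4 and 8 set input bits respectively, which is why the final sum is weighted.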
- Add ARM SVE algorithm.
- Replace the AVX512BW algorithm with the faster AVX512 VPOPCNTDQ algorithm.
- Add MSVC support for ARM NEON.
- Improve preprocessor checks using the `__has_include()` macro.
- Port tests from AppVeyor to GitHub Actions.
- Get rid of unaligned `uint64_t` memory accesses; this fixes test failures when using GCC's compiler sanitizers.
- Prefix all libpopcnt macros with `LIBPOPCNT_` to avoid naming collisions.
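The unaligned-access fix above relates to a common C pitfall: casting a byte pointer at an arbitrary offset to `const uint64_t*` and dereferencing it is undefined behavior, which sanitizers such as `-fsanitize=alignment` report. A minimal sketch of the usual portable alternative, assuming a plain byte-buffer interface, loads each word with `memcpy`, which compilers typically turn into a single load. This is not necessarily how libpopcnt does it, and `popcnt_bytes` and `popcnt64_swar` are hypothetical names, not libpopcnt's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable SWAR popcount of one 64-bit word (hypothetical helper). */
static uint64_t popcnt64_swar(uint64_t x)
{
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return (x * 0x0101010101010101ULL) >> 56;
}

/* Popcount of a byte buffer that may start at any address. */
uint64_t popcnt_bytes(const void* data, size_t size)
{
  const unsigned char* p = (const unsigned char*) data;
  uint64_t cnt = 0;
  size_t i = 0;

  for (; i + 8 <= size; i += 8) {
    uint64_t word;
    /* memcpy instead of *(const uint64_t*)(p + i): well-defined for
       any alignment and usually compiled to one unaligned load. */
    memcpy(&word, p + i, sizeof(word));
    cnt += popcnt64_swar(word);
  }

  /* Remaining 0 to 7 bytes. */
  for (; i < size; i++)
    cnt += popcnt64_swar(p[i]);

  return cnt;
}
```

For example, `popcnt_bytes(buf + 1, 17)` on a buffer filled with `0xFF` counts 136 bits without ever dereferencing a misaligned `uint64_t*`.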
libpopcnt-2.6
libpopcnt-2.5
- On x86/x64, runtime CPUID checks are now removed if the user compiles their code with e.g. -march=native (or -mavx512bw).
- On CPU architectures that support unaligned memory accesses, stop aligning memory, as doing so causes branch mispredictions that significantly deteriorate performance for small arrays.
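The CPUID change above follows a common pattern: when the compiler already targets an instruction set (GCC and Clang then predefine macros such as `__AVX2__` or `__AVX512BW__`), the feature test can be resolved at compile time, and only otherwise is the CPU queried at runtime. A minimal sketch for GCC/Clang, assuming `__builtin_cpu_supports` as the runtime check instead of a raw CPUID sequence; `cpu_has_avx2` and `impl_name` are hypothetical names, not libpopcnt's code.

```c
/* Hypothetical helper: does the CPU running this code support AVX2? */
static int cpu_has_avx2(void)
{
#if defined(__AVX2__)
  /* Built with e.g. -mavx2 or -march=native on an AVX2 machine:
     the answer is known at compile time, no runtime check needed. */
  return 1;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
  /* Otherwise ask the CPU at runtime (GCC/Clang builtin). */
  return __builtin_cpu_supports("avx2") != 0;
#else
  /* Non-x86 target or unknown compiler: assume no AVX2. */
  return 0;
#endif
}

/* Hypothetical dispatcher: pick an implementation by name. */
const char* impl_name(void)
{
  return cpu_has_avx2() ? "avx2" : "portable";
}
```

With `-march=native` on an AVX2 machine the first branch is taken and the check folds to a constant, so the compiler can drop the non-AVX2 path entirely.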
libpopcnt-2.4
This release enables AVX2 & AVX512 by default (with a cpuid runtime check) for MSVC 2017 or later.
libpopcnt-2.3
See the ChangeLog for what's new.
libpopcnt-2.2
See the ChangeLog for what's new.
libpopcnt-2.1
See the ChangeLog for what's new.
libpopcnt-2.0
See the ChangeLog for what's new.
libpopcnt-1.9
See the ChangeLog for what's new.