This library implements a feature-complete version of the C++ data-parallel math library TS with optional extensions.
DPM currently supports the following architectures for SIMD vectorization:
- x86
  - SSE
  - SSE2
  - SSE3
  - SSSE3
  - SSE4.1
  - SSE4.2
  - AVX
  - AVX2
  - FMA
  - AVX512 (see notes)
On architectures without SIMD support, vectorization is emulated via scalar operations.
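As a rough illustration of what scalar emulation means, the sketch below applies an element-wise operation one lane at a time. The `scalar_vec` type is hypothetical and is not DPM's actual fallback type; it only shows the general idea under the assumption of a fixed lane count.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD vector on a target without SIMD support.
// This is NOT DPM's actual type; it only illustrates scalar emulation.
template<typename T, std::size_t N>
struct scalar_vec
{
    std::array<T, N> data;

    // Element-wise addition performed one lane at a time.
    friend scalar_vec operator+(const scalar_vec &a, const scalar_vec &b)
    {
        scalar_vec result{};
        for (std::size_t i = 0; i < N; ++i)
            result.data[i] = a.data[i] + b.data[i];
        return result;
    }
};
```

A compiler may still auto-vectorize such loops where the target allows it, but the observable result is the same as applying the scalar operation to every lane.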
#define macro | CMake option | Default value | Description |
---|---|---|---|
DPM_INLINE_EXTENSIONS | -DDPM_INLINE_EXTENSIONS | ON | Toggles inlining of the library extension namespace (see notes) |
DPM_HANDLE_ERRORS | -DDPM_HANDLE_ERRORS | ON | Toggles detection of math errors & reporting via math_errhandling (see notes) |
DPM_PROPAGATE_NAN | -DDPM_PROPAGATE_NAN | ON | Toggles guaranteed propagation of NaN (see notes) |
DPM_USE_SVML | -DDPM_USE_SVML | OFF | Enables use of math functions provided by SVML (see notes) |
N/A | -DDPM_BUILD_SHARED | ON | Toggles build of shared library target |
N/A | -DDPM_BUILD_STATIC | ON | Toggles build of static library target |
N/A | -DDPM_USE_IPO | ON | Toggles support for inter-procedural optimization |
N/A | -DDPM_TESTS | OFF | Enables unit test target |
Note that the default value applies only to CMake options.
In order to build the library using CMake, run the following commands:
mkdir build
cd build
cmake ..
cmake --build .
Build artifacts will be found in build/bin and build/lib. Minimum required CMake version is 3.20.
In order to use the library as a CMake link dependency, you must link to one of the following targets:
- dpm-interface - interface headers of the library.
- dpm - static or shared library target, depending on the value of BUILD_SHARED_LIBS.
DPM provides the following utilities and extensions to the standard API:
- ABI tags
  - struct aligned_vector
  - using packed_buffer = implementation-defined
  - using common = implementation-defined
  - x86
    - using sse = implementation-defined
    - using avx = implementation-defined
- Storage traits & accessors
  - struct native_data_type
  - struct native_data_size
  - std::span to_native_data(simd<T, Abi> &)
  - std::span to_native_data(const simd<T, Abi> &)
  - std::span to_native_data(simd_mask<T, Abi> &)
  - std::span to_native_data(const simd_mask<T, Abi> &)
- Shuffle functions
  - simd<T, Abi> shuffle<Is...>(const simd<T, Abi> &)
  - simd_mask<T, Abi> shuffle<Is...>(const simd_mask<T, Abi> &)
  - simd_mask<T, Abi> shuffle<Is...>(const V &)
- Constant-N bit shifts
  - simd<T, Abi> lsl<N>(const simd<T, Abi> &)
  - simd<T, Abi> lsr<N>(const simd<T, Abi> &)
  - simd<T, Abi> asl<N>(const simd<T, Abi> &)
  - simd<T, Abi> asr<N>(const simd<T, Abi> &)
- Reductions
  - T hadd(const simd<T, Abi> &)
  - T hmul(const simd<T, Abi> &)
  - T hand(const simd<T, Abi> &)
  - T hxor(const simd<T, Abi> &)
  - T hor(const simd<T, Abi> &)
- Basic math functions
  - simd<T, Abi> remquo(const simd<T, Abi> &, const simd<T, Abi> &, simd<T, Abi> &)
  - simd<T, Abi> nan<T, Abi>(const char *)
- Power math functions
  - simd<T, Abi> rcp(const simd<T, Abi> &)
  - simd<T, Abi> rsqrt(const simd<T, Abi> &)
- Nearest integer functions
  - rebind_simd_t<I, simd<T, Abi>> iround<I>(const simd<T, Abi> &)
  - rebind_simd_t<I, simd<T, Abi>> irint<I>(const simd<T, Abi> &)
  - rebind_simd_t<I, simd<T, Abi>> itrunc<I>(const simd<T, Abi> &)
  - rebind_simd_t<long, simd<T, Abi>> ltrunc(const simd<T, Abi> &)
  - rebind_simd_t<long long, simd<T, Abi>> lltrunc(const simd<T, Abi> &)
- Floating-point manipulation functions
  - simd<T, Abi> frexp(const simd<T, Abi> &x, simd<int, Abi> &)
  - simd<T, Abi> modf(const simd<T, Abi> &x, simd<T, Abi> &)
- Optimization hints
  - #define DPM_UNREACHABLE()
  - #define DPM_NEVER_INLINE
  - #define DPM_FORCEINLINE
  - #define DPM_ASSUME(cnd)
- Other utilities
  - class cpuid
Additionally, versions of some operators and math functions accepting a scalar as one of the arguments are provided.
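To make the difference between the logical (lsr) and arithmetic (asr) constant shifts and the meaning of a horizontal reduction such as hadd more concrete, here is a minimal scalar model. It operates on `std::array` rather than on DPM's simd types, and everything in the `sketch` namespace is an assumption made for illustration, not the library's implementation.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

namespace sketch
{
    // Plain array used as a stand-in for a SIMD vector (not a DPM type).
    template<typename T, std::size_t N>
    using vec = std::array<T, N>;

    // Logical shift right: vacated high bits are filled with zeros.
    template<unsigned Shift, typename T, std::size_t N>
    vec<T, N> lsr(const vec<T, N> &x)
    {
        static_assert(std::is_integral<T>::value, "integral lanes required");
        static_assert(Shift < sizeof(T) * 8, "shift count out of range");
        using U = typename std::make_unsigned<T>::type;
        vec<T, N> result{};
        for (std::size_t i = 0; i < N; ++i)
            result[i] = static_cast<T>(static_cast<U>(x[i]) >> Shift);
        return result;
    }

    // Arithmetic shift right: vacated high bits copy the sign bit.
    // Negative lanes are shifted in complemented form so that only
    // non-negative values are ever passed to operator>>.
    template<unsigned Shift, typename T, std::size_t N>
    vec<T, N> asr(const vec<T, N> &x)
    {
        static_assert(std::is_integral<T>::value, "integral lanes required");
        static_assert(Shift < sizeof(T) * 8, "shift count out of range");
        vec<T, N> result{};
        for (std::size_t i = 0; i < N; ++i)
            result[i] = static_cast<T>(x[i] < 0 ? ~(~x[i] >> Shift) : (x[i] >> Shift));
        return result;
    }

    // Horizontal add: reduces all lanes to a single scalar sum.
    template<typename T, std::size_t N>
    T hadd(const vec<T, N> &x)
    {
        T sum{};
        for (const T &lane : x)
            sum += lane;
        return sum;
    }
}
```

For a 32-bit lane holding -8, `sketch::asr<1>` yields -4 while `sketch::lsr<1>` yields a large positive value, because the sign bit is replaced by zero.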
All extensions are available from the dpm::ext and dpm::simd_abi::ext namespaces. If the DPM_INLINE_EXTENSIONS option is enabled, the ext namespaces are declared as inline.
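The effect of an inline namespace can be shown with a small self-contained sketch. The names `mylib`, `MYLIB_INLINE_EXTENSIONS`, `MYLIB_EXT_NAMESPACE` and `twice` are hypothetical and unrelated to DPM's headers; the sketch only assumes that the toggle switches between `namespace ext` and `inline namespace ext`.

```cpp
// Hypothetical library used only to illustrate inline namespaces; the names
// mylib, MYLIB_INLINE_EXTENSIONS and twice() are not part of DPM.
#define MYLIB_INLINE_EXTENSIONS 1

#if MYLIB_INLINE_EXTENSIONS
#define MYLIB_EXT_NAMESPACE inline namespace ext
#else
#define MYLIB_EXT_NAMESPACE namespace ext
#endif

namespace mylib
{
    MYLIB_EXT_NAMESPACE
    {
        // An "extension" function living in mylib::ext.
        constexpr int twice(int x) { return 2 * x; }
    }
}

// With the inline namespace, both spellings name the same function.
static_assert(mylib::twice(21) == 42, "reachable from the enclosing namespace");
static_assert(mylib::ext::twice(21) == 42, "reachable via the explicit namespace");
```

If the toggle were set to 0 in this sketch, only the `mylib::ext::twice` spelling would compile, which is the trade-off such an option controls: convenience versus keeping extensions visibly separate from the standard API.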
The standard specifies correct behavior for math functions such as sin, cos, etc. for invalid (e.g. outside of domain) inputs. When the DPM_HANDLE_ERRORS option is enabled, DPM will perform explicit runtime checks for such erroneous inputs as specified by the standard. If DPM_HANDLE_ERRORS is disabled, results for erroneous inputs are undefined.
When the DPM_PROPAGATE_NAN option is enabled, NaN arguments are guaranteed to result in NaN results (unless otherwise specified), regardless of whether error handling is enabled or not. Note that signalling NaNs may lose their value.
DPM_HANDLE_ERRORS does not guarantee that correct FP exceptions will be raised.
Examples of DPM_HANDLE_ERRORS and DPM_PROPAGATE_NAN configuration:
Expression | DPM_HANDLE_ERRORS | DPM_PROPAGATE_NAN | None |
---|---|---|---|
sin(0) | 0 | 0 | 0 |
sin(Pi/2) | 1 | 1 | 1 |
sin(inf) | NaN | undefined | undefined |
sin(NaN) | NaN | NaN | undefined |
asin(0) | 0 | 0 | 0 |
asin(1) | Pi/2 | Pi/2 | Pi/2 |
asin(-2) | NaN | undefined | undefined |
asin(inf) | NaN | undefined | undefined |
asin(NaN) | NaN | NaN | undefined |
While DPM does utilize AVX512 instructions for 128- and 256-bit operations, there is no support for 512-bit wide vector data types. The main reasons are the increased implementation complexity, caused by both the fragmentation of the AVX512 standard and the complexity of most 512-bit wide instructions, and the relative inefficiency of 512-bit wide registers (for general-purpose use cases) on certain CPUs. See the following articles for details:
- https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/
- https://news.ycombinator.com/item?id=21031905
- https://www.phoronix.com/news/Linus-Torvalds-On-AVX-512
When DPM_USE_SVML is enabled, DPM will use mathematical functions provided by SVML for trigonometric, hyperbolic, exponential, nearest-integer and error functions instead of the built-in implementation. Note that if DPM_USE_SVML is enabled, the NaN propagation and error handling options are ignored for the affected functions; any error handling is left to SVML.