Replies: 1 comment
-
After spending some time on this: Slang does support quite a few of these intrinsics, but under quite different names. __clz corresponds to “firstbithigh” (which returns a bit index rather than a count of leading zeros, so expect a small conversion). There’s also “firstbitlow”, and “countbits” is the equivalent of __popc.

One limitation is that SPIR-V defines these intrinsics only for 32-bit integers, nothing more, nothing less. Most IHVs support arbitrary bit lengths, but we probably need NVIDIA to push for an extension to enable them.

CUDA’s “bfe” and “bfi” intrinsics map to GLSL/SPV’s bitfieldExtract and bitfieldInsert intrinsics. Once we have PR #5020 working, these will be supported on all targets, and hardware accelerated for SPIR-V. (DXIL supports bitfield intrinsics, but HLSL intentionally does not expose them through dxc, and there are no plans to. 😞) Beyond these, there doesn’t seem to be a way to implement PTX’s “permute” intrinsic…
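To make the naming difference concrete, here is a minimal host-side C++ sketch of how a leading-zero count relates to a "first bit high" style result. It is a reference model, not Slang's implementation: the helper names are hypothetical, and it assumes the index is counted from bit 0 and that a zero input yields -1 (the convention GLSL's findMSB uses). It also assumes CUDA's documented conventions that __clz(0) is 32 and __ffs(0) is 0.

```cpp
#include <cstdint>

// Hypothetical reference model: index of the highest set bit, counted from
// bit 0. Returns -1 when x == 0 (assumed convention, as in GLSL findMSB).
inline int first_bit_high(std::uint32_t x) {
    int index = -1;
    while (x != 0) {
        x >>= 1;
        ++index;
    }
    return index;
}

// Hypothetical reference model: index of the lowest set bit, counted from
// bit 0. Returns -1 when x == 0.
inline int first_bit_low(std::uint32_t x) {
    if (x == 0) return -1;
    int index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
}

// Leading-zero count in the style of CUDA's __clz.
// For x == 0 this evaluates to 31 - (-1) = 32.
inline int clz_from_first_bit_high(std::uint32_t x) {
    return 31 - first_bit_high(x);
}

// 1-based lowest-set-bit position in the style of CUDA's __ffs.
// For x == 0 this evaluates to -1 + 1 = 0.
inline int ffs_from_first_bit_low(std::uint32_t x) {
    return first_bit_low(x) + 1;
}
```

The -1 convention for zero is what lets both conversions work without a special case; a target that reports zero inputs differently would need an explicit branch.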
-
For many high performance kernels, we use the intrinsics listed here to manipulate the bits of an integer:
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
Unfortunately, many of these are missing from HLSL outright. For example, __clz and __fns are used in nearly every modern GPU BVH construction kernel I’m aware of, since they are important for per-wave compaction and for analyzing common sub-codes in a space-filling curve sequence.
Adding these intrinsics directly to Slang would be very valuable for scaling up to serious compute workloads. At the very least, if an intrinsic isn’t supported in the target IR, it would be very helpful to have some sort of software fallback built into the Slang standard library.
(If this has already been done, then perhaps better documentation on how to translate these instructions into Slang would help?)
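As an illustration of what such a software fallback could look like, here is a minimal C++ sketch of portable 32-bit versions of population count, leading-zero count, and bitfield extract/insert. The function names are hypothetical and this is not code from the Slang standard library. The bitfield helpers follow the (offset, bits) argument order of GLSL's bitfieldExtract and bitfieldInsert, and assume offset < 32 and offset + bits <= 32.

```cpp
#include <cstdint>

// Population count using the classic SWAR (parallel bit-summing) technique.
inline std::uint32_t popc_fallback(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                  // 2-bit partial sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
    return (x * 0x01010101u) >> 24;                    // add the four bytes
}

// Leading-zero count by binary search. Returns 32 for x == 0.
inline std::uint32_t clz_fallback(std::uint32_t x) {
    if (x == 0) return 32;
    std::uint32_t n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

// Mask with the low `bits` bits set; handles bits >= 32 without an
// out-of-range shift.
inline std::uint32_t low_mask(std::uint32_t bits) {
    return (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

// Unsigned bitfield extract: `bits` bits of `value` starting at `offset`.
// Assumes offset < 32 and offset + bits <= 32.
inline std::uint32_t bfe_fallback(std::uint32_t value,
                                  std::uint32_t offset,
                                  std::uint32_t bits) {
    if (bits == 0) return 0;
    return (value >> offset) & low_mask(bits);
}

// Bitfield insert: replace `bits` bits of `base` starting at `offset`
// with the low `bits` bits of `insert`. Same range assumptions as above.
inline std::uint32_t bfi_fallback(std::uint32_t base,
                                  std::uint32_t insert,
                                  std::uint32_t offset,
                                  std::uint32_t bits) {
    if (bits == 0) return base;
    const std::uint32_t mask = low_mask(bits) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```

All four are branch-light and use only shifts, masks, and one multiply, so they should port to a shading language with little change; __fns is left out here because its base/offset semantics need more care than a short sketch allows.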