Add support for CUDA-style integer intrinsics #4778

natevm · 2024-08-05T20:22:31Z

natevm
Aug 5, 2024
Collaborator

For many high performance kernels, we use the intrinsics listed here to manipulate the bits of an integer:

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

Unfortunately, many of these are missing from HLSL outright. For example, __clz and __fns are used in nearly every modern GPU BVH construction kernel I’m aware of, since they are important for per-wave compaction and for analyzing common sub-codes in a space-filling curve sequence.
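To make the space-filling-curve use case concrete, here is a minimal portable C++ sketch (not CUDA or Slang) of how a count-leading-zeros operation gives the length of the shared prefix of two 32-bit codes. The names `leading_zeros32` and `common_prefix_length` are made up for illustration; on CUDA the first one would simply be `__clz`.

```cpp
#include <cstdint>

// Hypothetical portable stand-in for CUDA's __clz:
// number of leading zero bits in x, and 32 when x == 0.
inline int leading_zeros32(std::uint32_t x) {
    int n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// Number of high-order bits that two 32-bit codes (e.g. Morton codes) share.
// XOR clears every bit where the codes agree, so the leading zeros of the
// XOR are exactly the common prefix. Identical codes give 32.
inline int common_prefix_length(std::uint32_t a, std::uint32_t b) {
    return leading_zeros32(a ^ b);
}
```

For example, 0xA0000000 and 0xB0000000 first differ at bit 28, so they share a 3-bit prefix.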

Adding these intrinsics directly to Slang would be very valuable for scaling up to serious compute workloads. At the very least, where the target IR does not support an intrinsic, it would be very helpful to have some sort of software fallback built into the Slang standard library.

(If this has already been done, then perhaps better documentation on how to translate these intrinsics into Slang would help?)
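As a rough idea of what a software fallback could look like, here is a portable C++ sketch of a "find the n-th set bit" helper. This is a simplification of my own: CUDA's __fns also takes a base position and a signed offset, which this sketch does not model, and the name `nth_set_bit` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical, simplified fallback in the spirit of __fns:
// returns the position (0 = least significant bit) of the n-th set bit of
// mask, counting set bits from the LSB with n starting at 0.
// Returns -1 when mask has n or fewer set bits, or when n is negative.
inline int nth_set_bit(std::uint32_t mask, int n) {
    if (n < 0) return -1;
    for (int pos = 0; pos < 32; ++pos) {
        if ((mask >> pos) & 1u) {
            if (n == 0) return pos;
            --n;
        }
    }
    return -1;
}
```

A loop like this is far slower than a hardware instruction, so it only makes sense as a last-resort path on targets that have nothing better.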

natevm · 2024-09-12T03:16:22Z

natevm
Sep 12, 2024
Collaborator Author

After spending some time on this, I found that Slang does actually support quite a few of these intrinsics, but under quite different names.

__clz corresponds to “firstbithigh”. There’s also “firstbitlow”, and “countbits” corresponds to __popc.
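One caveat when translating: as far as I can tell, the HLSL-style first-set-bit intrinsics (spelled `firstbithigh` and `firstbitlow` in HLSL) return a bit index counted from the least significant bit, and -1 for a zero input, whereas __clz counts zeros from the top and __ffs is 1-based. The portable C++ sketch below emulates that assumed convention and shows the conversions. All function names here are hypothetical stand-ins, not Slang or CUDA APIs.

```cpp
#include <cstdint>

// Emulates the assumed HLSL-style convention for unsigned inputs:
// index of the highest set bit counted from bit 0, or -1 if x == 0.
inline int first_bit_high(std::uint32_t x) {
    for (int i = 31; i >= 0; --i) {
        if ((x >> i) & 1u) return i;
    }
    return -1;
}

// Index of the lowest set bit counted from bit 0, or -1 if x == 0.
inline int first_bit_low(std::uint32_t x) {
    for (int i = 0; i < 32; ++i) {
        if ((x >> i) & 1u) return i;
    }
    return -1;
}

// Population count (the role played by countbits / __popc).
inline int count_bits(std::uint32_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// CUDA-style results derived from the emulated helpers above.
inline int clz_from_first_bit_high(std::uint32_t x) {
    return 31 - first_bit_high(x);  // 32 when x == 0, since first_bit_high is -1
}

inline int ffs_from_first_bit_low(std::uint32_t x) {
    return first_bit_low(x) + 1;    // 0 when x == 0, 1-based position otherwise
}
```

If the convention on a given target turns out to differ, only the two conversion functions at the bottom would need to change.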

One limitation is that SPIR-V assumes these intrinsics operate on 32-bit integers only. Most IHVs support arbitrary bit lengths, but we probably need NVIDIA to push for an extension to enable them.
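Until wider variants are available, a 64-bit result can be composed from two 32-bit operations. Here is a portable C++ sketch of that idea for count-leading-zeros; `clz32_reference` and `clz64_from_halves` are hypothetical names, and the 32-bit helper stands in for whatever the target actually provides.

```cpp
#include <cstdint>

// Stand-in for a 32-bit-only count-leading-zeros primitive (32 for x == 0).
inline int clz32_reference(std::uint32_t x) {
    int n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// 64-bit count-leading-zeros built only from the 32-bit primitive.
// If the high word is non-zero, the answer lies entirely within it;
// otherwise all 32 high bits are zero and the count continues in the low word.
inline int clz64_from_halves(std::uint64_t x) {
    const std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
    const std::uint32_t lo = static_cast<std::uint32_t>(x);
    return hi != 0 ? clz32_reference(hi) : 32 + clz32_reference(lo);
}
```

The same split-and-combine pattern works for popcount (sum of both halves) and for find-first-set (check the low word first).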

CUDA’s “bfe” and “bfi” intrinsics map to GLSL/SPV’s bitfieldExtract and bitfieldInsert intrinsics. Once we have PR #5020 working, this will be supported on all targets, and hardware accelerated for SPIR-V. (DXIL supports bitfield intrinsics, but HLSL intentionally does not expose them through dxc, nor do they intend to. 😞)
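For targets without native bitfield instructions, the fallback is just shifts and masks. The portable C++ sketch below shows the unsigned case, following the GLSL argument order (value, offset, bits). The function names are hypothetical, and the sketch assumes offset + bits <= 32, which GLSL also requires for a defined result.

```cpp
#include <cstdint>

// Mask with the low `bits` bits set; handles bits == 32 without an
// out-of-range shift. Callers ensure bits >= 1.
inline std::uint32_t low_mask(std::uint32_t bits) {
    return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

// Unsigned bitfield extract: returns `bits` bits of value starting at
// bit `offset`, placed in the low bits of the result.
// Assumes offset + bits <= 32.
inline std::uint32_t bitfield_extract_u32(std::uint32_t value,
                                          std::uint32_t offset,
                                          std::uint32_t bits) {
    if (bits == 0) return 0;
    return (value >> offset) & low_mask(bits);
}

// Bitfield insert: replaces `bits` bits of base starting at bit `offset`
// with the low `bits` bits of insert. Assumes offset + bits <= 32.
inline std::uint32_t bitfield_insert_u32(std::uint32_t base,
                                         std::uint32_t insert,
                                         std::uint32_t offset,
                                         std::uint32_t bits) {
    if (bits == 0) return base;
    const std::uint32_t mask = low_mask(bits) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```

The signed extract variant would additionally sign-extend from the top bit of the extracted field; that is left out here to keep the sketch short.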

Beyond these, there doesn’t seem to be a way to implement PTX’s “permute” intrinsic…
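For reference, the behavior can at least be emulated in software. The portable C++ sketch below follows my understanding of CUDA's __byte_perm(x, y, s), where each of the four low nibbles of s uses its low 3 bits to pick one of the 8 bytes of the pair {y, x}. It does not cover the other PTX modes or the sign-replication bit, and `byte_permute` is a hypothetical name.

```cpp
#include <cstdint>

// Software emulation of a byte permute in the style of __byte_perm.
// Bytes 0..3 come from x (byte 0 = least significant), bytes 4..7 from y.
// Result byte i is the source byte selected by bits [4*i+2 : 4*i] of s.
inline std::uint32_t byte_permute(std::uint32_t x, std::uint32_t y, std::uint32_t s) {
    const std::uint64_t pool = (static_cast<std::uint64_t>(y) << 32) | x;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t sel = (s >> (4 * i)) & 0x7u;  // 3-bit byte index
        const std::uint32_t byte =
            static_cast<std::uint32_t>((pool >> (8 * sel)) & 0xFFu);
        result |= byte << (8 * i);
    }
    return result;
}
```

With x = 0x33221100 and y = 0x77665544, the selector 0x3210 returns x unchanged, 0x7654 returns y, and 0x0123 reverses the bytes of x. This costs several shifts and masks per byte, so it is a correctness fallback rather than a substitute for the hardware instruction.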
