Naive Cuda (tagged for archival purposes)

Pre-release
@lukstafi lukstafi released this 21 Jul 08:56
· 1030 commits to master since this release

Cuda FFI and a naive, not particularly functional Cuda backend, in which a "parallel" axis is mapped across blocks and a "minibatch" axis is mapped across the threads within a block.
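
To make the axis mapping concrete, here is a minimal sketch in plain CUDA. It is not the code this release generates: the kernel `scale_kernel`, the helper `scale_on_device`, and the row-major `[n_parallel x n_minibatch]` layout are all assumptions made for illustration. It only shows what "parallel axis across blocks, minibatch axis across threads" looks like in terms of `blockIdx` and `threadIdx`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel (illustration only): `data` is assumed to be a
// row-major [n_parallel x n_minibatch] array.
__global__ void scale_kernel(float *data, int n_minibatch, float factor) {
  int p = blockIdx.x;   // index along the "parallel" axis: one block each
  int b = threadIdx.x;  // index along the "minibatch" axis: one thread each
  data[p * n_minibatch + b] *= factor;
}

// Hypothetical host helper: launches one block per "parallel" index and
// one thread per "minibatch" index, then copies the result back.
// Error checking is omitted to keep the sketch short.
std::vector<float> scale_on_device(const std::vector<float> &host,
                                   int n_parallel, int n_minibatch,
                                   float factor) {
  float *dev = nullptr;
  size_t bytes = host.size() * sizeof(float);
  cudaMalloc(&dev, bytes);
  cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
  // Grid size = parallel axis, block size = minibatch axis.
  scale_kernel<<<n_parallel, n_minibatch>>>(dev, n_minibatch, factor);
  cudaDeviceSynchronize();
  std::vector<float> out(host.size());
  cudaMemcpy(out.data(), dev, bytes, cudaMemcpyDeviceToHost);
  cudaFree(dev);
  return out;
}
```

With this mapping the minibatch size is bounded by the per-block thread limit of the device, which is one reason such a fixed assignment of axes is restrictive.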

This does not really work because it lacks synchronization across blocks. The "parallel axis" / "minibatch axis" approach is also not really usable, for either the Cuda or the Gccjit backend.
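
The cross-block limitation can be illustrated with a small sketch, again in plain CUDA and again hypothetical (`sum_kernel` and `sum_on_device` are names made up here, and this is not what the release does). `__syncthreads()` is a barrier only for the threads of one block, so any result that combines values from different blocks needs another mechanism; the sketch uses a global `atomicAdd`, which is one option among several.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel (illustration only): each block sums its own row of a
// row-major [n_parallel x n_minibatch] array, then adds that partial sum to
// a single global total.
__global__ void sum_kernel(const float *in, float *total, int n_minibatch) {
  extern __shared__ float partial[];  // one slot per thread in this block
  int b = threadIdx.x;
  partial[b] = in[blockIdx.x * n_minibatch + b];
  __syncthreads();  // barrier for the threads of THIS block only
  if (b == 0) {
    float s = 0.0f;
    for (int i = 0; i < n_minibatch; ++i) s += partial[i];
    // Blocks cannot wait for each other with __syncthreads(), so the
    // cross-block combination goes through a global atomic instead.
    atomicAdd(total, s);
  }
}

// Hypothetical host helper; error checking omitted for brevity.
float sum_on_device(const std::vector<float> &host, int n_parallel,
                    int n_minibatch) {
  float *dev_in = nullptr, *dev_total = nullptr;
  size_t bytes = host.size() * sizeof(float);
  cudaMalloc(&dev_in, bytes);
  cudaMalloc(&dev_total, sizeof(float));
  cudaMemcpy(dev_in, host.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemset(dev_total, 0, sizeof(float));
  // Third launch parameter: dynamic shared memory size per block.
  sum_kernel<<<n_parallel, n_minibatch, n_minibatch * sizeof(float)>>>(
      dev_in, dev_total, n_minibatch);
  float result = 0.0f;
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(&result, dev_total, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(dev_in);
  cudaFree(dev_total);
  return result;
}
```

Without the atomic (or a second kernel launch, or a cooperative-groups grid sync), a thread in one block has no guarantee that another block's writes have happened yet.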

With too many total threads, Cuda hangs or takes too long compiling to PTX. Where the Cuda backend does work, the Gccjit backend is much faster.

Other meaningful improvements include low-level code optimization and simplification, as well as refactorings.