From ea69233ceea2a721c920e773b870e560f12dcb0f Mon Sep 17 00:00:00 2001
From: Alexander Hill
Date: Wed, 15 Nov 2023 21:09:13 -0500
Subject: [PATCH] 2023-11-15 09:09:13 PM

---
 topics/metal-splats.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/topics/metal-splats.md b/topics/metal-splats.md
index f1152f4..19b3885 100644
--- a/topics/metal-splats.md
+++ b/topics/metal-splats.md
@@ -11,6 +11,32 @@
 - [WIP PR implementing ML metal compute kernels in HF Candle](https://github.com/huggingface/candle/pull/1230/files)
 - [Good slides on bitonic sorting](https://wiki.rice.edu/confluence/download/attachments/4435861/comp322-s12-lec28-slides-JMC.pdf?version=1&modificationDate=1333163955158) (linked from [here](https://people.cs.rutgers.edu/~venugopa/parallel_summer2012/bitonic_overview.html))
 
+## 2023-11-15
+
+* Think I have a handle on the bitonic sorting algorithm! Will start playing around with a CPU version before going for the GPU version.
+* Implemented a working CPU version after a couple of bugs. Did you know C++ vectors don't do bounds checks by default? I didn't and nor did my professional C++ programmer friend.
+
+Bitonic sort is ~3x slower than the other algorithms at this data scale:
+
+```
+Generating 1048576 random integers
+Generated 1048576 random integers
+std::sort() execution time: 29 ms
+sort_radix() execution time: 34 ms
+sort_bitonic() execution time: 105 ms
+```
+
+Extending to about 10x slower at higher data scale:
+
+```
+Generating 16777216 random integers
+Generated 16777216 random integers
+std::sort() execution time: 296 ms
+sort_radix() execution time: 523 ms
+sort_bitonic() execution time: 2297 ms
+```
+
+
 ## 2023-11-12
 
 ### Algorithms / Reference
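
For reference, a CPU bitonic sort like the one the 2023-11-15 entry mentions might look roughly like the sketch below. This is not the code from the commit: the name `sort_bitonic_sketch` is made up, and it assumes `int` elements and a power-of-two length (both benchmark sizes, 1048576 and 16777216, are powers of two). It uses `std::vector::at()`, which throws `std::out_of_range` on a bad index, instead of `operator[]`, which does no bounds check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical name; the notes only show a timing line for sort_bitonic().
// Sorts ascending in place. Assumes data.size() is zero or a power of two.
void sort_bitonic_sketch(std::vector<int>& data) {
    const std::size_t n = data.size();
    if (n == 0) return;
    if ((n & (n - 1)) != 0) {
        throw std::invalid_argument("bitonic sort sketch needs a power-of-two length");
    }
    // k: size of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k *= 2) {
        // j: distance between the two elements of each compare-exchange.
        for (std::size_t j = k / 2; j > 0; j /= 2) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // visit each pair only once
                // Blocks of size k alternate between ascending and descending.
                const bool ascending = (i & k) == 0;
                // at() is bounds-checked; operator[] would not be.
                int& a = data.at(i);
                int& b = data.at(partner);
                if ((a > b) == ascending) std::swap(a, b);
            }
        }
    }
}
```

The inner loop over `i` has no dependencies between pairs within one `(k, j)` step, which is the part that would map onto GPU threads. On the CPU the algorithm still does O(n log² n) compare-exchanges, which fits the gap to `std::sort()` widening at the larger size.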