CUBLAS dot far slower than BLAS dot #17
Comments
It might be expected. The time to transfer data to the GPU over PCIe can be pretty substantial. If you can make your array size a power of 2 OR do multiple ops with the same data on the GPU, you should see better perf.
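The transfer-cost point can be made concrete with a rough cost model. This is only a sketch: the bandwidth and overhead constants below are assumptions for illustration, not measurements of any particular card, and the arithmetic is written in Python since only the numbers matter here.

```python
# Rough cost model for dot(x, y) on n Float64 elements.
# All constants are assumed ballpark figures, not measurements.

PCIE_BW = 6e9        # assumed effective host->device PCIe bandwidth, bytes/s
CPU_MEM_BW = 20e9    # assumed CPU memory bandwidth, bytes/s
GPU_MEM_BW = 80e9    # assumed GPU memory bandwidth (GT 650M class), bytes/s
GPU_OVERHEAD = 50e-6 # assumed fixed launch/library-call overhead, seconds

def cpu_dot_time(n, elsize=8):
    # BLAS dot is memory-bound: it streams both arrays through memory once.
    return 2 * n * elsize / CPU_MEM_BW

def gpu_dot_time(n, elsize=8, transfer=True):
    # CUBLAS dot: fixed call overhead, plus streaming both arrays on the
    # device, plus (optionally) shipping both arrays over PCIe first.
    t = GPU_OVERHEAD + 2 * n * elsize / GPU_MEM_BW
    if transfer:
        t += 2 * n * elsize / PCIE_BW
    return t

if __name__ == "__main__":
    for p in (10, 15, 20, 25):
        n = 2 ** p
        print(f"n=2^{p}: cpu={cpu_dot_time(n):.2e}s  "
              f"gpu+transfer={gpu_dot_time(n):.2e}s  "
              f"gpu resident={gpu_dot_time(n, transfer=False):.2e}s")
```

Under these assumed numbers, a transfer-inclusive GPU dot never beats the CPU (PCIe bandwidth is below CPU memory bandwidth), while a dot on data already resident on the device wins once the array is large enough to amortize the call overhead — which is the "do multiple ops with the same data" advice in a nutshell.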
I probably misunderstand how this all works, but isn't the only transfer occurring when I say
Oh derp, you're right. I think it still might be the fact that the array size is not a power of two and is a little small.
Well, ok, it starts paying off at arrays of size 2^20:
At 2^25, Julia crashes with an out-of-memory error (which is suspicious). I thought it would pay off at smaller data sizes; perhaps it's my card (GeForce GT 650M). Anyway, thanks for your help!
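A quick footprint check makes the out-of-memory error at 2^25 plausible. Assuming Float64 data (8 bytes per element, the default for Julia arrays) and that the smaller GT 650M configurations ship with 512 MB of VRAM:

```python
# Memory footprint of two Float64 vectors of length 2^25.
n = 2 ** 25
bytes_per_array = n * 8               # Float64 is 8 bytes per element
total_mb = 2 * bytes_per_array / 2**20  # two vectors, in MiB
print(total_mb)  # → 512.0
```

Two such vectors alone already total 512 MiB, before counting CUDA context overhead or anything else on the card, so exhausting a 512 MB GT 650M at this size would not be surprising.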
It could be the card, especially if you have a nice CPU.
I wrote simple functions that perform dot products on `Array`s and `CudaArray`s. I'm finding that the CUDA version is about 4x slower. Is this expected? Running this script gives:
(Bonus question: what's up with the EBADF???)
This is on OS X 10.9, Julia 0.4.1 installed from Homebrew, built against OpenBLAS, with CUDA 7.5.