
Add vector-vector and matrix-matrix Kronecker product #575

Open
wants to merge 4 commits into master
Conversation

albertomercurio

@albertomercurio albertomercurio commented Dec 15, 2024

The kron method between two AbstractGPUVectors or between two AbstractGPUMatrices was missing.

This should fix this issue in CUDA.jl. It also fixes #558.
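For reference, this is the semantics the vector-vector case implements, as a plain-CPU Julia sketch (kron_vec is an illustrative name, not the PR's actual code):

```julia
using LinearAlgebra  # provides the reference kron

# CPU sketch of the vector-vector Kronecker product: element (i, j) of the
# outer product a[i] * b[j] lands at linear index (i-1)*length(b) + j.
function kron_vec(a::AbstractVector, b::AbstractVector)
    c = similar(a, length(a) * length(b))
    for i in eachindex(a), j in eachindex(b)
        c[(i - 1) * length(b) + j] = a[i] * b[j]
    end
    return c
end

kron_vec([1, 2], [10, 20]) == kron([1, 2], [10, 20])  # true
```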

@albertomercurio
Author

I didn't add the synchronize function, since I saw that none of the implementations have one. I'm also wondering why, since the output might be incorrect until we wait for the kernel.

@albertomercurio albertomercurio changed the title Add vector-vector Kronecker product Add vector-vector and matrix-matrix Kronecker product Dec 15, 2024
@ytdHuang

Maybe this can also fix #558 ?

@pxl-th
Member

pxl-th commented Dec 15, 2024

I didn't add the synchronize function, since I saw that none of the implementations have one. I'm also wondering why, since the output might be incorrect until we wait for the kernel.

Because with Julia all kernels execute asynchronously (from the CPU's perspective) and in a stream-ordered fashion (on the GPU).

Synchronization happens either explicitly, by the user calling CUDA.synchronize(), or implicitly during a GPU->CPU transfer such as Array(gpu_array).

So adding explicit synchronization is redundant and would only slow down the GPU.
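The two synchronization paths described above can be sketched like this, assuming CUDA.jl (requires a CUDA device, so this is illustrative only):

```julia
using CUDA

a = CUDA.rand(Float32, 1024)
b = a .* 2.0f0        # kernel launches asynchronously; this line returns immediately
CUDA.synchronize()    # explicit: block until all work on the current stream is done
host = Array(b)       # implicit: the device-to-host copy waits for prior kernels
```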

@albertomercurio
Author

But what if an external user does, for example, c = kron(a, b); d = c * a?

If it is asynchronous, c may not be updated yet and d would be incorrect. Am I wrong?

The user could explicitly call synchronize, but the code is in principle type-agnostic, so that it works on the CPU as well as on the GPU.

@pxl-th
Member

pxl-th commented Dec 15, 2024

But what if an external user does, for example, c = kron(a, b); d = c * a?

If it is asynchronous, c may not be updated yet and d would be incorrect. Am I wrong?

The user could explicitly call synchronize, but the code is in principle type-agnostic, so that it works on the CPU as well as on the GPU.

The GPU executes kernels in the order of their submission, so it will first compute kron and only when that is done c * a.
That is true when the kernels are executed on the same stream (which is the default).
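So the pattern from the question works without any explicit wait (a hedged CUDA.jl sketch; any backend with a default stream behaves the same way):

```julia
using CUDA, LinearAlgebra

a = CUDA.rand(Float32, 4)
b = CUDA.rand(Float32, 3)
c = kron(a, b)   # launched asynchronously on the default stream
d = c .* 2.0f0   # queued after kron on the same stream, so it sees the finished c
Array(d)         # the copy back to the host synchronizes implicitly
```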

@albertomercurio
Author

Ok, very clear, thanks.

Member

@maleadt maleadt left a comment


Any reason you went with a different implementation from what we already have in CUDA.jl: https://github.com/JuliaGPU/CUDA.jl/blob/19a08efa06bcb0b5aa88b3a25bb0b336b6538a9a/lib/cublas/linalg.jl#L400-L484?

src/host/linalg.jl (outdated review thread, resolved)
@albertomercurio
Author

Here I'm indexing on C instead of A or B. This should be simpler and faster; I ran some benchmarks.

The CUDA implementation has the stride handling, which I never understood whether I should also use for KernelAbstractions.
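The "index on the output" strategy can be sketched like this with KernelAbstractions (illustrative names, not the PR's actual kernel): each work-item owns one element of C and recovers the corresponding indices into A and B with divrem.

```julia
using KernelAbstractions

# One work-item per element of C = kron(A, B). With zero-based offsets,
# i - 1 == ia * size(B, 1) + ib, so divrem splits the output index back
# into the A index and the B index.
@kernel function kron_kernel!(C, @Const(A), @Const(B))
    i, j = @index(Global, NTuple)
    mb, nb = size(B)
    ia, ib = divrem(i - 1, mb)
    ja, jb = divrem(j - 1, nb)
    @inbounds C[i, j] = A[ia + 1, ja + 1] * B[ib + 1, jb + 1]
end

# Hypothetical launch over the whole output:
#   kron_kernel!(get_backend(C))(C, A, B; ndrange = size(C))
```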

@maleadt
Member

maleadt commented Dec 16, 2024

The CUDA implementation has the stride handling, which I never understood whether I should also use for KernelAbstractions.

Didn't you implement that kernel in CUDA.jl? JuliaGPU/CUDA.jl#2177

@albertomercurio
Author

Yes. I mean that this new implementation should be faster and simpler.

Assuming that we implement the same for the CUDA case as well, I still don't understand whether I need the stride handling for KernelAbstractions too. From the examples I have seen so far, it seems to be handled internally by KernelAbstractions.
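The difference being discussed, sketched side by side (hedged; the raw CUDA.jl part is shown as comments since it needs a device):

```julia
using KernelAbstractions

# Raw CUDA.jl style: a grid-stride loop is written by hand so a fixed-size
# grid can cover an arbitrarily large array:
#   idx    = (blockIdx().x - 1) * blockDim().x + threadIdx().x
#   stride = gridDim().x * blockDim().x
#   for k in idx:stride:length(C) ... end
#
# KernelAbstractions style: @index(Global) is mapped over the ndrange by the
# framework, so no manual stride loop is needed in the kernel body.
@kernel function scale!(C, @Const(A), s)
    k = @index(Global, Linear)
    @inbounds C[k] = s * A[k]
end
```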

Successfully merging this pull request may close these issues.

Scalar indexing when performing kron on two CuVectors