CuSparseMatrix - CuMatrix multiplication not working: giving Scalar Indexing #2072

Open
lgravina1997 opened this issue Sep 2, 2023 · 16 comments
Labels: bug, needs information

Comments

@lgravina1997 commented Sep 2, 2023

Multiplying a CuSparseMatrixCSC with a CuArray gives scalar indexing.

To reproduce:

    using CUDA, SparseArrays
    CUDA.allowscalar(false)
    A = cu(sparse([1,2,3], [1,2,3], [1,2,3]))
    B = cu(rand(3,1))
    C = A*B

or

    using CUDA, SparseArrays, LinearAlgebra
    CUDA.allowscalar(false)
    A = cu(sparse([1,2,3], [1,2,3], [1,2,3]))
    B = cu(rand(3,1))
    C = similar(B)
    mul!(C, A, B)

Both give the same problem of course.

Version info

Details on Julia:

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
  Threads: 21 on 20 virtual cores
Environment:
  JULIA_NUM_THREADS = auto
CUDA runtime 12.1, artifact installation
CUDA driver 12.0
NVIDIA driver 525.125.6

CUDA libraries: 
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+525.125.6

Julia packages: 
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3070 (sm_86, 6.158 GiB / 8.000 GiB available)
lgravina1997 added the bug label Sep 2, 2023
@amontoison (Member)

@dkarrasch
Is it possible that you removed the associated dispatch with #1904?
We should call this routine.

@dkarrasch (Contributor)

I'm not sure. There's

function LinearAlgebra.generic_matmatmul!(C::CuMatrix{T}, tA, tB, A::CuSparseMatrix{T}, B::DenseCuMatrix{T}, _add::MulAddMul) where {T <: Union{Float16, ComplexF16, BlasFloat}}
    tA = tA in ('S', 's', 'H', 'h') ? 'N' : tA
    tB = tB in ('S', 's', 'H', 'h') ? 'N' : tB
    mm_wrapper(tA, tB, _add.alpha, A, B, _add.beta, C)
end

so we would need the stacktrace to see how dispatch goes and where it deviates from the expected path. It could be that I missed some VERSION-dependent branching, though.
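
One way to gather that information (a hedged diagnostic sketch, not something posted in the thread; it assumes A, B and C from the original reproducer are in scope) is to ask which method dispatch selects and to print the full backtrace of the failure:

    using InteractiveUtils   # for @which

    @which mul!(C, A, B)     # shows which mul! method is selected at the top level

    try
        mul!(C, A, B)
    catch err
        showerror(stdout, err, catch_backtrace())   # full stack trace of the failure
    end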

@dkarrasch (Contributor)

I played with it a little locally, but it seems like it should go through LinearAlgebra.generic_matmatmul! also on v1.9, for which we do have the above method, so we really need the stacktrace (for both calls) to see where it's leaving the right path. I can't test it locally, unfortunately.

@amontoison (Member) commented Sep 7, 2023

@dkarrasch @lgravina1997
I just noticed that sparse([1,2,3], [1,2,3], [1,2,3]) is a sparse matrix with integer coefficients.
It's normal that the products fall back to scalar indexing; Int is not a "BlasFloat" type.
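
In other words, a reproducer with floating-point values should route through the specialized CUSPARSE method rather than the generic scalar-indexing fallback (a minimal sketch, assuming that is the intended fix):

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

    CUDA.allowscalar(false)
    A = cu(sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0]))  # float values; cu() stores them as Float32
    B = cu(rand(3, 1))
    C = similar(B)
    mul!(C, A, B)   # expected to hit the CUSPARSE path instead of scalar indexing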

@dkarrasch (Contributor)

True. So, to confirm: for float types everything works as expected, @lgravina1997?

maleadt added the needs information label Sep 12, 2023
@stmorgenstern

Running into this same error currently, while trying to speed up some expensive jacobian calculations. Here's my MWE and full stack trace:

using CUDA,CUDA.CUSPARSE,SparseArrays,LinearAlgebra
N = 20
CUDA.allowscalar(false)
A = cu(sparse(I(N^3)))
B = cu(sparse(I(N^3)))
C = cu(spzeros(N^3,N^3))
mul!(C,A,B)

Stacktrace:

ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore C:\Users\Sam\.julia\packages\GPUArraysCore\uOYfN\src\GPUArraysCore.jl:103
 [3] getindex(xs::CuArray{Int32, 1, CUDA.Mem.DeviceBuffer}, I::Int64)
   @ GPUArrays C:\Users\Sam\.julia\packages\GPUArrays\EZkix\src\host\indexing.jl:9
 [4] getindex(A::CuSparseMatrixCSC{Bool, Int32}, i0::Int64, i1::Int64)
   @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:310
 [5] _generic_matmatmul!(C::CuSparseMatrixCSC{Float32, Int32}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:876
 [6] generic_matmatmul!(C::CuSparseMatrixCSC{Float32, Int32}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:844
 [7] mul!
   @ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:303 [inlined]
 [8] mul!(C::CuSparseMatrixCSC{Float32, Int32}, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32})
   @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:276
 [9] top-level scope
   @ REPL[8]:1

julia version info:

Julia Version 1.9.2
Commit e4ee485e90 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 36 on 32 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 32

CUDA version info

CUDA runtime 12.2, artifact installation
CUDA driver 12.1
NVIDIA driver 531.29.0

CUDA libraries:
- CUBLAS: 12.2.5
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.2
- CUSPARSE: 12.1.2
- CUPTI: 20.0.0
- NVML: 12.0.0+531.29

Julia packages:
- CUDA: 5.0.0
- CUDA_Driver_jll: 0.6.0+3
- CUDA_Runtime_jll: 0.9.2+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 4070 Ti (sm_89, 9.035 GiB / 11.994 GiB available)

Also, while looking into this, I noticed that the mul! tests in the CUSPARSE test suite may not explicitly cover the CuSparseMatrixCSC * CuSparseMatrixCSC case, which would let this error slip through the cracks.
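
A minimal sketch of such a test (hypothetical; sizes, density and names are assumed and not taken from the actual CUDA.jl test suite):

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra, Test

    @testset "mul! CuSparseMatrixCSC * CuSparseMatrixCSC" begin
        A_cpu = sprand(Float32, 100, 100, 0.05)
        B_cpu = sprand(Float32, 100, 100, 0.05)
        A = CuSparseMatrixCSC(A_cpu)
        B = CuSparseMatrixCSC(B_cpu)
        C = CuSparseMatrixCSC(spzeros(Float32, 100, 100))
        mul!(C, A, B)
        @test SparseMatrixCSC(C) ≈ A_cpu * B_cpu   # round-trip to the CPU and compare
    end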

@dkarrasch (Contributor)

Same error, same cause: #2072 (comment)

What happens if you turn your Is into 1.0Is?

@stmorgenstern commented Oct 6, 2023

Same error, same cause: #2072 (comment)

What happens if you turn your Is into 1.0Is?

using CUDA,CUDA.CUSPARSE,SparseArrays,LinearAlgebra
N = 20
CUDA.allowscalar(false)
A = cu(sparse(1.0I(N^3)))
B = cu(sparse(1.0I(N^3)))
C = cu(spzeros(N^3,N^3))
mul!(C,A,B)
julia> mul!(C,A,B)
8000×8000 CuSparseMatrixCSC{Float32, Int32} with 8000 stored entries:
Error showing value of type CuSparseMatrixCSC{Float32, Int32}:
ERROR: ArgumentError: 1 == colptr[8000] > colptr[8001] == 0
Stacktrace:
  [1] (::SparseArrays.var"#throwmonotonic#3")(ckp::Int32, ck::Int32, k::Int64)
    @ SparseArrays C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:141
  [2] sparse_check
    @ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:148 [inlined]
  [3] SparseMatrixCSC(m::Int64, n::Int64, colptr::Vector{Int32}, rowval::Vector{Int32}, nzval::Vector{Float32})
    @ SparseArrays C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:38
  [4] SparseMatrixCSC(x::CuSparseMatrixCSC{Float32, Int32})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:403
  [5] show(io::IOContext{Base.TTY}, mime::MIME{Symbol("text/plain")}, S::CuSparseMatrixCSC{Float32, Int32})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:540
  [6] (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:276
  [7] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
  [8] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:262
  [9] display(d::REPL.REPLDisplay, x::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:281
 [10] display(x::Any)
    @ Base.Multimedia .\multimedia.jl:340
 [11] #invokelatest#2
    @ .\essentials.jl:816 [inlined]
 [12] invokelatest
    @ .\essentials.jl:813 [inlined]
 [13] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:305
 [14] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:287
 [15] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
 [16] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:285
 [17] (::REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:899
 [18] (::VSCodeServer.var"#101#104"{REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt}})(mi::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool)
    @ VSCodeServer c:\Users\Sam\.vscode\extensions\julialang.language-julia-1.54.2\scripts\packages\VSCodeServer\src\repl.jl:122
 [19] #invokelatest#2
    @ .\essentials.jl:816 [inlined]
 [20] invokelatest
    @ .\essentials.jl:813 [inlined]
 [21] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\LineEdit.jl:2647
 [22] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:1300
 [23] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL .\task.jl:514

Actually, I believe this is just a display error. The calculation seems to be fine if I'm reading it correctly.

@dkarrasch (Contributor)

But this is just an error in the show method, mul! doesn't throw.

@stmorgenstern

But this is just an error in the show method, mul! doesn't throw.

Yup, you're right, I read it too hastily. It's a bit strange. I ran a couple of other tests closer to what I'm using in my actual Jacobian calculation, and adding the 1.0 in front seems to fix things, at least at first glance?

@amontoison (Member) commented Oct 6, 2023

A, B and C must have the same type. I don't understand why the result is CuSparseMatrixCSC{Float32, Int32}. It should be a double precision sparse matrix.
Can you check that all your matrices are CuSparseMatrixCSC{Float64,Int32}?

@dkarrasch (Contributor)

I'd guess so. The specialized mul! methods are restricted to BlasFloat eltypes, and otherwise fall back to something else: GPUArrays.jl, LinearAlgebra, SparseArrays, whatever catches it.
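
A quick eltype check makes that dispatch boundary visible (a sketch; the actual constraint in the method quoted above is Union{Float16, ComplexF16, BlasFloat}):

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

    eltype(cu(sparse(I(3)))) <: LinearAlgebra.BlasFloat      # false: Bool values -> generic fallback
    eltype(cu(sparse(1.0I(3)))) <: LinearAlgebra.BlasFloat   # true: Float32 values -> CUSPARSE path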

@stmorgenstern

Here's the type check from the example with 1.0I:

julia> typeof(C),typeof(A),typeof(B)
(CuSparseMatrixCSC{Float32, Int32}, CuSparseMatrixCSC{Float32, Int32}, CuSparseMatrixCSC{Float32, Int32})
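
For what it's worth, the Float32 comes from cu, which by default converts Float64 values to Float32 when moving data to the GPU. To keep double precision, one can construct the device matrices explicitly instead of going through cu (a sketch, not from the thread):

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

    N = 20
    A = CuSparseMatrixCSC(sparse(1.0I(N^3)))          # eltype Float64 is preserved
    B = CuSparseMatrixCSC(sparse(1.0I(N^3)))
    C = CuSparseMatrixCSC(spzeros(Float64, N^3, N^3))
    typeof(A), typeof(B), typeof(C)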

@amontoison (Member) commented Oct 6, 2023

Do you have the same error with the CuSparseMatrixCSR format?
What is the version of your CUDA toolkit?

@stmorgenstern

Multiplication seems to be fine with the CuSparseMatrixCSR format in my related case. My CUDA toolkit is version 12.1 (full version info is above). However, I went back and ran the original code from this issue and found it produced some very weird behavior:

using CUDA,CUDA.CUSPARSE,SparseArrays,LinearAlgebra
CUDA.allowscalar(false)
A  = cu(sparse([1.,2.,3], [1.,2.,3.], [1.,2.,3.]))
B  = cu(rand(3,1))
C = similar(B)

typeof(A), typeof(B), typeof(C)  # (CuSparseMatrixCSC{Float32, Int32}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
mul!(C, A, B)
ERROR: Out of GPU memory trying to allocate 127.995 TiB
Effective GPU memory usage: 10.99% (1.318 GiB/11.994 GiB)
Memory pool usage: 64 bytes (32.000 MiB reserved)

Stacktrace:
  [1] macro expansion
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:443 [inlined]
  [2] macro expansion
    @ .\timing.jl:393 [inlined]
  [3] #_alloc#996
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:431 [inlined]
  [4] _alloc
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:427 [inlined]
  [5] #alloc#995
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:417 [inlined]
  [6] alloc
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:411 [inlined]
  [7] CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64})
    @ CUDA C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:74
  [8] CuArray
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:136 [inlined]
  [9] CuArray
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:149 [inlined]
 [10] with_workspace(f::CUDA.CUSPARSE.var"#1340#1342"{Float32, Char, Bool, Bool, CUDA.CUSPARSE.cusparseSpMMAlg_t, CUDA.CUSPARSE.CuDenseMatrixDescriptor, CUDA.CUSPARSE.CuDenseMatrixDescriptor}, eltyp::Type{UInt8}, size::CUDA.CUSPARSE.var"#bufferSize#1341"{Float32, Char, Bool, Bool, CUDA.CUSPARSE.cusparseSpMMAlg_t, CUDA.CUSPARSE.CuDenseMatrixDescriptor, CUDA.CUSPARSE.CuDenseMatrixDescriptor}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:67
 [11] with_workspace
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:58 [inlined]
 [12] #with_workspace#1
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:55 [inlined]
 [13] with_workspace (repeats 2 times)
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:55 [inlined]
 [14] mm!(transa::Char, transb::Char, alpha::Bool, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, index::Char, algo::CUDA.CUSPARSE.cusparseSpMMAlg_t)
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\generic.jl:237
 [15] mm!
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\generic.jl:197 [inlined]
 [16] mm_wrapper(transa::Char, transb::Char, alpha::Bool, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\interfaces.jl:46
 [17] generic_matmatmul!(C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\interfaces.jl:76
 [18] mul!
    @ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:303 [inlined]
 [19] mul!(C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:276
 [20] top-level scope
    @ REPL[8]:1
