
GPU evaluation of Recurrence() broken on Metal #473

Closed
mchitre opened this issue Dec 28, 2023 · 15 comments · Fixed by #474
Labels
bug Something isn't working

Comments

@mchitre

mchitre commented Dec 28, 2023

This works:

using Lux, Metal, Random  # setup (assumed)
rng = Random.default_rng()

m = Dense(10 => 10)
ps, st = Lux.setup(rng, m)
data = rand(10,10,10)
Lux.apply(m, data, ps, st)  # works
Lux.apply(m, gpu_device()(data), gpu_device()(ps), gpu_device()(st))  # works

but this doesn't:

m = Recurrence(RNNCell(10 => 10))
ps, st = Lux.setup(rng, m)
data = rand(10,10,10)
Lux.apply(m, data, ps, st)  # works
Lux.apply(m, gpu_device()(data), gpu_device()(ps), gpu_device()(st))  # fails

Stack trace:

ERROR: ArgumentError: cannot take the CPU address of a MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}
Stacktrace:
  [1] unsafe_convert(::Type{Ptr{Float32}}, x::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate})
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:148
  [2] gemm!(transA::Char, transB::Char, alpha::Float32, A::MtlMatrix{…}, B::SubArray{…}, beta::Float32, C::MtlMatrix{…})
    @ LinearAlgebra.BLAS ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/blas.jl:1524
  [3] gemm_wrapper!(C::MtlMatrix{…}, tA::Char, tB::Char, A::MtlMatrix{…}, B::SubArray{…}, _add::LinearAlgebra.MulAddMul{…})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:605
  [4] generic_matmatmul!(C::MtlMatrix{…}, tA::Char, tB::Char, A::MtlMatrix{…}, B::SubArray{…}, _add::LinearAlgebra.MulAddMul{…})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:352
  [5] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:263 [inlined]
  [6] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237 [inlined]
  [7] *(A::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, B::SubArray{Float32, 2, MtlArray{…}, Tuple{…}, false})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:113
  [8] (::RNNCell{…})(::Tuple{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:259
  [9] (::RNNCell{true})(::Tuple{AbstractMatrix, Tuple{AbstractMatrix}}, ps::Any, st::NamedTuple)
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:245 [inlined]
 [10] apply
    @ ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115 [inlined]
 [11] (::Recurrence{…})(x::Vector{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:78
 [12] apply
    @ LuxCore ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115 [inlined]
 [13] Recurrence
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:74 [inlined]
 [14] apply(model::Recurrence{…}, x::MtlArray{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ LuxCore ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115
 [15] top-level scope
    @ REPL[93]:1
@avik-pal
Member

Can you also post the stacktrace via show(err)?

@mchitre
Author

mchitre commented Dec 28, 2023

julia> show(err)
1-element ExceptionStack:
ArgumentError: cannot take the CPU address of a MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}
Stacktrace:
  [1] unsafe_convert(::Type{Ptr{Float32}}, x::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate})
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:148
  [2] gemm!(transA::Char, transB::Char, alpha::Float32, A::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, B::SubArray{Float32, 2, MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64, Base.Slice{Base.OneTo{Int64}}}, false}, beta::Float32, C::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate})
    @ LinearAlgebra.BLAS ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/blas.jl:1524
  [3] gemm_wrapper!(C::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, tA::Char, tB::Char, A::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, B::SubArray{Float32, 2, MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64, Base.Slice{Base.OneTo{Int64}}}, false}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:605
  [4] generic_matmatmul!(C::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, tA::Char, tB::Char, A::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, B::SubArray{Float32, 2, MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64, Base.Slice{Base.OneTo{Int64}}}, false}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:352
  [5] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:263 [inlined]
  [6] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237 [inlined]
  [7] *(A::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, B::SubArray{Float32, 2, MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64, Base.Slice{Base.OneTo{Int64}}}, false})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:113
  [8] (::RNNCell{true, false, typeof(tanh), typeof(zeros32), typeof(glorot_uniform), typeof(ones32)})(::Tuple{SubArray{Float32, 2, MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Matrix{Float32}}}, ps::@NamedTuple{weight_ih::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, weight_hh::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, bias::MtlVector{Float32, Metal.MTL.MTLResourceStorageModePrivate}}, st::@NamedTuple{rng::Xoshiro})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:259
  [9] (::RNNCell{true})(::Tuple{AbstractMatrix, Tuple{AbstractMatrix}}, ps::Any, st::NamedTuple)
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:245 [inlined]
 [10] apply
    @ ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115 [inlined]
 [11] (::Recurrence{false, RNNCell{true, false, typeof(tanh), typeof(zeros32), typeof(glorot_uniform), typeof(ones32)}, BatchLastIndex})(x::Vector{SubArray{Float32, 2, MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64, Base.Slice{Base.OneTo{Int64}}}, false}}, ps::@NamedTuple{weight_ih::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, weight_hh::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, bias::MtlVector{Float32, Metal.MTL.MTLResourceStorageModePrivate}}, st::@NamedTuple{rng::Xoshiro})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:78
 [12] apply
    @ LuxCore ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115 [inlined]
 [13] Recurrence
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:74 [inlined]
 [14] apply(model::Recurrence{false, RNNCell{true, false, typeof(tanh), typeof(zeros32), typeof(glorot_uniform), typeof(ones32)}, BatchLastIndex}, x::MtlArray{Float32, 3, Metal.MTL.MTLResourceStorageModePrivate}, ps::@NamedTuple{weight_ih::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, weight_hh::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, bias::MtlVector{Float32, Metal.MTL.MTLResourceStorageModePrivate}}, st::@NamedTuple{rng::Xoshiro})
    @ LuxCore ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115
 [15] top-level scope
    @ REPL[86]:1
 [16] eval
    @ ./boot.jl:385 [inlined]
 [17] eval
    @ ./Base.jl:88 [inlined]
 [18] repleval(m::Module, code::Expr, ::String)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.65.2/scripts/packages/VSCodeServer/src/repl.jl:229
 [19] (::VSCodeServer.var"#110#112"{Module, Expr, REPL.LineEditREPL, REPL.LineEdit.Prompt})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.65.2/scripts/packages/VSCodeServer/src/repl.jl:192
 [20] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging ./logging.jl:515
 [21] with_logger
    @ ./logging.jl:627 [inlined]
 [22] (::VSCodeServer.var"#109#111"{Module, Expr, REPL.LineEditREPL, REPL.LineEdit.Prompt})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.65.2/scripts/packages/VSCodeServer/src/repl.jl:193
 [23] #invokelatest#2
    @ Base ./essentials.jl:887 [inlined]
 [24] invokelatest(::Any)
    @ Base ./essentials.jl:884
 [25] (::VSCodeServer.var"#62#63")()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.65.2/scripts/packages/VSCodeServer/src/eval.jl:34

@avik-pal
Member

Ah, I see: it is Metal not handling wrappers properly. CUDA is quite good at this, since it doesn't return a wrapper type when the storage is contiguous. For AMDGPU I had to patch this at one point; I will see if it can be handled similarly for Metal.

@mchitre
Author

mchitre commented Dec 28, 2023

Yes, I confirm it works with CUDA. Changing the issue title to reflect this.

@mchitre mchitre changed the title GPU evaluation of Recurrence() broken? GPU evaluation of Recurrence() broken on Metal Dec 28, 2023
@avik-pal
Member

As a temporary workaround, you could pass in a vector of matrices instead of the 3D array.

To solve this problem at its core, either:

  1. Metal needs to handle wrappers correctly and not dispatch to Julia's generic matmul (an upstream problem; not much I can do here), or
  2. inside the core Lux layers, wherever a view is created, we force a copy for Metal (perhaps enabled via a preference).
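Option 2 could be sketched roughly as follows. This is a hypothetical helper, not actual Lux code; `maybe_copy` is an illustrative name, and the idea is simply to materialize `SubArray` wrappers of Metal arrays before they reach a matmul:

using Metal  # provides MtlArray

# Hypothetical: materialize views of Metal arrays into plain MtlArrays so
# that * dispatches to Metal's GPU kernels instead of falling through to
# the CPU BLAS path (which cannot take the address of GPU memory).
maybe_copy(x::SubArray{<:Any, <:Any, <:MtlArray}) = copy(x)
maybe_copy(x::AbstractArray) = x  # everything else passes through unchanged

A layer would then call `maybe_copy` on each time-slice view before the matrix multiply, at the cost of one extra copy per step on Metal.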

@avik-pal avik-pal added the bug Something isn't working label Dec 28, 2023
@mchitre
Author

mchitre commented Dec 28, 2023

You mean like this?

data2 = [data[:,i,:] for i in 1:10]
Lux.apply(m, gpu_device()(data2), gpu_device()(ps), gpu_device()(st))

On the CPU the two are equivalent, but on Metal it still fails:

ERROR: ArgumentError: cannot take the CPU address of a MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}
Stacktrace:
  [1] unsafe_convert(::Type{Ptr{Float32}}, x::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate})
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:148
  [2] gemm!(transA::Char, transB::Char, alpha::Float32, A::MtlMatrix{…}, B::Matrix{…}, beta::Float32, C::Matrix{…})
    @ LinearAlgebra.BLAS ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/blas.jl:1524
  [3] gemm_wrapper!(C::Matrix{…}, tA::Char, tB::Char, A::MtlMatrix{…}, B::Matrix{…}, _add::LinearAlgebra.MulAddMul{…})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:605
  [4] generic_matmatmul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:352 [inlined]
  [5] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:263 [inlined]
  [6] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237 [inlined]
  [7] *(A::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, B::Matrix{Float32})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:113
  [8] (::RNNCell{…})(::Tuple{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:259
  [9] (::RNNCell{true})(::Tuple{AbstractMatrix, Tuple{AbstractMatrix}}, ps::Any, st::NamedTuple)
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:245 [inlined]
 [10] apply
    @ ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115 [inlined]
 [11] (::Recurrence{…})(x::Vector{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:78
 [12] apply(model::Recurrence{…}, x::Vector{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ LuxCore ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115
 [13] top-level scope
    @ REPL[11]:1

@mchitre
Author

mchitre commented Dec 28, 2023

For the upstream issue, is it worth opening an issue on Metal.jl? I don't understand the details well enough to do it myself, so I might need you to open one if you think it worthwhile.

@avik-pal
Member

try data2 .|> gpu_device()

@mchitre
Author

mchitre commented Dec 31, 2023

data2 = [data[:,i,:] for i in 1:10]
data2 = data2 .|> gpu_device()
Lux.apply(m, data2, gpu_device()(ps), gpu_device()(st))

still gives:

ERROR: ArgumentError: cannot take the CPU address of a MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}
Stacktrace:
  [1] unsafe_convert(::Type{Ptr{Float32}}, x::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate})
    @ Metal ~/.julia/packages/Metal/lnkVP/src/array.jl:148
  [2] gemm!(transA::Char, transB::Char, alpha::Float32, A::MtlMatrix{…}, B::Matrix{…}, beta::Float32, C::Matrix{…})
    @ LinearAlgebra.BLAS ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/blas.jl:1524
  [3] gemm_wrapper!(C::Matrix{…}, tA::Char, tB::Char, A::MtlMatrix{…}, B::Matrix{…}, _add::LinearAlgebra.MulAddMul{…})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:605
  [4] generic_matmatmul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:352 [inlined]
  [5] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:263 [inlined]
  [6] mul!
    @ ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237 [inlined]
  [7] *(A::MtlMatrix{Float32, Metal.MTL.MTLResourceStorageModePrivate}, B::Matrix{Float32})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:113
  [8] (::RNNCell{…})(::Tuple{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:259
  [9] (::RNNCell{true})(::Tuple{AbstractMatrix, Tuple{AbstractMatrix}}, ps::Any, st::NamedTuple)
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:245 [inlined]
 [10] apply
    @ ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115 [inlined]
 [11] (::Recurrence{…})(x::Vector{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ Lux ~/.julia/packages/Lux/VFyfk/src/layers/recurrent.jl:78
 [12] apply(model::Recurrence{…}, x::Vector{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
    @ LuxCore ~/.julia/packages/LuxCore/aumFq/src/LuxCore.jl:115
 [13] top-level scope
    @ REPL[26]:1
 [14] top-level scope
    @ ~/.julia/packages/Metal/lnkVP/src/initialization.jl:57

@prbzrg
Contributor

prbzrg commented Jan 1, 2024

What if we use Metal APIs for RNG and data?

m = Recurrence(RNNCell(10 => 10))
ps, st = Lux.setup(Metal.gpuarrays_rng(), m)
data = Metal.rand(10,10,10)
Lux.apply(m, data, ps, st)

@mchitre
Author

mchitre commented Jan 2, 2024

julia> ps, st = Lux.setup(Metal.gpuarrays_rng(), m)
ERROR: MethodError: no method matching Random.Sampler(::Type{GPUArrays.RNG}, ::Random.SamplerType{UInt32}, ::Val{1})

Closest candidates are:
  Random.Sampler(::Type{<:AbstractRNG}, ::Random.Sampler, ::Union{Val{1}, Val{Inf}})
   @ Random ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Random/src/Random.jl:147
  Random.Sampler(::Type{<:AbstractRNG}, ::Any, ::Union{Val{1}, Val{Inf}})
   @ Random ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Random/src/Random.jl:183
  Random.Sampler(::Type{<:AbstractRNG}, ::BitSet, ::Union{Val{1}, Val{Inf}})
   @ Random ~/.julia/juliaup/julia-1.10.0+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Random/src/generation.jl:450
  ...

@avik-pal
Member

avik-pal commented Jan 2, 2024

We need a dispatch like

@inline function Lux._init_hidden_state(rng::AbstractRNG, rnn, x::AMDGPU.AnyROCArray)
    return ROCArray(rnn.init_state(rng, rnn.out_dims, size(x, 2)))
end

for Metal.
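The Metal analogue would presumably look like the following. This is a sketch mirroring the AMDGPU method above; the exact internal signature of `Lux._init_hidden_state` depends on the Lux version in use:

using Lux, Metal, Random  # assumed imports

# Hypothetical Metal dispatch: create the hidden state on the CPU via the
# cell's init_state, then move it to a MtlArray so it lives on the GPU
# alongside the input x.
@inline function Lux._init_hidden_state(rng::Random.AbstractRNG, rnn, x::MtlArray)
    return MtlArray(rnn.init_state(rng, rnn.out_dims, size(x, 2)))
end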

@avik-pal
Member

avik-pal commented Jan 2, 2024

@mchitre can you try out #474? I don't have Apple hardware handy at the moment to test the patch. This should fix both problems.

Regarding the snippet @prbzrg posted: if that doesn't work for you, it might be worthwhile opening an issue on Metal.jl.

@mchitre
Author

mchitre commented Jan 3, 2024

@avik-pal yes, I confirm that #474 fixes the problem. The original code in the description runs with it.

@avik-pal
Member

avik-pal commented Jan 6, 2024

The patch will be available from the next release.
