ConvTranspose on GPU fails with certain activation functions #1350

Closed
mossr opened this issue Oct 8, 2020 · 14 comments · Fixed by #1472

@mossr commented Oct 8, 2020

Running ConvTranspose with certain activation functions other than the identity segfaults (here's an example with tanh):

using Flux, CUDA
x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
m(x)

FATAL ERROR: Symbol "__nv_tanhf" not found
signal (22): SIGABRT
...

Reading through JuliaGPU/CUDAnative.jl#632 tells me that the activation function is not being run on the GPU. I tried a few activation functions that I know are part of CUDA (tanh, exp, tanhshrink, softplus) and those all failed, but other activation functions like σ and relu worked 🤷‍♂️ (σ is odd to me because it is defined in terms of exp, yet exp and softplus fail with "__nv_expf" not found).

If I run without the tanh activation (so the layer defaults to identity), this works. I also tried Conv, and it works with these activation functions, so this might be an issue with conv_transpose_dims or ∇conv_data, but I thought I'd get advice before digging any deeper.

This may also be related to JuliaGPU/CUDA.jl#228, if I followed that correctly.

Lastly, none of the ConvTranspose cases in the test/conv.jl tests pass an explicit activation function, so this may have gone under the radar.
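Something along these lines would exercise it (a rough sketch assuming the usual Flux/CUDA test setup, not the current contents of test/conv.jl):

using Flux, CUDA, Test

# Hypothetical test sketch: run ConvTranspose with explicit activations on the GPU
# and compare against the CPU result.
@testset "ConvTranspose GPU activations (sketch)" begin
    for act in (tanh, σ, relu, softplus)
        m = ConvTranspose((2, 2), 1 => 1, act)
        x = rand(Float32, 4, 4, 1, 1)
        y_cpu = m(x)
        y_gpu = (m |> gpu)(x |> gpu)
        @test collect(y_gpu) ≈ y_cpu atol=1e-5
    end
end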

@arvindmohan

I ran into exactly the same issue - glad to see it's not an isolated problem. Have you managed a workaround thus far?

@mossr commented Nov 13, 2020

@arvindmohan I haven't found a workaround yet. But it's good to see this issue is gaining some traction.

@arvindmohan

Yes, I'm a new user and it's surprising no one else has mentioned this (considering how popular autoencoders are). I may be willing to take a stab at fixing it if I get some guidance...

@CarloLucibello

This is a puzzling error; I can't wrap my head around it. I did the following experiments:

using Flux
using NNlib: ∇conv_data  # ∇conv_data lives in NNlib; bring it into scope explicitly

# no bias, no activation
function forw0(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    ∇conv_data(x, c.weight, cdims)
end

# bias only
function forw1(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    ∇conv_data(x, c.weight, cdims) .+ b
end

# activation only
function forw2(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    c.σ.(∇conv_data(x, c.weight, cdims))
end

# bias + activation
function forw3(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    c.σ.(∇conv_data(x, c.weight, cdims) .+ b)
end

x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
b = rand(1,1,1,1) |> gpu

forw0(m, x)  # no bias no act OK
forw1(m, x)  # bias only OK
forw2(m, x)  # activation only OK 
tanh.(forw0(m, x)  .+  b)  # still OK
forw3(m, x)  # bias + activation ERROR
FATAL ERROR: Symbol "__nv_tanhf" not found
signal (6): Aborted
in expression starting at REPL[16]:1
gsignal at /usr/bin/../lib/libc.so.6 (unknown line)
abort at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f1b2ffd4083)
unknown function (ip: 0x7f1b300e00c8)
unknown function (ip: 0x7f1b300e0db2)
...

Maybe @maleadt can provide some feedback.
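In the meantime, the forw2 and tanh.(forw0(m, x) .+ b) results above suggest a user-level workaround (just a sketch, not a fix): construct the layer with the default identity activation and broadcast the nonlinearity over the output yourself.

using Flux

m_id = ConvTranspose((1, 1), 1 => 1) |> gpu   # identity activation inside the layer
x = rand(1, 1, 1, 1) |> gpu
y = tanh.(m_id(x))                            # apply the activation outside the layer, as in forw0/forw2 above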

@ToucheSir commented Nov 14, 2020

You beat me to it! Trying to create a MWE for this without Flux was incredibly frustrating, but I think the following should work:

using NNlib, CUDA
# uncommenting the next line is required to repro, i.e. none of the issues noted below occur if Zygote isn't imported.
# import Zygote

# inserting a function barrier to pass cdims prevents the issue, and thus @inline introduces it again
@inline function conv_transpose(x::AbstractArray, W, b, σ, cdims)
  σ.(∇conv_data(x, W, cdims) .+ b)
end
function conv_transpose(x::AbstractArray, W::AbstractArray, b::AbstractArray, s::Int64, p::Int64, d::Int64)
  cdims = DenseConvDims(
    # if _any_ parameter is wholly non-constant (e.g. `stride=(s,s)`, or `w_size=size(W)`), the issue occurs
    (1, 1, 1, 1),
    # note how even something this complex is fine (== can be constant folded?)
    (size(W)[1:end-1]..., 1),
    stride=(s, 1),
    padding=(p, p, p, 0),
    dilation=(d, 1),
  )
  conv_transpose(x, W, b, tanh, cdims)
end

weight = cu(rand(Float32, 1, 1, 1, 1))
bias = cu(zeros(Float32, 1, 1, 1, 1))
input = cu(rand(Float32, 1, 1, 1, 1));

conv_transpose(input, weight, bias, 1, 0, 1)

There appears to be some crazy spooky action at a distance happening with respect to importing Zygote. Not sure why though, since AFAICT Zygote doesn't override broadcasting on the forward pass. Wild guess: perhaps something in @adjoint is intercepting/rewriting calls? cc @DhairyaLGandhi
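A quick way to probe that guess (a diagnostic sketch using only public APIs; it assumes nothing about Zygote's internals) is to check whether loading Zygote changes which method a plain forward broadcast dispatches to:

using CUDA, InteractiveUtils

xs = cu(rand(Float32, 4))
before = @which Base.Broadcast.materialize(Base.Broadcast.broadcasted(tanh, xs))
import Zygote
after = @which Base.Broadcast.materialize(Base.Broadcast.broadcasted(tanh, xs))
before == after   # if true, loading Zygote leaves forward broadcast dispatch alone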

@code_warntype shows a potentially undesirable Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}} instead of CuArray{Float32,4} when certain DenseConvDims type params are non-constant:

Variables
  #self#::Core.Compiler.Const(conv_transpose, false)
  x::CuArray{Float32,4}
  W::CuArray{Float32,4}
  b::CuArray{Float32,4}
  s::Int64
  p::Int64
  d::Int64
  cdims::DenseConvDims{2,_A,_B,_C,_D,_E,_F,_G} where _G where _F where _E where _D where _C where _B where _A

Body::Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}}
1 ─ %1  = Main.size(W)::NTuple{4,Int64}
│   %2  = (:stride, :padding, :dilation)::Core.Compiler.Const((:stride, :padding, :dilation), false)
│   %3  = Core.apply_type(Core.NamedTuple, %2)::Core.Compiler.Const(NamedTuple{(:stride, :padding, :dilation),T} where T<:Tuple, false)
│   %4  = Core.tuple(s, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %5  = Core.tuple(p, p, p, 0)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)])
│   %6  = Core.tuple(d, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %7  = Core.tuple(%4, %5, %6)::Core.Compiler.PartialStruct(Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %8  = (%3)(%7)::Core.Compiler.PartialStruct(NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %9  = Core.kwfunc(Main.DenseConvDims)::Core.Compiler.Const(Core.var"#Type##kw"(), false)
│   %10 = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│         (cdims = (%9)(%8, Main.DenseConvDims, %10, %1))
│   %12 = Main.conv_transpose(x, W, b, Main.tanh, cdims)::Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}}
└──       return %12

vs.

Variables
  #self#::Core.Compiler.Const(conv_transpose, false)
  x::CuArray{Float32,4}
  W::CuArray{Float32,4}
  b::CuArray{Float32,4}
  s::Int64
  p::Int64
  d::Int64
  cdims::DenseConvDims{2,_A,1,1,_B,_C,_D,false} where _D where _C where _B where _A

Body::CuArray{Float32,4}
1 ─ %1  = Main.size(W)::NTuple{4,Int64}
│   %2  = Base.lastindex(%1)::Core.Compiler.Const(4, false)
│   %3  = (%2 - 1)::Core.Compiler.Const(3, false)
│   %4  = (1:%3)::Core.Compiler.Const(1:3, false)
│   %5  = Base.getindex(%1, %4)::Tuple{Int64,Int64,Int64}
│   %6  = (:stride, :padding, :dilation)::Core.Compiler.Const((:stride, :padding, :dilation), false)
│   %7  = Core.apply_type(Core.NamedTuple, %6)::Core.Compiler.Const(NamedTuple{(:stride, :padding, :dilation),T} where T<:Tuple, false)
│   %8  = Core.tuple(s, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %9  = Core.tuple(p, p, p, 0)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)])
│   %10 = Core.tuple(d, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %11 = Core.tuple(%8, %9, %10)::Core.Compiler.PartialStruct(Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %12 = (%7)(%11)::Core.Compiler.PartialStruct(NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %13 = Core.kwfunc(Main.DenseConvDims)::Core.Compiler.Const(Core.var"#Type##kw"(), false)
│   %14 = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│   %15 = Core.tuple(1)::Core.Compiler.Const((1,), false)
│   %16 = Core._apply_iterate(Base.iterate, Core.tuple, %5, %15)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(1, false)])
│         (cdims = (%13)(%12, Main.DenseConvDims, %14, %16))
│   %18 = Main.conv_transpose(x, W, b, Main.tanh, cdims)::CuArray{Float32,4}
└──       return %18

However, swapping out tanh for relu produces an almost-identical result to the first @code_warntype, yet doesn't crash...
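One guess at why relu behaves differently (a sketch, not a confirmed explanation): relu is just max(0, x) and needs no libdevice call, whereas tanh and exp lower to the __nv_tanhf / __nv_expf libdevice intrinsics the crash complains about, so only the latter blow up when the broadcast ends up on the wrong compilation path.

using CUDA, NNlib

xs = CUDA.rand(Float32, 4)

relu.(xs)   # relu(x) == max(0, x): plain arithmetic, no __nv_* libdevice symbol involved

# tanh's device kernel calls the __nv_tanhf libdevice intrinsic; the call is visible
# in the generated device IR:
CUDA.@device_code_llvm debuginfo=:none tanh.(xs)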

@vchuravy

So the error itself means that code intended to run on the GPU ended up being executed on the CPU.

Can you run the same code with CUDA.allowscalar(false)?
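For reference, that just means flipping the scalar-fallback switch before re-running the snippet (sketch):

using CUDA
CUDA.allowscalar(false)   # make scalar indexing (the usual silent CPU fallback on CuArrays) throw instead

# ...then re-run the MWE above, e.g.
# conv_transpose(input, weight, bias, 1, 0, 1)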

@mossr commented Jan 22, 2021

I tried my original snippet again and no longer get the seg fault (without even needing the allowscalar call).

I'm using Flux v0.11.1 and CUDA v1.3.3 on Julia v1.5.2.

using Flux, CUDA
x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
m(x)

outputs

1×1×1×1 CuArray{Float32,4}:
[:, :, 1, 1] =
 0.53790766

@DhairyaLGandhi

We should definitely increase test coverage in the GPU layer tests by adding more activation functions, but as I understand it, the snippet in the OP actually works now?

@ToucheSir

I can confirm no issues with that MWE on Flux 0.11.4/Zygote 0.6.2/NNlib 0.7.12/CUDA 2.4.1. @joostveenema can you confirm your related issue has been resolved as well?

@ghost commented Jan 22, 2021

It still crashes with the following MWE. Without the using OffsetArrays line, this example runs fine.

using Flux
using OffsetArrays

m = ConvTranspose((2, 2), 1 => 1, tanh) |> gpu
z = randn(1, 1, 1, 1) |> gpu
m(z)

Julia 1.5.3 (2020-11-09)
  [052768ef] CUDA v2.4.1 `https://github.com/JuliaGPU/CUDA.jl#v2.4.1`
  [587475ba] Flux v0.11.4
  [872c559c] NNlib v0.7.12
  [6fe1bfb0] OffsetArrays v1.5.1
  [e88e6eb3] Zygote v0.6.2 `https://github.com/FluxML/Zygote.jl#v0.6.2`

Adapting the earlier example in this thread (also crashes):

using NNlib, CUDA
using OffsetArrays

function conv_transpose(x::AbstractArray{yT,N}, W::AbstractArray, b::AbstractArray, s::Int64, p::Int64, d::Int64) where {yT, N}
  cdims = DenseConvDims(
    (1, 1, 1, 1),
    size(W),
  )
  
  # inlined NNlib.∇conv_data
  dx = similar(x, NNlib.input_size(cdims)..., NNlib.channels_in(cdims), size(x, N))
  x2 = NNlib.∇conv_data!(dx, x, W, cdims)

  tanh.(x2 .+ b)
end

weight = cu(rand(Float32, 1, 1, 1, 1))
bias = cu(zeros(Float32, 1, 1, 1, 1))
input = cu(rand(Float32, 1, 1, 1, 1));

conv_transpose(input, weight, bias, 1, 0, 1)

code_warntype shows that the return type of NNlib.channels_in cannot be inferred. Since the similar call could now dispatch to the Base.similar implementation in OffsetArrays, the result type of the similar call is inferred as Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}. This might trigger something like JuliaGPU/CUDA.jl#228?

Variables
  #self#::Core.Compiler.Const(conv_transpose, false)
  x::CuArray{Float32,4}
  W::CuArray{Float32,4}
  b::CuArray{Float32,4}
  s::Int64
  p::Int64
  d::Int64
  cdims::DenseConvDims{2,_A,_B,_C,_D,_E,_F,_G} where _G where _F where _E where _D where _C where _B where _A
  dx::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
  x2::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}

Body::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
1 ─ %1  = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│   %2  = Main.size(W)::NTuple{4,Int64}
│         (cdims = Main.DenseConvDims(%1, %2))
│   %4  = Core.tuple(x)::Tuple{CuArray{Float32,4}}
│   %5  = NNlib.input_size::Core.Compiler.Const(NNlib.input_size, false)
│   %6  = (%5)(cdims)::Tuple{Int64,Int64}
│   %7  = NNlib.channels_in::Core.Compiler.Const(NNlib.channels_in, false)
│   %8  = (%7)(cdims)::Any
│   %9  = Main.size(x, Main.N)::Any
│   %10 = Core.tuple(%8, %9)::Tuple{Any,Any}
│         (dx = Core._apply_iterate(Base.iterate, Main.similar, %4, %6, %10))
│   %12 = NNlib.∇conv_data!::Core.Compiler.Const(NNlib.∇conv_data!, false)
│   %13 = dx::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
│         (x2 = (%12)(%13, x, Main.w, cdims))
│   %15 = Base.broadcasted(Main.:+, x2, b)::Union{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{CuArray{Float32,4},CuArray{Float32,4}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{OffsetArray{Float32,4,CuArray{Float32,4}},CuArray{Float32,4}}}}
│   %16 = Base.broadcasted(Main.tanh, %15)::Union{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(CUDA.tanh),Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{CuArray{Float32,4},CuArray{Float32,4}}}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(CUDA.tanh),Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{OffsetArray{Float32,4,CuArray{Float32,4}},CuArray{Float32,4}}}}}}
│   %17 = Base.materialize(%16)::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
└──       return %17
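One way to keep inference concrete at that call site (a sketch only, not the fix that eventually landed in NNlib) is to assert the result of similar back to the input's array type, which rules out the OffsetArray branch:

using NNlib, CUDA

function conv_transpose_asserted(x::AbstractArray{T,N}, W::AbstractArray, b::AbstractArray, cdims) where {T,N}
    # the ::typeof(x) assertion is the only change: it pins dx (and hence x2) to a CuArray
    # for CuArray inputs, so the broadcast below is inferred concretely again
    dx = similar(x, NNlib.input_size(cdims)..., NNlib.channels_in(cdims), size(x, N))::typeof(x)
    x2 = NNlib.∇conv_data!(dx, x, W, cdims)
    tanh.(x2 .+ b)
end

conv_transpose_asserted(input, weight, bias, DenseConvDims((1, 1, 1, 1), size(weight)))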

@ghost commented Feb 8, 2021

@ToucheSir the problems I had are solved in FluxML/NNlib.jl#275

@DhairyaLGandhi

Considering that both the OP's snippet and the other MWEs now work fine, plus there's a PR adding the test coverage, we should be able to track this better going forward.

@maleadt commented Feb 8, 2021

FWIW, these broadcasts should now error instead of crash: JuliaGPU/GPUArrays.jl#345.

@DhairyaLGandhi linked a pull request on Mar 23, 2021 that will close this issue.
@CarloLucibello

Closing, as each MWE has been fixed.
