ConvTranspose on GPU fails with certain activation functions #1350

Closed
mossr opened this issue Oct 8, 2020 · 14 comments · Fixed by #1472

@mossr commented Oct 8, 2020

Running ConvTranspose with certain activation functions other than the identity segfaults (here's an example with tanh):

using Flux, CUDA
x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
m(x)

FATAL ERROR: Symbol "__nv_tanhf" not found
signal (22): SIGABRT
...

Reading through JuliaGPU/CUDAnative.jl#632 tells me that the activation function is not being run on the GPU. I tried a few activation functions that I know are part of CUDA (tanh, exp, tanhshrink, softplus) and those all failed, but other activation functions like σ and relu worked 🤷‍♂️ (σ is odd to me because it is defined in terms of exp, yet exp and softplus fail with "__nv_expf" not found).

If I run without the tanh activation (so the layer defaults to identity), this works. I also tried Conv, and it works with these activation functions, so this might be an issue with conv_transpose_dims or ∇conv_data, but I thought I'd get advice before digging any deeper.

This may also be related to JuliaGPU/CUDA.jl#228, if I followed that correctly.

Lastly, none of the ConvTranspose cases in the test/conv.jl tests pass an explicit activation function, so this may have gone under the radar.
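Something along these lines would exercise it (a rough sketch assuming the usual Flux/CUDA test setup, not the current contents of test/conv.jl):

using Flux, CUDA, Test

# Hypothetical test sketch: run ConvTranspose with explicit activations on the GPU
# and compare against the CPU result.
@testset "ConvTranspose GPU activations (sketch)" begin
    for act in (tanh, σ, relu, softplus)
        m = ConvTranspose((2, 2), 1 => 1, act)
        x = rand(Float32, 4, 4, 1, 1)
        y_cpu = m(x)
        y_gpu = (m |> gpu)(x |> gpu)
        @test collect(y_gpu) ≈ y_cpu atol=1e-5
    end
end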

@arvindmohan

I ran into exactly the same issue - glad to see it's not an isolated problem. Have you managed a workaround thus far?

@mossr commented Nov 13, 2020

@arvindmohan I haven't found a workaround yet. But it's good to see this issue is gaining some traction.

@arvindmohan

Yes, I'm a new user and it's surprising no one else has mentioned this (considering how popular autoencoders are). I may be willing to take a stab at fixing it if I get some guidance...

@CarloLucibello

This is a puzzling error; I can't wrap my head around it. I did the following experiments:

using Flux
using NNlib: ∇conv_data  # ∇conv_data lives in NNlib; bring it into scope explicitly

# no bias, no activation
function forw0(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    ∇conv_data(x, c.weight, cdims)
end

# bias only
function forw1(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    ∇conv_data(x, c.weight, cdims) .+ b
end

# activation only
function forw2(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    c.σ.(∇conv_data(x, c.weight, cdims))
end

# bias + activation
function forw3(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    c.σ.(∇conv_data(x, c.weight, cdims) .+ b)
end

x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
b = rand(1,1,1,1) |> gpu

forw0(m, x)  # no bias no act OK
forw1(m, x)  # bias only OK
forw2(m, x)  # activation only OK 
tanh.(forw0(m, x)  .+  b)  # still OK
forw3(m, x)  # bias + activation ERROR
FATAL ERROR: Symbol "__nv_tanhf" not found
signal (6): Aborted
in expression starting at REPL[16]:1
gsignal at /usr/bin/../lib/libc.so.6 (unknown line)
abort at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f1b2ffd4083)
unknown function (ip: 0x7f1b300e00c8)
unknown function (ip: 0x7f1b300e0db2)
...

Maybe @maleadt can provide some feedback.
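In the meantime, the forw2 and tanh.(forw0(m, x) .+ b) results above suggest a user-level workaround (just a sketch, not a fix): construct the layer with the default identity activation and broadcast the nonlinearity over the output yourself.

using Flux

m_id = ConvTranspose((1, 1), 1 => 1) |> gpu   # identity activation inside the layer
x = rand(1, 1, 1, 1) |> gpu
y = tanh.(m_id(x))                            # apply the activation outside the layer, as in forw0/forw2 above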

@ToucheSir commented Nov 14, 2020

You beat me to it! Trying to create a MWE for this without Flux was incredibly frustrating, but I think the following should work:

using NNlib, CUDA
# uncommenting the next line is required to repro, i.e. none of the issues noted below occur if Zygote isn't imported.
# import Zygote

# inserting a function barrier to pass cdims prevents the issue, and thus @inline introduces it again
@inline function conv_transpose(x::AbstractArray, W, b, σ, cdims)
  σ.(∇conv_data(x, W, cdims) .+ b)
end
function conv_transpose(x::AbstractArray, W::AbstractArray, b::AbstractArray, s::Int64, p::Int64, d::Int64)
  cdims = DenseConvDims(
    # if _any_ parameter is wholly non-constant (e.g. `stride=(s,s)`, or `w_size=size(W)`), the issue occurs
    (1, 1, 1, 1),
    # note how even something this complex is fine (== can be constant folded?)
    (size(W)[1:end-1]..., 1),
    stride=(s, 1),
    padding=(p, p, p, 0),
    dilation=(d, 1),
  )
  conv_transpose(x, W, b, tanh, cdims)
end

weight = cu(rand(Float32, 1, 1, 1, 1))
bias = cu(zeros(Float32, 1, 1, 1, 1))
input = cu(rand(Float32, 1, 1, 1, 1));

conv_transpose(input, weight, bias, 1, 0, 1)

There appears to be some crazy spooky action at a distance happening with respect to importing Zygote. Not sure why though, since AFAICT Zygote doesn't override broadcasting on the forward pass. Wild guess: perhaps something in @adjoint is intercepting/rewriting calls? cc @DhairyaLGandhi
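A quick way to probe that guess (a diagnostic sketch using only public APIs; it assumes nothing about Zygote's internals) is to check whether loading Zygote changes which method a plain forward broadcast dispatches to:

using CUDA, InteractiveUtils

xs = cu(rand(Float32, 4))
before = @which Base.Broadcast.materialize(Base.Broadcast.broadcasted(tanh, xs))
import Zygote
after = @which Base.Broadcast.materialize(Base.Broadcast.broadcasted(tanh, xs))
before == after   # if true, loading Zygote leaves forward broadcast dispatch alone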

@code_warntype shows a potentially undesirable Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}} instead of CuArray{Float32,4} when certain DenseConvDims type params are non-constant:

Variables
  #self#::Core.Compiler.Const(conv_transpose, false)
  x::CuArray{Float32,4}
  W::CuArray{Float32,4}
  b::CuArray{Float32,4}
  s::Int64
  p::Int64
  d::Int64
  cdims::DenseConvDims{2,_A,_B,_C,_D,_E,_F,_G} where _G where _F where _E where _D where _C where _B where _A

Body::Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}}
1 ─ %1  = Main.size(W)::NTuple{4,Int64}
│   %2  = (:stride, :padding, :dilation)::Core.Compiler.Const((:stride, :padding, :dilation), false)
│   %3  = Core.apply_type(Core.NamedTuple, %2)::Core.Compiler.Const(NamedTuple{(:stride, :padding, :dilation),T} where T<:Tuple, false)
│   %4  = Core.tuple(s, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %5  = Core.tuple(p, p, p, 0)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)])
│   %6  = Core.tuple(d, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %7  = Core.tuple(%4, %5, %6)::Core.Compiler.PartialStruct(Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %8  = (%3)(%7)::Core.Compiler.PartialStruct(NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %9  = Core.kwfunc(Main.DenseConvDims)::Core.Compiler.Const(Core.var"#Type##kw"(), false)
│   %10 = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│         (cdims = (%9)(%8, Main.DenseConvDims, %10, %1))
│   %12 = Main.conv_transpose(x, W, b, Main.tanh, cdims)::Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}}
└──       return %12

vs.

Variables
  #self#::Core.Compiler.Const(conv_transpose, false)
  x::CuArray{Float32,4}
  W::CuArray{Float32,4}
  b::CuArray{Float32,4}
  s::Int64
  p::Int64
  d::Int64
  cdims::DenseConvDims{2,_A,1,1,_B,_C,_D,false} where _D where _C where _B where _A

Body::CuArray{Float32,4}
1 ─ %1  = Main.size(W)::NTuple{4,Int64}
│   %2  = Base.lastindex(%1)::Core.Compiler.Const(4, false)
│   %3  = (%2 - 1)::Core.Compiler.Const(3, false)
│   %4  = (1:%3)::Core.Compiler.Const(1:3, false)
│   %5  = Base.getindex(%1, %4)::Tuple{Int64,Int64,Int64}
│   %6  = (:stride, :padding, :dilation)::Core.Compiler.Const((:stride, :padding, :dilation), false)
│   %7  = Core.apply_type(Core.NamedTuple, %6)::Core.Compiler.Const(NamedTuple{(:stride, :padding, :dilation),T} where T<:Tuple, false)
│   %8  = Core.tuple(s, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %9  = Core.tuple(p, p, p, 0)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)])
│   %10 = Core.tuple(d, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│   %11 = Core.tuple(%8, %9, %10)::Core.Compiler.PartialStruct(Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %12 = (%7)(%11)::Core.Compiler.PartialStruct(NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│   %13 = Core.kwfunc(Main.DenseConvDims)::Core.Compiler.Const(Core.var"#Type##kw"(), false)
│   %14 = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│   %15 = Core.tuple(1)::Core.Compiler.Const((1,), false)
│   %16 = Core._apply_iterate(Base.iterate, Core.tuple, %5, %15)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(1, false)])
│         (cdims = (%13)(%12, Main.DenseConvDims, %14, %16))
│   %18 = Main.conv_transpose(x, W, b, Main.tanh, cdims)::CuArray{Float32,4}
└──       return %18

However, swapping out tanh for relu produces an almost-identical result to the first @code_warntype, yet doesn't crash...
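One guess at why relu behaves differently (a sketch, not a confirmed explanation): relu is just max(0, x) and needs no libdevice call, whereas tanh and exp lower to the __nv_tanhf / __nv_expf libdevice intrinsics the crash complains about, so only the latter blow up when the broadcast ends up on the wrong compilation path.

using CUDA, NNlib

xs = CUDA.rand(Float32, 4)

relu.(xs)   # relu(x) == max(0, x): plain arithmetic, no __nv_* libdevice symbol involved

# tanh's device kernel calls the __nv_tanhf libdevice intrinsic; the call is visible
# in the generated device IR:
CUDA.@device_code_llvm debuginfo=:none tanh.(xs)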

@vchuravy

So the error itself means that code intended to run on the GPU ended up being executed on the CPU.

Can you run the same code with CUDA.allowscalar(false)?
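For reference, that just means flipping the scalar-fallback switch before re-running the snippet (sketch):

using CUDA
CUDA.allowscalar(false)   # make scalar indexing (the usual silent CPU fallback on CuArrays) throw instead

# ...then re-run the MWE above, e.g.
# conv_transpose(input, weight, bias, 1, 0, 1)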

@mossr commented Jan 22, 2021

I tried my original snippet again and no longer get the seg fault (without even needing the allowscalar call).

I'm using Flux v0.11.1 and CUDA v1.3.3 on Julia v1.5.2.

using Flux, CUDA
x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
m(x)

outputs

1×1×1×1 CuArray{Float32,4}:
[:, :, 1, 1] =
 0.53790766

@DhairyaLGandhi

We should definitely increase test coverage in the GPU layer tests by adding more activation functions, but as I understand it, the snippet in the OP actually works now?

@ToucheSir

I can confirm no issues with that MWE on Flux 0.11.4/Zygote 0.6.2/NNlib 0.7.12/CUDA 2.4.1. @joostveenema can you confirm your related issue has been resolved as well?

@ghost commented Jan 22, 2021

It still crashes with the following MWE. Without the using OffsetArrays line, this example runs fine.

using Flux
using OffsetArrays

m = ConvTranspose((2, 2), 1 => 1, tanh) |> gpu
z = randn(1, 1, 1, 1) |> gpu
m(z)

Julia 1.5.3 (2020-11-09)
  [052768ef] CUDA v2.4.1 `https://github.com/JuliaGPU/CUDA.jl#v2.4.1`
  [587475ba] Flux v0.11.4
  [872c559c] NNlib v0.7.12
  [6fe1bfb0] OffsetArrays v1.5.1
  [e88e6eb3] Zygote v0.6.2 `https://github.com/FluxML/Zygote.jl#v0.6.2`

Adapting the earlier example in this thread (also crashes):

using NNlib, CUDA
using OffsetArrays

function conv_transpose(x::AbstractArray{yT,N}, W::AbstractArray, b::AbstractArray, s::Int64, p::Int64, d::Int64) where {yT, N}
  cdims = DenseConvDims(
    (1, 1, 1, 1),
    size(W),
  )
  
  # inlined NNlib.∇conv_data
  dx = similar(x, NNlib.input_size(cdims)..., NNlib.channels_in(cdims), size(x, N))
  x2 = NNlib.∇conv_data!(dx, x, W, cdims)

  tanh.(x2 .+ b)
end

weight = cu(rand(Float32, 1, 1, 1, 1))
bias = cu(zeros(Float32, 1, 1, 1, 1))
input = cu(rand(Float32, 1, 1, 1, 1));

conv_transpose(input, weight, bias, 1, 0, 1)

code_warntype shows that the return type of NNlib.channels_in cannot be inferred. Since the similar call could now dispatch to the Base.similar implementation in OffsetArrays, the result type of the similar call is inferred as Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}. This might trigger something like JuliaGPU/CUDA.jl#228?

Variables
  #self#::Core.Compiler.Const(conv_transpose, false)
  x::CuArray{Float32,4}
  W::CuArray{Float32,4}
  b::CuArray{Float32,4}
  s::Int64
  p::Int64
  d::Int64
  cdims::DenseConvDims{2,_A,_B,_C,_D,_E,_F,_G} where _G where _F where _E where _D where _C where _B where _A
  dx::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
  x2::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}

Body::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
1 ─ %1  = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│   %2  = Main.size(W)::NTuple{4,Int64}
│         (cdims = Main.DenseConvDims(%1, %2))
│   %4  = Core.tuple(x)::Tuple{CuArray{Float32,4}}
│   %5  = NNlib.input_size::Core.Compiler.Const(NNlib.input_size, false)
│   %6  = (%5)(cdims)::Tuple{Int64,Int64}
│   %7  = NNlib.channels_in::Core.Compiler.Const(NNlib.channels_in, false)
│   %8  = (%7)(cdims)::Any
│   %9  = Main.size(x, Main.N)::Any
│   %10 = Core.tuple(%8, %9)::Tuple{Any,Any}
│         (dx = Core._apply_iterate(Base.iterate, Main.similar, %4, %6, %10))
│   %12 = NNlib.∇conv_data!::Core.Compiler.Const(NNlib.∇conv_data!, false)
│   %13 = dx::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
│         (x2 = (%12)(%13, x, Main.w, cdims))
│   %15 = Base.broadcasted(Main.:+, x2, b)::Union{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{CuArray{Float32,4},CuArray{Float32,4}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{OffsetArray{Float32,4,CuArray{Float32,4}},CuArray{Float32,4}}}}
│   %16 = Base.broadcasted(Main.tanh, %15)::Union{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(CUDA.tanh),Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{CuArray{Float32,4},CuArray{Float32,4}}}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(CUDA.tanh),Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4},Nothing,typeof(+),Tuple{OffsetArray{Float32,4,CuArray{Float32,4}},CuArray{Float32,4}}}}}}
│   %17 = Base.materialize(%16)::Union{CuArray{Float32,4}, OffsetArray{Float32,4,CuArray{Float32,4}}}
└──       return %17
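One way to keep inference concrete at that call site (a sketch only, not the fix that eventually landed in NNlib) is to assert the result of similar back to the input's array type, which rules out the OffsetArray branch:

using NNlib, CUDA

function conv_transpose_asserted(x::AbstractArray{T,N}, W::AbstractArray, b::AbstractArray, cdims) where {T,N}
    # the ::typeof(x) assertion is the only change: it pins dx (and hence x2) to a CuArray
    # for CuArray inputs, so the broadcast below is inferred concretely again
    dx = similar(x, NNlib.input_size(cdims)..., NNlib.channels_in(cdims), size(x, N))::typeof(x)
    x2 = NNlib.∇conv_data!(dx, x, W, cdims)
    tanh.(x2 .+ b)
end

conv_transpose_asserted(input, weight, bias, DenseConvDims((1, 1, 1, 1), size(weight)))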

@ghost commented Feb 8, 2021

@ToucheSir the problems I had are solved in FluxML/NNlib.jl#275

@DhairyaLGandhi

Considering that both the OP's snippet and the other MWEs now work fine, plus there's a PR adding the test coverage, we should be able to track this better going forward.

@maleadt commented Feb 8, 2021

FWIW, these broadcasts should now error instead of crash: JuliaGPU/GPUArrays.jl#345.

@DhairyaLGandhi linked a pull request on Mar 23, 2021 that will close this issue.
@CarloLucibello

Closing, as each MWE has been fixed.
