ConvTranspose on GPU fails with certain activation functions #1350
Comments
I ran into exactly the same issue - glad to see it's not an isolated problem. Have you managed a workaround thus far?
@arvindmohan I haven't found a workaround yet. But it's good to see this issue is gaining some traction.
Yes, I'm a new user and it's surprising no one else has mentioned this (considering how popular autoencoders are). I may be willing to take a stab at fixing it if I get some guidance...
This is a puzzling error, I cannot get my head around it. I did the following experiments:

```julia
using Flux

# no bias, no activation
function forw0(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    ∇conv_data(x, c.weight, cdims)
end

# bias only
function forw1(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    ∇conv_data(x, c.weight, cdims) .+ b
end

# activation only
function forw2(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    c.σ.(∇conv_data(x, c.weight, cdims))
end

# bias + activation
function forw3(c::ConvTranspose, x::AbstractArray)
    # ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
    σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
    cdims = Flux.conv_transpose_dims(c, x)
    c.σ.(∇conv_data(x, c.weight, cdims) .+ b)
end

x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
b = rand(1,1,1,1) |> gpu

forw0(m, x)              # no bias, no activation: OK
forw1(m, x)              # bias only: OK
forw2(m, x)              # activation only: OK
tanh.(forw0(m, x) .+ b)  # still OK
forw3(m, x)              # bias + activation: ERROR
```

Maybe @maleadt can provide some feedback.
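Incidentally, these experiments hint at a possible interim workaround: the separate broadcast `tanh.(forw0(m, x) .+ b)` above works, so one could build the layer with the default `identity` activation and apply the nonlinearity in a second broadcast outside the layer. A sketch based on those observations only, not something verified against the affected versions:

```julia
using Flux, CUDA

# Workaround sketch: keep the default `identity` activation inside ConvTranspose
# (which is reported to work on the GPU) and broadcast the activation over the
# layer's output in a separate step.
x = rand(1, 1, 1, 1) |> gpu
m = ConvTranspose((1, 1), 1 => 1) |> gpu  # no explicit activation

y = tanh.(m(x))  # activation applied outside the layer
```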
You beat me to it! Trying to create an MWE for this without Flux was incredibly frustrating, but I think the following should work:

```julia
using NNlib, CUDA

# the next line is required to repro, i.e. everything noted below is not a problem if Zygote isn't imported
# import Zygote

# inserting a function barrier to pass cdims prevents the issue, and thus @inline introduces it again
@inline function conv_transpose(x::AbstractArray, W, b, σ, cdims)
    σ.(∇conv_data(x, W, cdims) .+ b)
end

function conv_transpose(x::AbstractArray, W::AbstractArray, b::AbstractArray, s::Int64, p::Int64, d::Int64)
    cdims = DenseConvDims(
        # if _any_ parameter is wholly non-constant (e.g. `stride=(s,s)`, or `w_size=size(W)`), the issue occurs
        (1, 1, 1, 1),
        # note how even something this complex is fine (== can be constant folded?)
        (size(W)[1:end-1]..., 1),
        stride=(s, 1),
        padding=(p, p, p, 0),
        dilation=(d, 1),
    )
    conv_transpose(x, W, b, tanh, cdims)
end

weight = cu(rand(Float32, 1, 1, 1, 1))
bias = cu(zeros(Float32, 1, 1, 1, 1))
input = cu(rand(Float32, 1, 1, 1, 1));

conv_transpose(input, weight, bias, 1, 0, 1)
```

There appears to be some crazy spooky action at a distance happening wrt. importing Zygote. Not sure why though, since AFAICT Zygote doesn't override broadcasting on the forward pass. Wild guess: perhaps something in @adjoint is intercepting/rewriting calls? cc @DhairyaLGandhi
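The two type-inference dumps below look like `@code_warntype` output for the outer `conv_transpose` method; the exact invocation isn't shown in the thread, but it would presumably be something like:

```julia
# Assumed invocation (not shown above): compare type inference of the outer
# method for the two ways of building the weight-size tuple. In the first dump
# `cdims` only infers as an abstract DenseConvDims and the return type widens
# to a Union; in the second it stays concrete.
@code_warntype conv_transpose(input, weight, bias, 1, 0, 1)
```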
```
Variables
#self#::Core.Compiler.Const(conv_transpose, false)
x::CuArray{Float32,4}
W::CuArray{Float32,4}
b::CuArray{Float32,4}
s::Int64
p::Int64
d::Int64
cdims::DenseConvDims{2,_A,_B,_C,_D,_E,_F,_G} where _G where _F where _E where _D where _C where _B where _A
Body::Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}}
1 ─ %1 = Main.size(W)::NTuple{4,Int64}
│ %2 = (:stride, :padding, :dilation)::Core.Compiler.Const((:stride, :padding, :dilation), false)
│ %3 = Core.apply_type(Core.NamedTuple, %2)::Core.Compiler.Const(NamedTuple{(:stride, :padding, :dilation),T} where T<:Tuple, false)
│ %4 = Core.tuple(s, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│ %5 = Core.tuple(p, p, p, 0)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)])
│ %6 = Core.tuple(d, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│ %7 = Core.tuple(%4, %5, %6)::Core.Compiler.PartialStruct(Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│ %8 = (%3)(%7)::Core.Compiler.PartialStruct(NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│ %9 = Core.kwfunc(Main.DenseConvDims)::Core.Compiler.Const(Core.var"#Type##kw"(), false)
│ %10 = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│ (cdims = (%9)(%8, Main.DenseConvDims, %10, %1))
│ %12 = Main.conv_transpose(x, W, b, Main.tanh, cdims)::Union{CuArray{Float32,4}, OffsetArrays.OffsetArray{Float32,4,CuArray{Float32,4}}}
└── return %12
```

vs.

```
Variables
#self#::Core.Compiler.Const(conv_transpose, false)
x::CuArray{Float32,4}
W::CuArray{Float32,4}
b::CuArray{Float32,4}
s::Int64
p::Int64
d::Int64
cdims::DenseConvDims{2,_A,1,1,_B,_C,_D,false} where _D where _C where _B where _A
Body::CuArray{Float32,4}
1 ─ %1 = Main.size(W)::NTuple{4,Int64}
│ %2 = Base.lastindex(%1)::Core.Compiler.Const(4, false)
│ %3 = (%2 - 1)::Core.Compiler.Const(3, false)
│ %4 = (1:%3)::Core.Compiler.Const(1:3, false)
│ %5 = Base.getindex(%1, %4)::Tuple{Int64,Int64,Int64}
│ %6 = (:stride, :padding, :dilation)::Core.Compiler.Const((:stride, :padding, :dilation), false)
│ %7 = Core.apply_type(Core.NamedTuple, %6)::Core.Compiler.Const(NamedTuple{(:stride, :padding, :dilation),T} where T<:Tuple, false)
│ %8 = Core.tuple(s, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│ %9 = Core.tuple(p, p, p, 0)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)])
│ %10 = Core.tuple(d, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])
│ %11 = Core.tuple(%8, %9, %10)::Core.Compiler.PartialStruct(Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│ %12 = (%7)(%11)::Core.Compiler.PartialStruct(NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}}, Any[Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)]), Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(0, false)]), Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(1, false)])])
│ %13 = Core.kwfunc(Main.DenseConvDims)::Core.Compiler.Const(Core.var"#Type##kw"(), false)
│ %14 = Core.tuple(1, 1, 1, 1)::Core.Compiler.Const((1, 1, 1, 1), false)
│ %15 = Core.tuple(1)::Core.Compiler.Const((1,), false)
│ %16 = Core._apply_iterate(Base.iterate, Core.tuple, %5, %15)::Core.Compiler.PartialStruct(NTuple{4,Int64}, Any[Int64, Int64, Int64, Core.Compiler.Const(1, false)])
│ (cdims = (%13)(%12, Main.DenseConvDims, %14, %16))
│ %18 = Main.conv_transpose(x, W, b, Main.tanh, cdims)::CuArray{Float32,4}
└── return %18
```

However, swapping out
So the error itself means that code which was intended to run on the GPU ended up executing on the CPU. Can you run the same code with
I tried my original snippet again and no longer get the seg fault (without even needing the

I'm using Flux v0.11.1 and CUDA v1.3.3 on Julia v1.5.2:

```julia
using Flux, CUDA

x = rand(1,1,1,1) |> gpu
m = ConvTranspose((1,1), 1=>1, tanh) |> gpu
m(x)
```

outputs

```
1×1×1×1 CuArray{Float32,4}:
[:, :, 1, 1] =
 0.53790766
```
We should definitely increase test coverage in the GPU layer tests by adding more activation functions, but as I understand it, the case from the OP actually does work now?
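A sketch of what such extra coverage could look like (hypothetical test code, not any actual PR; the kernel size, input shape, and activation list are assumptions):

```julia
using Flux, CUDA, Test

# Hypothetical sketch: run ConvTranspose forward and backward on the GPU for a
# range of activation functions, not just the default `identity`.
@testset "ConvTranspose GPU activations" begin
    x = rand(Float32, 4, 4, 1, 1) |> gpu
    for act in (identity, tanh, relu, σ, elu, softplus)
        m = ConvTranspose((2, 2), 1 => 1, act) |> gpu
        y = m(x)                                         # forward pass should not crash
        @test y isa CuArray
        gs = gradient(() -> sum(m(x)), Flux.params(m))   # backward pass too
        @test gs[m.weight] !== nothing
    end
end
```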
I can confirm no issues with that MWE on Flux 0.11.4 / Zygote 0.6.2 / NNlib 0.7.12 / CUDA 2.4.1. @joostveenema can you confirm your related issue has been resolved as well?
Still crashes with the following MWE. Without the
Adapting the earlier example in this thread (also crashes):
`code_warntype` shows that the return type of
@ToucheSir the problems I had are solved in FluxML/NNlib.jl#275.
Considering both the OP and the other MWEs seem to work fine, plus there's a PR with the test coverage, we should be able to track this better going forward.
FWIW, these broadcasts should now error instead of crash: JuliaGPU/GPUArrays.jl#345.
Closing as each MWE has been fixed.
Running `ConvTranspose` with certain activation functions other than `identity` will seg. fault (here's an example with `tanh`, sketched below):
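A minimal reproduction, based on the snippet the reporter posts again later in this thread (the shapes and the `tanh` activation are taken from there; treat this as a reconstruction rather than the original code):

```julia
# Reconstructed reproducer: a 1x1 ConvTranspose with a non-identity activation,
# moved to the GPU. On the affected Flux/CUDA versions the forward pass
# seg-faulted instead of erroring cleanly.
using Flux, CUDA

x = rand(1, 1, 1, 1) |> gpu
m = ConvTranspose((1, 1), 1 => 1, tanh) |> gpu
m(x)
```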
Reading through JuliaGPU/CUDAnative.jl#632 tells me that the activation function is not being run on the GPU. I tried a few activation functions that I know are part of CUDA (`tanh`, `exp`, `tanhshrink`, `softplus`) and those all failed, but other activation functions like `σ` and `relu` worked 🤷‍♂️ (`σ` is odd to me, because `exp` and `softplus` fail due to `__nv_expf` not found).

If I run without the `tanh` activation (which defaults to `identity`), this works. I also tried `Conv`, and it does work with other activation functions, so this might be an issue with `conv_transpose_dims` or `∇conv_data`, but I thought I'd get advice before digging any deeper. This also may be related to JuliaGPU/CUDA.jl#228, if I followed that correctly.

Lastly, `ConvTranspose` does not have any explicit activation functions passed in the `test/conv.jl` tests, so this may have gone under the radar.