Broadcasting and reshaped, transposed, CuArrays #228

mcabbott · 2020-06-17T08:56:58Z

Some broadcasting operations work fine on reshape(transpose(cu( objects, but some fail:

using CUDA
CUDA.allowscalar(false)
mat = cu(rand(2,2))

reshape(transpose(mat),2,1,2) .+ mat # ok

exp.(reshape(transpose(mat), 2,1,2)) # ERROR: scalar getindex is disallowed
CUDA.exp.(reshape(transpose(mat), 2,1,2)) # FATAL ERROR: Symbol "__nv_expf"not found

Something like this appears to be the cause of mcabbott/TensorCast.jl#25, where the failure happens only when broadcasting two such objects, just one is fine:

C = cu(ones(10,2))
L = cu(ones(10,3))

reshape(C,1,2,10) .+ reshape(L', 3,1,10) # ok
reshape(C',1,2,10) .+ reshape(L, 3,1,10) # ok
reshape(C',1,2,10) .+ reshape(L', 3,1,10) # ERROR: scalar getindex is disallowed

This is with CUDA v0.1.0, I get the same errors on CuArrays v2.2.1, and on CuArrays v1.7.2 the only change is this:

julia> CuArrays.CUDAnative.exp.(reshape(transpose(mat), 2,1,2)) #
ERROR: LLVM error: Program used external function '___nv_expf' which could not be resolved!

The text was updated successfully, but these errors were encountered:

maleadt · 2020-06-17T09:56:59Z

Dup of JuliaGPU/Adapt.jl#21. With how Adapt.jl currently works, it's much too complicated/expensive to extend our support for array wrappers to doubly-wrapped arrays (here, ReshapedArray of Transpose of CuArray).

mcabbott · 2020-06-17T10:07:24Z

OK, sorry I didn't search there.

Is there a workaround? Like some mechanism to tell it to keep unwrapping, even if this can't be triggered automatically?

I think what's happening in my second example is that one singly-wrapped CuArray pushes the whole broadcast to be done by CUDA, and then it's happy... is there or should there be a way to make this happen?

zed = similar(mat,1) .= 0
zed .+ CUDA.exp.(reshape(transpose(mat), 2,1,2)) # ok!

maleadt · 2020-06-22T13:23:18Z

Is there a workaround? Like some mechanism to tell it to keep unwrapping, even if this can't be triggered automatically?

Unwrapping would be materializing the wrapper using collect, and that's probably too expensive?

Anyway, you've demonstrated a workaround yourself: make sure there's an actual CuArray (or a single-wrapped one) participating in the broadcast, The resulting broadcast style will then be the GPU one. When you do exp.(reshape(transpose(mat), 2,1,2)), there's nothing with a GPU array style in there so it'll fall back to the default, indexing one. Changing exp to CUDAnative.exp there doesn't fundamentally change the broadcast, but will just try to apply that GPU function to what is already determined will be executing on the CPU (and hence crash).

An alternative workaround is to override the broadcaststyle:

julia> C = cu(ones(10,2));

julia> L = cu(ones(10,3));


julia> Meta.@lower reshape(C',1,2,10) .+ reshape(L', 3,1,10)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = var"'"(C)
│   %2 = reshape(%1, 1, 2, 10)
│   %3 = var"'"(L)
│   %4 = reshape(%3, 3, 1, 10)
│   %5 = Base.broadcasted(+, %2, %4)
│   %6 = Base.materialize(%5)
└──      return %6

julia> a = reshape(C',1,2,10);

julia> b = reshape(L', 3,1,10);

julia> bc = Base.broadcasted(+, a, b);

julia> CUDA.allowscalar(false)

julia> Broadcast.materialize(bc)
ERROR: scalar getindex is disallowed
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /home/tim/Julia/pkg/GPUArrays/src/host/indexing.jl:41
 [3] getindex at /home/tim/Julia/pkg/GPUArrays/src/host/indexing.jl:96 [inlined]
 [4] _getindex at ./abstractarray.jl:1066 [inlined]
 [5] getindex at ./abstractarray.jl:1043 [inlined]
 [6] getindex at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/adjtrans.jl:190 [inlined]
 [7] _unsafe_getindex_rs at ./reshapedarray.jl:249 [inlined]
 [8] _unsafe_getindex at ./reshapedarray.jl:246 [inlined]
 [9] getindex at ./reshapedarray.jl:234 [inlined]
 [10] _getindex at ./abstractarray.jl:1083 [inlined]
 [11] getindex at ./abstractarray.jl:1043 [inlined]
 [12] _broadcast_getindex at ./broadcast.jl:614 [inlined]
 [13] _getindex at ./broadcast.jl:644 [inlined]
 [14] _broadcast_getindex at ./broadcast.jl:620 [inlined]
 [15] getindex at ./broadcast.jl:575 [inlined]
 [16] macro expansion at ./broadcast.jl:932 [inlined]
 [17] macro expansion at ./simdloop.jl:77 [inlined]
 [18] copyto! at ./broadcast.jl:931 [inlined]
 [19] copyto! at ./broadcast.jl:886 [inlined]
 [20] copy at ./broadcast.jl:862 [inlined]
 [21] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3},Nothing,typeof(+),Tuple{Base.ReshapedArray{Float32,3,LinearAlgebra.Adjoint{Float32,CuArray{Float32,2,Nothing}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}},Base.ReshapedArray{Float32,3,LinearAlgebra.Adjoint{Float32,CuArray{Float32,2,Nothing}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}}}) at ./broadcast.jl:837
 [22] top-level scope at REPL[51]:1

julia> bc = Base.broadcasted(CUDA.CuArrayStyle{2}(), +, a, b);

julia> Broadcast.materialize(bc);

Maybe with some tooling this could be made practical for you (Broadcast.@withstyle C reshape(C',1,2,10) .+ reshape(L', 3,1,10)?)

mcabbott · 2020-06-29T12:26:54Z

Thanks for having a look, haven't got back to this.

Re tooling, I was wondering whether cu shouldn't be overloaded so that cu.(arrays...) changes the broadcast. Something similar was done by LazyArrays, with lazy.(x .+ y) delaying materialization. (It started roughly here: JuliaLang/julia#19198 (comment) )

But I also wonder why broadcasting can't see this itself. It is easy to recursively call parent on an array until you get something === itself, this extracts the underlying storage type. I thought there was some similar recursion in broadcasting styles depending on the style of the parent, although am not sure. (I meant to dig a bit, but not yet.)

maleadt · 2020-07-02T11:18:56Z

Re tooling, I was wondering whether cu shouldn't be overloaded so that cu.(arrays...) changes the broadcast. Something similar was done by LazyArrays, with lazy.(x .+ y) delaying materialization. (It started roughly here: JuliaLang/julia#19198 (comment) )

That's an interesting thought! I'm wary of overloading cu though, I have been mainly stripping if of its features to make code somewhat more explicit (it used to perform much more conversions, you could do stuff like cu[1,2], etc), but for broadcast it might make sense.

mcabbott · 2020-07-02T16:37:47Z

OK, right I don't know if cu is perfect, but it might be neat.

For the original issue here, mcabbott/TensorCast.jl#25, the ideal logic would be something that digs through wrappers, and returns an Array or a CuArray depending on what it sees.

But the other use is broadcasting over things like ranges or CartesianIndices, in which case it cannot guess, you will have to specify if you want a CuArray.

leios mentioned this issue Jun 29, 2020

Replace copy_stack_field_down! with broadcasts of reshaped MPIStateArrays CliMA/ClimateMachine.jl#1301

Merged

7 tasks

maleadt added bug Something isn't working help wanted Extra attention is needed cuda array Stuff about CuArray. labels Jul 2, 2020

mossr mentioned this issue Oct 8, 2020

ConvTranspose on GPU fails with certain activation functions FluxML/Flux.jl#1350

Closed

mcabbott mentioned this issue Oct 13, 2020

scalar getindex when shared index changes order mcabbott/TensorCast.jl#28

Closed

dfenn mentioned this issue Oct 13, 2020

FATAL ERROR: Symbol "__nv_tanhf"not found : when using Flux in GPU (not much more details) JuliaLang/julia#36845

Closed

maleadt mentioned this issue Aug 5, 2021

Allow user to change the channel axis for BatchNorm function and the likes FluxML/Flux.jl#1664

Open

foldfelis mentioned this issue Aug 11, 2021

support GPU SciML/FluxNeuralOperators.jl#7

Merged

DomCRose mentioned this issue Aug 15, 2023

ProjectTo causes scalar indexing when taking adjoints of complex CuArray JuliaDiff/ChainRulesCore.jl#624

Open

ablaom mentioned this issue Aug 24, 2023

Bump compat for Metalhead FluxML/MLJFlux.jl#232

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broadcasting and reshaped, transposed, CuArrays #228

Broadcasting and reshaped, transposed, CuArrays #228

mcabbott commented Jun 17, 2020

maleadt commented Jun 17, 2020

mcabbott commented Jun 17, 2020

maleadt commented Jun 22, 2020

mcabbott commented Jun 29, 2020

maleadt commented Jul 2, 2020

mcabbott commented Jul 2, 2020

Broadcasting and reshaped, transposed, CuArrays #228

Broadcasting and reshaped, transposed, CuArrays #228

Comments

mcabbott commented Jun 17, 2020

maleadt commented Jun 17, 2020

mcabbott commented Jun 17, 2020

maleadt commented Jun 22, 2020

mcabbott commented Jun 29, 2020

maleadt commented Jul 2, 2020

mcabbott commented Jul 2, 2020