Some of the most basic features of CuArrays, including sum! (as in the MWE below), are broken for some sizes and types: the program hangs and keeps consuming all the available resources of the current device.
This happens specifically when R has a narrower element type than A in
function GPUArrays.mapreducedim!(f::F, op::OP, R::AnyCuArray{T},
(CUDA.jl/src/mapreduce.jl, line 169 at commit 4e9513b) and the last dimension of A is not divisible by 32.
To reproduce
The Minimal Working Example (MWE) for this bug is:
using CUDA
using GPUArrays
# would not fail if sizeof(TR) >= sizeof(TA)
TA = Float64
TR = Float32
# would not fail for N % 32 == 0
N = 1
A = CUDA.ones(TA, N)
R = CUDA.zeros(TR, 1)
# PROGRAM HANGS HERE
# Either line has the same effect:
sum!(R, A)
# GPUArrays.mapreducedim!(identity, +, R, A)
@show all(R .== N)  # should be true
This error occurs for any pair of types that need narrowing (Float64 -> Float32, Float64 -> Float16, Float32 -> Float16). In each case, N divisible by 32 works properly, but any other checked values (including 1, 31, 33, 16, 48, and some bigger values following the pattern) yield the erroneous behavior.
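For comparison, the same narrowing reduction on the CPU (a quick cross-check sketch with plain Arrays, covering the type pairs and sizes listed above) finishes immediately and gives the correct result:

```julia
# CPU cross-check: narrowing sum! with plain Arrays works for every
# tested size, including the ones that hang on the GPU.
for (TA, TR) in ((Float64, Float32), (Float64, Float16), (Float32, Float16))
    for N in (1, 16, 31, 32, 33, 48)
        A = ones(TA, N)
        R = zeros(TR, 1)
        sum!(R, A)
        @assert all(R .== N)
    end
end
```

This suggests the problem is specific to the GPU reduction kernel rather than to the narrowing conversion itself.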
The same thing seems to happen with extra dimensions added to R and A, regardless of their values, i.e. A = CUDA.ones(TA, p, q, r, N), R = CUDA.zeros(TR, p, q, r, 1).
Expected behavior
The computation should finish with the correct result in finite time and with bounded resource usage. The MWE should show that all(R .== N) = true.
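A possible workaround sketch, assuming (as the N % 32 == 0 cases suggest) that only the mixed-type reduction path is affected: reduce into a buffer that matches A's element type, then narrow with an ordinary broadcast copy.

```julia
using CUDA

# Hypothetical workaround, untested beyond this sketch:
TA, TR, N = Float64, Float32, 33
A = CUDA.ones(TA, N)
R = CUDA.zeros(TR, 1)

Rwide = CUDA.zeros(TA, 1)  # buffer with A's element type
sum!(Rwide, A)             # same-type reduction, not affected by this bug
R .= Rwide                 # narrowing happens in a plain broadcast
@show all(R .== N)
```

The extra allocation is negligible here since the reduction output is small.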
Version info
Details on Julia:
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
Threads: 13 default, 0 interactive, 12 GC (on 16 virtual cores)
Environment:
JULIA_CUDA_MEMORY_POOL = none
JULIA_CUDA_USE_COMPAT = false
JULIA_PKG_USE_CLI_GIT = true
pawel-tarasiuk-quantumz changed the title from "mapreducedim! size-dependent fail when narrowing float conversions" to "mapreducedim! size-dependent fail when narrowing float element types" on Dec 16, 2024.