
mapreducedim! size-dependent fail when narrowing float element types #2595

Open
pawel-tarasiuk-quantumz opened this issue Dec 16, 2024 · 0 comments · May be fixed by #2596
Labels
bug Something isn't working

Comments


pawel-tarasiuk-quantumz commented Dec 16, 2024

Description

Some of the most basic features of CuArrays, including sum! (as in the MWE below), are broken for certain sizes and element types: the program hangs and keeps consuming all available resources on the current device.

This happens specifically when R has a narrower element type than A in

function GPUArrays.mapreducedim!(f::F, op::OP, R::AnyCuArray{T}, ...)

and the last dimension of A is not divisible by 32.

To reproduce

The Minimal Working Example (MWE) for this bug is:

using CUDA
using GPUArrays

# would not fail if sizeof(TR) >= sizeof(TA)
TA = Float64
TR = Float32

# would not fail for N % 32 == 0
N = 1

A = CUDA.ones(TA, N)
R = CUDA.zeros(TR, 1)

# PROGRAM HANGS HERE
# Either line has the same effect:
sum!(R, A)
# GPUArrays.mapreducedim!(identity, +, R, A)

@show all(R .== N)  # should be true

This error occurs for any pair of types that requires narrowing (Float64 -> Float32, Float64 -> Float16, Float32 -> Float16). In each case, values of N divisible by 32 work properly, but every other value tested (including 1, 16, 31, 33, 48, and larger values following the same pattern) triggers the erroneous behavior.

The same thing seems to happen with extra dimensions added to R and A, regardless of their sizes, i.e. A = CUDA.ones(TA, p, q, r, N), R = CUDA.zeros(TR, p, q, r, 1).
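Until this is fixed, one possible workaround (an untested sketch, not part of the original report; the name R_wide is introduced here purely for illustration) is to accumulate in the source element type, so the kernel never takes the narrowing path, and only narrow the finished result:

```julia
using CUDA

TA = Float64
TR = Float32
N = 33  # a size that triggers the hang when narrowing inside the kernel

A = CUDA.ones(TA, N)

# Accumulate into a buffer whose eltype matches A, avoiding the
# narrowing code path in mapreducedim! entirely.
R_wide = CUDA.zeros(TA, 1)
sum!(R_wide, A)

# Narrow the result after the reduction has completed.
R = TR.(R_wide)
```

This trades one extra temporary array and a conversion kernel for avoiding the affected code path; whether that is acceptable depends on how hot the reduction is in the caller.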

Expected behavior

The computation should finish with the correct result in finite time and with bounded resource usage. The MWE should print all(R .== N) = true.

Version info

Details on Julia:

Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
Threads: 13 default, 0 interactive, 12 GC (on 16 virtual cores)
Environment:
  JULIA_CUDA_MEMORY_POOL = none
  JULIA_CUDA_USE_COMPAT = false
  JULIA_PKG_USE_CLI_GIT = true

Details on CUDA:

CUDA runtime 12.6, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.12

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+535.104.12

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.2
- LLVM: 16.0.6

Environment:
- JULIA_CUDA_MEMORY_POOL: none
- JULIA_CUDA_USE_COMPAT: false

2 devices:
  0: NVIDIA GeForce RTX 4090 (sm_89, 23.643 GiB / 23.988 GiB available)
  1: NVIDIA GeForce RTX 4090 (sm_89, 23.642 GiB / 23.988 GiB available)
@pawel-tarasiuk-quantumz pawel-tarasiuk-quantumz added the bug Something isn't working label Dec 16, 2024
@pawel-tarasiuk-quantumz pawel-tarasiuk-quantumz changed the title mapreducedim! size-dependent fail when narrowing float conversions mapreducedim! size-dependent fail when narrowing float element types Dec 16, 2024