
mapreducedim! size-dependent fail when narrowing float element types #2595

Open
pawel-tarasiuk-quantumz opened this issue Dec 16, 2024 · 0 comments · May be fixed by #2596
Labels
bug Something isn't working

Comments


pawel-tarasiuk-quantumz commented Dec 16, 2024

Description

Some of the most basic features of CuArrays, including sum! (as in the MWE below), are broken for certain sizes and element types: the program hangs and keeps consuming all available resources on the current device.

This happens specifically when R has a narrower element type than A in

function GPUArrays.mapreducedim!(f::F, op::OP, R::AnyCuArray{T}, ...)

and the last dimension of A is not divisible by 32.

To reproduce

The Minimal Working Example (MWE) for this bug is:

using CUDA
using GPUArrays

# would not fail if sizeof(TR) >= sizeof(TA)
TA = Float64
TR = Float32

# would not fail for N % 32 == 0
N = 1

A = CUDA.ones(TA, N)
R = CUDA.zeros(TR, 1)

# PROGRAM HANGS HERE
# Either line has the same effect:
sum!(R, A)
# GPUArrays.mapreducedim!(identity, +, R, A)

@show all(R .== N)  # should be true

This error occurs for any pair of types that requires narrowing (Float64 -> Float32, Float64 -> Float16, Float32 -> Float16). In each case, values of N divisible by 32 work properly, but every other value tested (including 1, 16, 31, 33, 48, and larger values following the same pattern) triggers the erroneous behavior.

The same thing seems to happen with extra dimensions added to R and A, regardless of their sizes, i.e. A = CUDA.ones(TA, p, q, r, N), R = CUDA.zeros(TR, p, q, r, 1).
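Until this is fixed, one possible workaround (an untested sketch, not part of the original report; the name R_wide is introduced here purely for illustration) is to accumulate in the source element type, so the kernel never takes the narrowing path, and only narrow the finished result:

```julia
using CUDA

TA = Float64
TR = Float32
N = 33  # a size that triggers the hang when narrowing inside the kernel

A = CUDA.ones(TA, N)

# Accumulate into a buffer whose eltype matches A, avoiding the
# narrowing code path in mapreducedim! entirely.
R_wide = CUDA.zeros(TA, 1)
sum!(R_wide, A)

# Narrow the result after the reduction has completed.
R = TR.(R_wide)
```

This trades one extra temporary array and a conversion kernel for avoiding the affected code path; whether that is acceptable depends on how hot the reduction is in the caller.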

Expected behavior

The computation should finish with the correct result in finite time and with bounded resource usage. The MWE should print all(R .== N) = true.

Version info

Details on Julia:

Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
Threads: 13 default, 0 interactive, 12 GC (on 16 virtual cores)
Environment:
  JULIA_CUDA_MEMORY_POOL = none
  JULIA_CUDA_USE_COMPAT = false
  JULIA_PKG_USE_CLI_GIT = true

Details on CUDA:

CUDA runtime 12.6, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.12

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+535.104.12

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.2
- LLVM: 16.0.6

Environment:
- JULIA_CUDA_MEMORY_POOL: none
- JULIA_CUDA_USE_COMPAT: false

2 devices:
  0: NVIDIA GeForce RTX 4090 (sm_89, 23.643 GiB / 23.988 GiB available)
  1: NVIDIA GeForce RTX 4090 (sm_89, 23.642 GiB / 23.988 GiB available)
@pawel-tarasiuk-quantumz pawel-tarasiuk-quantumz added the bug Something isn't working label Dec 16, 2024
@pawel-tarasiuk-quantumz pawel-tarasiuk-quantumz changed the title mapreducedim! size-dependent fail when narrowing float conversions mapreducedim! size-dependent fail when narrowing float element types Dec 16, 2024