
rand! with Uniform distribution does not work with CUDA arrays #1815

Closed
gkantsidis opened this issue Jan 3, 2024 · 8 comments · Fixed by #1818
@gkantsidis

The following code, which initializes an existing CUDA matrix in place, results in memory corruption and crashes.
(Note: run it under CUDA's compute sanitizer to catch the bug.)

using CUDA
CUDA.run_compute_sanitizer()  # spawns a fresh Julia session under compute-sanitizer

using CUDA
using Distributions
using Random

m = CUDA.rand(1024, 1024)     # Float32 matrix on the GPU
s = Uniform(-1.0, 1.0)
rand!(CUDA.default_rng(), s, m)

It results in a series of messages that include:

========= Invalid __global__ read of size 1 bytes
=========     at _Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE8quantileS4_I12CuRefPointerI7UniformI7Float64EE8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS12_ES4_IS6_S6_EEEES6_+0x550 in pointer.jl:119
=========     by thread (99,0,0) in block (0,0,0)
=========     Address 0x7f2a88250976 is out of bounds
=========     and is 139,732,598,458,743 bytes after the nearest allocation at 0x1480000000 of size 4,194,304 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time

The problem seems to come from the following:

In materialize!(dest, bc) at broadcast.jl:910
 910  @inline function materialize!(dest, bc::Broadcasted{<:Any})
>911      return materialize!(combine_styles(dest, bc), dest, bc)
 912  end

About to run: <(Base.Broadcast.combine_styles)(Float32[0.64629114 0.78703153 0.52409637 0.6335989 0.8139833 0.772531...>

(This is called from _rand!(rng::AbstractRNG, d::Uniform, A::AbstractArray{<:Real}) in src/univariate/continuous/uniform.jl:156.)

The following implementation of _rand!, which is functionally equivalent to the existing one, seems to work fine:

function Distributions._rand!(rng::AbstractRNG, d::Uniform, A::AbstractArray{<:Real})
    rand!(rng, A)       # fill A with uniform draws in [0, 1)
    A .*= d.b - d.a     # rescale to the width of the support
    A .+= d.a           # shift to the lower bound
    return A
end
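For context, this rewrite works because the uniform quantile function is just an affine map: quantile(Uniform(a, b), u) = a + (b - a) * u. A minimal CPU-only sketch (no GPU required; the seed and array size are arbitrary choices, not from the report) checking that the affine form matches the quantile-based path elementwise:

```julia
# CPU-only check: the workaround's affine transform agrees elementwise with
# the quantile-based computation that _rand! currently performs.
using Distributions, Random

d = Uniform(-1.0, 1.0)
u = rand(MersenneTwister(42), 1000)   # uniform draws in [0, 1)
affine = d.a .+ (d.b - d.a) .* u      # workaround: scale, then shift
via_q  = quantile.(Ref(d), u)         # existing path (Ref forces scalar broadcast)
@assert maximum(abs.(affine .- via_q)) < 1e-12
```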

Any ideas of what may be going wrong?

Environment

I have tried mostly on Julia 1.10, but have also observed the problem on 1.9 (1.9.4 and, I believe, 1.9.3).
I am using Windows with WSL.
I have the following packages:

  [052768ef] CUDA v5.1.1
  [31c24e10] Distributions v0.25.104

The CUDA environment is:

CUDA runtime 12.3, artifact installation
CUDA driver 12.2
NVIDIA driver 550.9.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+550.9

Julia packages:
- CUDA: 5.1.1
- CUDA_Driver_jll: 0.7.0+0
- CUDA_Runtime_jll: 0.10.1+0

Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 2080 Ti (sm_75, 8.745 GiB / 11.000 GiB available)
@devmotion
Member

I assume it is due to broadcasting over a d::Uniform in

A .= quantile.(d, rand!(rng, A))

Broadcasting over univariate distributions is supported by

# allow broadcasting over distribution objects
# to be decided: how to handle multivariate/matrixvariate distributions?
Broadcast.broadcastable(d::UnivariateDistribution) = Ref(d)

but the Ref might cause problems with CUDA.

Can you check whether

  • changing
    Broadcast.broadcastable(d::UnivariateDistribution) = Ref(d)
    to Base.broadcastable(d::UnivariateDistribution) = (d,) fixes the issue?
  • changing
    A .= quantile.(d, rand!(rng, A))
    to A .= Base.Fix1(quantile, d).(rand!(rng, A)) fixes the issue?
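A small CPU sketch (no GPU needed; variable names are illustrative) showing that on the CPU all three spellings agree, so the proposed changes only affect what the GPU compiler sees:

```julia
# All three spellings treat d as a scalar and produce identical results on CPU.
using Distributions

d = Uniform(-1.0, 1.0)
x = [0.1, 0.5, 0.9]

y_ref  = quantile.(d, x)             # current: broadcastable(d) returns a Ref
y_tup  = quantile.((d,), x)          # tuple wrapper, as in the first suggestion
y_fix1 = Base.Fix1(quantile, d).(x)  # closure, as in the second suggestion
@assert y_ref == y_tup
@assert y_ref == y_fix1
```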

@gkantsidis
Author

Thanks a lot David.
Yes, both changes you suggested seem to work well.

@devmotion
Member

The issue reported here was fixed by #1818 (by avoiding broadcasting over distributions completely in internal code), but of course users might still run into the same problem with code such as the following (untested, since I didn't have access to a GPU):

julia> using Distributions, CUDA

julia> x = CuArray([0.1, 0.5, 0.9]);

julia> x .= quantile.(Uniform(), x)

To me this seems like a bug in CUDA, since broadcastable(x::UnivariateDistribution) = Ref(x) is the officially documented way to ensure that x is treated as a scalar in broadcasting: https://docs.julialang.org/en/v1/manual/interfaces/#man-interfaces-broadcasting

Have you encountered this issue before @maleadt? Could it be a problem that Uniform() = Uniform(0.0, 1.0) creates a Float64-parameterized distribution, so the broadcast operates on Float64? But in that case I don't understand why x .= Base.Fix1(quantile, Uniform()).(x) would work...
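A quick CPU sketch (plain Distributions, no GPU assumed) of the two facts in play here: broadcastable on a distribution returns a Ref, per the linked interface docs, and Uniform() is Float64-parameterized, so its quantile promotes even Float32 inputs to Float64:

```julia
using Distributions

d = Uniform()                               # Uniform{Float64}(0.0, 1.0)
@assert Broadcast.broadcastable(d) isa Ref  # Distributions' scalar-broadcast opt-in
@assert quantile(d, 0.5f0) isa Float64      # Float64 leaks in even for Float32 input
```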

@maleadt

maleadt commented Jan 8, 2024

I guess this is FluxML/Zygote.jl#1473, which was fixed (by reverting the change) in JuliaGPU/CUDA.jl@a1a72c7. That's part of the latest CUDA.jl release.

@devmotion
Member

@gkantsidis can you confirm that the problem with the MWE in #1815 (comment) was fixed in the latest CUDA release?

@gkantsidis
Author

Yes, Distributions v0.25.106 works well with CUDA v5.1.2.

@devmotion
Member

And Distributions 0.25.105?

@gkantsidis
Author

Indeed, with CUDA v5.1.2, Distributions v0.25.105 works correctly.
