
Scalar iteration with cat(SubArray{<:CuArray}) #2078

Open
evelyne-ringoot opened this issue Sep 11, 2023 · 2 comments
Labels
bug Something isn't working

Comments


evelyne-ringoot commented Sep 11, 2023

There appears to be a mismatch between how views of rows and columns are treated for normal matrices versus CuMatrices, which causes a scalar-indexing issue.

In CuMatrices I observe:

julia> x=CUDA.randn(4,4)
4×4 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:

julia> @views x[1:3,1]
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:

julia> @views x[1,1:3]
3-element view(::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, 1, 1:3) with eltype Float32:

julia> @views [1;x[1,1:3]]
┌ Warning: Performing scalar indexing on task Task (runnable) @0x0000022d0ded5750.
4-element Vector{Float32}:

julia> @views [1;x[1:3,1]]
4-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:

In normal matrices on the other hand:

julia> x=randn(4,4)
4×4 Matrix{Float64}:

julia> @views x[1:3,1]
3-element view(::Matrix{Float64}, 1:3, 1) with eltype Float64:

julia> @views x[1,1:3]
3-element view(::Matrix{Float64}, 1, 1:3) with eltype Float64:

julia> @views [1;x[1,1:3]]
4-element Vector{Float64}:

julia> @views [1;x[1:3,1]]
4-element Vector{Float64}:

Versioninfo:

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 12.0
NVIDIA driver 528.92.0

Libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 12.0.0+528.92

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3050 Laptop GPU (sm_86, 3.886 GiB / 4.000 GiB available)

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × AMD Ryzen 7 5800H with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
  Threads: 1 on 16 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 

Does it make sense to expand the Base.vcat and Base.hcat definitions at lines 141-144 of https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/host/base.jl from AbstractGPUArray to AnyGPUArray, @maleadt?

Separately, should @views x[1:3,1] return a view instead of a CuArray, for conformity with Base matrices?
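For anyone hitting this in the meantime, one workaround (a sketch, assuming a CUDA-capable setup) is to materialize the row into a CuArray before concatenating, e.g. by using plain indexing instead of a view for that expression:

```julia
using CUDA

x = CUDA.randn(4, 4)

# Plain (non-view) indexing copies the row into a CuArray up front, so the
# concatenation dispatches to the GPU-aware method instead of falling back
# to scalar iteration over a SubArray.
y = [1; x[1, 1:3]]   # 4-element CuArray{Float32, 1}
```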

@evelyne-ringoot evelyne-ringoot added the bug Something isn't working label Sep 11, 2023

maleadt commented Sep 12, 2023

should @views x[1:3,1] return a view instead of a CuArray, for conformity with Base matrices?

Most views are represented by CuArray objects, so we don't need the SubArray wrapper (which is responsible for the view part of the output).

So I take it that the issue you have is the scalar iteration when invoking cat with a SubArray{<:CuArray}. Extending those vcat and hcat definitions may work, but I don't feel happy extending everything to AnyGPUArray. As I mentioned before, this really needs some support in Base, as we can't make everything ::AnyGPUArray: It'll severely penalize load times, and you'll quickly run into tricky ambiguities. You can try, though.
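For reference, the widened definitions under discussion would look roughly like the following (a sketch, not the actual GPUArrays.jl code; the exact bodies at the linked lines may differ):

```julia
# Sketch only: AnyGPUArray is GPUArrays' union of AbstractGPUArray and its
# wrapped forms (SubArray, Adjoint, Transpose, ...). Widening the signature
# from AbstractGPUArray would route wrapped arrays to the GPU cat path, at
# the cost of the load-time and ambiguity concerns mentioned above.
Base.vcat(xs::AnyGPUArray...) = cat(xs...; dims=1)
Base.hcat(xs::AnyGPUArray...) = cat(xs...; dims=2)
```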

@maleadt maleadt changed the title Different behavior for views of columns vs rows of a CuMatrix Scalar iteration with cat(SubArray{<:CuArray}) Sep 12, 2023

evelyne-ringoot commented Sep 12, 2023

Maybe there is an intermediate solution where the operations are changed from AbstractGPUArray to a union with (Adjoints of) SubArrays of AbstractGPUArrays. Would that be more feasible, or would it have similar effects on load times?
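Such a narrower union might be sketched as follows (hypothetical; the name and the exact set of wrappers are illustrative only):

```julia
using LinearAlgebra: Adjoint
using GPUArrays: AbstractGPUArray

# Hypothetical narrower union: only bare GPU arrays plus their SubArray and
# Adjoint wrappers, rather than the full WrappedArray union behind AnyGPUArray.
const GPUArrayOrWrapper = Union{AbstractGPUArray,
                                SubArray{<:Any,<:Any,<:AbstractGPUArray},
                                Adjoint{<:Any,<:AbstractGPUArray}}
```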
