"core/device/intrinsics/atomics" failed when running the test suite of the package "CUDA.jl" #2539

Open
Tree-Yang opened this issue Nov 4, 2024 · 0 comments
Labels
bug Something isn't working

Comments

Tree-Yang commented Nov 4, 2024

Describe the bug

I am a newcomer to Julia and plan to do some deep learning research in Julia using Lux.jl and LuxCUDA.jl.

First, the NVIDIA driver (v560.94), CUDA driver (v12.6), and CUDA runtime (v12.1.66) were installed on the system. Then the CUDA.jl package was installed in Julia.

Before moving on to LuxCUDA.jl, I ran the package's test suite with Pkg.test("CUDA"), as described in the CUDA.jl documentation. A long error message was printed in the REPL, and it included a sentence asking me to submit this bug report.
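
For anyone trying to reproduce this without waiting for the full run, here is a minimal sketch that re-runs only the two test groups that errored. It assumes that CUDA.jl's test runner accepts test names as positional arguments forwarded through the `test_args` keyword of `Pkg.test`; the names `FAILING_TESTS` and `rerun_failing` are hypothetical helpers introduced only for this illustration.

```julia
using Pkg

# The two test groups that errored in the run reported in this issue.
const FAILING_TESTS = ["core/device/intrinsics/atomics", "libraries/cublas"]

# Hypothetical helper: re-run only the given test groups.
# Assumption: the CUDA.jl test runner treats positional arguments as
# test-name filters; `test_args` itself is a documented keyword of Pkg.test.
function rerun_failing(tests::Vector{String}=FAILING_TESTS)
    Pkg.test("CUDA"; test_args=tests)
end
```

Calling `rerun_failing()` starts the reduced run. Output from `CUDA.versioninfo()` is also worth attaching to a report like this one, since it lists the driver, runtime, and toolchain versions that CUDA.jl actually picked up.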

The platform is Windows 11 23H2 (22631.4391), with Julia 1.11.0 (2024-10-07). The CPU is an AMD Ryzen 9 7950X 16-Core Processor and the GPU is an NVIDIA GeForce RTX 4060 Ti (16 GiB). The full test output follows:

                                                  |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
core/initialization                           (2) |     4.33 |   1.32 | 30.4 |       0.00 |      N/A |   0.00 |  0.0 |      97.62 |  1073.35 |
gpuarrays/random                              (2) |    35.95 |   0.00 |  0.0 |       0.03 |      N/A |   0.31 |  0.9 |    2332.12 |  1437.32 |
gpuarrays/vectors                             (2) |     0.34 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      27.46 |  1437.32 |
gpuarrays/base                                (8) |    56.37 |   0.01 |  0.0 |       8.90 |      N/A |   1.06 |  1.9 |    5428.06 |  1494.33 |
gpuarrays/reductions/== isequal               (7) |    62.81 |   0.01 |  0.0 |       1.07 |      N/A |   1.15 |  1.8 |    5987.16 |  1557.92 |
gpuarrays/constructors                        (2) |    53.65 |   0.01 |  0.0 |       0.65 |      N/A |   0.44 |  0.8 |    3086.47 |  1686.26 |
gpuarrays/reductions/reduce                   (4) |   121.36 |   0.01 |  0.0 |       1.21 |      N/A |   2.16 |  1.8 |   11221.25 |  1772.12 |
gpuarrays/math/intrinsics                     (4) |     2.61 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     118.23 |  1806.80 |
gpuarrays/statistics                          (7) |    80.63 |   0.00 |  0.0 |       1.51 |      N/A |   1.10 |  1.4 |    5853.54 |  2546.65 |
gpuarrays/reductions/mapreducedim!            (5) |   146.05 |   0.01 |  0.0 |       1.54 |      N/A |   1.96 |  1.3 |    9308.33 |  1976.00 |
gpuarrays/uniformscaling                      (5) |     9.34 |   0.00 |  0.0 |       0.01 |      N/A |   0.06 |  0.6 |     477.51 |  2173.38 |
gpuarrays/reductions/sum prod                 (3) |   168.37 |   0.02 |  0.0 |       3.24 |      N/A |   2.36 |  1.4 |   12055.64 |  2933.79 |
gpuarrays/reductions/any all count            (3) |    11.14 |   0.00 |  0.0 |       0.00 |      N/A |   0.12 |  1.1 |     976.57 |  3139.08 |
gpuarrays/interface                           (3) |     2.55 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     177.97 |  3165.24 |
gpuarrays/reductions/mapreduce                (8) |   127.43 |   0.01 |  0.0 |       1.81 |      N/A |   2.15 |  1.7 |   11542.56 |  2237.68 |
gpuarrays/reductions/mapreducedim!_large      (7) |    45.69 |   0.00 |  0.0 |     818.34 |      N/A |   0.94 |  2.1 |    4559.68 |  2986.98 |
gpuarrays/indexing find                       (8) |    17.45 |   0.00 |  0.0 |       0.13 |      N/A |   0.22 |  1.3 |    1616.33 |  2526.84 |
gpuarrays/linalg/mul!/matrix-matrix           (4) |    95.12 |   0.01 |  0.0 |       0.12 |      N/A |   1.33 |  1.4 |    7807.45 |  2791.06 |
gpuarrays/indexing multidimensional           (3) |    48.30 |   0.00 |  0.0 |       2.07 |      N/A |   0.54 |  1.1 |    4017.65 |  3667.15 |
gpuarrays/math/power                          (8) |    34.86 |   0.00 |  0.0 |       0.01 |      N/A |   0.61 |  1.7 |    4260.19 |  2794.88 |
gpuarrays/linalg/mul!/vector-matrix           (7) |    49.25 |   0.00 |  0.0 |       0.02 |      N/A |   0.58 |  1.2 |    4306.55 |  3280.41 |
gpuarrays/broadcasting                        (6) |   242.02 |   0.01 |  0.0 |       2.00 |      N/A |   2.82 |  1.2 |   14503.63 |  2744.07 |
gpuarrays/indexing scalar                     (8) |    10.98 |   0.00 |  0.0 |       0.01 |      N/A |   0.04 |  0.4 |     738.64 |  2974.64 |
gpuarrays/linalg/norm                         (2) |   160.79 |   0.01 |  0.0 |       0.02 |      N/A |   2.73 |  1.7 |   12389.18 |  4814.56 |
      From worker 2:    WARNING: Method definition var"#3764#kernel"(Any) in module Main at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\core\execution.jl:360 overwritten at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\core\execution.jl:368.
core/execution                                (2) |    42.03 |   0.00 |  0.0 |       0.02 |      N/A |   0.33 |  0.8 |    2268.80 |  5234.02 |
gpuarrays/reductions/reducedim!               (3) |    67.76 |   0.00 |  0.0 |       1.03 |      N/A |   0.79 |  1.2 |    3682.17 |  4235.64 |
gpuarrays/reductions/minimum maximum extrema  (5) |   174.75 |   0.01 |  0.0 |       2.19 |      N/A |   3.02 |  1.7 |   13054.22 |  4498.32 |
core/cudadrv                                  (5) |     7.84 |   0.00 |  0.0 |       0.00 |      N/A |   0.05 |  0.6 |     455.91 |  4578.10 |
libraries/cusparse                            (6) |   124.91 |   0.03 |  0.0 |      12.58 |      N/A |   1.74 |  1.4 |    7966.58 |  3373.87 |
gpuarrays/linalg                              (4) |   150.82 |   0.01 |  0.0 |      26.35 |      N/A |   2.27 |  1.5 |    9424.42 |  3881.22 |
base/array                                    (3) |    76.95 |   0.10 |  0.1 |    1271.01 |      N/A |   0.87 |  1.1 |    5957.10 |  5846.94 |
      From worker 3:
      From worker 3:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
      From worker 3:    Exception: EXCEPTION_BREAKPOINT at 0x7ffedc5503ec -- _ZNK4llvm15NVPTXAsmPrinter24getPTXFundamentalTypeStrB5cxx11EPNS_4TypeEb at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    in expression starting at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\core\device\intrinsics\atomics.jl:5
      From worker 3:    _ZNK4llvm15NVPTXAsmPrinter24getPTXFundamentalTypeStrB5cxx11EPNS_4TypeEb at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm15NVPTXAsmPrinter21emitFunctionParamListEPKNS_8FunctionERNS_11raw_ostreamE at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm15NVPTXAsmPrinter22emitFunctionEntryLabelEv at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm10AsmPrinter18emitFunctionHeaderEv at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm10AsmPrinter16emitFunctionBodyEv at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm15NVPTXAsmPrinter20runOnMachineFunctionERNS_15MachineFunctionE at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    _ZL21LLVMTargetMachineEmitP23LLVMOpaqueTargetMachineP16LLVMOpaqueModuleRN4llvm17raw_pwrite_streamE19LLVMCodeGenFileTypePPc at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    LLVMTargetMachineEmitToMemoryBuffer at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    LLVMTargetMachineEmitToMemoryBuffer at C:\Users\admin\.julia\packages\LLVM\wMjUU\lib\16\libLLVM.jl:11138
      From worker 3:    emit at C:\Users\admin\.julia\packages\LLVM\wMjUU\src\targetmachine.jl:118
      From worker 3:    mcgen at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\mcgen.jl:75
      From worker 3:    mcgen at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:127
      From worker 3:    unknown function (ip: 00000293d5f00007)
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\TimerOutputs\NRdsv\src\TimerOutput.jl:253 [inlined]
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:403 [inlined]
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\TimerOutputs\NRdsv\src\TimerOutput.jl:253 [inlined]
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:400 [inlined]
      From worker 3:    #emit_asm#209 at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\utils.jl:108
      From worker 3:    emit_asm at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\utils.jl:106 [inlined]
      From worker 3:    #codegen#184 at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:120
      From worker 3:    codegen at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:82 [inlined]
      From worker 3:    #compile#183 at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:79
      From worker 3:    compile at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:74 [inlined]
      From worker 3:    #1145 at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:250 [inlined]
      From worker 3:    #JuliaContext#182 at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:34
      From worker 3:    unknown function (ip: 000002939b0a5aec)
      From worker 3:    JuliaContext at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:25
      From worker 3:    compile at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:249
      From worker 3:    actual_compilation at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\execution.jl:237
      From worker 3:    unknown function (ip: 000002939b0a0c49)
      From worker 3:    cached_compilation at C:\Users\admin\.julia\packages\GPUCompiler\2CW9L\src\execution.jl:151
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:380 [inlined]
      From worker 3:    macro expansion at .\lock.jl:273 [inlined]
      From worker 3:    #cufunction#1169 at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:375
      From worker 3:    cufunction at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:372
      From worker 3:    unknown function (ip: 00000294259948c9)
      From worker 3:    jl_apply at C:/workdir/src\julia.h:2157 [inlined]
      From worker 3:    do_call at C:/workdir/src\interpreter.c:126
      From worker 3:    eval_value at C:/workdir/src\interpreter.c:223
      From worker 3:    eval_body at C:/workdir/src\interpreter.c:562
      From worker 3:    eval_body at C:/workdir/src\interpreter.c:539
      From worker 3:    eval_body at C:/workdir/src\interpreter.c:539
      From worker 3:    eval_body at C:/workdir/src\interpreter.c:539
      From worker 3:    eval_body at C:/workdir/src\interpreter.c:539
      From worker 3:    eval_body at C:/workdir/src\interpreter.c:539
      From worker 3:    eval_body at C:/workdir/src\interpreter.c:539
      From worker 3:    jl_interpret_toplevel_thunk at C:/workdir/src\interpreter.c:821
      From worker 3:    jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:943
      From worker 3:    jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:886
      From worker 3:    ijl_toplevel_eval at C:/workdir/src\toplevel.c:952 [inlined]
      From worker 3:    ijl_toplevel_eval_in at C:/workdir/src\toplevel.c:994
      From worker 3:    eval at .\boot.jl:430 [inlined]
      From worker 3:    include_string at .\loading.jl:2628
      From worker 3:    _include at .\loading.jl:2688
      From worker 3:    include at .\sysimg.jl:38 [inlined]
      From worker 3:    #11 at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\runtests.jl:87 [inlined]
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:63 [inlined]
      From worker 3:    macro expansion at C:\workdir\usr\share\julia\stdlib\v1.11\Test\src\Test.jl:1700 [inlined]
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:63 [inlined]
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\utilities.jl:35 [inlined]
      From worker 3:    macro expansion at C:\Users\admin\.julia\packages\CUDA\2kjXI\src\memory.jl:829 [inlined]
      From worker 3:    top-level scope at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:62
      From worker 3:    jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:934
      From worker 3:    ijl_toplevel_eval at C:/workdir/src\toplevel.c:952 [inlined]
      From worker 3:    ijl_toplevel_eval_in at C:/workdir/src\toplevel.c:994
      From worker 3:    eval at .\boot.jl:430 [inlined]
      From worker 3:    runtests at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:74
      From worker 3:    jl_apply at C:/workdir/src\julia.h:2157 [inlined]
      From worker 3:    jl_f__call_latest at C:/workdir/src\builtins.c:875
      From worker 3:    jl_apply at C:/workdir/src\julia.h:2157 [inlined]
      From worker 3:    do_apply at C:/workdir/src\builtins.c:831
      From worker 3:    #invokelatest#2 at .\essentials.jl:1054
      From worker 3:    jl_apply at C:/workdir/src\julia.h:2157 [inlined]
      From worker 3:    do_apply at C:/workdir/src\builtins.c:831
      From worker 3:    invokelatest at .\essentials.jl:1051
      From worker 3:    jl_apply at C:/workdir/src\julia.h:2157 [inlined]
      From worker 3:    do_apply at C:/workdir/src\builtins.c:831
      From worker 3:    #110 at C:\workdir\usr\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:287
      From worker 3:    run_work_thunk at C:\workdir\usr\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:70
      From worker 3:    #109 at C:\workdir\usr\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:287
      From worker 3:    unknown function (ip: 00000293ad7b518b)
      From worker 3:    jl_apply at C:/workdir/src\julia.h:2157 [inlined]
      From worker 3:    start_task at C:/workdir/src\task.c:1202
      From worker 3:    Allocations: 512212331 (Pool: 512175810; Big: 36521); GC: 187
core/device/intrinsics/atomics                (3) |         failed at 2024-11-04T16:29:46.881
Worker 3 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
 [1] (::Base.var"#wait_locked#832")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
   @ Base .\stream.jl:970
 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
   @ Base .\stream.jl:978
 [3] unsafe_read
   @ .\io.jl:891 [inlined]
 [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
   @ Base .\io.jl:890
 [5] read!
   @ .\io.jl:895 [inlined]
 [6] deserialize_hdr_raw
   @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\messages.jl:167 [inlined]
 [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:172
 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:133
 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
   @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:121
libraries/cusolver/dense                      (8) |   159.34 |   0.08 |  0.1 |     262.80 |      N/A |   3.01 |  1.9 |   12823.25 |  4214.03 |
libraries/cusparse/generic                    (5) |    72.26 |   0.05 |  0.1 |       5.69 |      N/A |   0.85 |  1.2 |    5195.97 |  4935.46 |
libraries/cublas                              (7) |         failed at 2024-11-04T16:30:14.729
libraries/cusparse/conversions                (8) |    18.16 |   0.01 |  0.0 |       1.69 |      N/A |   0.27 |  1.5 |    1812.94 |  4320.41 |
core/device/intrinsics                       (10) |    34.20 |   0.00 |  0.0 |       0.00 |      N/A |   0.24 |  0.7 |    1800.91 |  1322.61 |
core/device/intrinsics/cooperative_groups     (5) |    51.16 |   0.00 |  0.0 |      19.36 |      N/A |   0.29 |  0.6 |    1974.93 |  6827.81 |
base/sorting                                  (6) |    95.61 |   0.01 |  0.0 |     668.44 |      N/A |   3.63 |  3.8 |   13333.46 |  6428.61 |
base/texture                                  (8) |    35.47 |   0.00 |  0.0 |       0.09 |      N/A |   0.47 |  1.3 |    2854.34 |  4607.99 |
core/device/intrinsics/wmma                   (4) |    94.02 |   0.01 |  0.0 |       0.63 |      N/A |   0.73 |  0.8 |    4484.64 |  5136.11 |
libraries/cusparse/interfaces                 (2) |   169.39 |   0.14 |  0.1 |      41.73 |      N/A |   2.12 |  1.3 |   10002.96 |  5989.71 |
core/device/array                             (8) |     4.28 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     275.82 |  4645.19 |
core/codegen                                  (2) |     4.58 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     151.96 |  6127.87 |
core/device/intrinsics/memory                 (4) |     8.64 |   0.00 |  0.0 |       0.02 |      N/A |   0.00 |  0.0 |     428.73 |  5360.70 |
libraries/cusolver/dense_generic              (8) |    14.03 |   0.00 |  0.0 |       0.24 |      N/A |   0.09 |  0.6 |     872.69 |  4945.31 |
core/device/intrinsics/output                 (4) |    12.67 |   0.00 |  0.0 |       0.00 |      N/A |   0.06 |  0.5 |     771.77 |  5610.56 |
libraries/cusolver/sparse                    (10) |    25.98 |   0.00 |  0.0 |       0.22 |      N/A |   0.52 |  2.0 |    2275.71 |  1562.21 |
base/random                                   (6) |    29.95 |   0.00 |  0.0 |     256.59 |      N/A |   0.20 |  0.7 |    1752.58 |  6428.61 |
core/pointer                                  (6) |     0.30 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       9.08 |  6428.61 |
libraries/cusparse/bmm                        (5) |    33.06 |   0.01 |  0.0 |       0.90 |      N/A |   0.75 |  2.3 |    4377.31 |  7180.67 |
core/device/ldg                              (10) |     7.83 |   0.00 |  0.0 |       0.00 |      N/A |   0.06 |  0.8 |     550.05 |  1664.46 |
core/nvml                                     (5) |     0.86 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      55.99 |  7180.67 |
core/device/random                            (8) |    19.42 |   0.00 |  0.0 |       0.17 |      N/A |   0.06 |  0.3 |     875.75 |  5237.04 |
libraries/cusolver/multigpu                   (4) |    19.63 |   0.00 |  0.0 |     545.60 |      N/A |   0.17 |  0.9 |    1403.80 |  5680.43 |
base/broadcast                                (6) |    13.71 |   0.06 |  0.4 |       0.00 |      N/A |   0.06 |  0.5 |     913.70 |  6428.61 |
base/iterator                                 (6) |     2.70 |   0.00 |  0.0 |       1.93 |      N/A |   0.00 |  0.0 |     218.71 |  6428.61 |
core/device/intrinsics/math                   (2) |    39.40 |   0.00 |  0.0 |       0.00 |      N/A |   0.33 |  0.8 |    2177.75 |  7337.01 |
core/utils                                    (2) |     0.89 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      59.30 |  7337.01 |
base/threading                                (6) |     2.00 |   0.00 |  0.1 |      10.94 |      N/A |   0.00 |  0.0 |     148.84 |  6428.61 |
libraries/cusparse/device                     (2) |     0.13 |   0.00 |  0.1 |       0.01 |      N/A |   0.00 |  0.0 |       4.52 |  7337.01 |
libraries/staticarrays                        (6) |     1.17 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     193.19 |  6428.61 |
libraries/cusolver/sparse_factorizations      (8) |    15.84 |   0.00 |  0.0 |       3.73 |      N/A |   0.24 |  1.5 |    1931.48 |  5406.59 |
core/pool                                     (6) |     3.80 |   0.00 |  0.0 |       0.00 |      N/A |   0.88 | 23.2 |     371.56 |  6428.61 |
core/apiutils                                 (6) |     0.15 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       1.12 |  6428.61 |
libraries/cufft                               (9) |   128.90 |   0.01 |  0.0 |     197.64 |      N/A |   1.73 |  1.3 |    7504.18 |  1827.29 |
libraries/cusparse/reduce                     (9) |     0.01 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       0.29 |  1841.33 |
base/examples                                 (6) |     7.07 |   5.70 | 80.6 |       0.00 |      N/A |   0.07 |  1.0 |    1333.85 |  6428.61 |
libraries/curand                              (6) |     0.06 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       3.54 |  6428.61 |
libraries/cusparse/broadcast                  (2) |    29.55 |   0.00 |  0.0 |       0.05 |      N/A |   0.31 |  1.0 |    2271.44 |  7624.40 |
libraries/cusparse/linalg                     (5) |    50.31 |   0.01 |  0.0 |     682.93 |      N/A |   2.67 |  5.3 |   14034.14 | 12997.38 |
base/linalg                                   (8) |    32.31 |   0.00 |  0.0 |    1547.52 |      N/A |   3.34 | 10.3 |    6091.42 |  7156.66 |
base/kernelabstractions                       (9) |    50.58 |   0.00 |  0.0 |      71.04 |      N/A |   0.96 |  1.9 |    4178.91 |  2461.08 |
base/exceptions                              (10) |   191.91 |   0.28 |  0.1 |       0.00 |      N/A |   0.00 |  0.0 |      12.02 |  1665.53 |
core/profile                                  (4) |   286.90 |   0.00 |  0.0 |       0.00 |      N/A |   1.58 |  0.5 |    8662.59 |  5822.62 |
Testing finished in 13 minutes, 21 seconds, 949 milliseconds
core/device/intrinsics/atomics: Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(3)
Worker 7 failed running test libraries/cublas:
Some tests did not pass: 3228 passed, 0 failed, 1 errored, 0 broken.
libraries/cublas: Error During Test at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:1794
  Got exception outside of a @test
  LLVM error: Cannot select: 0x102f5c58760: v8bf16 = X86ISD::VFPROUND 0x102f5c4e5f0, C:\Users\admin\.julia\packages\BFloat16s\u3WQc\src\bfloat16.jl:158 @[ broadcast.jl:673 @[ broadcast.jl:646 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ]
    0x102f5c4e5f0: v8f64,ch = load<(load (s512) from %ir.uglygep212, align 8, !tbaa !194, !alias.scope !196, !noalias !81)> 0x102cbe96150, 0x102f5c523a0, undef:i64, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
      0x102f5c523a0: i64 = add 0x102f5c58610, Constant:i64<-192>, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
        0x102f5c58610: i64 = add 0x102f5c4da90, 0x102f5c58680, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
          0x102f5c4da90: i64,ch = CopyFromReg 0x102cbe96150, Register:i64 %53, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
            0x102f5c51d80: i64 = Register %53
          0x102f5c58680: i64 = shl 0x102f5c58530, Constant:i8<3>, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
            0x102f5c58530: i64,ch = CopyFromReg 0x102cbe96150, Register:i64 %54, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
              0x102f5c4e580: i64 = Register %54
            0x102f5c52480: i8 = Constant<3>
        0x102f5c58ed0: i64 = Constant<-192>
      0x102f5c59100: i64 = undef
  In function: julia_materialize_244170
  Stacktrace:
    [1] handle_error(reason::Cstring)
      @ LLVM C:\Users\admin\.julia\packages\LLVM\wMjUU\src\core\context.jl:194
    [2] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:1806 [inlined]
    [3] macro expansion
      @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Test\src\Test.jl:1700 [inlined]
    [4] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:1795 [inlined]
    [5] macro expansion
      @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Test\src\Test.jl:1700 [inlined]
    [6] top-level scope
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:669
    [7] include
      @ .\sysimg.jl:38 [inlined]
    [8] #11
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\runtests.jl:87 [inlined]
    [9] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:63 [inlined]
   [10] macro expansion
      @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Test\src\Test.jl:1700 [inlined]
   [11] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:63 [inlined]
   [12] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\src\utilities.jl:35 [inlined]
   [13] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\src\memory.jl:829 [inlined]
   [14] top-level scope
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:62
   [15] eval
      @ .\boot.jl:430 [inlined]
   [16] runtests(f::Function, name::String, time_source::Symbol)
      @ Main C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:74
   [17] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
      @ Base .\essentials.jl:1054
   [18] invokelatest(::Any, ::Any, ::Vararg{Any})
      @ Base .\essentials.jl:1051
   [19] (::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}})()
      @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:287
   [20] run_work_thunk(thunk::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}}, print_error::Bool)
      @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:70
   [21] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
      @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:287

Test Summary:                                  |  Pass  Error  Broken  Total  Time
  Overall                                      | 25094      2      12  25108
    core/initialization                        |    34                    34
    gpuarrays/random                           |    64                    64
    gpuarrays/vectors                          |    10                    10
    gpuarrays/base                             |    96                    96
    gpuarrays/reductions/== isequal            |   312                   312
    gpuarrays/constructors                     |   966                   966
    gpuarrays/reductions/reduce                |   264                   264
    gpuarrays/math/intrinsics                  |    12                    12
    gpuarrays/statistics                       |    84                    84
    gpuarrays/reductions/mapreducedim!         |   312                   312
    gpuarrays/uniformscaling                   |    56                    56
    gpuarrays/reductions/sum prod              |   862                   862
    gpuarrays/reductions/any all count         |   101                   101
    gpuarrays/interface                        |     7                     7
    gpuarrays/reductions/mapreduce             |   396                   396
    gpuarrays/reductions/mapreducedim!_large   |    50                    50
    gpuarrays/indexing find                    |    45                    45
    gpuarrays/linalg/mul!/matrix-matrix        |   432                   432
    gpuarrays/indexing multidimensional        |   101                   101
    gpuarrays/math/power                       |    72                    72
    gpuarrays/linalg/mul!/vector-matrix        |   168                   168
    gpuarrays/broadcasting                     |   364                   364
    gpuarrays/indexing scalar                  |   477                   477
    gpuarrays/linalg/norm                      |   696                   696
    core/execution                             |    86                    86
    gpuarrays/reductions/reducedim!            |   192                   192
    gpuarrays/reductions/minimum maximum extrema |   666                   666
    core/cudadrv                               |   157              3    160
    libraries/cusparse                         |   871                   871
    gpuarrays/linalg                           |   443                   443
    base/array                                 |   399                   399
    core/device/intrinsics/atomics             |            1              1
    libraries/cusolver/dense                   |  3948                  3948
    libraries/cusparse/generic                 |  1300                  1300
    libraries/cublas                           |  3228      1           3229
    libraries/cusparse/conversions             |   136                   136
    core/device/intrinsics                     |    38                    38
    core/device/intrinsics/cooperative_groups  |   515                   515
    base/sorting                               |   276                   276
    base/texture                               |    38              4     42
    core/device/intrinsics/wmma                |   446                   446
    libraries/cusparse/interfaces              |  2136                  2136
    core/device/array                          |    20                    20
    core/codegen                               |    17                    17
    core/device/intrinsics/memory              |    16                    16
    libraries/cusolver/dense_generic           |   108                   108
    core/device/intrinsics/output              |    41                    41
    libraries/cusolver/sparse                  |   112                   112
    base/random                                |   236                   236
    core/pointer                               |    35                    35
    libraries/cusparse/bmm                     |    40                    40
    core/device/ldg                            |    41                    41
    core/nvml                                  |    27              1     28
    core/device/random                         |   156                   156
    libraries/cusolver/multigpu                |    30                    30
    base/broadcast                             |    32                    32
    base/iterator                              |    45                    45
    core/device/intrinsics/math                |   112                   112
    core/utils                                 |    52                    52
    base/threading                             |                           0
    libraries/cusparse/device                  |    10                    10
    libraries/staticarrays                     |     1                     1
    libraries/cusolver/sparse_factorizations   |    36                    36
    core/pool                                  |    10                    10
    core/apiutils                              |     6                     6
    libraries/cufft                            |   368                   368
    libraries/cusparse/reduce                  |                           0
    base/examples                              |     5                     5
    libraries/curand                           |     1                     1
    libraries/cusparse/broadcast               |    65                    65
    libraries/cusparse/linalg                  |    94                    94
    base/linalg                                |    39                    39
    base/kernelabstractions                    |  2441              4   2445
    base/exceptions                            |    21                    21
    core/profile                               |    21                    21
    FAILURE

Error in testset core/device/intrinsics/atomics:
Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(3)
Error in testset libraries/cublas:
Error During Test at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:1794
  Got exception outside of a @test
  LLVM error: Cannot select: 0x102f5c58760: v8bf16 = X86ISD::VFPROUND 0x102f5c4e5f0, C:\Users\admin\.julia\packages\BFloat16s\u3WQc\src\bfloat16.jl:158 @[ broadcast.jl:673 @[ broadcast.jl:646 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ]
    0x102f5c4e5f0: v8f64,ch = load<(load (s512) from %ir.uglygep212, align 8, !tbaa !194, !alias.scope !196, !noalias !81)> 0x102cbe96150, 0x102f5c523a0, undef:i64, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
      0x102f5c523a0: i64 = add 0x102f5c58610, Constant:i64<-192>, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
        0x102f5c58610: i64 = add 0x102f5c4da90, 0x102f5c58680, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
          0x102f5c4da90: i64,ch = CopyFromReg 0x102cbe96150, Register:i64 %53, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
            0x102f5c51d80: i64 = Register %53
          0x102f5c58680: i64 = shl 0x102f5c58530, Constant:i8<3>, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
            0x102f5c58530: i64,ch = CopyFromReg 0x102cbe96150, Register:i64 %54, essentials.jl:916 @[ array.jl:919 @[ multidimensional.jl:702 @[ broadcast.jl:639 @[ broadcast.jl:670 @[ broadcast.jl:645 @[ broadcast.jl:605 @[ broadcast.jl:968 @[ simdloop.jl:77 @[ broadcast.jl:967 @[ broadcast.jl:920 @[ broadcast.jl:892 @[ broadcast.jl:867 ] ] ] ] ] ] ] ] ] ] ] ]
              0x102f5c4e580: i64 = Register %54
            0x102f5c52480: i8 = Constant<3>
        0x102f5c58ed0: i64 = Constant<-192>
      0x102f5c59100: i64 = undef
  In function: julia_materialize_244170
  Stacktrace:
    [1] handle_error(reason::Cstring)
      @ LLVM C:\Users\admin\.julia\packages\LLVM\wMjUU\src\core\context.jl:194
    [2] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:1806 [inlined]
    [3] macro expansion
      @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Test\src\Test.jl:1700 [inlined]
    [4] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:1795 [inlined]
    [5] macro expansion
      @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Test\src\Test.jl:1700 [inlined]
    [6] top-level scope
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\libraries\cublas.jl:669
    [7] include
      @ .\sysimg.jl:38 [inlined]
    [8] #11
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\runtests.jl:87 [inlined]
    [9] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:63 [inlined]
   [10] macro expansion
      @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Test\src\Test.jl:1700 [inlined]
   [11] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:63 [inlined]
   [12] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\src\utilities.jl:35 [inlined]
   [13] macro expansion
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\src\memory.jl:829 [inlined]
   [14] top-level scope
      @ C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:62
   [15] eval
      @ .\boot.jl:430 [inlined]
   [16] runtests(f::Function, name::String, time_source::Symbol)
      @ Main C:\Users\admin\.julia\packages\CUDA\2kjXI\test\setup.jl:74
   [17] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
      @ Base .\essentials.jl:1054
   [18] invokelatest(::Any, ::Any, ::Vararg{Any})
      @ Base .\essentials.jl:1051
   [19] (::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}})()
      @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:287
   [20] run_work_thunk(thunk::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}}, print_error::Bool)
      @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:70
   [21] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
      @ Distributed C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Distributed\src\process_messages.jl:287
ERROR: LoadError: Test run finished with errors
in expression starting at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\runtests.jl:501
ERROR: Package CUDA errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\Types.jl:68
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\Operations.jl:2102
 [3] test
   @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\Operations.jl:1987 [inlined]
 [4] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::@Kwargs{io::IOContext{IO}})
   @ Pkg.API C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\API.jl:475
 [5] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{IO}, kwargs::@Kwargs{})
   @ Pkg.API C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\API.jl:159
 [6] test(pkgs::Vector{Pkg.Types.PackageSpec})
   @ Pkg.API C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\API.jl:148
 [7] test
   @ C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\API.jl:147 [inlined]
 [8] test(pkg::String)
   @ Pkg.API C:\Users\admin\AppData\Local\julias\julia-1.11\share\julia\stdlib\v1.11\Pkg\src\API.jl:146
 [9] top-level scope
   @ REPL[5]:1

To reproduce

The Minimal Working Example (MWE) for this bug:

using Pkg
using CUDA
Pkg.test("CUDA")
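
A narrower reproduction could rerun only the two test sets that errored instead of the whole suite. The sketch below assumes that the CUDA.jl test runner treats positional arguments as test-name filters; this is not confirmed anywhere in this report. `rerun_failing` and its `dry_run` flag are hypothetical names introduced only for illustration.

```julia
using Pkg

# Build the argument list passed to CUDA.jl's test/runtests.jl.
# Assumption: the runner interprets positional arguments as test-name filters.
failing_test_args(names) = Cmd(String.(names))

# Hypothetical helper: rerun only the selected test sets.
# With `dry_run = true` it returns the arguments and starts nothing.
function rerun_failing(; names = ["core/device/intrinsics/atomics", "libraries/cublas"],
                         dry_run::Bool = true)
    args = failing_test_args(names)
    dry_run || Pkg.test("CUDA"; test_args = args)
    return args
end

rerun_failing(dry_run = true)       # only inspect the arguments
# rerun_failing(dry_run = false)    # actually rerun the two test sets (needs the GPU)
```

If the filter assumption holds, running the two names one at a time would also show whether the worker crash in `core/device/intrinsics/atomics` and the LLVM error in `libraries/cublas` occur independently of each other.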

Manifest.toml

...
[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics", "demumble_jll"]
git-tree-sha1 = "e0725a467822697171af4dae15cec10b4fc19053"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.5.2"
weakdeps = ["ChainRulesCore", "EnzymeCore", "SpecialFunctions"]

    [deps.CUDA.extensions]
    ChainRulesCoreExt = "ChainRulesCore"
    EnzymeCoreExt = "EnzymeCore"
    SpecialFunctionsExt = "SpecialFunctions"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "ccd1e54610c222fadfd4737dac66bff786f63656"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.10.3+0"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "33576c7c1b2500f8e7e6baa082e04563203b3a45"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.3.5"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "e43727b237b2879a34391eeb81887699a26f8f2f"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.15.3+0"

[[deps.CUDNN_jll]]
deps = ["Artifacts", "CUDA_Runtime_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "9851af16a2f357a793daa0f13634c82bc7e40419"
uuid = "62b44479-cb7b-5706-934f-f13b2eb2e645"
version = "9.4.0+0"
...
[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "62ee71528cca49be797076a76bdc654a170a523e"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "10.3.1"
...
[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "PrecompileTools", "Preferences", "Scratch", "Serialization", "TOML", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "1d6f290a5eb1201cd63574fbc4440c788d5cb38f"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.27.8"
...
[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Unicode"]
git-tree-sha1 = "d422dfd9707bec6617335dc2ea3c5172a87d5908"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "9.1.3"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"
...
[[deps.cuDNN]]
deps = ["CEnum", "CUDA", "CUDA_Runtime_Discovery", "CUDNN_jll"]
git-tree-sha1 = "4b3ac62501ca73263eaa0d034c772f13c647fba6"
uuid = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"
version = "1.4.0"
...

Expected behavior

The test suite is expected to pass without errors.

Version info

Details on Julia:

versioninfo()

Julia Version 1.11.0
Commit 501a4f25c2 (2024-10-07 11:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
Environment:
  JULIA_PKG_SERVER = https://mirrors.cernet.edu.cn/julia/

Details on CUDA:

CUDA.versioninfo()

CUDA runtime 12.1, artifact installation
CUDA driver 12.6
NVIDIA driver 560.94.0

CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 2023.1.1 (API 18.0.0)
- NVML: 12.0.0+560.94

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.11.0
- LLVM: 16.0.6

Preferences:
- CUDA_Runtime_jll.version: 12.1

1 device:
  0: NVIDIA GeForce RTX 4060 Ti (sm_89, 14.959 GiB / 15.996 GiB available)

Additional context

This bug report was submitted as instructed by the error message shown in the Julia REPL:

...
 From worker 3:
      From worker 3:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
      From worker 3:    Exception: EXCEPTION_BREAKPOINT at 0x7ffedc5503ec -- _ZNK4llvm15NVPTXAsmPrinter24getPTXFundamentalTypeStrB5cxx11EPNS_4TypeEb at C:\Users\admin\AppData\Local\julias\julia-1.11\bin\libLLVM-16jl.dll (unknown line)
      From worker 3:    in expression starting at C:\Users\admin\.julia\packages\CUDA\2kjXI\test\core\device\intrinsics\atomics.jl:5
...
@Tree-Yang added the bug label on Nov 4, 2024