Tests failing on Jetson Nano Orin #2580

Open
denglerchr opened this issue Dec 9, 2024 · 3 comments
Labels
bug Something isn't working


@denglerchr

Good morning,

This is an issue I mentioned in #2542 (comment), but it is probably unrelated. If it turns out to be related after all, let me know and I will close this one.

To reproduce, use a Jetson Nano Orin (possibly other Orin devices as well; I could not try them), then:

]add CUDA
]test CUDA

All tests fail, and they take a very long time to fail.
As mentioned in the cited issue, I can provide temporary access to the hardware if required, either via ZeroTier or by delivery around Luxembourg.

Image

@denglerchr denglerchr added the bug Something isn't working label Dec 9, 2024
@maleadt
Member

maleadt commented Dec 9, 2024

Please run the tests with --quickfail so that we can actually see the issue.

@denglerchr
Author

I'll run it on Wednesday and get back to you, thanks!

@denglerchr
Author

Attached is the result of running it with Pkg.test("CUDA"; test_args=["--quickfail"]). It looks like the same issue as #2542 (comment)?

     Testing Running tests...
err = CUDA.NVML.NVMLError(CUDA.NVML.NVML_ERROR_NOT_FOUND)
┌ Error: Exception while generating log record in module Main at /home/admin_julia/.julia/packages/CUDA/2kjXI/test/runtests.jl:137
│   exception =
│    NVMLError: Not Found (code 6)
│    Stacktrace:
│      [1] throw_api_error(res::CUDA.NVML.nvmlReturn_enum)
│        @ CUDA.NVML ~/.julia/packages/CUDA/2kjXI/lib/nvml/libnvml.jl:8
│      [2] check
│        @ ~/.julia/packages/CUDA/2kjXI/lib/nvml/libnvml.jl:29 [inlined]
│      [3] nvmlDeviceGetHandleByUUID
│        @ ~/.julia/packages/CUDA/2kjXI/lib/utils/call.jl:34 [inlined]
│      [4] CUDA.NVML.Device(uuid::Base.UUID; mig::Bool)
│        @ CUDA.NVML ~/.julia/packages/CUDA/2kjXI/lib/nvml/device.jl:13
│      [5] Device
│        @ ~/.julia/packages/CUDA/2kjXI/lib/nvml/device.jl:10 [inlined]
│      [6] (::CUDA.var"#query_nvml#1207"{CuDevice})()
│        @ CUDA ~/.julia/packages/CUDA/2kjXI/src/utilities.jl:118
│      [7] versioninfo(io::IOBuffer)
│        @ CUDA ~/.julia/packages/CUDA/2kjXI/src/utilities.jl:140
│      [8] (::var"#27#28")(io::IOBuffer)
│        @ Main ~/.julia/packages/CUDA/2kjXI/test/runtests.jl:137
│      [9] sprint(::Function; context::Nothing, sizehint::Int64)
│        @ Base ./strings/io.jl:114
│     [10] sprint(::Function)
│        @ Base ./strings/io.jl:107
│     [11] macro expansion
│        @ logging/logging.jl:383 [inlined]
│     [12] top-level scope
│        @ ~/.julia/packages/CUDA/2kjXI/test/runtests.jl:393
│     [13] include(fname::String)
│        @ Main ./sysimg.jl:38
│     [14] top-level scope
│        @ none:6
│     [15] eval
│        @ ./boot.jl:430 [inlined]
│     [16] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:296
│     [17] _start()
│        @ Base ./client.jl:531
└ @ Main ~/.julia/packages/CUDA/2kjXI/test/runtests.jl:137
[ Info: Testing using device 0 (Orin). To change this, specify the `--gpu` argument to the tests, or set the `CUDA_VISIBLE_DEVICES` environment variable.
[ Info: Running 1 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
                                                  |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
core/initialization                           (2) |         failed at 2024-12-11T13:45:57.164
TaskFailedException

    nested task error: InterruptException:
    Stacktrace:
     [1] try_yieldto(undo::typeof(identity))
       @ Base ./task.jl:958
     [2] throwto
       @ ./task.jl:970 [inlined]
     [3] (::var"#52#59"{Dict{String, DateTime}, Task, var"#recycle_worker#57"})()
       @ Main ~/.julia/packages/CUDA/2kjXI/test/runtests.jl:390
Testing finished in 17 seconds, 305 milliseconds
Worker 2 failed running test core/initialization:
Some tests did not pass: 16 passed, 0 failed, 1 errored, 0 broken.
core/initialization: Error During Test at /home/admin_julia/.julia/packages/CUDA/2kjXI/test/setup.jl:66
  Got exception outside of a @test
  LoadError: NVMLError: Not Found (code 6)
  Stacktrace:
    [1] throw_api_error(res::CUDA.NVML.nvmlReturn_enum)
      @ CUDA.NVML ~/.julia/packages/CUDA/2kjXI/lib/nvml/libnvml.jl:8
    [2] check
      @ ~/.julia/packages/CUDA/2kjXI/lib/nvml/libnvml.jl:29 [inlined]
    [3] nvmlDeviceGetHandleByUUID
      @ ~/.julia/packages/CUDA/2kjXI/lib/utils/call.jl:34 [inlined]
    [4] CUDA.NVML.Device(uuid::Base.UUID; mig::Bool)
      @ CUDA.NVML ~/.julia/packages/CUDA/2kjXI/lib/nvml/device.jl:13
    [5] top-level scope
      @ ~/.julia/packages/CUDA/2kjXI/test/core/initialization.jl:57
    [6] include
      @ ./sysimg.jl:38 [inlined]
    [7] #11
      @ ~/.julia/packages/CUDA/2kjXI/test/runtests.jl:87 [inlined]
    [8] macro expansion
      @ ~/.julia/packages/CUDA/2kjXI/test/setup.jl:67 [inlined]
    [9] macro expansion
      @ ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
   [10] macro expansion
      @ ~/.julia/packages/CUDA/2kjXI/test/setup.jl:67 [inlined]
   [11] macro expansion
      @ ./timing.jl:581 [inlined]
   [12] top-level scope
      @ ~/.julia/packages/CUDA/2kjXI/test/setup.jl:66
   [13] eval
      @ ./boot.jl:430 [inlined]
   [14] runtests(f::Function, name::String, time_source::Symbol)
      @ Main ~/.julia/packages/CUDA/2kjXI/test/setup.jl:74
   [15] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
      @ Base ./essentials.jl:1055
   [16] invokelatest(::Any, ::Any, ::Vararg{Any})
      @ Base ./essentials.jl:1052
   [17] (::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}})()
      @ Distributed ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:287
   [18] run_work_thunk(thunk::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}}, print_error::Bool)
      @ Distributed ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:70
   [19] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
      @ Distributed ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:287
  in expression starting at /home/admin_julia/.julia/packages/CUDA/2kjXI/test/core/initialization.jl:36
gpuarrays/reductions/sum prod: gpuarrays/reductions/reduce: gpuarrays/reductions/mapreducedim!: gpuarrays/broadcasting: gpuarrays/reductions/== isequal: gpuarrays/base: gpuarrays/random: gpuarrays/vectors: gpuarrays/constructors: gpuarrays/reductions/mapreduce: gpuarrays/statistics: gpuarrays/linalg/norm: gpuarrays/math/intrinsics: gpuarrays/linalg/mul!/matrix-matrix: gpuarrays/reductions/mapreducedim!_large: gpuarrays/uniformscaling: gpuarrays/reductions/minimum maximum extrema: gpuarrays/reductions/any all count: gpuarrays/interface: gpuarrays/indexing multidimensional: gpuarrays/indexing find: gpuarrays/linalg/mul!/vector-matrix: gpuarrays/math/power: gpuarrays/linalg: gpuarrays/reductions/reducedim!: gpuarrays/indexing scalar: libraries/cublas: libraries/cusparse: libraries/cusolver/dense: core/execution: libraries/cusparse/interfaces: base/array: core/cudadrv: libraries/cusparse/generic: base/sorting: core/device/intrinsics/wmma: core/device/intrinsics/atomics: libraries/cufft: libraries/cusparse/conversions: core/device/intrinsics/cooperative_groups: core/device/intrinsics: base/texture: libraries/cusolver/sparse: libraries/cusparse/bmm: base/random: core/device/array: core/device/intrinsics/memory: core/codegen: libraries/cusolver/dense_generic: core/device/intrinsics/math: core/device/intrinsics/output: core/device/random: libraries/cusolver/multigpu: core/device/ldg: core/pointer: base/broadcast: core/nvml: base/exceptions: libraries/cusparse/linalg: libraries/cusolver/sparse_factorizations: core/profile: base/iterator: base/threading: core/utils: libraries/cusparse/device: libraries/staticarrays: libraries/cusparse/broadcast: core/pool: base/linalg: core/apiutils: base/examples: libraries/cusparse/reduce: base/kernelabstractions: libraries/curand: 
Test Summary:                                  | Pass  Error  Total  Time
  Overall                                      |   16     75     91      
    core/initialization                        |   16      1     17      
    gpuarrays/reductions/sum prod              |           1      1      
    gpuarrays/reductions/reduce                |           1      1      
    gpuarrays/reductions/mapreducedim!         |           1      1      
    gpuarrays/broadcasting                     |           1      1      
    gpuarrays/reductions/== isequal            |           1      1      
    gpuarrays/base                             |           1      1      
    gpuarrays/random                           |           1      1      
    gpuarrays/vectors                          |           1      1      
    gpuarrays/constructors                     |           1      1      
    gpuarrays/reductions/mapreduce             |           1      1      
    gpuarrays/statistics                       |           1      1      
    gpuarrays/linalg/norm                      |           1      1      
    gpuarrays/math/intrinsics                  |           1      1      
    gpuarrays/linalg/mul!/matrix-matrix        |           1      1      
    gpuarrays/reductions/mapreducedim!_large   |           1      1      
    gpuarrays/uniformscaling                   |           1      1      
    gpuarrays/reductions/minimum maximum extrema |           1      1      
    gpuarrays/reductions/any all count         |           1      1      
    gpuarrays/interface                        |           1      1      
    gpuarrays/indexing multidimensional        |           1      1      
    gpuarrays/indexing find                    |           1      1      
    gpuarrays/linalg/mul!/vector-matrix        |           1      1      
    gpuarrays/math/power                       |           1      1      
    gpuarrays/linalg                           |           1      1      
    gpuarrays/reductions/reducedim!            |           1      1      
    gpuarrays/indexing scalar                  |           1      1      
    libraries/cublas                           |           1      1      
    libraries/cusparse                         |           1      1      
    libraries/cusolver/dense                   |           1      1      
    core/execution                             |           1      1      
    libraries/cusparse/interfaces              |           1      1      
    base/array                                 |           1      1      
    core/cudadrv                               |           1      1      
    libraries/cusparse/generic                 |           1      1      
    base/sorting                               |           1      1      
    core/device/intrinsics/wmma                |           1      1      
    core/device/intrinsics/atomics             |           1      1      
    libraries/cufft                            |           1      1      
    libraries/cusparse/conversions             |           1      1      
    core/device/intrinsics/cooperative_groups  |           1      1      
    core/device/intrinsics                     |           1      1      
    base/texture                               |           1      1      
    libraries/cusolver/sparse                  |           1      1      
    libraries/cusparse/bmm                     |           1      1      
    base/random                                |           1      1      
    core/device/array                          |           1      1      
    core/device/intrinsics/memory              |           1      1      
    core/codegen                               |           1      1      
    libraries/cusolver/dense_generic           |           1      1      
    core/device/intrinsics/math                |           1      1      
    core/device/intrinsics/output              |           1      1      
    core/device/random                         |           1      1      
    libraries/cusolver/multigpu                |           1      1      
    core/device/ldg                            |           1      1      
    core/pointer                               |           1      1      
    base/broadcast                             |           1      1      
    core/nvml                                  |           1      1      
    base/exceptions                            |           1      1      
    libraries/cusparse/linalg                  |           1      1      
    libraries/cusolver/sparse_factorizations   |           1      1      
    core/profile                               |           1      1      
    base/iterator                              |           1      1      
    base/threading                             |           1      1      
    core/utils                                 |           1      1      
    libraries/cusparse/device                  |           1      1      
    libraries/staticarrays                     |           1      1      
    libraries/cusparse/broadcast               |           1      1      
    core/pool                                  |           1      1      
    base/linalg                                |           1      1      
    core/apiutils                              |           1      1      
    base/examples                              |           1      1      
    libraries/cusparse/reduce                  |           1      1      
    base/kernelabstractions                    |           1      1      
    libraries/curand                           |           1      1      
    FAILURE

Error in testset core/initialization:
Error During Test at /home/admin_julia/.julia/packages/CUDA/2kjXI/test/setup.jl:66
  Got exception outside of a @test
  LoadError: NVMLError: Not Found (code 6)
  Stacktrace:
    [1] throw_api_error(res::CUDA.NVML.nvmlReturn_enum)
      @ CUDA.NVML ~/.julia/packages/CUDA/2kjXI/lib/nvml/libnvml.jl:8
    [2] check
      @ ~/.julia/packages/CUDA/2kjXI/lib/nvml/libnvml.jl:29 [inlined]
    [3] nvmlDeviceGetHandleByUUID
      @ ~/.julia/packages/CUDA/2kjXI/lib/utils/call.jl:34 [inlined]
    [4] CUDA.NVML.Device(uuid::Base.UUID; mig::Bool)
      @ CUDA.NVML ~/.julia/packages/CUDA/2kjXI/lib/nvml/device.jl:13
    [5] top-level scope
      @ ~/.julia/packages/CUDA/2kjXI/test/core/initialization.jl:57
    [6] include
      @ ./sysimg.jl:38 [inlined]
    [7] #11
      @ ~/.julia/packages/CUDA/2kjXI/test/runtests.jl:87 [inlined]
    [8] macro expansion
      @ ~/.julia/packages/CUDA/2kjXI/test/setup.jl:67 [inlined]
    [9] macro expansion
      @ ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
   [10] macro expansion
      @ ~/.julia/packages/CUDA/2kjXI/test/setup.jl:67 [inlined]
   [11] macro expansion
      @ ./timing.jl:581 [inlined]
   [12] top-level scope
      @ ~/.julia/packages/CUDA/2kjXI/test/setup.jl:66
   [13] eval
      @ ./boot.jl:430 [inlined]
   [14] runtests(f::Function, name::String, time_source::Symbol)
      @ Main ~/.julia/packages/CUDA/2kjXI/test/setup.jl:74
   [15] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
      @ Base ./essentials.jl:1055
   [16] invokelatest(::Any, ::Any, ::Vararg{Any})
      @ Base ./essentials.jl:1052
   [17] (::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}})()
      @ Distributed ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:287
   [18] run_work_thunk(thunk::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}}, print_error::Bool)
      @ Distributed ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:70
   [19] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
      @ Distributed ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:287
  in expression starting at /home/admin_julia/.julia/packages/CUDA/2kjXI/test/core/initialization.jl:36
Error in testset gpuarrays/reductions/sum prod:
Interrupted
Error in testset gpuarrays/reductions/reduce:
Interrupted
Error in testset gpuarrays/reductions/mapreducedim!:
Interrupted
Error in testset gpuarrays/broadcasting:
Interrupted
Error in testset gpuarrays/reductions/== isequal:
Interrupted
Error in testset gpuarrays/base:
Interrupted
Error in testset gpuarrays/random:
Interrupted
Error in testset gpuarrays/vectors:
Interrupted
Error in testset gpuarrays/constructors:
Interrupted
Error in testset gpuarrays/reductions/mapreduce:
Interrupted
Error in testset gpuarrays/statistics:
Interrupted
Error in testset gpuarrays/linalg/norm:
Interrupted
Error in testset gpuarrays/math/intrinsics:
Interrupted
Error in testset gpuarrays/linalg/mul!/matrix-matrix:
Interrupted
Error in testset gpuarrays/reductions/mapreducedim!_large:
Interrupted
Error in testset gpuarrays/uniformscaling:
Interrupted
Error in testset gpuarrays/reductions/minimum maximum extrema:
Interrupted
Error in testset gpuarrays/reductions/any all count:
Interrupted
Error in testset gpuarrays/interface:
Interrupted
Error in testset gpuarrays/indexing multidimensional:
Interrupted
Error in testset gpuarrays/indexing find:
Interrupted
Error in testset gpuarrays/linalg/mul!/vector-matrix:
Interrupted
Error in testset gpuarrays/math/power:
Interrupted
Error in testset gpuarrays/linalg:
Interrupted
Error in testset gpuarrays/reductions/reducedim!:
Interrupted
Error in testset gpuarrays/indexing scalar:
Interrupted
Error in testset libraries/cublas:
Interrupted
Error in testset libraries/cusparse:
Interrupted
Error in testset libraries/cusolver/dense:
Interrupted
Error in testset core/execution:
Interrupted
Error in testset libraries/cusparse/interfaces:
Interrupted
Error in testset base/array:
Interrupted
Error in testset core/cudadrv:
Interrupted
Error in testset libraries/cusparse/generic:
Interrupted
Error in testset base/sorting:
Interrupted
Error in testset core/device/intrinsics/wmma:
Interrupted
Error in testset core/device/intrinsics/atomics:
Interrupted
Error in testset libraries/cufft:
Interrupted
Error in testset libraries/cusparse/conversions:
Interrupted
Error in testset core/device/intrinsics/cooperative_groups:
Interrupted
Error in testset core/device/intrinsics:
Interrupted
Error in testset base/texture:
Interrupted
Error in testset libraries/cusolver/sparse:
Interrupted
Error in testset libraries/cusparse/bmm:
Interrupted
Error in testset base/random:
Interrupted
Error in testset core/device/array:
Interrupted
Error in testset core/device/intrinsics/memory:
Interrupted
Error in testset core/codegen:
Interrupted
Error in testset libraries/cusolver/dense_generic:
Interrupted
Error in testset core/device/intrinsics/math:
Interrupted
Error in testset core/device/intrinsics/output:
Interrupted
Error in testset core/device/random:
Interrupted
Error in testset libraries/cusolver/multigpu:
Interrupted
Error in testset core/device/ldg:
Interrupted
Error in testset core/pointer:
Interrupted
Error in testset base/broadcast:
Interrupted
Error in testset core/nvml:
Interrupted
Error in testset base/exceptions:
Interrupted
Error in testset libraries/cusparse/linalg:
Interrupted
Error in testset libraries/cusolver/sparse_factorizations:
Interrupted
Error in testset core/profile:
Interrupted
Error in testset base/iterator:
Interrupted
Error in testset base/threading:
Interrupted
Error in testset core/utils:
Interrupted
Error in testset libraries/cusparse/device:
Interrupted
Error in testset libraries/staticarrays:
Interrupted
Error in testset libraries/cusparse/broadcast:
Interrupted
Error in testset core/pool:
Interrupted
Error in testset base/linalg:
Interrupted
Error in testset core/apiutils:
Interrupted
Error in testset base/examples:
Interrupted
Error in testset libraries/cusparse/reduce:
Interrupted
Error in testset base/kernelabstractions:
Interrupted
Error in testset libraries/curand:
Interrupted
ERROR: LoadError: Test run finished with errors
in expression starting at /home/admin_julia/.julia/packages/CUDA/2kjXI/test/runtests.jl:501
ERROR: Package CUDA errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Types.jl:68
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{…}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Operations.jl:2111
 [3] test
   @ ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Operations.jl:1996 [inlined]
 [4] test(ctx::Pkg.Types.Context, pkgs::Vector{…}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Vector{…}, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::@Kwargs{})
   @ Pkg.API ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/API.jl:475
 [5] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{IO}, kwargs::@Kwargs{test_args::Vector{String}})
   @ Pkg.API ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/API.jl:159
 [6] test
   @ ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/API.jl:147 [inlined]
 [7] #test#74
   @ ~/.julia/juliaup/julia-1.11.2+0.aarch64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/API.jl:146 [inlined]
 [8] top-level scope
   @ REPL[3]:1
Some type information was truncated. Use `show(err)` to see complete types.
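
Reading the log above, the only real exception appears to be the NVMLError raised when CUDA.NVML.Device(uuid) calls nvmlDeviceGetHandleByUUID; the remaining testsets report "Interrupted", which would be consistent with --quickfail stopping the run after the first failure. To narrow this down outside the test suite, something like the following sketch might help. Note that try_lookup is a hypothetical helper written for this illustration (it is not part of CUDA.jl), and the commented-out call at the end assumes CUDA.uuid and CUDA.device behave as the stack trace suggests; it has not been run on an Orin device.

```julia
# Hypothetical helper (not part of CUDA.jl): call `lookup(id)` and report
# the outcome instead of letting the exception propagate.
function try_lookup(lookup, id)
    try
        return (ok = true, value = lookup(id), message = "")
    catch err
        # Capture the human-readable error text, e.g. "Not Found".
        return (ok = false, value = nothing, message = sprint(showerror, err))
    end
end

# Self-contained demonstration with stand-in lookups (no GPU needed).
found = try_lookup(id -> "handle for $id", "GPU-0")
missing_dev = try_lookup(id -> error("Not Found"), "GPU-0")
println(found)
println(missing_dev)

# On the affected device, assuming CUDA.jl is installed, the call from the
# stack trace could be probed like this (left commented out, untested here):
#   using CUDA
#   println(try_lookup(CUDA.NVML.Device, CUDA.uuid(CUDA.device())))
```

If the commented-out probe reports ok = false with a "Not Found" message while ordinary CUDA calls work, that would point at the NVML lookup by UUID on this device rather than at the tests themselves.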
