Add more CUB transform benchmarks #2906

Merged (4 commits) on Nov 26, 2024


@bernhardmgruber bernhardmgruber commented Nov 20, 2024

This PR adds a bunch of new benchmarks for CUB transform. I still have to figure out whether all benchmarks are necessary, because I don't want to cover redundant aspects.

Turns out, we cannot run this using tuning:

#### ERROR exception occured while running cub.bench.transform.other: 'All values in the dictionary are not equal. First value: {'OffsetT{ct}': ['I32', 'I64']} All values: [{'OffsetT{ct}': ['I32', 'I64']}, {'OffsetT{ct}': ['I32', 'I64']}, {'OffsetT{ct}': ['I32', 'I64']}, {'Heaviness{ct}': ['32', '64', '128', '256']}]'

@gevtushenko must all sub-benchmarks within the same benchmark file have the same axes?

Edit: I split the benchmarks into separate files now and this resolved the issue.
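
The error message suggests that the tuning tooling compares the compile-time (`{ct}`) axes declared by every sub-benchmark in a file and rejects the file when they differ. The following is a minimal Python sketch of such a check. The function `check_axes_consistent` and the input layout are hypothetical; they only mirror the shape of the error message and are not taken from the actual tuning scripts.

```python
def check_axes_consistent(axes_per_benchmark):
    """Raise ValueError unless every sub-benchmark declares the same axes.

    axes_per_benchmark: list of dicts mapping axis name -> list of values,
    one dict per sub-benchmark in a file (hypothetical layout).
    """
    if not axes_per_benchmark:
        return {}
    first = axes_per_benchmark[0]
    for axes in axes_per_benchmark[1:]:
        if axes != first:
            raise ValueError(
                "All values in the dictionary are not equal. "
                f"First value: {first} All values: {axes_per_benchmark}"
            )
    return first


# Three sub-benchmarks share one axis; the fourth declares a different one,
# as in the error reported above.
same = {"OffsetT{ct}": ["I32", "I64"]}
mixed = [same, same, same, {"Heaviness{ct}": ["32", "64", "128", "256"]}]

try:
    check_axes_consistent(mixed)
    mixed_ok = True
except ValueError:
    mixed_ok = False

# Splitting the odd benchmark into its own file makes each file consistent.
split_ok = (
    check_axes_consistent(mixed[:3]) == same
    and check_axes_consistent(mixed[3:]) == mixed[3]
)
```

Under this assumption, moving the benchmark with the `Heaviness{ct}` axis into its own file leaves each file with a single, consistent set of axes, which matches the fix described in the edit note.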


🟩 CI finished in 44m 57s: Pass: 100%/224 | Total: 1d 01h | Avg: 6m 52s | Max: 29m 42s | Hits: 99%/12288
  • 🟩 thrust: Pass: 100%/111 | Total: 12h 18m | Avg: 6m 39s | Max: 23m 02s | Hits: 99%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 21m 28s | Avg: 10m 44s | Max: 15m 21s
    🟩 cpu
      🟩 amd64              Pass: 100%/103 | Total: 11h 40m | Avg:  6m 47s | Max: 23m 02s | Hits:  99%/9260  
      🟩 arm64              Pass: 100%/8   | Total: 38m 39s | Avg:  4m 49s | Max:  5m 26s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 20m | Avg:  5m 21s | Max: 17m 43s | Hits:  99%/1852  
      🟩 11.8               Pass: 100%/3   | Total: 16m 22s | Avg:  5m 27s | Max:  5m 54s
      🟩 12.5               Pass: 100%/4   | Total:  1h 00m | Avg: 15m 13s | Max: 15m 52s
      🟩 12.6               Pass: 100%/89  | Total:  9h 41m | Avg:  6m 31s | Max: 23m 02s | Hits:  99%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 20m 13s | Avg:  5m 03s | Max:  5m 13s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 20m | Avg:  5m 21s | Max: 17m 43s | Hits:  99%/1852  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 16m 22s | Avg:  5m 27s | Max:  5m 54s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  1h 00m | Avg: 15m 13s | Max: 15m 52s
      🟩 nvcc12.6           Pass: 100%/85  | Total:  9h 21m | Avg:  6m 36s | Max: 23m 02s | Hits:  99%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 20m 13s | Avg:  5m 03s | Max:  5m 13s
      🟩 nvcc               Pass: 100%/107 | Total: 11h 58m | Avg:  6m 42s | Max: 23m 02s | Hits:  99%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 33m 30s | Avg:  5m 35s | Max:  6m 55s
      🟩 Clang10            Pass: 100%/3   | Total: 19m 25s | Avg:  6m 28s | Max:  7m 09s
      🟩 Clang11            Pass: 100%/4   | Total: 20m 38s | Avg:  5m 09s | Max:  5m 37s
      🟩 Clang12            Pass: 100%/4   | Total: 20m 22s | Avg:  5m 05s | Max:  5m 15s
      🟩 Clang13            Pass: 100%/4   | Total: 20m 07s | Avg:  5m 01s | Max:  5m 15s
      🟩 Clang14            Pass: 100%/4   | Total: 20m 58s | Avg:  5m 14s | Max:  5m 25s
      🟩 Clang15            Pass: 100%/4   | Total: 20m 53s | Avg:  5m 13s | Max:  5m 27s
      🟩 Clang16            Pass: 100%/4   | Total: 20m 55s | Avg:  5m 13s | Max:  5m 48s
      🟩 Clang17            Pass: 100%/4   | Total: 21m 06s | Avg:  5m 16s | Max:  5m 36s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 05m | Avg:  5m 58s | Max: 14m 05s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 21s | Avg:  4m 10s | Max:  4m 23s
      🟩 GCC7               Pass: 100%/6   | Total: 27m 45s | Avg:  4m 37s | Max:  5m 53s
      🟩 GCC8               Pass: 100%/6   | Total: 28m 54s | Avg:  4m 49s | Max:  5m 23s
      🟩 GCC9               Pass: 100%/6   | Total: 30m 11s | Avg:  5m 01s | Max:  5m 38s
      🟩 GCC10              Pass: 100%/4   | Total: 22m 46s | Avg:  5m 41s | Max:  6m 46s
      🟩 GCC11              Pass: 100%/7   | Total: 38m 19s | Avg:  5m 28s | Max:  5m 56s
      🟩 GCC12              Pass: 100%/4   | Total: 24m 47s | Avg:  6m 11s | Max:  6m 44s
      🟩 GCC13              Pass: 100%/16  | Total:  1h 57m | Avg:  7m 20s | Max: 17m 13s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 20m 54s | Avg:  6m 58s | Max:  7m 31s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 17m 43s | Avg: 17m 43s | Max: 17m 43s | Hits:  99%/1852  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 38m 43s | Avg: 19m 21s | Max: 23m 02s | Hits:  99%/3704  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 38m 29s | Avg: 19m 14s | Max: 22m 13s | Hits:  99%/3704  
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  1h 00m | Avg: 15m 13s | Max: 15m 52s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  4h 23m | Avg:  5m 29s | Max: 14m 05s
      🟩 GCC                Pass: 100%/51  | Total:  4h 58m | Avg:  5m 51s | Max: 17m 13s
      🟩 Intel              Pass: 100%/3   | Total: 20m 54s | Avg:  6m 58s | Max:  7m 31s
      🟩 MSVC               Pass: 100%/5   | Total:  1h 34m | Avg: 18m 59s | Max: 23m 02s | Hits:  99%/9260  
      🟩 NVHPC              Pass: 100%/4   | Total:  1h 00m | Avg: 15m 13s | Max: 15m 52s
    🟩 gpu
      🟩 v100               Pass: 100%/111 | Total: 12h 18m | Avg:  6m 39s | Max: 23m 02s | Hits:  99%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/103 | Total: 10h 34m | Avg:  6m 09s | Max: 23m 02s | Hits:  99%/7408  
      🟩 TestCPU            Pass: 100%/4   | Total: 46m 14s | Avg: 11m 33s | Max: 22m 13s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/4   | Total: 58m 09s | Avg: 14m 32s | Max: 17m 13s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 16m 22s | Avg:  5m 27s | Max:  5m 54s
      🟩 90a                Pass: 100%/4   | Total: 17m 33s | Avg:  4m 23s | Max:  4m 36s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 45m | Avg:  5m 31s | Max: 15m 09s
      🟩 14                 Pass: 100%/29  | Total:  3h 05m | Avg:  6m 23s | Max: 17m 43s | Hits:  99%/3704  
      🟩 17                 Pass: 100%/27  | Total:  2h 55m | Avg:  6m 29s | Max: 23m 02s | Hits:  99%/1852  
      🟩 20                 Pass: 100%/23  | Total:  3h 10m | Avg:  8m 17s | Max: 22m 13s | Hits:  99%/3704  
    
  • 🟩 cub: Pass: 100%/110 | Total: 12h 57m | Avg: 7m 03s | Max: 29m 42s | Hits: 99%/3028

    🟩 cpu
      🟩 amd64              Pass: 100%/102 | Total: 12h 15m | Avg:  7m 12s | Max: 29m 42s | Hits:  99%/3028  
      🟩 arm64              Pass: 100%/8   | Total: 42m 05s | Avg:  5m 15s | Max:  6m 16s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 16m | Avg:  5m 05s | Max: 14m 13s | Hits:  99%/757   
      🟩 11.8               Pass: 100%/3   | Total: 17m 26s | Avg:  5m 48s | Max:  6m 07s
      🟩 12.5               Pass: 100%/4   | Total: 39m 13s | Avg:  9m 48s | Max: 10m 29s
      🟩 12.6               Pass: 100%/88  | Total: 10h 44m | Avg:  7m 19s | Max: 29m 42s | Hits:  99%/2271  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 16m 51s | Avg:  4m 12s | Max:  4m 23s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 16m | Avg:  5m 05s | Max: 14m 13s | Hits:  99%/757   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 17m 26s | Avg:  5m 48s | Max:  6m 07s
      🟩 nvcc12.5           Pass: 100%/4   | Total: 39m 13s | Avg:  9m 48s | Max: 10m 29s
      🟩 nvcc12.6           Pass: 100%/84  | Total: 10h 27m | Avg:  7m 28s | Max: 29m 42s | Hits:  99%/2271  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 16m 51s | Avg:  4m 12s | Max:  4m 23s
      🟩 nvcc               Pass: 100%/106 | Total: 12h 40m | Avg:  7m 10s | Max: 29m 42s | Hits:  99%/3028  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 33m 44s | Avg:  5m 37s | Max:  7m 01s
      🟩 Clang10            Pass: 100%/3   | Total: 20m 36s | Avg:  6m 52s | Max:  6m 59s
      🟩 Clang11            Pass: 100%/4   | Total: 22m 16s | Avg:  5m 34s | Max:  5m 52s
      🟩 Clang12            Pass: 100%/4   | Total: 22m 20s | Avg:  5m 35s | Max:  5m 47s
      🟩 Clang13            Pass: 100%/4   | Total: 23m 21s | Avg:  5m 50s | Max:  6m 18s
      🟩 Clang14            Pass: 100%/4   | Total: 22m 32s | Avg:  5m 38s | Max:  5m 52s
      🟩 Clang15            Pass: 100%/4   | Total: 22m 34s | Avg:  5m 38s | Max:  5m 56s
      🟩 Clang16            Pass: 100%/4   | Total: 22m 08s | Avg:  5m 32s | Max:  5m 49s
      🟩 Clang17            Pass: 100%/4   | Total: 21m 53s | Avg:  5m 28s | Max:  5m 42s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 27m | Avg:  7m 59s | Max: 28m 56s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 23s | Avg:  4m 11s | Max:  4m 21s
      🟩 GCC7               Pass: 100%/6   | Total: 30m 36s | Avg:  5m 06s | Max:  5m 48s
      🟩 GCC8               Pass: 100%/6   | Total: 30m 18s | Avg:  5m 03s | Max:  5m 56s
      🟩 GCC9               Pass: 100%/6   | Total: 30m 53s | Avg:  5m 08s | Max:  6m 21s
      🟩 GCC10              Pass: 100%/4   | Total: 22m 20s | Avg:  5m 35s | Max:  5m 56s
      🟩 GCC11              Pass: 100%/7   | Total: 39m 55s | Avg:  5m 42s | Max:  6m 07s
      🟩 GCC12              Pass: 100%/4   | Total: 24m 06s | Avg:  6m 01s | Max:  6m 24s
      🟩 GCC13              Pass: 100%/16  | Total:  3h 00m | Avg: 11m 16s | Max: 29m 42s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 18m 45s | Avg:  6m 15s | Max:  6m 42s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 14m 13s | Avg: 14m 13s | Max: 14m 13s | Hits:  99%/757   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 25m 46s | Avg: 12m 53s | Max: 13m 22s | Hits:  99%/1514  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 59s | Avg: 12m 59s | Max: 12m 59s | Hits:  99%/757   
      🟩 NVHPC24.7          Pass: 100%/4   | Total: 39m 13s | Avg:  9m 48s | Max: 10m 29s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  4h 59m | Avg:  6m 14s | Max: 28m 56s
      🟩 GCC                Pass: 100%/51  | Total:  6h 06m | Avg:  7m 11s | Max: 29m 42s
      🟩 Intel              Pass: 100%/3   | Total: 18m 45s | Avg:  6m 15s | Max:  6m 42s
      🟩 MSVC               Pass: 100%/4   | Total: 52m 58s | Avg: 13m 14s | Max: 14m 13s | Hits:  99%/3028  
      🟩 NVHPC              Pass: 100%/4   | Total: 39m 13s | Avg:  9m 48s | Max: 10m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/110 | Total: 12h 57m | Avg:  7m 03s | Max: 29m 42s | Hits:  99%/3028  
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total: 10h 04m | Avg:  5m 55s | Max: 14m 13s | Hits:  99%/3028  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 50s | Avg: 21m 50s | Max: 21m 50s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 06s | Avg: 15m 06s | Max: 15m 06s
      🟩 HostLaunch         Pass: 100%/3   | Total: 56m 11s | Avg: 18m 43s | Max: 20m 57s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 19m | Avg: 26m 32s | Max: 29m 42s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 17m 26s | Avg:  5m 48s | Max:  6m 07s
      🟩 90a                Pass: 100%/4   | Total: 18m 44s | Avg:  4m 41s | Max:  4m 45s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  3h 12m | Avg:  6m 25s | Max: 20m 59s
      🟩 14                 Pass: 100%/29  | Total:  2h 57m | Avg:  6m 07s | Max: 14m 13s | Hits:  99%/1514  
      🟩 17                 Pass: 100%/27  | Total:  2h 43m | Avg:  6m 02s | Max: 12m 24s | Hits:  99%/757   
      🟩 20                 Pass: 100%/24  | Total:  4h 03m | Avg: 10m 08s | Max: 29m 42s | Hits:  99%/757   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 29s | Avg: 4m 44s | Max: 7m 15s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 15s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 15s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 15s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 15s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 15s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 15s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 15s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 14s | Avg:  2m 14s | Max:  2m 14s
      🟩 Test               Pass: 100%/1   | Total:  7m 15s | Avg:  7m 15s | Max:  7m 15s
    
  • 🟩 python: Pass: 100%/1 | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 15m 46s | Avg: 15m 46s | Max: 15m 46s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 224)

Jobs | Runner
185  | linux-amd64-cpu16
16   | linux-arm64-cpu16
14   | linux-amd64-gpu-v100-latest-1
9    | windows-amd64-cpu16

cub/benchmarks/bench/transform/other.cu (review thread: outdated, resolved)

🟩 CI finished in 47m 48s: Pass: 100%/224 | Total: 1d 00h | Avg: 6m 38s | Max: 26m 39s | Hits: 99%/12288
  • 🟩 thrust: Pass: 100%/111 | Total: 11h 48m | Avg: 6m 23s | Max: 22m 09s | Hits: 99%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 20m 08s | Avg: 10m 04s | Max: 13m 49s
    🟩 cpu
      🟩 amd64              Pass: 100%/103 | Total: 11h 11m | Avg:  6m 31s | Max: 22m 09s | Hits:  99%/9260  
      🟩 arm64              Pass: 100%/8   | Total: 37m 08s | Avg:  4m 38s | Max:  4m 59s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 18m | Avg:  5m 12s | Max: 18m 17s | Hits:  99%/1852  
      🟩 11.8               Pass: 100%/3   | Total: 15m 14s | Avg:  5m 04s | Max:  5m 34s
      🟩 12.5               Pass: 100%/4   | Total:  1h 00m | Avg: 15m 00s | Max: 16m 49s
      🟩 12.6               Pass: 100%/89  | Total:  9h 15m | Avg:  6m 14s | Max: 22m 09s | Hits:  99%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 19m 00s | Avg:  4m 45s | Max:  4m 55s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 18m | Avg:  5m 12s | Max: 18m 17s | Hits:  99%/1852  
      🟩 nvcc11.8           Pass: 100%/3   | Total: 15m 14s | Avg:  5m 04s | Max:  5m 34s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  1h 00m | Avg: 15m 00s | Max: 16m 49s
      🟩 nvcc12.6           Pass: 100%/85  | Total:  8h 56m | Avg:  6m 18s | Max: 22m 09s | Hits:  99%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 19m 00s | Avg:  4m 45s | Max:  4m 55s
      🟩 nvcc               Pass: 100%/107 | Total: 11h 29m | Avg:  6m 26s | Max: 22m 09s | Hits:  99%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 32m 02s | Avg:  5m 20s | Max:  6m 50s
      🟩 Clang10            Pass: 100%/3   | Total: 19m 47s | Avg:  6m 35s | Max:  7m 45s
      🟩 Clang11            Pass: 100%/4   | Total: 20m 32s | Avg:  5m 08s | Max:  5m 34s
      🟩 Clang12            Pass: 100%/4   | Total: 20m 31s | Avg:  5m 07s | Max:  5m 34s
      🟩 Clang13            Pass: 100%/4   | Total: 20m 33s | Avg:  5m 08s | Max:  5m 35s
      🟩 Clang14            Pass: 100%/4   | Total: 20m 19s | Avg:  5m 04s | Max:  5m 42s
      🟩 Clang15            Pass: 100%/4   | Total: 20m 59s | Avg:  5m 14s | Max:  5m 40s
      🟩 Clang16            Pass: 100%/4   | Total: 21m 45s | Avg:  5m 26s | Max:  6m 03s
      🟩 Clang17            Pass: 100%/4   | Total: 20m 28s | Avg:  5m 07s | Max:  5m 24s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 03m | Avg:  5m 44s | Max: 13m 46s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 04s
      🟩 GCC7               Pass: 100%/6   | Total: 26m 55s | Avg:  4m 29s | Max:  5m 28s
      🟩 GCC8               Pass: 100%/6   | Total: 27m 56s | Avg:  4m 39s | Max:  5m 07s
      🟩 GCC9               Pass: 100%/6   | Total: 29m 12s | Avg:  4m 52s | Max:  5m 56s
      🟩 GCC10              Pass: 100%/4   | Total: 20m 38s | Avg:  5m 09s | Max:  5m 35s
      🟩 GCC11              Pass: 100%/7   | Total: 36m 37s | Avg:  5m 13s | Max:  5m 38s
      🟩 GCC12              Pass: 100%/4   | Total: 22m 14s | Avg:  5m 33s | Max:  5m 56s
      🟩 GCC13              Pass: 100%/16  | Total:  1h 48m | Avg:  6m 48s | Max: 13m 49s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 18m 54s | Avg:  6m 18s | Max:  6m 42s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 18m 17s | Avg: 18m 17s | Max: 18m 17s | Hits:  99%/1852  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 31m 35s | Avg: 15m 47s | Max: 16m 15s | Hits:  99%/3704  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 39m 28s | Avg: 19m 44s | Max: 22m 09s | Hits:  99%/3704  
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  1h 00m | Avg: 15m 00s | Max: 16m 49s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  4h 20m | Avg:  5m 25s | Max: 13m 46s
      🟩 GCC                Pass: 100%/51  | Total:  4h 40m | Avg:  5m 30s | Max: 13m 49s
      🟩 Intel              Pass: 100%/3   | Total: 18m 54s | Avg:  6m 18s | Max:  6m 42s
      🟩 MSVC               Pass: 100%/5   | Total:  1h 29m | Avg: 17m 52s | Max: 22m 09s | Hits:  99%/9260  
      🟩 NVHPC              Pass: 100%/4   | Total:  1h 00m | Avg: 15m 00s | Max: 16m 49s
    🟩 gpu
      🟩 v100               Pass: 100%/111 | Total: 11h 48m | Avg:  6m 23s | Max: 22m 09s | Hits:  99%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/103 | Total: 10h 12m | Avg:  5m 56s | Max: 18m 17s | Hits:  99%/7408  
      🟩 TestCPU            Pass: 100%/4   | Total: 43m 56s | Avg: 10m 59s | Max: 22m 09s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/4   | Total: 52m 50s | Avg: 13m 12s | Max: 13m 49s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 15m 14s | Avg:  5m 04s | Max:  5m 34s
      🟩 90a                Pass: 100%/4   | Total: 18m 04s | Avg:  4m 31s | Max:  4m 52s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 38m | Avg:  5m 17s | Max: 13m 45s
      🟩 14                 Pass: 100%/29  | Total:  2h 58m | Avg:  6m 09s | Max: 18m 17s | Hits:  99%/3704  
      🟩 17                 Pass: 100%/27  | Total:  2h 48m | Avg:  6m 14s | Max: 15m 25s | Hits:  99%/1852  
      🟩 20                 Pass: 100%/23  | Total:  3h 02m | Avg:  7m 57s | Max: 22m 09s | Hits:  99%/3704  
    
  • 🟩 cub: Pass: 100%/110 | Total: 12h 36m | Avg: 6m 52s | Max: 26m 39s | Hits: 99%/3028

    🟩 cpu
      🟩 amd64              Pass: 100%/102 | Total: 11h 52m | Avg:  6m 59s | Max: 26m 39s | Hits:  99%/3028  
      🟩 arm64              Pass: 100%/8   | Total: 43m 51s | Avg:  5m 28s | Max: 10m 09s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 13m | Avg:  4m 53s | Max: 13m 36s | Hits:  99%/757   
      🟩 11.8               Pass: 100%/3   | Total: 16m 39s | Avg:  5m 33s | Max:  5m 51s
      🟩 12.5               Pass: 100%/4   | Total: 37m 38s | Avg:  9m 24s | Max: 10m 14s
      🟩 12.6               Pass: 100%/88  | Total: 10h 28m | Avg:  7m 08s | Max: 26m 39s | Hits:  99%/2271  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total: 16m 37s | Avg:  4m 09s | Max:  4m 12s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 13m | Avg:  4m 53s | Max: 13m 36s | Hits:  99%/757   
      🟩 nvcc11.8           Pass: 100%/3   | Total: 16m 39s | Avg:  5m 33s | Max:  5m 51s
      🟩 nvcc12.5           Pass: 100%/4   | Total: 37m 38s | Avg:  9m 24s | Max: 10m 14s
      🟩 nvcc12.6           Pass: 100%/84  | Total: 10h 12m | Avg:  7m 17s | Max: 26m 39s | Hits:  99%/2271  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total: 16m 37s | Avg:  4m 09s | Max:  4m 12s
      🟩 nvcc               Pass: 100%/106 | Total: 12h 19m | Avg:  6m 58s | Max: 26m 39s | Hits:  99%/3028  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 32m 21s | Avg:  5m 23s | Max:  6m 14s
      🟩 Clang10            Pass: 100%/3   | Total: 19m 35s | Avg:  6m 31s | Max:  6m 52s
      🟩 Clang11            Pass: 100%/4   | Total: 21m 13s | Avg:  5m 18s | Max:  5m 40s
      🟩 Clang12            Pass: 100%/4   | Total: 21m 28s | Avg:  5m 22s | Max:  5m 45s
      🟩 Clang13            Pass: 100%/4   | Total: 22m 07s | Avg:  5m 31s | Max:  6m 09s
      🟩 Clang14            Pass: 100%/4   | Total: 22m 30s | Avg:  5m 37s | Max:  5m 52s
      🟩 Clang15            Pass: 100%/4   | Total: 21m 56s | Avg:  5m 29s | Max:  5m 42s
      🟩 Clang16            Pass: 100%/4   | Total: 22m 54s | Avg:  5m 43s | Max:  6m 08s
      🟩 Clang17            Pass: 100%/4   | Total: 22m 18s | Avg:  5m 34s | Max:  6m 04s
      🟩 Clang18            Pass: 100%/11  | Total:  1h 27m | Avg:  7m 56s | Max: 26m 39s
      🟩 GCC6               Pass: 100%/2   | Total:  8m 08s | Avg:  4m 04s | Max:  4m 11s
      🟩 GCC7               Pass: 100%/6   | Total: 27m 54s | Avg:  4m 39s | Max:  5m 31s
      🟩 GCC8               Pass: 100%/6   | Total: 28m 13s | Avg:  4m 42s | Max:  5m 43s
      🟩 GCC9               Pass: 100%/6   | Total: 29m 44s | Avg:  4m 57s | Max:  5m 59s
      🟩 GCC10              Pass: 100%/4   | Total: 23m 09s | Avg:  5m 47s | Max:  6m 09s
      🟩 GCC11              Pass: 100%/7   | Total: 39m 44s | Avg:  5m 40s | Max:  6m 00s
      🟩 GCC12              Pass: 100%/4   | Total: 23m 48s | Avg:  5m 57s | Max:  6m 16s
      🟩 GCC13              Pass: 100%/16  | Total:  2h 55m | Avg: 10m 56s | Max: 23m 26s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 19m 03s | Avg:  6m 21s | Max:  6m 59s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 13m 36s | Avg: 13m 36s | Max: 13m 36s | Hits:  99%/757   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 23m 57s | Avg: 11m 58s | Max: 12m 48s | Hits:  99%/1514  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s | Hits:  99%/757   
      🟩 NVHPC24.7          Pass: 100%/4   | Total: 37m 38s | Avg:  9m 24s | Max: 10m 14s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  4h 53m | Avg:  6m 07s | Max: 26m 39s
      🟩 GCC                Pass: 100%/51  | Total:  5h 55m | Avg:  6m 58s | Max: 23m 26s
      🟩 Intel              Pass: 100%/3   | Total: 19m 03s | Avg:  6m 21s | Max:  6m 59s
      🟩 MSVC               Pass: 100%/4   | Total: 50m 17s | Avg: 12m 34s | Max: 13m 36s | Hits:  99%/3028  
      🟩 NVHPC              Pass: 100%/4   | Total: 37m 38s | Avg:  9m 24s | Max: 10m 14s
    🟩 gpu
      🟩 v100               Pass: 100%/110 | Total: 12h 36m | Avg:  6m 52s | Max: 26m 39s | Hits:  99%/3028  
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total:  9h 49m | Avg:  5m 47s | Max: 13m 36s | Hits:  99%/3028  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 31s | Avg: 20m 31s | Max: 20m 31s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 52s | Avg: 17m 52s | Max: 17m 52s
      🟩 HostLaunch         Pass: 100%/3   | Total: 54m 56s | Avg: 18m 18s | Max: 19m 32s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 13m | Avg: 24m 21s | Max: 26m 39s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 16m 39s | Avg:  5m 33s | Max:  5m 51s
      🟩 90a                Pass: 100%/4   | Total: 17m 29s | Avg:  4m 22s | Max:  4m 29s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  3h 03m | Avg:  6m 07s | Max: 22m 58s
      🟩 14                 Pass: 100%/29  | Total:  2h 47m | Avg:  5m 46s | Max: 13m 36s | Hits:  99%/1514  
      🟩 17                 Pass: 100%/27  | Total:  2h 45m | Avg:  6m 08s | Max: 12m 48s | Hits:  99%/757   
      🟩 20                 Pass: 100%/24  | Total:  3h 59m | Avg:  9m 58s | Max: 26m 39s | Hits:  99%/757   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 8m 57s | Avg: 4m 28s | Max: 6m 52s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  6m 52s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  6m 52s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  6m 52s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  6m 52s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  6m 52s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  6m 52s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  6m 52s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 05s | Avg:  2m 05s | Max:  2m 05s
      🟩 Test               Pass: 100%/1   | Total:  6m 52s | Avg:  6m 52s | Max:  6m 52s
    
  • 🟩 python: Pass: 100%/1 | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 14m 50s | Avg: 14m 50s | Max: 14m 50s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 224)

Jobs | Runner
185  | linux-amd64-cpu16
16   | linux-arm64-cpu16
14   | linux-amd64-gpu-v100-latest-1
9    | windows-amd64-cpu16

@bernhardmgruber bernhardmgruber marked this pull request as ready for review November 25, 2024 08:49
@bernhardmgruber bernhardmgruber requested review from a team as code owners November 25, 2024 08:49
@bernhardmgruber

Non-serious run on my workstation:
image
It looks like compare_complex and heavy are compute-bound (as expected) and therefore do not reach speed-of-light (SOL) memory bandwidth. I expected fibonacci (thread divergence) and vertex_transform (more compute than babelstream) to be slower. Let's see what happens on a different GPU.
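
One way to reason about why compute-bound kernels stay below SOL memory bandwidth is a simple roofline estimate. The sketch below is a back-of-the-envelope model in Python. The function name and the hardware numbers are made up for illustration and are not measurements from this PR.

```python
def achieved_bandwidth_fraction(flops_per_byte, peak_flops, peak_bandwidth):
    """Fraction of peak memory bandwidth reachable under a roofline model.

    flops_per_byte: arithmetic intensity of the kernel (FLOP per byte moved)
    peak_flops:     peak compute throughput in FLOP/s
    peak_bandwidth: peak memory bandwidth in byte/s
    """
    # Time per byte is bounded below by both the memory and the compute roof.
    memory_time = 1.0 / peak_bandwidth
    compute_time = flops_per_byte / peak_flops
    return memory_time / max(memory_time, compute_time)


# Hypothetical device: 10 TFLOP/s compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 10e12
PEAK_BANDWIDTH = 1e12

# A light, streaming-style kernel versus a compute-heavy one.
light = achieved_bandwidth_fraction(0.25, PEAK_FLOPS, PEAK_BANDWIDTH)
heavy = achieved_bandwidth_fraction(40.0, PEAK_FLOPS, PEAK_BANDWIDTH)
```

In this model, a kernel whose arithmetic intensity exceeds the ratio `PEAK_FLOPS / PEAK_BANDWIDTH` is limited by compute, so its achieved bandwidth falls below the peak even if its memory accesses are ideal.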

@bernhardmgruber

On H100:
image
Here, the effects of thread divergence (fibonacci), compute intensity (compare_complex), and compute plus register pressure (heavy) are much more pronounced. At this point, I feel the vertex_transform example is not giving us any new insight.
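
The cost of thread divergence can be illustrated with a simplified lockstep model, assuming the per-element loop count depends on the input value so that lanes of one warp finish at different times. The Python sketch below uses a hypothetical helper, `warp_utilization`, and ignores scheduling details of real GPUs.

```python
def warp_utilization(iterations_per_lane):
    """Fraction of useful lane-iterations when lanes run a loop in lockstep.

    Simplified model: a warp keeps iterating until its slowest lane finishes,
    and lanes that finished earlier sit idle.
    """
    longest = max(iterations_per_lane)
    if longest == 0:
        return 1.0
    useful = sum(iterations_per_lane)
    return useful / (longest * len(iterations_per_lane))


# Every lane of a 32-wide warp runs the same number of iterations.
uniform = warp_utilization([20] * 32)

# Lane i runs i iterations, so the warp waits for the slowest lane.
divergent = warp_utilization(list(range(32)))
```

In this model the uniform warp wastes no work, while the divergent warp spends about half of its lane-iterations idle, which is the kind of loss a divergence-focused benchmark is meant to expose.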


🟩 CI finished in 2h 06m: Pass: 100%/224 | Total: 5d 08h | Avg: 34m 28s | Max: 1h 18m | Hits: 87%/12288
  • 🟩 thrust: Pass: 100%/111 | Total: 2d 02h | Avg: 27m 20s | Max: 1h 08m | Hits: 90%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 59m 34s | Avg: 29m 47s | Max: 32m 44s
    🟩 cpu
      🟩 amd64              Pass: 100%/103 | Total:  1d 23h | Avg: 27m 23s | Max:  1h 08m | Hits:  90%/9260  
      🟩 arm64              Pass: 100%/8   | Total:  3h 33m | Avg: 26m 42s | Max: 40m 51s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  5h 36m | Avg: 22m 26s | Max: 35m 21s | Hits:  99%/1852  
      🟩 11.8               Pass: 100%/3   | Total:  1h 41m | Avg: 33m 56s | Max: 51m 26s
      🟩 12.5               Pass: 100%/4   | Total:  3h 14m | Avg: 48m 39s | Max:  1h 02m
      🟩 12.6               Pass: 100%/89  | Total:  1d 16h | Avg: 26m 58s | Max:  1h 08m | Hits:  88%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 31m | Avg: 22m 46s | Max: 32m 14s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  5h 36m | Avg: 22m 26s | Max: 35m 21s | Hits:  99%/1852  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 41m | Avg: 33m 56s | Max: 51m 26s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  3h 14m | Avg: 48m 39s | Max:  1h 02m
      🟩 nvcc12.6           Pass: 100%/85  | Total:  1d 14h | Avg: 27m 10s | Max:  1h 08m | Hits:  88%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 31m | Avg: 22m 46s | Max: 32m 14s
      🟩 nvcc               Pass: 100%/107 | Total:  2d 01h | Avg: 27m 30s | Max:  1h 08m | Hits:  90%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 24m | Avg: 24m 08s | Max: 38m 30s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 32m | Avg: 30m 46s | Max: 46m 42s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 52m | Avg: 28m 03s | Max: 37m 42s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 54m | Avg: 28m 36s | Max: 39m 38s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 41s | Max: 35m 10s
      🟩 Clang14            Pass: 100%/4   | Total:  1h 49m | Avg: 27m 19s | Max: 37m 06s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 55m | Avg: 28m 45s | Max: 40m 46s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 02s | Max: 36m 54s
      🟩 Clang17            Pass: 100%/4   | Total:  1h 52m | Avg: 28m 03s | Max: 37m 04s
      🟩 Clang18            Pass: 100%/11  | Total:  4h 21m | Avg: 23m 46s | Max: 34m 34s
      🟩 GCC6               Pass: 100%/2   | Total: 36m 09s | Avg: 18m 04s | Max: 32m 16s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 29m | Avg: 24m 58s | Max: 36m 10s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 26m | Avg: 24m 22s | Max: 38m 06s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 32m | Avg: 25m 28s | Max: 38m 37s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 11m | Avg: 32m 48s | Max: 43m 46s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 45m | Avg: 32m 15s | Max: 51m 26s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 13m | Avg: 33m 23s | Max: 50m 22s
      🟩 GCC13              Pass: 100%/16  | Total:  5h 29m | Avg: 20m 34s | Max: 41m 59s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 51m | Avg: 37m 06s | Max:  1h 02m
      🟩 MSVC14.16          Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s | Hits:  99%/1852  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 36m 18s | Avg: 18m 09s | Max: 20m 17s | Hits:  99%/3704  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 31m | Avg: 45m 52s | Max:  1h 08m | Hits:  78%/3704  
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  3h 14m | Avg: 48m 39s | Max:  1h 02m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total: 21h 16m | Avg: 26m 35s | Max: 46m 42s
      🟩 GCC                Pass: 100%/51  | Total: 21h 44m | Avg: 25m 35s | Max: 51m 26s
      🟩 Intel              Pass: 100%/3   | Total:  1h 51m | Avg: 37m 06s | Max:  1h 02m
      🟩 MSVC               Pass: 100%/5   | Total:  2h 26m | Avg: 29m 20s | Max:  1h 08m | Hits:  90%/9260  
      🟩 NVHPC              Pass: 100%/4   | Total:  3h 14m | Avg: 48m 39s | Max:  1h 02m
    🟩 gpu
      🟩 v100               Pass: 100%/111 | Total:  2d 02h | Avg: 27m 20s | Max:  1h 08m | Hits:  90%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/103 | Total:  2d 00h | Avg: 28m 16s | Max:  1h 08m | Hits:  88%/7408  
      🟩 TestCPU            Pass: 100%/4   | Total: 45m 24s | Avg: 11m 21s | Max: 22m 46s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/4   | Total:  1h 17m | Avg: 19m 19s | Max: 27m 54s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 41m | Avg: 33m 56s | Max: 51m 26s
      🟩 90a                Pass: 100%/4   | Total:  1h 11m | Avg: 17m 47s | Max: 24m 34s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 45m | Avg:  5m 30s | Max: 14m 33s
      🟩 14                 Pass: 100%/29  | Total: 16h 36m | Avg: 34m 22s | Max: 59m 12s | Hits:  99%/3704  
      🟩 17                 Pass: 100%/27  | Total: 16h 56m | Avg: 37m 38s | Max:  1h 02m | Hits:  99%/1852  
      🟩 20                 Pass: 100%/23  | Total: 13h 16m | Avg: 34m 37s | Max:  1h 08m | Hits:  78%/3704  
    
  • 🟩 cub: Pass: 100%/110 | Total: 3d 05h | Avg: 42m 23s | Max: 1h 18m | Hits: 75%/3028

    🟩 cpu
      🟩 amd64              Pass: 100%/102 | Total:  2d 23h | Avg: 42m 06s | Max:  1h 18m | Hits:  75%/3028  
      🟩 arm64              Pass: 100%/8   | Total:  6h 09m | Avg: 46m 11s | Max:  1h 01m
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  8h 14m | Avg: 32m 56s | Max: 52m 48s | Hits:  99%/757   
      🟩 11.8               Pass: 100%/3   | Total:  2h 45m | Avg: 55m 05s | Max:  1h 18m
      🟩 12.5               Pass: 100%/4   | Total:  3h 42m | Avg: 55m 39s | Max:  1h 14m
      🟩 12.6               Pass: 100%/88  | Total:  2d 15h | Avg: 42m 58s | Max:  1h 17m | Hits:  67%/2271  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  3h 16m | Avg: 49m 14s | Max:  1h 05m
      🟩 nvcc11.1           Pass: 100%/15  | Total:  8h 14m | Avg: 32m 56s | Max: 52m 48s | Hits:  99%/757   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 45m | Avg: 55m 05s | Max:  1h 18m
      🟩 nvcc12.5           Pass: 100%/4   | Total:  3h 42m | Avg: 55m 39s | Max:  1h 14m
      🟩 nvcc12.6           Pass: 100%/84  | Total:  2d 11h | Avg: 42m 40s | Max:  1h 17m | Hits:  67%/2271  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  3h 16m | Avg: 49m 14s | Max:  1h 05m
      🟩 nvcc               Pass: 100%/106 | Total:  3d 02h | Avg: 42m 08s | Max:  1h 18m | Hits:  75%/3028  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 47m | Avg: 37m 54s | Max: 55m 33s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 06m | Avg: 42m 08s | Max: 58m 20s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 04m | Avg: 46m 07s | Max:  1h 03m
      🟩 Clang12            Pass: 100%/4   | Total:  3h 12m | Avg: 48m 12s | Max:  1h 02m
      🟩 Clang13            Pass: 100%/4   | Total:  3h 05m | Avg: 46m 24s | Max:  1h 02m
      🟩 Clang14            Pass: 100%/4   | Total:  3h 03m | Avg: 45m 53s | Max: 58m 11s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 06m | Avg: 46m 36s | Max:  1h 01m
      🟩 Clang16            Pass: 100%/4   | Total:  3h 02m | Avg: 45m 39s | Max:  1h 01m
      🟩 Clang17            Pass: 100%/4   | Total:  3h 24m | Avg: 51m 14s | Max:  1h 17m
      🟩 Clang18            Pass: 100%/11  | Total:  8h 46m | Avg: 47m 51s | Max:  1h 05m
      🟩 GCC6               Pass: 100%/2   | Total: 55m 01s | Avg: 27m 30s | Max: 50m 51s
      🟩 GCC7               Pass: 100%/6   | Total:  3h 49m | Avg: 38m 16s | Max: 58m 44s
      🟩 GCC8               Pass: 100%/6   | Total:  3h 54m | Avg: 39m 04s | Max:  1h 00m
      🟩 GCC9               Pass: 100%/6   | Total:  3h 58m | Avg: 39m 44s | Max:  1h 02m
      🟩 GCC10              Pass: 100%/4   | Total:  3h 11m | Avg: 47m 47s | Max:  1h 03m
      🟩 GCC11              Pass: 100%/7   | Total:  5h 52m | Avg: 50m 21s | Max:  1h 18m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 11m | Avg: 47m 46s | Max:  1h 02m
      🟩 GCC13              Pass: 100%/16  | Total:  8h 24m | Avg: 31m 30s | Max:  1h 01m
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 17m | Avg: 45m 48s | Max:  1h 05m
      🟩 MSVC14.16          Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s | Hits:  99%/757   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 24m 21s | Avg: 12m 10s | Max: 12m 34s | Hits:  99%/1514  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  1h 07m | Avg:  1h 07m | Max:  1h 07m | Hits:   3%/757   
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  3h 42m | Avg: 55m 39s | Max:  1h 14m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  1d 12h | Avg: 45m 51s | Max:  1h 17m
      🟩 GCC                Pass: 100%/51  | Total:  1d 09h | Avg: 39m 08s | Max:  1h 18m
      🟩 Intel              Pass: 100%/3   | Total:  2h 17m | Avg: 45m 48s | Max:  1h 05m
      🟩 MSVC               Pass: 100%/4   | Total:  1h 46m | Avg: 26m 37s | Max:  1h 07m | Hits:  75%/3028  
      🟩 NVHPC              Pass: 100%/4   | Total:  3h 42m | Avg: 55m 39s | Max:  1h 14m
    🟩 gpu
      🟩 v100               Pass: 100%/110 | Total:  3d 05h | Avg: 42m 23s | Max:  1h 18m | Hits:  75%/3028  
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total:  3d 01h | Avg: 43m 25s | Max:  1h 18m | Hits:  75%/3028  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 17m 59s | Avg: 17m 59s | Max: 17m 59s
      🟩 GraphCapture       Pass: 100%/1   | Total: 20m 41s | Avg: 20m 41s | Max: 20m 41s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 40s | Max: 36m 22s
      🟩 TestGPU            Pass: 100%/3   | Total:  2h 02m | Avg: 40m 52s | Max: 50m 51s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 45m | Avg: 55m 05s | Max:  1h 18m
      🟩 90a                Pass: 100%/4   | Total:  1h 31m | Avg: 22m 55s | Max: 29m 43s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  4h 34m | Avg:  9m 09s | Max: 24m 54s
      🟩 14                 Pass: 100%/29  | Total:  1d 02h | Avg: 55m 01s | Max:  1h 17m | Hits:  99%/1514  
      🟩 17                 Pass: 100%/27  | Total:  1d 01h | Avg: 56m 11s | Max:  1h 18m | Hits:  99%/757   
      🟩 20                 Pass: 100%/24  | Total: 21h 15m | Avg: 53m 09s | Max:  1h 14m | Hits:   3%/757   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 14s | Avg: 4m 37s | Max: 6m 45s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 14s | Avg:  4m 37s | Max:  6m 45s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 14s | Avg:  4m 37s | Max:  6m 45s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 14s | Avg:  4m 37s | Max:  6m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 14s | Avg:  4m 37s | Max:  6m 45s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 14s | Avg:  4m 37s | Max:  6m 45s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 14s | Avg:  4m 37s | Max:  6m 45s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 14s | Avg:  4m 37s | Max:  6m 45s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 29s | Avg:  2m 29s | Max:  2m 29s
      🟩 Test               Pass: 100%/1   | Total:  6m 45s | Avg:  6m 45s | Max:  6m 45s
    
  • 🟩 python: Pass: 100%/1 | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 224)

| # | Runner |
|---|--------|
| 185 | linux-amd64-cpu16 |
| 16 | linux-arm64-cpu16 |
| 14 | linux-amd64-gpu-v100-latest-1 |
| 9 | windows-amd64-cpu16 |

@bernhardmgruber bernhardmgruber merged commit 90120a4 into NVIDIA:main Nov 26, 2024
240 checks passed
@bernhardmgruber bernhardmgruber deleted the more_transform_bench branch November 26, 2024 08:04