NEVER MERGE! Comparison: `rocm-dev` vs `branch-24.06` #1

domcharrier · 2024-04-18T11:31:09Z

WARNING: This PR exists only to compare the hipified `rocm-dev` branch with the `branch-24.06` branch of rapidsai/dask_cuda. Do not merge it!

* Add ucx-py dependency to CI

To eliminate hard-coding, generalize the GHA workflow logic to select one build for testing. This should simplify future Dask-CUDA updates.

xref: rapidsai/build-planning#25

Authors:
- https://github.com/jakirkham

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#1318

NumPy 2 is expected to be released in the near future. For the RAPIDS 24.04 release, we will pin to `numpy>=1.23,<2.0a0`. This PR adds an upper bound to affected RAPIDS repositories.

xref: rapidsai/build-planning#29

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#1320

domcharrier · 2024-04-18T11:39:53Z

dask_cuda/utils.py

@@ -10,7 +32,11 @@
 from typing import Optional

 import numpy as np
-import pynvml
+from dask_cuda import DASK_USE_ROCM


@younseojava IMO You could simplify

    from dask_cuda import DASK_USE_ROCM
    if DASK_USE_ROCM:
        from pyrsmi import rocml as pynvml
    else:
        import pynvml

to

    from pyrsmi import rocml as pynvml

since users with a CUDA device will likely use the rapidsai/dask_cuda package, and the NVIDIA rapidsai org is unlikely to accept a version that supports both backends.
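If both backends ever had to be supported from one code base after all, one alternative to the `DASK_USE_ROCM` flag would be to try the imports in order of preference. The helper below is only a sketch; `import_first_available` is a hypothetical name and not part of dask_cuda, pyrsmi, or pynvml.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_first_available(candidates: Iterable[str]) -> ModuleType:
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper for illustration only.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # package simply moves on to the next candidate.
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Intended use in dask_cuda/utils.py (assumes one of the two is installed):
# pynvml = import_first_available(["pyrsmi.rocml", "pynvml"])
```

For a ROCm-only fork, the single unconditional import suggested in the comment remains the simpler choice.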

domcharrier · 2024-04-18T11:45:36Z

dask_cuda/__init__.py

+import os
+
+
+def is_amd_gpu_available():


@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

domcharrier · 2024-04-18T11:45:44Z

dask_cuda/__init__.py

+        return False
+
+
+DASK_USE_ROCM = is_amd_gpu_available()


@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

domcharrier · 2024-04-18T11:45:52Z

dask_cuda/__init__.py

+
+
+DASK_USE_ROCM = is_amd_gpu_available()
+print("ROCM device found") if DASK_USE_ROCM else print("ROCM device not found")


@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.
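A minimal sketch of what the three comments above amount to. The body of `is_amd_gpu_available` is not visible in the hunks, so the probe below is a stand-in: it assumes the ROCm kernel driver exposes the `/dev/kfd` device node. `resolve_use_rocm` is a hypothetical helper, and using `logging` in place of `print` is a suggestion of this sketch, not something the PR does.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Assumption: /dev/kfd is the device node created by the ROCm kernel driver.
KFD_DEVICE_NODE = "/dev/kfd"


def is_amd_gpu_available(device_node: str = KFD_DEVICE_NODE) -> bool:
    """Stand-in probe; the PR's real implementation is not shown in the diff."""
    return os.path.exists(device_node)


def resolve_use_rocm(assume_rocm: bool = True) -> bool:
    """Return the value to use for DASK_USE_ROCM (hypothetical helper)."""
    if assume_rocm:
        # Reviewer's suggestion: a ROCm-only fork can skip detection entirely.
        return True
    found = is_amd_gpu_available()
    # Log instead of printing so that importing the package stays silent
    # unless the user enables debug logging.
    logger.debug("ROCm device %s", "found" if found else "not found")
    return found


DASK_USE_ROCM = resolve_use_rocm()
```

With the default `assume_rocm=True`, the module-level flag becomes a constant and every `if DASK_USE_ROCM:` branch elsewhere can be collapsed.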

domcharrier · 2024-04-18T11:59:28Z

dask_cuda/initialize.py

 import logging
 import os

 import click
-import numba.cuda
+from dask_cuda import DASK_USE_ROCM


@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.
So

    if DASK_USE_ROCM:
        from hip import hip as hiprt
    else:
        import numba.cuda

could be simplified to:

    from hip import hip as hiprt

domcharrier · 2024-04-18T11:59:55Z

dask_cuda/initialize.py

-            numba.cuda.current_context()
-        except numba.cuda.cudadrv.error.CudaSupportError:
-            pass
+    if DASK_USE_ROCM:


@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

domcharrier · 2024-04-18T12:00:35Z

dask_cuda/initialize.py

    else:
-        numba.cuda.current_context()
+        if int(os.environ.get("DASK_CUDA_TEST_SINGLE_GPU", "0")) != 0:


@younseojava Why is this check not relevant for the AMD GPU version?

Maybe I don't fully understand the purpose of the env variable. Also, with the HIP port of Numba, how could this function change? Could it be simplified further?
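For reference, the check in the hunk parses the variable as an integer and treats any non-zero value as "on". The sketch below isolates that logic so its behavior is easy to see; `single_gpu_test_mode` is a hypothetical name, and the purpose of the variable is not confirmed here beyond what its name suggests.

```python
import os
from typing import Mapping, Optional


def single_gpu_test_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the check from the diff: unset or "0" is off, other integers are on.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    # int() raises ValueError for non-integer strings such as "true".
    return int(env.get("DASK_CUDA_TEST_SINGLE_GPU", "0")) != 0
```

Because the expression does not depend on the GPU vendor, the same helper could in principle be called from both the ROCm and the CUDA branch.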

domcharrier · 2024-04-18T12:01:18Z

dask_cuda/local_cuda_cluster.py

@@ -399,7 +422,7 @@ def new_worker_spec(self):
                "plugins": {
                    CPUAffinity(
                        get_cpu_affinity(nvml_device_index(0, visible_devices))
-                    ),
+                    ) if not DASK_USE_ROCM else None,


@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

So:

) if False else None,
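One detail worth noting about the hunk: assuming `"plugins"` is a set literal, as the braces in the diff suggest, the conditional expression leaves `None` in the set on ROCm instead of omitting the entry. The sketch below uses a stand-in `CPUAffinity` class and a hypothetical `build_plugins` helper to show the difference.

```python
class CPUAffinity:
    """Stand-in for the real plugin class, for illustration only."""

    def __init__(self, cores):
        self.cores = cores


DASK_USE_ROCM = True

# Pattern from the diff: the set ends up containing None when ROCm is active.
plugins_with_none = {CPUAffinity([0, 1]) if not DASK_USE_ROCM else None}


def build_plugins(use_rocm, cores):
    """Hypothetical alternative: add the plugin only when it applies."""
    plugins = set()
    if not use_rocm:
        plugins.add(CPUAffinity(cores))
    return plugins


plugins_clean = build_plugins(DASK_USE_ROCM, [0, 1])
```

Whether a `None` entry is tolerated depends on how the worker consumes the plugin set, which is not shown in this diff; dropping the entry avoids the question.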

domcharrier · 2024-04-18T12:02:16Z

dask_cuda/tests/test_explicit_comms.py

@@ -4,6 +4,9 @@
 import signal
 import time
 from functools import partial
+import signal


@younseojava These imports repeat the three preceding ones.

domcharrier · 2024-04-18T12:03:55Z

dask_cuda/utils.py

@@ -1,3 +1,25 @@
+# Apache License
+#
+# Copyright (c) 2023 Advanced Micro Devices, Inc.


@younseojava

Should be Modifications Copyright (c) 2024 ...

Prepend the NVIDIA copyright.

domcharrier · 2024-04-18T12:04:04Z

dask_cuda/local_cuda_cluster.py

@@ -1,3 +1,25 @@
+# Apache License
+#
+# Copyright (c) 2023 Advanced Micro Devices, Inc.


@younseojava

Should be Modifications Copyright (c) 2024 ...

Prepend the NVIDIA copyright.

domcharrier · 2024-04-18T12:04:15Z

dask_cuda/initialize.py

@@ -1,8 +1,34 @@
+# Apache License
+#
+# Copyright (c) 2023 Advanced Micro Devices, Inc.


@younseojava

Should be Modifications Copyright (c) 2024 ...

Prepend the NVIDIA copyright.

domcharrier · 2024-04-18T12:04:29Z

dask_cuda/__init__.py

@@ -1,3 +1,25 @@
+# Apache License
+#
+# Copyright (c) 2023 Advanced Micro Devices, Inc.


@younseojava

Should be Modifications Copyright (c) 2024 ...

Prepend the NVIDIA copyright.

domcharrier · 2024-04-18T12:06:43Z

ci/gpu/build.sh

@@ -0,0 +1,106 @@
+#!/bin/bash
+# Copyright (c) 2018, NVIDIA CORPORATION.


@younseojava

The line "Modifications Copyright (c) 2024 Advanced Micro Devices, Inc." is missing.

The AMD license is missing as well (if it differs). Ideally, include the original dask_cuda license below the NVIDIA copyright note.

domcharrier · 2024-04-18T12:07:06Z

build_rocm.sh

@@ -0,0 +1,33 @@
+# Apache License
+#
+# Copyright (c) 2023 Advanced Micro Devices, Inc.


@younseojava The copyright year should be 2024.

ci/build_docs.sh

younseojava and others added 30 commits January 25, 2024 18:43

initial update for rocm

875c474

fix rocm check logic

83d65e2

Add licenses for scan

8da7bee

correct license date

b90c9ce

disable cpuAffinity check for rocm

caaebb5

disable CPUAffinity plugin for ROCM

d912002

add AMD license for modified file

51d9b50

REL v0.10.0 release

606cd40

Add ucx-py dependency to CI (rapidsai#212)

4dc2117

* Add ucx-py dependency to CI

REL v0.11.0 release

5ffc405

REL v0.12.0 release

4071cde

REL v0.13.0 release

7f08e60

REL v0.14.0 release

44e898c

REL v0.14.1 release

478b410

REL v0.15.0 release

6b559b3

REL v0.16.0 release

ace3638

REL v0.17.0 release

62a8a77

REL v0.18.0 release

6ea78f4

REL v0.19.0 release

2ad55e6

REL v21.06.00 release

a65d9ef

REL v21.08.00 release

b2f67c2

REL v21.10.00 release

195dc5f

REL v21.12.00 release

59633ec

REL v22.02.00 release

4ef4b16

REL v22.04.00 release

3f60f46

REL v22.06.00 release

9301b90

REL v22.08.00 release

5afe037

update changelog

dcc61eb

REL v22.10.00 release

aa9f676

REL v22.12.00 release

3ca93a5

jakirkham and others added 11 commits April 15, 2024 22:21

Use conda env create --yes instead of --force. (rapidsai#1326)

1b40bc1

REL v24.04.00 release

f29f21e

Add licenses for scan

2c25188

add rocm version of distributed

b3323c3

update packages: pyrsmi, distributed; disable rapids-dask-dependency

4f1b0ec

move rocm device check from distributed to dask_cuda

b946557

add build script

002780f

merge upstream branch-24.6

7fe7624

fix file merge error

ddafa37

domcharrier self-assigned this Apr 18, 2024

domcharrier marked this pull request as draft April 18, 2024 11:31


fix merge errors and license clauses

ce4f393
