Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

**NEVER MERGE!** Comparison: rocm-dev vs branch-24.06 #1

Draft
wants to merge 79 commits into
base: branch-24.06
Choose a base branch
from

Conversation

domcharrier
Copy link


WARNING: This PR exists only to compare the hipified rocm-dev branch vs. rapidsai/dask_cuda branch branch-24.06. Do not merge it!


jakirkham and others added 11 commits April 15, 2024 22:21
To eliminate hard-coding, generalize the GHA workflow logic to select one build for testing. This should simplify future Dask-CUDA updates.

xref: rapidsai/build-planning#25

Authors:
  - https://github.com/jakirkham

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#1318
NumPy 2 is expected to be released in the near future. For the RAPIDS 24.04 release, we will pin to `numpy>=1.23,<2.0a0`. This PR adds an upper bound to affected RAPIDS repositories.

xref: rapidsai/build-planning#29

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#1320
@domcharrier domcharrier self-assigned this Apr 18, 2024
@domcharrier domcharrier marked this pull request as draft April 18, 2024 11:31
@@ -10,7 +32,11 @@
from typing import Optional

import numpy as np
import pynvml
from dask_cuda import DASK_USE_ROCM
Copy link
Author

@domcharrier domcharrier Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava IMO You could simplify

from dask_cuda import DASK_USE_ROCM
if DASK_USE_ROCM:
    from pyrsmi import rocml as pynvml
else:
    import pynvml

to

from pyrsmi import rocml as pynvml

as users with CUDA device will likely use the rapidsai/dask_cuda package and the NVIDIA rapidsai org will likely not accept a version that supports both backends.

import os


def is_amd_gpu_available():
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

return False


DASK_USE_ROCM = is_amd_gpu_available()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.



DASK_USE_ROCM = is_amd_gpu_available()
print("ROCM device found") if DASK_USE_ROCM else print("ROCM device not found")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

import logging
import os

import click
import numba.cuda
from dask_cuda import DASK_USE_ROCM
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.
So

if DASK_USE_ROCM:
    from hip import hip as hiprt
else:
    import numba.cuda

could be simplified to:

from hip import hip as hiprt

numba.cuda.current_context()
except numba.cuda.cudadrv.error.CudaSupportError:
pass
if DASK_USE_ROCM:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

else:
numba.cuda.current_context()
if int(os.environ.get("DASK_CUDA_TEST_SINGLE_GPU", "0")) != 0:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava Why is this check not relevant for the AMD GPU version?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I don't fully understand the purpose of the env variable. Also, with hip port of numba, how can this function be changed? can it be further simplified?

@@ -399,7 +422,7 @@ def new_worker_spec(self):
"plugins": {
CPUAffinity(
get_cpu_affinity(nvml_device_index(0, visible_devices))
),
) if not DASK_USE_ROCM else None,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava IMO you could assume that an AMD GPU is available when someone uses this code.

So:

) if False else None,

@@ -4,6 +4,9 @@
import signal
import time
from functools import partial
import signal
Copy link
Author

@domcharrier domcharrier Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava These imports are a repetition of the previous 3.

@@ -1,3 +1,25 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
Copy link
Author

@domcharrier domcharrier Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava

  • Should be Modifications Copyright (c) 2024 ...
  • Prepend the NVIDIA copyright.

@@ -1,3 +1,25 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
Copy link
Author

@domcharrier domcharrier Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava

  • Should be Modifications Copyright (c) 2024 ...
  • Prepend the NVIDIA copyright.

@@ -1,8 +1,34 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
Copy link
Author

@domcharrier domcharrier Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava

  • Should be Modifications Copyright (c) 2024 ...
  • Prepend the NVIDIA copyright.

@@ -1,3 +1,25 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
Copy link
Author

@domcharrier domcharrier Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava

  • Should be Modifications Copyright (c) 2024 ...
  • Prepend the NVIDIA copyright.

ci/gpu/build.sh Outdated
@@ -0,0 +1,106 @@
#!/bin/bash
# Copyright (c) 2018, NVIDIA CORPORATION.
Copy link
Author

@domcharrier domcharrier Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@younseojava

  • Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. missing.
  • AMD License missing (if different). Ideally include the original dask_cuda license below the NVIDIA copyright note.

build_rocm.sh Outdated
@@ -0,0 +1,33 @@
# Apache License
#
# Copyright (c) 2023 Advanced Micro Devices, Inc.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ci/build_docs.sh Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.