Merge pull request #888 from rapidsai/branch-22.04

[RELEASE] dask-cuda v22.04
rapidsai · Apr 6, 2022 · 29a8e29 · 29a8e29
2 parents a666e9b + 61322f3
commit 29a8e29
Show file tree

Hide file tree

Showing 34 changed files with 809 additions and 875 deletions.
diff --git a/.github/ops-bot.yaml b/.github/ops-bot.yaml
@@ -0,0 +1,8 @@
+# This file controls which features from the `ops-bot` repository below are enabled.
+# - https://github.com/rapidsai/ops-bot
+
+auto_merger: true
+branch_checker: true
+label_checker: true
+release_drafter: true
+external_contributors: false
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,35 +1,72 @@
+# dask-cuda 22.04.00 (6 Apr 2022)
+
+## 🚨 Breaking Changes
+
+- Introduce incompatible-types and enables spilling of CuPy arrays ([#856](https://github.com/rapidsai/dask-cuda/pull/856)) [@madsbk](https://github.com/madsbk)
+
+## 🐛 Bug Fixes
+
+- Resolve build issues / consistency with conda-forge packages ([#883](https://github.com/rapidsai/dask-cuda/pull/883)) [@charlesbluca](https://github.com/charlesbluca)
+- Increase test_worker_force_spill_to_disk timeout ([#857](https://github.com/rapidsai/dask-cuda/pull/857)) [@pentschev](https://github.com/pentschev)
+
+## 📖 Documentation
+
+- Remove description from non-existing `--nprocs` CLI argument ([#852](https://github.com/rapidsai/dask-cuda/pull/852)) [@pentschev](https://github.com/pentschev)
+
+## 🚀 New Features
+
+- Add --pre-import/pre_import argument ([#854](https://github.com/rapidsai/dask-cuda/pull/854)) [@pentschev](https://github.com/pentschev)
+- Remove support for UCX &lt; 1.11.1 ([#830](https://github.com/rapidsai/dask-cuda/pull/830)) [@pentschev](https://github.com/pentschev)
+
+## 🛠️ Improvements
+
+- Raise `ImportError` when platform is not Linux ([#885](https://github.com/rapidsai/dask-cuda/pull/885)) [@pentschev](https://github.com/pentschev)
+- Temporarily disable new `ops-bot` functionality ([#880](https://github.com/rapidsai/dask-cuda/pull/880)) [@ajschmidt8](https://github.com/ajschmidt8)
+- Pin `dask` &amp; `distributed` ([#878](https://github.com/rapidsai/dask-cuda/pull/878)) [@galipremsagar](https://github.com/galipremsagar)
+- Upgrade min `dask` &amp; `distributed` versions ([#872](https://github.com/rapidsai/dask-cuda/pull/872)) [@galipremsagar](https://github.com/galipremsagar)
+- Add `.github/ops-bot.yaml` config file ([#871](https://github.com/rapidsai/dask-cuda/pull/871)) [@ajschmidt8](https://github.com/ajschmidt8)
+- Make Dask CUDA work with the new WorkerMemoryManager abstraction ([#870](https://github.com/rapidsai/dask-cuda/pull/870)) [@shwina](https://github.com/shwina)
+- Implement ProxifyHostFile.evict() ([#862](https://github.com/rapidsai/dask-cuda/pull/862)) [@madsbk](https://github.com/madsbk)
+- Introduce incompatible-types and enables spilling of CuPy arrays ([#856](https://github.com/rapidsai/dask-cuda/pull/856)) [@madsbk](https://github.com/madsbk)
+- Spill to disk clean up ([#853](https://github.com/rapidsai/dask-cuda/pull/853)) [@madsbk](https://github.com/madsbk)
+- ProxyObject to support matrix multiplication ([#849](https://github.com/rapidsai/dask-cuda/pull/849)) [@madsbk](https://github.com/madsbk)
+- Unpin max dask and distributed ([#847](https://github.com/rapidsai/dask-cuda/pull/847)) [@galipremsagar](https://github.com/galipremsagar)
+- test_gds: skip if GDS is not available ([#845](https://github.com/rapidsai/dask-cuda/pull/845)) [@madsbk](https://github.com/madsbk)
+- ProxyObject implement __array_function__ ([#843](https://github.com/rapidsai/dask-cuda/pull/843)) [@madsbk](https://github.com/madsbk)
+- Add option to track RMM allocations ([#842](https://github.com/rapidsai/dask-cuda/pull/842)) [@shwina](https://github.com/shwina)
+
 # dask-cuda 22.02.00 (2 Feb 2022)
 
 ## 🐛 Bug Fixes
 
-- Ignoe `DepecationWaning` fom `distutils.Vesion` classes ([#823](https://github.com/rapidsai/dask-cuda/pull/823)) [@pentschev](https://github.com/pentschev)
-- Handle explicitly disabled UCX tanspots ([#820](https://github.com/rapidsai/dask-cuda/pull/820)) [@pentschev](https://github.com/pentschev)
-- Fix egex patten to match to in test_on_demand_debug_info ([#819](https://github.com/rapidsai/dask-cuda/pull/819)) [@pentschev](https://github.com/pentschev)
+- Ignore `DeprecationWarning` from `distutils.Version` classes ([#823](https://github.com/rapidsai/dask-cuda/pull/823)) [@pentschev](https://github.com/pentschev)
+- Handle explicitly disabled UCX transports ([#820](https://github.com/rapidsai/dask-cuda/pull/820)) [@pentschev](https://github.com/pentschev)
+- Fix regex pattern to match to in test_on_demand_debug_info ([#819](https://github.com/rapidsai/dask-cuda/pull/819)) [@pentschev](https://github.com/pentschev)
 - Fix skipping GDS test if cucim is not installed ([#813](https://github.com/rapidsai/dask-cuda/pull/813)) [@pentschev](https://github.com/pentschev)
-- Unpin Dask and Distibuted vesions ([#810](https://github.com/rapidsai/dask-cuda/pull/810)) [@pentschev](https://github.com/pentschev)
+- Unpin Dask and Distributed versions ([#810](https://github.com/rapidsai/dask-cuda/pull/810)) [@pentschev](https://github.com/pentschev)
 - Update to UCX-Py 0.24 ([#805](https://github.com/rapidsai/dask-cuda/pull/805)) [@pentschev](https://github.com/pentschev)
 
 ## 📖 Documentation
 
-- Fix Dask-CUDA vesion to 22.02 ([#835](https://github.com/rapidsai/dask-cuda/pull/835)) [@jakikham](https://github.com/jakikham)
-- Mege banch-21.12 into banch-22.02 ([#829](https://github.com/rapidsai/dask-cuda/pull/829)) [@pentschev](https://github.com/pentschev)
-- Claify `LocalCUDACluste`&#39;s `n_wokes` docstings ([#812](https://github.com/rapidsai/dask-cuda/pull/812)) [@pentschev](https://github.com/pentschev)
+- Fix Dask-CUDA version to 22.02 ([#835](https://github.com/rapidsai/dask-cuda/pull/835)) [@jakirkham](https://github.com/jakirkham)
+- Merge branch-21.12 into branch-22.02 ([#829](https://github.com/rapidsai/dask-cuda/pull/829)) [@pentschev](https://github.com/pentschev)
+- Clarify `LocalCUDACluster`&#39;s `n_workers` docstrings ([#812](https://github.com/rapidsai/dask-cuda/pull/812)) [@pentschev](https://github.com/pentschev)
 
-## 🚀 New Featues
+## 🚀 New Features
 
-- Pin `dask` &amp; `distibuted` vesions ([#832](https://github.com/rapidsai/dask-cuda/pull/832)) [@galipemsaga](https://github.com/galipemsaga)
-- Expose mm-maximum_pool_size agument ([#827](https://github.com/rapidsai/dask-cuda/pull/827)) [@VibhuJawa](https://github.com/VibhuJawa)
-- Simplify UCX configs, pemitting UCX_TLS=all ([#792](https://github.com/rapidsai/dask-cuda/pull/792)) [@pentschev](https://github.com/pentschev)
+- Pin `dask` &amp; `distributed` versions ([#832](https://github.com/rapidsai/dask-cuda/pull/832)) [@galipremsagar](https://github.com/galipremsagar)
+- Expose rmm-maximum_pool_size argument ([#827](https://github.com/rapidsai/dask-cuda/pull/827)) [@VibhuJawa](https://github.com/VibhuJawa)
+- Simplify UCX configs, permitting UCX_TLS=all ([#792](https://github.com/rapidsai/dask-cuda/pull/792)) [@pentschev](https://github.com/pentschev)
 
-## 🛠️ Impovements
+## 🛠️ Improvements
 
-- Add avg and std calculation fo time and thoughput ([#828](https://github.com/rapidsai/dask-cuda/pull/828)) [@quasiben](https://github.com/quasiben)
-- sizeof test: incease toleance ([#825](https://github.com/rapidsai/dask-cuda/pull/825)) [@madsbk](https://github.com/madsbk)
-- Quey UCX-Py fom gpuCI vesioning sevice ([#818](https://github.com/rapidsai/dask-cuda/pull/818)) [@pentschev](https://github.com/pentschev)
-- Standadize Distibuted config sepaato in get_ucx_config ([#806](https://github.com/rapidsai/dask-cuda/pull/806)) [@pentschev](https://github.com/pentschev)
-- Fixed `PoxyObject.__del__` to use the new Disk IO API fom #791 ([#802](https://github.com/rapidsai/dask-cuda/pull/802)) [@madsbk](https://github.com/madsbk)
-- GPUDiect Stoage (GDS) suppot fo spilling ([#793](https://github.com/rapidsai/dask-cuda/pull/793)) [@madsbk](https://github.com/madsbk)
-- Disk IO inteface ([#791](https://github.com/rapidsai/dask-cuda/pull/791)) [@madsbk](https://github.com/madsbk)
+- Add avg and std calculation for time and throughput ([#828](https://github.com/rapidsai/dask-cuda/pull/828)) [@quasiben](https://github.com/quasiben)
+- sizeof test: increase tolerance ([#825](https://github.com/rapidsai/dask-cuda/pull/825)) [@madsbk](https://github.com/madsbk)
+- Query UCX-Py from gpuCI versioning service ([#818](https://github.com/rapidsai/dask-cuda/pull/818)) [@pentschev](https://github.com/pentschev)
+- Standardize Distributed config separator in get_ucx_config ([#806](https://github.com/rapidsai/dask-cuda/pull/806)) [@pentschev](https://github.com/pentschev)
+- Fixed `ProxyObject.__del__` to use the new Disk IO API from #791 ([#802](https://github.com/rapidsai/dask-cuda/pull/802)) [@madsbk](https://github.com/madsbk)
+- GPUDirect Storage (GDS) support for spilling ([#793](https://github.com/rapidsai/dask-cuda/pull/793)) [@madsbk](https://github.com/madsbk)
+- Disk IO interface ([#791](https://github.com/rapidsai/dask-cuda/pull/791)) [@madsbk](https://github.com/madsbk)
 
 # dask-cuda 21.12.00 (9 Dec 2021)
 

diff --git a/conda/recipes/dask-cuda/meta.yaml b/conda/recipes/dask-cuda/meta.yaml
@@ -27,11 +27,12 @@ requirements:
     - setuptools
   run:
     - python
-    - dask>=2021.11.1,<=2022.01.0
-    - distributed>=2021.11.1,<=2022.01.0
-    - pynvml >=8.0.3
-    - numpy >=1.16.0
-    - numba >=0.53.1
+    - dask==2022.03.0
+    - distributed==2022.03.0
+    - pynvml>=11.0.0
+    - numpy>=1.16.0
+    - numba>=0.53.1
+    - click==8.0.4
 
 test:
   imports:

diff --git a/dask_cuda/__init__.py b/dask_cuda/__init__.py
@@ -1,3 +1,9 @@
+import sys
+
+if sys.platform != "linux":
+    raise ImportError("Only Linux is supported by Dask-CUDA at this time")
+
+
 import dask
 import dask.dataframe.core
 import dask.dataframe.shuffle

diff --git a/dask_cuda/benchmarks/utils.py b/dask_cuda/benchmarks/utils.py
@@ -116,13 +116,6 @@ def parse_benchmark_args(description="Generic dask-cuda Benchmark", args_list=[]
         dest="enable_rdmacm",
         help="Disable RDMACM with UCX.",
     )
-    parser.add_argument(
-        "--ucx-net-devices",
-        default=None,
-        type=str,
-        help="The device to be used for UCX communication, or 'auto'. "
-        "Ignored if protocol is 'tcp'",
-    )
     parser.add_argument(
         "--interface",
         default=None,
@@ -195,7 +188,6 @@ def parse_benchmark_args(description="Generic dask-cuda Benchmark", args_list=[]
 
 def get_cluster_options(args):
     ucx_options = {
-        "ucx_net_devices": args.ucx_net_devices,
         "enable_tcp_over_ucx": args.enable_tcp_over_ucx,
         "enable_infiniband": args.enable_infiniband,
         "enable_nvlink": args.enable_nvlink,

diff --git a/dask_cuda/cli/dask_cuda_worker.py b/dask_cuda/cli/dask_cuda_worker.py
@@ -43,9 +43,7 @@
     type=str,
     default=None,
     help="""A unique name for the worker. Can be a string (like ``"worker-1"``) or
-    ``None`` for a nameless worker. If used with ``--nprocs``, then the process number
-    will be appended to the worker name, e.g. ``"worker-1-0"``, ``"worker-1-1"``,
-    ``"worker-1-2"``.""",
+    ``None`` for a nameless worker.""",
 )
 @click.option(
     "--memory-limit",
@@ -119,16 +117,22 @@
         Logging will only be enabled if ``--rmm-pool-size`` or ``--rmm-managed-memory``
         are specified.""",
 )
+@click.option(
+    "--rmm-track-allocations/--no-rmm-track-allocations",
+    default=False,
+    show_default=True,
+    help="""Track memory allocations made by RMM. If ``True``, wraps the memory
+    resource of each worker with a ``rmm.mr.TrackingResourceAdaptor`` that
+    allows querying the amount of memory allocated by RMM.""",
+)
 @click.option(
     "--pid-file", type=str, default="", help="File to write the process PID.",
 )
 @click.option(
     "--resources",
     type=str,
     default="",
-    help="""Resources for task constraints like ``"GPU=2 MEM=10e9"``. Resources are
-    applied separately to each worker process (only relevant when starting multiple
-    worker processes with ``--nprocs``).""",
+    help="""Resources for task constraints like ``"GPU=2 MEM=10e9"``.""",
 )
 @click.option(
     "--dashboard/--no-dashboard",
@@ -248,21 +252,6 @@
     help="""Set environment variables to enable UCX RDMA connection manager support,
     requires ``--enable-infiniband``.""",
 )
-@click.option(
-    "--net-devices",
-    type=str,
-    default=None,
-    help="""Interface(s) used by workers for UCX communication. Can be a string (like
-    ``"eth0"`` for NVLink or ``"mlx5_0:1"``/``"ib0"`` for InfiniBand), ``"auto"``
-    (requires ``--enable-infiniband``) to pick the optimal interface per-worker based on
-    the system's topology, or ``None`` to stay with the default value of ``"all"`` (use
-    all available interfaces).
-
-    .. warning::
-        ``"auto"`` requires UCX-Py to be installed and compiled with hwloc support.
-        Unexpected errors can occur when using ``"auto"`` if any interfaces are
-        disconnected or improperly configured.""",
-)
 @click.option(
     "--enable-jit-unspill/--disable-jit-unspill",
     default=None,
@@ -281,6 +270,13 @@
     help="""Use a different class than Distributed's default (``distributed.Worker``)
     to spawn ``distributed.Nanny``.""",
 )
+@click.option(
+    "--pre-import",
+    default=None,
+    help="""Pre-import libraries as a Worker plugin to prevent long import times
+    bleeding through later Dask operations. Should be a list of comma-separated names,
+    such as "cudf,rmm".""",
+)
 def main(
     scheduler,
     host,
@@ -293,6 +289,7 @@ def main(
     rmm_managed_memory,
     rmm_async,
     rmm_log_directory,
+    rmm_track_allocations,
     pid_file,
     resources,
     dashboard,
@@ -310,9 +307,9 @@ def main(
     enable_infiniband,
     enable_nvlink,
     enable_rdmacm,
-    net_devices,
     enable_jit_unspill,
     worker_class,
+    pre_import,
     **kwargs,
 ):
     if tls_ca_file and tls_cert and tls_key:
@@ -344,6 +341,7 @@ def main(
         rmm_managed_memory,
         rmm_async,
         rmm_log_directory,
+        rmm_track_allocations,
         pid_file,
         resources,
         dashboard,
@@ -359,9 +357,9 @@ def main(
         enable_infiniband,
         enable_nvlink,
         enable_rdmacm,
-        net_devices,
         enable_jit_unspill,
         worker_class,
+        pre_import,
         **kwargs,
     )
 

diff --git a/dask_cuda/cuda_worker.py b/dask_cuda/cuda_worker.py
@@ -17,37 +17,24 @@
     enable_proctitle_on_children,
     enable_proctitle_on_current,
 )
-from distributed.worker import parse_memory_limit
+from distributed.worker_memory import parse_memory_limit
 
 from .device_host_file import DeviceHostFile
 from .initialize import initialize
 from .proxify_host_file import ProxifyHostFile
 from .utils import (
     CPUAffinity,
+    PreImport,
     RMMSetup,
-    _ucx_111,
     cuda_visible_devices,
     get_cpu_affinity,
     get_n_gpus,
     get_ucx_config,
-    get_ucx_net_devices,
     nvml_device_index,
     parse_device_memory_limit,
 )
 
 
-def _get_interface(interface, host, cuda_device_index, ucx_net_devices):
-    if host:
-        return None
-    else:
-        return interface or get_ucx_net_devices(
-            cuda_device_index=cuda_device_index,
-            ucx_net_devices=ucx_net_devices,
-            get_openfabrics=False,
-            get_network=True,
-        )
-
-
 class CUDAWorker(Server):
     def __init__(
         self,
@@ -62,6 +49,7 @@ def __init__(
         rmm_managed_memory=False,
         rmm_async=False,
         rmm_log_directory=None,
+        rmm_track_allocations=False,
         pid_file=None,
         resources=None,
         dashboard=True,
@@ -77,9 +65,9 @@ def __init__(
         enable_infiniband=None,
         enable_nvlink=None,
         enable_rdmacm=None,
-        net_devices=None,
         jit_unspill=None,
         worker_class=None,
+        pre_import=None,
         **kwargs,
     ):
         # Required by RAPIDS libraries (e.g., cuDF) to ensure no context
@@ -170,16 +158,6 @@ def del_pid_file():
                 "RMM managed memory and NVLink are currently incompatible."
             )
 
-        if _ucx_111 and net_devices == "auto":
-            warnings.warn(
-                "Starting with UCX 1.11, `ucx_net_devices='auto' is deprecated, "
-                "it should now be left unspecified for the same behavior. "
-                "Please make sure to read the updated UCX Configuration section in "
-                "https://docs.rapids.ai/api/dask-cuda/nightly/ucx.html, "
-                "where significant performance considerations for InfiniBand with "
-                "UCX 1.11 and above is documented.",
-            )
-
         # Ensure this parent dask-cuda-worker process uses the same UCX
         # configuration as child worker processes created by it.
         initialize(
@@ -188,8 +166,6 @@ def del_pid_file():
             enable_infiniband=enable_infiniband,
             enable_nvlink=enable_nvlink,
             enable_rdmacm=enable_rdmacm,
-            net_devices=net_devices,
-            cuda_device_index=0,
         )
 
         if jit_unspill is None:
@@ -232,7 +208,7 @@ def del_pid_file():
                 loop=loop,
                 resources=resources,
                 memory_limit=memory_limit,
-                interface=_get_interface(interface, host, i, net_devices),
+                interface=interface,
                 host=host,
                 preload=(list(preload) or []) + ["dask_cuda.initialize"],
                 preload_argv=(list(preload_argv) or []) + ["--create-cuda-context"],
@@ -248,7 +224,9 @@ def del_pid_file():
                         rmm_managed_memory,
                         rmm_async,
                         rmm_log_directory,
+                        rmm_track_allocations,
                     ),
+                    PreImport(pre_import),
                 },
                 name=name if nprocs == 1 or name is None else str(name) + "-" + str(i),
                 local_directory=local_directory,
@@ -258,8 +236,6 @@ def del_pid_file():
                         enable_infiniband=enable_infiniband,
                         enable_nvlink=enable_nvlink,
                         enable_rdmacm=enable_rdmacm,
-                        net_devices=net_devices,
-                        cuda_device_index=i,
                     )
                 },
                 data=data(nvml_device_index(i, cuda_visible_devices(i))),