diff --git a/docs/source/examples/best-practices.rst b/docs/source/examples/best-practices.rst
index d6cc71899..03e26bf5b 100644
--- a/docs/source/examples/best-practices.rst
+++ b/docs/source/examples/best-practices.rst
@@ -49,8 +49,8 @@ Spilling from Device
 
 Dask-CUDA offers several different ways to enable automatic spilling from device memory.
 The best method often depends on the specific workflow. For classic ETL workloads with
-`Dask cuDF `_, cuDF spilling is usually
-the best place to start. See `spilling`_ for more details.
+`Dask-cuDF `_, cuDF spilling is usually the
+best place to start. See :ref:`Spilling from device ` for more details.
 
 Accelerated Networking
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/source/spilling.rst b/docs/source/spilling.rst
index 066284fa6..04cfa05c3 100644
--- a/docs/source/spilling.rst
+++ b/docs/source/spilling.rst
@@ -1,3 +1,5 @@
+.. _spilling-from-device:
+
 Spilling from device
 ====================
 
@@ -110,7 +112,7 @@ to enable compatibility mode, which automatically calls ``unproxy()`` on all fun
 cuDF Spilling
 -------------
 
-When executing a `Dask cuDF `_
+When executing a `Dask-cuDF `_
 (i.e. Dask DataFrame) ETL workflow, it is usually best to leverage `native spilling support in cuDF
 `.
 
@@ -145,14 +147,23 @@ Statistics
 ~~~~~~~~~~
 
 When cuDF spilling is enabled, it is also possible to have cuDF collect basic
-spill statistics. This information can be a useful way to understand the
-performance of Dask cuDF workflows with high memory utilization:
+spill statistics. Collecting this information can be a useful way to understand
+the performance of Dask-cuDF workflows with high memory utilization.
+
+When deploying a ``LocalCUDACluster``, cuDF spill statistics can be enabled with
+the ``cudf_spill_stats`` argument (alongside ``enable_cudf_spill=True``):
+
+.. code-block::
+
+    >>> cluster = LocalCUDACluster(n_workers=10, enable_cudf_spill=True, cudf_spill_stats=1)
+
+The same applies to ``dask cuda worker``:
 
 .. code-block::
 
     $ dask cuda worker --enable-cudf-spill --cudf-spill-stats 1
 
-To have each dask-cuda worker print spill statistics, do something like:
+To have each dask-cuda worker print spill statistics within the workflow, do something like:
 
 .. code-block::
 
@@ -161,11 +172,14 @@ To have each dask-cuda worker print spill statistics, do something like:
         print(get_global_manager().statistics)
     client.submit(spill_info)
 
+See the `cuDF spilling documentation
+`_
+for more information on the available spill-statistics options.
 
 Limitations
 ~~~~~~~~~~~
 
-Although cuDF spilling is the best option for most Dask cuDF ETL workflows,
+Although cuDF spilling is the best option for most Dask-cuDF ETL workflows,
 it will be much less effective if that workflow converts between ``cudf.DataFrame``
 and other data formats (e.g. ``cupy.ndarray``). Once the underlying device buffers
 are "exposed" to external memory references, they become "unspillable" by cuDF.
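A note on the unchanged statistics snippet that ends in ``client.submit(spill_info)``: the surrounding prose says "each dask-cuda worker", but ``Client.submit`` schedules a single task, so the function runs on one worker only. ``Client.run`` executes a function on every worker and returns a dict keyed by worker address. A minimal sketch of that per-worker variant follows, assuming the workers were started with cuDF spilling and spill statistics enabled (for example ``enable_cudf_spill=True`` and ``cudf_spill_stats=1``). The helper name ``collect_spill_stats`` is hypothetical, and the statistics are returned as strings instead of printed so the client can inspect them.

```python
# Sketch: gather cuDF spill statistics from every worker of a running cluster.
# Assumes the dask-cuda workers were started with cuDF spilling and spill
# statistics enabled (e.g. enable_cudf_spill=True, cudf_spill_stats=1).


def spill_info():
    """Runs on a worker: return that worker's cuDF spill statistics as text."""
    # Imported inside the function because it executes in the worker process.
    from cudf.core.buffer.spill_manager import get_global_manager

    manager = get_global_manager()
    if manager is None:
        # cuDF spilling is not enabled in this worker process.
        return None
    return str(manager.statistics)


def collect_spill_stats(client):
    """Hypothetical helper: run spill_info() once on every worker.

    Returns a dict mapping each worker address to its statistics string
    (or None for workers without cuDF spilling enabled).
    """
    # Client.run executes the function on all workers, whereas
    # Client.submit schedules a single task on one worker.
    return client.run(spill_info)
```

With a connected client this would be called as ``collect_spill_stats(client)``, for example after ``client = Client(LocalCUDACluster(enable_cudf_spill=True, cudf_spill_stats=1))``; actually running ``spill_info`` requires GPU workers with cuDF installed.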