Update tab styling on "10 Minutes to Dask" page (dask#10728)
jrbourbeau authored Dec 19, 2023
1 parent 2b5c65a commit 5561770
Showing 1 changed file with 44 additions and 28 deletions: docs/source/10-minutes-to-dask.rst
@@ -32,9 +32,11 @@ Creating a Dask Object
You can create a Dask object from scratch by supplying existing data and optionally
including information about how the chunks should be structured.

-.. tabs::
+.. tab-set::

-   .. group-tab:: DataFrame
+   .. tab-item:: DataFrame
+      :sync: dataframe

See :doc:`dataframe`.

@@ -84,7 +86,8 @@ including information about how the chunks should be structured.
2021-09-21 ... ...
Dask Name: blocks, 11 tasks
-   .. group-tab:: Array
+   .. tab-item:: Array
+      :sync: array

See :doc:`array`.

@@ -112,7 +115,8 @@ including information about how the chunks should be structured.
# access a particular block of data
a.blocks[1, 3]

-   .. group-tab:: Bag
+   .. tab-item:: Bag
+      :sync: bag

See :doc:`bag`.

@@ -131,9 +135,10 @@ Indexing

Indexing Dask collections feels just like slicing NumPy arrays or pandas DataFrames.

-.. tabs::
+.. tab-set::

-   .. group-tab:: DataFrame
+   .. tab-item:: DataFrame
+      :sync: dataframe

.. code-block:: python
@@ -156,13 +161,15 @@ Indexing Dask collections feels just like slicing NumPy arrays or pandas DataFra
2021-10-09 05:00:59.999999999 ... ...
Dask Name: loc, 11 tasks
-   .. group-tab:: Array
+   .. tab-item:: Array
+      :sync: array

      .. jupyter-execute::

         a[:50, 200]

-   .. group-tab:: Bag
+   .. tab-item:: Bag
+      :sync: bag

A Bag is an unordered collection allowing repeats. So it is like a list, but it doesn’t
guarantee an ordering among elements. There is no way to index Bags since they are
@@ -177,9 +184,10 @@ you ask for it. Instead, a Dask task graph for the computation is produced.

Anytime you have a Dask object and you want to get the result, call ``compute``:

-.. tabs::
+.. tab-set::

-   .. group-tab:: DataFrame
+   .. tab-item:: DataFrame
+      :sync: dataframe

.. code-block:: python
@@ -199,7 +207,8 @@ Anytime you have a Dask object and you want to get the result, call ``compute``:
[198 rows x 2 columns]
-   .. group-tab:: Array
+   .. tab-item:: Array
+      :sync: array

.. code-block:: python
@@ -211,7 +220,8 @@ Anytime you have a Dask object and you want to get the result, call ``compute``:
18200, 18700, 19200, 19700, 20200, 20700, 21200, 21700, 22200,
22700, 23200, 23700, 24200, 24700])
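The laziness-then-``compute`` pattern in this section can be seen in a few lines. A minimal sketch, assuming ``dask`` is installed; the array shape and chunk size are illustrative:

```python
import dask.array as da

a = da.ones((8, 8), chunks=4)
total = a.sum()           # no work yet: `total` is a lazy Dask object
print(type(total))        # a dask Array, not a number

result = total.compute()  # triggers execution of the task graph
print(result)             # 64.0
```

Until ``compute`` is called, ``total`` is only a description of work; the sum of the chunks is performed on demand.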
-   .. group-tab:: Bag
+   .. tab-item:: Bag
+      :sync: bag

.. code-block:: python
@@ -225,9 +235,10 @@ Methods
Dask collections match existing numpy and pandas methods, so they should feel familiar.
Call the method to set up the task graph, and then call ``compute`` to get the result.

-.. tabs::
+.. tab-set::

-   .. group-tab:: DataFrame
+   .. tab-item:: DataFrame
+      :sync: dataframe

.. code-block:: python
@@ -280,7 +291,8 @@ Call the method to set up the task graph, and then call ``compute`` to get the r
2021-10-09 05:00:00 161963
Freq: H, Name: a, Length: 198, dtype: int64
-   .. group-tab:: Array
+   .. tab-item:: Array
+      :sync: array

.. code-block:: python
@@ -332,7 +344,8 @@ Call the method to set up the task graph, and then call ``compute`` to get the r
array([100009, 99509, 99009, 98509, 98009, 97509, 97009, 96509,
96009, 95509])
-   .. group-tab:: Bag
+   .. tab-item:: Bag
+      :sync: bag

Dask Bag implements operations like ``map``, ``filter``, ``fold``, and
``groupby`` on collections of generic Python objects.
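Those Bag operations chain naturally. A minimal sketch, assuming ``dask`` is installed; the sequence and lambdas are illustrative:

```python
import dask.bag as db

b = db.from_sequence(range(10), npartitions=2)

result = (b.map(lambda x: x * 2)       # double every element
           .filter(lambda x: x > 10)   # keep only the large values
           .compute())                 # materialize as a plain list
print(result)  # [12, 14, 16, 18]
```

Like the other collections, a Bag pipeline builds a task graph lazily and only runs when ``compute`` is called.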
@@ -369,9 +382,10 @@ Visualize the Task Graph
So far we've been setting up computations and calling ``compute``. In addition to
triggering computation, we can inspect the task graph to figure out what's going on.

-.. tabs::
+.. tab-set::

-   .. group-tab:: DataFrame
+   .. tab-item:: DataFrame
+      :sync: dataframe

.. code-block:: python
@@ -391,7 +405,8 @@ triggering computation, we can inspect the task graph to figure out what's going
.. image:: images/10_minutes_dataframe_graph.png
:alt: Dask task graph for the Dask dataframe computation. The task graph shows a "loc" and "getitem" operations selecting a small section of the dataframe values, before applying a cumulative sum "cumsum" operation, then finally subtracting a value from the result.

-   .. group-tab:: Array
+   .. tab-item:: Array
+      :sync: array

.. code-block:: python
@@ -410,7 +425,8 @@ triggering computation, we can inspect the task graph to figure out what's going
.. image:: images/10_minutes_array_graph.png
:alt: Dask task graph for the Dask array computation. The task graph shows many "amax" operations on each chunk of the Dask array, that are then aggregated to find "amax" along the first array axis, then reversing the order of the array values with a "getitem" slicing operation, before an "add" operation to get the final result.
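``visualize`` renders graphs like the one in the image above, but it requires the Graphviz package. When Graphviz is not available, the raw graph can still be inspected programmatically through the collection protocol. A sketch, assuming ``dask`` is installed; the array shape and chunk size are illustrative:

```python
import dask.array as da

a = da.ones((8, 8), chunks=4)   # 4 chunks
total = a.sum()

# One task per chunk, plus per-chunk sums and aggregation steps
graph = dict(total.__dask_graph__())
print(len(graph))

# total.visualize(filename="graph.png")  # needs graphviz installed
```

Looking at the number and names of tasks is a quick way to sanity-check how much work a computation will schedule before running it.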

-   .. group-tab:: Bag
+   .. tab-item:: Bag
+      :sync: bag

.. code-block:: python
@@ -431,9 +447,9 @@ Low-Level Interfaces
Often when parallelizing existing code bases or building custom algorithms, you
run into code that is parallelizable, but isn't just a big DataFrame or array.

-.. tabs::
+.. tab-set::

-   .. group-tab:: Delayed: Lazy
+   .. tab-item:: Delayed: Lazy

:doc:`delayed` lets you wrap individual function calls into a lazily constructed task graph:

@@ -455,7 +471,7 @@ run into code that is parallelizable, but isn't just a big DataFrame or array.
c = c.compute() # This triggers all of the above computations
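A self-contained version of the delayed pattern above, assuming ``dask`` is installed; the function names are illustrative:

```python
import dask

@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def add(x, y):
    return x + y

a = inc(1)          # no work happens yet
b = inc(2)
c = add(a, b)       # just builds up a small task graph
answer = c.compute()  # only now do inc, inc, and add actually run
print(answer)       # 5
```

Because the graph is built before execution, Dask can run the two ``inc`` calls in parallel.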
-   .. group-tab:: Futures: Immediate
+   .. tab-item:: Futures: Immediate

Unlike the interfaces described so far, Futures are eager. Computation starts as soon
as the function is submitted (see :doc:`futures`).
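The eager behavior of futures can be sketched with an in-process client, assuming ``dask.distributed`` is installed (``processes=False`` keeps everything in one process for the demo):

```python
from dask.distributed import Client

client = Client(processes=False)  # lightweight in-process cluster

def inc(x):
    return x + 1

future = client.submit(inc, 10)  # starts running immediately
res = future.result()            # block until the result is ready
print(res)                       # 11

client.close()
```

Unlike ``delayed``, there is no separate ``compute`` step: ``submit`` schedules the work right away and ``result`` simply waits for it.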
@@ -500,9 +516,9 @@ If you want more control, use the distributed scheduler instead. Despite having
"distributed" in its name, the distributed scheduler works well
on both single and multiple machines. Think of it as the "advanced scheduler".

-.. tabs::
+.. tab-set::

-   .. group-tab:: Local
+   .. tab-item:: Local

This is how you set up a cluster that uses only your own computer.

@@ -514,7 +530,7 @@ on both single and multiple machines. Think of it as the "advanced scheduler".
... client
<Client: 'tcp://127.0.0.1:41703' processes=4 threads=12, memory=31.08 GiB>
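A runnable variant of the local-cluster setup above, assuming ``dask.distributed`` is installed; the worker counts are illustrative, and ``processes=False`` keeps the demo in a single process:

```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False)
client = Client(cluster)

# The client reports the workers it is connected to
n_workers = len(client.scheduler_info()["workers"])
print(n_workers)  # 2

client.close()
cluster.close()
```

Once a ``Client`` is active, subsequent ``compute`` calls on Dask collections are routed to this cluster instead of the default local threaded scheduler.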
-   .. group-tab:: Remote
+   .. tab-item:: Remote

This is how you connect to a cluster that is already running.

