Duck array documentation improvements #7911

Merged
merged 41 commits into from
Jun 29, 2023
Changes from 3 commits
0217fe3
draft updates
TomNicholas Jun 12, 2023
5a221bb
discuss array API standard
TomNicholas Jun 12, 2023
1971da4
fix sparse examples so they run
TomNicholas Jun 13, 2023
fa58fff
Deepak's suggestions
TomNicholas Jun 14, 2023
258dd54
link to duck arrays user guide from internals page
TomNicholas Jun 14, 2023
b26e7ac
fix various links
TomNicholas Jun 15, 2023
ad81811
itemized list
TomNicholas Jun 15, 2023
99394a3
mention dispatching on functions not in the array API standard
TomNicholas Jun 15, 2023
c93f143
examples of duckarrays
TomNicholas Jun 21, 2023
b6279fd
add intended audience to xarray internals section
TomNicholas Jun 21, 2023
0eea00b
move paragraph on why its called a duck array upwards
TomNicholas Jun 27, 2023
cc4fac0
delete section on numpy ufuncs
TomNicholas Jun 27, 2023
5e8015f
explain difference between .values and to_numpy
TomNicholas Jun 27, 2023
70bfda5
strongly prefer to_numpy over values
TomNicholas Jun 27, 2023
5fdb7e3
recommend to_numpy instead of values in the how do I? page
TomNicholas Jun 27, 2023
68315f8
clearer about using to_numpy
TomNicholas Jun 27, 2023
2931b86
merge section on missing features
TomNicholas Jun 27, 2023
9f21b00
remove todense from examples
TomNicholas Jun 27, 2023
2bb65d5
whatsnew
TomNicholas Jun 27, 2023
f0ba66c
Merge branch 'main' into duckarray-docs
TomNicholas Jun 27, 2023
0b405a1
double that
TomNicholas Jun 28, 2023
ed6195c
numpy array class clarification
TomNicholas Jun 28, 2023
40eb53b
Remove sentence about xarray's internals
TomNicholas Jun 28, 2023
a567aa4
array API standard
TomNicholas Jun 28, 2023
76237a9
proper link for sparse.COO type
TomNicholas Jun 28, 2023
1923d4b
links to docstrings of array types
TomNicholas Jun 28, 2023
b26cbd8
don't put variable in parentheses
TomNicholas Jun 28, 2023
f62b4a9
double backquote formatting
TomNicholas Jun 28, 2023
8d4bd3f
better bracketing
TomNicholas Jun 28, 2023
e9287de
fix list formatting
TomNicholas Jun 28, 2023
d1e9b8f
add links to glue packages, dask, and cubed
TomNicholas Jun 28, 2023
d545d5d
Merge branch 'duckarray-docs' of https://github.com/TomNicholas/xarra…
TomNicholas Jun 28, 2023
1ea2078
link to todense method
TomNicholas Jun 28, 2023
be919b6
link to numpy-like arrays page
TomNicholas Jun 28, 2023
0c0a547
Merge branch 'duckarray-docs' of https://github.com/TomNicholas/xarra…
TomNicholas Jun 28, 2023
d03e125
link to numpy ufunc docs
TomNicholas Jun 28, 2023
90a8bcb
add example of using .to_numpy
TomNicholas Jun 28, 2023
14057b9
show example of .values failing
TomNicholas Jun 28, 2023
45000e4
move whatsnew entry to unreleased version
TomNicholas Jun 28, 2023
da8719d
fix warning in docs build
TomNicholas Jun 28, 2023
08c0f84
trigger CI
TomNicholas Jun 29, 2023
30 changes: 25 additions & 5 deletions doc/internals/duck-arrays-integration.rst
Expand Up @@ -6,18 +6,38 @@ Integrating with duck arrays

.. warning::

This is an experimental feature.
This is an experimental feature. Please report any bugs or other difficulties on xarray's issue tracker.

Xarray can wrap custom :term:`duck array` objects as long as they define numpy's
``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
``__array_ufunc__`` and ``__array_function__`` methods.
Xarray can wrap custom numpy-like arrays (":term:`duck arrays <duck array>`") - see the user guide documentation.

Duck array requirements
~~~~~~~~~~~~~~~~~~~~~~~

Xarray does not explicitly check that the required methods are defined by the underlying duck array object before
attempting to wrap the given array. However, a wrapped array type should at a minimum support numpy's ``shape``,
``dtype`` and ``ndim`` properties, as well as the ``__array__``, ``__array_ufunc__`` and ``__array_function__`` methods.
The array ``shape`` property needs to obey numpy's broadcasting rules.
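
As a rough illustration of the requirements listed above, here is a minimal sketch of a class that exposes those properties and methods by delegating to a plain numpy array. ``WrappedArray`` and the ``_HANDLED`` set are hypothetical names invented for this example, not part of xarray or numpy, and the class only handles a couple of functions; a real duck array library would cover far more.

```python
import numpy as np


class WrappedArray:
    """Hypothetical minimal duck array that delegates to a numpy array."""

    # numpy functions this sketch chooses to support via __array_function__
    _HANDLED = {np.sum, np.mean}

    def __init__(self, data):
        self._data = np.asarray(data)

    # --- properties named in the requirements ---
    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    # --- conversion to a real numpy array ---
    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._data, dtype=dtype)

    # --- ufuncs such as np.add(a, 1) ---
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        unwrapped = [x._data if isinstance(x, WrappedArray) else x for x in inputs]
        return WrappedArray(ufunc(*unwrapped, **kwargs))

    # --- other numpy functions such as np.sum(a, axis=0) ---
    def __array_function__(self, func, types, args, kwargs):
        if func not in self._HANDLED:
            return NotImplemented
        unwrapped = [x._data if isinstance(x, WrappedArray) else x for x in args]
        result = func(*unwrapped, **kwargs)
        # re-wrap array results, pass scalars through unchanged
        return WrappedArray(result) if isinstance(result, np.ndarray) else result


a = WrappedArray([[1, 2], [3, 4]])
b = np.add(a, 1)          # dispatched through __array_ufunc__
s = np.sum(a, axis=0)     # dispatched through __array_function__
```

Both calls return ``WrappedArray`` instances rather than plain numpy arrays, which is the behaviour a wrapping library relies on to keep the duck array type intact through computations.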

Python Array API standard support
=================================

As an integration library, xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a
big supporter of the Python Array API Standard (link). In fact the crystallization of different array libraries' APIs towards
the standard has already helped xarray remove a lot of internal adapter code.

As such, we aim to support any array library that follows the standard out-of-the-box. However, xarray does occasionally
call some numpy functions which are not (yet) part of the standard (e.g. :py:meth:`DataArray.pad` calls ``np.pad``). (link to issue)

Review comment (Contributor): We can mention that we support dispatching on these through the other array protocols

Custom inline reprs
~~~~~~~~~~~~~~~~~~~

In certain situations (e.g. when printing the collapsed preview of
variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
in a single line, truncating it to a certain number of characters. If that
would drop too much information, the :term:`duck array` may define a
``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
argument:
argument

.. code:: python

Expand Down
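
The ``_repr_inline_`` hook described above can be sketched as follows. This is only an illustration under stated assumptions: ``TaggedArray`` and its ``tag`` attribute are hypothetical, and the truncation strategy (cut and append an ellipsis) is one possible choice, not something xarray prescribes.

```python
class TaggedArray:
    """Hypothetical duck array; only the repr-related parts are shown."""

    def __init__(self, values, tag):
        self.values = list(values)
        self.tag = tag

    def _repr_inline_(self, max_width):
        """Return a one-line summary of at most ``max_width`` characters."""
        full = f"[{self.tag}] " + " ".join(str(v) for v in self.values)
        if len(full) <= max_width:
            return full
        ellipsis = "..."
        if max_width <= len(ellipsis):
            # too narrow for an ellipsis, so just cut the text
            return full[:max_width]
        return full[: max_width - len(ellipsis)] + ellipsis


short = TaggedArray([1, 2, 3], "m")._repr_inline_(40)
long = TaggedArray(range(100), "m")._repr_inline_(20)
```

Keeping the most informative part (here the tag) at the start of the string means it survives truncation.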
180 changes: 166 additions & 14 deletions doc/user-guide/duckarrays.rst
Expand Up @@ -3,28 +3,172 @@
Working with numpy-like arrays
==============================

NumPy-like arrays (often known as :term:`duck arrays <duck array>`) are drop-in replacements for the :py:class:`numpy.ndarray`
class but with different features, such as propagating physical units or a different layout in memory.
Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the
additional features of these array libraries.

.. warning::

This feature should be considered experimental. Please report any bug you may find on
This feature should be considered somewhat experimental. Please report any bugs you find on
xarray’s github repository.

NumPy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
additional features, like propagating physical units or a different layout in memory.
.. note::

For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that
described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require
slightly different user code (e.g. calling ``.chunk`` or ``.compute``).

What is a numpy-like array?
---------------------------

A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key
numpy-like functionality such as indexing, broadcasting, and computation methods.

For example, the ``sparse`` library provides a sparse array type which is useful for representing sparse matrices
in a memory-efficient manner. We can create a sparse array object (of the ``sparse.COO`` type) from a numpy array like this:

.. ipython:: python

from sparse import COO

x = np.eye(4, dtype=np.uint8) # create diagonal identity matrix
s = COO.from_numpy(x)
s

This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements.
This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices).
It does mean that in order to clearly see what is stored in our sparse array object we have to convert it back to a
"dense" array using ``.todense``:

.. ipython:: python

s.todense()

Just like :py:class:`numpy.ndarray` objects, ``sparse.COO`` arrays support indexing,

.. ipython:: python

s[1, 1] # diagonal elements should be ones
s[2, 3] # off-diagonal elements should be zero

:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
broadcasting,

.. ipython:: python

# create second sparse array of different shape
x2 = np.zeros((4, 1), dtype=np.uint8)
s2 = COO.from_numpy(x2)
(s * s2).todense() # multiplication requires broadcasting

and various computation methods

.. ipython:: python

s.sum(axis=1).todense()

This numpy-like array also supports calling numpy functions (including so-called numpy ufuncs, link to numpy docs) on it directly:

.. ipython:: python

np.sum(s, axis=1).todense()


Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the
equivalent numpy array - this is the sense in which the sparse array is "numpy-like".

Why is it also called a "duck" array, you might ask? This comes from a common statement in object-oriented programming -
"If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that
is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is
permitted (e.g. ``if dask``, ``if numpy``, ``if sparse`` etc.). Instead xarray can take the more permissive approach of simply
treating the wrapped array as valid, attempting to call the relevant methods (e.g. ``.mean()``) and only raising an
error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows
objects and classes from different libraries to work together more easily.
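
The duck-typing approach described above can be illustrated in plain Python. This is a toy sketch, not xarray's actual dispatch code: ``ListBackedArray``, ``NoMean`` and ``duck_mean`` are hypothetical names made up for the example.

```python
class ListBackedArray:
    """Hypothetical array-like class that happens to provide .mean()."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


class NoMean:
    """Hypothetical class that does not provide .mean()."""


def duck_mean(array):
    # No isinstance checks: just try the method and fail only if it is missing.
    try:
        return array.mean()
    except AttributeError as err:
        raise TypeError(
            f"{type(array).__name__} does not implement .mean()"
        ) from err


result = duck_mean(ListBackedArray([1, 2, 3]))

try:
    duck_mean(NoMean())
    failed = False
except TypeError:
    failed = True
```

Any class that provides a compatible ``.mean()`` works with ``duck_mean`` without the function needing to know about that class in advance.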

.. note::

For ``dask`` support see :ref:`dask`.
For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duck_arrays`.

Wrapping numpy-like arrays in xarray
------------------------------------

:py:class:`DataArray` and :py:class:`Dataset` (and :py:class:`Variable`) objects can wrap these numpy-like arrays.

Constructing xarray objects which wrap numpy-like arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly
to the constructor of the xarray class. The page on xarray data structures shows how :py:class:`DataArray` and :py:class:`Dataset`
both accept data in various forms through their ``data`` argument, but in fact this data can also be any wrappable numpy-like array.

For example, we can wrap the sparse array we created earlier inside a new DataArray object:

.. ipython:: python

s_da = xr.DataArray(s, dims=["i", "j"])
s_da

We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable
representation of the underlying wrapped array.

Of course our sparse array object is still there underneath - it's stored under the ``.data`` attribute of the ``DataArray``:

.. ipython:: python

s_da.data

Array methods
~~~~~~~~~~~~~

We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method:

.. ipython:: python

s_da.sum(dim="j")

Numpy ufuncs
~~~~~~~~~~~~

Xarray objects support calling numpy functions directly on the xarray objects, e.g. ``np.func(da)``.
This also works when wrapping numpy-like arrays:

.. ipython:: python

np.sum(s_da, axis=1)

Converting wrapped types
~~~~~~~~~~~~~~~~~~~~~~~~

If you want to change the type inside your xarray object you can use :py:meth:`DataArray.as_numpy`:

.. ipython:: python

s_da.as_numpy()

This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array.

If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or
:py:attr:`DataArray.values` (what is the difference here?).

This illustrates the difference between ``.values`` and ``.data``, which is sometimes a point of confusion for new xarray users.
:py:attr:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas :py:attr:`DataArray.values`
converts the underlying array to a numpy array before returning it.
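
The different access paths discussed above can be compared side by side. To keep this sketch free of extra dependencies it wraps a plain numpy array, so ``.data`` also returns a numpy array here; with a duck array such as the sparse example, ``.data`` would instead hand back the wrapped duck array unchanged. The variable names are illustrative only.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6).reshape(2, 3), dims=["i", "j"])

wrapped = da.data          # the wrapped array itself, whatever its type
converted = da.to_numpy()  # a numpy.ndarray
values = da.values         # also a numpy.ndarray
rewrapped = da.as_numpy()  # a new DataArray wrapping a numpy array
```

Note that ``as_numpy`` is the only one of these that returns an xarray object, so it is the one to use when the labelled dimensions should be kept.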

Conversion to numpy as a fallback
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a wrapped array does not implement the corresponding array method then xarray will often attempt to convert the
underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior,
and report any instances in which it causes problems.

Missing features
----------------
Most of the API does support :term:`duck array` objects, but there are a few areas where
the code will still cast to ``numpy`` arrays:

- dimension coordinates, and thus all indexing operations:
Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where
the code will still convert to ``numpy`` arrays:

- Dimension coordinates, and thus all indexing operations:

* :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
* :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
Expand All @@ -33,7 +177,7 @@ the code will still cast to ``numpy`` arrays:
:py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
data variables and non-dimension coordinates won't be cast

- functions and methods that depend on external libraries or features of ``numpy`` not
- Functions and methods that depend on external libraries or features of ``numpy`` not
covered by ``__array_function__`` / ``__array_ufunc__``:

* :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
Expand All @@ -49,17 +193,25 @@ the code will still cast to ``numpy`` arrays:
:py:class:`numpy.vectorize`)
* :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)

- incompatibilities between different :term:`duck array` libraries:
- Incompatibilities between different :term:`duck array` libraries:

* :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should
wrap the new ``dask`` array; changing the chunk sizes works.


Extensions using duck arrays
----------------------------
Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
easier:

Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also
makes sense to use an interfacing package to make certain tasks easier.

For example the ``pint-xarray`` package offers a custom ``.pint`` accessor (link to accessors docs) which provides
convenient access to information stored within the wrapped array (e.g. ``.units`` and ``.magnitude``), and makes
creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user.

We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays
easier. If you know of more that aren't on this list please raise an issue to add them!

- `pint-xarray <https://pint-xarray.readthedocs.io>`_
- `cupy-xarray <https://cupy-xarray.readthedocs.io>`_
- `cubed-xarray <https://github.com/cubed-xarray>`_