No attribute map_over_subtree #9710

Closed
5 tasks done
melonora opened this issue Nov 4, 2024 · 8 comments
Labels
topic-DataTree Related to the implementation of a DataTree class

Comments

@melonora

melonora commented Nov 4, 2024

What happened?

While looking for a way to map a function over the Datasets in a DataTree, I ran into the issue described in #9693. This happens because the node with path '.' (the root) does not contain the dimensions I was trying to transpose.

Traceback (most recent call last):
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-38-46c5b7c11604>", line 1, in <module>
    tree = tree.map_over_datasets(Dataset.transpose, ('y', 'x', 'c'))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\datatree.py", line 1462, in map_over_datasets
    return map_over_datasets(func, self, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\datatree_mapping.py", line 103, in map_over_datasets
    results = func_with_error_context(*node_dataset_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\datatree_mapping.py", line 133, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\util\deprecation_helpers.py", line 143, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\dataset.py", line 6415, in transpose
    _ = list(infix_dims(dim, self.dims, missing_dims))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\namedarray\utils.py", line 171, in infix_dims
    existing_dims = drop_missing_dims(dims_supplied, dims_all, missing_dims)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\namedarray\utils.py", line 124, in drop_missing_dims
    raise ValueError(
ValueError: Dimensions {('y', 'x', 'c')} do not exist. Expected one or more of FrozenMappingWarningOnValuesAccess({})
Raised whilst mapping function over node with path '.'

I tried to work around this with map_over_subtree, but that did not work either: it seemingly no longer exists in the latest xarray (2024.10.0). I get an AttributeError, even though according to the documentation the method does exist.

What did you expect to happen?

I expect the output to be a DataTree in which each dataset has its dimensions transposed.

Minimal Complete Verifiable Example

import numpy as np
from dask.array.core import from_array
from xarray import DataTree, DataArray, Dataset

img = from_array(np.random.rand(3, 512, 512))
dims = ('c', 'y', 'x')
scale_factors = [2, 2]
data = DataArray(
    img,
    coords={dim: range(size) for dim, size in zip(dims, img.shape)},
    dims=dims,
    name="image",
)

multiscale_data = {
    "scale0": data.to_dataset(name=data.name, promote_attrs=True),
}

for factor_index, scale_factor in enumerate(scale_factors):
    dim_factors = {'y': scale_factor, 'x': scale_factor}
    downscaled = data.coarsen(dim=dim_factors, boundary="trim", side="right").mean().astype(data.dtype)
    multiscale_data[f"scale{factor_index + 1}"] = downscaled.to_dataset(name=data.name, promote_attrs=True)

multiscale_image = DataTree.from_dict(multiscale_data)

# The following raises a ValueError, as the root node (path '.') has no dimensions
multiscale_image = multiscale_image.map_over_datasets(Dataset.transpose, ('y', 'x', 'c'))
# The following raises an AttributeError, as map_over_subtree no longer exists
multiscale_image = multiscale_image.map_over_subtree(Dataset.transpose, ('y', 'x', 'c'))

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

type(element)
Out[31]: xarray.core.datatree.DataTree
element.map_over_subtree(Dataset.transpose, "y","x","c")
Traceback (most recent call last):
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-32-43b28a4a5288>", line 1, in <module>
    element.map_over_subtree(Dataset.transpose, "y","x","c")
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\common.py", line 302, in __getattr__
    raise AttributeError(
AttributeError: 'DataTree' object has no attribute 'map_over_subtree'

Anything else we need to know?

No response

Environment

C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\_distutils_hack\__init__.py:31: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(

INSTALLED VERSIONS

commit: None
python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:17:14) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 25 Model 97 Stepping 2, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Netherlands', '1252')
libhdf5: 1.14.2
libnetcdf: None

xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.12.1
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.6.2
distributed: 2024.6.2
matplotlib: 3.9.2
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.24.3
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.3.0
pip: 24.3.1
conda: None
pytest: 8.3.3
mypy: 1.13.0
IPython: 8.29.0
sphinx: 8.1.3

@melonora melonora added the bug and needs triage labels Nov 4, 2024
@kmuehlbauer kmuehlbauer removed the bug and needs triage labels Nov 4, 2024
@kmuehlbauer
Contributor

@melonora map_over_subtree was removed from the API in the process of moving datatree into the xarray codebase.

For the time being, please use map_over_datasets with one of the workarounds suggested in #9693.

@keewis
Collaborator

keewis commented Nov 4, 2024

Additionally, ds.transpose(("x", "y", "z")) will not work unless you have a dimension named ("x", "y", "z") (i.e. the dimension name is a tuple), since Dataset.transpose takes the dimension names as *args.

Given that map_over_subtree intentionally does not exist anymore, I think this is a duplicate of #9693.

Edit: or rather, where in the documentation did you find DataTree.map_over_subtree? If that really still exists I'd call that a documentation bug.
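To illustrate the point about *args, here is a small sketch on a plain Dataset (the variable name image and the sizes are made up for illustration). The dimension names go in as separate positional arguments, while a single tuple is looked up as one hashable dimension name and raises the same kind of ValueError seen in the traceback above.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"image": (("c", "y", "x"), np.zeros((3, 4, 5)))})

# Correct: each dimension name is its own positional argument
transposed = ds.transpose("y", "x", "c")
print(transposed["image"].dims)  # ('y', 'x', 'c')

# Incorrect: the tuple is treated as a single dimension named ('y', 'x', 'c')
try:
    ds.transpose(("y", "x", "c"))
    tuple_call_failed = False
except ValueError as err:
    tuple_call_failed = True
    print(err)
```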

@kmuehlbauer
Contributor

@melonora @keewis There is no mention of map_over_subtree in the latest stable docs, so maybe the docs you used were outdated?

@melonora To get you working until #9693 is sorted out, here is a workaround (please also take @keewis's comment on transpose arguments into account):

import functools
def skip_nodes(func):
    @functools.wraps(func)
    def _func(ds, *args, **kwargs):
        # check if needed dimensions are available in the Dataset
        # otherwise return verbatim
        if not all(arg in ds.dims for arg in args):
            return ds
        return func(ds, *args, **kwargs)
    return _func

@skip_nodes
def transpose(ds, *args, **kwargs):
    return ds.transpose(*args, **kwargs)

multiscale_image = multiscale_image.map_over_datasets(transpose, 'y', 'x', 'c')

@kmuehlbauer
Contributor

I'll close this as dupe of #9693.

@keewis keewis closed this as not planned Nov 4, 2024
@melonora
Author

melonora commented Nov 4, 2024

Ah, sorry, I was looking at the xarray_datatree documentation.

@melonora
Author

melonora commented Nov 4, 2024

Thanks! I had a similar workaround for now.

@eschalkargans

Hello,

I am currently migrating to 2024.10.0. I encountered some code making use of the former map_over_subtree decorator.

What is the suggested process for migrating such code to map_over_datasets? Is the decorator aspect of it definitely gone?

Thanks for your answer

@TomNicholas TomNicholas added the topic-DataTree label Nov 20, 2024
@TomNicholas
Member

What is the suggested process for migrating such code to map_over_datasets?

Sorry apparently I forgot to add this to the migration guide (I've added it in #9804).

Is the decorator aspect of it definitely gone?

Yes, we decided that it was better to have it be consistent with xr.apply_ufunc. If you want decorator-like behaviour you could use functools.partial or just wrap the .map_over_datasets call in a new function.

5 participants