DataTree: missing methods #10015

Open
4 tasks
mathause opened this issue Jan 31, 2025 · 2 comments · May be fixed by #10146
Labels
enhancement topic-DataTree Related to the implementation of a DataTree class

Comments

@mathause
Collaborator

mathause commented Jan 31, 2025

Is your feature request related to a problem?

There are still missing methods on DataTree and I did not find an issue tracking those.

  • Did I miss that?
  • Is there a reason they are missing?

e.g.:

  • broadcast_like
  • dropna
  • transpose
  • ...

(I am sure there are more but these are the ones I need.)

Describe the solution you'd like

No response

Describe alternatives you've considered

We could use map_over_datasets(xr.Dataset.transpose, dt), but (1) that's annoying, and (2) with a native method, obj.transpose() would work the same way for DataTree, Dataset, etc.

Additional context

No response

@mathause mathause added enhancement topic-DataTree Related to the implementation of a DataTree class labels Jan 31, 2025
@mathause
Collaborator Author

mathause commented Mar 7, 2025

Here is a more complete list (code-generated):

  • apply
  • argmax
  • argmin
  • as_numpy
  • assign_attrs
  • assign_coords
  • astype
  • bfill
  • broadcast_equals
  • broadcast_like
  • chunks
  • clip
  • coarsen
  • combine_first
  • convert_calendar
  • cumulative
  • cumulative_integrate
  • curvefit
  • diff
  • differentiate
  • drop
  • drop_attrs
  • drop_dims
  • drop_duplicates
  • drop_encoding
  • drop_indexes
  • drop_isel
  • drop_sel
  • drop_vars
  • dropna
  • dtypes
  • dump_to_store
  • eval
  • expand_dims
  • ffill
  • fillna
  • filter_by_attrs
  • from_dataframe
  • get_index
  • groupby
  • groupby_bins
  • head
  • idxmax
  • idxmin
  • imag
  • info
  • integrate
  • interp
  • interp_calendar
  • interp_like
  • interpolate_na
  • isin
  • isnull
  • load_store
  • loc
  • map
  • map_blocks
  • merge
  • notnull
  • pad
  • plot
  • polyfit
  • quantile
  • query
  • rank
  • real
  • reindex
  • reindex_like
  • rename
  • rename_dims
  • rename_vars
  • reorder_levels
  • resample
  • reset_coords
  • reset_encoding
  • reset_index
  • roll
  • rolling
  • rolling_exp
  • set_coords
  • set_index
  • set_xindex
  • shift
  • sortby
  • squeeze
  • stack
  • swap_dims
  • tail
  • thin
  • to_array
  • to_dask_dataframe
  • to_dataarray
  • to_dataframe
  • to_pandas
  • to_stacked_array
  • transpose
  • unify_chunks
  • unstack
  • weighted
  • where

I think these are not available because they were defined directly on Dataset, while DataTree uses mixins to define reductions and binary ops...

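import xarray as xr

# Public attribute names (methods and properties) on each class.
dt_methods = {m for m in dir(xr.DataTree) if not m.startswith("_")}
ds_methods = {m for m in dir(xr.Dataset) if not m.startswith("_")}

# Print everything Dataset has that DataTree lacks, as a Markdown task list.
for m in sorted(ds_methods - dt_methods):
    print(f"- [ ] {m}")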
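
The generated list mixes methods with properties (e.g. dtypes, real, loc). A possible refinement, sketched here with toy classes so it runs without xarray, splits the missing names by kind. The function name missing_public_api and the Reference/Target classes are hypothetical stand-ins:

```python
import inspect


def missing_public_api(reference: type, target: type) -> dict[str, list[str]]:
    """Public names on `reference` that `target` lacks, split into callables and the rest."""
    ref_names = {n for n in dir(reference) if not n.startswith("_")}
    tgt_names = {n for n in dir(target) if not n.startswith("_")}
    missing: dict[str, list[str]] = {"callables": [], "other": []}
    for name in sorted(ref_names - tgt_names):
        # getattr_static avoids triggering descriptors such as properties.
        attr = inspect.getattr_static(reference, name)
        is_callable = callable(attr) or isinstance(attr, (staticmethod, classmethod))
        missing["callables" if is_callable else "other"].append(name)
    return missing


# Toy stand-ins for Dataset and DataTree (illustration only).
class Reference:
    def transpose(self): ...
    def dropna(self): ...
    def mean(self): ...

    @property
    def dtypes(self): ...


class Target:
    def mean(self): ...


result = missing_public_api(Reference, Target)
print(result)
```

Calling missing_public_api(xr.Dataset, xr.DataTree) should give the same names as the script above, grouped by kind; the exact contents depend on the installed xarray version.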

@TomNicholas
Member

> Did I miss that?

I don't think so, we might have forgotten to make an issue for this.

> Is there a reason they are missing?

Available time? 🤷

> while DataTree uses mixins to define reductions and binary ops...

So does Dataset! Repurposing those mixins for DataTree made it easy to add e.g. all the aggregation methods in one PR.

> Here is a more complete list (code-generated):

So I don't think these are all equally important. We deliberately prioritized methods to manipulate and iterate over tree contents above implementing data analysis methods. A large number of the methods on that list could be implemented in one line using map_over_datasets, which to me means it's less urgent to add them as we are only saving users one line of code.

The important methods that are still missing are the ones for manipulating and viewing tree contents, such as set_coords/assign_coords, drop*, and map. We do have issues or open PRs for some of those.

Also, as an aside, I actually feel like some of the methods on that list (such as curvefit) shouldn't even exist on Dataset, as they aren't really fundamental to our data structures and should live in a separate package.

@mathause mathause linked a pull request Mar 18, 2025 that will close this issue