Add RangeIndex #10076

Open
wants to merge 24 commits into
base: main

benbovy
Member

@benbovy benbovy commented Feb 25, 2025

Ready for review (copied and adapted the example from #9543 (comment)).

benbovy added 9 commits March 20, 2025 09:27
- Use start, stop, step terms

- Make RangeIndex.__init__ private and more flexible, add
  RangeIndex.arange and RangeIndex.linspace public factories

- General support of RangeIndex slicing

- RangeIndex.isel with arbitrary 1D values: convert to PandasIndex

- Add RangeIndex.to_pandas_index
... when check_default_indexes=False.
@benbovy
Member Author

benbovy commented Mar 21, 2025

I've made further progress on this. Some design questions (thoughts welcome!):

Create a new RangeIndex

  • I'm not sure about the public API yet. Currently RangeIndex.__init__ is "private" (more flexible and easier for internals), and there are two public factories, RangeIndex.arange and RangeIndex.linspace, inspired by the NumPy API. Creating a new dataset with a range index would look like:
import xarray as xr
from xarray.indexes import RangeIndex

index = RangeIndex.arange("x", "x", 0.0, 1.0, 0.1)
ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
<xarray.Dataset> Size: 80B
Dimensions:  (x: 10)
Coordinates:
  * x        (x) float64 80B 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Data variables:
    *empty*
Indexes:
    x        RangeIndex
  • RangeIndex doesn't support set_xindex. Do we want to support it? If so, what would the input coordinate look like? An existing range with explicit values from which RangeIndex would try to infer a constant step value? Would that be useful, given that the point of RangeIndex is to avoid materializing coordinate values in memory? Or a 1D coordinate with three values representing start, stop and step? Any other alternatives?
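To picture the first option raised above (inferring a constant step from explicit values), here is a minimal pure-Python sketch. It is not part of this PR: `infer_range` and its `rel_tol` parameter are hypothetical names, and the half-open `stop = last value + step` convention is an assumption borrowed from `np.arange`.

```python
import math
from typing import Sequence


def infer_range(values: Sequence[float], rel_tol: float = 1e-9) -> tuple[float, float, float]:
    """Return (start, stop, step) if ``values`` are evenly spaced.

    Hypothetical helper for illustration only; raises ValueError otherwise.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to infer a step")

    start = float(values[0])
    step = float(values[1]) - start
    if step == 0.0:
        raise ValueError("step must be non-zero")

    # Compare every value with the one predicted by start + i * step.
    for i, value in enumerate(values):
        expected = start + i * step
        if not math.isclose(value, expected, rel_tol=rel_tol, abs_tol=abs(step) * rel_tol):
            raise ValueError(f"value at position {i} breaks the constant step")

    # Half-open interval: stop is one step past the last value (assumption).
    stop = start + len(values) * step
    return start, stop, step
```

Note that this approach still needs the explicit values in memory at least once, which is the trade-off questioned in the bullet above.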

Index import

Should we expose all public built-in Xarray indexes at the top level? Or only at the xarray.indexes level?

Currently the Index base class and CFTimeIndex (not an Xarray index, but it could eventually be refactored into one) are exposed at the top level, while PandasIndex, PandasMultiIndex and RangeIndex (this PR) are only exposed at the xarray.indexes level. We might want to make that consistent.

@benbovy
Member Author

benbovy commented Mar 21, 2025

Note: this Xarray RangeIndex is designed for floating value ranges. For integer ranges it is probably best to use a PandasIndex wrapping a pandas.RangeIndex. I added a note in the docstrings here. More work on the documentation is needed but probably in a later PR addressing Xarray indexes in general.

@benbovy benbovy marked this pull request as ready for review March 21, 2025 11:55
dim : str
Dimension name.
start : float, optional
Start of interval (default: 0.0). The interval includes this value.
Contributor

Could consider adding a closed kwarg like pd.Interval, but in a future PR of course.

"`Coordinates.from_xindex()`"
)

@property
Contributor

Can these all be cached_property?

Member Author

Would there be much benefit of caching those simple aliases to attributes of the underlying transform?
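To make the trade-off in this exchange concrete, here is a small sketch using hypothetical classes (`Transform`, `AliasIndex`, `CachedIndex`); none of these names come from the PR. A plain `property` alias costs one attribute lookup on the wrapped object, so caching saves little, while `functools.cached_property` stores a second copy on the instance that does not follow later changes to the wrapped object.

```python
from functools import cached_property


class Transform:
    """Hypothetical stand-in for the object that holds the range parameters."""

    def __init__(self, start: float, stop: float) -> None:
        self.start = start
        self.stop = stop


class AliasIndex:
    """Exposes a parameter through a plain property."""

    def __init__(self, transform: Transform) -> None:
        self.transform = transform

    @property
    def start(self) -> float:
        # One attribute lookup per access; nothing is stored on the index.
        return self.transform.start


class CachedIndex:
    """Same alias, but cached on first access."""

    def __init__(self, transform: Transform) -> None:
        self.transform = transform

    @cached_property
    def start(self) -> float:
        # Evaluated once, then stored in the instance __dict__.
        return self.transform.start


transform = Transform(0.0, 1.0)
alias, cached = AliasIndex(transform), CachedIndex(transform)
first_read = (alias.start, cached.start)

transform.start = 5.0  # mutate the wrapped object (for illustration only)
second_read = (alias.start, cached.start)
```

After the mutation, the plain property reflects the new value while the cached one keeps the value from its first read.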

dtype : dtype, optional
The dtype of the coordinate variable (default: float64).

Examples
Collaborator

Suggested change
- Examples
+ Note that all `start`, `stop` & `step` must be passed, which is more explicit than `np.arange` or `range`
+ Examples

(optional, no strong view)

Member Author

> Note that all start, stop & step must be passed

This isn't exactly true, but yes the API here is more explicit than np.arange and range, e.g., RangeIndex.arange(10.0) means start=10 while np.arange(10.0) means stop=10.

RangeIndex.arange(10.0) doesn't make much sense, though, considering the default value of stop=1.0. I'll see if we can get closer to np.arange using typing.overload.
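As a rough illustration of the `typing.overload` idea, the sketch below resolves positional arguments the way `np.arange` and `range` do: a single argument is read as `stop`, while two or three are read as `start, stop[, step]`. `resolve_arange_args` is a hypothetical helper, not the code in this PR, and the defaults (`start=0.0`, `step=1.0`) are assumptions taken from `np.arange`.

```python
from typing import Optional, overload


@overload
def resolve_arange_args(stop: float, /) -> tuple[float, float, float]: ...
@overload
def resolve_arange_args(
    start: float, stop: float, step: float = ..., /
) -> tuple[float, float, float]: ...


def resolve_arange_args(
    start: float,
    stop: Optional[float] = None,
    step: float = 1.0,
    /,
) -> tuple[float, float, float]:
    """Return (start, stop, step) following np.arange's positional rules."""
    if stop is None:
        # A single positional argument is the stop value; start defaults to 0.
        start, stop = 0.0, start
    return float(start), float(stop), float(step)


single = resolve_arange_args(10.0)          # read as stop=10.0
full = resolve_arange_args(0.0, 1.0, 0.1)   # read as start, stop, step
```

The overloads only inform type checkers; at runtime the final definition handles every call, which is where the argument shuffling happens.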

Collaborator

> RangeIndex.arange(10.0) doesn't make much sense, though, considering the default value of stop=1.0. I'll see if we can get closer to np.arange using typing.overload.

yeah. no objection to the more explicit approach — it's useful-but-a-bit-magic that arange / range changes the meaning of the first arg based on how many are supplied

Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Regular (linspace) Coordinates/Index