Non-monotonic error of ZARR: original zarr is not sorted by time dimension #78

Open
LeoLee-Xiaohu opened this issue Aug 30, 2024 · 1 comment

LeoLee-Xiaohu commented Aug 30, 2024

Summary

Encountered a KeyError: 'Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is not allowed.' when attempting to slice a dataset using DatetimeIndexes in xarray. The error occurs while trying to plot the sea surface temperature data from a Zarr file hosted on S3.

Code to Reproduce

The error occurs when running the following code snippet:

import s3fs
import xarray as xr

zarr_path = "s3://aodn-cloud-optimised/satellite_ghrsst_l3s_1day_daynighttime_single_sensor_australia.zarr/"

fs = s3fs.S3FileSystem(anon=True)
ds = xr.open_zarr(zarr_path, consolidated=True, storage_options={"anon": True})

# Attempting to slice and plot the data
ds.sel(time=slice('2019-01-02', '2019-01-07'), lon=slice(120, 150), lat=slice(-30, -50)).sea_surface_temperature.plot(col='time',col_wrap=3)

Error Message

KeyError: 'Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is not allowed.'

Expected Behavior

The code should slice the dataset by the specified time, longitude, and latitude ranges and plot the sea surface temperature data without errors.

Actual Behavior

The code raises a KeyError about partial slicing on a non-monotonic DatetimeIndex. This suggests that the time index in the dataset is not sorted and that at least one slice bound does not match a timestamp in the index exactly. Gaps in a sorted index would not by themselves trigger this error.

Steps to Reproduce

  1. Run the provided code snippet in a Python environment with s3fs and xarray installed.
  2. Observe the KeyError that occurs when executing the .sel() method.

Possible Causes

  • The time index may have been written out of order (non-monotonic) when the Zarr store was generated.
    [Screenshot from 2024-08-30 11-47-14]
    As the screenshot shows, the time values in the Zarr store do not increase steadily, which means the time coordinate is not sorted.
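The ordering can also be checked without a plot. The following is a minimal sketch on a small synthetic dataset, not the real store: the name `demo` and its values are made up for illustration. On the real dataset the same two checks would be run against `ds.indexes["time"]`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real store: daily timestamps written out of order.
times = pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-02", "2019-01-04"])
demo = xr.Dataset(
    {"sea_surface_temperature": ("time", np.arange(4.0))},
    coords={"time": times},
)

# xarray exposes the underlying pandas index of a dimension coordinate.
time_index = demo.indexes["time"]
is_sorted = bool(time_index.is_monotonic_increasing)

# Positions whose timestamp is earlier than the one just before it.
steps = np.diff(time_index.values)
out_of_order = np.flatnonzero(steps < np.timedelta64(0, "ns")) + 1

print("time is sorted:", is_sorted)
print("out-of-order positions:", out_of_order.tolist())
```

Listing the offending positions can help trace which input files were appended out of order when the store was built.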

Suggested Fixes

  • Check that the time index is sorted (and ideally free of duplicate timestamps) when the Zarr store is generated.
  • As a workaround on the reading side, call .sortby('time') on the dataset before slicing so that the DatetimeIndex is sorted.
  • Use .reindex() to align the index or fill in missing dates if necessary; this addresses gaps, not ordering.
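The first two suggestions can be sketched on the reading side as follows. This assumes a small synthetic dataset (`demo`, with made-up values) in place of the real store, and it assumes that keeping the first occurrence of a duplicated timestamp is acceptable, which may not hold for the real data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: unsorted timestamps with one duplicate (2019-01-02).
times = pd.to_datetime(
    ["2019-01-03", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-04"]
)
demo = xr.Dataset(
    {"sea_surface_temperature": ("time", np.arange(5.0))},
    coords={"time": times},
)

# Step 1: sort along time so the index becomes monotonic.
cleaned = demo.sortby("time")

# Step 2 (assumption: the first occurrence is the one to keep): drop duplicates.
is_duplicate = cleaned.indexes["time"].duplicated(keep="first")
cleaned = cleaned.isel(time=np.flatnonzero(~is_duplicate))

# Label-based slicing with partial date strings now works.
subset = cleaned.sel(time=slice("2019-01-02", "2019-01-03"))
print(subset.time.values)
```

Sorting on read hides the symptom for one user only. The store itself stays unsorted for every other consumer until it is regenerated.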

Proposed Code Fix

# Sort the dataset by time before slicing
ds = ds.sortby('time')

# Attempt the slice again
ds.sel(time=slice('2019-01-02', '2019-01-05'), lon=slice(120, 150), lat=slice(-30, -50)).sea_surface_temperature.plot(col='time', col_wrap=3)

Severity

  • Medium: The error prevents data slicing and visualization, which are crucial for analysis workflows. It can also break downstream workflows; for example, pygeoAPI cannot process a non-monotonic Zarr store.

diodon commented Aug 30, 2024

Also, a coordinate that is neither monotonically increasing nor monotonically decreasing makes the file non-CF-compliant.
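A rough way to screen every dimension coordinate for this property is sketched below. The helper `is_strictly_monotonic` and the dataset `demo` are hypothetical, and the check assumes that "monotonic" is meant strictly (no repeated values); it is not a full CF compliance check.

```python
import numpy as np
import pandas as pd
import xarray as xr


def is_strictly_monotonic(values) -> bool:
    """Return True if values strictly increase or strictly decrease."""
    index = pd.Index(np.asarray(values))
    ordered = index.is_monotonic_increasing or index.is_monotonic_decreasing
    return bool(index.is_unique and ordered)


# Synthetic stand-in: unsorted time, descending (but monotonic) latitude.
demo = xr.Dataset(
    coords={
        "time": pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-02"]),
        "lat": [-30.0, -40.0, -50.0],
    }
)

# One boolean per dimension coordinate.
report = {name: is_strictly_monotonic(index) for name, index in demo.indexes.items()}
print(report)
```

A descending latitude axis passes the check, so a dataset sliced with `lat=slice(-30, -50)` would not be flagged on that axis.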
