Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow setting fill_value on Zarr format 3 arrays #10161

Merged
merged 5 commits into from
Mar 26, 2025

Conversation

dcherian
Copy link
Contributor

cc @aldenks

  • Closes Zarr v3 fill_value #10064
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@aldenks
Copy link

aldenks commented Mar 21, 2025

Thank you!

attr = "fill_value"
if dtype is float:
# for floats, Xarray inserts a default `np.nan`
expected.foo.attrs["_FillValue"] = np.nan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this assume that np.nan is serializable to the attributes field using the same JSON encoding that we use for the fill_value field? Because the spec doesn't cover how scalars should be encoded in attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think so:

xarray/xarray/backends/zarr.py

Lines 1150 to 1155 in cdebbf5

fill_value = None
if "_FillValue" in attrs:
# replace with encoded fill value
fv = attrs.pop("_FillValue")
if fv is not None:
attrs["_FillValue"] = FillValueCoder.encode(fv, dtype)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the zarr spec only defines a special JSON encoding for nan in the specific context of the fill_value field, I would recommend handling the JSON serialization explicitly here.

Also, In zarr-developers/zarr-python#2874 I remove the behavior where all numpy scalars get special serialization to JSON (instead, the dtype object is responsible for that). But it seems like we will break xarray if we move forward with that change.

Copy link
Contributor

@d-v-b d-v-b Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked what zarr v3 does today -- although we apply special JSON encoding to any nan, +infinity, -inifinity when writing JSON (e.g., {"foo": np.nan} -> {"foo": "NaN"}), we only apply special decoding when reading the fill value. Special nan values in attributes will not get special decoding. Is xarray basically handling that special decoding here? If so, we don't have anything to worry about w.r.t. the new dtype stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes looks like Ryan added base64 encode/decode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we could take that custom attribute encoding decoding stuff out of Xarray and replace it with @d-v-b's new ZarrDType machinery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have stand-alone functions that handle this, but they are currently not public API. Would be happy to make them public though!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

("this" meaning "convert special floats to / from JSON")

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Deepak.

This is a loose end that we left dangling when we implemented Zarr V3 support.

@dcherian dcherian added the plan to merge Final call for comments label Mar 24, 2025
@dcherian dcherian merged commit 7ffdcc7 into pydata:main Mar 26, 2025
34 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zarr v3 fill_value
4 participants