Where did "zarr extensions" go, or "v2 to v3 migration guide"? #325

Open
yarikoptic opened this issue Jan 13, 2025 · 5 comments

@yarikoptic

We were digging for the rationale behind, or a migration guide for, the removal of some data types (such as Unicode strings) and ran into

which was merged into the https://github.com/zarr-developers/zarr-specs/tree/core-protocol-v3.0-dev branch, which no longer seems to exist.
A GitHub-wide search for "This specification is a Zarr protocol extension defining data types" turned up only some forks.

Could someone please help us out and potentially point to

  • discussions on the rationale behind removing those data types
  • zarr extensions: are they to be formalized, and if so, where?
  • potentially some "migration" guide for users going from v2 to v3, given those deprecations

Thank you in advance!

@jbms
Contributor

jbms commented Jan 13, 2025

Are you referring to the NumPy fixed-length (zero-padded) string data types, like "|S10" or "<U10"?
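For readers unfamiliar with these dtype strings, here is a minimal sketch of what they mean in NumPy itself (not in any zarr API). The variable names are illustrative only.

```python
import numpy as np

# "|S10": 10 raw bytes per element.
# "<U10": 10 little-endian UTF-32 code points per element (4 bytes each).
byte_dtype = np.dtype("|S10")
unicode_dtype = np.dtype("<U10")

byte_arr = np.array([b"zarr"], dtype=byte_dtype)
unicode_arr = np.array(["zarr"], dtype=unicode_dtype)

# Values shorter than the fixed width are padded with zero bytes.
byte_raw = byte_arr.tobytes()
unicode_raw = unicode_arr.tobytes()

print(byte_dtype.itemsize, unicode_dtype.itemsize)
print(byte_raw)
print(len(unicode_raw))
```

Every element occupies the full fixed width in the raw buffer, regardless of how long the stored value actually is.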

@mavaylon1

I am not sure if this is what you are looking for: https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html @yarikoptic

@yarikoptic
Author

@mavaylon1 right! That answers the 2nd question.

@jbms

Are you referring to the NumPy fixed-length (zero-padded) string data types, like "|S10" or "<U10"?

Yes, but I was more broadly interested in the fate of docs/protocol/extensions.rst, which seems to have tried to provide extensions to support those data types.

@jhamman
Member

jhamman commented Jan 27, 2025

In talking with folks on the @zarr-developers/steering-council recently, I understand an update to the extensions conversation is coming any day now. Stay tuned.

@jbms
Contributor

jbms commented Jan 28, 2025

We decided to remove them, at least initially, because they introduced a lot of complications and the value was unclear.

  • "O" (Python object): This is essentially meaningless as a data type. In practice, in zarr v2 it meant you had to check the list of filters to determine the actual data type. This could be replaced in zarr v3 with (yet to be added) data types corresponding to variable-length strings or JSON.
  • "|S" (fixed-length byte string): A variable-length string is likely to be preferable in almost all cases. If chunk compatibility with zarr v2 is desired, this could instead be defined as a codec usable with a (yet to be added) variable-length byte string data type.
  • "<U" (fixed-length UTF-32 string): The same caveats as for "|S" apply, and in addition UTF-8 encoding would almost always be better than UTF-32. For chunk compatibility, this could be defined as a codec for a variable-length Unicode string data type.
  • structs: See [v3] Structured dtype support zarr-python#2134 (comment)
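To make the UTF-32 versus UTF-8 point concrete, here is a rough sketch using only the Python standard library. Both helper functions are hypothetical and written for illustration: the first mimics the byte layout of a "<U10" element, the second simply produces a UTF-8 payload and ignores whatever length or offset bookkeeping a real variable-length string type would need. Neither reflects an actual zarr v3 data type or codec.

```python
def encode_fixed_utf32(text: str, width: int) -> bytes:
    """Hypothetical helper: mimic one "<U{width}" element
    (little-endian UTF-32, zero-padded to the fixed width)."""
    if len(text) > width:
        # Assumption for this sketch: reject rather than truncate.
        raise ValueError("text is longer than the fixed width")
    return text.encode("utf-32-le").ljust(4 * width, b"\x00")


def encode_variable_utf8(text: str) -> bytes:
    """Hypothetical helper: UTF-8 payload only; length bookkeeping omitted."""
    return text.encode("utf-8")


values = ["zarr", "chunk", "\u00e9"]

# Fixed-width UTF-32 costs 4 * width bytes per element, whatever the content.
fixed_sizes = [len(encode_fixed_utf32(v, 10)) for v in values]
# UTF-8 payload size tracks the actual content.
utf8_sizes = [len(encode_variable_utf8(v)) for v in values]

print(fixed_sizes)
print(utf8_sizes)
```

For short, mostly ASCII strings the fixed-width UTF-32 layout is dominated by padding, which is consistent with the preference for variable-length UTF-8 expressed above.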
