empty/null datatype #318
Comments
Can you say more about what |
It represents the concept of “something exists under that path, but it has neither dtype nor size”. Like |
We definitely don't have anything like that! Maybe it's useful when you want to partially initialize an array? But I'm not aware of a use case for this in Zarr. |
Sorry, maybe “datatype” was misleading. I think what you’re referring to is a nullable dtype, which definitely has a lot of uses, but is a different question. This is about storing an empty dataset. E.g. |
Maybe I don't understand what an "empty dataset" is? It sounds like a dataset that doesn't have certain properties specified, hence my reference to partial initialization. In Zarr, arrays must have a complete metadata document, and the shape and dtype fields are mandatory, so it's not possible to have a Zarr array with an indeterminate or unset shape and dtype. |
Yup, that’s why we’re asking to add support for that. |
Would you want certain fields of the zarr array / group metadata documents to be optional? What's the use case (besides mapping on to hdf5)? |
I wrote down the use case here: #318 (comment) The idea would be to have a marker that represents an empty array. Not 0 dimensions (which means 1 scalar value) but nothing. I don’t know what the best representation would be, but I don’t think making fields optional individually makes sense. E.g. it wouldn’t make sense to have an array without shape but with dtype. Either both or neither. |
Can you explain why you want to represent an empty array? I'm not sure I understand what this is for. The example in #318 (comment) doesn't really explain why you want this. If the goal is to model a partial zarr / hdf5 hierarchy that has "holes", where details of arrays / groups are undefined (e.g. because you want to initialize a hierarchy but you don't know everything yet about your arrays / groups), then I think this can only be done in-memory, e.g. in a software model of a Zarr hierarchy. It's a pretty important aspect of Zarr that the stored representation of arrays (i.e., the contents of the actual metadata documents) is complete and valid, and I don't think there's any way that could change. |
I just want to represent a null value. No hole, nothing undefined or invalid. Just a dataset that contains no data. Or a placeholder thereof. Empty groups are able to exist, but empty datasets aren’t. |
You can make Zarr arrays that contain no data, but they have to have a defined dtype and shape. I thought you were proposing relaxing this restriction, i.e. allowing arrays to have an undefined dtype and shape, but maybe I misinterpreted your request? |
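To make the point above concrete, here is a minimal sketch of what "no data, but defined dtype and shape" could look like as a Zarr v3 array metadata document. It uses only the standard library; the helper `missing_fields` and the variable names are hypothetical, the field list reflects my reading of the v3 spec's mandatory array fields, and this is not how any Zarr implementation actually validates metadata.

```python
import json

# Fields assumed here to be mandatory for a Zarr v3 array metadata document.
REQUIRED_ARRAY_FIELDS = (
    "zarr_format",
    "node_type",
    "shape",
    "data_type",
    "chunk_grid",
    "chunk_key_encoding",
    "fill_value",
    "codecs",
)


def missing_fields(metadata: dict) -> list:
    """Return the required field names that are absent (hypothetical helper)."""
    return [name for name in REQUIRED_ARRAY_FIELDS if name not in metadata]


# An array with zero elements: it holds no data, yet shape and dtype are set.
zero_length = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [0],
    "data_type": "float64",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [1]}},
    "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
    "fill_value": 0.0,
    "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
}

# The "neither dtype nor size" case asked about in this issue: same document
# with both fields dropped, which the check above reports as incomplete.
no_shape_or_dtype = {
    key: value
    for key, value in zero_length.items()
    if key not in ("shape", "data_type")
}

document = json.dumps(zero_length, indent=2)
```

Under these assumptions, `missing_fields(zero_length)` is empty while `missing_fields(no_shape_or_dtype)` reports `shape` and `data_type`, which is the restriction being discussed.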
No matter how it’s represented. I was told that there was a discussion about that, but apparently not. If you’re telling me that there will be nothing like it, we’ll make up our own convention, but if there is going to be a canonical way to express that, I’d like to know how that’ll look. |
@ilan-gold pointed out that
|
And what is the purpose of this thing? For example: what does hdf5 use empty datasets for? Once I know the answer to that question, we can sketch out how to achieve the same outcome with Zarr. |
We serialize user-defined Python dictionaries with string keys as Zarr/HDF5 groups. The presence of a sub-group and dataset in that group with name There is a difference between having a dict entry with name If you do, we’ll use this type to represent Nones, if you won’t, we’ll use a convention like a Dataset with shape |
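As an illustration of the use case described above, the following sketch round-trips a nested dict containing `None` through a purely in-memory model of a hierarchy. Everything in it is an assumption made for illustration: the `"null-placeholder"` marker, the `encode`/`decode` helpers, and the node layout are hypothetical, and this is neither the convention from the draft PR nor anything defined by Zarr or HDF5.

```python
from typing import Any

# Hypothetical marker node standing in for a stored "null" entry.
NULL_KIND = "null-placeholder"


def encode(value: Any) -> dict:
    """Map a Python value to a node in a toy in-memory hierarchy."""
    if value is None:
        return {"node_type": NULL_KIND}
    if isinstance(value, dict):
        members = {key: encode(item) for key, item in value.items()}
        return {"node_type": "group", "members": members}
    # Anything else is treated as array-like payload in this sketch.
    return {"node_type": "array", "data": value}


def decode(node: dict) -> Any:
    """Inverse of encode: rebuild the Python value from a node."""
    kind = node["node_type"]
    if kind == NULL_KIND:
        return None
    if kind == "group":
        return {key: decode(item) for key, item in node["members"].items()}
    return node["data"]


original = {"a": None, "b": {"c": [1, 2, 3]}}
restored = decode(encode(original))
```

The point of the marker is that a key mapped to `None` survives the round trip as a present key, so it stays distinguishable from a dict that simply lacks that key.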
A zarr array would not be my first choice for serializing the literal value |
Thus this feature request. Unless you have another idea of something that’s a better fit than a
That’s what I currently do in the draft PR that waits for this feature request to receive a definite yes or no! |
Broadly speaking, I would only use Zarr arrays to represent other n-dimensional arrays, and I would use something else entirely to model Python primitives like and I can't give a definite "no" to a feature request, because this repo doesn't really take feature requests. This is the repo for the Zarr specifications, which have an extremely long release cycle (we have been working on just implementing v3 of the spec for over a year, and we don't have any plans for a v4 spec). So I would push ahead with your implementation. |
H5Py has h5py.Empty, and @ivirshup said something like that was debated here as well. What’s the status of that discussion?