[draft] zarr object models #46

Open
wants to merge 17 commits into
base: main
add motivating hierarchy equality example, give zarr v3 priority in examples, change attrs to attributes
d-v-b committed Sep 21, 2023
commit 7bd949dca8837a73ac97047751cd79f48acb92e3
68 changes: 47 additions & 21 deletions draft/ZEP0006.md
@@ -1,7 +1,7 @@
---
layout: default
title: ZEP0006
description: Defining a Zarr Object Model (ZOM)
description: Zarr Object Models (ZOMs)
parent: draft ZEPs
nav_order: 1
---
@@ -23,7 +23,20 @@ Created: 2023-07-20

This ZEP defines Zarr Object Models, or ZOMs. ZOMs are abstract representations of Zarr hierarchies. The core of a ZOM is a language-independent interface that describes an abstract hierarchy as a tree of nodes.

The base ZOM defines two types of nodes: arrays and groups. Both types of nodes have an `attrs` property, which is an object with string keys and arbitrary values. The base ZOM does not define the exact properties of arrays, as these properties vary with Zarr versions. Groups have a property called `members`, which is an object with string keys and values that are either arrays or groups. A ZOM can be used by applications as the basis for a declarative, type-safe approach to managing Zarr hierarchies.
The base ZOM defines two types of nodes: arrays and groups. Both types of nodes have an `attributes` property, which is an object with string keys and arbitrary values. The base ZOM does not define the exact properties of arrays, as these properties vary with Zarr versions. Groups have a property called `members`, which is an object with string keys and values that are either arrays or groups. A ZOM can be used by applications as the basis for a declarative, type-safe approach to managing Zarr hierarchies.

## Motivation and Scope

The reference Python implementation of Zarr provides APIs for managing Zarr groups and Zarr arrays. The final product of these operations is a collection of Zarr groups and arrays, i.e. a Zarr hierarchy. But the reference Python implementation does *not* provide APIs for managing Zarr hierarchies directly.

To see why this matters, consider a programmer who wishes to check that two Zarr hierarchies (called "A" and "B") are identically structured, i.e. that the two hierarchies have the same tree structure, with structurally identical nodes. This requires resolving two checks, each applied in both directions (A against B, and B against A):
- for each Zarr array in hierarchy A, there is a Zarr array in hierarchy B with the same position in the hierarchy, the same metadata, and the same array properties.
- for each Zarr group in hierarchy A, there is a Zarr group in hierarchy B with the same position in the hierarchy and the same metadata, and each pair of corresponding members of the two groups passes this check (if the members are groups) or the array check above (if the members are arrays).

Using an API that only references Zarr arrays and groups, the programmer will be forced to write a new hierarchy equality checking routine for each new hierarchy. But if the programmer has access to a data structure that can represent a Zarr hierarchy, then the aforementioned binary similarity operation can be defined just once for this data structure, and it will work for any two Zarr hierarchies. This is a much better outcome.
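
The two checks above can be sketched in Python as one recursive function over a tree of nodes. This is a minimal sketch under stated assumptions, not part of the proposal: the class names `ArrayNode` and `GroupNode`, the `properties` field (a stand-in for version-specific array metadata), and the function name `same_structure` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class ArrayNode:
    # User metadata, as in the `attributes` property of a node.
    attributes: dict[str, Any] = field(default_factory=dict)
    # Hypothetical stand-in for version-specific array metadata (shape, dtype, ...).
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class GroupNode:
    attributes: dict[str, Any] = field(default_factory=dict)
    # Child nodes keyed by name; values are arrays or groups.
    members: dict[str, Union[ArrayNode, GroupNode]] = field(default_factory=dict)


def same_structure(a: ArrayNode | GroupNode, b: ArrayNode | GroupNode) -> bool:
    """Return True if two trees have the same shape and identical node metadata."""
    if isinstance(a, ArrayNode) and isinstance(b, ArrayNode):
        return a.attributes == b.attributes and a.properties == b.properties
    if isinstance(a, GroupNode) and isinstance(b, GroupNode):
        # Comparing the key sets covers both directions (A against B, B against A).
        if a.attributes != b.attributes or a.members.keys() != b.members.keys():
            return False
        return all(
            same_structure(child, b.members[name])
            for name, child in a.members.items()
        )
    # One node is an array and the other is a group.
    return False


tree_a = GroupNode(
    attributes={"foo": 10},
    members={"data": ArrayNode(properties={"shape": (10, 10), "dtype": "<i4"})},
)
tree_b = GroupNode(
    attributes={"foo": 10},
    members={"data": ArrayNode(properties={"shape": (10, 10), "dtype": "<i4"})},
)
tree_c = GroupNode(
    attributes={"foo": 10},
    members={"data": ArrayNode(properties={"shape": (5, 5), "dtype": "<i4"})},
)
```

Here `same_structure(tree_a, tree_b)` holds, while `same_structure(tree_a, tree_c)` does not, because the array shapes differ. The function is written once and applies to any pair of trees, which is the point of the paragraph above.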

There are many situations when programmers must read, validate, and write Zarr hierarchies. Because the Zarr specifications do not define a data structure that represents the Zarr hierarchy itself, i.e. a tree of arrays and groups, developers who attempt to create APIs for manipulating entire Zarr hierarchies must design such a data structure independently, which may lead to unnecessary fragmentation and redundant efforts. Thus, this ZEP introduces these data structures.


## Definition of hierarchy structure

@@ -33,11 +46,11 @@ Because the structure of a zarr hierarchy is decoupled, by definition, from the

## Specification of the base Zarr Object Model

A node is an object with a property called `attrs` (short for "attributes"), which is a key-value data structure that contains content described as "arbitrary user metadata" in zarr specifications. As of Zarr versions 2 and 3, `attrs` must be a JSON-serializable object.
A node is an object with a property called `attributes`, which is a key-value data structure that contains content described as "arbitrary user metadata" in Zarr specifications. As of Zarr versions 2 and 3, `attributes` must be a JSON-serializable object.

The base ZOM defines exactly two types of node: groups and arrays. This definition will use the unqualified terms "array" and "group" to refer to the two nodes defined in the ZOM. Where necessary to avoid ambiguity, the objects *represented* by ZOM arrays and ZOM groups, i.e. Zarr arrays and Zarr groups, will be referred to as "Zarr arrays" and "Zarr groups".

ZOM arrays and ZOM groups represent Zarr arrays and Zarr groups in the simplest way possible that still conforms to the definition of "node" given above. Thus, a ZOM array is a node with properties identical to those defined in a particular specification of Zarr array metadata, unless one of those Zarr array properties contains user metadata, in which case a ZOM array does not include that property (since user metadata is already represented by the `attrs` property of the array). This definition is parametric with respect to a particular Zarr specification in order to accommodate future versions of Zarr that may add new properties to Zarr arrays.
ZOM arrays and ZOM groups represent Zarr arrays and Zarr groups in the simplest way possible that still conforms to the definition of "node" given above. Thus, a ZOM array is a node with properties identical to those defined in a particular specification of Zarr array metadata, unless one of those Zarr array properties contains user metadata, in which case a ZOM array does not include that property (since user metadata is already represented by the `attributes` property of the array). This definition is parametric with respect to a particular Zarr specification in order to accommodate future versions of Zarr that may add new properties to Zarr arrays.

Similarly, a ZOM group is a node with properties identical to those defined in a specification of Zarr group metadata, unless one of those properties contains user metadata, in which case a ZOM group does not contain that property, for the same reason given above for arrays. Beyond the properties of Zarr groups defined in a particular Zarr specification, a ZOM group has an additional property:

@@ -49,12 +62,20 @@ Thus, ZOM groups and ZOM arrays can represent the structure of a Zarr hierarchy,

### ZOM in JSON

The ZOM representation of a Zarr hierarchy can be easily represented as a JSON object. Here is an example of a ZOM group representing a Zarr group that contains a single two-dimensional Zarr array using Zarr version 2. Both the Zarr group and the Zarr array contain user metadata.
The ZOM representation of a Zarr hierarchy can be easily represented as a JSON object.

Here is an example of a ZOM group representing a Zarr group that contains a single two-dimensional Zarr array using Zarr version 3. Both the Zarr group and the Zarr array contain user metadata.

```json
Insert V3 hierarchy example here
```

And the same can be done for a similar hierarchy defined in Zarr V2.

```json
{
"zarr_format" : 2,
"attrs": {
"attributes": {
"foo" : 10,
"bar" : "hello"
},
@@ -68,15 +89,25 @@ The ZOM representation of a Zarr hierarchy can be easily represented as a JSON o
"fill_value": 0,
"order": "C",
"filters": null,
"attrs" : {
"attributes" : {
"name": "my cool array"
}
}
}
}
```
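
Because a JSON ZOM is a nested object, a program can traverse it without touching storage. The following is a minimal Python sketch that lists every node in such an object along with its path. It assumes, as in the example above, that a node with a `members` key is a group and any other node is an array; the function name `iter_nodes` and the sample document are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_nodes(node: dict[str, Any], path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, kind) for each node in a JSON-style ZOM, parents first."""
    if "members" in node:
        # Assumption: only groups carry a `members` key.
        yield (path or "/", "group")
        for name, child in node["members"].items():
            yield from iter_nodes(child, f"{path}/{name}")
    else:
        yield (path or "/", "array")


# Hypothetical document with the same layout as the JSON example above.
example = {
    "attributes": {"foo": 10},
    "members": {
        "data": {"attributes": {}, "shape": [10, 10], "dtype": "<i4"},
        "sub": {"attributes": {}, "members": {}},
    },
}

nodes = list(iter_nodes(example))
```

For the sample document, `nodes` contains the root group, the array at `/data`, and the empty group at `/sub`, in that order.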

The ZOM itself can also be represented as a JSON schema. Here is the ZOM for Zarr V2 expressed as a JSON schema:
To facilitate adoption of new Zarr versions, it may be desirable to define a mapping from ZOM to ZOM, e.g. ZOM[V2] -> ZOM[V3]. Programs could use this mapping to execute automatic conversions of hierarchies to newer Zarr versions.


A ZOM can also be represented as a JSON schema. Here is the ZOM for Zarr V3 expressed as a JSON schema:

```json
# insert schema for v3 here
```

And likewise for Zarr V2:

```json
{
"$ref": "#/definitions/Group",
@@ -86,8 +117,8 @@ The ZOM itself can also be represented as a JSON schema. Here is a the ZOM for Z
"description": "Model of a Zarr Version 2 Array",
"type": "object",
"properties": {
"attrs": {
"title": "Attrs",
"attributes": {
"title": "Attributes",
"type": "object"
},
"shape": {
@@ -155,7 +186,7 @@ The ZOM itself can also be represented as a JSON schema. Here is a the ZOM for Z
}
},
"required": [
"attrs",
"attributes",
"shape",
"chunks",
"dtype",
@@ -170,8 +201,8 @@ The ZOM itself can also be represented as a JSON schema. Here is a the ZOM for Z
"description": "Model of a Zarr Version 2 Group",
"type": "object",
"properties": {
"attrs": {
"title": "Attrs",
"attributes": {
"title": "Attributes",
"type": "object"
},
"members": {
@@ -195,7 +226,7 @@ The ZOM itself can also be represented as a JSON schema. Here is a the ZOM for Z
}
},
"required": [
"attrs",
"attributes",
"members"
],
"additionalProperties": false
@@ -204,12 +235,6 @@ The ZOM itself can also be represented as a JSON schema. Here is a the ZOM for Z
}
```

And Zarr V3:

```json
# insert schema for v3 here
```


## Related Work

@@ -230,7 +255,8 @@ And Zarr V3:

## References and Footnotes


[^1]: https://github.com/zarr-developers/geozarr-spec
[^2]: http://api.csswg.org/bikeshed/?url=https://raw.githubusercontent.com/ome/ngff/master/0.4/index.bs#multiscale-md
## License

<p xmlns:dct="http://purl.org/dc/terms/">