Support for .zarr reading and writing #398

iacopoff · 2024-04-24T06:55:09Z

Hi, I am currently running a 20-year daily wflow_sbm simulation over the Alpine region at 1 km spatial resolution (eventually going down to 250 m).

This setup produces many tens of GB of inputs and outputs, mainly in three NetCDF files (forcings.nc, staticmaps.nc, output.nc).

The model run is only one step in a multi-component workflow that consists of model building, model running, DL-based surrogate training, parameter learning and prediction.

In addition, the input and output of each component are pushed to a SpatioTemporal Asset Catalog (STAC).

In summary, these are the reasons why I would like wflow to be able to read and write .zarr. Some are specific to my use case, but others I think are more general:

* Seamless integration with other workflow components without blowing up RAM
* Easy to push to and fetch from STAC over the internet
* Easier data handling with chunking and compression
* Concurrent reading and writing (although that depends on the availability of libraries that implement it, e.g. Dask in Python)
* Better suited to cloud computing
* Easier data analytics

Now, I am not sure about the maturity of the Julia ecosystem for multidimensional labelled array packages that support both netcdf and zarr IO (I only know YAXArrays.jl), or whether this development is worth the effort, since one could also just convert between formats before and after running wflow (at the cost of some time and computing). Still, I would be interested to know your opinion on this.

visr (Maintainer) · 2024-04-24T07:35:22Z

> Now, I am not sure about the maturity of the Julia ecosystem for multidimensional labelled array packages that support both netcdf and zarr IO (I only know YAXArrays.jl)

Since we currently use NCDatasets.jl, one option worth looking into is to also support https://github.com/JuliaGeo/ZarrDatasets.jl. Both packages implement the https://github.com/JuliaGeo/CommonDataModel.jl interface.

iacopoff (Author) · 2024-04-29T06:23:44Z

Thanks. For reading the forcings file, it seems sufficient to extend the Union type of CFDataset to const CFDataset = Union{NCDataset, NCDatasets.MFDataset, ZarrDataset} and then add a condition for opening a ZarrDataset in prepare_reader.
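The Union extension described above can be sketched roughly as follows. This is only an illustration, assuming ZarrDatasets.jl is installed alongside NCDatasets.jl; the helper names is_zarr_path and open_forcing are hypothetical stand-ins for the condition that would go into prepare_reader and are not part of wflow.

```julia
using NCDatasets
using ZarrDatasets

# Extended union, as proposed above: methods typed on CFDataset
# would then also accept a ZarrDataset.
const CFDataset = Union{NCDataset, NCDatasets.MFDataset, ZarrDataset}

# Hypothetical helper (not part of wflow): treat a path ending in ".zarr"
# as a Zarr store; a trailing slash is tolerated.
is_zarr_path(path::AbstractString) = endswith(rstrip(path, '/'), ".zarr")

# Hypothetical stand-in for the branch that would be added in prepare_reader.
function open_forcing(path::AbstractString)
    if is_zarr_path(path)
        return ZarrDataset(path)  # opened read-only by default
    else
        return NCDataset(path)    # existing NetCDF code path
    end
end
```

With this, open_forcing("forcings.zarr") would return a ZarrDataset, while open_forcing("forcings.nc") would keep using NCDataset. The multi-file NetCDF case (MFDataset) is left out of the sketch.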

The staticmaps file is more difficult: despite the shared CommonDataModel interface, some methods are not currently compatible. For example, the keys call in nc_dim_name returns a different data type depending on whether its argument is of an NCDatasets or a ZarrDatasets type.

So far I have only looked briefly at reading .zarr, not at writing it. My impression is that writing may be more complicated, but I may be completely wrong.

That said, if it is of interest, I could try developing at least the zarr reading functionality and see whether you want to merge it. I see that you are planning a 1.0.0 release. Would this development clash with your current work?

verseve · 2024-05-15T06:28:33Z

Hi @iacopoff, thanks for already looking into this. I don't think this development will clash with our current work. We will also discuss it internally and get back to you as soon as possible.

Some questions from my side:

* Seamless integration with other workflow components without blowing up RAM

I understand it can improve integration with other workflows, but what do you mean by "without blowing up RAM"? Do you mean the conversion from netCDF to zarr format?

* Easier data handling with chunking and compression,

I do not know the zarr format very well, but aren't chunking and compression also part of netCDF? Or is query performance generally better for zarr?
