Support for .zarr reading and writing #398
Replies: 3 comments
-
Since we currently use NCDatasets.jl, one option to look into is also supporting https://github.com/JuliaGeo/ZarrDatasets.jl. Both packages implement the https://github.com/JuliaGeo/CommonDataModel.jl interface.
-
Thanks, it seems that for reading the forcings file something as simple as extending the Union type of CFDataset as
The staticmaps file presents more difficulties, as there are some methods that, despite the CommonDataModel interface, are not currently compatible, e.g. the
So far I have only had a quick look at reading .zarr and have not looked into writing .zarr. I have the impression that writing may be more complicated, but I may be completely wrong.
That said, if that's of interest, I could try developing at least the zarr reading functionality and you could then see whether you want to merge it. I see that you are planning a 1.0.0 release; would this development clash with your current work?
-
Hi @iacopoff,
Thanks for already looking into this. I don't think this development will clash with our current work. We will also discuss it internally and get back to you asap. Some questions from my side:
I understand it can improve integration with other workflows, but what do you mean by "without blowing up RAM"? Does that refer to the conversion from netCDF to zarr format?
I do not know the zarr format very well, but aren't chunking and compression also part of netCDF? Or is query performance generally better for zarr?
-
Hi, I am currently running a 20-year daily wflow_sbm simulation over the Alpine region at 1 km spatial resolution (eventually down to 250 m).
This setup produces many tens of GB of inputs/outputs, mainly in three NetCDF files (forcings.nc, staticmaps.nc, output.nc).
The model run is only one step in a multi-component workflow that consists of model building, model running, DL-based surrogate training, parameter learning and prediction.
In addition, the input/output of each component is pushed to a SpatioTemporal Asset Catalog (STAC).
To summarize, these are the reasons why I would like wflow to be able to read/write .zarr; some are specific to my use case, but others I think are more general:
Now, I am not sure how mature the Julia ecosystem is for multidimensional labelled array packages that support both netCDF and zarr IO (I only know YAXArrays.jl), or whether this development is worth the effort, since one could also just convert between formats before and after running wflow (at the cost of some extra time and computing). I would be interested to know your opinion on this.
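For reference, the "convert before and after" workaround could look roughly like the sketch below. It is in Python rather than Julia, it assumes xarray, dask and zarr are installed, and it assumes the files have a dimension named `time`. The helper names and the chunk size of 365 are made up for illustration and are not part of wflow.

```python
from pathlib import Path


def zarr_path_for(nc_path: str) -> Path:
    """Return the sibling .zarr path, e.g. forcings.nc -> forcings.zarr."""
    return Path(nc_path).with_suffix(".zarr")


def netcdf_to_zarr(nc_path: str, time_chunk: int = 365) -> Path:
    """Convert one NetCDF file to a Zarr store chunk by chunk.

    Hypothetical helper: the "time" dimension name and the default
    chunk size are assumptions, not values taken from wflow.
    """
    # Imported here so the path helper above works without xarray installed.
    import xarray as xr

    target = zarr_path_for(nc_path)
    # Passing chunks= opens the file lazily with dask-backed arrays,
    # so the data is streamed in pieces rather than read into RAM at once.
    with xr.open_dataset(nc_path, chunks={"time": time_chunk}) as ds:
        ds.to_zarr(target, mode="w")  # mode="w" overwrites an existing store
    return target
```

Calling `netcdf_to_zarr("forcings.nc")` before the run, plus the reverse step with `xr.open_zarr(...).to_netcdf(...)` on the output afterwards, is the extra time and computing I mean above.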