-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
writing datetime64 in netCDF may produce badly formatted or unreadable files #9488
Comments
Thanks for the report. We've recently reworked the CF masking. Is it possible to check, if the latest version (released yesterday) still has this issue? The other thing is, that nanosecond precision isn't possible with floating point on-disk values. See also #7817 Internally we use int64 which nicely fits NaT with NetCDF int64 default _FillValue. |
@kmuehlbauer thanks for checking. I tried with your latest version (xarray-2024.9.0) and have the same failure. Also ignore the conversion to nanoseconds in my example, was playing around (in particular because of some warning with pandas), but in fact whether or not I convert to nanoseconds I do get the same behaviour. |
Ok, @jfpiolle, thanks for testing this out. If it is not too much of an effort for you, are you able to find a working version in the past? And if there is one, are you familiar with git-bisect? I'll definitely have a look into this the next days. |
@kmuehlbauer I am not familiar with git-bisect; however I can confirm it is working as expected with |
@jfpiolle Thanks for this information. It's really useful. |
@jfpiolle I think I've found the root cause (or at least the point of failure). The following was added in #8575 to improve handling of chunked times (cc @spencerkclark). I'm not sure if I'm missing something, but this cast should not be at this position in the code. I've tested this and removing these two lines makes the above example work. Lines 805 to 806 in 18e5c87
@jfpiolle Are you able to check if the created file conforms to your expectations, if you remove these two lines. I'm sure we can work out a solution but it can be tricky. @spencerkclark This is another one of those tricky points we have to put on the list. Maybe you see a quick fix hiding around the corner? |
Thanks @kmuehlbauer—I’ll try and take a closer look over the weekend. |
@kmuehlbauer I confirm that if I remove l.805-806 it works again and the content of the created files is correct! |
Thanks @spencerkclark, much appreciated, as I can't immediately see, in which cases these lines are needed. |
See #9497 for a quick fix for this issue, though I think you are right @kmuehlbauer that we should rethink the casting in the lines you mentioned. The motivation behind adding the casting in #8575 was to:
In hindsight, I think there is a better way to address (1)—essentially we add a branch where we raise instead of modify the units if With regard to (2), we could decide it is enough of an edge case that we don't try to actively guard against it; for example in principle you could also try to encode ordinary |
@spencerkclark Thanks for taking the time to enjoy us with this enlightening ( ❤️ 💯 ) description of the different problems and their possible solutions. Maybe it might be useful to discuss ways to a more permanent solution in one of the next meetings? |
Sure thing @kmuehlbauer—I'd be happy to discuss these broader issues in an upcoming meeting. I could try to make it to the next one—it looks like that would be on 9/25? I went ahead and put together a sketch of the changes I proposed to eliminate the casting step in #9498, just to give a more concrete sense for what that would look like. |
Great, I've missed last meetings, so good occasion to catch up.
I didn't had a look yet, will do the next days. |
What happened?
When writing datetime64 and NaT values to netCDF (using
h5netcdf
engine) using user defined dtype and _FillValue (to comply to some format requirements), I get different results depending on the precision of datetime64 values.Here I'm writing datetime64 with milliseconds precision:
above works fine, and fill values are correctly written, as shown by the CDL output:
However, if instead I write datetime64 rounded to seconds, the output result is wrong:
The CDL output is as follow (note that -9.22337203685478e+18 is used instead of the required 1e20 as _FillValue):
As the result, attempting to read the result file will fail:
What did you expect to happen?
the content of the output in the second example (with values rounded to seconds) should be similar to the first one (minus the precision of the values of course) since same units, _FillValue and variable dtype are used.
I did not met this issue with earlier versions of xarray.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
Using:
xarray: 2024.7.0
pandas: 2.2.2
numpy: 2.1.1
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.8.2
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 74.1.2
pip: 24.2
conda: None
pytest: 6.2.5
mypy: None
IPython: None
sphinx: None
The text was updated successfully, but these errors were encountered: