Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing and reading NetCDF file for FCI does not retain fill value #3058

Open
gerritholl opened this issue Feb 11, 2025 · 4 comments · May be fixed by #3060
Open

Writing and reading NetCDF file for FCI does not retain fill value #3058

gerritholl opened this issue Feb 11, 2025 · 4 comments · May be fixed by #3060

Comments

@gerritholl
Copy link
Member

Describe the bug

Writing a NetCDF file from FCI data, then reading the resulting NetCDF file, does not retain fill value information. Explicitly including _FillValue with the encoding fails with ValueError.

To Reproduce

Write a NetCDF file:

import hdf5plugin
from satpy import Scene
from glob import glob
filenames = glob("/media/nas/x23352/MTG/FCI/L1c-cases/20241016-vergleich-hrv-vis06-nir08/W*FDHSI*BODY*C_0061*.nc")
sc = Scene(filenames=filenames, reader="fci_l1c_nc")
chans = ["ir_105"]
sc.load(chans)
sc.save_datasets(
        writer="cf",
        format="NETCDF4",
        filename="/media/nas/x21308/scratch/FCI/{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc",
        encoding={"ir_105": {"dtype": "int32", "zlib": True}}, #, "_FillValue": 0}},
        include_lonlats=False)

and read it again:

from satpy import Scene
sc2 = Scene(filenames={"satpy_cf_nc": ["/media/nas/x21308/scratch/FCI/MTG-I1-fci-20241016100000-20241016101000.nc"]})
sc2.load(["ir_105"])
print(sc2["ir_105"][0, 0].compute())

Expected behavior

I expect that I can read the data that I have written without jumping through hoops, while retaining fill value information.

Actual results

Attempt 1: rely on fill value already in the data

No error messages, but the fill value written to the NetCDF does not match the actually masked data. When reading the data again, the fill value is not understood:

<xarray.DataArray 'ir_105' ()> Size: 8B
array(-2.14748365e+09)
Coordinates:
    y        float64 8B -5.567e+06
    x        float64 8B -5.567e+06
    crs      object 8B PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["Unknown...
Attributes: (12/22)
    orbital_parameters:   {'satellite_actual_longitude': -0.25698670372366905...
    ancillary_variables:  []
    calibration:          brightness_temperature
    end_time:             2024-10-16 10:10:00
    file_type:            fci_l1c_fdhsi
    grid_mapping:         mtg_fci_fdss_2km
    ...                   ...
    wavelength:           10.5 µm (9.8-11.2 µm)
    history:              Created by pytroll/satpy on 2025-02-11 16:07:19.534814
    Conventions:          CF-1.7
    area:                 Area ID: mtg_fci_fdss_2km\nDescription: mtg_fci_fds...
    name:                 ir_105
    _satpy_id:            DataID(name='ir_105', wavelength=WavelengthRange(mi...

Attempt 2: use _FillValue in the encoding

Traceback (most recent call last):
  File "/home/gholl/checkouts/protocode/fci-nc-fill-value.py", line 8, in <module>
    sc.save_datasets(
    ~~~~~~~~~~~~~~~~^
            writer="cf",
            ^^^^^^^^^^^^
    ...<2 lines>...
            encoding={"ir_105": {"dtype": "int32", "zlib": True, "_FillValue": 0}},
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            include_lonlats=False)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gholl/checkouts/satpy/satpy/scene.py", line 1300, in save_datasets
    return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gholl/checkouts/satpy/satpy/writers/cf_writer.py", line 309, in save_datasets
    res = ds.to_netcdf(filename,
                       engine=engine,
    ...<2 lines>...
                       encoding=encoding,
                       **other_to_netcdf_kwargs)
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/core/dataset.py", line 2380, in to_netcdf
    return to_netcdf(  # type: ignore[return-value]  # mypy cannot resolve the overloads:(
        self,
    ...<10 lines>...
        auto_complex=auto_complex,
    )
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/backends/api.py", line 1928, in to_netcdf
    dump_to_store(
    ~~~~~~~~~~~~~^
        dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/backends/api.py", line 1975, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/backends/common.py", line 454, in store
    variables, attributes = self.encode(variables, attributes)
                            ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/backends/common.py", line 638, in encode
    variables, attributes = cf_encoder(variables, attributes)
                            ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/conventions.py", line 784, in cf_encoder
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
                   ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/conventions.py", line 102, in encode_cf_variable
    var = coder.encode(var, name=name)
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/coding/variables.py", line 401, in encode
    fill_value = pop_to(encoding, attrs, "_FillValue", name=name)
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/coding/variables.py", line 217, in pop_to
    safe_setitem(dest, key, value, name=name)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gholl/miniforge3/envs/py313/lib/python3.13/site-packages/xarray/coding/variables.py", line 198, in safe_setitem
    raise ValueError(
    ...<4 lines>...
    )
ValueError: failed to prevent overwriting existing key _FillValue in attrs on variable 'ir_105'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

Attempt 3: fill_value=0 is not recognised by this writer:

Traceback (most recent call last):
  File "/home/gholl/checkouts/protocode/fci-nc-fill-value.py", line 8, in <module>
    sc.save_datasets(
    ~~~~~~~~~~~~~~~~^
            writer="cf",
            ^^^^^^^^^^^^
    ...<3 lines>...
            include_lonlats=False,
            ^^^^^^^^^^^^^^^^^^^^^^
            fill_value=0)
            ^^^^^^^^^^^^^
  File "/home/gholl/checkouts/satpy/satpy/scene.py", line 1300, in save_datasets
    return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gholl/checkouts/satpy/satpy/writers/cf_writer.py", line 309, in save_datasets
    res = ds.to_netcdf(filename,
                       engine=engine,
    ...<2 lines>...
                       encoding=encoding,
                       **other_to_netcdf_kwargs)
TypeError: Dataset.to_netcdf() got an unexpected keyword argument 'fill_value'

Screenshots
If applicable, add screenshots to help explain your problem.

Environment Info:

  • Satpy Version: main (v0.54.0-84-gc776b2607)

Additional context

As a workaround, attempt 2 works if I add del sc["ir_105"].attrs["_FillValue"] before saving the data, but this should not be the responsibility of the user and is cumbersome and buggy-prone in an operational environment, such as through trollflow2.

@gerritholl
Copy link
Member Author

If I use calibration="counts", attempt 1 does produce a NetCDF file with a fill value matching the stored data.

@gerritholl
Copy link
Member Author

The problem (when calibration is not "counts") is that satpy applies the fill value but also retains it in the metadata. That leads to a fill value (65535) that does not match the actually used value for missing data (nan, then when stored with int32, -2147483648).

Either the FCI reader should delete the fill value attribute if it has been applied, or the CF writer should set the correct fill value when applying the dtype. One way is to simply delete the fill value attribute, like in the approach under "Additional context".

@djhoese
Copy link
Member

djhoese commented Feb 11, 2025

It is common practice (dare I say recommended or maybe even required) for readers to remove _FillValue and valid_range/valid_min/valid_max for non-integer products. Typically for NetCDF files these values represent the on-disk data, not the unscaled data in memory so I prefer to remove them to be more accurate. In the case of valid_X attributes sometimes I will update them in the reader to represent the in-memory data values so they can be used by users or writers (ex. the AWIPS SCMI tile writer). I suppose the long term and possibly better solution would be to move this stuff to the .encoding of the DataArray to preserve it as long as possible. I'm not sure how long it persists though through arithmetic operations.

@gerritholl
Copy link
Member Author

The reader pops FillValue but not _FillValue:

fv = attrs.pop(
"FillValue",
default_fillvals.get(data.dtype.str[1:], np.float32(np.nan)))
vr = attrs.get("valid_range", [np.float32(-np.inf), np.float32(np.inf)])
if key["calibration"] == "counts":
attrs["_FillValue"] = fv
nfv = data.dtype.type(fv)
else:
nfv = np.float32(np.nan)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants