Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly populate product fill values in geocoded datasets #167

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

LiangJYu
Copy link
Contributor

@LiangJYu LiangJYu commented Jun 2, 2023

This PR:

  1. correctly populates geocoded dataset fill values based on dataset data type
  2. renames init_geocoded_dataset to create_geocoded_dataset
  3. renames h5py.Dataset variables with grid prefix to data prefix

@LiangJYu LiangJYu requested review from seongsujeong and vbrancat June 2, 2023 01:01
@vbrancat
Copy link
Contributor

@LiangJYu do you mind update this branch? So we can go ahead and test it. Thanks.

@LiangJYu
Copy link
Contributor Author

Unit tests pass with most current code on FN develop branch. Current failure is due to the change in types importing.

@vbrancat vbrancat added the CalVal_point_release Cal/Val point release label Jul 21, 2023
@vbrancat vbrancat added this to the Cal/Val_point_release milestone Jul 21, 2023
@vbrancat
Copy link
Contributor

@LiangJYu do you mind merging the main branch into this PR branch?

Copy link
Contributor

@vbrancat vbrancat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiangJYu, thanks for this PR. I am a bit lost. I see you implemented a functionality to retrieve the fill value for datasets with different data types. However, I don't see anywhere in the code that these fill values are plugged into the product metadata. Can you clarify?

Also, it would be great to come up with an automated way of understanding which fill value has been used for each data type instead of hard-coding them. If this is not possible, can we group all the fill value somewhere more visible?

src/compass/utils/h5_helpers.py Outdated Show resolved Hide resolved
src/compass/utils/h5_helpers.py Outdated Show resolved Hide resolved
src/compass/utils/h5_helpers.py Outdated Show resolved Hide resolved
@LiangJYu
Copy link
Contributor Author

@LiangJYu, thanks for this PR. I am a bit lost. I see you implemented a functionality to retrieve the fill value for datasets with different data types. However, I don't see anywhere in the code that these fill values are plugged into the product metadata. Can you clarify?

Also, it would be great to come up with an automated way of understanding which fill value has been used for each data type instead of hard-coding them. If this is not possible, can we group all the fill value somewhere more visible?

All the changes have been merged away against main. 😭 I will restore and try to implement you suggestion above.

@vbrancat
Copy link
Contributor

@LiangJYu can we quickly address these comments by EOD?

@LiangJYu
Copy link
Contributor Author

@seongsujeong @vbrancat Comments have been addressed. Products look good from my testing with the Peru dataset. Please have a look. Thanks.

@LiangJYu LiangJYu removed the CalVal_point_release Cal/Val point release label Jul 26, 2023
Copy link
Contributor

@vbrancat vbrancat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind merging the main branch? It would be much easier for testing. Merci.

src/compass/utils/fill_value.py Outdated Show resolved Hide resolved
src/compass/utils/fill_value.py Outdated Show resolved Hide resolved
@vbrancat
Copy link
Contributor

vbrancat commented Aug 7, 2023

@LiangJYu I have processed a CSLC-S1 product using this PR and inspected it using HDF5. In the HDF5 viewer, I am not able to see the _FillValue as an attribute of 2D datasets. Can we insert as attribute for all the 2D datasets (timing corrections included)? Thanks.

Screenshot 2023-08-07 at 10 50 15 AM

Copy link
Contributor

@vbrancat vbrancat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but for consistency, we should add fill values also to the timing corrections since these are 2D datasets. @seongsujeong, please, go ahead and review this PR as well.

@LiangJYu
Copy link
Contributor Author

LiangJYu commented Aug 8, 2023

Looks good to me but for consistency, we should add fill values also to the timing corrections since these are 2D datasets. @seongsujeong, please, go ahead and review this PR as well.

Safe to assume NaN as fill value for all correction values? Apologies, I don't know them off the top of my head.

@scottstanie
Copy link
Contributor

scottstanie commented Aug 8, 2023

Don't wanna slow this down if you're about to try and get it merged, but if you wanted to cut down on some of the code, you might just be able to use the numpy dtype 'kind'

In [13]: dts = ['complex64', 'float64', 'float32', 'uint8']

In [14]: for dt in dts:
    ...:     print(dt, ':', np.dtype(dt).kind)
    ...:
complex64 : c
float64 : f
float32 : f
uint8 : u

that's how they do this in h5netcdf:

https://github.com/h5netcdf/h5netcdf/blob/2951cf8ea3174dde5663451484bde0cebaa535d0/h5netcdf/legacyapi.py#L24-L30
where they use a dict keyed on the numpy dtype kind

@vbrancat
Copy link
Contributor

@seongsujeong can you go ahead and have another look at this PR? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants