
Missing data #99

Open
hofmannmartin opened this issue Sep 27, 2019 · 4 comments

Comments

@hofmannmartin
Member

In my experience, we occasionally have to deal with missing data, e.g. when converting measurements from our Bruker system, which does not store the size of the delta sample but only its volume. Our current approach is to assign a nonsensical value such as 0x0x0 mm³ to the size. Would it not be better to have a standardized, documented way of dealing with missing data?

In cases where a whole data set is unknown or missing, we could use empty data sets as described here https://docs.h5py.org/en/stable/high/dataset.html and here https://support.hdfgroup.org/HDF5/Tutor/crtdat.html.
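A minimal sketch of the empty-data-set approach with h5py, assuming Python with h5py installed. The path `sample/deltaSampleSize` and the helpers `write_unknown` and `is_unknown` are hypothetical and do not reflect the actual MDF layout. h5py documents that an empty (null) data set has `shape` equal to `None` and reads back as an `h5py.Empty` instance.

```python
import h5py


def write_unknown(group: h5py.Group, name: str, dtype: str = "f8") -> h5py.Dataset:
    """Store a field whose value is entirely unknown as an HDF5 empty (null) dataset.

    The dataset keeps its dtype but has no shape and no data.
    """
    return group.create_dataset(name, data=h5py.Empty(dtype))


def is_unknown(dset: h5py.Dataset) -> bool:
    """h5py reports shape None for a null dataspace (a scalar dataset has shape ())."""
    return dset.shape is None


# In-memory file (core driver without backing store), so nothing touches the disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical path: the size is unknown, so it is written as an empty dataset.
    write_unknown(f, "sample/deltaSampleSize")
    # A known value for comparison (hypothetical edge lengths in meters).
    f.create_dataset("sample/knownSize", data=[1.0e-3, 1.0e-3, 1.0e-3])

    print(is_unknown(f["sample/deltaSampleSize"]))  # True
    print(is_unknown(f["sample/knownSize"]))        # False
```

Unlike a placeholder such as 0x0x0 mm³, this lets a reader tell "field written but value unknown" apart from a real value without any out-of-band convention.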

In cases where only single elements of a data set are unknown, we could use NaN (see https://stackoverflow.com/questions/33656043/hdf5-how-to-handle-empty-rows).
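A minimal sketch of marking single unknown elements with NaN, assuming Python with h5py and NumPy. NaN exists only for floating-point types, so integer fields would need some other documented convention. The helpers `write_partial` and `missing_mask` and the path `sample/size` are hypothetical.

```python
import h5py
import numpy as np


def write_partial(group: h5py.Group, name: str, values) -> h5py.Dataset:
    """Write a float64 dataset, storing each unknown element (None) as NaN."""
    arr = np.array([np.nan if v is None else v for v in values], dtype=np.float64)
    return group.create_dataset(name, data=arr)


def missing_mask(dset: h5py.Dataset) -> np.ndarray:
    """Boolean mask that is True where an element is unknown (NaN)."""
    return np.isnan(dset[()])


with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical example: only the z extent of the sample is unknown.
    dset = write_partial(f, "sample/size", [1.0e-3, 1.0e-3, None])
    print(missing_mask(dset).tolist())  # [False, False, True]
```

Because NaN compares unequal to itself, readers have to test with `np.isnan` rather than `==`.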

I think sooner or later every group using the format will run into this issue, and a documented approach would help keep the number of undocumented solutions at bay.

@tknopp
Member

tknopp commented Sep 27, 2019

With my MDF hat on:
One should simply not write nonsensical values into that field. It's optional; just omit it.
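A minimal sketch of what "just omit it" means on the reading side, assuming Python with h5py: the optional field is simply not written, and the reader checks for its presence instead of interpreting a placeholder. `read_optional` and the field paths are hypothetical; h5py's `Group.get` returns the given default when no object with that name exists.

```python
import h5py


def read_optional(group: h5py.Group, name: str, default=None):
    """Return the stored value of an optional field, or `default` if it was omitted."""
    dset = group.get(name)  # None when no object with this name exists
    if dset is None:
        return default
    return dset[()]


with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # The converter knows the volume but not the size, so it writes only the former.
    f.create_dataset("sample/volume", data=1.0e-9)  # hypothetical, in m^3
    print(read_optional(f, "sample/volume"))           # 1e-09
    print(read_optional(f, "sample/deltaSampleSize"))  # None
```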

With my IBI hat on:
It's an issue that we don't write this properly. In the OpenMPIData we have deltaSample overlaps, which lead to different concentration ranges if not taken into account. That case can be fixed by deriving the size from the volume (not perfect, but it yields the correct concentration). Lina has run into this issue.
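The fix of deriving the size from the volume can be sketched under one assumption: the sample is treated as a cube, so each edge length is the cube root of the volume, a = V^(1/3). The product of the three edges then equals V again, which is why the concentration stays correct even though the shape is only approximated. The function name and the use of SI units are hypothetical.

```python
import math
from typing import Tuple


def cube_size_from_volume(volume: float) -> Tuple[float, float, float]:
    """Edge lengths (a, a, a) of a cube with the given volume, a = volume ** (1/3).

    Assumes the true sample shape is unknown and approximates it by a cube;
    only the volume (and hence the concentration) is preserved.
    """
    if volume <= 0:
        raise ValueError("volume must be positive")
    edge = volume ** (1.0 / 3.0)
    return (edge, edge, edge)


# Hypothetical example: a 1 µL delta sample, given in m^3.
size = cube_size_from_volume(1.0e-9)
print(size)  # each edge is approximately 1e-3 m, i.e. 1 mm
print(math.isclose(math.prod(size), 1.0e-9))  # True
```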

@hofmannmartin
Member Author

> With my MDF hat on:
> One should simply not write nonsensical values into that field. It's optional; just omit it.

But what if the field is non-optional?

> With my IBI hat on:
> It's an issue that we don't write this properly. In the OpenMPIData we have deltaSample overlaps, which lead to different concentration ranges if not taken into account. That case can be fixed by deriving the size from the volume (not perfect, but it yields the correct concentration). Lina has run into this issue.

There are actually two problems:

  1. As you mentioned, we do not write this. How could we fix this?
  2. We cannot represent non-cuboid samples. For example, when we measure with capillaries, which have a circular cross-section, it makes no sense to fill in the delta sample size field.

@tknopp
Member

tknopp commented Sep 27, 2019

> But what if the field is non-optional?

That case is somewhat hypothetical. We tried to make only those fields mandatory that are really necessary to use the data.

Regarding the concrete field: maybe we should just add deltaSampleVolume. This would simplify the entire situation. Library users would then need to inspect both fields, but this is not a big deal.
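A minimal sketch of the reader side of this proposal, assuming the fields have already been read into a plain Python dict. `deltaSampleVolume` is only the field proposed here, and `sample_volume` is hypothetical. Preferring the volume when both fields are present is a design choice made for this sketch: the volume stays meaningful for non-cuboid samples, while the size only yields a volume under a cuboid assumption.

```python
import math
from typing import Mapping, Optional


def sample_volume(fields: Mapping[str, object]) -> Optional[float]:
    """Volume of the delta sample from whichever field is present.

    Prefers the proposed deltaSampleVolume; otherwise falls back to the
    product of the edge lengths in deltaSampleSize (a cuboid assumption).
    Returns None when neither field was written.
    """
    volume = fields.get("deltaSampleVolume")
    if volume is not None:
        return float(volume)
    size = fields.get("deltaSampleSize")
    if size is not None:
        return float(math.prod(size))
    return None


print(sample_volume({"deltaSampleSize": (1.0, 2.0, 3.0)}))  # 6.0
print(sample_volume({"deltaSampleVolume": 5.0}))            # 5.0
print(sample_volume({}))                                    # None
```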

@hofmannmartin
Member Author

We have volume in the tracer group, which can be used in this case. The problem lies less in the volume than in the shape of the sample when it is non-cuboid.


2 participants