Missing data #99

hofmannmartin · 2019-09-27T07:09:00Z

In my experience we occasionally have to deal with missing data, e.g. when converting measurements from our Bruker system, which does not store the size of the delta sample, only its volume. Our current approach is to assign a nonsensical value such as 0x0x0 mm³ to the size. However, would it not be better to have a standardized and documented way of dealing with missing data?

If a whole data set is unknown or missing, we could use empty data sets as proposed here https://docs.h5py.org/en/stable/high/dataset.html and here https://support.hdfgroup.org/HDF5/Tutor/crtdat.html.
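For illustration, an empty data set can be created with h5py roughly as follows. This is only a sketch: the data set name `unknownField` and the file name are made up, and the file is kept in memory so that nothing is written to disk.

```python
import h5py

# In-memory file so that nothing is written to disk in this sketch.
with h5py.File("missing_data_sketch.h5", "w", driver="core", backing_store=False) as f:
    # An empty (null dataspace) data set: it has a dtype but no shape and no data.
    # "unknownField" is a hypothetical name, not an MDF field.
    ds = f.create_dataset("unknownField", data=h5py.Empty("f"))
    shape_of_empty = ds.shape                  # None for an empty data set
    is_empty = isinstance(ds[()], h5py.Empty)  # reading returns an h5py.Empty
```

A reader could then treat `ds.shape is None` as the signal that the value is unknown, as opposed to the data set being absent altogether.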

If only single elements of a data set are unknown, we could use NaN https://stackoverflow.com/questions/33656043/hdf5-how-to-handle-empty-rows.
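One caveat with the NaN approach, sketched in plain Python: NaN compares unequal to everything, including itself, so readers have to test for it explicitly rather than with `==`. The sample values and the helper name `missing_indices` are made up for illustration and are not part of the MDF specification.

```python
import math

# Hypothetical per-element values where the second entry is unknown.
values = [1.5e-3, math.nan, 2.0e-3]


def missing_indices(data):
    """Return the indices of all entries marked as missing (NaN)."""
    return [i for i, x in enumerate(data) if math.isnan(x)]


# NaN never compares equal, not even to itself, so an equality
# check cannot be used to find missing entries.
found_by_equality = [i for i, x in enumerate(values) if x == math.nan]
found_by_isnan = missing_indices(values)
```

Note that this only works for floating-point data; integer data sets have no NaN and would need a different marker.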

I think sooner or later every group using the format will run into this issue, and this could help keep the number of undocumented solutions at bay.


tknopp · 2019-09-27T13:28:36Z

With my MDF hat on:
One should simply not write nonsensical values into that field. It's optional, just omit it.

With my IBI hat on:
It's an issue that we don't properly write this. In the OpenMPIData we have deltaSample overlaps, which lead to different concentration ranges if not taken into account. That case is fixable by deriving the size from the volume (not perfect, but it yields the correct concentration). Lina has run into this issue.
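Deriving the size from the volume could look roughly like the sketch below. It assumes the volume is given in m³ and treats the sample as a cube, so the edge length is the cube root of the volume; the function name `cube_size_from_volume` is hypothetical and not part of any MDF library.

```python
def cube_size_from_volume(volume_m3):
    """Return (x, y, z) edge lengths in metres of a cube with the given volume."""
    if volume_m3 < 0:
        raise ValueError("volume must be non-negative")
    edge = volume_m3 ** (1.0 / 3.0)
    return (edge, edge, edge)


# Example: 1 microlitre = 1e-9 m^3 gives a cube with roughly 1 mm edges.
size = cube_size_from_volume(1e-9)
```

The true shape of the sample is lost this way, but the product of the three edges reproduces the stored volume.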

hofmannmartin · 2019-09-27T13:40:35Z

> With my MDF hat on:
> One should simply not write nonsensical values into that field. It's optional, just omit it.

But what if the field is non-optional?

> With my IBI hat on:
> It's an issue that we don't properly write this. In the OpenMPIData we have deltaSample overlaps, which lead to different concentration ranges if not taken into account. That case is fixable by deriving the size from the volume (not perfect, but it yields the correct concentration). Lina has run into this issue.

It's actually two problems:

1. As you mentioned, we do not write this. How could we fix this?
2. We cannot represent non-cuboid samples. E.g. if we measure with capillaries, which have a circular cross-section, it makes no sense to fill in the delta sample size field.

tknopp · 2019-09-27T13:46:45Z

> But what if the field is non-optional?

That case is somewhat hypothetical. We tried to make only those fields mandatory that are really necessary to use the data.

Regarding the concrete field: Maybe we should just add deltaSampleVolume. This would simplify this entire situation. The user of the library then has the issue that she/he needs to inspect both fields but this is actually not such a big deal.
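From the reader's side, inspecting both fields might look like the following sketch. The metadata is modelled as a plain dictionary; the key `deltaSampleSize` and the helper `delta_sample_volume` are assumptions for illustration, and `deltaSampleVolume` is the field proposed above, which does not exist in the format yet.

```python
def delta_sample_volume(calibration):
    """Return the delta sample volume, or None if neither field is present.

    `calibration` is a plain dict standing in for the file's metadata.
    """
    # Prefer the explicit volume (the proposed, hypothetical field).
    volume = calibration.get("deltaSampleVolume")
    if volume is not None:
        return volume
    # Fall back to the cuboid size if it was written.
    size = calibration.get("deltaSampleSize")
    if size is not None:
        x, y, z = size
        return x * y * z
    # Both optional fields were omitted.
    return None


only_volume = delta_sample_volume({"deltaSampleVolume": 1e-9})
only_size = delta_sample_volume({"deltaSampleSize": (1e-3, 2e-3, 3e-3)})
neither = delta_sample_volume({})
```

Returning `None` when both fields are omitted keeps "unknown" distinct from a volume of zero.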

hofmannmartin · 2019-09-27T13:53:11Z

We have volume in the tracer group, which can be used in this case. The problem lies less in the volume and more in the shape of the sample when it is non-cuboid.

hofmannmartin added the enhancement label Sep 27, 2019
