Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugs in NXmpes definition #230

Closed
rettigl opened this issue May 22, 2024 · 12 comments
Closed

Bugs in NXmpes definition #230

rettigl opened this issue May 22, 2024 · 12 comments
Assignees

Comments

@rettigl
Copy link
Collaborator

rettigl commented May 22, 2024

There seem to be several issues with both the NXmpes definitions, as well as with the parsing code in the latest version of pynxtools

  • I get this line twice:
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations/scheme is required and hasn't been supplied by the reader.
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations/scheme is required and hasn't been supplied by the reader.

This path should not exist, and does not occur in NXmpes, so definitely should not be required.

  • The field "kinetic_energy" seems to have slipped out of the definitions again:
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units written without documentation. 
@rettigl
Copy link
Collaborator Author

rettigl commented May 22, 2024

The second part is addressed in PR #128, so only the former remains (probably pynxtools-related?)

@lukaspie
Copy link
Collaborator

There seem to be several issues with both the NXmpes definitions, as well as with the parsing code in the latest version of pynxtools

  • The field "kinetic_energy" seems to have slipped out of the definitions again:
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units written without documentation. 

I think NXenergydispersion]/kinetic_energy is so far only in the NXmpes_arpes PR that has not been merged. I have created a separate PR that fixes it independently.

@lukaspie
Copy link
Collaborator

The second part is addressed in PR #128, so only the former remains (probably pynxtools-related?)

Sorry, I did not see your comment. Feel free to merge the separate PR first or we merge it with #128.

@lukaspie
Copy link
Collaborator

  • I get this line twice:
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations/scheme is required and hasn't been supplied by the reader.
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations/scheme is required and hasn't been supplied by the reader.

I don't really understand this error. There is no named field transformations in either NXmpes/NXinstrument/NXelectronanalyser or in the base class NXmpes. I guess you are working with NXmpes_arpes from #128? But even there, there is no scheme inside /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations, so I don't even get where the error is coming from.

@rettigl
Copy link
Collaborator Author

rettigl commented May 22, 2024

I don't really understand this error. There is no named field transformations in either NXmpes/NXinstrument/NXelectronanalyser or in the base class NXmpes. I guess you are working with NXmpes_arpes from #128? But even there, there is no scheme inside /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations, so I don't even get where the error is coming from.

Me neither. I just used the latest definitions installed by pynxtools. I will look into this when I find time.

@rettigl
Copy link
Collaborator Author

rettigl commented May 22, 2024

This bug must be somewhere in the validation code. However, I don't understand it well enough...

@rettigl
Copy link
Collaborator Author

rettigl commented May 22, 2024

The issue is somewhere here:
https://github.com/FAIRmat-NFDI/pynxtools/blob/71020b564b88e33a8e50770315b6cfa63ffea31e/pynxtools/dataconverter/validation.py#L323
For
grafik
I get
grafik
which does not make sense to me.
It appears it is due to this line:
https://github.com/FAIRmat-NFDI/pynxtools/blame/71020b564b88e33a8e50770315b6cfa63ffea31e/pynxtools/dataconverter/validation.py#L201
The problem here is that transformations is defined as "transformations" and not as "TRANSFORMATIONS[transformations]", as it should be. Thus, it is not in "get_all_children_names", and is added even though get_nx_namefit returns 0.
As I don't understand the logic here really, I cannot debug further.

So, it makes sense that the code complains (as lower_case transformations is not defined), but the error message is very unintuitive and misleading.

@rettigl
Copy link
Collaborator Author

rettigl commented May 22, 2024

If I try to use the inherited appdef NXmpes_arpes, things become really crazy:

The value at /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/entrance_slit/shape should be on of the following strings: ['straight slit', 'curved slit']
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/depends_on is required and hasn't been supplied by the reader.
The required group, /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/transformations, hasn't been supplied.
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/depends_on is required and hasn't been supplied by the reader.
The required group, /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/transformations, hasn't been supplied.
Missing attribute: "/ENTRY[entry]/data/@energy_indices"
Missing attribute: "/ENTRY[entry]/data/@angular0_indices"
Missing attribute: "/ENTRY[entry]/data/@angular1_indices"
Missing attribute: "/ENTRY[entry]/data/@angular0_depends"
Missing attribute: "/ENTRY[entry]/data/@angular1_depends"
Missing attribute: "/ENTRY[entry]/data/@energy_depends"
The data entry corresponding to /ENTRY[entry]/data/energy is required and hasn't been supplied by the reader.
There were attributes set for the field /ENTRY[entry]/data/energy, but the field does not exist.
The data entry corresponding to /ENTRY[entry]/data/angular0 is required and hasn't been supplied by the reader.
The data entry corresponding to /ENTRY[entry]/data/angular1 is required and hasn't been supplied by the reader.
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/resolution written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/resolution/@units = meV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/resolution/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/physical_quantity written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/type written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/name written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/measurement written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/value written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/value/@units = mbar, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/value/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/device_information/vendor written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/device_information/model written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/exit_slit/shape written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/sourceTYPE[source_probe]/associated_beam written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/distance written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/distance/@units = mm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/distance/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy_spread written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy_spread/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy_spread/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/pulse_duration written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/pulse_duration/@units = fs, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/pulse_duration/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_polarization written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_polarization/@units = V^2/mm^2, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_polarization/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/extent written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/extent/@units = µm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/extent/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/associated_source written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/sourceTYPE[source_pump]/associated_beam written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/distance written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/distance/@units = mm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/distance/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy_spread written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy_spread/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy_spread/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_wavelength written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_wavelength/@units = nm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_wavelength/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_duration written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_duration/@units = fs, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_duration/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_polarization written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_polarization/@units = V^2/mm^2, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_polarization/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_energy/@units = µJ, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/average_power written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/average_power/@units = mW, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/average_power/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/extent written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/extent/@units = µm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/extent/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/fluence written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/fluence/@units = mJ/cm^2, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/fluence/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/associated_source written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/name requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/name should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/measurement requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/measurement should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/name requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/name should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/measurement requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/measurement should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/name requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/name should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/measurement requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/measurement should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/SAMPLE[sample]/temperature/temperature_sensor written without documentation.
Field /ENTRY[entry]/SAMPLE[sample]/gas_pressure/pressure_gauge written without documentation.

To me it appears this validation code has not been thoroughly tested enough...

@domna
Copy link

domna commented Jun 3, 2024

The problem here is that transformations is defined as "transformations" and not as "TRANSFORMATIONS[transformations]", as it should be. Thus, it is not in "get_all_children_names", and is added even though get_nx_namefit returns 0.

Indeed that is a problem with the algorithm. It goes through all required fields and tries to find a namefit the available names and they are only considered if they are not in all the children names (here including all inherited ones). This means that it also can namefit a single name twice if it matches to multiple categories (including the consequences of missing fields one level down). For this we need to resolve the ambiguity by stating it with the class as you correctly found. I think we can improve the algorithm for this in the future but for the first basic version we need to resolve the ambiguity by stating the class.

A return value of 0 for get_nx_namefit means that it fits the category but has nothing in common. -1 means no fit. Here is the docstring of the function explaining it in more detail:

def get_nx_namefit(hdf_name: str, name: str, name_any: bool = False) -> int:
"""
Checks if an HDF5 node name corresponds to a child of the NXDL element.
A group of uppercase letters anywhere in the name is treated as freely choosable
part of this name.
If a match is found this function returns twice the length for an exact match,
otherwise the number of matching characters (case insensitive) or zero, if
`name_any` is set to True, is returned.
All uppercase groups are considered independently.
Lowercase matches are independent of uppercase group lengths, e.g.,
an hdf_name `get_nx_namefit("my_fancy_yet_long_name", "my_SOME_name")` would
return a score of 8 for the lowercase matches `my_..._name`.
All characters in `[a-zA-Z0-9_.]` are considered for matching to an uppercase letter.
If you use any other letter in the name, it will not match and return -1.
Periods at the beginning or end of the hdf_name are not allowed, only exact
matches will be considered.
Examples:
* `get_nx_namefit("test_name", "TEST_name")` returns 9
* `get_nx_namefit("te_name", "TEST_name")` returns 7
* `get_nx_namefit("my_other_name", "TEST_name")` returns 5
* `get_nx_namefit("test_name", "test_name")` returns 18
* `get_nx_namefit("test_other", "test_name")` returns -1
Args:
hdf_name (str): The hdf_name, containing the name of the HDF5 node.
name (str): The concept name to match against.
name_any (bool, optional):
Accept any name and return either 0 (match) or -1 (no match).
Defaults to False.
Returns:
int: -1 if no match is found or the number of matching
characters (case insensitive).
"""

So, it makes sense that the code complains (as lower_case transformations is not defined), but the error message is very unintuitive and misleading.

I agree. Generally, all error messages and reporting for the new verification have room for improvement. I plan to do improvements with the cli #validation and hdf tree traversal: FAIRmat-NFDI/pynxtools#333

To me it appears this validation code has not been thoroughly tested enough...

Indeed, we brought this quickly with only a small testing set on our examples. But the old algorithm was broken anyways, was slow and also did not verify everything (especially multiple groups belonging to the same concept) so we decided to bring it in already and then fix all the upcoming issues with the new algorithm.

Your long list of problems, I believe, arise from the fact that extends is not yet supported at all. Therefore, the algorithm just works on NXmpes_arpes without any fields inherited from the parent. I will implement support for this in the next days and test whether this runs through without any errors (probably there are also some other small issues then, especially NXdata specialities are still very hacky and untested).

@domna
Copy link

domna commented Jun 4, 2024

@rettigl I started to work on supporting the extends keyword here: FAIRmat-NFDI/pynxtools#339

Which example did you use for the above tests? Then I'll use this for testing and debugging.

@rettigl
Copy link
Collaborator Author

rettigl commented Jun 4, 2024

Which example did you use for the above tests? Then I'll use this for testing and debugging.

This was the conversion of the phoibos scan to NXmpes_arpes, which I did on the central Nomad, with updated packages in the jupyterlab container

@lukaspie
Copy link
Collaborator

lukaspie commented Jul 3, 2024

Closed as handled by FAIRmat-NFDI/pynxtools#339

@lukaspie lukaspie closed this as completed Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants