Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fairmat 2024: add target attribute to NXdata #1410

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

lukaspie
Copy link
Contributor

@lukaspie lukaspie commented Sep 23, 2024

This PR introduces a new concept for NXdata. Essentially, the idea is here that a new attribute is added to NXdata to indicate the origin of the data. The whole idea comes from discussion with instrument manufacturers who would like to combine multiple detector readings into the top-level NXdata group (e.g., for spin-resolved photoemission measurements) and would like to keep track for both them and the user where this data originated.

In the case of a link, the value of this attribute would be the target of the link. That is, if an NXdata group in entry/data is linked from an NXdata group on the detector, the value of this attribute would be ´entry/dectector/detector_data. Similarly, both for the ´AXISNAME and DATA fields, such an attribute is added to indicate directly where this axis/data field is originating from.

This new attribute serves multiple purposes:

  1. it provides human-readable information about links that would otherwise need to be extracted from whatever API one was using to write/read the file.
  2. As @rayosborn correctly pointed out below, if you use the nexusformat API, using the nxlink, attribute it is easy to resolve to the linked object (however, it is less easy using h5py or any other HDF5 library). In any case, also with the nexusformat API, this resolving however works only if e.g. the NXdata and the referenced data group are in the same file (as far as I understand, please correct me if I'm wrong). With the target attribute, there is at least the option to keep track from where this data originally came from, even if the link is not directly resolvable anymore.
  3. Another use case would be that you do not actually want to make a link, but rather a copy of the data. An example would be that you create a field somewhere that has multiple other attributes, but you would really only like to bring the field values and not the attributes to your NXdata/AXISNAME. With a link, that would be impossible because in HDF5, the two concepts would then refer to the same object. Thus, one would rather like to make a copy the field values, but still indicate with this new attribute where the data was originally from.
  4. In addition to linking, this new attribute could also be used for a string notation describing e.g. "the axis xy comes from here and there". This was sugggested by the manufacturers at some point, but I would (at least for now) rather limit it to the cases that I outlined above (where at least a target also exists somewhere else).

Please note that this attribute was originally called @reference, with the idea of referring back to other data. It was changed to @target upon request by the NIAC in the NIAC meeting 2024, but in my opinion, these concepts are slightly different and a different name would fit that attribute better.

@sanbrock sanbrock mentioned this pull request Sep 26, 2024
@lukaspie lukaspie linked an issue Sep 29, 2024 that may be closed by this pull request
If the data corresponds to a readout of a detector, ``@reference`` links
to that detectors data:

@reference: '/entry/instrument/detector/data' for a 2D detector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove "2D"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I changed it

@benajamin
Copy link
Contributor

benajamin commented Sep 29, 2024

This is fine for a path in the same file, but what if the NXdata and the referenced data field are not in the same file?

Edit: the target attribute of a link also doesn't refer to external files, so I guess this is not worse.

<doc>
Points to the path of a field defining the data to which the `DATA` group refers.

This concept allows to link the data to a respective field in the NeXus hierarchy, thereby
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest rewording to avoid using the word "link":
"This attribute indicates the origin of the data, thereby providing an association to further metadata that define the physical quantity that the data represents"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented

@prjemian
Copy link
Contributor

prjemian commented Sep 29, 2024 via email

@phyy-nx
Copy link
Contributor

phyy-nx commented Sep 29, 2024

The NIAC asks if it's possible to use the target attribute instead of adding a new attribute reference. This could affect the target documentation because in the past it's only been used for links, but here it's desired to use the target attribute without links. This could also affect nexusformat API.

@prjemian
Copy link
Contributor

prjemian commented Sep 29, 2024 via email

@lukaspie lukaspie force-pushed the fairmat-2024-nxdata branch 2 times, most recently from 85ea386 to 903ffbe Compare October 16, 2024 09:10
@rayosborn
Copy link
Contributor

I think this PR is premature. I believe that at the recent NIAC, we agreed to look into whether the same thing could be achieved with the target attribute. That work has not been done. At the very least, this should wait until the next Telco.

@rayosborn
Copy link
Contributor

As a quick follow-up to my last comment, I see that the proposal has now been changed to use 'target' instead of 'reference'. I still think this needs some further discussion, because this is a major structural change to the format.

Personally, I believe that the PR should be resubmitted, with a detailed introduction explaining the rationale for all these changes and what the consequences are. I don't believe it's good practice to submit a PR with absolutely no explanation because it makes it impossible for future readers to know the context of the request and it makes it very difficult for current reviewers to understand what is proposed and why. I know we discussed it in the NIAC but I didn't really understand why it was needed then and I understand it less now.

For example, the current docstrings say the target "attribute indicates the origin of the data, thereby providing an association to further metadata that define the physical quantity that the data represents."

What does "origin" mean here if the data is not actually linked? Can the field with the target attribute have no corresponding field (at any time) in the target group? Can it be present but with different contents? I seem to recall (perhaps incorrectly) that part of the issue was the need to create the NXdata group before the target exists in the NXdetector group. This can be achieved now without making any changes. In HDF5 and the nexusformat API, you can create a soft link before the target exists.

In the following, the soft link is created before the target object. The nxtarget attribute shows its path, but the nxlink attribute, which resolves to the linked object, shows an invalid path since it doesn't yet exist. Once it's created, the nxlink attribute returns the linked object.

>>> root['entry/data/data'] = NXlink(target='entry/instrument/detector/data', soft=True)
>>> root['entry/data/data'].nxtarget
'/entry/instrument/detector/data'
>>> root['entry/data/data'].nxlink
NeXusError: Invalid path
>>> root['entry/instrument/detector/data'] = np.array(...)
>>> root['entry/data/data'].nxlink
NXfield('data')
>>> root['entry/data/data'].nxlink.nxpath
'/entry/instrument/detector/data'

Does that accomplish what you need? It's true that, if you use h5py instead of the nexusformat API, you might want to manually add the target attribute to the linked object after it has been created, but you could use other h5py functions to identify the target path.

The point of this discussion is that you can probably achieve what you need without a major structural change to the format, but we need to know the precise use case you have in mind to know how. I am willing to amend the nexusformat API to facilitate your use case.

@lukaspie
Copy link
Contributor Author

lukaspie commented Dec 9, 2024

As a quick follow-up to my last comment, I see that the proposal has now been changed to use 'target' instead of 'reference'. I still think this needs some further discussion, because this is a major structural change to the format.

Personally, I believe that the PR should be resubmitted, with a detailed introduction explaining the rationale for all these changes and what the consequences are. I don't believe it's good practice to submit a PR with absolutely no explanation because it makes it impossible for future readers to know the context of the request and it makes it very difficult for current reviewers to understand what is proposed and why. I know we discussed it in the NIAC but I didn't really understand why it was needed then and I understand it less now.

For example, the current docstrings say the target "attribute indicates the origin of the data, thereby providing an association to further metadata that define the physical quantity that the data represents."

What does "origin" mean here if the data is not actually linked? Can the field with the target attribute have no corresponding field (at any time) in the target group? Can it be present but with different contents? I seem to recall (perhaps incorrectly) that part of the issue was the need to create the NXdata group before the target exists in the NXdetector group. This can be achieved now without making any changes. In HDF5 and the nexusformat API, you can create a soft link before the target exists.

In the following, the soft link is created before the target object. The nxtarget attribute shows its path, but the nxlink attribute, which resolves to the linked object, shows an invalid path since it doesn't yet exist. Once it's created, the nxlink attribute returns the linked object.

>>> root['entry/data/data'] = NXlink(target='entry/instrument/detector/data', soft=True)
>>> root['entry/data/data'].nxtarget
'/entry/instrument/detector/data'
>>> root['entry/data/data'].nxlink
NeXusError: Invalid path
>>> root['entry/instrument/detector/data'] = np.array(...)
>>> root['entry/data/data'].nxlink
NXfield('data')
>>> root['entry/data/data'].nxlink.nxpath
'/entry/instrument/detector/data'

Does that accomplish what you need? It's true that, if you use h5py instead of the nexusformat API, you might want to manually add the target attribute to the linked object after it has been created, but you could use other h5py functions to identify the target path.

The point of this discussion is that you can probably achieve what you need without a major structural change to the format, but we need to know the precise use case you have in mind to know how. I am willing to amend the nexusformat API to facilitate your use case.

@rayosborn thanks for the pointers in the nexusformat API. This idea is not really about the order when links are created, but also for cases where not necessarily a link in HDf5 is created, but one would still like to say where the data originated from. I outlined multiple usage above, hope that helps to clarify the purpose of this new attribute. I/we would be happy to further discuss this.

@lukaspie lukaspie marked this pull request as ready for review December 18, 2024 15:45
Copy link
Contributor

@phyy-nx phyy-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When these changes are done, ok to move to online vote, as discussed by NIAC meeting Dec 18 2024

If the data corresponds to a readout of a detector, ``@target`` points
to that detectors data:

@target: '/entry/instrument/detector/data' for a detector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: detectors --> detector's

Also, use a different example that doesn't use detector twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is to say: an NXdata group somewhere (i.e., not on the detector) is actually originating from the NXdata group that is part of the NXdetector. I changed this now to

Example:
    If the root-level ``NXdata`` corresponds to a readout of a specific detector called "my_detector", ``@target`` points
	to the ``NXdata`` group of the `NXdetector` group describing that detector:

	  @target: '/entry/instrument/my_detector/data'

Would that be helpful @phyy-nx? I think the problem also occurs with any other example.

@@ -309,6 +309,20 @@

</doc>
</attribute>
<attribute name="target">
<doc>
Points to the path of a field defining the data to which the `DATA` group refers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field --> group

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@lukaspie lukaspie force-pushed the fairmat-2024-nxdata branch from 1618cc8 to 42877b3 Compare December 19, 2024 13:37
@lukaspie lukaspie changed the title Fairmat 2024: add reference attribute to NXdata Fairmat 2024: add target attribute to NXdata Jan 15, 2025
lukaspie and others added 6 commits January 16, 2025 09:17
# Conflicts:
#	base_classes/nyaml/NXdata.yaml
# Conflicts:
#	base_classes/NXsource.nxdl.xml
#	base_classes/nyaml/NXdata.yaml
#	base_classes/nyaml/NXsource.yaml
#	contributed_definitions/NXactuator.nxdl.xml
#	contributed_definitions/NXcalibration.nxdl.xml
#	contributed_definitions/NXelectronanalyser.nxdl.xml
#	contributed_definitions/NXresolution.nxdl.xml
#	contributed_definitions/nyaml/NXactuator.yaml
#	contributed_definitions/nyaml/NXcalibration.yaml
#	contributed_definitions/nyaml/NXelectronanalyser.yaml
#	contributed_definitions/nyaml/NXresolution.yaml
# Conflicts:
#	base_classes/nyaml/NXdata.yaml
#	contributed_definitions/NXactuator.nxdl.xml
#	contributed_definitions/NXdata_mpes.nxdl.xml
#	contributed_definitions/NXdata_mpes_detector.nxdl.xml
#	contributed_definitions/NXelectronanalyser.nxdl.xml
#	contributed_definitions/NXresolution.nxdl.xml
#	contributed_definitions/nyaml/NXactuator.yaml
#	contributed_definitions/nyaml/NXdata_mpes.yaml
#	contributed_definitions/nyaml/NXdata_mpes_detector.yaml
#	contributed_definitions/nyaml/NXelectronanalyser.yaml
#	contributed_definitions/nyaml/NXresolution.yaml
@lukaspie lukaspie force-pushed the fairmat-2024-nxdata branch from bb2b020 to 2c1c6cd Compare January 16, 2025 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NXdata
6 participants