Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove-NXroot-inheritance #1505

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

rayosborn
Copy link
Contributor

@rayosborn rayosborn commented Nov 4, 2024

This PR addresses Issues #1493 and #1494. It contains two changes to the NXroot NXDL file,

  • The first ensures that the NXroot base class does not inherit from the NXobject base class. When group classes that are valid in any base class are added to the NXobject base class, it is important to exclude NXroot to enforce the rule that only NXentry groups are permitted.
  • The second removes the NX_class attribute definition from root attributes. This is the only example in the entire repository where the mechanism for defining classes is conflated with the contents of that class. Its presence here implies that the attribute is required, whereas it is completely redundant.

When group classes that are valid in any base class are added to the NXobject base class, it is important to exclude NXroot to enforce the rule that only NXentry groups are permitted.
This is an anomalous entry that is the only example in the entire repository where the mechanism for defining classes is conflated with the contents of that class. Its presence here implies that the attribute is required, whereas it is completely redundant.
@PeterC-DLS
Copy link
Contributor

PeterC-DLS commented Nov 6, 2024

This NX_class attribute was added to address #149

@tacaswell
Copy link
Contributor

I am 👍🏻 on removing the inheritance from NXobject (makes the definitions match the prose), but I am very 👎🏻 on removing NX_class attribute (and would support a proposal to make it required in the future).

@rayosborn
Copy link
Contributor Author

There is already an extended discussion of the NX_class issues in #1493, so please read that before making too many extra comments. In particular, it explains a little of the history behind its addition. I am strongly against making it required (which would require a NIAC vote), although I would be interested to hear from @tacaswell why he thinks it is necessary.

@tacaswell
Copy link
Contributor

I think it makes lots of sense for the top-level group of an hdf5 file to declare itself to be the root of a nexus file (rather than inferring that it is because there is 1 or more groups tagged with NXEntry) to identify when we have


Looking ahead (and as I as trying to say in the call / comments today), it would be nice to blur the "top of a nexus file" and "top of an hdf5 file" a bit. The / group in hdf5 is not really special, and I do not understand a technical reason that

/ [no nexus markup]
/A [nx_root]
  /e1 [nx_entry]
  /e2 [nx_entry]
/B [nx_root]
  /e3 [nx_entry]
  /e4 [nx_entry]
  /e5 [nx_entry]

should not be allowed (I understand that it currently is not allowed but do not understand why it is not allowed).

A concrete use case I have for this is an experiment we are doing at NSLS-II where we put identical samples on an pdf beamline and a xafs beamline and coordinate data collection between them. It seems reasonable to have one (on-disk) file with all of the data for the experiment from both beamlines consolidated in it and to leverage the fact that hdf5 lets you do nested groups to provide a bit of additional structure.

I presume most tools can handle multiple on-disk files at a time, so it seems to be a direct extension to support "multiple nexus files in one hdf5 file" as well. e.g. nexpy will show you the contents of multiple hdf5 files in a nice tree.

It would also give a nice path to mix nexus and not-nexus in the same hdf5 file

/ [no nexus markup]
/NEXUS [nx_root]
  /e1 [nx_entry]
  /e2 [nx_entry]
/other
/stuff
/that
/is 
/not 
/nexus

If we want to be able to serialize nexus into zarr, tiled, or any other format that has a different set of boundaries between what is "file system" structure and what is "in-file" structure having an explicit marker that "this is the top of the nexus tree" is extremely useful.


To the question about validators, it seems like the solution is that Sandor "just" needs to add a --implicit_root flag and everything will be fine.

@prjemian
Copy link
Contributor

prjemian commented Nov 6, 2024

... currently is not allowed ...

This is not correct. The NeXus standard does not control the root of the HDF5 file. The HDF5 file root should not be validated other than for: [1] at least one NXentry group, [2] the presence of the @default attribute and that it names an existing NXentry group, and [3] optional file-level attributes.

It is perfectly acceptable (not a warning or an error) to have non-NeXus content at the file root, as long as such content does not use NX (as described in the manual).

For example, a tree such as this, with mixed NeXus and non-NeXus content, should not fail NeXus validation:

/
    data/
    entry/:NXentry
    entry2/:NXentry
    exchange/
    pi = 3.14
  • APS users have asked if one HDF5 file can contain both NeXus and Data Exchange content.
    Yes
  • Then they have asked if content can be linked between the two.
    Also Yes, but you might run into challenges (such as group or dataset attributes).

The NXroot base class is unique in not applying to an HDF5 group, but to the root level of the NeXus file itself.
@prjemian
Copy link
Contributor

prjemian commented Nov 7, 2024 via email

This also adds fields to the permitted non-NeXus objects at the root level.
@rayosborn
Copy link
Contributor Author

@tacaswell, I made another commit that partly implements what you suggest, although I still wanted to avoid use of "root group" since I think that contributed to the original confusion about the need for assigning the NXroot class. On the other hand, I wasn't aware that there was a difference between f and f['/'] when `f=h5py.File('...')'. Is that also true of the HDF5 C-API, i.e., that a path of '/' refers to a HDF5 group, or is that just in h5py?

@tacaswell
Copy link
Contributor

Is that also true of the HDF5 C-API, i.e., that a path of '/' refers to a HDF5 group, or is that just in h5py?

My understanding is that this is also true in the c API

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants