Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make NWBFile groups optional :) #592

Open
sneakers-the-rat opened this issue Oct 21, 2024 · 6 comments
Open

Make NWBFile groups optional :) #592

sneakers-the-rat opened this issue Oct 21, 2024 · 6 comments
Labels
category: proposal proposed enhancements or new features priority: low alternative solution already working and/or relevant to only specific user(s)
Milestone

Comments

@sneakers-the-rat
Copy link

Currently, many of the top-level items in NWBFile are explicitly optional (quantity is either * or ?), and several can be interpreted as being optional, but there is a little bit of inconsistency that is awkward here:

Here's the current status of quantity for NWBFile.groups:

  • acquisition: null
  • analysis: null
  • general: null
  • intervals: '?'
  • processing: null
  • scratch: '?'
  • stimulus: null
  • units: '?'

(where null == 1 by default).

Some of these can be interpreted as being optional, eg. groups like processing consist of only *-quantitied groups, so I am treating those as Optional[Dict[str, ProcessingModule]] : https://github.com/p2p-ld/nwb-linkml/blob/77a852913c0ae036ca7285a3e5e5dc04fa3e8954/nwb_models/src/nwb_models/models/pydantic/core/v2_7_0/core_nwb_file.py#L258

But others, in particular general and stimulus would be much harder to make that inference for, and it's enough of a special case that i'd prefer not to do it. For both of those, all of their (recursive) child groups are optional, so it makes sense to also make them optional.

This is already how pynwb behaves for the null-aka-1-quantitied groups:

So it would be nice to make the rest of the groups quantity ? to make that behavior ~schema official~, because at the moment requiring general and stimulus is technically correct behavior, even if all of their params are optional (and i don't think there's a syntactic construct in nwb schema language for "the default value is an empty instance of this class").

@oruebel
Copy link
Contributor

oruebel commented Oct 22, 2024

Just to avoid confusion, I believe the parameters indicated here only mean that a user does not need to specify any contents to be stored in the groups. As far as I know, PyNWB still treats these groups as required and will generate empty groups if the user does not specify any contents. I.e., as far as I know, PyNWB (and MatNWB) indeed treat these groups as required.

  • pynwb currently does not represent general as a separate object and has all of its groups and subgroups unpacked in the docval signature and maps them back into general in the NWBFileMap, so NWBFile.general is technically optional because it doesn't exist as a discrete type.

/general is being treated as required by PyNWB. /general is a required component (i.e,. with a quantity of null aka 1) because it is part of the NWBFile type. The group does not need to exist as its own type for it to be required.

Some of these can be interpreted as being optional, e.g. groups like processing consist of only *-quantitied groups, so I am treating those as Optional[Dict[str, ProcessingModule]]

acquisition, analysis, general, processing, and stimulus were all part of the original NWB 1.0 schema and the requirement for these groups to exist had carried over from there. intervals, units, and scratch were added later sometime during the development of NWB 2.x so they were added as optional groups. But that's just for historical background.

In general, I think it is reasonable to make groups that have no required contents optional. general/ contains several required metadata fields, so I believe it should remain as required. analysis and processing I believe could be optional. acquisition and stimulus I believe could also be made optional (but I'm not 100% sure right now for those two). In principle, I think it would be Ok to make some of the top-level groups that don't have required contents optional. I am not sure, however, how involved the changes in PyNWB would be to support this. Also, I suspect that some external tools may rely on the fact that these groups are currently required (but I'm not sure). To make a long story short, I think it would be Ok to explore the option to make some the groups optional, but I suspect that the coding effort required will likely not be as trivial as simply changing the quantity in the schema. This is a balance of allowing flexibility in the file structure vs. using a strict top-level hierarchy that others can depend on without having to check if the group (e.g,. /acquisition) actually exists.

@rly @bendichter thoughts?

@stephprince stephprince added priority: low alternative solution already working and/or relevant to only specific user(s) category: proposal proposed enhancements or new features labels Oct 22, 2024
@sneakers-the-rat
Copy link
Author

I believe the parameters indicated here only mean that a user does not need to specify any contents to be stored in the groups

gotcha, makes sense. So I suppose there is some useful ux slippage between the schema and api - formally required by the schema/in the serialized file, but optional in the api with default to create a group during serialization. Reasonable call since a group doesn't have default_value and default_value i think only takes literals (?) so there isn't really such a thing as "this is required but when one isn't provided use an empty instance of this group" in the schema.

i am pretty sure (?) this is the only place where this happens, or at least where it's more complicated to to figure out which groups are technically-required-but-all-contents-are-optional because it has the deepest nesting. So if schema and serialization are more tightly bound than schema and API, i'm happy to just mimic that behavior and add this to my special cases handler in the generator by using blank SubModels() as defaults.

if this would be a complicating rather than a simplifying change, feel free to close it - it's also nice to be "given permission" to follow the pynwb behavior here, i'm trying to keep references in my docstrings to these explanations so people can see explanations for behaviors linked to each special case. trying to be useful with some passive documentation for yall :).

general/ contains several required metadata fields,

although now that you say this, i am looking at the schema and not seeing any required params in NWBFile.general and i'm worried i'm missing something (again) about the schema language...

Screenshot 2024-10-22 at 6 36 22 PM

@oruebel
Copy link
Contributor

oruebel commented Oct 23, 2024

although now that you say this, i am looking at the schema and not seeing any required params in NWBFile.general and i'm worried i'm missing something (again) about the schema language...

I don't think you are missing anything here. I thought some fields in /general were required (at least DANDI requires some, e.g., subject and species). Even without additional requirements from DANDI, in practice /general usually always has some metadata fields populated because, e.g., to store an ElectricalSeries one must have and ElectrodesTable which lives in /general. Similarly for ophys and icephys. I.e., while the fields in /general appear to be optional, /general in practice always has at least some metadata in it because they are being referenced (and hence needed) by other objects in the file.

but optional in the api with default to create a group during serialization

Since the top-level groups do not have their own type, they are represented in the API by NWBFile, so they don't need a separate instance of an object to represent them.

where it's more complicated to to figure out which groups are technically-required-but-all-contents-are-optional

If the quantity for a group is 1, then it is required by the schema (independent of whether everything else in the group may be optional).

if this would be a complicating rather than a simplifying change, feel free to close it

I'd like to hear from others what their thoughts are on making the top-level groups optional. I'm not opposed to the change, but I'm not sure right now whether this may open a hornets nest of downstream issues. I think the main complicating factor is probably the flexibility this adds, e.g., I'm not sure if/which tools may be relying on the fact that these groups exist (even if they are empty). @bendichter, @rly what do you think?

@sneakers-the-rat
Copy link
Author

If the quantity for a group is 1, then it is required by the schema (independent of whether everything else in the group may be optional).

right right, just an API implementation detail like how pynwb does it. i think the file object is the only place where this needs special handling, so i'll just document that.

I'd be curious too - wonder how many downstream impls directly interact with the HDF5 file rather than through pynwb

@bendichter
Copy link
Contributor

I'd be in favor of with making top-level groups like aquisition and processing optional. I don't like having empty groups. It seems like it should be a simple change, but it may be trickier than it seems.

@rly
Copy link
Contributor

rly commented Nov 14, 2024

I agree with @bendichter . Having empty groups is confusing and inelegant, particularly when browsing NWBFile in an HDF5 viewer or neurosift. /stimulus/presentation gets me every time because stimulus is not empty but all of its child groups usually are. That can be addressed via the visualization tool, but it seems unnecessary. I see @oruebel 's point that some downstream tools may look for these groups outside of pynwb. I think we should make the change but in schema 2.9.0 (not the one coming up this month) so we have some more time to test it and also look at it in matlab.

@rly rly added this to the 2.9.0 milestone Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: proposal proposed enhancements or new features priority: low alternative solution already working and/or relevant to only specific user(s)
Projects
None yet
Development

No branches or pull requests

5 participants