About the was_generated_by field #598

Open
ehennestad opened this issue Nov 23, 2024 · 1 comment
Labels: category: proposal (proposed enhancements or new features); priority: medium (non-critical problem and/or affecting only a small set of NWB users)

Comments

ehennestad commented Nov 23, 2024

When I saw that there was a new field, was_generated_by, I initially thought it was meant for storing information about which package was used to create an NWB file, e.g. pynwb, matnwb or NWB Guide, which I thought was great.

Only after reading the field description and this issue: #258, I realised that it is meant for storing information about software used to generate actual datasets / datatypes.

In my opinion, it would be great to have a field in the file dedicated to storing information about which software was used to generate the file (as I first interpreted it).

I also think it would make more sense to attach information about which software generated a dataset to the dataset itself (similar to how you can add more detailed metadata to a device). Having a list on the file object is a slight improvement, but the user of the file still has to work out which software applies to which dataset/datatype, which is not ideal.

stephprince (Contributor) commented

> When I saw that there was a new field, was_generated_by, I initially thought it was meant for storing information about which package was used to create an NWB file, e.g. pynwb, matnwb or NWB Guide, which I thought was great.
>
> Only after reading the field description and this issue: #258, I realised that it is meant for storing information about software used to generate actual datasets / datatypes.
>
> In my opinion, it would be great to have a field in the file dedicated to storing information about which software was used to generate the file (as I first interpreted it).

I think the current iteration of was_generated_by is intended to be a catch-all for both types of information you listed: software used to generate the NWBFile and software used to acquire or generate data (at least until we determine how we want to attach the latter to the actual data).

I think we could clarify the description in the schema and maybe add an example to make this clearer?
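
As a rough illustration of the catch-all reading, here is a minimal plain-Python sketch of what such an example could look like. It assumes each entry is a [name, version] pair of strings; the helper `validate_was_generated_by`, the tool name `hypothetical-acquisition-tool`, and the `0.0.0` versions are placeholders made up for this sketch, not part of any NWB API.

```python
from typing import List, Sequence


def validate_was_generated_by(entries: Sequence[Sequence[str]]) -> List[List[str]]:
    """Check that every entry is a [name, version] pair of non-empty strings.

    Hypothetical helper for illustration only; assumes the pair layout.
    """
    checked = []
    for entry in entries:
        is_pair = not isinstance(entry, str) and len(entry) == 2
        if not is_pair or not all(isinstance(v, str) and v for v in entry):
            raise ValueError(f"expected [name, version], got {entry!r}")
        checked.append(list(entry))
    return checked


# One list holding both kinds of software discussed in this thread.
# Versions are placeholders, not real release numbers.
was_generated_by = validate_was_generated_by([
    ["pynwb", "0.0.0"],                          # software that wrote the file
    ["hypothetical-acquisition-tool", "0.0.0"],  # software that produced the data
])
```

Note that nothing in the list itself says which entry wrote the file and which one produced data, which is the ambiguity raised above.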

> I also think it would make more sense to attach information about which software generated a dataset to the dataset itself (similar to how you can add more detailed metadata to a device). Having a list on the file object is a slight improvement, but the user of the file still has to work out which software applies to which dataset/datatype, which is not ideal.

I agree that attaching the information about which software generated a particular dataset to the dataset itself is a better way to help users understand which software generated which data.

One potential approach is to add was_generated_by as an optional dataset to the Container data type in hdmf-common-schema so that it is possible to add this optional dataset to all the NWB data types that inherit from Container. Any thoughts on that?
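
To make the inheritance idea concrete, here is a minimal sketch in plain Python dataclasses. The classes `ContainerSketch`, `TimeSeriesSketch`, and `FileSketch` are stand-ins invented for this illustration and are not the hdmf or pynwb classes; the fallback from a child to the file-level list is also just an assumption about how the two levels might interact.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SoftwareEntry = Tuple[str, str]  # (name, version); assumed layout


@dataclass
class ContainerSketch:
    """Stand-in for a base container type with an optional provenance field."""
    name: str
    was_generated_by: Optional[List[SoftwareEntry]] = None


@dataclass
class TimeSeriesSketch(ContainerSketch):
    """Any subtype inherits the optional field without redefining it."""
    unit: str = "unknown"


@dataclass
class FileSketch(ContainerSketch):
    children: List[ContainerSketch] = field(default_factory=list)

    def software_for(self, child_name: str) -> Optional[List[SoftwareEntry]]:
        """Return the child's own entries, else fall back to the file-level list."""
        for child in self.children:
            if child.name == child_name:
                if child.was_generated_by is not None:
                    return child.was_generated_by
                return self.was_generated_by
        raise KeyError(child_name)


# Placeholder names and versions, for illustration only.
nwbfile = FileSketch(name="root", was_generated_by=[("pynwb", "0.0.0")])
raw = TimeSeriesSketch(
    name="raw",
    was_generated_by=[("hypothetical-acquisition-tool", "0.0.0")],
)
derived = TimeSeriesSketch(name="derived")  # no per-object provenance set
nwbfile.children += [raw, derived]
```

With this layout, `nwbfile.software_for("raw")` returns the entries stored on the object itself, so a reader no longer has to guess which file-level entry applies to which data.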

This comment also has a more thorough summary of the provenance information and support we might want to add based on other discussions.

stephprince added the labels category: proposal and priority: medium on Nov 25, 2024