About the was_generated_by field #598

ehennestad · 2024-11-23T09:05:32Z

When I saw that there was a new field, was_generated_by, I initially thought it was meant for storing information about which package was used to create an NWB file, e.g. pynwb, matnwb or NWB Guide, which I thought was great.

Only after reading the field description and this issue: #258, I realised that it is meant for storing information about software used to generate actual datasets / datatypes.

In my opinion, it would be great to have a field in the file dedicated to storing information about which software was used to generate the file (as I first interpreted it).

I also think it would make more sense to attach information about which software generated a dataset to the dataset itself (similar to how you can add more detailed metadata to a device). Having a list on the file object itself is a slight improvement, but the user of the file still has to work out which software applies to which dataset/datatype, which is not ideal.

stephprince · 2024-11-25T17:54:24Z

> When I saw that there was a new field, was_generated_by, I initially thought it was meant for storing information about which package was used to create an NWB file, e.g. pynwb, matnwb or NWB Guide, which I thought was great.
>
> Only after reading the field description and this issue: #258, I realised that it is meant for storing information about software used to generate actual datasets / datatypes.
>
> In my opinion, it would be great to have a field in the file dedicated to storing information about which software was used to generate the file (as I first interpreted it).

I think the current iteration of was_generated_by is intended to be a catch-all for both types of information that you listed: software used to generate the NWBFile, and software used to acquire/generate data (at least until we determine how we want to attach the latter to the actual data).

I think we could clarify the description in the schema and maybe add an example to make this clearer?
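As a rough illustration of the catch-all usage described above, here is a plain-Python sketch (not the PyNWB API). It assumes the field holds a list of (software name, version) pairs; the second package name and the "x.y.z" versions are placeholders made up for the example.

```python
# Hypothetical contents of a file-level was_generated_by list.
# Assumption: each entry is a (software name, version) pair.
# "example-spike-sorter" and the "x.y.z" versions are placeholders.
was_generated_by = [
    ("pynwb", "x.y.z"),                 # package that wrote the NWB file
    ("example-spike-sorter", "x.y.z"),  # software that produced some of the data
]


def format_entries(entries):
    """Render each (name, version) pair as a single 'name version' string."""
    return [f"{name} {version}" for name, version in entries]


formatted = format_entries(was_generated_by)
```

Nothing in the list itself says which entry wrote the file and which produced data, and that is the ambiguity a clearer description or example in the schema would need to address.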

> I also think it would make more sense to attach information about which software generated a dataset to the dataset itself (similar to how you can add more detailed metadata to a device). Having a list on the file object itself is a slight improvement, but the user of the file still has to work out which software applies to which dataset/datatype, which is not ideal.

I agree that attaching the information about which software generated a particular dataset to the dataset itself is a better way to help users understand which software was used to generate which data.

One potential approach is to add was_generated_by as an optional dataset to the Container data type in hdmf-common-schema so that it is possible to add this optional dataset to all the NWB data types that inherit from Container. Any thoughts on that?
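To make the idea concrete, here is a minimal plain-Python sketch of an optional field on a base type that subclasses inherit. The classes are hypothetical stand-ins, not the hdmf or PyNWB classes, and the (name, version) pair layout and the software names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Container:
    """Stand-in for a base container type (NOT the hdmf class)."""
    name: str
    # Optional list of (software name, version) pairs; None when not recorded.
    was_generated_by: Optional[List[Tuple[str, str]]] = None


@dataclass
class ExampleSeries(Container):
    """Stand-in for any data type that inherits from the base container."""
    unit: str = "unknown"


def software_for(container: Container) -> List[Tuple[str, str]]:
    """Return the software recorded on this container, or an empty list."""
    return list(container.was_generated_by or [])


# Placeholder software names and versions, for illustration only.
raw = ExampleSeries(
    name="raw_traces",
    was_generated_by=[("example-acquisition-tool", "x.y.z")],
)
sorted_units = ExampleSeries(
    name="spike_times",
    was_generated_by=[("example-spike-sorter", "x.y.z")],
)
legacy = ExampleSeries(name="old_data")  # field left unset, still valid
```

Because the field is optional, existing objects that do not set it stay valid, and a reader can ask each object directly which software produced it instead of matching entries in a file-level list.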

This comment also has a more thorough summary of the provenance information and support we might want to add based on other discussions.

stephprince added the labels "category: proposal" (proposed enhancements or new features) and "priority: medium" (non-critical problem and/or affecting only a small set of NWB users) on Nov 25, 2024
