Handle DynamicTable round 1 #5

Open
5 of 8 tasks
sneakers-the-rat opened this issue Jul 12, 2024 · 0 comments


DynamicTables are one of the major non-spec (or i guess informal-spec) parts of NWB, with most of the meat in the implementation rather than the schema.

The reason they exist is bc of course we always need additional columns in our data that don't exist in the schema, and the typical schema makes it real hard to add new values. we won't have that problem anymore tho, bc if someone wants to make a table they just like inherit from the table and add some new attributes, generate model, boom done.

Unhelpfully, there is an absolute ocean of unspecified behavior that has been crammed into DynamicTable and its related classes, so we have to spec and accommodate it.

This is the first of probably several tracking issues for handling DynamicTables and related constructs. First, let's just handle the most basic usage.

This issue is incomplete and will mostly serve as a place for notes and to track progress.

Implementation

  • DynamicTables consist of a set of VectorData and optionally VectorIndex datasets - table columns
  • VectorData are implicitly a 1-4 dimensional array of any type, and so each cell in a table can be a whole array
  • To support ragged arrays (in this case meaning that the table has equal length columns, but each of the cells may be different lengths) in hdf5 datasets, we unravel the array and store it alongside a VectorIndex, which is a vector of ints that mark the end position of each cell in the flattened data (a cell starts where the previous one ends, and the first cell starts at 0).
  • VectorIndexes are supposed to have an explicit target, but often don't, and the relationship is instead encoded by the implicit _index naming convention (eg. spike_times_index indexes spike_times)
  • at access time, the VectorIndex class is silently substituted for the VectorData class (eg. type(nwbfile.units['spike_times']) == VectorIndex) in spite of the schema
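
A minimal sketch of the ragged-array storage described above, in plain Python lists rather than hdf5 datasets. It assumes the index holds the end offset of each cell in the flattened data, which is my understanding of how hdmf does it. The function names (`ravel_ragged`, `get_cell`, `pair_index_columns`) are made up for illustration and are not part of any library.

```python
from itertools import accumulate


def ravel_ragged(cells):
    """Flatten ragged cells into (data, index).

    Assumption: index[i] is the end offset of cell i in the flat data.
    """
    data = [value for cell in cells for value in cell]
    index = list(accumulate(len(cell) for cell in cells))
    return data, index


def get_cell(data, index, i):
    """Recover cell i: it starts where the previous cell ended (or at 0)."""
    start = 0 if i == 0 else index[i - 1]
    return data[start:index[i]]


def pair_index_columns(names):
    """Pair each column with its index column using the implicit `_index` suffix."""
    return {name: f"{name}_index" for name in names if f"{name}_index" in names}


# Four rows, each cell a different length (one of them empty)
spike_times = [[0.1, 0.2, 0.3], [1.5], [], [2.0, 2.1]]
data, index = ravel_ragged(spike_times)

# Round trip: every cell comes back out unchanged
recovered = [get_cell(data, index, i) for i in range(len(index))]
assert recovered == spike_times
```

Here the column has four rows even though the flattened data has six values, which is the sense in which the table keeps equal length columns while the cells stay ragged.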

Approach

  • nwb_language.py classes
    • Mixin for DynamicTable that emulates the VectorIndex behavior by wrapping all model fields with a __getitem__ method, allows extra fields.
    • model validator that ensures equal length columns
  • nwb -> linkml translation
    • drop VectorIndexes and just make a field for the VectorData corresponding to its dtype and dims/shape spec (are there cases we can't do this?)
  • pydantic model generation
    • insert custom language adapters into dynamictable model
    • ensure sufficient metadata is present to be able to invert models to schema
  • ....?
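
A rough sketch of what the mixin idea could look like, written in plain Python rather than pydantic so it stays self-contained. `DynamicTableMixin` and `Units` here are hypothetical stand-ins, not the generated models, and in the real thing the equal-length check would presumably live in a pydantic model validator. Ragged columns are stored as a list of per-row cells, ie. with the VectorIndex already dropped.

```python
class DynamicTableMixin:
    """Hypothetical sketch: every keyword argument is treated as a column."""

    def __init__(self, **columns):
        # Stand-in for the model validator: all columns must be equal length
        lengths = {name: len(col) for name, col in columns.items()}
        if len(set(lengths.values())) > 1:
            raise ValueError(f"columns must be equal length, got {lengths}")
        # Extra (non-schema) columns are allowed, so just keep whatever we got
        self._columns = dict(columns)

    def __len__(self):
        # Number of rows; an empty table has zero rows
        return len(next(iter(self._columns.values()), []))

    def __getitem__(self, key):
        if isinstance(key, str):
            # Column access: table["spike_times"]
            return self._columns[key]
        if isinstance(key, int):
            # Row access: one cell from every column
            return {name: col[key] for name, col in self._columns.items()}
        raise TypeError(f"unsupported key type: {type(key).__name__}")


class Units(DynamicTableMixin):
    """Inherit from the table; new columns are just new attributes."""


units = Units(id=[0, 1, 2], spike_times=[[0.1, 0.2], [1.5], []])
first_row = units[0]  # {'id': 0, 'spike_times': [0.1, 0.2]}
second_cell = units["spike_times"][1]  # [1.5]
```

Each cell of `spike_times` is a whole list, so callers never see a separate index column. Whether every VectorData/VectorIndex pair can be collapsed like this is still the open question in the translation step.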

References
