
Advanced Data I/O via PyNWB

Key Investigators

  • Oliver Ruebel (LBNL)
  • Andrew Tritt (LBNL)
  • Ben Dichter
  • Thomas Braun
  • Jean-Christophe Fillion-Robin (Kitware)
  • Aaron D. Milstein (Stanford)

Project Description

Enhance and gather requirements for advanced data I/O features, e.g.:

  • Compression
  • Iterative data write
  • Data streaming
  • MPI parallel I/O
  • External files

Objective

  1. Create a list of requirements for the various advanced I/O features
  2. Expand existing advanced I/O features as needed to better support the requirements
  3. As appropriate, prioritize the features and define a plan for how they could be implemented

Current functionality

  1. Basic compression is currently supported via the H5DataIO class. An example of how to use H5DataIO is part of the PyNWB docs: http://pynwb.readthedocs.io/en/latest/example.html#compressing-datasets
  2. Iterative data write (and streaming) are currently supported via:
  3. External files are currently supported through "reuse" of NWBContainers and by passing in h5py.Dataset objects. Some known needs are:
    • (See progress below.) Instead of using h5py.Dataset objects as inputs to NWBContainers to then create external links, this behavior should be made explicit by wrapping the datasets using HDF5IO and then configuring things on the container. This is needed to 1) make it explicit to users whether ExternalLinks are being created, 2) enable copying vs. linking of data, and 3) facilitate error checking for mismatched attributes.
    • TODO: Add error checking to ensure that attributes on the dataset match what the user is providing.
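The idea behind iterative data write is that data arrives in chunks from a generator and each chunk is written at its own offset, so the full array never has to sit in memory. The following is a minimal sketch of that pattern in plain Python. It does not use the PyNWB API: `iter_chunks`, `write_iteratively`, and `sensor_stream` are hypothetical names, and a Python list stands in for the on-disk dataset.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_chunks(source: Iterable[float], chunk_size: int) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_index, chunk) pairs from any iterable, one chunk at a time."""
    buffer: List[float] = []
    start = 0
    for value in source:
        buffer.append(value)
        if len(buffer) == chunk_size:
            yield start, buffer
            start += len(buffer)
            buffer = []  # start a fresh list so the yielded chunk is not mutated
    if buffer:  # final, possibly shorter chunk
        yield start, buffer


def write_iteratively(target: List[float], chunks: Iterable[Tuple[int, List[float]]]) -> int:
    """Write each chunk at its offset, growing the target as needed.

    Returns the number of chunks written.
    """
    count = 0
    for start, chunk in chunks:
        end = start + len(chunk)
        if end > len(target):
            # Stand-in for resizing a dataset whose final length is unknown up front.
            target.extend([0.0] * (end - len(target)))
        target[start:end] = chunk
        count += 1
    return count


def sensor_stream(n: int) -> Iterator[float]:
    """Hypothetical data source that produces one sample at a time."""
    for i in range(n):
        yield i * 0.5


dataset: List[float] = []
n_chunks = write_iteratively(dataset, iter_chunks(sensor_stream(10), chunk_size=4))
```

Here ten samples are written as three chunks (4, 4, and 2 samples). In PyNWB the same role is played by the chunk iterator classes and the HDF5 backend rather than by these stand-ins.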

Approach and Plan

  1. Review existing functionality in PyNWB for compression, iterative write, streaming, external files, and parallel I/O
  2. Identify missing features
  3. Prioritize missing features, define a plan for implementing them, and identify implementation leads for the different features

Progress and Next Steps

  • The following pull request has been merged: NeurodataWithoutBorders/pynwb#400
    • Allow use of HDF5IO to configure creation of external links
    • Allow customization of default behavior when h5py.Dataset objects are used as input on write
    • Expand the list of supported I/O parameters on HDF5IO to allow chunking, compression, etc. options to be set explicitly
    • Some minor improvements to DataChunkIterator
  • Next steps:
    • Oliver to create an Advanced Data I/O tutorial for the hackathon at LBNL
    • Enhance HDF5IO to create a queue of all DataChunkIterators to allow customization of how the write of DataChunkIterators is handled (see the use cases described in NeurodataWithoutBorders/pynwb#310 and NeurodataWithoutBorders/pynwb#309)
    • MPI I/O
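One possible use of a queue of chunk iterators is to interleave writes, taking one chunk from each iterator in turn instead of exhausting one iterator before starting the next. The sketch below illustrates that scheduling idea in plain Python. It is an assumption about how such a queue might behave, not the HDF5IO implementation: `write_round_robin` and the sink callables are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Iterable, List, Tuple

# A job is (name, iterable of chunks, sink that writes one chunk).
Job = Tuple[str, Iterable[Any], Callable[[Any], None]]


def write_round_robin(jobs: Iterable[Job]) -> List[str]:
    """Write one chunk per iterator in turn until every iterator is exhausted.

    Returns the order in which the iterators were serviced.
    """
    queue = deque((name, iter(chunks), sink) for name, chunks, sink in jobs)
    order: List[str] = []
    while queue:
        name, chunks, sink = queue.popleft()
        try:
            chunk = next(chunks)
        except StopIteration:
            continue  # iterator finished, so it is dropped from the queue
        sink(chunk)
        order.append(name)
        queue.append((name, chunks, sink))  # requeue for its next chunk
    return order


voltage: List[int] = []
position: List[float] = []
order = write_round_robin([
    ("voltage", [[1, 2], [3, 4], [5]], voltage.extend),
    ("position", [[0.1], [0.2]], position.extend),
])
```

Swapping the scheduling policy (for example, draining one iterator first, or prioritizing by name) only requires changing how entries are taken from and returned to the queue, which is the kind of customization the item above asks for.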

Illustrations

Background and References