Neo is a free, open-source (BSD-3-licensed) Python library for electrophysiology data. It is concerned only with data representation and management and does not provide any analysis or visualization.
It supports importing data from 40 different formats, including closed-source proprietary formats such as Spike2, Plexon, AlphaOmega, and Blackrock, and it can write data to many (mostly open-source) formats. In addition, Neo can write data structured according to its internal object model directly to commonly used container formats such as HDF5.
Neo can therefore be used in a variety of use cases, most importantly the unification of datasets in different formats (e.g. recorded using different proprietary software) into a common data representation. This allows the same approach to be applied to the whole dataset with minimal adjustments to the code.
Neo represents data in three kinds of objects. The first kind is data objects. These represent raw data such as analog signals, discrete events, or spikes (either as plain time points, like events, or including the waveform). These objects are implemented using the quantities package, which is based on NumPy and represents NumPy arrays with additional metadata such as the units of the data. As NumPy is the de facto standard for numerical analysis in Python, Neo data objects can be plugged directly into data analysis algorithms or plotting libraries like matplotlib, while the quantities package keeps the units consistent.
These data objects are stored within container objects, which form a small hierarchy: Blocks contain Segments, which in turn contain the data objects. A Segment contains all the data that shares the same clock (this does not imply the same sampling rate, start time, or end time for each data object). A Segment can therefore be thought of as representing a trial, run, or recording. A Block contains multiple Segments as well as groupings (see below) and does not necessarily require a common clock. While it can often be thought of as a recording session, in many cases one import simply yields one Block (e.g. a spike file will always produce one Block with Segments representing different recordings or recording segments).
Lastly, there are groupings. These link data objects from different Segments, e.g. to relate the output of the same neuron across multiple recordings (Segments), to represent regions of interest, or to link and annotate different analog signals as logical or physical channels.
Neo also supports annotations on every level, which can be used to mark features, link data, or simply store a poem next to your favorite MNG data.
The current approach implements a custom format for imported data, in which analog signals are represented as lists of floats and spikes (action potentials) are extracted as lists of ActionPotentials, a class containing the spike's analog signal as a list of floats plus some metadata. All of the current functionality is already provided by Neo. In fact, our current format can easily be created from a Neo Segment, making it a strict subset of the capabilities of the Neo format.
Besides not being worse than the current format, Neo provides a few advantages. Through its use of NumPy, Neo's representation of the data is more efficient. NumPy avoids unnecessary conversions and lets operations such as slicing work on views of the data without copying it, while the current list approach requires copying the data multiple times. In addition, for formats where this is applicable (and if implemented by the importer), Neo supports lazy loading, meaning that single objects (e.g. signal channels) can be loaded from the file without loading all the other objects, which can save a substantial amount of time and memory. Aside from the technical benefits, there is of course also the benefit that we get importers for 40 different file formats without much effort when using Neo, while the current approach has to implement all required importers manually. In the case of Spike2, for which we also have an importer, the Neo importer reads the binary Spike2 format with all the metadata contained in it. Our importer is restricted to the CSV export and only gets the names and values of the channels selected for export. Besides adding an extra export step, this loses a lot of information, even for the channels that were exported.
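The copying difference can be seen with plain NumPy, independent of Neo. The sketch below only illustrates the view-versus-copy point (it does not demonstrate Neo's lazy loading); the array size and slice bounds are arbitrary.

```python
import numpy as np

samples = np.arange(100_000, dtype=np.float64)

# Slicing a NumPy array returns a view on the same memory, not a copy.
window = samples[1000:2000]
shares = np.shares_memory(samples, window)

# Writing through the view changes the original buffer.
window[0] = -1.0
original_value = samples[1000]

# Slicing a Python list allocates a new list with copied references.
as_list = np.arange(100_000, dtype=np.float64).tolist()
list_window = as_list[1000:2000]
list_window[0] = -1.0
list_value = as_list[1000]  # unchanged, because the slice was a copy
```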
There are also a few downsides to using Neo. The most important one is the effort of writing an importer. Even though Neo is built such that not all features have to be supported by all importers (as not all formats include all the information), both the effort of writing a Neo importer and the effort of learning how to write one are much greater than with the current, very simple format. This is somewhat mitigated by the fact that Neo has an example importer that explains the API, as well as fairly good documentation, but in the end, providing more features requires more work. The second issue is that while the data types provided by Neo are the same across all supported importers, how these data types are used is (intentionally) left quite open to the implementation. An importer can decide entirely on its own what a Block or Segment represents in its data, which features to implement (e.g. whether spikes are represented by waveforms or simply as time points), and which groupings to create. While this is necessary to support such a wide variety of formats, as each tool and format might emphasize different things, it may also require some additional work when importing to unify the data loaded from different sources (e.g. by creating missing groups if an importer does not create them automatically).