Use EArrays instead of Tables in the HDF5 file #11

rossant · 2013-04-10T12:50:18Z

@thesamovar @shabnamkadir @kdharris101 Currently, the new file format implemented in SpikeDetekt does not exactly match what is described here (https://github.com/klusta-team/spikedetekt/blob/master/docs/fileformat.md). These notes should be updated, but some decisions need to be made first.

A potential issue with the current format concerns the features. We will want the option to show all spikes at once in the FeatureView. With the really large datasets (hundreds of channels) that will be available soon, it won't be possible to load the whole feature matrix (Nspikes x Nfeatures) at once because it won't fit in memory. We'll want to load just an Nspikes x 2 array with the two features we're interested in. I don't think that's possible with the current file structure, where there's a table with Nspikes rows and a Features column whose datatype is an Nfeatures-long vector.
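To make the memory argument concrete, here is a back-of-the-envelope calculation. The spike count, channel count, features per channel and float32 storage are all hypothetical values picked for illustration; none of them come from the thread.

```python
# All sizes below are hypothetical, chosen only to illustrate the scaling.
n_spikes = 10_000_000        # assumed number of detected spikes
n_channels = 300             # assumed "hundreds of channels"
features_per_channel = 3     # assumed number of features per channel
bytes_per_value = 4          # assumed float32 storage

n_features = n_channels * features_per_channel

# Whole feature matrix (Nspikes x Nfeatures) versus two columns (Nspikes x 2).
full_bytes = n_spikes * n_features * bytes_per_value
two_column_bytes = n_spikes * 2 * bytes_per_value

print(f"full matrix: {full_bytes / 1e9:.2f} GB")
print(f"two columns: {two_column_bytes / 1e9:.2f} GB")
```

Under these assumptions the two-column view is smaller than the full matrix by a factor of Nfeatures / 2.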

A solution would be to have a dedicated array in the file, called features, of size Nspikes x Nfeatures. With a chunk shape (http://pytables.github.io/usersguide/optimization.html) equal to Nspikes x 1, it will be quite efficient to read just two arbitrary columns of this array.
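A minimal sketch of this layout with the current PyTables API (`open_file`, `create_earray`) might look like the following. The file name, the array sizes and the fixed chunk length of 1024 rows are assumptions made for the example. Because the array is extendable, the final Nspikes is not known when it is created, so the sketch uses a fixed-length single-column chunk rather than a literal Nspikes x 1 chunk.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical sizes, kept small so the example runs quickly.
n_spikes, n_features = 5000, 96

path = os.path.join(tempfile.mkdtemp(), "features_demo.h5")  # hypothetical file
rng = np.random.default_rng(0)
data = rng.standard_normal((n_spikes, n_features)).astype(np.float32)

with tables.open_file(path, mode="w") as f:
    features = f.create_earray(
        f.root,
        "features",
        atom=tables.Float32Atom(),
        shape=(0, n_features),   # the first dimension (spikes) is extendable
        chunkshape=(1024, 1),    # each chunk holds part of a single column
    )
    features.append(data)

with tables.open_file(path, mode="r") as f:
    features = f.root.features
    # Reading one column only touches the chunks belonging to that column.
    x = features[:, 3]
    y = features[:, 57]
    two_features = np.column_stack([x, y])   # Nspikes x 2 array
```

The trade-off is that a column-oriented chunk shape makes reading all features of a single spike slower, since that read then touches one chunk per feature.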

More generally, the advantage of using a Table with one row per spike, instead of independent arrays (features, masks...), should be spelled out somewhere. Both tables and arrays can have an extendable dimension.


kdharris101 · 2013-04-10T13:01:45Z

Hi all,

To repeat this point: Cyrille and I were talking this morning. We are not sure why the HDF5 format uses tables rather than arrays, and as Cyrille explained, it seems that arrays will allow KlustaViewa to scale much better to the very large data sets that will be coming in a few years. So we are considering switching the format to have a separate array for the features, spikes, spike times, etc. This would mean dropping tables altogether.

Dan, I think I remember a discussion in which you explained why tables were the better choice. But I can’t remember the reason. What was it?

k

thesamovar · 2013-04-10T14:43:16Z

The discussion we had before was about whether or not you can efficiently append to an array, which is necessary for the main loop of SpikeDetekt because you don't know in advance how many spikes you will have. You can definitely append to tables, no problem. At the time we thought you couldn't append to arrays, but what Cyrille writes suggests that he's found a way to do it ("Both tables and arrays can have an extendable dimension."), in which case I think arrays would be preferable to tables.

rossant · 2013-04-10T14:49:54Z

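@thesamovar @kdharris101 Indeed, arrays can be extendable in HDF5. In PyTables, one needs to use EArrays (http://pytables.github.io/usersguide/libref/homogenous_storage.html#the-earray-class), which can be extended along one dimension at most.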
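As a rough sketch of how appending could work when the number of spikes is not known in advance, the loop below grows two EArrays batch by batch. The node names, the batch sizes and the random data are placeholders for illustration and are not taken from the SpikeDetekt code.

```python
import os
import tempfile

import numpy as np
import tables

n_features = 12                      # hypothetical number of features per spike
path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")  # hypothetical file
rng = np.random.default_rng(0)

with tables.open_file(path, mode="w") as f:
    # A 0 in the shape marks the single dimension that can grow.
    features = f.create_earray(
        f.root, "features", atom=tables.Float32Atom(), shape=(0, n_features)
    )
    spike_times = f.create_earray(
        f.root, "spike_times", atom=tables.Int64Atom(), shape=(0,)
    )

    next_time = 0
    # Stand-in for the detection loop: each pass finds some number of spikes.
    for n_found in (3, 5, 2):
        batch = rng.standard_normal((n_found, n_features)).astype(np.float32)
        times = np.arange(next_time, next_time + n_found, dtype=np.int64)
        features.append(batch)       # grows along the first dimension
        spike_times.append(times)
        next_time += n_found

    final_shape = features.shape
    n_times = spike_times.shape[0]

print(final_shape, n_times)
```

Keeping features and spike times in separate arrays means the code has to keep them aligned itself, by always appending the same number of rows to each.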

If everyone agrees, I can update the code to use arrays instead of tables.

kdharris101 · 2013-04-10T14:51:05Z

I agree!

thesamovar · 2013-04-10T14:51:19Z

Me too!

shabnamkadir · 2013-04-10T16:40:34Z

I agree too.

Shabnam

ghost assigned rossant Apr 10, 2013
