full-length datasets #17
It would be useful to have even a single full-length recording, ideally one that includes one of the existing datasets, so the performance difference can be estimated.
Echoing some comments by @sofroniewn from the Gitter chat... Most of the datasets are ~7 minutes long (3000 frames @ 6-7 Hz), though frame rates vary across datasets. But here's the current lineup, and what we could add:
I think my vote would be to standardize all of them @ 8 Hz by downsampling if necessary, and post everything we have. Though this will increase the data sizes by quite a bit.
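For concreteness, temporal downsampling by frame averaging could be sketched like this. This is a hypothetical helper (not code from the project), assuming movies are held as `(frames, y, x)` NumPy arrays and the rate ratio is an integer:

```python
import numpy as np

def downsample(movie, rate_in, rate_out):
    """Downsample a (frames, y, x) movie by averaging blocks of frames.

    Only integer ratios are handled; e.g. 16 Hz -> 8 Hz averages
    pairs of consecutive frames. Trailing frames that don't fill a
    complete block are dropped.
    """
    factor = int(round(rate_in / rate_out))
    if factor <= 1:
        return movie
    n = (movie.shape[0] // factor) * factor  # drop the remainder
    return movie[:n].reshape(-1, factor, *movie.shape[1:]).mean(axis=1)

# example: 3000 frames at 16 Hz -> 1500 frames at 8 Hz
movie = np.zeros((3000, 64, 64))
print(downsample(movie, 16, 8).shape)  # (1500, 64, 64)
```

Averaging (rather than decimating) has the side benefit of boosting SNR per retained frame, which matters if the downsampled data are what algorithms will actually be scored on.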
I see. I guess I was thinking of series 01 and 03, which are in fact 1-2 minutes long. Sounds like a good idea to standardize @ 8 Hz and post everything. The longest dataset then will be ~10 minutes long. What is the bandwidth limitation, do you have to pay to host the data? Maybe also consider adding a single good recording, with many neurons, for ~1 hour.
Ok, updates are done! Dataset durations are now as follows:
So all are now close to 7-8 Hz, all are at least 7 min, and the longest is 17 min. We've now posted everything we have from the original providers. Will add this info to the website. My only concern about adding a ~1 hr dataset to the test data is that its size @ 8 Hz could become onerous for some people's machines / some algorithms, and submitting already requires running algorithms across 7 moderately sized datasets. We could downsample a longer one even more, say to 8000 frames @ 2.5 Hz, but then it wouldn't be consistent with the sample rate of the others. That said, we're always happy to add extra datasets of any size as training data just for people to play with; storage / bandwidth isn't really an issue.
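The trade-off above is easy to quantify. A quick back-of-the-envelope check (the 512x512, uncompressed 16-bit frame format here is an assumption about the recordings, not stated in the thread):

```python
# duration and raw size of the proposed long recording
frames, rate = 8000, 2.5                 # frames, Hz
duration_min = frames / rate / 60        # seconds -> minutes
size_gb = frames * 512 * 512 * 2 / 1e9   # assume 16-bit 512x512 frames
print(f"{duration_min:.0f} min, {size_gb:.1f} GB")  # 53 min, 4.2 GB
```

So 8000 frames @ 2.5 Hz covers just under an hour at a few GB raw; the same hour kept at 8 Hz would be ~3x as many frames and proportionally larger.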
Cool, thanks for expanding the datasets! I will be curious to see whether it improves the scores or not. For someone who comes to the website to see which algorithms might be useful, a single full-length dataset would be invaluable for assessing not just accuracy on a typical recording, but also how fast the algorithms are on realistic recordings. Shouldn't any algorithm be able to run on a 2 hr dataset @ 30 Hz? Most of our data is in that range.
The frame rate for the Harvey lab datasets is not the same for .00 and .01. The frame rates saved in the .json file for each submitted dataset should be correct (for .01 it is 3 Hz). I think having (moderate) diversity in these datasets is a feature, not a bug. The test results so far look to me like a lot of low-rank structure: much better performance on some datasets than others. It's useful to see this to understand the successes and breakdowns of the algorithms under different conditions (and truth definitions).
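Since the per-dataset frame rate lives in a .json file, an algorithm should read it rather than hard-code one. A minimal sketch, assuming a metadata file with a `"rate-hz"` field (the file name and field name here are assumptions, not the project's actual schema):

```python
import json

# write a stand-in metadata file, then read the rate back; in practice
# the file would ship with each dataset, with whatever schema it defines
sample = {"rate-hz": 3.0, "dimensions": [512, 512]}
with open("info.json", "w") as f:
    json.dump(sample, f)

with open("info.json") as f:
    rate = json.load(f)["rate-hz"]
print(rate)  # 3.0
```

Reading the rate from metadata keeps a submission correct across datasets that differ, like the two Harvey lab series mentioned above.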
Well, I think that the wide range of results on different datasets has almost entirely to do with the different ground-truth definitions, and very little to do with the algorithms. Which isn't great: this benchmark is supposed to test the algorithms, not the annotation method!
Current datasets are way too short and not at all representative of real use scenarios. Many more cells will be detected from a typical 1-2 hour recording than from the durations provided here. These don't have to be downloadable; perhaps they could be available only for running algorithms on remotely?