Hi, I'm working on a project about sound and instrument classification using active learning. We ran some initial experiments using OpenMic for binary classification only, but we'd like to make it multi-label. For this we would need some part of the data to be fully annotated so we can actually run some tests. I would greatly appreciate your input and help.
The eventual goal is to get the entire dataset completely annotated, but it would be nice right off the bat to have a smaller slice with complete annotations, both for model development and evaluation.
Two options come to mind:
1. Pull out a subset of the openmic2018 data and crowdsource full annotations. This will be costly, so the set will have to be relatively small (1,000 clips tops, I'm guessing). We'll have to work a bit to make sure the coverage is good.
2. Pull an independent set of clips from the larger FMA pool that openmic2018 came from, using similar ranking and quantile sampling strategies (per instrument), then source complete annotations. This way we avoid any potential contamination / long-term overfitting on openmic2018, but still get a representative sample of full annotations. (See the sketch after this list for what the sampling could look like.)
(2) is obviously more work, but I think it's doable, and better all around. What do others think?
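
For concreteness, here's a rough sketch of what per-instrument quantile sampling could look like. It assumes a `scores` DataFrame (clip ID × instrument) of relevance scores from whatever ranking model we end up using; the function name and parameters here are hypothetical, not anything already in the repo.

```python
import numpy as np
import pandas as pd

def quantile_sample(scores: pd.DataFrame, per_instrument: int = 50,
                    n_quantiles: int = 5, seed: int = 0) -> pd.Index:
    """Sample clips per instrument, stratified by score quantile.

    `scores` is assumed to be indexed by clip ID with one column of
    relevance scores per instrument (any ranking model would do).
    """
    rng = np.random.default_rng(seed)
    selected = set()
    for inst in scores.columns:
        # Rank first so qcut never hits duplicate bin edges, then bin
        # into quantiles so the sample spans low-, mid-, and
        # high-confidence clips for this instrument.
        ranks = scores[inst].rank(method="first")
        bins = pd.qcut(ranks, n_quantiles, labels=False)
        per_bin = per_instrument // n_quantiles
        for q in range(n_quantiles):
            pool = scores.index[bins.to_numpy() == q]
            take = min(per_bin, len(pool))
            selected.update(rng.choice(pool, size=take, replace=False))
    # A clip can be drawn for several instruments; the set dedups,
    # so the final sample may be smaller than instruments * per_instrument.
    return pd.Index(sorted(selected))
```

Since clips picked for one instrument often contain others, the deduplicated total should come in well under 20 instruments × 50 clips, which keeps option (2) within the ~1,000-clip annotation budget mentioned above.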