Hi, I'm working on a project about sound and instrument classification using active learning. We ran some initial experiments using OpenMic for binary classification only, but we'd like to make it multi-label. For this we would need some part of the data to be fully annotated so we can actually run some tests. I would greatly appreciate your input and help.
The eventual goal is to get the entire dataset completely annotated, but it would be nice right off the bat to have a smaller slice with complete annotations, both for model development and evaluation.
Two options come to mind:
1. Pull out a subset of the openmic2018 data and crowdsource full annotations. This will be costly, so the set will have to be relatively small (1,000 clips tops, I'm guessing). We'll have to work a bit to make sure the coverage is good.
2. Pull an independent set of clips from the larger FMA pool that openmic2018 came from, using similar ranking and quantile sampling strategies (per instrument), then source complete annotations. This way we avoid any potential contamination / long-term overfitting on openmic2018, but still get a representative sample of full annotations. (See the sketch after this list for what the sampling could look like.)
(2) is obviously more work, but I think it's doable, and better all around. What do others think?
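
For concreteness, here's a rough sketch of what per-instrument quantile sampling could look like. It assumes a `scores` DataFrame (clip ID × instrument) of relevance scores from whatever ranking model we end up using; the function name and parameters here are hypothetical, not anything already in the repo.

```python
import numpy as np
import pandas as pd

def quantile_sample(scores: pd.DataFrame, per_instrument: int = 50,
                    n_quantiles: int = 5, seed: int = 0) -> pd.Index:
    """Sample clips per instrument, stratified by score quantile.

    `scores` is assumed to be indexed by clip ID with one column of
    relevance scores per instrument (any ranking model would do).
    """
    rng = np.random.default_rng(seed)
    selected = set()
    for inst in scores.columns:
        # Rank first so qcut never hits duplicate bin edges, then bin
        # into quantiles so the sample spans low-, mid-, and
        # high-confidence clips for this instrument.
        ranks = scores[inst].rank(method="first")
        bins = pd.qcut(ranks, n_quantiles, labels=False)
        per_bin = per_instrument // n_quantiles
        for q in range(n_quantiles):
            pool = scores.index[bins.to_numpy() == q]
            take = min(per_bin, len(pool))
            selected.update(rng.choice(pool, size=take, replace=False))
    # A clip can be drawn for several instruments; the set dedups,
    # so the final sample may be smaller than instruments * per_instrument.
    return pd.Index(sorted(selected))
```

Since clips picked for one instrument often contain others, the deduplicated total should come in well under 20 instruments × 50 clips, which keeps option (2) within the ~1,000-clip annotation budget mentioned above.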