full data annotations #26

Open
anaelisa24 opened this issue Sep 11, 2018 · 1 comment
Labels
enhancement New feature or request question Further information is requested

Comments

@anaelisa24

Hi, I'm working on a project about sound and instrument classification using active learning. We ran some initial experiments on OpenMic with binary classification only, but would like to extend them to the multilabel setting. For this, we would need some part of the data to be fully annotated so that we can actually run tests. I would greatly appreciate your input and help.

@bmcfee bmcfee added enhancement New feature or request question Further information is requested labels Sep 21, 2018
@bmcfee
Collaborator

bmcfee commented Sep 21, 2018

The eventual goal is to get the entire dataset completely annotated, but it would be nice off the bat to have a smaller slice with complete annotations, both for model development and evaluation.

Two options come to mind:

  1. Pull out a subset of the openmic2018 data and crowdsource full annotations. This will be costly, so the set will have to be relatively small (1000 tops, I'm guessing). We'll have to work a bit to make sure the coverage is good.
  2. Pull an independent set of clips from the larger FMA pool that openmic2018 came from, using similar ranking and quantile sampling strategies (per instrument), then source complete annotations. This way, we avoid any potential contamination / long-term overfitting on openmic2018, but still get a representative sample of full annotations.

(2) is obviously more work, but I think it's doable, and better all around. What do others think?
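
For concreteness, here is a rough sketch of what per-instrument ranking plus quantile sampling for option (2) could look like. It is not the actual openmic2018 selection code: the `scores` matrix (one hypothetical model likelihood per clip and instrument), the function name `quantile_sample`, and the `n_bins` / `per_bin` / `exclude` parameters are all assumptions made for illustration.

```python
import numpy as np


def quantile_sample(scores, n_bins=4, per_bin=2, exclude=(), seed=0):
    """Pick clips spread across score quantiles, separately per instrument.

    scores:  (n_clips, n_instruments) array of hypothetical model likelihoods.
    exclude: clip indices to skip, e.g. clips already used in openmic2018.
    Returns a dict mapping instrument index -> sorted list of clip indices.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    all_clips = np.arange(scores.shape[0])
    allowed = np.setdiff1d(all_clips, np.asarray(list(exclude), dtype=int))

    chosen = {}
    for j in range(scores.shape[1]):
        # Rank the allowed clips by this instrument's score (low to high).
        order = allowed[np.argsort(scores[allowed, j], kind="stable")]
        picks = []
        # Cut the ranking into near-equal quantile bins and draw from each one.
        for bin_clips in np.array_split(order, n_bins):
            k = min(per_bin, len(bin_clips))
            if k > 0:
                picks.extend(rng.choice(bin_clips, size=k, replace=False).tolist())
        chosen[j] = sorted(picks)
    return chosen


# Toy usage: 40 clips, 3 instruments, with clips 0 and 1 held out.
toy_scores = np.random.default_rng(1).random((40, 3))
selection = quantile_sample(toy_scores, n_bins=4, per_bin=2, exclude=[0, 1])
```

Sampling from every quantile bin, rather than only the top-ranked clips, is what keeps both likely positives and likely negatives in the set for each instrument. The union of the per-instrument picks would then be the pool sent out for complete annotation.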
