Librispeech960 Pretrained Model #27
How much data should I use for fine-tuning to get decent results and avoid overfitting early on? Would 50 files of 5 seconds each work for training? What's the general rule for dealing with overfitting in the case of transformers? Do we really need more data to fine-tune with, or is it a matter of hyperparameters?
hi there,
We do have an AudioSet+Librispeech pretrained checkpoint for the frame-based AST, see https://github.com/YuanGongND/ssast#pretrained-models. One conclusion of our ablation study is that this checkpoint is better than the model trained solely on Librispeech, even on speech tasks. Note that by speech tasks we do not mean ASR, but speech classification, e.g., command recognition, emotion recognition, etc.
It is hard to estimate because there are many factors (e.g., how many classes there are, how easy the sounds are to separate). You would need to try, but 50 files is a very small number. The smallest dataset we tested is ESC-50 (50 classes, 40 samples each, 2,000 samples total). -Yuan
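As a generic illustration of one common recipe against overfitting when fine-tuning on very little data (this is a minimal PyTorch sketch, not the SSAST repo's actual API or model), you can freeze the pretrained backbone and train only a small classification head. The `backbone` below is a hypothetical stand-in for a pretrained encoder:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained encoder; in practice you would
# load the real pretrained model and its checkpoint from the repo.
backbone = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
)
head = nn.Linear(256, 50)  # e.g., 50 classes, as in ESC-50

# Freeze the backbone so only the head is updated -- a common way to
# reduce overfitting when the fine-tuning set is tiny.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

x = torch.randn(8, 128)                      # dummy batch of 8 feature vectors
labels = torch.randint(0, 50, (8,))          # dummy class labels
logits = head(backbone(x))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone.parameters())
print(trainable, frozen)
```

With only the head trainable, the number of parameters being fit drops by orders of magnitude, which makes a 50-file dataset less hopeless; if that underfits, a typical next step is to unfreeze the top backbone layers with a lower learning rate.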
Hey, thanks Yuan, nice answer. I think I now understand which factors to look at when deciding on the smallest workable dataset, kudos! I appreciate your answer about the Librispeech model; I should have framed the question a little differently. Anyway, from what I understand, Frame-400 trained on both AudioSet and Librispeech should outperform the others for speech classification. Looking at the ablation study in the paper and Table 2, I can't tell whether Librispeech-only was trained with patch or frame masking. From Table 5 I can see that Librispeech-only is paired with patch. It would be nice to see the benchmarks for Librispeech-only with frames on speech tasks; I just can't find them in the paper or the GitHub readme. Apologies for the inconvenience. Best regards,
Hey, I'm curious why you don't have a frame-based Librispeech960 pretrained model. I saw you were recommending the frame-based models. Do you have a frame-based model pretrained on Librispeech?