Training your own data set (audio and transcription) #33
Comments
Also, I trained the model on just 4 audio files from the common_voice data to give it a try. It prepared an encoder.subwords file, and the vocab size is given as 301. I trained the model for 1000 epochs. Now I want to train on the first 1000 audio files. When I ran the command for that, I ran into a problem that I believe is caused by the new words that got added. For fine-tuning, however, the model copies the old encoder.subwords from the checkpoint directory, as here. When we want to train with new data and fine-tune (restore weights from a trained checkpoint), how should we add the new vocabulary from the transcriptions of the new audio files?
Hi,
I would like to train my own dataset. Just wanted to know some guidelines on how to prepare data:
I'll prepare some .wav files with a 16 kHz sample rate and a single channel.
Also, is there a trained model I can start my training from instead of starting from scratch?
Thanks.
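For the 16 kHz, single-channel .wav format mentioned above, a quick sanity check before training could look like the sketch below. It uses only Python's standard `wave` module. The function name `check_wav` and its default values are hypothetical, chosen for illustration, and are not part of this repository.

```python
import sys
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in the WAV header (empty if none).

    `check_wav` is a hypothetical helper for illustration only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py file1.wav file2.wav ...
    for wav_path in sys.argv[1:]:
        issues = check_wav(wav_path)
        print(wav_path, "OK" if not issues else "; ".join(issues))
```

This only inspects the file header; it does not resample or convert anything, so files that fail the check would still need to be converted with a separate tool.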