how to train the model with Token/s about 23, that is hopsize=1024 #35
There are three key considerations: 1. The downsampling process should adhere to sampling-rate constraints. 2. When modifying the downsampling rate, the hop length and n_fft parameters should be adjusted accordingly. 3. If minimizing the number of tokens is your objective, I recommend using audio with a sampling rate of 16 kHz.
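The relationship between these constraints can be sketched as a small helper. The parameter names (`downsamples`, `sample_rate`, `hop_length`, `n_fft`) mirror the config keys mentioned in this thread; the actual repo config may differ, so treat this as an illustration of the arithmetic, not the project's API:

```python
from math import prod

def check_codec_config(downsamples, sample_rate, hop_length, n_fft):
    """Validate the constraints from the comment above and return tokens/s."""
    # 1. The hop length must equal the total downsampling factor,
    #    so one token is produced per hop_length input samples.
    assert prod(downsamples) == hop_length, "hop_length != product of downsamples"
    # 2. n_fft must cover at least one hop; a multiple of hop_length is typical.
    assert n_fft >= hop_length and n_fft % hop_length == 0, "n_fft inconsistent with hop_length"
    # Resulting frame (token) rate in tokens per second.
    return sample_rate / hop_length
```

For example, `check_codec_config([8, 5, 4, 4], 16000, 640, 2560)` returns `25.0` tokens per second.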
Thank you for your reply!
There are many options. You could try the following configuration and then adjust the parameters from there: downsamples=[8,5,4,4], sample_rate=16000, hop_length=640, n_fft=2560.
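As a quick sanity check, the suggested values are mutually consistent; the arithmetic below just restates the numbers from the comment above:

```python
# Values taken from the suggested configuration above.
downsamples = [8, 5, 4, 4]
sample_rate = 16000
hop_length = 640
n_fft = 2560

product = 1
for d in downsamples:
    product *= d

assert product == hop_length       # 8 * 5 * 4 * 4 = 640
assert n_fft == 4 * hop_length     # n_fft = 4x hop is a common choice
print(sample_rate / hop_length)    # → 25.0 tokens per second
```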
Thanks a lot! I will try to train using this config! |
Hi, wondering how to tune the n_fft parameter when adjusting hop_length. |
I tried to train the model with hopsize=1024 (about 23 tokens per second). I only changed upsample_rates to [8,8,4,4] and num_samples to 71680. The training is running now, but the synthesized waveform is not intelligible, not very good.
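The arithmetic behind the ~23 tokens/s figure can be checked as follows. The 24 kHz sample rate is an assumption (it is not stated in the post, but it is the value consistent with "about 23 tokens per second" at hopsize=1024):

```python
# upsample_rates and num_samples are the values from the post above.
upsample_rates = [8, 8, 4, 4]
num_samples = 71680

hop = 1
for r in upsample_rates:
    hop *= r

assert hop == 1024            # 8 * 8 * 4 * 4 = 1024
assert num_samples % hop == 0  # num_samples must be a multiple of the hop size

sample_rate = 24000            # assumed, not stated in the post
print(sample_rate / hop)       # → 23.4375 tokens per second
```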
What is a good config?