how to train the model with Token/s about 23, that is hopsize=1024 #35
There are three key considerations: 1. The downsampling process should adhere to sampling-rate constraints. 2. When modifying the downsampling rate, the hop length and n_fft parameters should be adjusted accordingly. 3. If minimizing the number of tokens is your objective, I recommend using audio with a sampling rate of 16 kHz.
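The relationship between these constraints can be sketched as a small helper. The parameter names (`downsamples`, `sample_rate`, `hop_length`, `n_fft`) mirror the config keys mentioned in this thread; the actual repo config may differ, so treat this as an illustration of the arithmetic, not the project's API:

```python
from math import prod

def check_codec_config(downsamples, sample_rate, hop_length, n_fft):
    """Validate the constraints from the comment above and return tokens/s."""
    # 1. The hop length must equal the total downsampling factor,
    #    so one token is produced per hop_length input samples.
    assert prod(downsamples) == hop_length, "hop_length != product of downsamples"
    # 2. n_fft must cover at least one hop; a multiple of hop_length is typical.
    assert n_fft >= hop_length and n_fft % hop_length == 0, "n_fft inconsistent with hop_length"
    # Resulting frame (token) rate in tokens per second.
    return sample_rate / hop_length
```

For example, `check_codec_config([8, 5, 4, 4], 16000, 640, 2560)` returns `25.0` tokens per second.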
Thank you for your reply!
There are many options. You could try the following configuration and then adjust the parameters from there: downsamples=[8,5,4,4], sample_rate=16000, hop_length=640, n_fft=2560.
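As a quick sanity check, the suggested values are mutually consistent; the arithmetic below just restates the numbers from the comment above:

```python
# Values taken from the suggested configuration above.
downsamples = [8, 5, 4, 4]
sample_rate = 16000
hop_length = 640
n_fft = 2560

product = 1
for d in downsamples:
    product *= d

assert product == hop_length       # 8 * 5 * 4 * 4 = 640
assert n_fft == 4 * hop_length     # n_fft = 4x hop is a common choice
print(sample_rate / hop_length)    # → 25.0 tokens per second
```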
Thanks a lot! I will try to train using this config! |
Hi, wondering how to tune the n_fft parameter when adjusting hop_length. |
I tried to train the model with hopsize=1024 (about 23 tokens per second). I only changed upsample_rates to [8,8,4,4] and num_samples to 71680. The training is running now, but the synthesized waveform is not intelligible, not very good.
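The arithmetic behind the ~23 tokens/s figure can be checked as follows. The 24 kHz sample rate is an assumption (it is not stated in the post, but it is the value consistent with "about 23 tokens per second" at hopsize=1024):

```python
# upsample_rates and num_samples are the values from the post above.
upsample_rates = [8, 8, 4, 4]
num_samples = 71680

hop = 1
for r in upsample_rates:
    hop *= r

assert hop == 1024            # 8 * 8 * 4 * 4 = 1024
assert num_samples % hop == 0  # num_samples must be a multiple of the hop size

sample_rate = 24000            # assumed, not stated in the post
print(sample_rate / hop)       # → 23.4375 tokens per second
```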
What is a good config?