What is the difference between the config for training WavTokenizer-small and WavTokenizer-large? #25

Open
handsomelys opened this issue Sep 10, 2024 · 2 comments

@handsomelys

I am curious which parts of the config need to be modified when training the large version of WavTokenizer on a larger dataset. Could you please provide a reference configuration? In addition, could you give reference loss values for judging whether the model has converged during training, including the generator and discriminator losses? I look forward to your response.

@jishengpeng
Owner

You can use the small-version configuration directly to train the large version, although the increased data volume would typically call for a corresponding increase in model parameters. However, WavTokenizer's parameter count already approaches 200M. If you have abundant computational resources, you can try increasing the parameter count; the encoder side in particular can be expanded further. We also recommend judging convergence on the validation set. In our experiments, the generator loss converges to around 38, the discriminator loss to around 10, and the total loss to around 25.
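
For anyone who wants to automate that check, here is a minimal sketch of tracking smoothed validation losses against the reference values quoted above. The loss-dictionary keys, window size, and tolerance here are illustrative assumptions for this sketch, not names from the WavTokenizer codebase.

```python
# Illustrative convergence check against the reference loss values above.
# The keys ("gen_loss", "disc_loss", "total_loss"), window size, and
# tolerance are assumptions for this sketch, not WavTokenizer config names.
from collections import deque

REFERENCE = {"gen_loss": 38.0, "disc_loss": 10.0, "total_loss": 25.0}


class ConvergenceMonitor:
    def __init__(self, window: int = 10, tolerance: float = 0.05):
        # Keep a sliding window of the most recent validation losses.
        self.history = {k: deque(maxlen=window) for k in REFERENCE}
        self.tolerance = tolerance

    def update(self, val_losses: dict) -> None:
        # Record the latest validation losses for each tracked key.
        for key, value in val_losses.items():
            if key in self.history:
                self.history[key].append(value)

    def converged(self) -> bool:
        # Converged once each smoothing window is full and the mean loss
        # sits within `tolerance` of its reference value.
        for key, ref in REFERENCE.items():
            hist = self.history[key]
            if len(hist) < hist.maxlen:
                return False
            mean = sum(hist) / len(hist)
            if abs(mean - ref) / ref > self.tolerance:
                return False
        return True


# Example: feed per-epoch validation losses from your training loop.
monitor = ConvergenceMonitor()
monitor.update({"gen_loss": 38.4, "disc_loss": 9.8, "total_loss": 25.2})
print(monitor.converged())  # False until the smoothing window fills up
```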

@handsomelys
Author

Thx!
