
v3 VS v2 #2053

Open
foreverhell opened this issue Feb 14, 2025 · 3 comments

Comments


foreverhell commented Feb 14, 2025

I have fine-tuned the model on the same dataset with both v2 and v3 and generated audio from each. My intuitive impression is that v3 produces less electronic noise than v2, but its intonation, pacing, and emotion are worse than v2's.
So, considering the model architecture and training process, how can the training be balanced to get the best of both?

@pashanitw

What is the major difference in architecture between v3 and v2?

@foreverhell
Author

> What is the major difference in architecture between v3 and v2?

I have read the model architecture code. v2 is based on a GAN architecture, while v3 appears to be based on a diffusion architecture.
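To make the contrast concrete, here is a toy sketch of the two training objectives, not the repository's actual code: a GAN decoder is scored by an adversarial real/fake loss, while a diffusion decoder is trained to predict the noise that was mixed into a clean sample at a random noise level. All names and numbers below are illustrative assumptions.

```python
import math
import random

def diffusion_training_example(x0, alpha_bar):
    """One diffusion training example: corrupt a clean sample x0 at noise
    level alpha_bar and return (noisy input, regression target).
    A diffusion decoder would be trained to predict `noise` from `x_t`."""
    noise = random.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise

def gan_discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss on raw logits: push the
    discriminator's sigmoid output toward 1 on real audio and 0 on fakes."""
    eps = 1e-12
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    return -(math.log(sig(d_real) + eps) + math.log(1.0 - sig(d_fake) + eps))

random.seed(0)
# As alpha_bar approaches 1, the noisy input stays close to the clean sample.
x_t, target = diffusion_training_example(x0=0.5, alpha_bar=1.0)
loss = gan_discriminator_loss(d_real=3.0, d_fake=-3.0)
```

The practical upshot of the different objectives: adversarial training tends to sharpen waveform detail (less electronic noise is plausible), while the regression-style diffusion loss changes how the decoder is optimized, which may interact differently with small fine-tuning datasets.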

@foreverhell
Author

I remember someone said that the SynthesizerTrn trained by s2_train determines the timbre, while the t2s_model trained by s1_train determines the intonation, pacing, emotion, and other prosodic properties. So why does changing the s2 model architecture affect the latter?
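Under that clean separation, swapping the stage-2 decoder should not touch prosody at all, since prosody is fixed by the stage-1 token sequence. The toy sketch below illustrates the claim; the s1/t2s and s2/SynthesizerTrn names come from this thread, but the logic is a hypothetical stand-in, not the project's real code.

```python
def t2s_model(text):
    """Stage 1 (s1): text-to-semantic model. Its output tokens carry the
    prosody; here each (token, duration) pair stands in for
    intonation/pacing decisions."""
    return [(ch, 2 if ch in "aeiou" else 1) for ch in text]

def synthesizer_v2(semantic_tokens, timbre_embedding):
    """Stage 2 (s2), GAN-style decoder: renders tokens with a timbre."""
    return [f"v2:{timbre_embedding}:{tok}x{dur}" for tok, dur in semantic_tokens]

def synthesizer_v3(semantic_tokens, timbre_embedding):
    """Stage 2 (s2), diffusion-style decoder: same tokens, new renderer."""
    return [f"v3:{timbre_embedding}:{tok}x{dur}" for tok, dur in semantic_tokens]

tokens = t2s_model("hello")
frames_v2 = synthesizer_v2(tokens, timbre_embedding="spk0")
frames_v3 = synthesizer_v3(tokens, timbre_embedding="spk0")
# Swapping the stage-2 decoder changes the rendering (timbre), but the
# token/duration sequence, i.e. the prosody, is identical in both cases.
assert [f.split(":")[2] for f in frames_v2] == [f.split(":")[2] for f in frames_v3]
```

If prosody degrades in practice after changing s2, the separation above is presumably not exact: the v3 pipeline may use a different intermediate representation, or the s1 model or its training may also have changed between versions.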
