
v3 VS v2 #2053

Open
foreverhell opened this issue Feb 14, 2025 · 3 comments

Comments


foreverhell commented Feb 14, 2025

I have fine-tuned the model on the same dataset with both v2 and v3 and generated audio from each. My intuitive impression is that v3 produces less electronic noise than v2, but its intonation, pacing, and emotion are worse than v2's.
So, considering the model architecture and training process, how can the training be balanced to get the best of both?

@pashanitw

What is the major difference in architecture between v3 and v2?

@foreverhell
Author

> What is the major difference in architecture between v3 and v2?

I have read the model architecture code. v2 is based on a GAN architecture, while v3 appears to be based on a diffusion architecture.
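To make the contrast concrete, here is a toy sketch of the two training objectives, not the repository's actual code: a GAN decoder is scored by an adversarial real/fake loss, while a diffusion decoder is trained to predict the noise that was mixed into a clean sample at a random noise level. All names and numbers below are illustrative assumptions.

```python
import math
import random

def diffusion_training_example(x0, alpha_bar):
    """One diffusion training example: corrupt a clean sample x0 at noise
    level alpha_bar and return (noisy input, regression target).
    A diffusion decoder would be trained to predict `noise` from `x_t`."""
    noise = random.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise

def gan_discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss on raw logits: push the
    discriminator's sigmoid output toward 1 on real audio and 0 on fakes."""
    eps = 1e-12
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    return -(math.log(sig(d_real) + eps) + math.log(1.0 - sig(d_fake) + eps))

random.seed(0)
# As alpha_bar approaches 1, the noisy input stays close to the clean sample.
x_t, target = diffusion_training_example(x0=0.5, alpha_bar=1.0)
loss = gan_discriminator_loss(d_real=3.0, d_fake=-3.0)
```

The practical upshot of the different objectives: adversarial training tends to sharpen waveform detail (less electronic noise is plausible), while the regression-style diffusion loss changes how the decoder is optimized, which may interact differently with small fine-tuning datasets.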

@foreverhell
Author

I remember someone said that the SynthesizerTrn trained by s2_train determines the timbre, while the t2s_model trained by s1_train determines the intonation, pacing, emotion, and other prosodic properties. So why does changing the s2 model architecture affect the latter?
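Under that clean separation, swapping the stage-2 decoder should not touch prosody at all, since prosody is fixed by the stage-1 token sequence. The toy sketch below illustrates the claim; the s1/t2s and s2/SynthesizerTrn names come from this thread, but the logic is a hypothetical stand-in, not the project's real code.

```python
def t2s_model(text):
    """Stage 1 (s1): text-to-semantic model. Its output tokens carry the
    prosody; here each (token, duration) pair stands in for
    intonation/pacing decisions."""
    return [(ch, 2 if ch in "aeiou" else 1) for ch in text]

def synthesizer_v2(semantic_tokens, timbre_embedding):
    """Stage 2 (s2), GAN-style decoder: renders tokens with a timbre."""
    return [f"v2:{timbre_embedding}:{tok}x{dur}" for tok, dur in semantic_tokens]

def synthesizer_v3(semantic_tokens, timbre_embedding):
    """Stage 2 (s2), diffusion-style decoder: same tokens, new renderer."""
    return [f"v3:{timbre_embedding}:{tok}x{dur}" for tok, dur in semantic_tokens]

tokens = t2s_model("hello")
frames_v2 = synthesizer_v2(tokens, timbre_embedding="spk0")
frames_v3 = synthesizer_v3(tokens, timbre_embedding="spk0")
# Swapping the stage-2 decoder changes the rendering (timbre), but the
# token/duration sequence, i.e. the prosody, is identical in both cases.
assert [f.split(":")[2] for f in frames_v2] == [f.split(":")[2] for f in frames_v3]
```

If prosody degrades in practice after changing s2, the separation above is presumably not exact: the v3 pipeline may use a different intermediate representation, or the s1 model or its training may also have changed between versions.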
