I have fine-tuned the model with the same dataset on both v2 and v3 and generated audio from each. My subjective impression is:
v3 has less electronic noise than v2, but its intonation, pacing, and emotion are worse than v2's.
Given the model architecture and training process, how can I balance these training effects?
I remember someone saying that the SynthesizerTrn trained by s2_train determines the timbre, while the t2s_model trained by s1_train determines the intonation, pacing, emotion, and other prosodic properties. So why does changing the s2 model architecture affect the latter?
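To make the two-stage split explicit, here is a minimal, purely illustrative sketch. The `StageConfig` dataclass and the `train_stage1` / `train_stage2` helpers are hypothetical stand-ins, not the actual GPT-SoVITS scripts or their configs; the point is only that the two stages are trained independently, so prosody-related issues would normally be tuned on the s1 side while timbre/noise is tuned on the s2 side.

```python
# Hypothetical sketch of the two-stage fine-tuning split described above.
# None of these names come from the GPT-SoVITS codebase; they are placeholders.
from dataclasses import dataclass


@dataclass
class StageConfig:
    epochs: int
    learning_rate: float
    batch_size: int


# Stage 1 (t2s_model, trained by s1_train): text -> semantic tokens.
# This stage governs intonation, pacing, and emotion.
s1_config = StageConfig(epochs=15, learning_rate=1e-4, batch_size=8)

# Stage 2 (SynthesizerTrn, trained by s2_train): semantic tokens -> waveform.
# This stage governs timbre and audio quality (e.g. electronic noise).
s2_config = StageConfig(epochs=8, learning_rate=2e-4, batch_size=4)


def train_stage1(cfg: StageConfig) -> None:
    """Placeholder for the stage-1 (prosody) training step."""
    print(f"stage 1 training with {cfg}")


def train_stage2(cfg: StageConfig) -> None:
    """Placeholder for the stage-2 (timbre/audio quality) training step."""
    print(f"stage 2 training with {cfg}")


if __name__ == "__main__":
    # Because the stages are trained separately, a worse intonation/pace/emotion
    # result would usually point back to stage-1 training (epochs, data, model),
    # while less electronic noise reflects the stage-2 model.
    train_stage1(s1_config)
    train_stage2(s2_config)
```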