Hey @MaxMax2016, thanks for the code. I've experimented with the current implementation a bit, and honestly it doesn't really work as intended. Here are the reasons:
- The loss needs a weight (usually much smaller than 1.0) and also a Gaussian weight similar to the one used at inference (`noise_scale`).
- Because sound varies over time, the KL loss needs SoftDTW; otherwise it pushes the speech to become very uniform. The paper mentions this.
- Without SoftDTW, once the loss is applied, the automated-evaluation CER goes down, the Mel loss goes up significantly, and the Frechet score also goes up significantly. This is because the speech no longer follows the target audio.
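
To make the SoftDTW point concrete, here is a minimal pure-Python sketch of the Soft-DTW recursion over a precomputed frame-by-frame cost matrix. It is not the code from this repo or from the paper: the names `softmin`, `soft_dtw`, `gamma` and `backward_weight` are my own, the toy cost values are made up, and I'm assuming each cost entry would be a per-frame-pair KL term in the real loss.

```python
import math


def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    lo = min(values)
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))


def soft_dtw(cost, gamma=1.0):
    """Soft-DTW value for a cost matrix given as a list of rows.

    cost[i][j] is the cost of aligning frame i of one sequence with
    frame j of the other (assumed here to be a per-frame KL term).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # r[i][j] holds the soft-minimal accumulated cost of aligning
    # the first i frames with the first j frames.
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1])
            r[i][j] = cost[i - 1][j - 1] + softmin(prev, gamma)
    return r[n][m]


# Toy 2x3 cost matrix (made-up numbers) for sequences of different length.
cost = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 1.0]]

# Hypothetical weight, much smaller than 1.0 as suggested above.
backward_weight = 0.1
loss_term = backward_weight * soft_dtw(cost, gamma=0.1)
```

As `gamma` shrinks, the value approaches plain DTW, so the loss is allowed to stretch or compress frames in time instead of forcing a strict frame-by-frame match. A real training loss would need a batched, differentiable version (for example in PyTorch), which this sketch does not provide.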
A more advanced implementation of the backward loss is here: heatz123/naturalspeech#12, but it is also not straightforward to get working.
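Hello author, regarding "some Natural Speech Features Of Microsoft":
Which part of the code implements this optimization? I couldn't find it, so could you please point me to it?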