Hi @auspicious3000,
I went through all the files and issues but could not find any description of how to train in one-hot mode, which you suggest using when "one-shot" performance is not needed.
Has anyone successfully trained AutoVC in one-hot mode, that is, without the embeddings from the pretrained speaker encoder? If so, could you share how you did it?
I hope to get some helpful replies from you all!
All the best,
Luke Huang
Hi
I have not trained the one-hot version myself, but I have some ideas.
The only difference between the one-hot version and the speaker-encoder version is whether the speaker embedding can be trained during the AutoVC training process.
Training in one-hot mode might look like this:
Get the total number of speakers, say 40.
Set up a lookup embedding table, as in multi-speaker Tacotron 2.
For each training batch, feed the mels to the content encoder as usual. Instead of a precomputed speaker embedding, pass the speaker ID as input, look it up in the embedding table to get a trainable embedding vector, and concatenate this vector with the content vector.
During backpropagation, the gradient changes that speaker's embedding vector a little.
At any point in training, all utterances from the same speaker share the same embedding vector, just like a word embedding.
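To make the steps above concrete, here is a minimal sketch in plain Python. It is not taken from the AutoVC repository: the sizes, the function names (`embed_via_one_hot`, `concat_with_content`, `sgd_step`) and the dummy gradient are all made up for illustration. It only shows that a one-hot speaker vector times a weight table is the same as a row lookup, that the result is appended to every content frame, and that a gradient step touches only the row of the speaker in the batch. In PyTorch the table would typically be a `torch.nn.Embedding(num_speakers, emb_dim)` layer trained together with the rest of the model.

```python
import random

NUM_SPEAKERS = 40  # assumed, following the "say 40" above
EMB_DIM = 4        # toy size, chosen only for illustration

random.seed(0)
# Lookup table: one trainable row per speaker ID.
speaker_table = [[random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
                 for _ in range(NUM_SPEAKERS)]


def one_hot(speaker_id, num_speakers):
    """Return a one-hot list with a 1.0 at position speaker_id."""
    vec = [0.0] * num_speakers
    vec[speaker_id] = 1.0
    return vec


def embed_via_one_hot(speaker_id, table):
    """Multiply the one-hot vector by the table; this equals table[speaker_id]."""
    oh = one_hot(speaker_id, len(table))
    dim = len(table[0])
    return [sum(oh[s] * table[s][d] for s in range(len(table)))
            for d in range(dim)]


def concat_with_content(content_frames, speaker_vec):
    """Append the same speaker vector to every content-code frame."""
    return [frame + speaker_vec for frame in content_frames]


def sgd_step(table, speaker_id, grad, lr=0.1):
    """Plain SGD update applied only to the row of the given speaker."""
    table[speaker_id] = [w - lr * g for w, g in zip(table[speaker_id], grad)]


# Toy usage: two frames of a 2-dimensional content code for speaker 3.
content = [[0.5, -0.5], [0.25, 0.75]]
speaker_vec = embed_via_one_hot(3, speaker_table)
decoder_input = concat_with_content(content, speaker_vec)

# Stand-in for the gradient that the reconstruction loss would send back
# to the speaker embedding; a real model computes this by backpropagation.
dummy_grad = [1.0] * EMB_DIM
sgd_step(speaker_table, 3, dummy_grad)
```

After the step, row 3 of `speaker_table` has moved slightly and every other row is unchanged, so the next utterance from speaker 3 picks up the updated vector.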
In fact, the "one-hot mode" the author has in mind is probably just the simplest way to train a model on a multi-speaker problem. Its advantage over the speaker-encoder version is that its embeddings can be updated by the gradient, while the speaker encoder's embeddings cannot.