YourTTS checkpoint: Dutch, French, German, Italian, Portuguese, Polish, Spanish, and English #2735
Conversation
Hey @freds0, this is awesome. Would you mind if I moved the model somewhere more convenient? It is not very reliable to keep it in Google Drive.
@erogol That sounds like a great idea! It would be great to move it to a more reliable host. Thanks for the suggestion.
To use the training speakers, speakers.pth should contain the speaker embeddings too. Or we can release it with only voice cloning.
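For context, a minimal sketch of how one might inspect such a speakers.pth file. The nested layout (clip id → {"name", "embedding"}) is an assumption based on how Coqui TTS usually stores d-vector files, and the path is a placeholder:

```python
# Minimal sketch (assumption): inspect a Coqui TTS d-vector file.
# The layout {clip_id: {"name": speaker, "embedding": [...]}} is assumed,
# and "speakers.pth" is a placeholder path.
import torch

d_vectors = torch.load("speakers.pth", map_location="cpu")
print(f"{len(d_vectors)} entries")

first_key = next(iter(d_vectors))
entry = d_vectors[first_key]
print("speaker:", entry.get("name"))
print("embedding size:", len(entry.get("embedding", [])))
```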
I can share it, but I couldn't find it in my backups. It is likely that I will need to generate it again!
@freds0 your call; if it is too much work, we can release it with voice cloning only.
Hi @erogol, all the embeddings were extracted and are available at the following Google Drive link: Or on OneDrive: Is this really what you need?
I'll give it a try next Monday. Thanks for sharing 👍
@freds0 those files are crazy big. So I'll go with only voice cloning.
I'm relatively new to this field. Is there documentation somewhere that explains how I would go about consuming these myself? What I've tried is:
I tried with the provided initial file extracted into the user directory where the other models are downloaded, and hit the following:
I assume this is because of what @erogol initially said, where the speakers.pth file doesn't contain the embeddings. With the provided JSON files, how would I go about recreating a working file containing these embeddings? I tried removing the But doing that I hit:
I figure this might be because I haven't loaded the embeddings for every language? What should I go read? Or what am I missing?
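One way to rebuild a speakers.pth from the shared JSON embedding files could look like the sketch below. It assumes each JSON file already uses the clip id → {"name", "embedding"} mapping that Coqui TTS expects for d-vector files; the file locations are placeholders:

```python
# Rough sketch (assumption): merge per-language JSON embedding files into one
# speakers.pth. Assumes each JSON maps clip ids to {"name": ..., "embedding": [...]},
# which is the usual Coqui TTS d-vector layout; paths are placeholders.
import glob
import json
import torch

merged = {}
for path in glob.glob("embeddings/*.json"):  # placeholder location of the shared files
    with open(path, "r", encoding="utf-8") as f:
        merged.update(json.load(f))

torch.save(merged, "speakers.pth")
print(f"wrote {len(merged)} embeddings to speakers.pth")
```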
@itsjamie there are two ways to run this model. The first is to use these speaker embedding files. The second is to provide a reference audio clip that is sent to the model (voice cloning). To get started, simply follow the step-by-step instructions in this notebook: https://colab.research.google.com/drive/1nZuvfW-gjoKJgm_S5_f9ydi5W1xvesCK?usp=sharing
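For reference, a minimal sketch of the voice-cloning route via the Coqui TTS Python API; the model, config, and reference-audio paths are placeholders, and the exact language ids should be checked against the released config:

```python
# Minimal sketch (assumption): voice cloning with a reference audio clip.
# Paths and the language id are placeholders; verify the language identifiers
# against the released config.
from TTS.api import TTS

tts = TTS(model_path="best_model.pth", config_path="config.json")
tts.tts_to_file(
    text="Dit is een test.",
    speaker_wav="reference_speaker.wav",  # audio clip of the target voice
    language="nl",                        # assumed language id
    file_path="output.wav",
)
```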
Hey @freds0, thanks for sharing this, cool stuff. Can you share the TensorBoard log for this model if possible? I'm trying to reproduce training on a new language using guidance from your paper and the original YourTTS. Thank you.
@acul3 Unfortunately I didn't save the logs. But to fine-tune a new language, you should mainly watch the alignment chart. When you have something close to the image below, training can be stopped.
@erogol I created a version of the embeddings file with just 10 samples per speaker (250 MB). All speakers from the CML-TTS dataset are included, as well as all the speakers from LibriTTS. Here is the download link
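A reduced file like that could in principle be produced from the full one with something like the sketch below, again assuming the clip id → {"name", "embedding"} layout; the paths and the per-speaker limit are placeholders:

```python
# Rough sketch (assumption): keep at most N clips per speaker from a full
# d-vector file. Assumes entries look like {clip_id: {"name": ..., "embedding": [...]}};
# input/output paths are placeholders.
from collections import defaultdict
import torch

MAX_PER_SPEAKER = 10
full = torch.load("speakers_full.pth", map_location="cpu")

kept, counts = {}, defaultdict(int)
for clip_id, entry in full.items():
    speaker = entry["name"]
    if counts[speaker] < MAX_PER_SPEAKER:
        kept[clip_id] = entry
        counts[speaker] += 1

torch.save(kept, "speakers_small.pth")
print(f"kept {len(kept)} of {len(full)} entries for {len(counts)} speakers")
```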
@freds0 thanks, I'll check. I'll try to finish my backlog before merging this PR.
@Edresson you should make a separate PR. I can merge it before we merge this one. (I don't know when I can find time to merge this one.)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also take a look at our discussion channels.
Correction in training the Fastspeech/Fastspeech2/FastPitch/SpeedySpeech model using external speaker embedding.
In this pull request, I have added a new checkpoint for the YourTTS model, trained on multiple languages: Dutch, French, German, Italian, Portuguese, Polish, Spanish, and English.
To provide more context, the paper is available at the following link: CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages. The model was trained using the CML-TTS dataset and the LibriTTS dataset in English. Samples generated using this checkpoint can be heard at the following link: https://freds0.github.io/CML-TTS-Dataset/