Voice Cloning: When to Fine-Tune Pretrained TTS Models and How Much Data is Needed? #102
ClaudiuFilip110 asked this question in Q&A. Answered by eginhard.
Does this mean I can take the pretrained VITS Polish male voice and fine-tune it on my own LJSpeech-format dataset, recorded with my voice, to train a new model?
In my experience, 20 hours of data is enough to train from scratch on Dutch, German, and Flemish, but 50 hours gave really good quality. For fine-tuning you need at least 2 hours, but you will hear some artifacts and compromised quality. 6 hours is good enough for practical use.
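To compare a dataset against the hour counts quoted above, it helps to measure how much audio you actually have. The following is a minimal sketch, assuming the dataset is a folder of uncompressed PCM `.wav` files. The function names are hypothetical, and the 2-hour and 6-hour thresholds are only the anecdotal fine-tuning figures from this comment, not a rule from any library.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def dataset_hours(wav_dir):
    """Sum the durations of all .wav files under wav_dir, in hours."""
    total_seconds = sum(
        wav_duration_seconds(p) for p in Path(wav_dir).rglob("*.wav")
    )
    return total_seconds / 3600


def finetuning_tier(hours):
    """Map a dataset size to the rough tiers quoted in the comment (anecdotal)."""
    if hours < 2:
        return "below the suggested minimum"
    if hours < 6:
        return "usable, expect audible artifacts"
    return "good enough for use"
```

For example, `finetuning_tier(dataset_hours("my_voice/wavs"))` would report which tier a recording folder falls into (the path here is a placeholder).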
Hi, I'm an ML engineer with a few years of experience, but I'm new to TTS and audio ML. My background is primarily in NLP and LLMs. I'm now working on a voice cloning project, and the transition into TTS has been a bit confusing.
Here's my plan. If you have any tips or suggestions, please feel free to add them here.
English Voice Cloning
Another language Voice Cloning
I'm planning to train my own model from scratch (or from a checkpoint).
P.S. Any tips are welcome, as I said, I'm quite the novice when it comes to anything audio ML-related.