CLI / Web UI Implementation #613
Replies: 2 comments
-
Please check this configuration here: it uses tortoise-tts as a local server and includes a few of the features you want; there are also example videos in the README. All models stay loaded in memory while the server is running, and you can generate and save voices. A voice is a tensor file created from the audio clips; saving voices to files spares you from regenerating them each time you need that speaker, and it also makes the voices much more consistent. Just make sure your voice performs well on a variety of test sentences before saving it: some shorter sentences and certain punctuation can cause artifacts, so keep regenerating the voice until you get a good one, then save it.
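The save-and-reuse idea above can be sketched as a small disk cache. This is a minimal sketch, not the linked configuration's actual code: `compute_latents` is a hypothetical stand-in for however the voice tensors are produced (in tortoise-tts this would be the conditioning-latent step), and plain `pickle` stands in for whatever serialization the real project uses.

```python
# Sketch: compute a voice once, save it to a file, and load it from disk
# on every later run instead of regenerating it.
import pickle
from pathlib import Path

def compute_latents(clips):
    # Hypothetical placeholder: a real implementation would run the audio
    # clips through the model's conditioning encoder (slow).
    return {"mean": sum(clips) / len(clips)}

def load_or_create_voice(name, clips, cache_dir="voices"):
    """Return the saved voice for `name`, creating and saving it on first use."""
    path = Path(cache_dir) / f"{name}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)          # cheap: reuse the saved voice
    latents = compute_latents(clips)       # expensive: done once per speaker
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(latents, f)
    return latents
```

Once a speaker's file exists, every later call skips the expensive computation entirely, which is also what makes the output consistent across runs.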
-
Try storing voices in a dictionary.
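A minimal sketch of that suggestion, assuming a `build` callable that produces a voice (the names here are illustrative, not from any project): keep each voice in an in-memory dictionary keyed by speaker name, so it is built at most once per process.

```python
# In-memory voice cache: a plain dictionary keyed by speaker name.
voices = {}

def get_voice(name, build):
    """Return the cached voice for `name`, building it only on first use."""
    if name not in voices:
        voices[name] = build()   # expensive generation happens once
    return voices[name]
```

Unlike a file on disk, this cache lives only as long as the process, so it pairs well with a long-running server that keeps the models loaded anyway.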
-
Nicely done, tortoise-tts.
I found that loading the model for inference takes a long time. If I want to clone the voices of multiple identities, or multiple segments of speech from one identity, the overhead could be huge.
I wonder if anyone could share a piece of code for a CLI or WebUI implementation, so that I only need to load the model once and can reuse the loaded model for inference: given a target identity, prompt speech segments, and text, generate the desired speech.
Thanks in advance.
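The load-once workflow asked for here can be sketched as a simple CLI loop. This is only a hedged outline of the structure, not working tortoise-tts code: `load_model` and `Model.synthesize` are hypothetical placeholders for the real model-loading and inference calls, which would take the prompt speech segments as well.

```python
# Sketch: pay the model-loading cost once at startup, then serve many
# synthesis requests with the already-loaded model.
import sys

class Model:
    def synthesize(self, identity, text):
        # Hypothetical placeholder for real inference
        # (identity + prompt speech + text -> audio).
        return f"[{identity}] {text}"

def load_model():
    # The expensive step; it runs exactly once per process.
    return Model()

def main(lines, identity="speaker0"):
    model = load_model()  # loaded once, reused for every line below
    return [model.synthesize(identity, line.strip()) for line in lines]

if __name__ == "__main__":
    for out in main(sys.stdin):
        print(out)
```

A WebUI follows the same pattern: load the model at server startup and have each request handler call the shared, already-loaded model.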