CLI / Web UI Implementation #613
Replies: 2 comments
-
Please check this configuration here: it uses tortoise-tts as a local server and includes a few of the features you want; there are also example videos in the README. All models stay loaded in memory while the server is running, and you can generate and save voices. A voice is a tensor file created from the audio clips; saving voices to files spares you from regenerating them each time you need that speaker, and it also makes the voices much more consistent. Just make sure your voice performs well on a variety of test sentences before saving it: some shorter sentences and certain punctuation can cause artifacts, so keep regenerating the voice until you get a good one, then save it.
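The save-and-reuse idea above can be sketched as a small disk cache. This is a minimal sketch, not the linked configuration's actual code: `compute_latents` is a hypothetical stand-in for however the voice tensors are produced (in tortoise-tts this would be the conditioning-latent step), and plain `pickle` stands in for whatever serialization the real project uses.

```python
# Sketch: compute a voice once, save it to a file, and load it from disk
# on every later run instead of regenerating it.
import pickle
from pathlib import Path

def compute_latents(clips):
    # Hypothetical placeholder: a real implementation would run the audio
    # clips through the model's conditioning encoder (slow).
    return {"mean": sum(clips) / len(clips)}

def load_or_create_voice(name, clips, cache_dir="voices"):
    """Return the saved voice for `name`, creating and saving it on first use."""
    path = Path(cache_dir) / f"{name}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)          # cheap: reuse the saved voice
    latents = compute_latents(clips)       # expensive: done once per speaker
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(latents, f)
    return latents
```

Once a speaker's file exists, every later call skips the expensive computation entirely, which is also what makes the output consistent across runs.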
-
Try storing voices in a dictionary.
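A minimal sketch of that suggestion, assuming a `build` callable that produces a voice (the names here are illustrative, not from any project): keep each voice in an in-memory dictionary keyed by speaker name, so it is built at most once per process.

```python
# In-memory voice cache: a plain dictionary keyed by speaker name.
voices = {}

def get_voice(name, build):
    """Return the cached voice for `name`, building it only on first use."""
    if name not in voices:
        voices[name] = build()   # expensive generation happens once
    return voices[name]
```

Unlike a file on disk, this cache lives only as long as the process, so it pairs well with a long-running server that keeps the models loaded anyway.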
-
Nicely done, tortoise-tts.
I found that loading the model for inference takes a long time. If I want to clone the voices of multiple identities, or multiple segments of speech from one identity, the overhead could be huge.
I wonder if anyone could share a piece of code for a CLI or WebUI implementation, so that I only need to load the model once and can reuse the loaded model for inference: given a target identity, prompt speech segments, and text, generate the desired speech.
Thanks in advance.
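The load-once workflow asked for here can be sketched as a simple CLI loop. This is only a hedged outline of the structure, not working tortoise-tts code: `load_model` and `Model.synthesize` are hypothetical placeholders for the real model-loading and inference calls, which would take the prompt speech segments as well.

```python
# Sketch: pay the model-loading cost once at startup, then serve many
# synthesis requests with the already-loaded model.
import sys

class Model:
    def synthesize(self, identity, text):
        # Hypothetical placeholder for real inference
        # (identity + prompt speech + text -> audio).
        return f"[{identity}] {text}"

def load_model():
    # The expensive step; it runs exactly once per process.
    return Model()

def main(lines, identity="speaker0"):
    model = load_model()  # loaded once, reused for every line below
    return [model.synthesize(identity, line.strip()) for line in lines]

if __name__ == "__main__":
    for out in main(sys.stdin):
        print(out)
```

A WebUI follows the same pattern: load the model at server startup and have each request handler call the shared, already-loaded model.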