
Releases: KoljaB/RealtimeTTS

v0.3.42

21 Feb 18:46
  • added CoquiVoice and OpenAIVoice classes for consistency across all engines
  • CoquiEngine synthesis process output is now routed back to the main process stdout so its log is visible
  • added comma_silence_duration and sentence_silence_duration parameters to the CoquiEngine constructor
  • added a set_model(checkpoint: str) method to CoquiEngine to allow switching the base XTTS model at runtime (see the sketch below)
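
A minimal sketch of these additions (the silence durations are assumed to be in seconds, and the checkpoint string format for set_model is an assumption based on the specific_model values used elsewhere in these notes):

from RealtimeTTS import CoquiEngine

engine = CoquiEngine(
    comma_silence_duration=0.3,      # pause inserted after commas (assumed seconds)
    sentence_silence_duration=0.6)   # pause inserted between sentences (assumed seconds)

# switch the base XTTS checkpoint at runtime
engine.set_model("2.0.2")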

v0.3.40

20 Dec 16:44
  • merged #22 and fixed #23 (thank you)
  • renamed the CoquiEngine cloning_reference_wav parameter to voice
  • added Coqui predefined speaker voices
    Submit the name or part of the name (case-insensitive) as voice:
    engine = CoquiEngine(voice="aaron")
    RealtimeTTS first looks for wav or json files, then for the predefined speaker voices

v0.3.35

07 Dec 23:37
  • support for multiple speaker cloning reference wave files in the Coqui engine: submit them as a list of filenames where you would normally pass a single filename string (see the sketch below)
  • the default voice is now male (the previous female voice remains available)
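
A minimal sketch (the filenames are hypothetical; at this version the parameter was still named cloning_reference_wav, before its rename to voice in v0.3.40):

from RealtimeTTS import CoquiEngine

# pass a list of reference recordings where a single filename string was used before
engine = CoquiEngine(cloning_reference_wav=["speaker_a.wav", "speaker_b.wav"])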

v0.3.34

07 Dec 11:07
  • bugfix: AIFF system voice synthesis on macOS raised an exception
  • bugfix: the muted parameter raised an exception when used with Elevenlabs

v0.3.32

04 Dec 20:44
  • the play methods received a muted parameter
    If True, audio playback via the local speakers is disabled (useful when synthesizing to a file or processing audio chunks). Default is False.
  • CoquiEngine received a set_speed method to change the synthesis speed at runtime (see the sketch below)
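
A minimal sketch combining both features (the speed value is illustrative; the output_wavfile parameter is the saving feature from v0.3.0):

from RealtimeTTS import TextToAudioStream, CoquiEngine

engine = CoquiEngine()
engine.set_speed(1.2)   # speed up synthesis at runtime

stream = TextToAudioStream(engine)
stream.feed("This text is synthesized to file only, without speaker playback.")
stream.play(muted=True, output_wavfile="muted_synthesis.wav")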

v0.3.31

02 Dec 13:32

v0.3.3

01 Dec 17:08

v0.3.2

30 Nov 13:12
  • fixed a bug that caused play_async() to fail after stop() was used; affected all engines (see the sketch below)
  • added tqdm to the requirements (used to show a progress bar while the Coqui model downloads)
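
A minimal sketch of the pattern that previously failed (SystemEngine is chosen here only to keep the example dependency-free):

from RealtimeTTS import TextToAudioStream, SystemEngine

stream = TextToAudioStream(SystemEngine())
stream.feed("First utterance.")
stream.play_async()
stream.stop()                     # interrupt playback

stream.feed("Second utterance.")
stream.play_async()               # raised an exception before this fix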

v0.3.0

29 Nov 17:15

More languages

For example, Chinese is now supported.

Technical background:
v0.3.0 uses a new stream2sentence version which implements the stanza tokenizer besides nltk. This allows sentence splitting for far more languages than the nltk tokenizer, which is specialized for English and supports only a few other languages. Downsides of stanza: a quite large model has to be downloaded, it consumes VRAM (~2 GB minimum), and it is not the fastest even when running on a GPU. But I think it tokenizes very well. If anybody knows a more lightweight model or library that does the sentence tokenizing well, let me know.

To use the new tokenizer, call the play or play_async methods with the new parameter tokenizer="stanza" and provide the language shortcut. Also adjust the minimum_sentence_length, minimum_first_fragment_length and context_size parameters to suit the average word length of the desired language.

For example (Chinese):

stream.play_async(
    minimum_sentence_length=2,
    minimum_first_fragment_length=2,
    tokenizer="stanza",
    language="zh",
    context_size=2)


Fallback engines

Fallback is now supported for the Azure, Coqui and System engines (Elevenlabs coming soon), enhancing reliability for real-time scenarios by switching to an alternate engine if one fails.

To use the fallback mechanism, submit a list of engines to the TextToAudioStream constructor instead of a single engine. If synthesis with the first engine in the list throws an exception or returns a result indicating an unsuccessful synthesis, the next engine in the list is tried.

For example:

from RealtimeTTS import TextToAudioStream, AzureEngine, CoquiEngine, SystemEngine

engines = [AzureEngine(azure_speech_key, azure_speech_region),
           CoquiEngine(),
           SystemEngine()]
stream = TextToAudioStream(engines)


Audio file saving feature

Usage is via the output_wavfile parameter of the play and play_async methods. This allows the real-time synthesized audio to be saved simultaneously, enabling later playback of the live synthesis.

For example:

filename = "synthesis_" + engine.engine_name
stream.play(output_wavfile=f"{filename}.wav")


v0.2.7

27 Nov 17:42
  • added a specific_model parameter which allows using XTTS model checkpoints like "2.0.2" or "2.0.1" (see the sketch below)
    • currently set to "2.0.2" as the default because 2.0.3 seems to perform worse
    • set to None if you always want to use the latest model
  • added local_models_path
    • if not specified, specific_model checkpoints are loaded into a "models" directory inside the script directory
    • if no specific_model is set, Coqui's default model directory is used (C:\Users\<user>\AppData\Local\tts under Windows)
  • added a use_deepspeed parameter in case anybody has it installed
  • added a prepare_text_for_synthesis_callback parameter in case the default handler for preparing text for synthesis fails (maybe due to a language I am not familiar with) and you want to implement your own
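
A minimal sketch of these constructor options (values are illustrative; the callback signature of str in, str out is an assumption, not confirmed by the notes):

from RealtimeTTS import CoquiEngine

def prepare_text(text: str) -> str:
    # custom cleanup before synthesis (assumed str -> str signature)
    return text.strip()

engine = CoquiEngine(
    specific_model="2.0.2",     # pin an XTTS checkpoint; None always uses the latest
    local_models_path="models", # directory for pinned checkpoints
    use_deepspeed=False,        # only set True if DeepSpeed is installed
    prepare_text_for_synthesis_callback=prepare_text)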