Any plans to add interruption capabilities in the future? #242

RandomUser27 · 2025-01-02T17:18:07Z

I'm curious whether you have any plans to add interruption functionality (i.e. letting the user interrupt the TTS stream) to this library in the future. If not, do you have any suggestions on where users could implement their own solution in the current RealtimeTTS workflow? I'm fairly confident I can handle the VAD, and possibly even keep the chatbot from interrupting itself by using the text sent to the TTS to tell whether the chatbot or the user is speaking, but I'm not sure where to hook this into the existing RealtimeTTS stream. I assume it's not as simple as just pausing/stopping the engine when an interruption is detected? Thank you for the amazing library!

RandomUser27 (Author) · 2025-01-02T17:18:59Z

Just to add: I've mostly been using the Coqui engine, but my question applies to any of the supported engines.

KoljaB (Maintainer) · 2025-01-02T20:06:03Z

TextToAudioStream offers pause() and resume() methods, plus stop() to abort the stream completely: https://github.com/KoljaB/RealtimeTTS?tab=readme-ov-file#pause-resume--stop
For most engines these take effect immediately. The exceptions are EdgeEngine and ElevenlabsEngine: their MPEG playout buffers some audio, so they lag slightly when stopped.
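A minimal sketch of where an interruption hook could sit: a small state machine that takes one voice-activity reading at a time and calls `pause()`, `resume()` or `stop()` on the stream. The `InterruptionController` class, its `abort_after` parameter and the idea of feeding it VAD readings are hypothetical; only `pause()`, `resume()` and `stop()` come from the TextToAudioStream methods mentioned above. The controller is duck-typed, so it can be tried with a stand-in object instead of a real stream.

```python
import time
from typing import Callable, Optional, Protocol


class PausableStream(Protocol):
    # The subset of TextToAudioStream methods this sketch relies on.
    def pause(self) -> None: ...
    def resume(self) -> None: ...
    def stop(self) -> None: ...


class InterruptionController:
    """Hypothetical helper: pauses a stream while the user is talking.

    If the user keeps talking for at least `abort_after` seconds, the stream
    is stopped for good; otherwise it is resumed once they go quiet.
    """

    def __init__(self, stream: PausableStream, abort_after: float = 1.5,
                 clock: Callable[[], float] = time.monotonic):
        self.stream = stream
        self.abort_after = abort_after
        self.clock = clock
        self.state = "playing"  # one of "playing", "paused", "stopped"
        self._speech_started: Optional[float] = None

    def update(self, user_speaking: bool) -> str:
        """Feed one VAD reading and return the resulting state."""
        if self.state == "stopped":
            return self.state
        now = self.clock()
        if user_speaking:
            if self.state == "playing":
                # First sign of user speech: pause the TTS playback.
                self.stream.pause()
                self.state = "paused"
                self._speech_started = now
            elif now - self._speech_started >= self.abort_after:
                # The user kept talking: abandon the rest of the answer.
                self.stream.stop()
                self.state = "stopped"
        elif self.state == "paused":
            # Short noise or a false alarm: continue where we left off.
            self.stream.resume()
            self.state = "playing"
            self._speech_started = None
        return self.state
```

In a real application, `controller.update(...)` would be called every few tens of milliseconds with the current VAD result while the stream plays via `play_async()`. Pausing first and only stopping after sustained speech avoids throwing away the answer on a cough or a false VAD trigger.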

RandomUser27 (Author) · 2025-01-02T22:42:56Z

That worked perfectly for my use case - thank you!

One related follow-up question: is it possible to create two separate TextToAudioStream instances, as long as they don't play at the same time?
Here is an example scenario where this would help:
Stream1: "ABCDEFGHIJKLMNO..."
{interruption function pauses Stream1 and starts Stream2}
Stream2: "Did you say something?"
{user responds "no"}
{interruption function aborts Stream2 and resumes Stream1}
Stream1: "...PQRSTUVWXYZ"

KoljaB (Maintainer) · 2025-01-03T01:00:07Z

This isn't possible with a single engine, because the two streams would interfere with each other's audio queues and synthesis state.

It is possible with two engines, but for local neural engines such as Coqui, Parler or StyleTTS this doubles the VRAM usage.

Code example for Coqui and Azure:

```python
from RealtimeTTS import TextToAudioStream
import time
import logging
import os

# Function to initialize the engine based on a variable
def initialize_engine(engine_type):
    if engine_type == "azure":
        from RealtimeTTS import AzureEngine
        return AzureEngine(
            os.environ["AZURE_SPEECH_KEY"],
            os.environ["AZURE_SPEECH_REGION"]
        )
    elif engine_type == "coqui":
        from RealtimeTTS import CoquiEngine
        engine = CoquiEngine(level=logging.INFO)

        # Warm up the engine with muted playback
        warm_up_stream = TextToAudioStream(engine)
        warm_up_stream.feed("hi")
        warm_up_stream.play(muted=True)  # Play with muted output
        return engine
    else:
        raise ValueError(f"Unknown engine type: {engine_type}")

def test_stream_interruption(engine_type="azure"):
    # Set up logging
    logging.basicConfig(level=logging.INFO)

    # Initialize one engine per stream (a single engine can't be shared)
    engine1 = initialize_engine(engine_type)
    engine2 = initialize_engine(engine_type)
    stream1 = TextToAudioStream(engine1)
    stream2 = TextToAudioStream(engine2)

    # Define text generators
    def stream1_generator():
        yield "A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z"

    def stream2_generator():
        yield "Did you say something?"

    try:
        print("Starting Stream 1...")
        stream1.feed(stream1_generator())
        stream1.play_async(log_synthesized_text=True)

        # Simulate detection of interruption after a few seconds
        time.sleep(5)
        print("\nInterruption detected! Pausing Stream 1...")
        stream1.pause()

        # Start Stream 2 (interruption response); play() blocks until it finishes
        print("Starting Stream 2...")
        stream2.feed(stream2_generator())
        stream2.play(log_synthesized_text=True)

        # Simulate waiting for user response
        time.sleep(2)
        print("\nUser responded 'no'")

        # Stream 2 has already finished (play() is blocking), so resume Stream 1
        print("Resuming Stream 1...")
        stream1.resume()

        # Wait until Stream 1 has played the rest of its text
        while stream1.is_playing():
            time.sleep(0.1)

    finally:
        # Cleanup
        engine1.shutdown()
        engine2.shutdown()

if __name__ == "__main__":
    # Choose the engine type ('azure' or 'coqui')
    selected_engine = "coqui"  # Change to 'azure' to switch engines
    test_stream_interruption(engine_type=selected_engine)
```

RandomUser27 (Author) · 2025-01-03T22:30:07Z

Understood. Two engines should work for my purposes. That example is incredibly helpful, thank you again for all of the quick support!