Any plans to add interruption capabilities in the future? #242
Replies: 5 comments
-
Just to add - I've mostly been using the coqui engine, but my question could apply to any/all of the supported architectures.
-
TextToAudioStream offers pause and resume methods, plus stop to abort the stream completely: https://github.com/KoljaB/RealtimeTTS?tab=readme-ov-file#pause-resume--stop
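A minimal sketch of how stop could be wired to an interruption signal, assuming your own VAD code sets a threading.Event when the user starts talking. The helper name watch_for_interruption and the interrupted event are hypothetical and not part of RealtimeTTS; the stream argument only needs is_playing() and stop(), which TextToAudioStream provides.

```python
import threading
import time


def watch_for_interruption(stream, interrupted, poll_interval=0.05):
    """Stop `stream` once `interrupted` is set.

    `stream` is any object with is_playing() and stop(), for example a
    TextToAudioStream after play_async() (assumption: the helper itself
    is illustrative, not library API).
    `interrupted` is a threading.Event set by your own VAD code.
    Returns True if playback was stopped, False if it ended on its own.
    """
    while stream.is_playing():
        if interrupted.is_set():
            stream.stop()
            return True
        time.sleep(poll_interval)
    return False
```

Typical use would be to call stream.play_async(), then run watch_for_interruption(stream, interrupted) in the main loop or in a separate thread. Swapping stop() for pause() would keep the remaining text available for a later resume().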
-
That worked perfectly for my use case - thank you! One related follow-up question: is it possible to create two separate TextToAudioStream instances, as long as they don't play at the same time?
-
This is not possible with a single shared engine, because the two streams would interfere with each other's audio queues and synthesis states. It works with two engines, but local neural engines such as coqui, parler or styletts then need double the VRAM. Code example for coqui and azure:
from RealtimeTTS import TextToAudioStream
import time
import logging
import os


# Function to initialize the engine based on a variable
def initialize_engine(engine_type):
    if engine_type == "azure":
        from RealtimeTTS import AzureEngine
        return AzureEngine(
            os.environ["AZURE_SPEECH_KEY"],
            os.environ["AZURE_SPEECH_REGION"]
        )
    elif engine_type == "coqui":
        from RealtimeTTS import CoquiEngine
        engine = CoquiEngine(level=logging.INFO)
        # Warm up the engine with muted playback
        warm_up_stream = TextToAudioStream(engine)
        warm_up_stream.feed("hi")
        warm_up_stream.play(muted=True)  # Play with muted output
        return engine
    else:
        raise ValueError(f"Unknown engine type: {engine_type}")


def test_stream_interruption(engine_type="azure"):
    # Set up logging
    logging.basicConfig(level=logging.INFO)

    # Initialize one engine per stream, based on the specified type
    engine1 = initialize_engine(engine_type)
    engine2 = initialize_engine(engine_type)
    stream1 = TextToAudioStream(engine1)
    stream2 = TextToAudioStream(engine2)

    # Define text generators
    def stream1_generator():
        yield "A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z"

    def stream2_generator():
        yield "Did you say something?"

    try:
        print("Starting Stream 1...")
        stream1.feed(stream1_generator())
        stream1.play_async(log_synthesized_text=True)

        # Simulate detection of interruption after a few seconds
        time.sleep(5)
        print("\nInterruption detected! Pausing Stream 1...")
        stream1.pause()

        # Start Stream 2 (interruption response); play() blocks until it finishes
        print("Starting Stream 2...")
        stream2.feed(stream2_generator())
        stream2.play(log_synthesized_text=True)

        # Simulate waiting for user response
        time.sleep(2)
        print("\nUser responded 'no'")

        # Stream 2 has finished, so resume Stream 1
        print("Resuming Stream 1...")
        stream1.resume()

        # Wait for completion
        time.sleep(5)
    finally:
        # Cleanup
        engine1.shutdown()
        engine2.shutdown()


if __name__ == "__main__":
    # Choose the engine type ('azure' or 'coqui')
    selected_engine = "coqui"  # Change to 'azure' to switch engines
    test_stream_interruption(engine_type=selected_engine)
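The example above waits for Stream 1 with a fixed time.sleep(5), which can cut playback short or wait longer than needed. A possible alternative, sketched here under the assumption that polling is_playing() is acceptable for your use case, is a small helper that waits until the stream reports it is done. The name wait_until_finished and its timeout parameters are hypothetical and not part of RealtimeTTS.

```python
import time


def wait_until_finished(stream, timeout=30.0, poll_interval=0.1):
    """Block until stream.is_playing() returns False.

    `stream` is any object with is_playing(), for example a
    TextToAudioStream (assumption: this helper is illustrative only).
    Returns True if playback ended, False if `timeout` seconds passed first.
    """
    deadline = time.monotonic() + timeout
    while stream.is_playing():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

In the example above, wait_until_finished(stream1) could replace the final time.sleep(5) after stream1.resume().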
-
Understood. Two engines should work for my purposes. That example is incredibly helpful, thank you again for all of the quick support!
-
I'm curious whether you have any plans to add interruption functionality (i.e. where the user can interrupt the TTS stream) to this library in the future. If not, do you have any suggestions on where users could implement their own solution in the current RealtimeTTS workflow? I'm fairly confident I can figure out the VAD, and potentially even avoid the chatbot interrupting itself by using the input to the TTS to determine whether the chatbot or the user is speaking, but I'm not sure where to incorporate this into the existing RealtimeTTS stream. I assume it's not as simple as just pausing/stopping the engine when an interruption is detected? Thank you for the amazing library!