TTS sentence aggregation fix #528

Merged
merged 1 commit into from
Sep 30, 2024

Conversation

kwindla
Contributor

@kwindla kwindla commented Sep 30, 2024

How do we feel about changing a utility function's return type from bool to int? This should not be a breaking change for any actual code, modulo type annotations complaining, perhaps?

This PR fixes a bug in sentence aggregation: we were looking for end-of-sentence patterns anywhere in a string, but not splitting the string at the match. This worked well for GPT-4 models, which seem to always send chunks that break cleanly on sentence boundaries. But other models don't necessarily do that. So we were sending the first word, or even a fragment of the first word, of the next sentence through to the TTS service, playing havoc with prosody.
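To illustrate the idea, here is a minimal sketch, not the code from this PR. The names `find_sentence_end` and `SentenceAggregator` and the simplified regex are assumptions made up for the example. It shows a helper that returns the index just past the first sentence end (0 if there is none) and an aggregator that splits the buffer at that index instead of flushing the whole buffer.

```python
import re
from typing import List

# Hypothetical, simplified pattern (an assumption for this sketch):
# sentence-ending punctuation followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def find_sentence_end(text: str) -> int:
    """Return the index just past the first sentence end, or 0 if none."""
    match = _SENTENCE_END.search(text)
    return match.end() if match else 0


class SentenceAggregator:
    """Buffers streamed text chunks and emits only complete sentences."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> List[str]:
        self._buffer += chunk
        sentences = []
        # Split at the match instead of flushing the whole buffer, so the
        # start of the next sentence stays buffered until it is complete.
        while True:
            end = find_sentence_end(self._buffer)
            if end == 0:
                break
            sentences.append(self._buffer[:end])
            self._buffer = self._buffer[end:]
        return sentences
```

Because 0 is falsy and any positive index is truthy, callers that wrote `if find_sentence_end(text):` against a bool-returning version would behave the same with the int-returning one. Only type annotations would need updating.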

I also changed the _push_tts_frames() code so that we are not stripping whitespace from the beginning and end of each chunk we send to the TTS. That was fine when all TTS models were non-stateful invocations via HTTP. But now that we can stream to several of the TTS services, whitespace can be important for prosody.

@kwindla kwindla merged commit 5d63615 into main Sep 30, 2024
3 checks passed
@aconchillo aconchillo deleted the khk/sentence-splits branch October 23, 2024 20:55