How to fix the voice across generations? #554
Comments
You can add a reference audio to pin the timbre.
@leng-yue Using reference audio pins the timbre well, but the speed and pauses still seem random, and lowering the temperature does not solve the problem.
Did you include proper punctuation in your reference text?
Yes. The reference audio is a high-quality natural voice synthesized with Microsoft Speech, and the reference text uses reasonable punctuation.
Since it's an auto-regressive model, varying speed and prosody across generations is expected behavior. Does this cause any issues on your side?
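To illustrate the point about auto-regressive sampling, here is a minimal toy sketch in plain Python. It is not fish-speech code: the four-token "vocabulary", the `BASE_LOGITS` values, and the `generate` helper are all hypothetical. It only shows that each token is drawn at random conditioned on the previous one, so two runs diverge unless the random seed is fixed, and that lowering the temperature narrows the distribution without removing the randomness entirely. Whether the web UI exposes a seed option is not stated in this thread.

```python
import math
import random

# Hypothetical logits for a 4-token toy vocabulary (not taken from any real model).
BASE_LOGITS = [2.0, 1.5, 0.5, 0.1]


def sample_token(logits, temperature, rng):
    """Draw one index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(seed, temperature=0.7, steps=20):
    """Toy auto-regressive loop: each step is conditioned on the previous token."""
    rng = random.Random(seed)  # a private RNG, so the seed fully determines the run
    prev = 0
    tokens = []
    for _ in range(steps):
        # Rotate the logits by the previous token to mimic conditioning on history.
        logits = [BASE_LOGITS[(i - prev) % len(BASE_LOGITS)] for i in range(len(BASE_LOGITS))]
        prev = sample_token(logits, temperature, rng)
        tokens.append(prev)
    return tokens


if __name__ == "__main__":
    print(generate(seed=0))  # same seed, same sequence on every call
    print(generate(seed=0))
    print(generate(seed=1))  # a different seed may give a different sequence
```

In this toy, reusing the same seed reproduces the same token sequence, which is the usual way to make a sampled generation repeatable. In a real TTS model the sampled tokens also encode duration and pauses, which is consistent with the timbre staying pinned by the reference audio while pacing still varies.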
I'm trying to use your (very cool!) TTS for a personal AI I'm developing for myself, and not having a way to control the speed of the speech is something that hurts the quality of the final result.
We are working on that :)
This issue is stale because it has been open for 30 days with no activity.
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
When I generate speech from the web UI, it picks a random voice each time. How can I keep the generated voice fixed? I can help with a PR.
2. Additional context or comments
No response
3. Can you help us with this feature?