[Feature Request] Add OuteTTS! #255

Open
Meshwa428 opened this issue Jan 22, 2025 · 1 comment

I would like to request support for OuteTTS in this repository.

OuteTTS performs text-to-speech (TTS) using a large language model (Qwen2.5 0.5B) to generate audio tokens. This approach enables streaming TTS, making it efficient and responsive.

Additionally, OuteTTS includes a GGUF version with an impressively low real-time factor, making it suitable for resource-constrained environments.
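For anyone comparing numbers: real-time factor (RTF) is usually taken as synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. Here is a minimal sketch of how it could be measured. The `synthesize` callable and `sample_rate` argument are hypothetical placeholders for whatever engine is under test, not part of the OuteTTS API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical: text -> mono samples
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and relate it to the length of the returned audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds)
```

For example, 2 seconds of compute for 10 seconds of audio gives `real_time_factor(2.0, 10.0)`, i.e. an RTF of 0.2.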

Model Sizes and Performance

OuteTTS offers multiple quantization levels, catering to different needs:

  • q2: Smallest model, fastest generation (lowest real-time factor), but lower accuracy.
  • q3: Balanced option with average performance.
  • q4: Ideal for edge devices; offers the best balance of accuracy and efficiency. This size is widely recommended.
  • q5 to q8: Larger models with increasing accuracy at the cost of higher memory usage.
    • q8 achieves accuracy close to FP16 precision.
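To give a rough feel for the memory trade-off, the sketch below estimates nominal weight size from parameter count and bits per weight. It assumes the model has exactly 500 million parameters and that each quantization level stores exactly that many bits per weight. Both are simplifications: real GGUF files are larger because quantized formats also store per-block scales and keep some tensors at higher precision.

```python
def nominal_weight_size_mb(num_params: int, bits_per_weight: int) -> float:
    """Lower-bound estimate of weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


# Assumption: "0.5B" is treated as exactly 500 million parameters.
ASSUMED_PARAMS = 500_000_000

# Assumption: the level name maps directly to bits per weight; 16 stands in for FP16.
for bits in (2, 3, 4, 5, 6, 8, 16):
    size = nominal_weight_size_mb(ASSUMED_PARAMS, bits)
    print(f"{bits:>2} bits/weight: about {size:.1f} MB")
```

Under these assumptions, the q4 weights come to about 250 MB, against about 1000 MB for FP16.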

Benefits of Adding OuteTTS Support:

  • Enables real-time, low-latency TTS.
  • Scales across a range of devices, from low-power edge devices to high-performance systems.
  • Provides flexibility with different model sizes to balance accuracy and efficiency.

Thank you for considering this feature request!

@KoljaB
Owner

KoljaB commented Jan 22, 2025

Hi @Meshwa428,

Thanks for suggesting OuteTTS and giving these detailed insights.

After testing, I found that OuteTTS often generates artifacts at the start of synthesis. It also fades out too quickly at the end, so sometimes you can't really hear the last word. It lacks the expressiveness of alternatives like XTTS and StyleTTS2, which sound more emotional. The requirement to install llama.cpp for GGUF makes setup complex for users, and without it synthesis is not that fast. Models like Kokoro-82M offer better performance and quality.

So for these reasons I won’t be adding OuteTTS currently.
