Do you have any plans for Speech-to-Text or Speech-to-Speech end2end models? #78
Comments
For your first idea, I think the ASR example has already done it.
I mean speech inputs with LLM outputs.
Your "text" means response, right? |
Exactly.
Are you talking about ASR for the speech-to-text task? If so, you can try our ASR example. We may support speech-to-speech in the future, but as this task is much more difficult than ASR or TTS, it is more a matter of combining the two seamlessly (see the cascaded sketch below). Thank you for your advice; we will take it into consideration. If you have any further questions or need additional assistance, feel free to ask!
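For illustration only, here is a minimal sketch of the cascaded route (ASR → LLM → TTS). It assumes `openai-whisper`, `transformers`, and `pyttsx3` are installed; the model names and file paths are placeholders, not part of this repo:

```python
# Hypothetical cascaded speech-to-speech pipeline: ASR -> LLM -> TTS.
# Assumes `pip install openai-whisper transformers pyttsx3`.
import whisper
from transformers import pipeline
import pyttsx3

def speech_to_speech(in_wav: str, out_wav: str) -> str:
    # 1) ASR: transcribe the input speech with Whisper.
    asr_model = whisper.load_model("base")
    text_in = asr_model.transcribe(in_wav)["text"]

    # 2) LLM: generate a text response (any causal LM works here).
    llm = pipeline("text-generation", model="gpt2")
    text_out = llm(text_in, max_new_tokens=64)[0]["generated_text"]

    # 3) TTS: synthesize the response text back to audio.
    engine = pyttsx3.init()
    engine.save_to_file(text_out, out_wav)
    engine.runAndWait()
    return text_out

if __name__ == "__main__":
    print(speech_to_speech("question.wav", "answer.wav"))
```

An end2end model would replace these three stages with a single speech-in/speech-out network, which is why the task is harder than ASR or TTS alone.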
I used the SLAM framework for fine-tuning, but the inference results on LibriSpeech are not as good as directly using the open-source Whisper model. Why is that?
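One way to make that comparison concrete is to decode the same LibriSpeech split with open-source Whisper and score WER. A minimal sketch, assuming `openai-whisper` and `jiwer` are installed; the audio paths and reference transcripts below are placeholders for your own test data:

```python
# Baseline check: decode with open-source Whisper and compute WER.
# Assumes `pip install openai-whisper jiwer`.
import whisper
from jiwer import wer

# (audio path, reference transcript) pairs from your LibriSpeech test split.
pairs = [
    ("path/to/test-clean/sample1.flac", "reference transcript one"),
    ("path/to/test-clean/sample2.flac", "reference transcript two"),
]

model = whisper.load_model("small")
refs, hyps = [], []
for audio_path, reference in pairs:
    result = model.transcribe(audio_path, language="en")
    refs.append(reference.lower())
    hyps.append(result["text"].lower())

print(f"Whisper-small WER: {wer(refs, hyps):.3f}")
```

Running the same scoring on the fine-tuned SLAM outputs makes it easier to see whether the gap comes from decoding settings, text normalization, or the fine-tuning itself.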
I found one that supports both S2T and S2S simultaneously: https://github.com/MooreThreads/MooER
🚀 The feature, motivation and pitch
As we all know, GPT-4o is an end2end multi-modal model that supports both speech-to-text and speech-to-speech. I have some ideas about it:
Alternatives
No response
Additional context
No response