How can I add an audio dataset for Meadow Mari language? #87

Open
fu-lab opened this issue Nov 28, 2022 · 6 comments
Labels
New Language (New Language that isn't yet supported)

Comments

@fu-lab

fu-lab commented Nov 28, 2022

I have a set of audio data (audio and transcription in Cyrillic) with a male voice in Meadow Mari: [https://cloud.mail.ru/public/VAKT/WwWiTXYTC]. What should I do so that you also support our language?

@NeonDaniel added the "New Language" label (New Language that isn't yet supported) on Nov 28, 2022
@NeonClary
Member

Hello @fu-lab, I am adding Meadow Mari to the list of languages we plan to add support for.

From my preliminary checking, that audio archive may be right for our process. The sound quality and voice are very clear, it's one person, and he is speaking in a normal way. I have just one concern, which I will check with our team. I am not certain that having only short speech samples will work. We have used longer speech samples in the past, and all of your samples that I listened to were 10 seconds or less. I didn't see a way to sort your files by size, so perhaps I missed some longer samples. I will ask our team; short samples may be fine since the total amount recorded is still good.

I read that Meadow Mari has an extra letter, a special "ҥ", and a few other rare linguistic features. Sometimes things like that make it tricky to build a language. We will still plan to do it, but I want to tell you in advance that it may be more difficult. When we start working on it, it will be important to have you or another native speaker available to listen to samples and talk to us about the language. Can you share an email I could use to contact you directly for that? You can send it to me at [email protected]

Our team has discussed what's most efficient for our resources, and we'd like to do several language requests at once. We plan to do it after finishing the project our STT/TTS team is working on right now, and before starting the next one. That shouldn't be very long. I will tell you when we get ready to start working on Meadow Mari.

@fu-lab
Author

fu-lab commented Nov 29, 2022

The long sentences are here: https://cloud.mail.ru/public/YCkw/fpBN7nbrr

Best regards,
Andrey
[email protected]

@NeonClary
Member

That's very helpful, thank you! I'll send you an email as soon as we are ready to start.

@nfaraji2002

We have trained a PyTorch Coqui TTS model for Persian. How can we convert the model to TFLite so that it can run on a Raspberry Pi 4?
Are there other ways to run it in real time without conversion?
Thanks for your support.

@NeonBohdan changed the title from "How can I add an audio dataset for my language?" to "How can I add an audio dataset for Meadow Mari language?" on Mar 2, 2023
@JohnClaw

> That's very helpful, thank you! I'll send you an email as soon as we are ready to start.

Have you started developing Meadow Mari TTS?

@NeonClary
Member

I wish I could tell you we have. We're a very small team, and have had to prioritize other projects for now. We hope to get back to adding STT & TTS soon, and to find a larger organization willing to provide us some additional GPU time for it. If you're interested in helping out with that or anything else, please send me an email at [email protected].
I do want to assure you that we haven't forgotten about you, and we did add Meadow Mari to our planned languages.

Projects
Status: Low Priority

5 participants