We are trying to use sentiment text to make the generated audio have different emotions #606
Comments
Hi @xwan07017, do you mean fine-tuning with emotional data to enhance the ability, and leveraging the reference speech for control? I think both will work. If you would like to share some observations, you may drop a zip in this issue, or send it by email if that is not appropriate here. |
We input reference speech and an emotion feature (such as happy or sad), and we get new audio with that emotion and the same timbre as the reference speech. |
So have you introduced a new concatenated embedding (modifying the model structure), or are the emotion features served as new tokens? |
@SWivid Yes, we introduced a new concatenated embedding (the model structure is modified), so we collected a new dataset and trained the model from scratch. |
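The two conditioning options mentioned in this exchange can be sketched roughly as follows. This is a toy pure-Python illustration, not the code used by anyone in the thread: the emotion label set, the dimensions, and all function names are hypothetical, and random numbers stand in for learned weights.

```python
import random

EMOTIONS = ["neutral", "happy", "sad"]  # hypothetical label set


def make_table(num_rows, dim, seed):
    """Build a toy embedding table (random values stand in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def concat_emotion_embedding(text_embeds, emotion, emotion_table):
    """Option A: append the same emotion vector to every text frame.

    The feature dimension grows, so downstream layers have to change
    (the "model structure modified" case).
    """
    emo_vec = emotion_table[EMOTIONS.index(emotion)]
    return [frame + emo_vec for frame in text_embeds]


def prepend_emotion_token(token_ids, emotion, vocab_size):
    """Option B: treat the emotion as an extra token placed before the text.

    Emotion ids are placed after the regular vocabulary, so only the
    embedding table grows, not the feature dimension.
    """
    return [vocab_size + EMOTIONS.index(emotion)] + token_ids


text_dim, emo_dim, vocab_size = 8, 4, 100
text_embeds = make_table(num_rows=5, dim=text_dim, seed=0)  # 5 text frames
emotion_table = make_table(len(EMOTIONS), emo_dim, seed=1)

conditioned = concat_emotion_embedding(text_embeds, "happy", emotion_table)
tokens = prepend_emotion_token([7, 42, 13], "sad", vocab_size)
```

With option A each frame goes from 8 to 12 features, which is why a pretrained checkpoint cannot be reused unchanged; option B keeps the frame width and only extends the vocabulary.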
None of these solutions would handle the case where I want to pass emotions between delimited text (e.g. XML style), like in the example below, correct? e.g.
|
An alternative: speak in a sad voice and in a happy voice, and record each into a separate file.
|
Yes, but that is an app feature that uses multiple files rather than a model feature (it is also available in the Gradio demo in this repository). |
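As a rough sketch of what such an app-level feature could look like: split tagged text into segments and pair each segment with a reference recording in the matching style. The tag syntax, the file names, and the function names below are assumptions made for illustration; they are not the syntax from the comment above or from the repository's demo.

```python
import re

# Hypothetical mapping from style name to a reference recording.
REFERENCE_FILES = {
    "regular": "ref_regular.wav",
    "happy": "ref_happy.wav",
    "sad": "ref_sad.wav",
}

# Matches <style>text</style>; the closing tag must repeat the opening name.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def split_by_style(text, default_style="regular"):
    """Split tagged text into (style, text) segments, preserving order."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        plain = text[pos:match.start()].strip()
        if plain:  # untagged text before this tag uses the default style
            segments.append((default_style, plain))
        inner = match.group(2).strip()
        if inner:
            segments.append((match.group(1).lower(), inner))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((default_style, tail))
    return segments


def plan_generation(text):
    """Pair each segment with the reference file for its style.

    Unknown styles fall back to the regular reference.
    """
    plan = []
    for style, chunk in split_by_style(text):
        ref = REFERENCE_FILES.get(style, REFERENCE_FILES["regular"])
        plan.append((ref, chunk))
    return plan


plan = plan_generation(
    "Hello there. <happy>I passed the exam!</happy> <sad>But my friend did not.</sad>"
)
```

Each `(reference file, text)` pair would then be synthesized separately and the resulting clips joined in order, so the emotion comes from the reference recordings rather than from the model itself.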
Checks
1. Is this request related to a challenge you're experiencing? Tell us your story.
We are trying to use sentiment text to make the generated audio have different emotions
2. What is your suggested solution?
We are trying to use sentiment text to make the generated audio have different emotions
3. Additional context or comments
We tried adding emotional features during training. The dataset was an English dataset of about 330 hours, and we trained for 200k steps, but the emotional guidance was not very effective. If you are interested, we can discuss it together.
4. Can you help us with this feature?