'Voice Conversion' paper candidate 2412.20359 #672

Open
github-actions bot opened this issue Dec 31, 2024 · 0 comments

Please check whether this paper is about 'Voice Conversion' or not.

article info.

  • title: EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

  • summary: Emotional Voice Conversion (EVC) aims to convert the discrete emotional
    state of a given speech utterance from the source emotion to the target emotion
    while preserving linguistic content. In this paper, we propose regularizing emotion
    intensity in the diffusion-based EVC framework to generate precise speech of
    the target emotion. Traditional approaches control the intensity of an
    emotional state in the utterance via emotion class probabilities or intensity
    labels, which often leads to inept style manipulations and degradations in
    quality. In contrast, we aim to regulate emotion intensity using
    self-supervised learning-based feature representations and unsupervised
    directional latent vector modeling (DVM) in the emotional embedding space
    within a diffusion-based framework. These emotion embeddings can be modified
    based on the given target emotion intensity and the corresponding direction
    vector. Furthermore, the updated embeddings can be fused in the reverse
    diffusion process to generate speech with the desired emotion and
    intensity. In summary, this paper aims to achieve high-quality emotional
    intensity regularization in the diffusion-based EVC framework, which is the
    first work of its kind. The effectiveness of the proposed method is shown
    against state-of-the-art (SOTA) baselines in terms of subjective and objective
    evaluations for the English and Hindi languages. Demo samples are
    available at https://nirmesh-sony.github.io/EmoReg/.

  • id: http://arxiv.org/abs/2412.20359v1
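
For reviewers judging the candidate, the sentence in the summary about modifying embeddings "based on the given target emotion intensity and the corresponding direction vector" can be pictured with a toy sketch. This is not the paper's implementation: the abstract does not say how the direction vector is obtained or applied, so the centroid difference, the additive shift, and the names `unit_direction` and `shift_embedding` are all assumptions made purely for illustration.

```python
import math

def unit_direction(source_centroid, target_centroid):
    """Unit vector pointing from one embedding centroid to another (assumed construction)."""
    diff = [t - s for s, t in zip(source_centroid, target_centroid)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("centroids coincide; direction is undefined")
    return [d / norm for d in diff]

def shift_embedding(embedding, direction, intensity):
    """Move an embedding along a direction, scaled by a scalar intensity (assumed additive rule)."""
    return [e + intensity * d for e, d in zip(embedding, direction)]

# Toy 3-dimensional values, invented for the example.
neutral_centroid = [0.0, 0.0, 0.0]
happy_centroid = [3.0, 4.0, 0.0]
direction = unit_direction(neutral_centroid, happy_centroid)
shifted = shift_embedding([1.0, 1.0, 1.0], direction, intensity=0.5)
```

In this toy setup a larger `intensity` moves the embedding further along the direction, which is the general idea the summary describes; the actual model may differ in every detail.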

judge

Write [vclab::confirmed] or [vclab::excluded] in comment.
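
How the bot actually reads the verdict is not shown in this issue. As a minimal sketch, assuming it simply scans the comment body for one of the two tags above, the check could look like the following; `parse_verdict` and `VERDICT_PATTERN` are hypothetical names, and treating a comment that contains both tags as undecided is a design choice made here, not documented behaviour.

```python
import re

# Matches exactly the two tags named in the instruction above.
VERDICT_PATTERN = re.compile(r"\[vclab::(confirmed|excluded)\]")

def parse_verdict(comment_body):
    """Return 'confirmed' or 'excluded', or None when no tag or conflicting tags are present."""
    verdicts = set(VERDICT_PATTERN.findall(comment_body))
    if len(verdicts) == 1:
        return verdicts.pop()
    return None
```

For example, `parse_verdict("Looks relevant. [vclab::confirmed]")` returns `"confirmed"`, while a comment without either tag returns `None`.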
