Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
summary: The Emotional Voice Conversion (EVC) aims to convert the discrete emotional
state from the source emotion to the target for a given speech utterance while
preserving linguistic content. In this paper, we propose regularizing emotion
intensity in the diffusion-based EVC framework to generate precise speech of
the target emotion. Traditional approaches control the intensity of an
emotional state in the utterance via emotion class probabilities or intensity
labels that often lead to inept style manipulations and degradations in
quality. On the contrary, we aim to regulate emotion intensity using
self-supervised learning-based feature representations and unsupervised
directional latent vector modeling (DVM) in the emotional embedding space
within a diffusion-based framework. These emotion embeddings can be modified
based on the given target emotion intensity and the corresponding direction
vector. Furthermore, the updated embeddings can be fused in the reverse
diffusion process to generate the speech with the desired emotion and
intensity. In summary, this paper aims to achieve high-quality emotional
intensity regularization in the diffusion-based EVC framework, which is the
first of its kind work. The effectiveness of the proposed method has been shown
across state-of-the-art (SOTA) baselines in terms of subjective and objective
evaluations for the English and Hindi languages. Demo samples are available at
https://nirmesh-sony.github.io/EmoReg/
id: http://arxiv.org/abs/2412.20359v1
judge
Write [vclab::confirmed] or [vclab::excluded] in comment.