Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR
summary: Automatic speech recognition (ASR) systems are well known to perform poorly
on dysarthric speech. Previous works have addressed this by speaking rate
modification to reduce the mismatch with typical speech. Unfortunately, these
approaches rely on transcribed speech data to estimate speaking rates and
phoneme durations, which might not be available for unseen speakers. Therefore,
we combine unsupervised rhythm and voice conversion methods based on
self-supervised speech representations to map dysarthric to typical speech. We
evaluate the outputs with a large ASR model pre-trained on healthy speech
without further fine-tuning and find that the proposed rhythm conversion
especially improves performance for speakers of the Torgo corpus with more
severe cases of dysarthria. Code and audio samples are available at https://idiap.github.io/RnV .
id: http://arxiv.org/abs/2501.10256v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.