M2M-VC-CycleGAN

CS224n/224s Class Project

There is a significant performance gap in ASR systems between Black and white speakers, which is attributed to insufficient audio data from Black speakers being available for models to train on. We aim to close this gap by using a CycleGAN-based voice converter to generate African American Vernacular English utterances from generic American English utterances as a data augmentation strategy. By using a two-step adversarial loss and a self-supervised frame-filling task, we noticeably improved the qualitative performance of our CycleGAN-based voice conversion pipeline. Even so, we could not establish CycleGAN-based voice conversion as a reliable method for data augmentation. While this project was challenging, it was especially rewarding to pursue a line of research whose ultimate goal is to ensure that marginalized voices are heard.
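To make the frame-filling task mentioned above more concrete, here is a minimal sketch of the masking step, assuming the usual MaskCycleGAN-VC setup: a contiguous run of frames in the input mel-spectrogram is hidden, and the generator receives both the masked spectrogram and the mask. The function name `apply_frame_mask`, the parameter `max_mask_len`, and the toy shapes are hypothetical and are not taken from this repository. Plain Python lists are used for clarity; real training code would use tensor operations.

```python
import random


def apply_frame_mask(mel, max_mask_len, rng):
    """Hide one contiguous run of frames in a mel-spectrogram.

    mel is a list of rows (one per mel bin), each a list of per-frame floats.
    Returns (masked_mel, mask); mask is 1.0 for visible frames, 0.0 for hidden.
    Hypothetical helper for illustration, not this repository's code.
    """
    n_frames = len(mel[0])
    # randint is inclusive on both ends, so a length of 0 (no masking) is possible.
    mask_len = rng.randint(0, min(max_mask_len, n_frames))
    start = rng.randint(0, n_frames - mask_len)
    frame_mask = [
        0.0 if start <= t < start + mask_len else 1.0 for t in range(n_frames)
    ]
    # Zero out the hidden frames in every mel bin.
    masked_mel = [[v * m for v, m in zip(row, frame_mask)] for row in mel]
    # Repeat the per-frame mask for each mel bin so it matches the input shape.
    mask = [list(frame_mask) for _ in mel]
    return masked_mel, mask


rng = random.Random(0)
# Toy input: 80 mel bins by 64 frames of random values (assumed sizes).
mel = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(80)]
masked_mel, mask = apply_frame_mask(mel, max_mask_len=25, rng=rng)
# The generator would see the masked spectrogram and the mask as two channels.
generator_input = [masked_mel, mask]
```

Because the hidden frames have no direct target in non-parallel data, the generator is only pushed to reconstruct them through the cycle-consistency and adversarial losses, which is what makes the task self-supervised.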

Train the MaskCycleGAN-VC model:

bash_scripts/aws2/vc3_convert_voc_10.sh

Train the ASR model:

bash_scripts/aws2/asr_coraal_converted.sh



hikaruhotta/M2M-VC-CycleGAN
