This repository provides an implementation of Contrastive Equilibrium Learning (CEL) for unsupervised speaker representation learning, as described in this paper. The code is built on the baseline framework for the VoxSRC 2020 Challenge.
The dependencies for this code are the same as those of the baseline framework:
pip install -r requirements.txt
The VoxCeleb datasets are used for these experiments. The train list should contain only the file paths, one utterance per line, as follows:
id00012/21Uxsk56VDQ/00001.wav
id00012/21Uxsk56VDQ/00002.wav
...
id09272/u7VNkYraCw0/00026.wav
id09272/u7VNkYraCw0/00027.wav
The train list for VoxCeleb2 can be downloaded from here, and the test list for VoxCeleb1 from here.
The training list can also be created by running python makelist_post.py in the directory ./list.
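A list in this format can also be generated with a short script. The sketch below is illustrative only and is not the repository's makelist_post.py; the function name and arguments are assumptions.

```python
import os

def make_train_list(wav_root, out_path):
    """Walk wav_root and write relative .wav paths, one per line,
    in the id/session/utterance format expected by the train list."""
    entries = []
    for dirpath, _, filenames in os.walk(wav_root):
        for name in filenames:
            if name.endswith(".wav"):
                full = os.path.join(dirpath, name)
                entries.append(os.path.relpath(full, wav_root))
    entries.sort()
    with open(out_path, "w") as f:
        f.write("\n".join(entries) + "\n")
    return entries
```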
Furthermore, you can download the MUSAN noise corpus. After downloading and extracting the files, you can split the audio into short segments for faster random access with the following command:
python process_musan.py /home/shmun/DB/MUSAN/
where /home/shmun/DB/MUSAN/ is our path to the MUSAN corpus.
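The splitting step can be sketched as follows. This is a minimal stdlib-only illustration, not the repository's process_musan.py; the 5 s segment length and 3 s hop are assumed defaults.

```python
import os
import wave

def split_wav(path, out_dir, seg_sec=5.0, hop_sec=3.0):
    """Split one wav file into fixed-length (possibly overlapping)
    segments for faster random access. Segment/hop lengths here are
    illustrative, not necessarily the repository's exact values."""
    os.makedirs(out_dir, exist_ok=True)
    with wave.open(path, "rb") as src:
        fs = src.getframerate()
        params = src.getparams()
        frames = src.readframes(src.getnframes())
        width = src.getsampwidth() * src.getnchannels()  # bytes per frame
    seg, hop = int(seg_sec * fs), int(hop_sec * fs)
    count = 0
    for start in range(0, max(len(frames) // width - seg, 0) + 1, hop):
        chunk = frames[start * width:(start + seg) * width]
        out_path = os.path.join(out_dir, f"{count:05d}.wav")
        with wave.open(out_path, "wb") as dst:
            dst.setparams(params)
            dst.writeframes(chunk)
        count += 1
    return count
```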
You can also follow the instructions on the following pages for downloading and preparing the training and augmentation data.
- Training: VoxCeleb1&2 datasets
- Augmentation: MUSAN corpus and RIR filters
- Uniformity (uniform)
- Prototypical (proto)
- Angular Prototypical (angleproto)
- Angular Contrastive (anglecontrast)
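As a rough sketch of two of these objectives: the uniformity loss is the log of the mean pairwise Gaussian potential over L2-normalized embeddings, and the angular prototypical loss applies a softmax cross-entropy over scaled cosine similarities between anchors and positives. The plain-Python version below is for intuition only; the repository's implementations operate on batched tensors, and the scale/bias values here stand in for learnable parameters.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def uniformity_loss(embeddings, t=2.0):
    """Uniformity loss (Wang & Isola): log of the mean pairwise Gaussian
    potential over L2-normalized embeddings; lower means more uniform."""
    xs = [_normalize(v) for v in embeddings]
    pots = []
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            sq = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            pots.append(math.exp(-t * sq))
    return math.log(sum(pots) / len(pots))

def angular_proto_loss(anchors, positives, w=10.0, b=-5.0):
    """Angular prototypical loss: scaled cosine similarity between each
    anchor and every positive, with softmax cross-entropy treating the
    matching index as the target. w and b are illustrative stand-ins
    for the learnable scale and bias."""
    A = [_normalize(v) for v in anchors]
    P = [_normalize(v) for v in positives]
    loss = 0.0
    for i, a in enumerate(A):
        logits = [w * sum(x * y for x, y in zip(a, p)) + b for p in P]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_z)
    return loss / len(A)
```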
- FastResNet34 (ResNetSE34L)
- VGGVox
- TDNN
Example of training on the VoxCeleb2 development set in an unsupervised manner:
python trainSpeakerNet.py --max_frames 180 --batch_size 200 --unif_loss uniform --sim_loss anglecontrast --augment_anchor --augment_type 3 --save_path save/unif-a-cont --train_list list/train_vox2.txt --test_list list/test_vox1.txt --train_path /home/shmun/DB/VoxCeleb/VoxCeleb2/dev/wav/ --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/ --musan_path /home/shmun/DB/MUSAN/musan_split/
where /home/shmun/DB/VoxCeleb/VoxCeleb2/dev/wav/, /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/, and /home/shmun/DB/MUSAN/musan_split/ are our paths to the VoxCeleb2 development set, the VoxCeleb1 test set, and the processed MUSAN corpus, respectively, and save/unif-a-cont is the directory where results are saved.
Example of evaluation on the original VoxCeleb1 test set:
python trainSpeakerNet.py --eval --initial_model save/unif-a-cont/model/model000000001.model --test_list list/test_vox1.txt --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/
Code for VOiCES evaluation is here.
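Evaluation reports the equal error rate (EER) over the trial list, i.e. the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch of computing EER from trial scores follows; this is not the repository's scoring code, and the function name is an assumption.

```python
def compute_eer(scores, labels):
    """Equal error rate via a threshold sweep over sorted scores.
    labels are 1 for target (same-speaker) trials, 0 for non-target.
    Returns the EER as a fraction in [0, 1]."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    misses, fa = n_tar, 0   # start by accepting nothing
    eer, gap = 1.0, float("inf")
    for _, label in pairs:
        # lower the threshold to accept this trial
        if label:
            misses -= 1     # one fewer missed target
        else:
            fa += 1         # one more false acceptance
        far, frr = fa / n_non, misses / n_tar
        if abs(far - frr) < gap:
            gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer
```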
We share the pre-trained models reported in this paper. Move the downloaded pre-trained models to the directory ./save.
Unif + A-Prot (EER: 8.01%): Download
python trainSpeakerNet.py --eval --initial_model save/pre-trained_unspv_unif-a-prot.model --test_list list/test_vox1.txt --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/
Unif + A-Cont (EER: 8.05%): Download
python trainSpeakerNet.py --eval --initial_model save/pre-trained_unspv_unif-a-cont.model --test_list list/test_vox1.txt --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/
If you make use of this repository, please consider citing:
@article{mun2020cel,
title={Unsupervised representation learning for speaker recognition via contrastive equilibrium learning},
author={Mun, Sung Hwan and Kang, Woo Hyun and Han, Min Hyun and Kim, Nam Soo},
journal={arXiv preprint arXiv:2010.11433},
year={2020}
}