
Contrastive equilibrium learning

Unsupervised learning framework

This repository provides an implementation of Contrastive Equilibrium Learning (CEL) for unsupervised speaker representation learning, as described in this paper. The code builds on the baseline framework for the VoxSRC 2020 Challenge.

Requirements

The dependencies for this code are the same as those of the baseline framework:

pip install -r requirements.txt

Dataset for training and augmentation

The VoxCeleb datasets are used for these experiments. The train list should contain only file paths, one utterance per line, as follows:

id00012/21Uxsk56VDQ/00001.wav
id00012/21Uxsk56VDQ/00002.wav
...
id09272/u7VNkYraCw0/00026.wav
id09272/u7VNkYraCw0/00027.wav

The train list for VoxCeleb2 can be downloaded from here and the test list for VoxCeleb1 from here. The train list can also be created by running python makelist_post.py in the ./list directory.
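The core of generating such a list is a directory walk that emits relative paths of .wav files. A minimal sketch of the idea (a hypothetical helper, not the repository's makelist_post.py):

```python
import os

def make_train_list(wav_root, out_path):
    """Walk wav_root and write one relative .wav path per line,
    e.g. id00012/21Uxsk56VDQ/00001.wav."""
    with open(out_path, "w") as f:
        for dirpath, _, filenames in sorted(os.walk(wav_root)):
            for name in sorted(filenames):
                if name.endswith(".wav"):
                    rel = os.path.relpath(os.path.join(dirpath, name), wav_root)
                    f.write(rel + "\n")
```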

Furthermore, you can download the MUSAN noise corpus. After downloading and extracting the files, you can split the audio files into short segments for faster random access with the following command:

python process_musan.py /home/shmun/DB/MUSAN/

where /home/shmun/DB/MUSAN/ is our path to the MUSAN corpus.
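The splitting itself amounts to reading each file and writing fixed-length chunks. A minimal sketch of this step using only the standard-library wave module (illustrative only; the segment length here is an assumption, not necessarily what process_musan.py uses):

```python
import os
import wave

def split_wav(path, out_dir, seg_seconds=5):
    """Split one wav file into fixed-length segments for faster random access."""
    os.makedirs(out_dir, exist_ok=True)
    with wave.open(path, "rb") as src:
        params = src.getparams()
        seg_frames = seg_seconds * src.getframerate()
        idx = 0
        while True:
            frames = src.readframes(seg_frames)
            if not frames:
                break
            seg_path = os.path.join(out_dir, "seg_%04d.wav" % idx)
            with wave.open(seg_path, "wb") as dst:
                dst.setparams(params)  # header frame count is patched on close
                dst.writeframes(frames)
            idx += 1
```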

You can also follow the instructions on the following pages for downloading and preparing the training and augmentation data.

Objective functions

Uniformity (uniform)
Prototypical (proto)
Angular Prototypical (angleproto)
Angular Contrastive (anglecontrast)
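The uniform objective corresponds to the uniformity loss of Wang and Isola (2020), which pushes L2-normalised embeddings to spread over the hypersphere. A minimal NumPy sketch for intuition (illustrative only, not the repository's implementation; t=2 is the commonly used temperature):

```python
import numpy as np

def uniformity_loss(z, t=2.0):
    """Uniformity loss: log of the mean Gaussian potential over all
    pairs of distinct L2-normalised embeddings z of shape (N, D)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    # squared pairwise Euclidean distances, shape (N, N)
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    off_diag = sq_dists[~np.eye(z.shape[0], dtype=bool)]
    return np.log(np.mean(np.exp(-t * off_diag)))
```

The loss is 0 when all embeddings collapse to one point (its maximum) and decreases as the embeddings spread out, so minimising it encourages uniformity.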

Front-end encoders

FastResNet34 (ResNetSE34L)
VGGVox
TDNN

Training and evaluation using CEL

An example of training on the development set of VoxCeleb2 in an unsupervised manner:

python trainSpeakerNet.py --max_frames 180 --batch_size 200 --unif_loss uniform --sim_loss anglecontrast --augment_anchor --augment_type 3 --save_path save/unif-a-cont --train_list list/train_vox2.txt --test_list list/test_vox1.txt --train_path /home/shmun/DB/VoxCeleb/VoxCeleb2/dev/wav/ --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/ --musan_path /home/shmun/DB/MUSAN/musan_split/

where /home/shmun/DB/VoxCeleb/VoxCeleb2/dev/wav/, /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/, and /home/shmun/DB/MUSAN/musan_split/ are our paths to the VoxCeleb2 development set, the VoxCeleb1 test set, and the processed MUSAN corpus, respectively, and save/unif-a-cont is the directory in which results are saved.

An example of evaluation on the original test set of VoxCeleb1:

python trainSpeakerNet.py --eval --initial_model save/unif-a-cont/model/model000000001.model --test_list list/test_vox1.txt --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/
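Speaker verification on VoxCeleb trial lists is conventionally reported as an equal error rate (EER). For reference, a minimal sketch of computing EER from trial scores and target/non-target labels (illustrative only, not the repository's scoring code):

```python
import numpy as np

def compute_eer(scores, labels):
    """EER: the operating point where the false-accept rate (non-targets
    scored above threshold) equals the false-reject rate (targets below)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(scores)[::-1]          # sweep thresholds from high to low
    labels = labels[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    fa = np.cumsum(1 - labels) / n_nontarget  # non-targets accepted so far
    fr = 1 - np.cumsum(labels) / n_target     # targets still rejected
    idx = np.argmin(np.abs(fa - fr))          # closest crossing point
    return (fa[idx] + fr[idx]) / 2
```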

Code for VOiCES evaluation is here.

Pre-trained models

We share the pre-trained models reported in this paper. Move the downloaded pre-trained models to the ./save directory, then evaluate them as follows:

python trainSpeakerNet.py --eval --initial_model save/pre-trained_unspv_unif-a-prot.model --test_list list/test_vox1.txt --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/
python trainSpeakerNet.py --eval --initial_model save/pre-trained_unspv_unif-a-cont.model --test_list list/test_vox1.txt --test_path /home/shmun/DB/VoxCeleb/VoxCeleb1/test/wav/

Citation

If you make use of this repository, please consider citing:

@article{mun2020cel,
  title={Unsupervised representation learning for speaker recognition via contrastive equilibrium learning},
  author={Mun, Sung Hwan and Kang, Woo Hyun and Han, Min Hyun and Kim, Nam Soo},
  journal={arXiv preprint arXiv:2010.11433},
  year={2020}
}