A PyTorch implementation of dual-path transformer network (DPTNet) based speech separation on wsj0-2mix, as described in the paper "Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation", accepted at Interspeech 2020.
This implementation is based on DPRNN; thanks to Yi Luo and ShiZiqiang for sharing their code.
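For orientation, here is a minimal sketch of the dual-path idea the network is built on: the encoded features are split into chunks, and each block models dependencies first within each chunk (intra) and then across chunks at each position (inter). The class name, shapes, and the use of plain `nn.TransformerEncoderLayer` below are illustrative assumptions; dpt_net.py uses the improved transformer layers instead.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Illustrative dual-path block (not the exact code in dpt_net.py).

    Input shape: (batch, channels, n_chunks, chunk_len).
    """
    def __init__(self, channels=64, n_heads=4):
        super().__init__()
        # dpt_net.py uses the paper's improved transformer layers here.
        self.intra = nn.TransformerEncoderLayer(d_model=channels, nhead=n_heads)
        self.inter = nn.TransformerEncoderLayer(d_model=channels, nhead=n_heads)

    def forward(self, x):
        b, c, k, s = x.shape  # batch, channels, n_chunks, chunk_len
        # Intra-chunk: sequences of length s, one per (batch, chunk) pair.
        y = x.permute(3, 0, 2, 1).reshape(s, b * k, c)
        y = self.intra(y).reshape(s, b, k, c).permute(1, 3, 2, 0)
        # Inter-chunk: sequences of length k, one per (batch, position) pair.
        z = y.permute(2, 0, 3, 1).reshape(k, b * s, c)
        return self.inter(z).reshape(k, b, s, c).permute(1, 3, 0, 2)
```

Stacking several such blocks on the chunked encoder output gives the dual-path backbone; the real model then masks the encoder features per speaker and decodes.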
File description:
optimizer_dptnet.py: a simple wrapper class for learning-rate scheduling (a rough sketch of the idea follows this list)
transformer_improved.py: a PyTorch implementation of the improved transformer in the paper (also sketched after this list)
dpt_net.py: the model definition, and where you can start reading
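The scheduling wrapper follows the idea of the Speech-Transformer optimizer linked under References. As a non-authoritative sketch (parameter names and the exact schedule are assumptions, not copied from optimizer_dptnet.py), a Noam-style wrapper looks like this:

```python
class TransformerOptimizer:
    """Sketch of a Noam-style LR wrapper: linear warmup for warmup_steps,
    then decay proportional to step**-0.5. Illustrative only; see
    optimizer_dptnet.py for the schedule actually used."""
    def __init__(self, optimizer, d_model=64, warmup_steps=4000, scale=1.0):
        self.optimizer = optimizer
        self.d_model = d_model
        self.warmup_steps = warmup_steps
        self.scale = scale
        self.step_num = 0

    def step(self):
        self.step_num += 1
        lr = (self.scale * self.d_model ** -0.5 *
              min(self.step_num ** -0.5,
                  self.step_num * self.warmup_steps ** -1.5))
        for group in self.optimizer.param_groups:
            group["lr"] = lr  # overwrite lr before the actual update
        self.optimizer.step()

    def zero_grad(self):
        self.optimizer.zero_grad()

# usage (illustrative): wrap a plain optimizer and call step()/zero_grad()
# opt = TransformerOptimizer(torch.optim.Adam(model.parameters(), lr=0.0))
```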
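transformer_improved.py implements the paper's change to the standard encoder layer: the first fully-connected layer of the feed-forward block is replaced with a recurrent layer, which lets the model drop positional encodings. A hedged sketch follows; the choice of a bidirectional GRU, its width, and the dropout placement here are assumptions, so consult transformer_improved.py for the real layer.

```python
import torch
import torch.nn as nn

class ImprovedTransformerLayer(nn.Module):
    """Sketch of the improved transformer layer: standard multi-head
    self-attention, then a feed-forward block that starts with an RNN."""
    def __init__(self, d_model, n_heads, d_ff=256, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        # RNN in place of the first linear layer of the feed-forward block.
        self.rnn = nn.GRU(d_model, d_ff, bidirectional=True)
        self.linear = nn.Linear(2 * d_ff, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):  # x: (seq_len, batch, d_model)
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        rnn_out, _ = self.rnn(x)          # (seq_len, batch, 2 * d_ff)
        ff_out = self.linear(torch.relu(rnn_out))
        return self.norm2(x + self.dropout(ff_out))
```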
We obtain an SDR of 20.6 dB on wsj0-2mix and 16.8 dB on the LS-2mix dataset.
Train the default model:
CUDA_VISIBLE_DEVICES=0 python train_and_eval.py --train_dir /my/wsj0-2mix/path/min/tr --valid_dir /my/wsj0-2mix/path/min/cv
Continue training without eval:
CUDA_VISIBLE_DEVICES=0 python train_and_eval.py --mode train --continue_from exp/temp_ctd/epoch14.pth.tar --start_epoch 14 --warmup 0 --save_folder exp/tmp_ctd
Train a faster-training model:
CUDA_VISIBLE_DEVICES=1 python train_and_eval.py --print_freq 40 --W 16 --D 2 --save_folder exp/smaller_spectrogram
Eval only:
CUDA_VISIBLE_DEVICES=1 python train_and_eval.py --mode eval --continue_from exp/temp_ctd2/epoch8.pth.tar --save_folder exp/tmp_ctd2_eval
References:
https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation
https://github.com/kaituoxu/Speech-Transformer/blob/master/src/transformer/optimizer.py
https://github.com/pytorch/pytorch/blob/eace0533985641d9c2f36e43e3de694aca886bd9/torch/nn/modules/transformer.py
Tested with PyTorch 1.8 and CUDA 11.1. The default training configuration needs a GPU with 16+ GB of memory; the faster-training model runs on a GPU with only 2.5 GB.