In our paper, we proposed MP-SENet, a time-frequency (TF) domain monaural speech enhancement (SE) model that denoises magnitude and phase spectra in parallel.
We provide our implementation as open source in this repository.
Abstract: This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by convolution-augmented transformers. The encoder aims to encode time-frequency representations from the input noisy magnitude and phase spectra. The decoder is composed of parallel magnitude mask decoder and phase decoder, directly recovering clean magnitude spectra and clean-wrapped phase spectra by incorporating learnable sigmoid activation and parallel phase estimation architecture, respectively. Multi-level losses defined on magnitude spectra, phase spectra, short-time complex spectra, and time-domain waveforms are used to train the MP-SENet model jointly. Experimental results show that our proposed MP-SENet achieves a PESQ of 3.50 on the public VoiceBank+DEMAND dataset and outperforms existing advanced SE methods.
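As a rough illustration of the parallel magnitude and phase idea in the abstract, the sketch below works on a single complex spectrogram bin in plain Python. It is not the repository's model code: the function names are hypothetical, and the two-branch `atan2` formulation is an assumption about what "parallel phase estimation" means, chosen because it always yields a wrapped phase.

```python
import cmath
import math
from typing import Tuple


def split_mag_phase(bin_value: complex) -> Tuple[float, float]:
    """Return (magnitude, wrapped phase) of one complex STFT bin."""
    return abs(bin_value), cmath.phase(bin_value)


def wrapped_phase(pseudo_real: float, pseudo_imag: float) -> float:
    """Assumed phase estimate: two parallel outputs combined with atan2.

    atan2 returns a value in [-pi, pi], so the result is a wrapped phase.
    """
    return math.atan2(pseudo_imag, pseudo_real)


def recombine(noisy_mag: float, mask: float, phase: float) -> complex:
    """Apply a magnitude mask, then rebuild the complex bin with the given phase."""
    return cmath.rect(mask * noisy_mag, phase)


# One noisy bin, a hypothetical mask value, and hypothetical phase-branch outputs.
noisy_bin = complex(3.0, 4.0)
noisy_mag, noisy_phase = split_mag_phase(noisy_bin)
enhanced_bin = recombine(noisy_mag, mask=0.5, phase=wrapped_phase(0.0, 1.0))
```

In the real model these quantities are tensors over all time-frequency bins, and the mask and phase come from the two decoders rather than from fixed numbers.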
Audio samples for short-version MP-SENet accepted by Interspeech 2023 can be found here.
A long-version MP-SENet is now available on arXiv, and its corresponding audio samples can be found here.
This source code is only for the MP-SENet accepted by Interspeech 2023.
- Python >= 3.6.
- Clone this repository.
- Install the Python requirements. Please refer to requirements.txt.
- Download and extract the VoiceBank+DEMAND dataset. Resample all wav files to 16 kHz, and move the clean and noisy wavs to VoiceBank+DEMAND/wavs_clean and VoiceBank+DEMAND/wavs_noisy, respectively.
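Before training, it may help to confirm that the dataset was prepared as described above. The following is a minimal standard-library sketch, not part of this repository: `check_dataset` is a hypothetical helper, it assumes that each clean file and its noisy counterpart share the same file name, and it relies on the `wave` module, which reads only uncompressed PCM wav files.

```python
import wave
from pathlib import Path
from typing import Dict, List


def check_dataset(root: str, expected_rate: int = 16000) -> Dict[str, List[str]]:
    """Report unpaired files and files whose sample rate is not expected_rate."""
    root_dir = Path(root)
    clean_dir = root_dir / "wavs_clean"
    noisy_dir = root_dir / "wavs_noisy"

    # Assumption: clean/noisy pairs are matched by identical file names.
    clean_names = {p.name for p in clean_dir.glob("*.wav")}
    noisy_names = {p.name for p in noisy_dir.glob("*.wav")}

    wrong_rate = []
    for folder in (clean_dir, noisy_dir):
        for path in sorted(folder.glob("*.wav")):
            with wave.open(str(path), "rb") as wav_file:
                if wav_file.getframerate() != expected_rate:
                    wrong_rate.append(str(path.relative_to(root_dir)))

    return {
        "missing_noisy": sorted(clean_names - noisy_names),
        "missing_clean": sorted(noisy_names - clean_names),
        "wrong_rate": wrong_rate,
    }
```

Calling `check_dataset("VoiceBank+DEMAND")` returns three lists; all of them should be empty once every file has been resampled and placed in the right directory.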
CUDA_VISIBLE_DEVICES=0,1 python train.py --config config.json
Checkpoints and a copy of the configuration file are saved in the cp_mpsenet directory by default.
You can change the path by adding the --checkpoint_path option.
python inference.py --checkpoint_file [generator checkpoint file path]
You can also use the pretrained best checkpoint file we provide in best_ckpt/g_best.
Generated wav files are saved in generated_files by default.
You can change the path by adding the --output_dir option.
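For scripted experiments, the training and inference commands can be assembled from Python. The sketch below only builds argument lists from the options named in this README (`--config`, `--checkpoint_path`, `--checkpoint_file`, `--output_dir`); the helper names and the example directory `cp_custom` are hypothetical, and nothing is executed until you pass the result to `subprocess.run` yourself.

```python
import os
import sys
from typing import Dict, List, Optional


def train_command(config: str = "config.json",
                  checkpoint_path: Optional[str] = None) -> List[str]:
    """Build the argument list for train.py; the checkpoint path is optional."""
    cmd = [sys.executable, "train.py", "--config", config]
    if checkpoint_path is not None:
        cmd += ["--checkpoint_path", checkpoint_path]
    return cmd


def inference_command(checkpoint_file: str = "best_ckpt/g_best",
                      output_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for inference.py; the output directory is optional."""
    cmd = [sys.executable, "inference.py", "--checkpoint_file", checkpoint_file]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    return cmd


def gpu_env(devices: str = "0,1") -> Dict[str, str]:
    """Copy the current environment and restrict the visible CUDA devices."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = devices
    return env


# Example, to be run from the repository root (cp_custom is a made-up path):
#   import subprocess
#   subprocess.run(train_command(checkpoint_path="cp_custom"),
#                  env=gpu_env("0,1"), check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.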
We referred to HiFiGAN, NSPP and CMGAN to implement this.
@inproceedings{lu23e_interspeech,
author={Ye-Xin Lu and Yang Ai and Zhen-Hua Ling},
title={{MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={3834--3838},
doi={10.21437/Interspeech.2023-1441}
}