This repo is the official PyTorch implementation of 'VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification', which has been submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
Please see requirements.txt for the required packages.
Step1. Prepare clean source speech and noise recordings in .wav or .flac format.
Step2. Prepare reverberant and direct-path RIRs.
python dataset/ -c [config/config_gen_rir.json]
Step3. Save the lists of file paths for the source speech, simulated RIRs (.npz), and noise to .txt files.
python dataset/ -i [dirpath] -o [.txt filepath] -e [filename extension]
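As a rough illustration of what Step3 produces, here is a minimal sketch that collects every file with a given extension under a directory and writes one path per line to a .txt file. The function names `list_files` and `write_list`, the recursive search, and the one-path-per-line layout are assumptions made for this example; the script in this repo may behave differently.

```python
from pathlib import Path


def list_files(dirpath, extension):
    """Return the sorted paths of all files under dirpath with the given extension."""
    # Accept both "wav" and ".wav"; match case-insensitively (assumption).
    suffix = "." + extension.lstrip(".").lower()
    root = Path(dirpath)
    return sorted(
        str(p) for p in root.rglob("*") if p.is_file() and p.suffix.lower() == suffix
    )


def write_list(paths, txt_filepath):
    """Write one file path per line (hypothetical list format)."""
    Path(txt_filepath).write_text("".join(p + "\n" for p in paths), encoding="utf-8")
```

For example, `write_list(list_files("speech_dir", "flac"), "speech.txt")` would list all .flac files found under the hypothetical folder `speech_dir`.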
Prepare the official single-channel test sets of the REVERB Challenge dataset.
Step1. Prepare the RIRs of the 'Single' subfolder in ACE Challenge.
Step2. Downsample the RIRs to 16 kHz.
python dataset/ -i [ACE 'Single' dirpath] -o [saved dirpath]
Step3. Save the lists of file paths for the source speech, ACE RIRs, and noise to .txt files.
python dataset/ -i [dirpath] -o [.txt filepath] -e [filename extension]
Step4. Generate the test set (consisting of reverberant speech and labels).
python dataset/ --[keyword] [arg]
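The idea behind generating reverberant test speech can be sketched as follows: convolve clean speech with an RIR, then add noise scaled to a target SNR. This is a toy, pure-Python illustration on random signals, not the generation script of this repo; the helper names, the truncation to the clean-speech length, and the 20 dB SNR are all assumptions.

```python
import math
import random


def convolve(signal, rir):
    """Full linear convolution of two sequences (O(N*M), fine for a demo)."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def power(x):
    """Mean power of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(reverberant, noise, snr_db):
    """Scale the noise so the reverberant-to-noise power ratio is snr_db, then add it."""
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberant, noise)]


random.seed(0)
speech = [random.uniform(-1, 1) for _ in range(400)]  # stand-in for clean speech
rir = [math.exp(-0.05 * n) * random.uniform(-1, 1) for n in range(100)]  # toy decaying RIR
noise = [random.gauss(0, 1) for _ in range(len(speech))]

reverberant = convolve(speech, rir)[: len(speech)]  # keep the clean-speech length
noisy = mix_at_snr(reverberant, noise, snr_db=20.0)
```

In practice an FFT-based convolution would be used for real recordings; the nested loop is kept here only so the example has no dependencies.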
Step1. Edit the config file (for example: config/config_VINP_oSpatialNet.toml and config/config_VINP_TCNSAS.toml).
Step2. Run
# train from scratch
torchrun --standalone --nnodes=1 --nproc_per_node=[number of GPUs] -c [config filepath] -p [saved dirpath]
# resume training
torchrun --standalone --nnodes=1 --nproc_per_node=[number of GPUs] -c [config filepath] -p [saved dirpath] -r
# use pretrained checkpoints
torchrun --standalone --nnodes=1 --nproc_per_node=[number of GPUs] -c [config filepath] -p [saved dirpath] --start_ckpt [pretrained model filepath]
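For context, torchrun starts one worker process per GPU and passes each worker its position through the environment variables RANK, LOCAL_RANK, and WORLD_SIZE. The sketch below shows how a training script could read them. The helper `read_dist_env` and the single-process fallback values are assumptions for illustration and are not taken from this repo's training code.

```python
import os


def read_dist_env(environ=os.environ):
    """Read the rank information that torchrun exports to each worker process."""
    return {
        "rank": int(environ.get("RANK", 0)),              # global index of this worker
        "local_rank": int(environ.get("LOCAL_RANK", 0)),  # index on this node, often the GPU id
        "world_size": int(environ.get("WORLD_SIZE", 1)),  # total number of workers
    }


info = read_dist_env()
# With --standalone --nnodes=1, rank and local_rank coincide.
is_main_process = info["rank"] == 0  # e.g. only this worker writes logs and checkpoints
```

When the script is started without torchrun, the fallbacks describe a single-process run.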
python -c [config filepath] --ckpt [list of checkpoints] -i [reverberant speech dirpath] -o [output dirpath] -d [GPU id]
Evaluation results are saved to the output folder.
For SimData, run
bash eval/ -i [speech dirpath] -r [reference dirpath]
For RealData, the reference is not available. Run
bash eval/ -i [speech dirpath]
For SimData, run
python eval/ -i [speech dirpath] -m [whisper model name (tiny, small, or medium)]
For RealData, run
python eval/ -i [speech dirpath] -m [whisper model name (tiny, small, or medium)]
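ASR performance is commonly reported as word error rate (WER), i.e. the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The following is a minimal sketch of that metric, assuming whitespace tokenization and lowercasing; the function `word_error_rate` is hypothetical and the evaluation scripts in this repo may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the processed ref words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a hypothesis that drops one word from a six-word reference has a WER of 1/6.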
Step1. Estimate RT60 and DRR using
python -i [estimated RIR dirpath]
Step2. Run
python eval/ -o [estimated RT60 or DRR .json] -r [reference RT60 or DRR .json]
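To illustrate the two steps above, here is a minimal sketch of one common way to obtain RT60 and DRR from an RIR and to compare the estimates with reference values. It uses Schroeder backward integration with a line fit between -5 dB and -25 dB, and a 2.5 ms window around the main peak for the direct path. These choices, the function names, and the dictionary layout (RIR name mapped to value) are assumptions for this example and are not necessarily what the scripts in this repo do.

```python
import math


def schroeder_edc_db(rir):
    """Energy decay curve in dB via Schroeder backward integration."""
    edc, total = [], 0.0
    for h in reversed(rir):
        total += h * h
        edc.append(total)
    edc.reverse()
    return [10.0 * math.log10(e / edc[0]) for e in edc if e > 0.0]


def estimate_rt60(rir, fs, start_db=-5.0, end_db=-25.0):
    """Fit a line to the EDC between start_db and end_db, extrapolate to -60 dB."""
    pts = [(n / fs, d) for n, d in enumerate(schroeder_edc_db(rir)) if end_db <= d <= start_db]
    t_mean = sum(t for t, _ in pts) / len(pts)
    d_mean = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - t_mean) * (d - d_mean) for t, d in pts)
             / sum((t - t_mean) ** 2 for t, _ in pts))  # decay rate in dB per second
    return -60.0 / slope


def estimate_drr_db(rir, fs, window_ms=2.5):
    """DRR in dB: energy within +/- window_ms of the main peak over the remaining energy."""
    peak = max(range(len(rir)), key=lambda n: abs(rir[n]))
    half = int(round(window_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), min(len(rir), peak + half + 1)
    direct = sum(h * h for h in rir[lo:hi])
    reverb = sum(h * h for h in rir) - direct
    return 10.0 * math.log10(direct / reverb)


# Toy RIR: exponentially decaying sequence with a known RT60 of 0.5 s.
FS, RT60_TRUE = 16000, 0.5
decay = 3.0 * math.log(10.0) / (RT60_TRUE * FS)  # amplitude falls by 60 dB after RT60_TRUE
rir = [(-1) ** n * math.exp(-decay * n) for n in range(FS)]

estimated = {"rir_0001": estimate_rt60(rir, FS)}  # hypothetical name -> value layout
reference = {"rir_0001": RT60_TRUE}
mae = sum(abs(estimated[k] - reference[k]) for k in reference) / len(reference)
drr_db = estimate_drr_db(rir, FS)
```

On measured RIRs the noise floor bends the tail of the decay curve, which is why the fit is restricted to the early part of the decay instead of following the curve down to -60 dB.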
If you find our work helpful, please cite
title={VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification},
author={Pengyu Wang and Ying Fang and Xiaofei Li},