# Decompose speech into content, style and speaker

## Quick look

| Component | File |
| --- | --- |
| train | `train_ddp_dataloader_segment_submit_success_v21.py` |
| model | `speechdecompose_dataloader_segment.py` |
| loss | `loss_dataloader_segment.py` |
| inference | `convert_batch_using_fs2vocoder_denorm.py` |

## 1. Data preprocess

Extract mel spectrograms, following ming024's FastSpeech2:

```shell
python3 preprocess.py config/LJSpeech/preprocess.yaml
```
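The preprocessing script above handles this for the dataset; for reference, the core mel extraction can be sketched in plain NumPy. This is a simplified illustration (STFT of Hann-windowed frames, then an HTK-style triangular mel filterbank, then a log); the exact parameters (`n_fft`, `hop`, `n_mels`, `fmax`) are assumptions and should be taken from `preprocess.yaml`, not from this sketch.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0):
    # Triangular filters evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    # Frame, window, FFT, then project the power spectrum onto mel filters.
    n_frames = 1 + (len(wav) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.maximum(mel, 1e-10)).T  # (n_mels, n_frames)
```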

Extract d-vectors, following liusongxiang's ppg-vc:

```shell
python3 3_compute_spk_dvecs_no_flatten.py
```
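A speaker-level d-vector is typically obtained by pooling the per-utterance embeddings of that speaker. A minimal sketch of this pooling step (the function name and the L2-normalize-then-average convention are assumptions; the actual script follows ppg-vc):

```python
import numpy as np

def speaker_dvector(utterance_embeddings):
    """Pool per-utterance embeddings into one speaker-level d-vector.

    utterance_embeddings: (n_utterances, dim) array, each row the
    embedding of one utterance from the same speaker.
    """
    emb = np.asarray(utterance_embeddings, dtype=np.float32)
    # L2-normalize each utterance embedding first so that no single
    # utterance dominates the mean.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    dvec = emb.mean(axis=0)
    return dvec / np.linalg.norm(dvec)
```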

Verify the extracted d-vectors with t-SNE: if the embeddings are speaker-discriminative, utterances from the same speaker should form tight clusters in the 2-D projection.
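A minimal sketch of that check, assuming scikit-learn is available (the function name and perplexity value are illustrative choices, not from the repo):

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_project(dvectors, perplexity=5, seed=0):
    """Project d-vectors to 2-D with t-SNE for visual inspection.

    The returned (n, 2) coordinates can be scatter-plotted, colored by
    speaker id; same-speaker points should cluster together.
    """
    return TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed).fit_transform(np.asarray(dvectors))
```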

## 2. Train

Local GCR debug:

```shell
python3 -m torch.distributed.launch train_ddp_dataloader_segment_submit.py \
    --model-dir ./model_debug_dir --log-dir ./log_debug_dir \
    -p config/VCTK/preprocess.yaml -m config/VCTK/model.yaml -t config/VCTK/train.yaml
```
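Note that `torch.distributed.launch` injects a `--local_rank` flag into every spawned process, so the entry script must accept it alongside the flags shown above. A sketch of the matching argument parser (the long names for `-p`/`-m`/`-t` are assumptions for illustration):

```python
import argparse

def build_parser():
    # Flags matching the launch command; --local_rank is supplied by
    # torch.distributed.launch, not by the user.
    p = argparse.ArgumentParser()
    p.add_argument("--local_rank", type=int, default=0)
    p.add_argument("--model-dir", dest="model_dir", required=True)
    p.add_argument("--log-dir", dest="log_dir", required=True)
    p.add_argument("-p", "--preprocess-config", required=True)
    p.add_argument("-m", "--model-config", required=True)
    p.add_argument("-t", "--train-config", required=True)
    return p
```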

WebIDE:

Dataloader: random segments of 128 frames (following Wendison's VQMIVC).
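The segment step can be sketched as a random fixed-length crop of each utterance's mel spectrogram. A minimal version, assuming `(n_mels, n_frames)` layout and zero-padding of short utterances (both conventions are assumptions, not confirmed from the repo's dataloader):

```python
import numpy as np

def random_segment(mel, seg_len=128, rng=None):
    """Crop a random seg_len-frame segment from a (n_mels, n_frames) mel.

    Utterances shorter than seg_len are right-padded with zeros, one
    common convention for fixed-length batching.
    """
    rng = rng or np.random.default_rng()
    n_mels, n_frames = mel.shape
    if n_frames < seg_len:
        pad = np.zeros((n_mels, seg_len - n_frames), dtype=mel.dtype)
        return np.concatenate([mel, pad], axis=1)
    start = int(rng.integers(0, n_frames - seg_len + 1))
    return mel[:, start:start + seg_len]
```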

## Reference