The custom local training GUI has moved to DiffTrainer
- lab + wav (NNSVS format)
- csv + wav (DiffSinger format)
- ds (DiffSinger .ds files)
- the folder name of your_speaker_folder is used as spk_name, so name your folders carefully
- the Colab notebook primarily uses Python, so spaces and special characters in file names or folder paths may be invalid
- for an in-depth guide to SVS training and/or labeling, please see SVS Singing Voice Database - Tutorial
- it is advised to edit your data with SlurCutter to get more refined data for your pitch model
- please visit the DiffSinger Discord for help and questions regarding model production
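As a quick pre-upload check for the naming note above, the following sketch flags path components that contain spaces or special characters. The notebook does not publish an exact rule, so the "safe" character set here (ASCII letters, digits, underscore, hyphen, dot) is an assumption, and `find_unsafe_names` is a hypothetical helper, not part of the notebook.

```python
import re

# Assumption: "safe" means ASCII letters, digits, underscore, hyphen and dot.
# The notebook does not state an exact rule; adjust the pattern as needed.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def find_unsafe_names(paths):
    """Return path components that contain spaces or special characters."""
    unsafe = []
    for path in paths:
        # Normalize Windows separators so each folder/file name is checked alone.
        for part in path.replace("\\", "/").split("/"):
            if part and not SAFE_NAME.fullmatch(part) and part not in unsafe:
                unsafe.append(part)
    return unsafe


if __name__ == "__main__":
    example = [
        "my_speaker/data_1.wav",
        "my speaker/data 2.wav",
        "spk#1/wavs/a.wav",
    ]
    print(find_unsafe_names(example))
```

Renaming anything the function reports before zipping should avoid most path-related failures, though the notebook may accept more characters than this conservative pattern allows.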
Zip file format examples:
[NOTE] .ds training has the same zip organization as lab + wav, but with only .ds files - no wav needed

    #single speaker (lab + wav)
    your_zip.zip:
        your_speaker_folder:
            data_1.wav
            data_1.lab
            data_2.wav
            data_2.lab
            data_3.wav
            data_3.lab
            ...

    #single speaker (csv + wav)
    your_zip.zip:
        your_speaker_folder:
            wavs (folder named "wavs" containing all the wavs)
            transcriptions.csv

    #multi speaker (lab + wav)
    your_zip.zip:
        your_speaker_folder_1:
            data_1.wav
            data_1.lab
            data_2.wav
            data_2.lab
            data_3.wav
            data_3.lab
            ...
        your_speaker_folder_2:
            data_1.wav
            data_1.lab
            data_2.wav
            data_2.lab
            data_3.wav
            data_3.lab
            ...

    #multi speaker (csv + wav)
    your_zip.zip:
        your_speaker_folder_1:
            wavs (folder named "wavs" containing all the wavs)
            transcriptions.csv
        your_speaker_folder_2:
            wavs (folder named "wavs" containing all the wavs)
            transcriptions.csv
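The layouts above can be sanity-checked locally before uploading. The sketch below uses Python's standard `zipfile` module to list each top-level speaker folder and guess which format it holds. The detection rules are assumptions inferred from the examples (a `transcriptions.csv` next to a `wavs` folder means csv + wav, paired `.lab` and `.wav` files mean lab + wav, `.ds` files mean ds), and `summarize_zip` is a hypothetical helper, not the notebook's own extraction code.

```python
import zipfile
from collections import defaultdict


def summarize_zip(zip_path):
    """Map each top-level speaker folder in the zip to a guessed data format."""
    files = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            speaker, _, rest = name.partition("/")
            if rest:  # ignore files sitting at the zip root
                files[speaker].append(rest.lower())

    summary = {}
    for speaker, names in files.items():
        has_wav = any(n.endswith(".wav") for n in names)
        has_wavs_dir = any(
            n.startswith("wavs/") and n.endswith(".wav") for n in names
        )
        # Assumed rules, inferred from the example layouts above.
        if "transcriptions.csv" in names and has_wavs_dir:
            summary[speaker] = "csv + wav"
        elif has_wav and any(n.endswith(".lab") for n in names):
            summary[speaker] = "lab + wav"
        elif any(n.endswith(".ds") for n in names):
            summary[speaker] = "ds"
        else:
            summary[speaker] = "unknown"
    return summary
```

Calling `summarize_zip("your_zip.zip")` returns a dictionary keyed by speaker folder name, which is also the name that would become spk_name. A value of "unknown" suggests the folder does not match any of the layouts shown here.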
- wav
- it is suggested to use manually segmented audio for cleaner segments (though there is minimal difference when using auto segmentation)
- the zip file can contain any type of file, even subfolders; data extraction only adds the .wav files inside the zip to the training set
- lab + wav (NNSVS format)
- this notebook is still a rough draft; please either don't use it at all or use it with caution
- [notebook] improve SOFA notebook, add inference
- [notebook] update dictionary conversion code for phoneme types in build OU
- [notebook] clean up multi-dict notebook and support logic for dictionary generation for out-of-specified-lang labels (/)
- [resource] add example file(s) for multi-dictionary training
Credits:
- openvpi for DiffSinger fork and more
- UtaUtaUtau for nnsvs-db-converter
- Kei for the original notebook
- MLo7 for the repo's content
- PixPrucer for an in-depth SVS guide
- haru0l for the base pretrain with embeds
- AgentAsteriski for the local GUI