A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder is replaced with Vocos.
Python >= 3.9 is recommended.
Clone the repository:
git clone https://github.com/hertz-pj/SNAC-Vocos
cd SNAC-Vocos
Install packages:
pip install -r requirements.txt
Refer to infer.py for inference instructions and usage examples.
Model name | Huggingface | Corpus | Domain |
---|---|---|---|
snac_vocos_16khz_hop200_scale8421_1kh | 🤗 | 1k hours | Speech (Mandarin/English) |
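
The model name appears to encode the sample rate (16 kHz), the hop size (200 samples), and the multi-scale strides (8, 4, 2, 1). Reading it that way is an assumption, not something this README states. Under that reading, a rough sketch of the token rate at each scale looks like this; `token_rates` is a hypothetical helper, not part of the repository:

```python
def token_rates(sample_rate, hop_length, strides):
    """Return tokens per second at each scale.

    Assumes the finest scale emits one token per hop and that each
    coarser scale is downsampled by its stride (an assumption about
    how "scale8421" in the model name should be interpreted).
    """
    base_rate = sample_rate / hop_length  # frames per second at stride 1
    return [base_rate / stride for stride in strides]


# Values read off the model name snac_vocos_16khz_hop200_scale8421_1kh.
rates = token_rates(sample_rate=16000, hop_length=200, strides=[8, 4, 2, 1])
total_rate = sum(rates)
print(rates, total_rate)
```

If the assumption holds, the coarsest codebook runs at 10 tokens per second and the finest at 80, for 150 tokens per second in total.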
1. Prepare a filelist of audio files for the training and validation sets, e.g. train.list.
2. Fill in a config file, e.g. snac_vocos.yaml. The main parameters to pay attention to are batch_size, filelist_path, save_dir, and device.
3. Start training:
python train.py fit --config ./configs/snac_vocos.yaml
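
For step 1, the filelists can be generated with a short script. The sketch below assumes the filelist format is one audio path per line, which this README does not confirm, so check the dataset code before relying on it. The helper names, the extension set, and the 5% validation split are all illustrative choices rather than part of the repository:

```python
import random
from pathlib import Path

# Assumed set of audio extensions; adjust to match your corpus.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def collect_audio_files(root):
    """Recursively collect absolute paths of audio files under root."""
    root = Path(root)
    return sorted(
        str(p.resolve())
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def split_filelist(paths, val_fraction=0.05, seed=0):
    """Shuffle deterministically and split into (train, val) lists."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    # Keep at least one validation file when there is more than one file.
    n_val = max(1, int(len(paths) * val_fraction)) if len(paths) > 1 else 0
    return paths[n_val:], paths[:n_val]


def write_filelist(paths, out_path):
    """Write one path per line (assumed filelist format)."""
    text = "\n".join(paths) + ("\n" if paths else "")
    Path(out_path).write_text(text, encoding="utf-8")
```

A typical use would be to call `collect_audio_files` on your dataset directory, pass the result to `split_filelist`, and write the two lists to train.list and a validation filelist with `write_filelist`. Then point filelist_path in the config at those files.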
- Release code
- Release a checkpoint trained with 1k hours of speech (Mandarin/English).
- Demo page.
This implementation uses parts of the code from the following GitHub repos: