A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder is replaced with Vocos.

hertz-pj/SNAC-Vocos

SNAC-Vocos

Installation

Python 3.9 or later is recommended.
Clone the repository:

git clone https://github.com/hertz-pj/SNAC-Vocos
cd SNAC-Vocos

Install packages:

pip install -r requirements.txt

Inference

Refer to infer.py for inference instructions and usage examples.

Available Models

| Model name | Hugging Face | Corpus | Domain |
| --- | --- | --- | --- |
| snac_vocos_16khz_hop200_scale8421_1kh | 🤗 | 1k hours | Speech (Mandarin/English) |
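
The model name appears to encode the main codec settings. The sketch below reads it under the assumption (not confirmed by this README) that `16khz` is the sample rate, `hop200` is the hop size in samples, and `scale8421` lists one downsampling stride per codebook level, as in the original SNAC design. The helper `describe_model` is hypothetical and is not part of this repository.

```python
import re


def describe_model(name: str) -> dict:
    """Derive per-level token rates from a checkpoint name.

    Assumption: the name follows '<...>_<N>khz_hop<H>_scale<digits>_<...>',
    where each digit of 'scale' is the stride of one codebook level.
    """
    match = re.search(r"(\d+)khz_hop(\d+)_scale(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name}")

    sample_rate = int(match.group(1)) * 1000      # e.g. 16 kHz -> 16000 Hz
    hop = int(match.group(2))                     # samples per finest-level frame
    strides = [int(d) for d in match.group(3)]    # e.g. '8421' -> [8, 4, 2, 1]

    base_rate = sample_rate / hop                 # frames per second at stride 1
    frame_rates = [base_rate / s for s in strides]
    return {
        "sample_rate": sample_rate,
        "hop": hop,
        "strides": strides,
        "frame_rates_hz": frame_rates,
        "tokens_per_second": sum(frame_rates),
    }


info = describe_model("snac_vocos_16khz_hop200_scale8421_1kh")
print(info["frame_rates_hz"])     # [10.0, 20.0, 40.0, 80.0]
print(info["tokens_per_second"])  # 150.0
```

Under these assumptions the coarsest level runs at 10 tokens per second and the finest at 80, for 150 tokens per second of audio in total.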

Training

1. Prepare filelists of audio files for the training and validation sets, e.g. train.list.
2. Fill in a config file, e.g. snac_vocos.yaml. The main parameters to check are batch_size, filelist_path, save_dir, and device.
3. Start training:

python train.py fit --config ./configs/snac_vocos.yaml
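
For step 1, the filelists can be generated with a short script. This is a minimal sketch, assuming the trainer expects one audio path per line (the convention used by the upstream Vocos trainer; check the data loading code in this repository before relying on it). The function name `write_filelists`, the 5% validation split, and the `.wav` extension are illustrative choices, not taken from this repository.

```python
import random
from pathlib import Path


def write_filelists(audio_dir, train_list, val_list,
                    val_fraction=0.05, seed=0, ext=".wav"):
    """Split the audio files under audio_dir into train/validation filelists.

    Assumption: one absolute file path per line is the expected format.
    Returns (number of training files, number of validation files).
    """
    files = sorted(str(p.resolve()) for p in Path(audio_dir).rglob(f"*{ext}"))
    random.Random(seed).shuffle(files)  # deterministic shuffle for a fixed seed

    # Hold out at least one file for validation when any files exist.
    n_val = max(1, int(len(files) * val_fraction)) if files else 0
    val_files, train_files = files[:n_val], files[n_val:]

    for path, items in ((train_list, train_files), (val_list, val_files)):
        text = "\n".join(items) + ("\n" if items else "")
        Path(path).write_text(text, encoding="utf-8")
    return len(train_files), len(val_files)


# Example call (the directory and file names are placeholders):
# write_filelists("data/wavs", "train.list", "val.list")
```

The resulting paths would then be set in the filelist_path entries of the config file.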

TODO

  • Release code
  • Release a checkpoint trained with 1k hours of speech (Mandarin/English).
  • Demo page.

Acknowledgements

This implementation uses parts of the code from the following GitHub repositories:
