LoopTest

  • This is the official repository of A Benchmarking Initiative for Audio-domain Music Generation Using the FreeSound Loop Dataset, co-authored with Paul Chen, Arthur Yeh, and my supervisor Yi-Hsuan Yang. The paper has been accepted at the International Society for Music Information Retrieval Conference (ISMIR) 2021. [Demo Page], [arxiv].
  • We provide pretrained models so you can generate loops on your own, as well as scripts for evaluating the generated loops.

Environment

$ conda env create -f environment.yml 

Quick Start

  • Generate loops from the one-bar Looperman pretrained model
$ gdown --id 1GQpzWz9ycIm5wzkxLsVr-zN17GWD3_6K -O looperman_one_bar_checkpoint.pt
$ bash scripts/generate_looperman_one_bar.sh
  • Generate loops from the four-bar Looperman pretrained model
$ gdown --id 19rk3vx7XM4dultTF1tN4srCpdya7uxBV -O looperman_four_bar_checkpoint.pt
$ bash scripts/generate_looperman_four_bar.sh
  • Generate loops from the FreeSound pretrained model
$ gdown --id 197DMCOASEMFBVi8GMahHfRwgJ0bhcUND -O freesound_checkpoint.pt 
$ bash scripts/generate_freesound.sh

Pretrained Checkpoint

Benchmarking the FreeSound Loop Dataset

Download dataset

$ gdown --id 1fQfSZgD9uWbCdID4SzVqNGhsYNXOAbK5
$ unzip freesound_mel_80_320.zip

Training

$ CUDA_VISIBLE_DEVICES=2 python train_drum.py \
    --size 64 --batch 8 --sample_dir freesound_sample_dir \
    --checkpoint_dir freesound_checkpoint \
    --iter 100000 \
    mel_80_320

Generate audio

$ CUDA_VISIBLE_DEVICES=2 python generate_audio.py \
    --ckpt freesound_checkpoint/100000.pt \
    --pics 2000 --data_path "./data/freesound" \
    --store_path "./generated_freesound_one_bar"

Evaluation

NDB_JS

  • 2000 Looperman mel-spectrograms (link)
    $ cd evaluation/NDB_JS
    $ gdown --id 1aFGPYlkkAysVBWp9VacHVk2tf-b4rLIh
    $ unzip looper_2000.zip # contains 2000 Looperman mel-spectrograms
    $ rm looper_2000.zip
    $ bash compute_ndb_js.sh 
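NDB (number of statistically different bins) and JS (Jensen-Shannon divergence) compare how real and generated samples spread over clusters of the real data. The following is a minimal NumPy sketch of that idea, not the code behind compute_ndb_js.sh: it assumes each mel-spectrogram is flattened into one vector, clusters the real samples with plain k-means, applies a two-proportion z-test to every bin, and measures the JS divergence between the two bin histograms. The function names, the number of bins `k`, and the 1.96 threshold (roughly a 5% two-sided test) are assumptions.

```python
import numpy as np


def assign(x, centroids):
    """Index of the nearest centroid for every row of x (squared Euclidean distance)."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, computed without an (N, k, D) tensor
    d = (x ** 2).sum(axis=1)[:, None] - 2 * x @ centroids.T + (centroids ** 2).sum(axis=1)[None, :]
    return d.argmin(axis=1)


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, D) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two histograms."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ndb_js(real, fake, k=50, z_crit=1.96):
    """Return (NDB / k, JS) for real and generated samples of shape (N, D)."""
    real = np.asarray(real, dtype=float)
    fake = np.asarray(fake, dtype=float)
    centroids = kmeans(real, k)
    n_r, n_f = len(real), len(fake)
    p = np.bincount(assign(real, centroids), minlength=k) / n_r
    q = np.bincount(assign(fake, centroids), minlength=k) / n_f
    # Two-proportion z-test per bin, using the pooled proportion
    pooled = (p * n_r + q * n_f) / (n_r + n_f)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_r + 1 / n_f))
    z = np.divide(p - q, se, out=np.zeros(k), where=se > 0)
    ndb = int((np.abs(z) > z_crit).sum())
    return ndb / k, js_divergence(p, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(2000, 16))           # stand-in for flattened mel-spectrograms
    fake = rng.normal(loc=0.5, size=(2000, 16))  # a shifted "generator"
    print(ndb_js(real, fake, k=20))
```

Lower is better for both numbers: a generator that reproduces the real distribution leaves every bin's proportion unchanged, so both NDB/k and JS approach 0.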

IS

  • Short-Chunk CNN checkpoint
    $ cd evaluation/IS
    $ bash compute_is_score.sh
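The Inception Score is high when a classifier is confident about each sample while its predicted classes stay diverse across samples: IS = exp(E_x[KL(p(y|x) || p(y))]). Here the Short-Chunk CNN plays the classifier's role. The sketch below assumes you already have an (N, C) array of its softmax outputs for the generated loops; the function name and the single-split computation are assumptions, and compute_is_score.sh may differ (for example by averaging over several splits).

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """exp(mean KL(p(y|x) || p(y))) from an (N, C) array of softmax outputs."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y), the average prediction
    # Per-sample KL divergence; eps guards against log(0) for zero probabilities
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


if __name__ == "__main__":
    confident = np.eye(4)            # 4 samples, each certain of a different class
    uniform = np.full((4, 4), 0.25)  # every sample equally unsure
    print(inception_score(confident), inception_score(uniform))
```

The two extremes bound the score: the confident, diverse case comes out very close to 4 (the number of classes), and the uniform case is exactly 1.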

FAD

  • FAD Looperman ground truth (link); follow the official documentation to install the required packages.

    $ ls --color=never generated_freesound_one_bar/100000/*.wav > freesound.csv
    $ python -m frechet_audio_distance.create_embeddings_main --input_files freesound.csv --stats freesound.stats
    $ python -m frechet_audio_distance.compute_fad --background_stats ./evaluation/FAD/looperman_2000.stats --test_stats freesound.stats
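FAD fits a Gaussian to audio embeddings of the reference set and of the generated set, then reports the Fréchet distance between the two: ||mu_1 - mu_2||^2 + Tr(Sigma_1 + Sigma_2 - 2 (Sigma_1 Sigma_2)^(1/2)). The `.stats` files above hold such means and covariances. Below is a NumPy-only sketch of the distance itself, assuming the embeddings are already computed; it is not the `frechet_audio_distance` package's code, and `embedding_stats` / `frechet_distance` are hypothetical names.

```python
import numpy as np


def embedding_stats(emb):
    """Mean vector and covariance matrix of an (N, D) array of embeddings."""
    emb = np.asarray(emb, dtype=float)
    return emb.mean(axis=0), np.cov(emb, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = np.asarray(mu1) - np.asarray(mu2)
    # Tr((sigma1 @ sigma2)^(1/2)) is the sum of square roots of the eigenvalues of
    # sigma1 @ sigma2; these are real and non-negative for covariance matrices,
    # so small imaginary parts or negatives caused by round-off are discarded.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = embedding_stats(rng.normal(size=(2000, 8)))     # reference set
    test = embedding_stats(rng.normal(loc=0.3, size=(2000, 8)))  # generated set
    print(frechet_distance(*background, *test))
```

Computing the trace through eigenvalues avoids a dependency on a matrix square-root routine; with diagonal covariances the result reduces to the sum of (sqrt(a_i) - sqrt(b_i))^2 plus the squared mean difference.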

Train the model with your own loop dataset

Preprocess the Loop Dataset

Go to the preprocess directory, modify the settings in the scripts (e.g., the data path), and run them in the following order:

$ python trim_2_seconds.py # Cut each loop into single bars and stretch each bar to 2 seconds.
$ python extract_mel.py # Extract mel-spectrogram from 2-second audio.
$ python make_dataset.py 
$ python compute_mean_std.py 
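Two of these steps reduce to simple arithmetic: stretching one bar to 2 seconds needs a rate equal to the bar length divided by 2 s, and normalization needs dataset statistics. The sketch below assumes 4/4 time with a known BPM and assumes compute_mean_std.py computes one mean and standard deviation per mel bin; both are assumptions about the scripts, and all function names are hypothetical.

```python
import numpy as np


def bar_seconds(bpm, beats_per_bar=4):
    """Duration of one bar in seconds (assumes 4/4 by default)."""
    return beats_per_bar * 60.0 / bpm


def stretch_rate(bpm, target_seconds=2.0, beats_per_bar=4):
    """Time-stretch rate that maps one bar onto target_seconds.

    rate > 1 shortens the audio, the convention used by librosa.effects.time_stretch.
    """
    return bar_seconds(bpm, beats_per_bar) / target_seconds


def mel_mean_std(mels):
    """Per-mel-bin mean and std over a stack of shape (N, n_mels, frames)."""
    mels = np.asarray(mels, dtype=float)
    return mels.mean(axis=(0, 2)), mels.std(axis=(0, 2))


def normalize(mel, mean, std, eps=1e-8):
    """Standardize one (n_mels, frames) spectrogram with the dataset statistics."""
    return (mel - mean[:, None]) / (std[:, None] + eps)
```

For example, a 4/4 loop at 100 BPM has a 2.4-second bar, so `stretch_rate(100)` returns 1.2 (a 20% speed-up), while a 120 BPM bar already lasts exactly 2 seconds and gives a rate of 1.0.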

Train the Model

$ CUDA_VISIBLE_DEVICES=2 python train_drum.py \
    --size 64 --batch 8 --sample_dir [sample_dir] \
    --checkpoint_dir [checkpoint_dir] \
    [mel-spectrogram dataset from the preprocessing]
  • --checkpoint_dir: directory where model checkpoints are saved.
  • --sample_dir: directory where mel-spectrograms generated by the model are saved.
  • Give the mel-spectrogram data directory as the last argument.
  • An example training script is available.

Vocoder

We use MelGAN as the vocoder. It was trained on the Looperman dataset and is used to synthesize audio for both the FreeSound and the Looperman models. The trained vocoder is in the melgan directory.

References

The code is largely based on the following code:

Citation

If you find this repo useful, please cite it as follows:

@inproceedings{ allenloopgen, 
	title={A Benchmarking Initiative for Audio-domain Music Generation using the {FreeSound Loop Dataset}},
	author={Tun-Min Hung and Bo-Yu Chen and Yen-Tung Yeh and Yi-Hsuan Yang},
	booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
	year={2021},
}