Recent deep music generation models are often trained on large datasets containing thousands of hours of music. Because so much training data is readily available, many contemporary models such as Meta AI's MusicGen forego data augmentation during pre-processing. This raises questions about the robustness and flexibility of these models when exposed to modified input data. This study investigates the performance of MusicGen on data with the following augmentations:
- Pitch Alterations sampled from a uniform distribution within [-6, 6] semitones
- Volume Adjustments sampled from a uniform distribution spanning [-30, 30] on the MIDI velocity scale (0-127)
- Tempo Modifications sampled from a discrete distribution comprising the values [0.25, 0.5, 0.75, 1.25, 1.5, 1.75]
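The sampling scheme above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `sample_augmentation` and the clamping of adjusted velocities to the 0-127 range are assumptions for the example.

```python
import random

# Discrete tempo multipliers described above (1.0, i.e. "no change", is excluded).
TEMPO_FACTORS = [0.25, 0.5, 0.75, 1.25, 1.5, 1.75]

def sample_augmentation(rng: random.Random) -> dict:
    """Sample one set of augmentation parameters (hypothetical helper)."""
    return {
        "pitch_semitones": rng.uniform(-6, 6),     # uniform in [-6, 6] semitones
        "velocity_delta": rng.uniform(-30, 30),    # uniform shift on the MIDI velocity scale
        "tempo_factor": rng.choice(TEMPO_FACTORS), # one of the discrete tempo values
    }

def apply_velocity_delta(velocity: int, delta: float) -> int:
    """Shift a MIDI velocity and clamp to the valid 0-127 range (assumed behavior)."""
    return max(0, min(127, round(velocity + delta)))
```

In practice, the pitch and tempo parameters would then be applied to the audio with a tool such as `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`.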
To run the code yourself:
- Run `Generate_Data.ipynb` to generate a file called `augmented.zip`. The file contains music samples with data augmentation applied and the corresponding conditioned outputs generated by MusicGen.
- Evaluate the quality of the samples generated from the augmented inputs using `MFCC.ipynb` and `RMSE.ipynb`.
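As an illustration of the RMSE side of the evaluation, the sketch below compares two waveforms sample-by-sample. This is an assumed implementation, not code taken from the notebooks; the MFCC-based comparison would additionally extract features first (e.g. with `librosa.feature.mfcc`) before measuring distance.

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square error between two waveforms (assumed metric definition).

    The signals are truncated to the shorter length so that generated audio
    of a slightly different duration can still be compared.
    """
    n = min(len(a), len(b))
    return float(np.sqrt(np.mean((a[:n] - b[:n]) ** 2)))
```

A lower RMSE indicates that the generated output stays closer to the reference waveform; identical signals give an RMSE of zero.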
Distributed under the MIT License. See `LICENSE.txt` for more information.