Automatic Music Transcription (AMT) refers to the task of transcribing a given audio recording into a symbolic representation (musical notes or MIDI). In this project, the goal is to transcribe musical recordings into note events with pitch, onset, offset, and velocity. It is a challenging task due to the high polyphony of musical pieces, and it requires appropriate data processing of the audio files. We have implemented and evaluated deep learning models for music transcription. The model architectures and data processing techniques are based on this paper.
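As a rough illustration of the target representation, the minimal sketch below shows note events carrying pitch, onset, offset, and velocity; the field names are illustrative only and are not this repo's exact output format.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    pitch: int      # MIDI pitch number, e.g. 60 = middle C
    onset: float    # note start time in seconds
    offset: float   # note end time in seconds
    velocity: int   # MIDI velocity (loudness), 0-127

# A short C-major chord transcribed from audio could come out as:
notes = [
    NoteEvent(pitch=60, onset=0.00, offset=0.52, velocity=78),
    NoteEvent(pitch=64, onset=0.01, offset=0.50, velocity=74),
    NoteEvent(pitch=67, onset=0.01, offset=0.51, velocity=70),
]
```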
- The dataset used is MAPS, which can be downloaded from here. After downloading it, store it in `data/MAPS`.
- Install the required Python packages:

  ```bash
  pip install -r requirements.txt
  ```
- Loading the dataset, splitting it, and storing it in .h5 binaries:

  ```bash
  python3 features.py --dir data/MAPS --workspace $(pwd)
  ```
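To sanity-check what features.py produced, the packed .h5 files can be opened with h5py. The file name and dataset names below are assumptions; the actual layout is decided by features.py.

```python
import h5py

# Hypothetical path: features.py packs the MAPS splits into .h5 files under the workspace.
path = "hdf5s/maps/some_recording.h5"

with h5py.File(path, "r") as f:
    # List what was stored (dataset names and shapes depend on features.py).
    for name, dset in f.items():
        print(name, dset)
```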
- Training the model (includes both processing features and training). We have implemented 3 models; choose the `model_type` among ['CRNN', 'CCNN', 'CRNN_Conditioning']. There are also 2 loss functions available (regressed and non-regressed); refer to the comments in `run.sh` for more info.

  ```bash
  python3 src/main.py train --model_type='CRNN_Conditioning' --loss_type='regress_onset_offset_frame_velocity_bce' --batch_size=8 --max_note_shift=0 --learning_rate=5e-4 --reduce_iteration=10000 --resume_iteration=0 --early_stop=50000 --workspace=$(pwd) --cuda
  ```

  The trained model checkpoints will be stored in the `checkpoints` folder, with training stats in the `statistics` folder.
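If the training stats are dumped as a pickled dict of per-split evaluation points (this mirrors the reference bytedance repo, but the file name and keys below are only assumptions), a learning curve can be plotted roughly like this:

```python
import pickle
import matplotlib.pyplot as plt

# Hypothetical file name and keys: check the statistics folder for the actual layout.
with open("statistics/CRNN_Conditioning/statistics.pkl", "rb") as f:
    stats = pickle.load(f)

# Assumed structure: a dict of per-split lists, each entry a dict holding the
# iteration count and evaluation metrics (adjust the key names to what you find).
test_points = stats["test"]
iterations = [p["iteration"] for p in test_points]
frame_f1 = [p["frame_f1"] for p in test_points]

plt.plot(iterations, frame_f1)
plt.xlabel("iteration")
plt.ylabel("frame F1")
plt.show()
```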
- Inferring the output probabilities on the test dataset and storing them in the `probs` folder:

  ```bash
  python3 src/results.py infer_prob --model_type='CRNN_Conditioning' --checkpoint_path=$CHECKPOINT_PATH --dataset='maps' --split='test' --post_processor_type='regression' --workspace=$WORKSPACE --cuda
  ```
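The dumped probabilities can be inspected offline before computing metrics. The snippet below assumes one pickled dict per recording with per-frame arrays inside, which is only a guess at how results.py organizes the `probs` folder.

```python
import pickle

# Hypothetical file name: results.py decides how outputs are organized inside probs/.
with open("probs/CRNN_Conditioning/maps/test/some_recording.pkl", "rb") as f:
    output = pickle.load(f)

# Likely contents are per-frame arrays such as onset, offset, frame and velocity
# predictions, each with one column per piano key; the key names are assumptions.
for key, value in output.items():
    print(key, getattr(value, "shape", value))
```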
- Evaluating on the test dataset:

  ```bash
  python3 src/results.py calculate_metrics --model_type='CRNN_Conditioning' --dataset='maps' --split='test' --post_processor_type='regression' --workspace=$WORKSPACE
  ```
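For reference, note-level precision/recall/F1 in AMT is commonly computed with mir_eval using a 50 ms onset tolerance. Below is a self-contained example with made-up reference and estimated notes; results.py may compute its metrics differently.

```python
import numpy as np
import mir_eval

# Reference and estimated notes: intervals are (onset, offset) times in seconds,
# pitches are given in Hz, as mir_eval.transcription expects.
ref_intervals = np.array([[0.00, 0.50], [0.50, 1.00]])
ref_pitches = np.array([261.63, 329.63])                 # C4, E4
est_intervals = np.array([[0.02, 0.48], [0.52, 1.02]])
est_pitches = np.array([261.63, 329.63])

precision, recall, f1, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    onset_tolerance=0.05,   # 50 ms onset tolerance, the common AMT setting
    offset_ratio=None,      # ignore offsets: onset-only note matching
)
print(precision, recall, f1)   # 1.0 1.0 1.0 for this toy example
```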
Also, there are some result plots in `notebooks/plots.ipynb`, and a piano roll with the MIDI notes of a transcribed audio in `transcription_plots.ipynb`.
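For a quick look at a transcribed MIDI file outside the notebooks, a piano roll can also be rendered directly with pretty_midi and matplotlib (the file name is a placeholder):

```python
import matplotlib.pyplot as plt
import pretty_midi

midi = pretty_midi.PrettyMIDI("transcribed.mid")   # placeholder file name
roll = midi.get_piano_roll(fs=100)                 # shape: (128 MIDI pitches, frames at 100 Hz)

plt.figure(figsize=(12, 4))
plt.imshow(roll, aspect="auto", origin="lower", cmap="gray_r")
plt.xlabel("frame (10 ms each)")
plt.ylabel("MIDI pitch")
plt.title("Piano roll of the transcription")
plt.show()
```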
To transcribe and play your own audio file, run:

```bash
python3 src/transcribe_and_play.py --audio_file <name of audio file>
```

It will transcribe the given audio into MIDI using the best checkpoint model, generate the MIDI file, and also generate a video corresponding to the MIDI using the synthviz library, displaying the notes played. Note that transcription requires an ffmpeg backend and therefore does not work on gpu1.cse.iitb.ac.in unless you install ffmpeg with sudo permissions.
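A quick pre-flight check for the ffmpeg dependency, so the script fails early with a clear message rather than partway through rendering:

```python
import shutil

# The video generation (via synthviz) needs ffmpeg on PATH. On machines without
# sudo access, a static ffmpeg build placed in a directory on PATH (e.g. ~/bin)
# also works.
if shutil.which("ffmpeg") is None:
    raise RuntimeError("ffmpeg not found on PATH; install it or add a static build to PATH")
print("ffmpeg found at:", shutil.which("ffmpeg"))
```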
Example transcriptions produced by the pipeline:

- Für Elise. The original music is this.
  fur_elise_transcripted.mp4
- L theme (Death Note). The original music is this.
  L_original_transcripted.mp4
- Nezuko Theme (Demon Slayer). The original music is this.
  nezuko_transcripted.mp4
- A musical piece from Aajkal Tere Mere Pyar Ke Charche on the accordion. The original audio is this.
  aajkal_transcripted.mp4
- Nagin. Notice that there is a lot of noise due to multiple instruments being played together (polyphonic music).
  Nagin_transcripted.mp4
- Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, and Yuxuan Wang. "High-Resolution Piano Transcription with Pedals by Regressing Onset and Offset Times." arXiv preprint arXiv:2010.01815 (2020).
- The bytedance and kong repositories for the data processing techniques and model architecture.
- Valentin Emiya, Nancy Bertin, Bertrand David, and Roland Badeau. "MAPS – A piano database for multipitch estimation and automatic transcription of music."
- This repository for information about the datasets and for understanding the transcription pipeline.