
Commit

low_melody hyperparameter as boolean
very minor improvement
alanngnet committed Jun 3, 2024
1 parent 23f0477 commit 93f4bd7
Showing 4 changed files with 3 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -222,7 +222,7 @@ The hparams.yaml file located in the "config" subfolder of the path you provide
| m_per_class | From CoverHunter code comments: "m_per_class must divide batch_size without any remainder" and: "At every iteration, this will return m samples per class. For example, if dataloader's batch-size is 100, and m = 5, then 20 classes with 5 samples iter will be returned." |
| spec_augmentation | spectral(?) augmentation settings, used to generate temporary data augmentation on the fly during training. CoverHunter settings were:<br>`random_erase`:<br> &nbsp; `prob`: 0.5<br> &nbsp; `erase_num`: 4<br>`roll_pitch`:<br> &nbsp; `prob`: 0.5<br> &nbsp; `shift_num`: 12 |
| spec_augmentation : random_erase | During each epoch, each CQT array may have a rectangular block of its array values replaced with the value -80 (a low amplitude signal). The size of the block is defined as 25% of the height of the frequency bins and 10% of the width of the time bins. `prob` specifies the probability of calling the erase method for this feature in this epoch, between 0 and 1. `erase_num` specifies the quantity of such blocks that will be erased if the erase method is called. |
-| spec_augmentation : roll_pitch | During each epoch, each CQT array may be shifted pitch-wise. CoverHunter's original method, left as the default here, was to rotate the entire array in the frequency dimension, with the overflowing content wrapped around to the opposite end of the spectrum. For example, if shifted an octave up, then the top octave's CQT content would be presented as the bottom octave of content. `prob` specifies the probability of doing this for this feature in this epoch, between 0 and 1. `shift_num` specifies the number of frequency CQT bins by which the array will be shifted. `low_melody` is an optional hyperparameter and feature added for CoverHunterMPS to accommodate musical cultures in which CSI-significant melodic content may appear in the bottom frequency range of the CQT array. Since trimming CQT arrays to eliminate irrelevant harmonic and percussive content in the bottom octaves has proven beneficial, this feature can be significantly useful. In this case, instead of rotating the entire array either up or down, the array is shifted upwards either 1 x or 2 x `shift_num` bins, and overflowing high-frequency content is simply discarded, instead of being copied to the bottom rows of the array. |
+| spec_augmentation : roll_pitch | During each epoch, each CQT array may be shifted pitch-wise. CoverHunter's original method, left as the default here, was to rotate the entire array in the frequency dimension, with the overflowing content wrapped around to the opposite end of the spectrum. For example, if shifted an octave up, then the top octave's CQT content would be presented as the bottom octave of content. `prob` specifies the probability of doing this for this feature in this epoch, between 0 and 1. `shift_num` specifies the number of frequency CQT bins by which the array will be shifted. `low_melody` takes either `true` or `false` (the default, even if omitted entirely, is `false`) and is an optional hyperparameter and feature added for CoverHunterMPS to accommodate musical cultures in which CSI-significant melodic content may appear in the bottom frequency range of the CQT array. Since trimming CQT arrays to eliminate irrelevant harmonic and percussive content in the bottom octaves has proven beneficial, this feature can be significantly useful. In this case, instead of rotating the entire array either up or down, the array is shifted upwards either 1 x or 2 x `shift_num` bins, and overflowing high-frequency content is simply discarded, instead of being copied to the bottom rows of the array. |

#### Training Parameters
| key | value |
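The two `roll_pitch` modes described in the README row above can be sketched in Python with NumPy. This is a hypothetical illustration, not the CoverHunterMPS code. The function names `roll_pitch` and `pick_shift`, the assumption that axis 0 is frequency with row 0 the lowest bin, the `fill_value` of -80 for the vacated low bins (borrowed from the `random_erase` description), and the choice of plus or minus `shift_num` in the default mode are all assumptions.

```python
import random

import numpy as np


def pick_shift(shift_num: int, low_melody: bool, rng: random.Random) -> int:
    """Hypothetical: choose how many CQT bins to shift for one augmentation call."""
    if low_melody:
        # README: shift upward by either 1x or 2x shift_num bins.
        return shift_num * rng.choice((1, 2))
    # Assumption: the default mode rotates up or down by shift_num bins.
    return shift_num * rng.choice((-1, 1))


def roll_pitch(cqt: np.ndarray, shift: int, low_melody: bool = False,
               fill_value: float = -80.0) -> np.ndarray:
    """Shift a CQT array along its frequency axis.

    Assumes axis 0 is frequency and row 0 is the lowest bin.
    Hypothetical sketch, not the CoverHunterMPS implementation.
    """
    if not low_melody:
        # Default mode: rotate, so bins pushed off one end wrap to the other end.
        return np.roll(cqt, shift, axis=0)
    if shift < 0:
        raise ValueError("low_melody mode only shifts upward")
    n_bins = cqt.shape[0]
    # Vacated low bins get fill_value (assumed -80, the random_erase value).
    out = np.full_like(cqt, fill_value)
    if shift < n_bins:
        # Move content up by `shift` rows; the top `shift` rows are discarded.
        out[shift:] = cqt[: n_bins - shift]
    return out


# Toy example: 6 frequency bins, 1 time frame.
cqt = np.arange(6, dtype=float).reshape(6, 1)
print(roll_pitch(cqt, 2)[:, 0].tolist())                   # [4.0, 5.0, 0.0, 1.0, 2.0, 3.0]
print(roll_pitch(cqt, 2, low_melody=True)[:, 0].tolist())  # [-80.0, -80.0, 0.0, 1.0, 2.0, 3.0]
```

In the rotated result the two highest bins (4 and 5) reappear at the bottom, which is the wrap-around behaviour the README describes; in `low_melody` mode they are dropped instead, so low-register melodic content is never overwritten by high-frequency material.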
2 changes: 0 additions & 2 deletions tools/train.py
@@ -3,8 +3,6 @@
import argparse
import os
import sys
import time

import torch

from src.trainer import Trainer
2 changes: 1 addition & 1 deletion training/covers80/config/hparams.yaml
@@ -55,7 +55,7 @@ spec_augmentation:
roll_pitch:
prob: 0.5
shift_num: 12
-low_melody: "false"
+low_melody: false

### Training parameters
device: 'mps' # 'mps' or 'cuda'
2 changes: 1 addition & 1 deletion training/covers80/config/hparams_prod.yaml
@@ -54,7 +54,7 @@ spec_augmentation:
roll_pitch:
prob: 0.5
shift_num: 12
-low_melody: "false"
+low_melody: false

### Training parameters
device: 'mps' # 'mps' or 'cuda'
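The change from `"false"` to `false` in both hparams files matters because YAML parses a quoted `"false"` as a string and an unquoted `false` as a boolean, and any non-empty Python string is truthy. The sketch assumes the training code reads `low_melody` with a plain truthiness test (the diff does not show how the value is consumed). `parse_bool` is a hypothetical defensive helper, not part of CoverHunterMPS, and the two dicts stand in for what a YAML loader would return for the old and new lines.

```python
def parse_bool(value) -> bool:
    """Hypothetical helper: coerce a hyperparameter value to a real bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0", ""):
            return False
    raise ValueError(f"not a boolean: {value!r}")


# What a YAML loader returns for each version of the line:
old_hparams = {"low_melody": "false"}  # quoted in YAML -> str
new_hparams = {"low_melody": False}    # unquoted in YAML -> bool

# A plain truthiness check misreads the quoted form.
print(bool(old_hparams["low_melody"]))        # True
print(bool(new_hparams["low_melody"]))        # False
print(parse_bool(old_hparams["low_melody"]))  # False

# An omitted key falls back to False, matching the README's stated default.
print(parse_bool({}.get("low_melody", False)))  # False
```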
