low_melody hyperparameter as boolean

very minor improvement
alanngnet · Jun 3, 2024 · 93f4bd7
1 parent 23f0477

Showing 4 changed files with 3 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -222,7 +222,7 @@ The hparams.yaml file located in the "config" subfolder of the path you provide
 | m_per_class | From CoverHunter code comments: "m_per_class must divide batch_size without any remainder" and: "At every iteration, this will return m samples per class. For example, if dataloader's batch-size is 100, and m = 5, then 20 classes with 5 samples iter will be returned." |
 | spec_augmentation | spectral(?) augmentation settings, used to generate temporary data augmentation on the fly during training.  CoverHunter settings were:<br>`random_erase`:<br> &nbsp; `prob`: 0.5<br> &nbsp; `erase_num`: 4<br>`roll_pitch`:<br> &nbsp; `prob`: 0.5<br> &nbsp; `shift_num`: 12 |
 | spec_augmentation : random_erase | During each epoch, each CQT array may have a rectangular block of its array values replaced with the value -80 (a low amplitude signal). The size of the block is defined as 25% of the height of the frequency bins and 10% of the width of the time bins. `prob` specifies the probability of calling the erase method for this feature in this epoch, between 0 and 1. `erase_num` specifies the quantity of such blocks that will be erased if the erase method is called. |
-| spec_augmentation : roll_pitch | During each epoch, each CQT array may be shifted pitch-wise. CoverHunter's original method, left as the default here, was to rotate the entire array in the frequency dimension, with the overflowing content wrapped around to the opposite end of the spectrum. For example, if shifted an octave up, then the top octave's CQT content would be presented as the bottom octave of content. `prob` specifies the probability of doing this for this feature in this epoch, between 0 and 1. `shift_num` specifies the number of frequency CQT bins by which the array will be shifted. `low_melody` is an optional hyperparameter and feature added for CoverHunterMPS to accommodate musical cultures in which CSI-significant melodic content may appear in the bottom frequency range of the CQT array. Since trimming CQT arrays to eliminate irrelevant harmonic and percussive content in the bottom octaves has proven beneficial, this feature can be significantly useful. In this case, instead of rotating the entire array either up or down, the array is shifted upwards either 1 x or 2 x `shift_num` bins, and overflowing high-frequency content is simply discarded, instead of being copied to the bottom rows of the array. |
+| spec_augmentation : roll_pitch | During each epoch, each CQT array may be shifted pitch-wise. CoverHunter's original method, left as the default here, was to rotate the entire array in the frequency dimension, with the overflowing content wrapped around to the opposite end of the spectrum. For example, if shifted an octave up, then the top octave's CQT content would be presented as the bottom octave of content. `prob` specifies the probability of doing this for this feature in this epoch, between 0 and 1. `shift_num` specifies the number of frequency CQT bins by which the array will be shifted. `low_melody` takes either `true` or `false` (the default is `false`, even if the key is omitted entirely). It is an optional hyperparameter and feature added for CoverHunterMPS to accommodate musical cultures in which CSI-significant melodic content may appear in the bottom frequency range of the CQT array. Since trimming CQT arrays to eliminate irrelevant harmonic and percussive content in the bottom octaves has proven beneficial, this feature can be significantly useful. When `low_melody` is `true`, instead of rotating the entire array either up or down, the array is shifted upwards by either 1 x or 2 x `shift_num` bins, and overflowing high-frequency content is simply discarded, instead of being copied to the bottom rows of the array. |
 
 #### Training Parameters
 | key | value |

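The two `roll_pitch` behaviors described in the README row above can be sketched roughly as follows. This is an illustration only, not the project's implementation: the function names are hypothetical, the frequency axis is assumed to be axis 0 with the lowest bin in row 0, and filling the vacated bottom rows with -80 is an assumption borrowed from the `random_erase` description, since the README does not say what replaces them.

```python
import numpy as np


def roll_pitch_wrap(cqt, shift):
    """Default behavior: rotate along frequency, wrapping overflow around."""
    return np.roll(cqt, shift, axis=0)


def shift_pitch_up_discard(cqt, shift, fill_value=-80.0):
    """low_melody-style behavior: shift upwards and drop the overflow.

    Assumption: vacated low-frequency rows are filled with fill_value.
    """
    n_bins = cqt.shape[0]
    out = np.full_like(cqt, fill_value)
    if shift < n_bins:
        # Rows that would overflow past the top are simply not copied.
        out[shift:] = cqt[: n_bins - shift]
    return out


# Toy "CQT" with 4 frequency bins (rows) and 3 time frames (columns).
cqt = np.arange(12, dtype=float).reshape(4, 3)
wrapped = roll_pitch_wrap(cqt, 1)
shifted = shift_pitch_up_discard(cqt, 1)
```

In the wrapped result the former top row reappears at the bottom; in the shifted result it is lost and the bottom row holds only the fill value.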
diff --git a/tools/train.py b/tools/train.py
@@ -3,8 +3,6 @@
 import argparse
 import os
 import sys
-import time
-
 import torch
 
 from src.trainer import Trainer

diff --git a/training/covers80/config/hparams.yaml b/training/covers80/config/hparams.yaml
@@ -55,7 +55,7 @@ spec_augmentation:
   roll_pitch:
     prob: 0.5
     shift_num: 12
-    low_melody: "false"
+    low_melody: false
 
 ### Training parameters
 device: 'mps' # 'mps' or 'cuda' 

diff --git a/training/covers80/config/hparams_prod.yaml b/training/covers80/config/hparams_prod.yaml
@@ -54,7 +54,7 @@ spec_augmentation:
   roll_pitch:
     prob: 0.5
     shift_num: 12
-    low_melody: "false"
+    low_melody: false
 
 ### Training parameters
 device: 'mps' # 'mps' or 'cuda'
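
One reason the unquoted form in the two hparams files is safer: in Python, any non-empty string is truthy, so a quoted `"false"` used directly in a condition would behave like `true`. The sketch below uses plain dictionaries to stand in for the parsed `roll_pitch` settings; `as_bool` is a hypothetical helper, not something this commit adds, and how the training code actually reads the key is not shown in this diff.

```python
def as_bool(value):
    """Hypothetical helper: accept real booleans or the strings 'true'/'false'."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    raise ValueError(f"expected a boolean, got {value!r}")


# Stand-ins for the parsed roll_pitch section before and after this commit.
old_style = {"prob": 0.5, "shift_num": 12, "low_melody": "false"}
new_style = {"prob": 0.5, "shift_num": 12, "low_melody": False}
omitted = {"prob": 0.5, "shift_num": 12}

# A quoted "false" is a non-empty string, so a bare truth test treats it as on.
quoted_is_truthy = bool(old_style["low_melody"])

# A real boolean, or a missing key with a False default, behaves as expected.
unquoted_is_truthy = bool(new_style["low_melody"])
omitted_is_truthy = bool(omitted.get("low_melody", False))
```

Reading the key with `.get("low_melody", False)` mirrors the README's statement that the default is `false` when the key is omitted.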