Update README.md
explanation of spec_aug's region_size parameter
alanngnet committed Jun 4, 2024
1 parent 6dc9c87 commit c6f10f0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -221,7 +221,7 @@ The hparams.yaml file located in the "config" subfolder of the path you provide
| mode | "random" (default) or "defined". Changes behavior of AudioFeatDataset related to how it cuts each audio sample into chunks. "random" is described in CoverHunter code as "cut chunk from feat from random start". "defined" is described as "cut feat with 'start/chunk_len' info from line." We observed better training results using "defined" when working with datasets that are very consistently trimmed so that CSI-relevant audio always starts right at the beginning of the recording. "random" would be better when CSI-irrelevant audio may be present at the start of many of your audio data samples. |
| m_per_class | From CoverHunter code comments: "m_per_class must divide batch_size without any remainder" and: "At every iteration, this will return m samples per class. For example, if dataloader's batch-size is 100, and m = 5, then 20 classes with 5 samples iter will be returned." |
| spec_augmentation | spectral(?) augmentation settings, used to generate temporary data augmentation on the fly during training. CoverHunter settings were:<br>`random_erase`:<br> &nbsp; `prob`: 0.5<br> &nbsp; `erase_num`: 4<br>`roll_pitch`:<br> &nbsp; `prob`: 0.5<br> &nbsp; `shift_num`: 12 |
-| spec_augmentation : random_erase | During each epoch, each CQT array may have a rectangular block of its array values replaced with the value -80 (a low amplitude signal). The size of the block is defined as 25% of the height of the frequency bins and 10% of the width of the time bins. `prob` specifies the probability of calling the erase method for this feature in this epoch, between 0 and 1. `erase_num` specifies the quantity of such blocks that will be erased if the erase method is called. |
+| spec_augmentation : random_erase | During each epoch, each CQT array may have a rectangular block of its array values replaced with the value -80 (a low amplitude signal). The size of the block is defined as 25% of the height of the frequency bins and 10% of the width of the time bins. `prob` specifies the probability of calling the erase method for this feature in this epoch, between 0 and 1. `erase_num` specifies the quantity of such blocks that will be erased if the erase method is called. `region_size` specifies the size of each erased block, as (width, height) as fractions of the CQT array size. Default is "[.25, .1]"|
| spec_augmentation : roll_pitch | During each epoch, each CQT array may be shifted pitch-wise. CoverHunter's original method, left as the default here, was to rotate the entire array in the frequency dimension, with the overflowing content wrapped around to the opposite end of the spectrum. For example, if shifted an octave up, then the top octave's CQT content would be presented as the bottom octave of content. `prob` specifies the probability of doing this for this feature in this epoch, between 0 and 1. `shift_num` specifies the number of frequency CQT bins by which the array will be shifted. `low_melody` takes either `true` or `false` (the default is `false`, even if the parameter is omitted entirely). It is an optional hyperparameter and feature added for CoverHunterMPS to accommodate musical cultures in which CSI-significant melodic content may appear in the bottom frequency range of the CQT array. Since trimming CQT arrays to eliminate irrelevant harmonic and percussive content in the bottom octaves has proven beneficial, this feature can be significantly useful. When `low_melody` is `true`, instead of rotating the entire array either up or down, the array is shifted upwards by either 1 x or 2 x `shift_num` bins, and overflowing high-frequency content is simply discarded instead of being copied to the bottom rows of the array. |
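
The `random_erase` row above can be illustrated with a minimal sketch. This is not the CoverHunterMPS implementation: the function name, the `rng` and `fill_value` parameters, and the array layout `(time_frames, freq_bins)` are assumptions made for illustration. The sketch follows the "(width, height)" reading of `region_size`, applying the first fraction to the time axis and the second to the frequency axis; which axis each fraction applies to in the project's own code is not confirmed here.

```python
import numpy as np


def random_erase(feat, prob=0.5, erase_num=4, region_size=(0.25, 0.1),
                 fill_value=-80.0, rng=None):
    """Illustrative sketch: overwrite random rectangular blocks with fill_value.

    Assumes feat has shape (time_frames, freq_bins); this layout is an
    assumption, not taken from the CoverHunterMPS source.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = feat.copy()

    # With probability (1 - prob) the array is returned unchanged.
    if rng.random() >= prob:
        return out

    n_time, n_freq = out.shape
    # Block size as a fraction of each dimension, at least one cell wide.
    block_t = max(1, int(n_time * region_size[0]))
    block_f = max(1, int(n_freq * region_size[1]))

    for _ in range(erase_num):
        # Pick a top-left corner so the block fits inside the array.
        t0 = int(rng.integers(0, n_time - block_t + 1))
        f0 = int(rng.integers(0, n_freq - block_f + 1))
        out[t0:t0 + block_t, f0:f0 + block_f] = fill_value

    return out
```

With a 100 x 84 array and the default `region_size`, each erased block in this sketch covers 25 x 8 cells; blocks may overlap, so the total erased area can be smaller than `erase_num` times that.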

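The two `roll_pitch` behaviors described in the table can be contrasted with a short sketch. Everything here is hypothetical and only mirrors the README wording: the function names, the `(time_frames, freq_bins)` layout with higher indices meaning higher pitch, the random choice of direction in the default mode, and the use of -80 to fill vacated low bins when `low_melody` is enabled are all assumptions, since the README does not specify them.

```python
import numpy as np


def roll_wrap(feat, shift):
    """Default mode: rotate along the frequency axis with wraparound."""
    return np.roll(feat, shift, axis=1)


def shift_up_discard(feat, shift, fill_value=-80.0):
    """low_melody mode: shift upwards and drop content pushed past the top.

    Filling the vacated bottom bins with fill_value is an assumption.
    """
    out = np.full_like(feat, fill_value)
    n_freq = feat.shape[1]
    if shift < n_freq:
        out[:, shift:] = feat[:, :n_freq - shift]
    return out


def roll_pitch(feat, prob=0.5, shift_num=12, low_melody=False, rng=None):
    """Illustrative wrapper that applies one of the two modes with probability prob."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= prob:
        return feat.copy()
    if low_melody:
        factor = int(rng.integers(1, 3))  # 1 or 2, i.e. 1 x or 2 x shift_num
        return shift_up_discard(feat, factor * shift_num)
    direction = int(rng.choice([-1, 1]))  # assumed: up or down with equal chance
    return roll_wrap(feat, direction * shift_num)
```

The difference shows up at the edges of the spectrum: `roll_wrap` moves the top bins to the bottom, while `shift_up_discard` leaves the bottom bins at the fill value so that no high-frequency content reappears in the low range.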
#### Training Parameters