Symbols Contexts
Starting with the annotated score images, we extract all pixels of the so-called full-context sub-image for each symbol.
All sub-images must have the same size, across all symbols and all images, so that the classifier can be trained efficiently. This implies that, prior to pixel extraction, the page image at hand must be scaled to the predefined interline value, and the symbol bounds must be scaled accordingly.
The key question is: which value should be chosen for the predefined interline? Too low a value would hide too many details from the classifier, whereas too high a value would result in an enormous network, perhaps untrainable.
For the Salzburg Hack-Day, we rather arbitrarily chose an interline of 10 pixels and a context window of (w:48, h:96) pixels = (w:4.8, h:9.6) interlines = (w:1.2, h:2.4) staff heights. These context pixel dimensions must be integers that are multiples of 4, to accommodate the two sub-sampling layers used by the current convolutional neural network.
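The scaling step above can be sketched as follows. This is a minimal sketch, not the project's actual code: the function name `scale_bounds` and the (x, y, w, h) bounds representation are assumptions for illustration.

```python
# Sketch: scale symbol bounds from a page's source interline to the
# predefined interline used for feature extraction (an illustrative
# sketch; names and bounds layout are assumptions).

PREDEFINED_INTERLINE = 10       # pixels, as chosen for the Hack-Day
CONTEXT_W, CONTEXT_H = 48, 96   # context window, in pixels

# Both dimensions must be multiples of 4 for the two sub-sampling layers.
assert CONTEXT_W % 4 == 0 and CONTEXT_H % 4 == 0

def scale_bounds(bounds, source_interline):
    """Scale a symbol bounding box (x, y, w, h) from the source page
    resolution down (or up) to the predefined interline resolution."""
    ratio = PREDEFINED_INTERLINE / source_interline
    x, y, w, h = bounds
    return (round(x * ratio), round(y * ratio),
            round(w * ratio), round(h * ratio))

# Example: a symbol box on a page whose interline is 20 pixels
print(scale_bounds((100, 200, 40, 80), source_interline=20))
# -> (50, 100, 20, 40)
```

The same ratio would be applied when resampling the page image itself, so that image pixels and symbol bounds stay aligned.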
Note: the predefined interline value is thus a key parameter that can be adjusted at will, but it is important to make sure that all images in the OMR Dataset have a sufficiently large interline value.
To fully train a classifier, the representative training set of symbols should contain both valid symbols and non-valid symbols. The latter are named None-shape symbols.
By construction, MuseScore provides only valid symbols, so we need to generate "artificial" None symbols.
The current algorithm works as follows: we first use the valid symbols in a page to compute a population of "occupied" rectangles. Then, in the remaining areas, we repeatedly try to insert, at random locations, artificial rectangles of a predefined size. Each successfully inserted rectangle gives birth to an artificial None-shape symbol, whose bounds are reduced to a single point.
Typically, we try to insert as many None symbols as there are valid symbols in the page at hand.
Note that this insertion algorithm requires that all fixed-shape valid symbols are described in the Annotations XML file.
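The insertion algorithm described above can be sketched like this. It is only an illustrative sketch under stated assumptions: the function names, the rectangle representation (x, y, w, h), and the fixed retry budget are all hypothetical, not the project's actual implementation.

```python
import random

def intersects(a, b):
    """Axis-aligned rectangle intersection test; rectangles are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def insert_none_symbols(page_w, page_h, occupied, count, size=(24, 24),
                        max_tries=1000, rng=None):
    """Try to place `count` artificial rectangles of the given size at
    random, rejecting any candidate that collides with an occupied
    (valid-symbol) rectangle or a previously placed one. Each accepted
    rectangle yields a None symbol reduced to its center point."""
    rng = rng or random.Random()
    placed = list(occupied)
    points = []
    w, h = size
    tries = 0
    while len(points) < count and tries < max_tries:
        tries += 1
        x = rng.randrange(0, page_w - w)
        y = rng.randrange(0, page_h - h)
        candidate = (x, y, w, h)
        if any(intersects(candidate, r) for r in placed):
            continue  # collision: retry elsewhere
        placed.append(candidate)
        points.append((x + w // 2, y + h // 2))  # bounds reduced to a point
    return points
```

Rejection sampling like this also exhibits the bias noted below: candidates falling in dense staff areas are rejected far more often, so the surviving None symbols cluster in open areas.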
We can generate control images, with the (scaled) page image as background and with overlays composed of the bounds of all valid fixed-shape symbols together with the locations of all generated None symbols.
Visual checking shows that None symbols are much more frequent in open areas than in staff areas, which is easily explained by the current algorithm: in a staff area, the chances of colliding with an existing symbol are higher than in "open" areas. So we need to refine this algorithm to make the None symbols more "representative"...
To check the feature material submitted to the classifier, we wrote a simple program (SubImages) that reads this .csv file and generates the corresponding, properly scaled symbol sub-images.
These images are not needed per se for training the classifier (which runs directly on the .csv file), but they are useful for visual inspection of the full-context sub-images.