Preprocessed versions of the raw datasets have to be generated before any neural network training:
deepo datagen -D mapillary -s 224 -P ./any-data-path -t 18000 -v 2000 -T 5000
This command generates a set of 224 * 224 images based on the Mapillary
dataset. The raw dataset must be in ./any-data-path/input, and the
preprocessed dataset will be stored in ./any-data-path/preprocessed/224.
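The folder convention described here can be sketched in a few lines of Python. This is only an illustration of the layout: the dataset_paths helper is hypothetical and is not part of deepo.

```python
from pathlib import Path


def dataset_paths(base_path, image_size):
    """Return the raw and preprocessed folders for a given data path.

    Hypothetical helper: it only mirrors the layout described in this
    section, it is not a function provided by deepo.
    """
    base = Path(base_path)
    return {
        # Where the raw dataset is expected to be found
        "input": base / "input",
        # Where the preprocessed images of the requested size are stored
        "preprocessed": base / "preprocessed" / str(image_size),
    }


paths = dataset_paths("./any-data-path", 224)
print(paths["input"])
print(paths["preprocessed"])
```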
Additionally, the preprocessed dataset may contain fewer images than the raw
dataset: the -t, -v and -T arguments refer respectively to the training,
validation and testing image quantities. The quantities used in the example
above correspond to the size of the raw dataset.
For the AerialImage dataset, only a limited set of image sizes is supported. Since smaller tiles are generated by cutting the large original image, the image size is expected to be a divisor of 5000.
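As a quick way to pick a compatible size, the divisor rule can be checked with a short Python sketch. The function names below are illustrative and do not come from deepo, and the tool may accept only a subset of these divisors.

```python
def is_valid_tile_size(size, original_size=5000):
    """Tell whether tiles of the given size cut the original image evenly."""
    return size > 0 and original_size % size == 0


def candidate_tile_sizes(original_size=5000):
    """List every divisor of the original image side, in increasing order."""
    return [s for s in range(1, original_size + 1) if original_size % s == 0]


print(is_valid_tile_size(250))  # 250 divides 5000
print(is_valid_tile_size(224))  # 224 does not divide 5000
print(candidate_tile_sizes())
```

With this rule, the 224 size used in the Mapillary example above would not fit the AerialImage dataset, whereas sizes such as 250 or 500 would.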
For the shape dataset, this preprocessing step generates the images from scratch.
As an easter-egg feature, label popularity is also printed by this command (proportion of images where each label appears in the preprocessed dataset).
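To make the definition of label popularity concrete, here is a minimal Python sketch that computes the proportion of images in which each label appears. The input format (one collection of label names per image) and the label_popularity function are assumptions made for the example; they do not reflect how deepo stores or computes this information.

```python
from collections import Counter


def label_popularity(image_labels):
    """Return, for each label, the share of images in which it appears.

    image_labels is assumed to hold one collection of label names per
    image (a hypothetical format chosen for this sketch).
    """
    nb_images = len(image_labels)
    if nb_images == 0:
        return {}
    # set() makes sure a label is counted at most once per image
    counts = Counter(label for labels in image_labels for label in set(labels))
    return {label: count / nb_images for label, count in counts.items()}


toy_labels = [
    ["car", "road"],
    ["road"],
    ["sky", "road"],
    ["car"],
]
print(label_popularity(toy_labels))
```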
For more details, run deepo datagen -h.