Data preprocessing

Preprocessed versions of the raw datasets have to be generated before any neural network training:

deepo datagen -D mapillary -s 224 -P ./any-data-path -t 18000 -v 2000 -T 5000

This command generates a set of 224 * 224 images from the Mapillary dataset. The raw dataset must be in ./any-data-path/input, and the preprocessed dataset will be stored in ./any-data-path/preprocessed/224.
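The folder convention described above can be sketched in a few lines of Python. This is only an illustration: `expected_folders` is a hypothetical helper, not part of deepo, and it simply joins the `-P` path with the `input` and `preprocessed/<size>` sub-folders named in the text.

```python
from pathlib import Path


def expected_folders(datapath, image_size):
    """Return the (raw, preprocessed) folders for a given -P path and -s size.

    Hypothetical helper: it only mirrors the layout described in this document.
    """
    root = Path(datapath)
    return root / "input", root / "preprocessed" / str(image_size)


raw_dir, prep_dir = expected_folders("./any-data-path", 224)
print(raw_dir.as_posix())   # any-data-path/input
print(prep_dir.as_posix())  # any-data-path/preprocessed/224
```

Checking that the raw folder exists (for instance with `raw_dir.is_dir()`) before launching the command is a cheap way to catch a wrong `-P` value early.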

Additionally, the preprocessed dataset may contain fewer images than the raw dataset: the -t, -v and -T arguments refer respectively to the number of training, validation and testing images. The amounts given in the example command correspond to the raw dataset size.

For the AerialImage dataset, only a limited set of image sizes is supported. Because the smaller tiles are generated by cutting up the large original image, the image size is expected to be a divisor of 5000.
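To see which sizes satisfy the divisor constraint, the arithmetic can be sketched as follows. The sketch assumes that each raw image is a square of 5000 pixels per side (inferred from the divisor rule, not stated explicitly) and that tiles are square and non-overlapping; the function names are hypothetical and not part of deepo.

```python
ORIGINAL_SIDE = 5000  # assumed side length, in pixels, of one raw aerial image


def valid_tile_sizes(side=ORIGINAL_SIDE):
    """List every tile size that divides the original side exactly."""
    return [size for size in range(1, side + 1) if side % size == 0]


def tiles_per_image(tile_size, side=ORIGINAL_SIDE):
    """Number of non-overlapping square tiles cut from one original image."""
    if side % tile_size != 0:
        raise ValueError(f"{tile_size} is not a divisor of {side}")
    return (side // tile_size) ** 2


sizes = valid_tile_sizes()
print(len(sizes))            # 20 candidate sizes, since 5000 = 2^3 * 5^4
print(250 in sizes)          # True
print(224 in sizes)          # False: 224 does not divide 5000
print(tiles_per_image(250))  # 400 tiles (20 per side)
```

Under these assumptions, a size such as 224 that works for Mapillary would be rejected here, while 250 or 500 would be accepted.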

In the case of the shape dataset, this preprocessing step generates a set of images from scratch.

As an easter-egg feature, this command also prints label popularity, i.e. the proportion of images in the preprocessed dataset in which each label appears.
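The popularity figure is a simple per-label ratio. The following minimal sketch shows the computation on made-up data; `label_popularity` and the sample labels are hypothetical and do not reproduce deepo's actual implementation or output format.

```python
from collections import Counter


def label_popularity(image_labels):
    """Return, for each label, the proportion of images in which it appears.

    image_labels: one collection of label names per image (illustrative input).
    """
    counts = Counter()
    for labels in image_labels:
        # set() ensures a label is counted at most once per image
        counts.update(set(labels))
    total = len(image_labels)
    return {label: counts[label] / total for label in counts} if total else {}


# Toy example with four images
popularity = label_popularity([
    {"car", "road"},
    {"road"},
    {"sky", "road"},
    {"car"},
])
print(popularity["road"])  # 0.75
print(popularity["car"])   # 0.5
print(popularity["sky"])   # 0.25
```

Labels with a very low proportion are likely to be under-represented during training, which is the kind of imbalance this printout helps to spot.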

See more details with deepo datagen -h.