
Summary

Museo ToolBox is a Python library dedicated to the processing of georeferenced arrays, also known as rasters or images in remote sensing.

In this domain, classifying land cover type is a common and sometimes complex task, regardless of your level of expertise. Recurring procedures such as extracting Regions Of Interest (ROIs, i.e. raster values from a polygon), computing spectral indices, or validating a model with cross-validation can be difficult to implement.

Museo ToolBox aims to simplify the whole process by making the main operations more accessible: extracting ROIs, fitting a model with cross-validation, computing the Normalized Difference Vegetation Index (NDVI) or other spectral indices, applying any array function to a raster, etc.
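As an illustration of the kind of array function mentioned above, here is a minimal NumPy sketch of the standard NDVI formula, NDVI = (NIR - Red) / (NIR + Red). The function name `ndvi` and the choice to return 0 where the denominator is zero are assumptions made for this example, not part of the Museo ToolBox API.

```python
import numpy as np


def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red), with 0 where NIR + Red == 0."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denominator = nir + red
    out = np.zeros_like(denominator)
    # Divide only where the denominator is non-zero; other cells keep the value 0.
    np.divide(nir - red, denominator, out=out, where=denominator != 0)
    return out


# Three illustrative pixels: vegetated, bare soil, and no signal.
red = np.array([0.1, 0.3, 0.0])
nir = np.array([0.5, 0.3, 0.0])
values = ndvi(red, nir)
```

Because the function only expects arrays, it can be checked on a handful of pixels before being applied to a whole image.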

The main objective of this library is to facilitate the transposition of array-like functions into an image and to promote good practices in machine learning.

To make Museo ToolBox easier to get started with, full documentation with many examples is available online on Read the Docs.

Museo ToolBox in detail

Museo ToolBox is organized into several modules (Figure 1):

processing: raster and vector processing.
cross-validation: stratified cross-validation compatible with scikit-learn.
ai: artificial intelligence module built upon scikit-learn [@scikitlearn_2011].
charts: plot a confusion matrix with F1 score or producer's/user's accuracy.
stats: compute statistics (such as Moran's Index [@moran_notes_1950], confusion matrix, commission/omission) or extract true and predicted labels from a confusion matrix.
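The per-class accuracy figures named in the charts and stats modules can all be derived from a confusion matrix. The sketch below uses plain NumPy and assumes that rows hold the reference (true) labels and columns hold the predicted labels; the function name `accuracy_per_class` is hypothetical and this is not the library's own code. It also assumes every class appears at least once in both the reference and the prediction, so no division by zero occurs.

```python
import numpy as np


def accuracy_per_class(cm):
    """Return producer's accuracy, user's accuracy and F1 score for each class.

    Assumed convention: cm[i, j] counts samples of true class i predicted as j.
    """
    cm = np.asarray(cm, dtype=np.float64)
    correct = np.diag(cm)
    producer = correct / cm.sum(axis=1)  # share of each true class that was found
    user = correct / cm.sum(axis=0)      # share of each predicted class that is right
    f1 = 2 * producer * user / (producer + user)
    return producer, user, f1


cm = np.array([[8, 2],
               [1, 9]])
producer, user, f1 = accuracy_per_class(cm)
omission = 1 - producer    # omission error is the complement of producer's accuracy
commission = 1 - user      # commission error is the complement of user's accuracy
```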

The main usages of Museo ToolBox are:

Reading and writing a raster block per block using your own function.
Generating cross-validation, including spatial cross-validation.
Fitting models with scikit-learn, extracting accuracy from each cross-validation fold, and predicting raster.
Plotting a confusion matrix and adding F1 score or producer's/user's accuracy.
Getting the y_true and y_predicted labels from a confusion matrix.
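To make the last item concrete, the following NumPy sketch shows one way label vectors can be rebuilt from a confusion matrix: each cell (i, j) contributes as many (true, predicted) pairs as its count. The function name is hypothetical, the rows-are-true convention is an assumption, and the returned labels are 0-based class indices rather than the original class values.

```python
import numpy as np


def labels_from_confusion_matrix(cm):
    """Expand a confusion matrix into y_true and y_pred index vectors.

    Assumed convention: cm[i, j] counts samples of true class i predicted as j.
    """
    cm = np.asarray(cm, dtype=int)
    true_idx, pred_idx = np.indices(cm.shape)
    counts = cm.ravel()
    # Repeat each (row, column) pair as many times as the cell count.
    y_true = np.repeat(true_idx.ravel(), counts)
    y_pred = np.repeat(pred_idx.ravel(), counts)
    return y_true, y_pred


cm = np.array([[2, 1],
               [0, 3]])
y_true, y_pred = labels_from_confusion_matrix(cm)
```

The original sample order cannot be recovered, but any metric that depends only on the pairs of labels can then be recomputed.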

RasterMath

Available in museotoolbox.processing, the RasterMath class is the keystone of Museo ToolBox.

The question I asked myself is: How can we make it as easy as possible to implement array-like functions on images? The idea behind RasterMath is that if a function is intended to operate on an array, it should be easy to apply it to your raster in as few lines as possible.

So, what does RasterMath really do? The user only works with an array, confirms on a sample that the function behaves as expected, and lets RasterMath generalize it to the whole image. The user doesn't need to manage raster reading and writing, no-data values, compression, the number of bands, or the projection. Figure 2 describes how RasterMath reads a raster, applies the function, and writes the result to a new raster.

The objective is to allow the user to focus solely on the array-compatible function while RasterMath manages the raster part.
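The block-by-block idea can be sketched with an in-memory NumPy array standing in for the raster. This is only an illustration of the principle, not RasterMath's implementation: `apply_by_block` and `mean_of_bands` are hypothetical names, and the sketch ignores file I/O, no-data values, compression and projection, which RasterMath handles for the user.

```python
import numpy as np


def apply_by_block(image, function, block_size=2):
    """Apply `function` to an image of shape (rows, cols, bands), block by block.

    `function` receives an array of shape (n_pixels, bands) and must return
    one value per pixel, i.e. an array of shape (n_pixels,).
    """
    rows, cols, bands = image.shape
    out = np.empty((rows, cols), dtype=np.float64)
    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            block = image[r:r + block_size, c:c + block_size, :]
            pixels = block.reshape(-1, bands)  # one row per pixel
            result = function(pixels)
            # Edge blocks may be smaller, so reshape to the block's own shape.
            out[r:r + block_size, c:c + block_size] = result.reshape(block.shape[:2])
    return out


def mean_of_bands(x):
    """User-side function: it only knows about arrays, not rasters."""
    return x.mean(axis=1)


rng = np.random.default_rng(0)
image = rng.random((5, 7, 3))

# Step 1: confirm the function on a small sample of pixels.
sample = image[:2, :2, :].reshape(-1, 3)
assert mean_of_bands(sample).shape == (4,)

# Step 2: generalize it to the whole image.
result = apply_by_block(image, mean_of_bands, block_size=2)
```

Processing by block keeps memory use bounded by the block size rather than by the image size, which is the reason for the loop.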

See RasterMath documentation and examples.

Artificial Intelligence

The artificial intelligence (ai) module is built to work natively with scikit-learn algorithms and uses state-of-the-art methods (such as standardizing the input data). The SuperLearner class optimizes the fitting process by using a grid search to set the parameters of the classifier. There is also a Sequential Feature Selection protocol that can select several components at once (e.g. a single-date image is composed of four bands, i.e. four features, so a user may select four features at once).
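For readers unfamiliar with the underlying scikit-learn pattern, the sketch below combines standardization, a grid search and stratified folds using scikit-learn directly. It does not use the SuperLearner class; the synthetic data, the random forest classifier and the parameter grid are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for pixel samples: 120 samples with four features (bands).
X, y = make_classification(n_samples=120, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=0)

# Standardize the input data, then classify.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Grid of classifier parameters to try (values are illustrative).
param_grid = {"clf__n_estimators": [10, 50]}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(model, param_grid, cv=cv)
search.fit(X, y)

best_n_estimators = search.best_params_["clf__n_estimators"]
# Accuracy of the best parameter set on each cross-validation fold.
fold_scores = [search.cv_results_[f"split{i}_test_score"][search.best_index_]
               for i in range(3)]
```

Putting the scaler inside the pipeline means it is refitted on the training part of each fold, so the validation samples never influence the standardization.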

See the SuperLearner documentation and examples.

Cross-validation

Museo ToolBox implements stratified cross-validation, which means that the split between training and validation samples respects the number of samples per class. For example, the Leave-One-Out method keeps one validation sample per class. As stated by @olofsson_good_2014, "stratified random sampling is a practical design that satisfies the basic accuracy assessment objectives and most of the desirable design criteria". For spatial cross-validation, see @karasiak_2019, inspired by @roberts_2017.
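A stratified Leave-One-Out split, as described above, can be sketched in a few lines of NumPy. The generator name and the decision to limit the number of folds to the size of the smallest class are assumptions of this sketch, not a description of the library's internals.

```python
import numpy as np


def leave_one_out_per_class(y, seed=0):
    """Yield (train_idx, valid_idx) pairs with one validation sample per class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    # Shuffle the sample indices of each class independently.
    per_class = [rng.permutation(np.flatnonzero(y == label))
                 for label in np.unique(y)]
    # Assumption: the smallest class limits the number of folds.
    n_splits = min(len(idx) for idx in per_class)
    for i in range(n_splits):
        valid = np.array([idx[i] for idx in per_class])
        train = np.setdiff1d(np.arange(len(y)), valid)
        yield train, valid


y = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
splits = list(leave_one_out_per_class(y))
```

Each validation set contains exactly one sample of every class, and the remaining samples form the training set.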

Museo ToolBox offers two different kinds of cross-validation:

Non-spatial cross-validation

Leave-One-Out.
Leave-One-SubGroup-Out.
Leave-P-SubGroup-Out (Percentage of subgroup per class).
Random Stratified K-Fold.

Spatial cross-validation

Spatial Leave-One-Out [@karasiak_2019].
Spatial Leave-Aside-Out.
Spatial Leave-One-SubGroup-Out (using centroids to select one subgroup and remove other subgroups for the same class inside a specified distance buffer).
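The distance-buffer idea behind the spatial methods can be sketched as follows: one sample is held out for validation, and every sample lying within a given distance of it is dropped from the training set. This is a simplified NumPy illustration with a hypothetical function name and made-up coordinates; it does not handle classes or subgroups and is not the library's code.

```python
import numpy as np


def spatial_leave_one_out(coords, valid_index, buffer_distance):
    """Return (train_idx, valid_idx), excluding training samples inside the buffer."""
    coords = np.asarray(coords, dtype=np.float64)
    # Euclidean distance from every sample to the validation sample.
    distances = np.linalg.norm(coords - coords[valid_index], axis=1)
    # Keep only samples strictly farther than the buffer distance.
    train = np.flatnonzero(distances > buffer_distance)
    return train, np.array([valid_index])


# Illustrative sample coordinates, in the same unit as the buffer distance.
coords = np.array([[0, 0], [1, 0], [0, 2], [5, 5], [10, 0]])
train, valid = spatial_leave_one_out(coords, valid_index=0, buffer_distance=2.0)
```

Removing the neighbours of the validation sample reduces the effect of spatial autocorrelation on the accuracy estimate.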

See the cross-validation documentation and examples.

Acknowledgements

I acknowledge contributions from Mathieu Fauvel, beta-testers (hey Yousra Hamrouni), and my thesis advisors: Jean-François Dejoux, Claude Monteil and David Sheeren. Many thanks to Marie for proofreading. Many thanks to Sigma students: Hélène Ternisien de Boiville, Arthur Duflos, Sam Antonetti and Anne-Sophie Tronc for their involvement in RasterMath improvements in early 2020.
