plankton_classif_benchmark

Benchmark of plankton image classification methods on images from multiple plankton imaging devices (ISIIS, Zooscan, Flowcam, etc.).

This tool allows you to run a comparison between a Convolutional Neural Network and a Random Forest classifier on a dataset of plankton images.

**Superseded by https://github.com/emmaamblard/plankton_classif/tree/main.**

Data

Instruments

The comparison is to be done on data from multiple plankton imaging devices:

ISIIS (In Situ Ichthyoplankton Imaging System)
Zooscan
Flowcam
IFCB (Imaging FlowCytobot)
UVP (Underwater Vision Profiler)

Input data

Store your input data in data/<instrument_name>. This directory must contain an images folder with your images, as well as a CSV file named <instrument_name>_data.csv with one row per object. This CSV file should contain the following columns:

path_to_img: path to image
classif_id: object classification
living: whether the classification in classif_id is living or not (boolean)
features_1 to features_n: object features used to fit the random forest (you are free to choose the names of these columns)

It is strongly recommended that each class contain at least 100 images.
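Before training, it can help to check that the table has the expected layout. The following is a minimal sketch, not part of this tool: the function name check_input_table, the feature column names (area, perimeter) and the class names are made up for illustration. It treats every column other than the three required ones as a feature and reports classes with fewer than 100 images.

```python
import csv
import io
from collections import Counter

REQUIRED_COLUMNS = ("path_to_img", "classif_id", "living")
MIN_IMAGES_PER_CLASS = 100  # recommendation stated in the text above


def check_input_table(csv_file, min_per_class=MIN_IMAGES_PER_CLASS):
    """Return (missing_columns, feature_columns, small_classes) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    # Assumption: every column that is not a required one is a feature column.
    features = [c for c in header if c not in REQUIRED_COLUMNS]
    counts = Counter()
    if "classif_id" in header:
        counts.update(row["classif_id"] for row in reader)
    small = {cls: n for cls, n in counts.items() if n < min_per_class}
    return missing, features, small


# Tiny in-memory table standing in for <instrument_name>_data.csv (made-up rows).
example = io.StringIO(
    "path_to_img,classif_id,living,area,perimeter\n"
    "images/0001.png,Copepoda,True,120.5,48.2\n"
    "images/0002.png,detritus,False,33.0,21.7\n"
)
missing, features, small = check_input_table(example)
```

With a real file, pass open("data/<instrument_name>/<instrument_name>_data.csv", newline="") instead of the in-memory example.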

Data will be split into training, validation and testing sets.
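The text does not say how the split is done. As an illustration only, here is one way to split row indices so that each class keeps the same proportions in all three sets. The 70/15/15 percentages, the stratification by class and the function name stratified_split are assumptions, not settings of this tool; the seed reuses the random_state idea from the Settings section.

```python
import random
from collections import defaultdict


def stratified_split(labels, percent=(70, 15, 15), random_state=12):
    """Split row indices into train/valid/test lists, class by class."""
    rng = random.Random(random_state)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, valid, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(indices) * percent[0] // 100
        n_valid = len(indices) * percent[1] // 100
        train += indices[:n_train]
        valid += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# Made-up labels: 100 objects of one class, 200 of another.
labels = ["Copepoda"] * 100 + ["detritus"] * 200
train, valid, test = stratified_split(labels)
```

Fixing random_state makes the split repeatable: calling the function twice with the same labels returns the same three lists.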

Classification models

Convolutional Neural Network

A convolutional neural network takes an image as input and predicts a class for this image.

The CNN backbone is a MobileNetV2 feature extractor (https://tfhub.dev/google/imagenet/mobilenet_v2_140_224/feature_vector/4) with a depth multiplier of 1.4. A classification head with as many outputs as there are classes to predict is added on top of the backbone. Intermediate fully connected layers with a customizable dropout rate can be inserted between the two.

Input images are expected to have color values in the range [0,1] and a size of 224 x 224 pixels. If need be, images are automatically resized by the CNN DataGenerator.
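To make the expected input concrete, here is a dependency-free sketch of the two operations on a greyscale image stored as nested lists. It is not the tool's DataGenerator: the function names are hypothetical, the source pixels are assumed to be 8-bit (0 to 255), and nearest-neighbour sampling is chosen only for brevity, so the real resizing may use a different interpolation.

```python
def to_unit_range(image):
    """Scale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def resize_nearest(image, size=224):
    """Resize a 2-D image to size x size with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        # Map each target pixel back to the source pixel that covers it.
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


raw = [[0, 255], [128, 64]]  # toy 2 x 2 greyscale image
prepared = resize_nearest(to_unit_range(raw))
```

In practice an image library would do this on arrays; the sketch only shows the value range and shape the network receives.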

Random Forest

A random forest takes a vector of features as input and predicts a class from these values.

Settings

Settings can be customized in the settings.yaml file. Reproducible results can be obtained using the random_state argument (random_state = 12 for paper results).

Training

Training is done in two phases:

the model is optimized by training on the training set and evaluating on the validation set
the optimized model is trained on the training set and evaluated on the test set, which has not been used before

CNN training

At each step (i.e. epoch) of CNN training, the model is trained on the training data and evaluated on the validation data. It is recommended to train for a large number of epochs and decide afterwards where to stop, based on the evolution of accuracy and loss on the validation data. This process, called early stopping, is implemented in this tool: at each epoch, weights are saved if and only if the results of this epoch are better than the previous ones. The last saved weights are then used to evaluate the model on the test data.
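The save-only-on-improvement rule can be sketched in a few lines. This is an illustration, not the code in train_cnn.py: it assumes that "better" means a lower validation loss than the best seen so far, and the function name and loss values are made up.

```python
def train_with_checkpoints(val_losses):
    """Return (best_epoch, best_loss), keeping only epochs that improve.

    val_losses stands in for the validation loss measured after each epoch.
    """
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # In a real run, this is where the weights would be written to disk.
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss


# Made-up validation losses for five epochs.
best_epoch, best_loss = train_with_checkpoints([0.90, 0.62, 0.55, 0.58, 0.61])
```

Here the loss stops improving after the third epoch, so the weights from that epoch are the last ones saved and the ones that would be used on the test data.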

RF training

Random Forest parameters are optimized with a grid search over:

number of trees
number of features to use to compute each split (default for classification is sqrt(n_features))
minimum number of samples required to be at a leaf node (default for classification is 5)

For each set of parameters, the model is trained on the training data and evaluated on the validation data. Finally, the best model is trained on the training data and tested on the test data.
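The search loop itself is simple to sketch. The example below is a generic illustration, not the code in train_rf.py: the parameter names are borrowed from scikit-learn's RandomForestClassifier as an assumption, the grid values are invented, and toy_score is a stand-in for "fit on the training data, return accuracy on the validation data".

```python
from itertools import product


def grid_search(grid, score_on_validation):
    """Try every parameter combination and return the best (params, score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_on_validation(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid covering the three parameters listed above.
grid = {
    "n_estimators": [100, 200, 500],    # number of trees
    "max_features": [4, 8],             # features considered at each split
    "min_samples_leaf": [2, 5, 10],     # minimum samples at a leaf node
}


def toy_score(params):
    """Placeholder score; a real one would train a forest and validate it."""
    return (
        -abs(params["n_estimators"] - 200)
        - abs(params["min_samples_leaf"] - 5)
        - params["max_features"]
    )


best_params, best_score = grid_search(grid, toy_score)
```

The best combination found this way would then be refit on the training data and scored once on the test data, as described above.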

Outputs

When you run train_cnn.py or train_rf.py, an output directory is created and results are stored in this directory. Results for each model and dataset can be explored with the notebooks inspect_cnn_results.ipynb and inspect_rf_results.ipynb. Comparison of results across models and datasets is implemented in the notebook comparison.ipynb.
