"As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa grow this starchy root, but viral diseases are major sources of poor yields. With the help of data science, it may be possible to identify common diseases so they can be treated.
Existing methods of disease detection require farmers to solicit the help of government-funded agricultural experts to visually inspect and diagnose the plants. This suffers from being labor-intensive, low-supply and costly. As an added challenge, effective solutions for farmers must perform well under significant constraints, since African farmers may only have access to mobile-quality cameras with low-bandwidth.
In this competition, we introduce a dataset of 21,367 labeled images collected during a regular survey in Uganda. Most images were crowdsourced from farmers taking photos of their gardens, and annotated by experts at the National Crops Resources Research Institute (NaCRRI) in collaboration with the AI lab at Makerere University, Kampala. This is in a format that most realistically represents what farmers would need to diagnose in real life.
Your task is to classify each cassava image into four disease categories or a fifth category indicating a healthy leaf. With your help, farmers may be able to quickly identify diseased plants, potentially saving their crops before they inflict irreparable damage."
Quoted from the contest description.
Train set: ~26,000 images (the 21,367 images from the 2020 contest were merged with 500 images from the 2019 contest).
Test set: ~15,000 images.
Public test: 31% of the test set.
Private test: 69% of the test set.
Class mapping
Class | Numeric label |
---|---|
Cassava Bacterial Blight (CBB) | 0 |
Cassava Brown Streak Disease (CBSD) | 1 |
Cassava Green Mottle (CGM) | 2 |
Cassava Mosaic Disease (CMD) | 3 |
Healthy | 4 |
Credit: https://www.kaggle.com/foolofatook/starter-eda-cassava-leaf-disease
- Code baseline and trainer on GPU + TPU
- Transforms: albumentations
- Implement models: EfficientNet, ViT, ResNeXt
- Implement losses: Focal loss, CrossEntropy loss, Bi-Tempered Loss
- Implement optimizers: SAM
- Implement schedulers: StepLR, WarmupCosineSchedule
- Implement metrics: accuracy
- Write inference notebook
- Implement Stratified K-folding
- Merge 2019 dataset and 2020 dataset
- Implement gradient accumulation
- Implement Automatic Mixed Precision (a sketch combining both follows this list)
- Write Optuna scripts for hyperparameter tuning
- Evaluate the class distribution of the public leaderboard test set
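As an illustration of the gradient-accumulation and AMP items above, here is a minimal PyTorch sketch that combines the two. The function signature and the `ACCUM_STEPS` value are assumptions for illustration, not our exact training code.

```python
import torch

ACCUM_STEPS = 4  # assumption: number of mini-batches accumulated per optimizer step
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

def train_one_epoch(model, loader, optimizer, criterion, device):
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)
        with torch.cuda.amp.autocast():  # forward pass in mixed precision
            loss = criterion(model(images), labels) / ACCUM_STEPS
        scaler.scale(loss).backward()  # gradients accumulate across mini-batches
        if (step + 1) % ACCUM_STEPS == 0:
            scaler.step(optimizer)  # unscales gradients, then takes an optimizer step
            scaler.update()
            optimizer.zero_grad()
```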
The final model is a stacking of three CNN-based models: EfficientNet, ResNeXt, and DenseNet.
The dataset is in fact noisy (it contains irrelevant images, such as photos of tree roots, and distorted images) and clearly imbalanced.
We tackled the first problem by splitting the training set into 5 equal-size folds, each with the same class distribution as the original set (a scheme known as Stratified K-folding); this way of training gives every image in the training set a chance to contribute to the final predictions. We also tried to adopt robust loss functions like Bi-Tempered Loss and Taylor CE Loss, but none of them significantly improved the CV score. Later we merged the training set of last year's competition into the current one, which improved both the CV and the public score. For the second problem, the imbalanced classes, Focal Loss gave stable performance.
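A minimal sketch of this split, assuming a pandas DataFrame `train_df` with a default RangeIndex and a `label` column (the variable and column names are assumptions):

```python
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
train_df["fold"] = -1
# assign each image to exactly one validation fold, preserving class proportions
for fold, (_, val_idx) in enumerate(skf.split(train_df, train_df["label"])):
    train_df.loc[val_idx, "fold"] = fold
```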
Most models we used come from the timm library (a minimal loading sketch follows this list).
- resnext50_32x4d - training config
- efficientnet-b4 - training config
- densenet121 - training config
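Loading these backbones with timm looks roughly like the following; note that the EfficientNet-B4 registry name varies across timm versions (e.g. `tf_efficientnet_b4`), so treat the names as illustrative:

```python
import timm

NUM_CLASSES = 5  # four diseases + healthy

resnext = timm.create_model("resnext50_32x4d", pretrained=True, num_classes=NUM_CLASSES)
effnet = timm.create_model("tf_efficientnet_b4", pretrained=True, num_classes=NUM_CLASSES)
densenet = timm.create_model("densenet121", pretrained=True, num_classes=NUM_CLASSES)
```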
Results
Model | fold0 | fold1 | fold2 | fold3 | fold4 | CV | Public | Private |
---|---|---|---|---|---|---|---|---|
densenet121 | 0.884346 | 0.881308 | 0.878710 | 0.873802 | 0.888993 | 0.881431 | 0.889 | 0.887 |
effnetB4 | 0.889018 | 0.889252 | 0.881046 | 0.876840 | 0.888525 | 0.884936 | 0.896 | 0.894 |
resnext50_32x4d | 0.884813 | 0.880140 | 0.881748 | 0.878008 | 0.892498 | 0.883441 | 0.895 | 0.891 |
Training notebooks
- GPU version:
- TPU version:
- Optuna facilitated the process of hyperparameter tuning:
- Cross-entropy Loss.
- Focal Loss (a common formulation is sketched after this list).
- Accuracy - we save a model checkpoint whenever the validation accuracy improves.
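A common multi-class Focal Loss formulation looks like this; `gamma=2.0` is the usual default, not necessarily the value we trained with:

```python
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    """Cross-entropy down-weighted by (1 - p_t)^gamma, so easy examples contribute less."""

    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample CE
        p_t = torch.exp(-ce)  # model's probability for the true class
        return ((1.0 - p_t) ** self.gamma * ce).mean()
```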
Another Kaggle trick is Test-Time Augmentation (TTA). It works like data augmentation at training time, except the augmentations are applied at test time instead. We ran TTA 3 times on the test images and averaged the resulting softmax vectors.
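A sketch of that procedure; `tta_transform` is a hypothetical callable that applies a random augmentation (e.g. flips or crops) on each call:

```python
import torch

NUM_CLASSES = 5

@torch.no_grad()
def predict_tta(model, images, tta_transform, tta_rounds: int = 3):
    """Average softmax probabilities over several randomly augmented forward passes."""
    model.eval()
    probs = torch.zeros(images.size(0), NUM_CLASSES, device=images.device)
    for _ in range(tta_rounds):
        probs += torch.softmax(model(tta_transform(images)), dim=1)
    return probs / tta_rounds
```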
Model stacking is a powerful ensemble method in which the final predictions are generated by a level-2 model that optimally combines the predictions of the level-1 models (EfficientNet, ResNeXt, DenseNet). It gave a significant boost to the overall accuracy. Most Kaggle winning solutions nowadays involve some form of stacking. For this particular competition, participants observed that a simple level-2 model yielded the best results, whereas fancier ML models like CatBoost performed poorly.
Our stacked model was simply a weighted average of the level-1 models' predictions. We searched for that set of weights (using Optuna) by optimizing the local CV score.
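A sketch of that weight search, assuming out-of-fold softmax predictions `effnet_p`, `resnext_p`, `densenet_p` (each of shape `[N, 5]`) and true labels `y` as NumPy arrays; all variable names are assumptions:

```python
import numpy as np
import optuna

def objective(trial):
    # sample one raw weight per level-1 model, then normalize so they sum to 1
    w = np.array([trial.suggest_float(name, 0.0, 1.0)
                  for name in ("effnet_weight", "resnext_weight", "densenet_weight")])
    w = w / (w.sum() + 1e-12)
    blended = w[0] * effnet_p + w[1] * resnext_p + w[2] * densenet_p
    return (blended.argmax(axis=1) == y).mean()  # validation accuracy to maximize

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=200)
print(study.best_value, study.best_params)
```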
Optimal weights
fold | accuracy | effnet_weight | resnext_weight | densenet_weight |
---|---|---|---|---|
0 | 0.896962 | 0.432610 | 0.284500 | 0.282888 |
1 | 0.895794 | 0.497473 | 0.277661 | 0.224865 |
2 | 0.892030 | 0.541973 | 0.395530 | 0.062496 |
3 | 0.889460 | 0.316127 | 0.418937 | 0.264934 |
4 | 0.899742 | 0.391536 | 0.344358 | 0.264104 |
CV score: 0.894
Public score: 0.897
Private score: 0.898
Our score placed in the top 8% (Bronze Medal) among ~3,900 teams.
What didn't work or wasn't tried
- Vision Transformer models like ViT and DeiT: their CV and public scores were not high (~0.83) in this competition.
- Taylor Loss and Bi-Tempered Loss didn't improve our score, though they worked well for other teams.
- The SAM optimizer didn't work for us due to implementation bugs.
- We could have enlarged the training set with generative models (GANs).
- Unsupervised Data Augmentation looked promising, but we didn't have time to try it.