The competition requires you to handle a relatively large set of raw images (4GB in total). Convolutional neural networks were the mainstream technique among participants, and the Kaggle admins allowed pre-trained models to kick-start performance provided that no license terms were violated. Hence, various ImageNet pre-trained convolutional neural networks were used (VGG-16, VGG-19, GoogleNet, Inception, ResNet and Darknet), with VGG and ResNet being the most popular on the forum.
I approached this competition using a combination of VGG and ResNet implemented in two deep learning platforms, Keras and Caffe. The evaluation metric was multi-class logarithmic loss. My submission scored around 0.25 on the Public Leaderboard, which ranked around the top 9%.
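For reference, multi-class logarithmic loss is the mean negative log of the probability assigned to the true class. The sketch below is a minimal pure-Python version; the function name, the `eps` clipping value and the row renormalization step are assumptions made for illustration, not the competition's exact scoring code.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class labels.
    y_prob: list of per-class probability rows, one row per sample.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Rescale the row so it sums to 1, then clip to avoid log(0).
        p = probs[label] / sum(probs)
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

A uniform prediction over K classes scores log(K) under this definition, which gives a useful baseline when reading leaderboard numbers.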
Period: April 5 2016 ~ August 1 2016 (118 total days)
- VGG-16,-19 Model from Very Deep Convolutional Networks for Large-Scale Image Recognition
- GoogleNet Model from Rethinking the Inception Architecture for Computer Vision
- ResNet-50,-101,-152 Model from Deep Residual Learning for Image Recognition
- VGG-16 ImageNet pre-trained weights (Keras)
- VGG-19 ImageNet pre-trained weights (Keras)
- ResNet ImageNet pre-trained weights (Caffe)
- ResNet ImageNet pre-trained weights (TensorFlow)
- Visualization of CNN inspired by VGG-CAM
- Occlusion: Zero out a random 150x100 block to simulate occlusion
- Translation: Random translation of up to 0.2*min(width, height)
- Scale: Scale change of +/- 0.2
- Mean subtraction: Following the ImageNet procedure
- Resize: Original image (640x480) to (224x224)
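
The occlusion and translation steps listed above can be sketched as follows. This is a dependency-free illustration that treats an image as a nested list of pixel rows; in practice the same logic would be array slicing. The function names, the `fill` value, and the reading of "150x100" as width 150 by height 100 are assumptions. Scaling, mean subtraction and resizing are left out.

```python
import random

def random_occlusion(img, rng, block_w=150, block_h=100, fill=0):
    """Return a copy of img with one randomly placed block set to fill."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, max(h - block_h, 0))
    left = rng.randint(0, max(w - block_w, 0))
    out = [row[:] for row in img]  # copy so the input stays untouched
    for r in range(top, min(top + block_h, h)):
        for c in range(left, min(left + block_w, w)):
            out[r][c] = fill
    return out

def random_translation(img, rng, max_frac=0.2, fill=0):
    """Shift img by up to max_frac * min(width, height) pixels per axis."""
    h, w = len(img), len(img[0])
    limit = int(max_frac * min(h, w))
    dy, dx = rng.randint(-limit, limit), rng.randint(-limit, limit)
    out = [[fill] * w for _ in range(h)]
    # Copy only the part of the image that stays inside the frame.
    for r in range(max(dy, 0), min(h + dy, h)):
        for c in range(max(dx, 0), min(w + dx, w)):
            out[r][c] = img[r - dy][c - dx]
    return out
```

Passing a seeded `random.Random` instance as `rng` keeps the augmentation reproducible across runs.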
- VGG-16, VGG-19
- ResNet-50
- Optimizers: Stochastic Gradient Descent, RMSprop, Adadelta, Adagrad
- Learning Rate: [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
- Epochs: [5, 10, 20]
- Cross Validation: 7-Fold Cross Validation split by driver
- Image preprocessing was also treated as a hyperparameter
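
Splitting the cross-validation folds by driver means that no driver's images appear in both the training and validation side of a fold. A minimal sketch of that idea follows; the function name, the round-robin assignment of drivers to folds and the driver ID format are assumptions, and scikit-learn's `GroupKFold` offers comparable behavior.

```python
def driver_folds(driver_ids, n_folds=7):
    """Yield (train_idx, val_idx) pairs with no driver on both sides.

    driver_ids: one driver identifier per image, in dataset order.
    """
    drivers = sorted(set(driver_ids))
    for k in range(n_folds):
        # Hold out every n_folds-th driver, starting at offset k.
        held_out = set(drivers[k::n_folds])
        val_idx = [i for i, d in enumerate(driver_ids) if d in held_out]
        train_idx = [i for i, d in enumerate(driver_ids) if d not in held_out]
        yield train_idx, val_idx
```

With fewer drivers than folds, some validation sets come out empty, so the sketch assumes at least `n_folds` distinct drivers.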
- In-house validation loss correlated well with the Public Leaderboard
- Visualization inspired by VGG-CAM model in Keras
- Result: 188th / 1440 (Top 14%)
- ResNet-152, the CNN model that won ImageNet 2015, is too big for this task (it overfits)
- VGG-16, VGG-19, GoogleNet and ResNet-50 trained from scratch did not converge (frankly, I could not find a learning policy that made them converge)
- Visualizing CNN is cumbersome, but definitely fun!
- Keras is a good starting point for CNNs, especially for validating ideas and brainstorming
- Caffe is fast, efficient and abundant in resources (pre-trained weights, CNN-related papers with implementations in Caffe)
- I intended the same setup in Keras, Caffe and TensorFlow, but the results differed, so I need to dig deeper into each platform's implementation
- After all, I have the same tools and model architectures as the top-ranking Kagglers. I need to improve on the machine learning part (cross-validation, generalization, quick and smart iteration)
- Competitions make me learn
- A strong ensemble technique (one that lowers generalization error) with weak single models outperforms a weak ensemble technique with strong single models.
1st by jacobkie Link
- Pre-trained VGG16 and a modified VGG16_3 (a single model got an LB score of around 0.3)
- VGG16_3 was trained with two selected regions of interest (head and radio area) together with the original image
- K-Nearest Neighbor Average: Uses the last max-pool layer (pool5) of VGG16 to map each test image to a 512x7x7-dimensional coordinate and uses distances in this space to define similarity. A weighted average of an image's prediction together with those of its 10 nearest neighbors improves the single-model score by 0.10~0.12
- Ensemble average computed for each category separately: for each category, the models in the top 10% by cross-entropy loss on that category are chosen. This outperforms a simple arithmetic/geometric average.
- Segment Average: Divide the test images into groups using the pool5 feature space. If a group displays consistency and confidence, renormalize all the images in that group to share predictions.
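
The K-nearest-neighbor averaging idea can be sketched as below. This is an illustration only, not the winner's code: the function name and the `1 / (1 + distance)` weighting are assumptions (the write-up does not state the actual weights), and the features would be flattened pool5 activations rather than the short vectors a toy example uses.

```python
import math

def knn_average(features, preds, k=10):
    """Blend each prediction with those of its k nearest neighbors.

    features: one feature vector per test image.
    preds: one per-class probability row per test image.
    """
    smoothed = []
    for i, f in enumerate(features):
        # Distances from image i to every image, itself included (distance 0).
        dists = sorted((math.dist(f, g), j) for j, g in enumerate(features))
        nearest = dists[:k + 1]  # the image itself plus its k neighbors
        weights = [1.0 / (1.0 + d) for d, _ in nearest]  # assumed weighting
        total = sum(weights)
        smoothed.append([
            sum(w * preds[j][c] for w, (_, j) in zip(weights, nearest)) / total
            for c in range(len(preds[i]))
        ])
    return smoothed
```

Because every input row sums to 1 and the weights are normalized, each smoothed row still sums to 1. The brute-force distance computation is quadratic in the number of images, so a real test set would call for an approximate or indexed neighbor search.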
3rd by BRAZIL_POWER (0.08877 > 0.09058) Link
- Ensemble of 4 models - ResNet152, VGG16
- Use synthetic test image = image + nearest neighbor images
5th by DZS Team (0.10252 > 0.12144) Link
- Synthetic train images = half + half of train images. 5 million synthetic images were used to train GoogleNet
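
One way to read "half + half" is joining the left half of one training image to the right half of another. The sketch below shows that reading on nested lists of pixel rows; the function name and the vertical split down the middle are assumptions, and whether the team paired only images of the same class is not stated here.

```python
def half_and_half(img_a, img_b):
    """Join the left half of img_a to the right half of img_b, row by row."""
    if len(img_a) != len(img_b) or len(img_a[0]) != len(img_b[0]):
        raise ValueError("images must share the same height and width")
    mid = len(img_a[0]) // 2
    return [row_a[:mid] + row_b[mid:] for row_a, row_b in zip(img_a, img_b)]
```

Pairing images this way grows the number of possible training samples quadratically, which is consistent with reaching millions of synthetic images from a much smaller training set.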
10th by toshi-k (0.14354 > 0.14911) Link
- 20 Models for Ensembling
- CNN to detect driver body pixels (semantic segmentation)
- Crop the driver region (by bounding box) and use another classifier on this region