Three models were evaluated:
- TruckNN: A CNN model adapted from NVIDIA's 2016 paper End to End Learning for Self-Driving Cars. The original model was augmented with batch normalization layers and dropout layers.
- TruckResnet50: A transfer-learning CNN that feeds feature maps extracted by ResNet50 into additional fully-connected layers. This model was adapted from Du et al.'s 2019 paper Self-Driving Car Steering Angle Prediction Based on Image Recognition. The first 141 layers of ResNet50 were frozen (instead of the first 45 layers as in the original paper), and the dimensions of the fully-connected layers were also modified.
- TruckRNN: A Conv3D-LSTM model, also adapted from Du et al.'s 2019 paper mentioned above. The model takes a sequence of 15 consecutive frames as input and predicts the steering angle at the last frame. Compared to the original model, max-pooling layers were omitted and batch normalization layers were introduced. Five convolutional layers were implemented, with the last convolutional layer connected with a residual output, followed by two LSTM layers; this differs considerably from the architecture proposed in the paper.
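The layer freezing described for TruckResnet50 could look roughly like the sketch below. It is not the repository's code: the helper name `freeze_first_layers` is hypothetical, the toy backbone stands in for ResNet50, and counting "layers" as leaf modules is an assumption (the original count of 141 may follow a different convention).

```python
import torch.nn as nn


def freeze_first_layers(model: nn.Module, n_layers: int) -> int:
    """Disable gradients for the first n_layers leaf modules.

    Returns the number of parameter tensors that were frozen.
    Assumption: a "layer" is a leaf module in registration order.
    """
    leaves = [m for m in model.modules() if len(list(m.children())) == 0]
    frozen = 0
    for layer in leaves[:n_layers]:
        for param in layer.parameters():
            param.requires_grad = False
            frozen += 1
    return frozen


# Toy stand-in for a pretrained backbone followed by a regression head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Conv2d(8, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
)
model = nn.Sequential(backbone, nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

# Freeze the first conv, batch norm, and ReLU (the ReLU has no parameters).
n_frozen = freeze_first_layers(model, 3)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

Only the parameters left in `trainable` would then be passed to the optimizer. Note that batch normalization running statistics are buffers rather than parameters, so frozen batch norm layers still update them in training mode unless those layers are switched to eval mode.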
TruckNN | TruckResnet50 | TruckRNN |
---|---|---|
The figures are taken from the respective original papers.
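For orientation, a minimal sketch of a TruckNN-style network is given below. It is not the repository's implementation: the class name `SteeringCNN` is hypothetical, the layer sizes follow the commonly cited layout of the NVIDIA paper (five convolutions, then fully-connected layers of 100, 50, and 10 units), and the 66x200 input size, ReLU activations, and placement of batch normalization and dropout are assumptions.

```python
import torch
import torch.nn as nn


def conv_block(c_in: int, c_out: int, kernel: int, stride: int) -> nn.Sequential:
    # Convolution followed by batch normalization (the augmentation the text mentions).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel, stride=stride),
        nn.BatchNorm2d(c_out),
        nn.ReLU(),
    )


class SteeringCNN(nn.Module):  # hypothetical name, not the repo's TruckNN class
    def __init__(self, dropout: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 24, 5, 2),
            conv_block(24, 36, 5, 2),
            conv_block(36, 48, 5, 2),
            conv_block(48, 64, 3, 1),
            conv_block(64, 64, 3, 1),
        )
        # For a 3x66x200 input the feature map is 64x1x18, i.e. 1152 values.
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(100, 50), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),  # single steering-angle output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.regressor(self.features(x))


model = SteeringCNN().eval()
with torch.no_grad():
    angles = model(torch.rand(2, 3, 66, 200))  # batch of two random frames
```

With a different input resolution (for example the 320x160 simulator frames), the size of the first fully-connected layer would need to change accordingly.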
The loss function is the standard MSE loss between the predicted and ground-truth steering angles, as implemented in torch.nn.MSELoss. For all three models, the loss was averaged over all tensor elements.
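As a small illustration of that averaging, the snippet below compares `torch.nn.MSELoss` with `reduction="mean"` against a manual computation. The prediction and target values are made up for the example and are not taken from the dataset.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss(reduction="mean")  # "mean" averages over all elements

# Made-up steering angles for a batch of four frames.
pred = torch.tensor([[0.10], [-0.20], [0.05], [0.30]])
target = torch.tensor([[0.00], [-0.10], [0.05], [0.50]])

loss = criterion(pred, target)
manual = ((pred - target) ** 2).mean()  # same quantity computed by hand
```

Here the squared errors are 0.01, 0.01, 0, and 0.04, so both `loss` and `manual` come out at 0.015 (up to floating-point error).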
The models were trained on driving-scene images generated by Udacity's self-driving car simulator. The dataset (~1.35 GB) contains 97,330 images (320x160), each labelled with a steering angle.
Models were trained on a Tesla T4 GPU on Google Colab.
All three models were first trained for 20 epochs with a learning rate of 1e-4. The batch size was 32 for TruckNN and TruckResnet50, and 8 for TruckRNN. TruckResnet50 outperformed the other models, so it was trained further to 40 epochs with an early-stopping tolerance of 10 epochs. However, no further improvement in the loss was observed.
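The early-stopping rule can be sketched as below, assuming "tolerance" means the number of consecutive epochs without a new best validation loss. The `EarlyStopping` class is a hypothetical helper, not the repository's code, and the loss values are synthetic numbers chosen only to exercise the logic.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Synthetic curve: improves for three epochs, then plateaus (not real results).
synthetic_val_losses = [0.10, 0.08, 0.07] + [0.075] * 37  # 40 epochs in total

stopper = EarlyStopping(patience=10)
stopped_epoch = None
for epoch, val_loss in enumerate(synthetic_val_losses, start=1):
    # A real loop would train and validate here before calling step().
    if stopper.step(val_loss):
        stopped_epoch = epoch
        break
```

In this synthetic run the best loss is reached at epoch 3, so the loop stops at epoch 13 after ten epochs without improvement.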
| | Models Comparison | TruckResnet50 continued training |
|---|---|---|
| Train | | |
| Validation | | |
The best validation loss observed was 0.066 MSE (or roughly 0.25 RMSE), obtained by TruckResnet50, which is worse than the loss claimed in the paper by an entire degree. Possible contributing factors include model architecture (design and/or complexity), limited training resources, a limited dataset, differences in dataset content (simulator vs. real world), and environmental variance.
Despite this, the models' predictions are largely reasonable.
For further visualization, saliency maps of the last ResNet50 convolutional layer (layer4) are shown below:
The model appears to focus on salient features of the road.
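To give a feel for how a saliency map can be computed, here is a minimal sketch of plain input-gradient saliency. This is a simpler, swapped-in technique, not necessarily what `visualize.py` does for `layer4`; the tiny untrained CNN, the random frame, and the function name `input_saliency` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny untrained stand-in for a steering model (not TruckResnet50).
model = nn.Sequential(
    nn.Conv2d(3, 8, 5, stride=2), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
).eval()


def input_saliency(model: nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Return an HxW map of |d(angle)/d(pixel)|, scaled to at most 1."""
    image = image.detach().clone().requires_grad_(True)
    angle = model(image.unsqueeze(0)).squeeze()      # scalar steering prediction
    (grad,) = torch.autograd.grad(angle, image)      # gradient w.r.t. input pixels
    saliency = grad.abs().max(dim=0).values          # strongest channel per pixel
    return saliency / (saliency.max() + 1e-12)


frame = torch.rand(3, 160, 320)  # random image with the dataset's 320x160 size
saliency = input_saliency(model, frame)
```

The resulting map has one value per pixel and can be overlaid on the frame, for example as a heat map, to see which regions most affect the predicted angle.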
Pretrained checkpoints for TruckNN and TruckRNN can be found in the `checkpoints` directory. The checkpoint for TruckResnet50 can be downloaded via this link. To use it, place it at `./checkpoints/TruckResnet50/best_ckpt_1.pth`.
- Set up all configurations in `config.py`.
- To train the networks, run `python train.py`.
- To run inference on test images, run `python inference.py`.
- To visualize saliency maps, run `python visualize.py`.
- To view the training history in TensorBoard, run `tensorboard --logdir runs`.
- NVIDIA 2016 paper End to End Learning for Self-Driving Cars.
- Du et al.'s 2019 paper Self-Driving Car Steering Angle Prediction Based on Image Recognition, and its affiliated repo.
- Manajit Pal's Towards Data Science tutorial Deep Learning for Self-Driving Cars, and its affiliated repo.
- Aditya Rastogi's Data Driven Investor tutorial Visualizing Neural Networks using Saliency Maps.
- Zhenye Na's Self-Driving Car Simulator dataset on Kaggle.