This document describes the ideas and tests performed while preparing the traffic light classifier.
1.3 Images from a ROS Traffic Light Classifier
These images are cropped around several traffic lights.
PATH | samples |
/train/red | 7133 |
/train/green | 4638 |
/train/unknown | 23201 |
/test/red | 1759 |
/test/green | 1165 |
/test/unknown | 5823 |
These images were used during the initial phase of development but were dropped once we realized that we need to classify the full camera image containing the traffic light, not just a cropped part containing only the traffic light itself.
These images became available on August 24th via the Slack channel #p-system-integration. The data, hosted on Google Drive, contains two bag files with an image feed from the Udacity self-driving car's camera in the test lot and a topic containing the car's position.
Caleb Kirksey: "The video has some shots of the traffic light that we'll be using in testing. I'll follow up with the relative location of the traffic light as well as more images of the light for training a neural net."
Set | 0-red | 1-yellow | 2-green | 4-unknown | Sum |
test_images | 332 | 133 | 467 | 929 | 1861 |
train | 269 | 108 | 371 | 740 | 1488 |
test | 63 | 25 | 96 | 189 | 373 |
This is a small Jupyter notebook for testing the approach of this ROS node.
Downloading the original trained network and applying it to the test dataset yields almost the same accuracy of about 50% for all classes. Strangely, the training accuracy was 97%. Since only a CPU and no GPU was available at the remote development location at the time, further training with more epochs could not be performed.
For completeness, here are the results:
Accuracy | all | red | green | unknown |
Prediction on test data | 50.83% | 50.17% | 50.82% | 51.15% |
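One common reading of per-class figures like these is the fraction of test images of each true class that were predicted correctly (per-class recall), while the overall figure weights every class by its number of samples. A minimal sketch of that computation follows, assuming plain integer labels; the function name is hypothetical, and we have not confirmed that the notebook computed its numbers exactly this way.

```python
from collections import Counter
from typing import Dict, List, Tuple


def accuracy_report(y_true: List[int], y_pred: List[int]) -> Tuple[float, Dict[int, float]]:
    """Return overall accuracy and per-class accuracy (recall per true class).

    Hypothetical helper: labels are plain integers, e.g. 0=red, 2=green, 4=unknown.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("empty input")

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    overall = correct / len(y_true)  # implicitly weighted by class size

    # Count samples and correct predictions per *true* class.
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = {label: hits[label] / n for label, n in sorted(totals.items())}
    return overall, per_class


if __name__ == "__main__":
    overall, per_class = accuracy_report([0, 0, 2, 2, 4, 4], [0, 2, 2, 2, 0, 4])
    print(f"all: {overall:.2%}")
    for label, acc in per_class.items():
        print(f"class {label}: {acc:.2%}")
```

Because the overall value is weighted by class size, a large "unknown" class can dominate it even when the rarer red and green classes behave differently.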
Two models have been used.
The original model is shown here:
Layer (type) Output Shape Param # Connected to
convolution2d_5 (Convolution2D) (None, 64, 64, 32) 896 convolution2d_input_6[0][0]
activation_7 (Activation) (None, 64, 64, 32) 0 convolution2d_5[0][0]
convolution2d_6 (Convolution2D) (None, 62, 62, 32) 9248 activation_7[0][0]
activation_8 (Activation) (None, 62, 62, 32) 0 convolution2d_6[0][0]
maxpooling2d_3 (MaxPooling2D) (None, 31, 31, 32) 0 activation_8[0][0]
dropout_4 (Dropout) (None, 31, 31, 32) 0 maxpooling2d_3[0][0]
convolution2d_7 (Convolution2D) (None, 31, 31, 64) 18496 dropout_4[0][0]
activation_9 (Activation) (None, 31, 31, 64) 0 convolution2d_7[0][0]
convolution2d_8 (Convolution2D) (None, 29, 29, 64) 36928 activation_9[0][0]
activation_10 (Activation) (None, 29, 29, 64) 0 convolution2d_8[0][0]
maxpooling2d_4 (MaxPooling2D) (None, 14, 14, 64) 0 activation_10[0][0]
dropout_5 (Dropout) (None, 14, 14, 64) 0 maxpooling2d_4[0][0]
flatten_2 (Flatten) (None, 12544) 0 dropout_5[0][0]
dense_3 (Dense) (None, 512) 6423040 flatten_2[0][0]
activation_11 (Activation) (None, 512) 0 dense_3[0][0]
dropout_6 (Dropout) (None, 512) 0 activation_11[0][0]
dense_4 (Dense) (None, 3) 1539 dropout_6[0][0]
activation_12 (Activation) (None, 3) 0 dense_4[0][0]
Total params: 6,490,147
Trainable params: 6,490,147
Non-trainable params: 0
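The parameter counts in this summary follow from two formulas: a Conv2D layer with a k×k kernel has (k·k·c_in + 1)·c_out parameters (the +1 is the bias), and a Dense layer has (n_in + 1)·n_out; activation, pooling, dropout and flatten layers add none. The sketch below recomputes the total as a sanity check. The 64×64×3 input, the 3×3 kernels, the padding of each convolution and the 2×2 pooling are inferred from the listed shapes and counts, not taken from the original code.

```python
def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a square Conv2D layer."""
    return (kernel * kernel * c_in + 1) * c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out


def conv_out(size: int, kernel: int, padding: str) -> int:
    """Spatial output size of a stride-1 convolution ('same' or 'valid')."""
    return size if padding == "same" else size - kernel + 1


# Shapes inferred from the summary (64x64x3 input is an assumption).
size = conv_out(64, 3, "same")      # convolution2d_5 -> 64
size = conv_out(size, 3, "valid")   # convolution2d_6 -> 62
size //= 2                          # maxpooling2d_3 (2x2, stride 2) -> 31
size = conv_out(size, 3, "same")    # convolution2d_7 -> 31
size = conv_out(size, 3, "valid")   # convolution2d_8 -> 29
size //= 2                          # maxpooling2d_4 (2x2, stride 2) -> 14
flat = size * size * 64             # flatten_2 -> 12544

layer_params = [
    conv2d_params(3, 3, 32),    # convolution2d_5
    conv2d_params(3, 32, 32),   # convolution2d_6
    conv2d_params(3, 32, 64),   # convolution2d_7
    conv2d_params(3, 64, 64),   # convolution2d_8
    dense_params(flat, 512),    # dense_3
    dense_params(512, 3),       # dense_4
]
total = sum(layer_params)
print(flat, total)
```

The arithmetic also shows where the weights live: dense_3 alone accounts for almost 99% of all parameters, because it connects the 12,544 flattened features to 512 units.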
SqueezeNet is the approach used by the winner of the Nexar competition. It is described extensively in this article, together with alternative methods that did and did not work out.
The complete code (without the data!) is on GitHub. The structure of the model is shown here:
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 224, 224, 3) 0
conv1 (Conv2D) (None, 112, 112, 96) 14208 input_1[0][0]
maxpool1 (MaxPooling2D) (None, 55, 55, 96) 0 conv1[0][0]
fire2_squeeze (Conv2D) (None, 55, 55, 16) 1552 maxpool1[0][0]
fire2_expand1 (Conv2D) (None, 55, 55, 64) 1088 fire2_squeeze[0][0]
fire2_expand2 (Conv2D) (None, 55, 55, 64) 9280 fire2_squeeze[0][0]
merge_1 (Merge) (None, 55, 55, 128) 0 fire2_expand1[0][0]
fire3_squeeze (Conv2D) (None, 55, 55, 16) 2064 merge_1[0][0]
fire3_expand1 (Conv2D) (None, 55, 55, 64) 1088 fire3_squeeze[0][0]
fire3_expand2 (Conv2D) (None, 55, 55, 64) 9280 fire3_squeeze[0][0]
merge_2 (Merge) (None, 55, 55, 128) 0 fire3_expand1[0][0]
fire4_squeeze (Conv2D) (None, 55, 55, 32) 4128 merge_2[0][0]
fire4_expand1 (Conv2D) (None, 55, 55, 128) 4224 fire4_squeeze[0][0]
fire4_expand2 (Conv2D) (None, 55, 55, 128) 36992 fire4_squeeze[0][0]
merge_3 (Merge) (None, 55, 55, 256) 0 fire4_expand1[0][0]
maxpool4 (MaxPooling2D) (None, 27, 27, 256) 0 merge_3[0][0]
fire5_squeeze (Conv2D) (None, 27, 27, 32) 8224 maxpool4[0][0]
fire5_expand1 (Conv2D) (None, 27, 27, 128) 4224 fire5_squeeze[0][0]
fire5_expand2 (Conv2D) (None, 27, 27, 128) 36992 fire5_squeeze[0][0]
merge_4 (Merge) (None, 27, 27, 256) 0 fire5_expand1[0][0]
fire6_squeeze (Conv2D) (None, 27, 27, 48) 12336 merge_4[0][0]
fire6_expand1 (Conv2D) (None, 27, 27, 192) 9408 fire6_squeeze[0][0]
fire6_expand2 (Conv2D) (None, 27, 27, 192) 83136 fire6_squeeze[0][0]
merge_5 (Merge) (None, 27, 27, 384) 0 fire6_expand1[0][0]
fire7_squeeze (Conv2D) (None, 27, 27, 48) 18480 merge_5[0][0]
fire7_expand1 (Conv2D) (None, 27, 27, 192) 9408 fire7_squeeze[0][0]
fire7_expand2 (Conv2D) (None, 27, 27, 192) 83136 fire7_squeeze[0][0]
merge_6 (Merge) (None, 27, 27, 384) 0 fire7_expand1[0][0]
fire8_squeeze (Conv2D) (None, 27, 27, 64) 24640 merge_6[0][0]
fire8_expand1 (Conv2D) (None, 27, 27, 256) 16640 fire8_squeeze[0][0]
fire8_expand2 (Conv2D) (None, 27, 27, 256) 147712 fire8_squeeze[0][0]
merge_7 (Merge) (None, 27, 27, 512) 0 fire8_expand1[0][0]
maxpool8 (MaxPooling2D) (None, 13, 13, 512) 0 merge_7[0][0]
fire9_squeeze (Conv2D) (None, 13, 13, 64) 32832 maxpool8[0][0]
fire9_expand1 (Conv2D) (None, 13, 13, 256) 16640 fire9_squeeze[0][0]
fire9_expand2 (Conv2D) (None, 13, 13, 256) 147712 fire9_squeeze[0][0]
merge_8 (Merge) (None, 13, 13, 512) 0 fire9_expand1[0][0]
fire9_dropout (Dropout) (None, 13, 13, 512) 0 merge_8[0][0]
conv10 (Conv2D) (None, 13, 13, 4) 2052 fire9_dropout[0][0]
avgpool10 (AveragePooling2D) (None, 1, 1, 4) 0 conv10[0][0]
flatten (Flatten) (None, 4) 0 avgpool10[0][0]
softmax (Activation) (None, 4) 0 flatten[0][0]
Total params: 737,476
Trainable params: 737,476
Non-trainable params: 0
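Each fire module is a 1×1 "squeeze" convolution followed by parallel 1×1 and 3×3 "expand" convolutions whose outputs are concatenated by the Merge layers, so the module outputs twice the expand width. The following sketch recomputes the parameter count from the channel widths listed in the summary. The 7×7 kernel of conv1 is inferred from its 14,208 parameters, and the function names are ours.

```python
def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a square Conv2D layer."""
    return (kernel * kernel * c_in + 1) * c_out


def fire_params(c_in: int, squeeze: int, expand: int) -> tuple:
    """Parameter count and output channels of one fire module.

    squeeze: 1x1 conv, c_in -> squeeze
    expand:  1x1 conv and 3x3 conv, each squeeze -> expand, then concatenated
    """
    params = (
        conv2d_params(1, c_in, squeeze)
        + conv2d_params(1, squeeze, expand)
        + conv2d_params(3, squeeze, expand)
    )
    return params, 2 * expand


# (squeeze, expand) widths of fire2..fire9 as listed in the summary.
FIRE_WIDTHS = [(16, 64), (16, 64), (32, 128), (32, 128),
               (48, 192), (48, 192), (64, 256), (64, 256)]

total = conv2d_params(7, 3, 96)          # conv1 (7x7 kernel inferred)
channels = 96
for squeeze, expand in FIRE_WIDTHS:      # max pooling does not change channels
    params, channels = fire_params(channels, squeeze, expand)
    total += params
total += conv2d_params(1, channels, 4)   # conv10: 512 -> 4 classes
print(total)
```

With about 737k parameters, this network is roughly 8.8 times smaller than the original model's 6.5M, mainly because it replaces the large dense layer with a 1×1 convolution followed by global average pooling.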
The following training runs have been performed.
No. | Learning rate | Batch size | Images | Model | Optimizer | Epochs | s/epoch | Training loss / acc. | Test acc. |
001 | --- | 64 | all | get_model | RMSPROP | 25 | 600s | loss: 0.25 - acc: 0.85 | 44% |
002 | 0.0001 | 64 | 2000 | get_model | SGD | 25 | 50s | loss: 0.3328 - acc: 0.8896 | 42.4% |
003 | 0.001 | 64 | 2000 | get_model | SGD | 25 | 50s | loss: 0.1373 - acc: 0.9585 | 48.4% / 49.6% |
004 | 0.001 | 16 | 2000 | get_model | SGD | 25 | 80s | loss: 0.1613 - acc: 0.9545 | 47.5% / 50.1% |
005 | 0.001 | 128 | 2000 | get_model | SGD | 25 | 50s | loss: 0.1627 - acc: 0.9521 | 42.9% |
006 | 0.005 | 64 | 2000 | get_model | SGD | 25 | 50s | loss: 0.1304 - acc: 0.9673 | 44.3% |
007 | 0.001 | 64 | all | get_model | SGD | 50 | 850s | loss: 0.0919 - acc: 0.9737 | 50.0% |
008 | 0.001 | 64 | 2000 | squeezze | Adam | 3 | 380s | loss: 0.7187 - acc: 0.7310 | 51.2% / 54.8% |
009 | 0.01 | 64 | 2000 | squeezze | Adam | 3 | 371s | loss: 5.7216 - acc: 0.6450 | 67.0% |
010 | 0.005 | 64 | 2000 | squeezze | Adam | 3 | 370s | loss: 0.8576 - acc: 0.6704 | 65.8% / 65.8% |
011 | 0.005 | 64 | 2000 | squeezze | Adam | 25 | 349s | loss: 5.7216 - acc: 0.6450 | 66.1% |
using Udacity data from here onwards |
012 | 0.001 | 64 | all-ud | squeezze | Adam | 25 | 349s | loss: 0.4217 - acc: 0.8324 | DATA MISMATCH |
013 | 0.001 | 64 | all-ud | squeezze | Adam | 25 | 338s | loss: 0.6871 - acc: 0.7021 | DATA MISMATCH |
014 | 0.001 | 64 | all-ud | squeezze | SGD | 25 | 260s | loss: 1.1941 - acc: 0.4973 | 51.7% |
015 | 0.001 | 64 | all-ud | get_model | SGD | 3 | 500s | loss: 1.0270 - acc: 0.6028 | 46.4% |
016 | 0.001 | 64 | all-ud | get_model | SGD | 25 | 450s | loss: 0.6067 - acc: 0.7272 | 38.1% |
Test accuracy is measured by predicting a completely unseen test set that was split off from the original dataset.
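A hold-out split like this is usually done per class, so that each class keeps the same share in the training and test sets and no image appears in both. A minimal stratified split sketch follows; the 80/20 ratio, the fixed seed and the function name are assumptions for illustration, not the split used in the original notebooks.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(samples: Sequence[Tuple[str, Hashable]],
                     test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List, List]:
    """Split (path, label) pairs into train/test sets, keeping class proportions.

    Hypothetical helper; test_fraction and seed are illustrative defaults.
    """
    by_label: Dict[Hashable, List] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for label in sorted(by_label, key=str):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

For example, `stratified_split([("red_0.jpg", 0), ...])` on 10 red and 20 unknown images puts 2 red and 4 unknown images into the test set.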
2.2 Using Yolo v2
1. Install YOLO according to the link above
2. Copy the files into the folder ./test
3. ./flow --test /Users/rainerbareiss/Downloads/traffic_light_images/PATH --model cfg/tiny-yolo-udacity.cfg --load 8987 --json
4. ./flow --test test/ --model cfg/tiny-yolo-udacity.cfg --load 8987
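With `--json`, darkflow writes the detections of each image as a JSON list of objects with `label`, `confidence`, `topleft` and `bottomright` keys. The sketch below pulls the traffic light boxes out of one such result, e.g. to crop them for a classifier or to count how many lights were found. The label string `trafficLight` and the 0.3 confidence threshold are assumptions; check the labels file of the model you load.

```python
import json
from typing import Dict, List


def traffic_light_boxes(json_text: str, label: str = "trafficLight",
                        min_confidence: float = 0.3) -> List[Dict[str, int]]:
    """Extract traffic light boxes from one darkflow --json result.

    The label name and confidence threshold are assumptions, not values
    from our runs.
    """
    boxes = []
    for det in json.loads(json_text):
        if det["label"] != label or det["confidence"] < min_confidence:
            continue  # other object class or too uncertain
        boxes.append({
            "x1": det["topleft"]["x"], "y1": det["topleft"]["y"],
            "x2": det["bottomright"]["x"], "y2": det["bottomright"]["y"],
        })
    return boxes
```

Comparing the length of the returned list with the number of lights visible in an image gives a simple per-image measure of missed detections.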
The following image shows a sample picture in which YOLO detected two cars in the other lane and a traffic light.
This image shows the problem: only one of the three traffic lights present was detected.
Because of this behavior, the approach was not pursued further.
Applying a color classifier from GitHub only detected a lot of black and grey and no prominent green or red in the images of dataset 1.3. This approach was not pursued further.
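For reference, a simple color classifier of this kind votes on pixel hues. The sketch below is not the GitHub classifier we tried; it is a minimal illustration, assuming RGB pixels in the range 0-255, that skips dark and low-saturation pixels (the black and grey that dominated our results) before voting between red and green. All thresholds and hue ranges are arbitrary assumptions.

```python
import colorsys
from typing import Iterable, Tuple


def dominant_light_color(pixels: Iterable[Tuple[int, int, int]],
                         min_saturation: float = 0.5,
                         min_value: float = 0.5) -> str:
    """Vote 'red', 'green' or 'unknown' from RGB pixels (0-255).

    Dark or washed-out pixels are ignored; thresholds are illustrative.
    """
    red = green = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation or v < min_value:
            continue  # black, grey or white: no usable color information
        hue_deg = h * 360.0
        if hue_deg < 20 or hue_deg > 340:
            red += 1
        elif 90 <= hue_deg <= 180:
            green += 1
    if red == green:
        return "unknown"
    return "red" if red > green else "green"
```

On a full camera frame, the lit lamp covers only a handful of pixels, so any such vote is fragile unless the light has first been localized.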
This is the approach that went into the final release code for the simulator traffic light detector; it is described in further detail in the Jupyter notebooks. The model reached 100% accuracy on the simulator test dataset, but had a hard time identifying RED lights in the front camera images from the sample rosbags that Udacity provided. This GAN model is based on the TensorFlow model described in the Semi-Supervised Learning section of Udacity AIND term 2:
- Traffic Light GAN for Simulator: GAN-Semi-Supervised-sim/gan_semi_supervised.ipynb
- Refined Traffic Light GAN for New Simulator: GAN-Semi-Supervised-sim/gan_semi_supervised-new-sim.ipynb
- Traffic Light GAN for Carla via ROSBAG: GAN-Semi-Supervised-site/gan_semi_supervised.ipynb
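In a semi-supervised GAN, the discriminator doubles as the classifier: it outputs one logit per real class, and a common formulation (which we assume the Udacity-based notebooks follow) derives the real-vs-fake logit by fixing the logit of the extra "fake" class at zero, which gives logsumexp(class_logits). A minimal pure-Python sketch of that step, with hypothetical function names:

```python
import math
from typing import List, Sequence


def logsumexp(logits: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def real_probability(class_logits: Sequence[float]) -> float:
    """P(real) with the implicit fake-class logit fixed at 0.

    sigmoid(logsumexp(l)) == Z / (Z + 1) with Z = sum(exp(l)).
    """
    return sigmoid(logsumexp(class_logits))


def class_probabilities(class_logits: Sequence[float]) -> List[float]:
    """Softmax over the real classes, used for the supervised loss and prediction."""
    lse = logsumexp(class_logits)
    return [math.exp(x - lse) for x in class_logits]
```

The unsupervised loss only needs `real_probability`, so unlabeled camera frames can still contribute to training, while the few labeled frames train `class_probabilities`.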
The key hint that finally solved our traffic light classifier problem for Carla came from Anthony S in the Slack channel #p-system-integration in this post: use the TensorFlow Object Detection API, as described below.
The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models, i.e. models capable of localizing and identifying multiple objects in a single image. Rainer first tested the API with a pre-trained model to see how well it worked on the rosbag front camera images, and the results, shown below, were indeed very promising:
Sebastian then tested it on the rosbag images; here is the resulting GIF:
Faster R-CNN really kept its promise of a fast and powerful algorithm when we fine-tuned a pretrained model on a small number of our own training images. If you have a GPU for retraining the model locally, clone the TensorFlow models GitHub repository and follow the instructions below to build, test and verify the Object Detection model:
cd SDC-System-Integration/classifier/faster-R-CNN
git clone
cd models/research
sudo apt-get install protobuf-compiler python-pil python-lxml
protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
python object_detection/builders/
Download and extract the pre-trained model and weights from (5gb):
cd SDC-System-Integration/classifier/faster-R-CNN
ls -l faster_rcnn_resnet101_coco_11_06_2017
total 659048
-rw-r----- 1 demo demo 196890839 Jun 11 20:58 frozen_inference_graph.pb
-rw-r----- 1 demo demo 20932156 Jun 11 20:58 graph.pbtxt
-rw-r----- 1 demo demo 445812832 Jun 11 21:00
-rw-r----- 1 demo demo 40521 Jun 11 21:00 model.ckpt.index
-rw-r----- 1 demo demo 11175327 Jun 11 21:00 model.ckpt.meta
We followed the API instructions for creating a dataset here and used the pretrained model and weights in a script to help generate the training and validation sets. We used two pre-generated, hand-labeled CSV files to generate our training and validation sets. NOTE: This step has already been done.
python --infilename just_traffic_light.csv --outfilename data/train.record
python --infilename loop_with_traffic_light.csv --outfilename data/test.record
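As a hedged sketch of the core conversion step, the snippet below groups hand-labeled boxes per image and normalizes the pixel coordinates to [0, 1], which is the form the Object Detection API's `image/object/bbox/*` features expect; class ids in the API's label maps start at 1. The CSV column names, the label map and the function name are hypothetical, and serializing the result into `tf.train.Example` records is left out.

```python
import csv
import io
from typing import Dict


def group_boxes(csv_text: str, label_map: Dict[str, int]) -> Dict[str, dict]:
    """Group hand-labeled boxes per image, normalized for the Object Detection API.

    Assumes hypothetical columns: filename,width,height,class,xmin,ymin,xmax,ymax
    (pixel coordinates). The lists map onto image/object/bbox/{xmin,xmax,ymin,ymax}
    and image/object/class/{text,label}.
    """
    images: Dict[str, dict] = {}  # insertion-ordered (Python 3.7+)
    for row in csv.DictReader(io.StringIO(csv_text)):
        width, height = float(row["width"]), float(row["height"])
        entry = images.setdefault(row["filename"], {
            "width": int(width), "height": int(height),
            "xmin": [], "xmax": [], "ymin": [], "ymax": [],
            "class_text": [], "class_label": [],
        })
        # Box coordinates are stored relative to the image size.
        entry["xmin"].append(float(row["xmin"]) / width)
        entry["xmax"].append(float(row["xmax"]) / width)
        entry["ymin"].append(float(row["ymin"]) / height)
        entry["ymax"].append(float(row["ymax"]) / height)
        entry["class_text"].append(row["class"])
        entry["class_label"].append(label_map[row["class"]])  # ids start at 1
    return images
```

Grouping by file name matters because one camera frame can contain several traffic lights, and all of their boxes belong in a single record.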
Move the model training configuration into position for training:
mkdir models/research/model
cp faster_rcnn_resnet101_tl.config models/research/model
We reduced the maximum number of predictions in the model configuration from 300 to 4 and the number of labels from 900 to 4, which increased the prediction rate from 3 per second to over 13 per second with the faster_rcnn_resnet101_tl.config configuration. Retrain the model on the just_traffic_light.bag rosbag data:
cd models/research
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
python object_detection/ --logtostderr --pipeline_config=model/faster_rcnn_resnet101_tl.config --train_dir=../../data
In a separate terminal, launch TensorBoard to monitor the training:
cd models/research
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
tensorboard --logdir=../../data
After the training is complete, freeze the latest model, i.e. the checkpoint with the highest number (assuming 18871 for this example):
mkdir data2
cd models/research
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
ls ../../data/model.ckpt*
python object_detection/ --input_type image_tensor --pipeline_config_path model/faster_rcnn_resnet101_tl.config --trained_checkpoint_prefix ../../data/model.ckpt-18871 --output_directory ../../data2
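Picking the highest checkpoint number from the `ls` output can be scripted; note that a plain lexicographic sort would wrongly rank `model.ckpt-9999` above `model.ckpt-18871`, so the step number has to be parsed. A small sketch with a hypothetical function name (the highest-numbered checkpoint is the latest one, not necessarily the best one):

```python
import re
from typing import Iterable, Optional

# Matches e.g. "model.ckpt-18871.index" or "../../data/model.ckpt-18871.meta".
_CKPT_RE = re.compile(r"model\.ckpt-(\d+)\.")


def latest_checkpoint_prefix(filenames: Iterable[str]) -> Optional[str]:
    """Return the checkpoint prefix with the highest step number, or None."""
    best_step, best_prefix = -1, None
    for name in filenames:
        match = _CKPT_RE.search(name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Drop the trailing ".index" / ".meta" / ".data-..." part.
            best_prefix = name[: match.end() - 1]
    return best_prefix
```

The returned prefix is what `--trained_checkpoint_prefix` expects. TensorFlow's `tf.train.latest_checkpoint` offers a similar lookup based on the `checkpoint` state file in the training directory.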
After the frozen model weights have been generated, move them into the checkpoints directory. You can then test them with the Udacity sample rosbags:
cd classifier/faster-R-CNN
mv data2/frozen_inference_graph.pb checkpoints/frozen_inference_graph.pb
cd ../../tools
python --datasets /path/to/just_traffic_light.bag
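A model exported by the Object Detection API returns, per image, the tensors `detection_boxes`, `detection_scores`, `detection_classes` and `num_detections`. Turning those into a single light state can be sketched as below: take the highest-scoring detection above a threshold and map its class id to a state. The label map, the 0.5 threshold and the function name are hypothetical; the ids must match the label map used for training.

```python
from typing import Dict, Sequence

# Hypothetical label map; ids must match the one used for training.
LABELS: Dict[int, str] = {1: "green", 2: "red", 3: "yellow", 4: "off"}


def light_state(scores: Sequence[float], classes: Sequence[float],
                num_detections: int, min_score: float = 0.5) -> str:
    """Return the state of the highest-scoring detection above min_score.

    scores/classes are one image's detection_scores and detection_classes
    (the exported graph returns class ids as floats).
    """
    best_score, best_class = min_score, None
    for score, cls in zip(scores[:num_detections], classes[:num_detections]):
        if score >= best_score:
            best_score, best_class = score, int(cls)
    # None or an unexpected id both fall back to "unknown".
    return LABELS.get(best_class, "unknown")
```

Falling back to "unknown" when nothing clears the threshold lets the downstream planner treat low-confidence frames conservatively.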
The resulting model correctly identified the traffic light in the sample rosbag, and its current state, with high confidence:
If you want to use your newly generated frozen model weights for deployment, create new frozen chunks:
cd ../classifier/faster-R-CNN
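As an illustration only, splitting a large file such as frozen_inference_graph.pb into fixed-size chunks and joining them again can be sketched in Python as follows. The 50 MB chunk size, the `.partNNN` naming and both function names are assumptions, not the scheme used in our repository.

```python
import tempfile
from pathlib import Path
from typing import List


def split_file(path: str, chunk_bytes: int = 50 * 1024 * 1024) -> List[Path]:
    """Write path.part000, path.part001, ... of at most chunk_bytes each."""
    src = Path(path)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            data = f.read(chunk_bytes)
            if not data:
                break
            part = src.with_name(f"{src.name}.part{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: List[Path], out_path: str) -> None:
    """Concatenate chunks back into one file (zero-padded names sort in order)."""
    with open(out_path, "wb") as out:
        for part in sorted(parts):
            out.write(Path(part).read_bytes())


if __name__ == "__main__":
    # Round-trip demo on a small temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "frozen_inference_graph.pb"
        original.write_bytes(bytes(range(256)) * 4)
        chunks = split_file(str(original), chunk_bytes=300)
        join_files(chunks, str(Path(tmp) / "joined.pb"))
        assert (Path(tmp) / "joined.pb").read_bytes() == original.read_bytes()
        print(len(chunks), "chunks")
```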
2.6 Haar Classifier
This approach was considered as an alternative to using YOLO and cropping traffic light images, but it was not pursued.
DenseNet, recently published on Medium and on GitHub, was one idea we would have tried if needed.