Semantic segmentation means understanding an image at the pixel level, i.e., assigning an object class to each pixel in the image. In this project, the goal is to label the pixels of a road in images using a Fully Convolutional Network (FCN).
The goals of this project are the following:
- Loading a pretrained VGG model into TensorFlow
- Creating the layers for a fully convolutional network
- Building skip layers using the VGG layers
- Optimizing the model loss
- Training the model on segmented images (3 classes: background, road, other-road)
Download the Kitti Road dataset from here. Extract the dataset in the data folder. This will create the folder data_road with all the training and test images.
The current FCN8 implementation follows this paper:
- Encoder: load the VGG model and the weights for layers 3, 4 and 7
- Conv1x1: add a convolution filter with kernel_size (1,1) to the last layer to keep spatial information
- Decoder: add 3 upsampling layers with skip connections in between to bring in information from multiple resolutions
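The data flow through the encoder outputs, the 1x1 convolution and the decoder can be sketched at the level of tensor shapes. This is a minimal NumPy illustration, not the project's TensorFlow code. The 160x576 input size and the untrained random weights are assumptions. Nearest-neighbour upsampling stands in for the learned upsampling layers, and 1x1 convolutions are also applied to the skip layers so that their channel counts match (a common FCN-8 choice, assumed here).

```python
import numpy as np

NUM_CLASSES = 3  # background, road, other-road


def conv1x1(features, weights):
    """1x1 convolution: a per-pixel linear map over the channel axis."""
    # features: (H, W, C_in), weights: (C_in, C_out) -> (H, W, C_out)
    return features @ weights


def upsample(features, factor):
    """Nearest-neighbour upsampling (stand-in for a learned upsampling layer)."""
    return features.repeat(factor, axis=0).repeat(factor, axis=1)


def fcn8_decoder(layer3, layer4, layer7, rng):
    """Fuse VGG layers 3, 4 and 7 into per-pixel class scores at input resolution."""
    def random_weights(channels):
        # Hypothetical untrained weights; a real model learns these.
        return rng.normal(scale=0.01, size=(channels, NUM_CLASSES))

    score7 = conv1x1(layer7, random_weights(layer7.shape[-1]))
    score4 = conv1x1(layer4, random_weights(layer4.shape[-1]))
    score3 = conv1x1(layer3, random_weights(layer3.shape[-1]))

    fused4 = upsample(score7, 2) + score4  # 1/32 -> 1/16, skip from layer 4
    fused3 = upsample(fused4, 2) + score3  # 1/16 -> 1/8, skip from layer 3
    return upsample(fused3, 8)             # 1/8 -> full resolution


rng = np.random.default_rng(0)
height, width = 160, 576  # assumed input size
layer3 = rng.normal(size=(height // 8, width // 8, 256))
layer4 = rng.normal(size=(height // 16, width // 16, 512))
layer7 = rng.normal(size=(height // 32, width // 32, 4096))
logits = fcn8_decoder(layer3, layer4, layer7, rng)
print(logits.shape)  # (160, 576, 3)
```

Each skip connection is an element-wise sum, which is why every layer is first reduced to `NUM_CLASSES` channels and upsampled to the matching resolution before being added.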
- Data processing:
  - Categorization: the Kitti dataset provides images and ground truth images; the ground truth images should be categorized into 3 classes: background, road and other_road
  - Data augmentation: flipping the images adds more data to the training set
- Learning rate: 0.0001
- drop_out: 0.5
- Number of Epochs: 1000
- Batch_size: 16
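The two data processing steps listed above, categorization and flipping, can be sketched as follows. This is a hedged NumPy illustration rather than the project's own helper code. The ground truth colours (magenta for the current road, black for the other road, everything else background) are an assumption that should be checked against the dataset, and the names `categorize` and `flip_pair` are hypothetical.

```python
import numpy as np

# Assumed ground-truth colours (RGB); verify against the dataset before use.
ROAD_COLOR = (255, 0, 255)    # current road, assumed magenta
OTHER_ROAD_COLOR = (0, 0, 0)  # other road, assumed black


def categorize(gt_image):
    """Turn an (H, W, 3) RGB ground-truth image into an (H, W, 3) one-hot label."""
    road = np.all(gt_image == ROAD_COLOR, axis=-1)
    other_road = np.all(gt_image == OTHER_ROAD_COLOR, axis=-1)
    background = ~(road | other_road)
    # Channel order: background, road, other_road
    return np.stack([background, road, other_road], axis=-1).astype(np.float32)


def flip_pair(image, label):
    """Mirror an image and its label left-to-right so they stay aligned."""
    return image[:, ::-1], label[:, ::-1]


# A 1x3 toy ground-truth image: one pixel of each class.
gt = np.array([[[255, 0, 0], [255, 0, 255], [0, 0, 0]]], dtype=np.uint8)
label = categorize(gt)
print(label.argmax(axis=-1))          # [[0 1 2]]
_, flipped_label = flip_pair(gt, label)
print(flipped_label.argmax(axis=-1))  # [[2 1 0]]
```

The image and its label must be flipped together; flipping only one of them would pair each pixel with the wrong class.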
Here are the results for 3 models: the first model was trained for 1000 epochs without augmentation on a single road, the second was trained for 500 epochs without augmentation on 2 roads, and the last was trained for 1000 epochs with augmentation on 2 roads.
| Single Road (1000 epochs without augmentation) | Multiple Roads (500 epochs without augmentation) | Multiple Roads (1000 epochs with augmentation) |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
As shown above, training the network on 2 roads needs more data with a greater variety of views of the other side of the road. Overall, all networks performed very well at 500-1000 epochs. Further improvement would require more images and viewing angles. Some preprocessing could also improve the results by removing noisy areas from the road detection, for example by limiting the scope of the image to filter out the sky.
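One way to limit the scope of the image is to discard road detections above an assumed horizon line. The sketch below is only an illustration of that idea: the function name `suppress_above_horizon` and the `horizon_fraction` parameter are hypothetical, and a suitable fraction would need to be chosen per camera setup.

```python
import numpy as np


def suppress_above_horizon(road_mask, horizon_fraction=0.4):
    """Clear road detections in the top part of a boolean (H, W) mask."""
    if not 0.0 <= horizon_fraction <= 1.0:
        raise ValueError("horizon_fraction must be between 0 and 1")
    cutoff = int(road_mask.shape[0] * horizon_fraction)
    cleaned = road_mask.copy()   # leave the caller's mask untouched
    cleaned[:cutoff] = False     # rows above the assumed horizon cannot be road
    return cleaned


noisy_mask = np.ones((10, 4), dtype=bool)  # toy mask: everything flagged as road
cleaned = suppress_above_horizon(noisy_mask, horizon_fraction=0.5)
print(int(cleaned.sum()))  # 20
```

This is applied to the predicted mask rather than to the input image, so the network still sees the full frame.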
Here is the TensorBoard graph of the loss minimization:
I saved the TensorFlow model and loaded it in my Jupyter notebook experiment.ipynb. The model processes each frame individually.
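Processing each frame individually can be sketched as a loop that calls the model once per frame and overlays the predicted road pixels. This is a minimal illustration, not the contents of the notebook: `predict_road` is a hypothetical callable standing in for the loaded model, and the green overlay colour and blending factor are assumptions.

```python
import numpy as np


def overlay_road(frame, road_mask, color=(0, 255, 0), alpha=0.5):
    """Blend a colour over the pixels of an (H, W, 3) uint8 frame flagged in road_mask."""
    blended = frame.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    blended[road_mask] = (1 - alpha) * blended[road_mask] + alpha * tint
    return blended.astype(np.uint8)


def process_frames(frames, predict_road):
    """Run the model on each frame independently; no state is shared between frames."""
    for frame in frames:
        yield overlay_road(frame, predict_road(frame))


def dummy_predictor(frame):
    """Hypothetical stand-in for the trained model: marks the bottom half as road."""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[frame.shape[0] // 2:] = True
    return mask


frames = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(3)]
results = list(process_frames(frames, dummy_predictor))
print(len(results))  # 3
```

Because no information is carried from one frame to the next, the predictions may flicker between consecutive frames.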
Make sure you have the following installed:
Run the following command to run the project:
python main.py