This repository contains the code and images necessary to calibrate a camera, perform perspective transforms on images, and ultimately detect and track lane line features from dash cam footage. I wrote a more in-depth article about the process on Medium.
When working with a camera, there is always a certain amount of lens distortion. Lens distortion means that the relative distances and sizes of objects near the center of the photo and near its edges are not accurately preserved. The amount of distortion depends on the camera. Luckily, a distorted photo and a distortion-corrected photo are related by a fixed mapping determined by the lens, which means we can compute that mapping once for each unique camera.
To do so, we need to calibrate the camera using a chessboard. OpenCV includes functions to extract the camera matrix and distortion coefficients from chessboard images, because a chessboard's grid of corners is perfectly uniform and easy to detect. The process looks like this:
Fig1. - The coordinates of the corners are stored in a list.

The corner coordinates are passed to cv2.calibrateCamera(), which returns the camera matrix and distortion coefficients. These are then passed to cv2.undistort() to correct photos for distortion.
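As a rough sketch of that calibration step (the 9x6 inner-corner count and the camera_cal/ folder name are illustrative assumptions, not necessarily what this repo uses):

import glob
import cv2 as cv
import numpy as np

nx, ny = 9, 6  # inner corners per row/column of the chessboard

# The "object points" are the known, perfectly uniform 3D corner grid
objp = np.zeros((nx * ny, 3), np.float32)
objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for fname in glob.glob('camera_cal/*.jpg'):
    gray = cv.cvtColor(cv.imread(fname), cv.COLOR_BGR2GRAY)
    found, corners = cv.findChessboardCorners(gray, (nx, ny), None)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# calibrateCamera returns the camera matrix (mtx) and distortion
# coefficients (dist) that cv2.undistort consumes
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)

undistorted = cv.undistort(cv.imread('test.jpg'), mtx, dist, None, mtx)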
When we are using very traditional computer vision and machine learning techniques, it's often the case that binary (black and white / zero and one) matrices/images are the best to work with. This is not the same as grayscale, where a single color channel holds pixel values in the range [0, 255]. Rather, we must apply filters to a grayscale image to push each pixel's value to either 0 or 1 (often stored as 255 so the image displays as black and white).
I chose to use an ensemble of filters. Each pixel had to meet either a threshold on Sobel gradients in the x and y directions or a threshold on gradient magnitude and direction. This ensemble filter was then combined via bitwise_or with a threshold filter on the saturation channel of the HLS color space. My final pipeline looks like this:
import cv2 as cv
import numpy as np

def binary_pipeline(img):
    # Light blur to suppress noise before thresholding
    img_copy = cv.GaussianBlur(img, (3, 3), 0)

    # Color channel threshold (saturation and lightness from HLS)
    s_binary = hls_select(img_copy, sthresh=(140, 255), lthresh=(120, 255))

    # Sobel gradient thresholds in x and y
    x_binary = abs_sobel_thresh(img_copy, thresh=(25, 200))
    y_binary = abs_sobel_thresh(img_copy, thresh=(25, 200), orient='y')

    # Gradient magnitude and direction thresholds
    mag_binary = mag_threshold(img_copy, sobel_kernel=3, thresh=(30, 100))
    dir_binary = dir_threshold(img_copy, sobel_kernel=3, thresh=(0.8, 1.2))

    # Ensemble: (x AND y) OR (magnitude AND direction)
    gradient = np.zeros_like(s_binary)
    gradient[((x_binary == 1) & (y_binary == 1)) | ((mag_binary == 1) & (dir_binary == 1))] = 1

    # Union of the gradient ensemble and the color threshold
    final_binary = cv.bitwise_or(s_binary, gradient)
    return final_binary
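The helper functions referenced above live elsewhere in the repo. As a rough illustration, minimal versions of hls_select and abs_sobel_thresh might look like this (my sketch, assuming BGR input, not the exact implementations):

import cv2 as cv
import numpy as np

def abs_sobel_thresh(img, orient='x', sobel_kernel=3, thresh=(0, 255)):
    # Scaled absolute Sobel gradient, thresholded into a binary image
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    dx, dy = (1, 0) if orient == 'x' else (0, 1)
    sobel = np.absolute(cv.Sobel(gray, cv.CV_64F, dx, dy, ksize=sobel_kernel))
    scaled = np.uint8(255 * sobel / np.max(sobel))
    binary = np.zeros_like(scaled)
    binary[(scaled >= thresh[0]) & (scaled <= thresh[1])] = 1
    return binary

def hls_select(img, sthresh=(0, 255), lthresh=(0, 255)):
    # Threshold the saturation and lightness channels of the HLS color space
    hls = cv.cvtColor(img, cv.COLOR_BGR2HLS)
    l, s = hls[:, :, 1], hls[:, :, 2]
    binary = np.zeros_like(s)
    binary[(s >= sthresh[0]) & (s <= sthresh[1]) &
           (l >= lthresh[0]) & (l <= lthresh[1])] = 1
    return binary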
Now that we have the lanes in a nice, easy-to-work-with binary form, we need to remove all of the other extraneous information and look only at the lane lines. We can use cv2.getPerspectiveTransform and cv2.warpPerspective to generate a perspective transform between a source polygon and a destination polygon and then apply that transform to an image. It looks like this:
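Here is a minimal sketch of that warp step; the src and dst points below are illustrative assumptions, not the exact polygon used in this repo:

import cv2 as cv
import numpy as np

def warp_lane(img):
    h, w = img.shape[:2]
    # Hand-picked trapezoid around the lane in the source image
    src = np.float32([[w * 0.45, h * 0.63], [w * 0.55, h * 0.63],
                      [w * 0.90, h * 0.95], [w * 0.10, h * 0.95]])
    # Rectangle it should map to in the bird's-eye view
    dst = np.float32([[w * 0.20, 0], [w * 0.80, 0],
                      [w * 0.80, h], [w * 0.20, h]])
    M = cv.getPerspectiveTransform(src, dst)
    Minv = cv.getPerspectiveTransform(dst, src)  # for warping overlays back later
    warped = cv.warpPerspective(img, M, (w, h), flags=cv.INTER_LINEAR)
    return warped, M, Minv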
The Destination image above lends itself nicely to a very simple way to find the lane lines: a histogram! By summing the white (one-valued) pixels in each column and tracking the columns with the greatest counts, we can locate each lane line.
Fig5. - A histogram of white pixels.

The preliminary search works like this:
- Create a search window at the bottom of the image whose height is 1/9 of the image's height.
- Split the window into left and right halves.
- Locate the pixel column with the highest value via histogram.
- Draw a box around that area using a margin variable.
- Identify all of the non-zero pixels in that box. If there are enough, center the box on their mean position for the next window.
- Fit a quadratic equation to all of the non-zero pixels identified in each half of the image (left lane and right lane), as sketched below.
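Here is a condensed sketch of that sliding-window search; the window count matches the 1/9 height above, while the margin and minpix values are illustrative assumptions:

import numpy as np

def sliding_window_search(binary_warped, n_windows=9, margin=100, minpix=50):
    # Column histogram over the bottom half locates the two lane bases
    histogram = np.sum(binary_warped[binary_warped.shape[0] // 2:, :], axis=0)
    midpoint = histogram.shape[0] // 2
    leftx_current = np.argmax(histogram[:midpoint])
    rightx_current = np.argmax(histogram[midpoint:]) + midpoint

    nonzeroy, nonzerox = binary_warped.nonzero()
    window_height = binary_warped.shape[0] // n_windows
    left_inds, right_inds = [], []

    for window in range(n_windows):
        win_y_low = binary_warped.shape[0] - (window + 1) * window_height
        win_y_high = binary_warped.shape[0] - window * window_height
        in_rows = (nonzeroy >= win_y_low) & (nonzeroy < win_y_high)

        # Non-zero pixels inside each margin-wide box
        good_left = (in_rows & (nonzerox >= leftx_current - margin) &
                     (nonzerox < leftx_current + margin)).nonzero()[0]
        good_right = (in_rows & (nonzerox >= rightx_current - margin) &
                      (nonzerox < rightx_current + margin)).nonzero()[0]
        left_inds.append(good_left)
        right_inds.append(good_right)

        # Re-center the next window on the mean x position if enough pixels hit
        if len(good_left) > minpix:
            leftx_current = int(np.mean(nonzerox[good_left]))
        if len(good_right) > minpix:
            rightx_current = int(np.mean(nonzerox[good_right]))

    left_inds = np.concatenate(left_inds)
    right_inds = np.concatenate(right_inds)

    # Fit x = Ay^2 + By + C to each lane's pixels
    left_fit = np.polyfit(nonzeroy[left_inds], nonzerox[left_inds], 2)
    right_fit = np.polyfit(nonzeroy[right_inds], nonzerox[right_inds], 2)
    return left_fit, right_fit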
Once we have the polynomial that best fits each lane line, we can optimize our search from frame to frame by looking only in the neighborhood of that polynomial:
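A sketch of that targeted search, reusing the previous frame's fits (the margin value is an assumption):

import numpy as np

def search_around_poly(binary_warped, left_fit, right_fit, margin=100):
    # Select pixels within +/- margin of each previously fitted polynomial
    nonzeroy, nonzerox = binary_warped.nonzero()
    left_x = np.polyval(left_fit, nonzeroy)
    right_x = np.polyval(right_fit, nonzeroy)
    left_inds = (nonzerox > left_x - margin) & (nonzerox < left_x + margin)
    right_inds = (nonzerox > right_x - margin) & (nonzerox < right_x + margin)

    # Re-fit quadratics to the newly selected pixels
    new_left = np.polyfit(nonzeroy[left_inds], nonzerox[left_inds], 2)
    new_right = np.polyfit(nonzeroy[right_inds], nonzerox[right_inds], 2)
    return new_left, new_right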
Fig7. - Local area search for lane lines.

Then we can pass an overlay back to the original frame covering the area between the curves:
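A sketch of that overlay step, assuming Minv is the inverse perspective matrix from the warp step and undist is the distortion-corrected frame:

import cv2 as cv
import numpy as np

def draw_lane_overlay(undist, binary_warped, left_fit, right_fit, Minv):
    # Evaluate both fits at every row and build a closed polygon between them
    ploty = np.arange(binary_warped.shape[0])
    left_x = np.polyval(left_fit, ploty)
    right_x = np.polyval(right_fit, ploty)
    pts = np.vstack([np.column_stack([left_x, ploty]),
                     np.column_stack([right_x, ploty])[::-1]]).astype(np.int32)

    overlay = np.zeros_like(undist)
    cv.fillPoly(overlay, [pts], (0, 255, 0))

    # Warp the overlay back to the dash cam perspective and blend it in
    unwarped = cv.warpPerspective(overlay, Minv, (undist.shape[1], undist.shape[0]))
    return cv.addWeighted(undist, 1.0, unwarped, 0.3, 0)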
Fig8. - Dash cam footage with lane overlay.

Polynomials are great because they are pure mathematics! We can use the formula for radius of curvature to extract some useful information about the road itself. Also, lane lines are a standardized size, which means we can convert from pixel space to metric space and measure where the car sits in the lane.
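For a quadratic fit x = Ay^2 + By + C, the radius of curvature at a point y is R = (1 + (2Ay + B)^2)^(3/2) / |2A|. Here is a sketch of the measurement; the meters-per-pixel constants are assumptions based on US lane standards (about 3.7 m lane width, roughly 30 m of visible road), not values taken from this repo:

import numpy as np

# Assumed pixel-to-meter conversions for the warped image
YM_PER_PIX = 30 / 720
XM_PER_PIX = 3.7 / 700

def curvature_and_offset(left_fit, right_fit, binary_warped):
    h, w = binary_warped.shape[:2]
    ploty = np.arange(h)
    left_x = np.polyval(left_fit, ploty)
    right_x = np.polyval(right_fit, ploty)

    # Re-fit the left lane in metric space so the radius comes out in meters
    A, B, _ = np.polyfit(ploty * YM_PER_PIX, left_x * XM_PER_PIX, 2)
    y_eval = (h - 1) * YM_PER_PIX  # evaluate at the bottom of the frame
    radius = (1 + (2 * A * y_eval + B) ** 2) ** 1.5 / abs(2 * A)

    # Offset: image center vs. lane center at the bottom of the frame
    lane_center = (left_x[-1] + right_x[-1]) / 2
    offset = (w / 2 - lane_center) * XM_PER_PIX
    return radius, offset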
We can extract those features and output them to the frame:

Fig9. - Radius of curvature and vehicle offset in meters.

Now it's time to apply the pipeline to a video! You can check out some footage from California or from my hometown of Chicago!
This project is mostly a showcase of the power of being explicit. We often think of deep learning as a cure-all, but there are situations where explicit computer vision is much better and traditional machine learning is much faster. This project has a very fast backend, but the drawing of bounding boxes, radius readouts, etc. (the image editing) is very slow. I can imagine using a pipeline like this to send information to a robotics system in realtime, but not for displaying a HUD to a driver/passenger. Further, this pipeline is not robust enough to handle the driving conditions it would need to in order to be usable:
- Going uphill or downhill
- Rain/snow/etc
- Poor lighting conditions
- Roads that have little or no lane markers
- Occlusion by vehicles/signs/etc