# Guideline of Sensor Installation for Apollo 3.0
June 27, 2018

## Required Hardware

![Image](images/perception_required_hardware.png)

Peripherals

![Image](images/perception_peripherals.png)

## Coordinate system

Unit: millimeter (mm)

Origin: the center of the rear wheel axle

![Image](images/perception_setup_figure1.png)

**Figure 1. The origin and the coordinate system**

![Image](images/perception_setup_figure2.png)

**Figure 2. Coordinates and installation of cameras and a radar for a truck**

## Sensor Installation Guideline

### IMU/GPS
The IMU/GPS needs to be installed near the rear wheel axle. The GPS antenna needs to be installed on top of the vehicle.

### Radar
The long-range radar needs to be installed on the front bumper of the vehicle, as shown in Figure 1 and Figure 2.

### Camera
One camera with a 6 mm lens should face the front of the ego-vehicle. The front-facing camera needs to be installed either at the center of the front of the vehicle, at a height between 1,600 mm and 2,000 mm from the ground (Camera_1), or on the windshield of the vehicle (Camera_2).

![Image](images/perception_setup_figure3.png)

**Figure 3. Example setup of cameras**

After installing the cameras, the physical x, y, z location of each camera w.r.t. the origin should be saved in the calibration file.

#### Verification of Camera Setups
The orientation of all three cameras should be zero. After a camera is installed, record a rosbag while driving on a straight highway. By replaying the rosbag, the camera orientation should be re-adjusted so that the pitch, yaw, and roll angles are zero degrees. When the camera is correctly installed, the horizon should be at half of the image height and not tilted. The vanishing point should also be at the center of the image. Please see the image below for the ideal camera setup.

![Image](images/perception_setup_figure4.png)

**Figure 4. An example of an image after camera installation. The horizon should be at half of the image height and not tilted. The vanishing point should also be at the center of the image. The red lines mark the center of the width and the height of the image.**
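
If the vanishing point is not exactly at the image center after installation, its offset can be used to estimate the residual pitch and yaw under a simple pinhole model. The sketch below is illustrative only and is not part of the Apollo calibration tools; the intrinsic values (`fx`, `fy`, `cx`, `cy`) are assumed to come from the camera's intrinsic calibration.

```python
import math

def residual_angles_from_vanishing_point(vx, vy, fx, fy, cx, cy):
    """Estimate residual yaw/pitch (radians) of a forward-facing camera
    from the measured vanishing point of a straight road.

    Assumes a pinhole camera with zero roll; (cx, cy) is the principal
    point and (fx, fy) are the focal lengths in pixels.
    """
    yaw = math.atan2(vx - cx, fx)    # horizontal offset -> yaw error
    pitch = math.atan2(vy - cy, fy)  # vertical offset -> pitch error
    return yaw, pitch

# Example: a 1920x1080 image whose vanishing point was measured at (975, 528).
yaw, pitch = residual_angles_from_vanishing_point(
    vx=975, vy=528, fx=1400.0, fy=1400.0, cx=960.0, cy=540.0)
print(math.degrees(yaw), math.degrees(pitch))
```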

An example of the estimated extrinsic transform (rotation and translation) is shown below.
```
header:
  seq: 0
  stamp:
    secs: 0
    nsecs: 0
  frame_id: white_mkz
child_frame_id: onsemi_obstacle
transform:
  rotation:
    x: 0.5
    y: -0.5
    z: 0.5
    w: -0.5
  translation:
    x: 1.895
    y: -0.235
    z: 1.256
```
If the angles are not zero, they need to be calibrated and represented as a quaternion (see transform->rotation above).
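
As a hedged illustration of that conversion (not part of the Apollo tooling), the re-measured roll, pitch, and yaw angles could be turned into the quaternion fields above with SciPy; the axis order below is an assumption and must match the convention used by your calibration file.

```python
from scipy.spatial.transform import Rotation

# Residual camera angles measured from the rosbag replay, in degrees.
roll, pitch, yaw = 0.0, 1.2, -0.4  # example values, not from a real vehicle

# 'xyz' (roll-pitch-yaw) order is an assumption; use your own convention.
quat = Rotation.from_euler("xyz", [roll, pitch, yaw], degrees=True).as_quat()
x, y, z, w = quat  # SciPy returns quaternions in (x, y, z, w) order
print(f"rotation: x={x:.4f} y={y:.4f} z={z:.4f} w={w:.4f}")
```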
# Perception
Apollo 3.0
June 27, 2018

## Introduction
Apollo 3.0 aims for Level-2 autonomous driving with low-cost sensors. The autonomous vehicle stays in its lane and keeps a distance from the closest in-path vehicle (CIPV) using a single front-facing camera and a front radar. Apollo 3.0 supports high-speed autonomous driving on highways without any map. The deep network was trained to process image data, and its performance will improve over time as more data is collected.

***Safety alert***

Apollo 3.0 *does not* support high-curvature roads or roads without lane marks, including local roads and intersections. The perception module is based on visual detection using a deep network trained with limited data. Therefore, before we release a better network, the driver should drive carefully and always be ready to disengage autonomous driving by turning the steering wheel in the proper direction. Please perform test drives in a safe and restricted area.

- ***Recommended road***
  - ***Road with clear white lane lines on both sides***

- ***Avoid***
  - ***High-curvature roads***
  - ***Roads without lane line marks***
  - ***Intersections***
  - ***Botts' dots or dotted lane lines***
  - ***Public roads***

## Perception modules
The flow chart of each module is shown below.

![Image](images/perception_flow_chart_apollo_3.0.png)

**Figure 1: Flow diagram of Apollo 3.0**

### Deep network
The deep network ingests an image and provides two detection outputs for Apollo 3.0: lane lines and objects. There is an ongoing debate on individual versus co-trained tasks for deep learning. Individual networks, such as a lane detection network or an object detection network, usually perform better than one co-trained multi-task network. However, with limited resources, multiple individual networks are costly and consume more processing time. Therefore, for an economical design, co-training is inevitable, with some compromise in performance. In Apollo 3.0, YOLO [1][2] was used as the base network for object and lane segment detection. Objects have vehicle, truck, cyclist, and pedestrian categories and are represented by a 2-D bounding box with orientation information. Lane lines are detected by segmentation using the same network with some modification. For whole lane lines, we have an individual network that provides longer lane lines, whether the lane lines are broken or solid.

### Object detection/tracking
In a traffic scene, there are two kinds of objects: stationary objects and dynamic objects. Stationary objects include lane lines, traffic lights, and thousands of traffic signs written in different languages. Besides these, there are multiple landmarks on the road, used mostly for visual localization, including streetlamps, barriers, bridges over the road, or any skyline. Among stationary objects, Apollo 3.0 detects only lane lines.

Among dynamic objects, Apollo cares about passenger vehicles, trucks, cyclists, pedestrians, and any other objects on the road, including animals or body parts. Apollo can also categorize objects based on which lane each object is in. The most important object is the CIPV (closest object in our path). The next most important objects are the ones in neighboring lanes.

#### 2D-to-3D bounding box

Given a 2D box, with its 3D size and orientation in the camera, this module searches the 3D position in the camera coordinate system and estimates an accurate 3D distance using either the width, the height, or the 2D area of that 2D box. The module works without accurate extrinsic camera parameters.
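
As a rough illustration of how a 2D box can constrain 3D distance (a minimal pinhole-model sketch, not the actual Apollo search procedure), the distance along the optical axis can be estimated from the box height when the object's physical height is known or assumed:

```python
def estimate_distance_from_box_height(box_height_px, object_height_m, fy):
    """Pinhole approximation: distance ~ fy * H / h.

    box_height_px:   height of the detected 2D box in pixels
    object_height_m: assumed physical height of the object class (e.g. ~1.5 m for a car)
    fy:              vertical focal length in pixels (from intrinsic calibration)
    """
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return fy * object_height_m / box_height_px

# Example: a 120 px tall vehicle box with an assumed 1.5 m height and fy = 1400 px.
print(estimate_distance_from_box_height(120, 1.5, 1400.0))  # ~17.5 m
```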

#### Object tracking

The object tracking module utilizes multiple cues such as 3D position, 2D image patches, 2D boxes, and deep-learning ROI features. The tracking problem is formulated as multiple-hypothesis data association by combining the cues efficiently to provide the most correct association between tracks and detected objects, thus obtaining a correct ID association for each object.
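
For intuition only, the sketch below combines two of those cues (3D distance and 2D box overlap) into a cost matrix and solves a single-frame assignment with the Hungarian algorithm; this is a simplification of the multiple-hypothesis association described above, and the weights and gate are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_pos, det_pos, track_iou, w_dist=1.0, w_iou=2.0, gate=5.0):
    """Single-frame association of tracks to detections.

    track_pos, det_pos: (T, 3) and (D, 3) arrays of 3D positions in meters.
    track_iou:          (T, D) array of 2D box IoU between tracks and detections.
    Returns a list of (track_index, detection_index) pairs.
    """
    dist = np.linalg.norm(track_pos[:, None, :] - det_pos[None, :, :], axis=2)
    cost = w_dist * dist - w_iou * track_iou          # lower cost = better match
    rows, cols = linear_sum_assignment(cost)
    # Reject matches whose 3D distance exceeds a simple gating threshold.
    return [(r, c) for r, c in zip(rows, cols) if dist[r, c] < gate]

# Example with 2 tracks and 2 detections.
tracks = np.array([[10.0, 0.0, 0.0], [20.0, 3.5, 0.0]])
dets = np.array([[10.4, 0.1, 0.0], [19.7, 3.4, 0.0]])
iou = np.array([[0.8, 0.0], [0.0, 0.7]])
print(associate(tracks, dets, iou))  # [(0, 0), (1, 1)]
```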

### Lane detection/tracking
Among stationary objects, Apollo 3.0 handles only lane lines. Lanes are used for both longitudinal and lateral control: a lane itself guides lateral control, and an object in the lane guides longitudinal control.

#### Lane lines
We have two types of lane lines: lane mark segments and whole lane lines. Lane mark segments are used for visual localization, and whole lane lines are used for lane keeping.
A lane can be represented by multiple sets of polylines, such as the next left lane line, the left line, the right line, and the next right line. Given a heatmap of lane lines from the deep network, a segmented binary image is generated by thresholding. The method first finds the connected components and detects the inner contours. Then it generates lane marker points based on the contour edges in the ground space of the ego-vehicle coordinate system. After that, it associates these lane markers into several lane line objects with the corresponding relative spatial labels (e.g., left (L0), right (R0), next left (L1), next right (R1), etc.).
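
A minimal sketch of that thresholding and connected-component step is shown below, using OpenCV; it is illustrative only, and the threshold value, the minimum component size, and the omitted image-to-ground projection are assumptions rather than Apollo's actual implementation.

```python
import cv2
import numpy as np

def extract_lane_markers(heatmap, threshold=0.5):
    """Turn a lane-line heatmap (H, W) in [0, 1] into per-component marker points.

    Returns a list of (N_i, 2) arrays of (row, col) pixel coordinates, one per
    connected component; projecting them to the ego ground plane is left out.
    """
    binary = (heatmap >= threshold).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(binary)
    markers = []
    for label in range(1, num_labels):         # label 0 is the background
        points = np.column_stack(np.where(labels == label))
        if len(points) > 20:                   # drop tiny noisy components
            markers.append(points)
    return markers

# Example with a synthetic heatmap containing one bright vertical stripe.
heatmap = np.zeros((240, 320), dtype=np.float32)
heatmap[40:200, 150:154] = 1.0
print(len(extract_lane_markers(heatmap)))  # 1
```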

### CIPV (Closest In-Path Vehicle)
The CIPV is the closest vehicle in the ego-lane. An object is represented by a 3D bounding box, and its 2D projection from the top-down view localizes the object on the ground. Each object is then checked to determine whether it is in the ego-lane or not. Among the objects in the ego-lane, the closest one is selected as the CIPV.
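
The selection logic could look like the following sketch, under simplifying assumptions: the ego-lane is approximated by left/right boundary polylines evaluated at the object's longitudinal position, and "closest" is measured along the ego x-axis.

```python
import numpy as np

def select_cipv(objects, left_line, right_line):
    """Pick the closest object whose ground center lies inside the ego-lane.

    objects:    list of dicts with a 'center' key: (x, y) in ego coordinates,
                x pointing forward, y pointing left.
    left_line, right_line: (N, 2) polylines of the ego-lane boundaries.
    Returns the index of the CIPV, or None.
    """
    cipv, best_x = None, float("inf")
    for i, obj in enumerate(objects):
        x, y = obj["center"]
        # Lane boundary lateral offsets at the object's longitudinal position.
        y_left = np.interp(x, left_line[:, 0], left_line[:, 1])
        y_right = np.interp(x, right_line[:, 0], right_line[:, 1])
        if y_right < y < y_left and 0.0 < x < best_x:
            cipv, best_x = i, x
    return cipv

left = np.array([[0.0, 1.8], [80.0, 1.8]])     # left boundary ~1.8 m to the left
right = np.array([[0.0, -1.8], [80.0, -1.8]])  # right boundary ~1.8 m to the right
objs = [{"center": (35.0, 0.2)}, {"center": (18.0, -0.5)}, {"center": (25.0, 4.0)}]
print(select_cipv(objs, left, right))  # 1 (the closest in-lane object)
```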

### Tailgating
Tailgating is a maneuver to follow the vehicle in front. From the tracked objects and the ego-vehicle motion, the trajectories of objects are estimated. These trajectories describe how the objects move as a group on the road, and their future trajectories can be predicted. There are two kinds of tailgating: pure tailgating, which follows a specific car, and CIPV-guided tailgating, in which the ego-vehicle follows the CIPV's trajectory when no lane line is detected.
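
For CIPV-guided tailgating, one simple way to turn the CIPV's tracked history into a reference path is to fit a low-order polynomial to its past ground positions in ego coordinates; the sketch below is an assumption made for illustration, not Apollo's planner interface.

```python
import numpy as np

def cipv_reference_path(history_xy, degree=2):
    """Fit y = f(x) to the CIPV's past (x, y) ground positions.

    history_xy: (N, 2) array of ego-frame positions, oldest first.
    Returns polynomial coefficients usable with np.polyval.
    """
    history_xy = np.asarray(history_xy, dtype=float)
    return np.polyfit(history_xy[:, 0], history_xy[:, 1], degree)

# Example: CIPV history drifting slightly to the left while moving forward.
history = [(5.0, 0.0), (10.0, 0.1), (15.0, 0.25), (20.0, 0.45), (25.0, 0.7)]
coeffs = cipv_reference_path(history)
print(np.polyval(coeffs, 30.0))  # lateral offset of the path 30 m ahead
```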

A snapshot of the visualized output is shown in Figure 2.
![Image](images/perception_visualization_apollo_3.0.png)

**Figure 2: Visualization of perception output in Apollo 3.0. The top-left image shows the image-based output. The bottom-left image shows the 3D bounding boxes of objects. The main image shows the 3-D top-down view of lane lines and objects. The CIPV is marked with a red bounding box. The yellow lines show the trajectory of each vehicle.**

### Radar + camera fusion
Given multiple sensors, their outputs should be combined in a synergistic fashion. Apollo 3.0 introduces a sensor set with a radar and a camera. For this process, both sensors need to be calibrated. Each sensor is calibrated using the same method introduced in Apollo 2.0. After calibration, the outputs are represented in a 3-D world coordinate system and are fused by their similarity in location, size, and time, and by the utility of each sensor. After learning the utility function of each sensor, the camera contributes more to the lateral distance measurement and the radar contributes more to the longitudinal distance measurement. An asynchronous sensor fusion algorithm is also provided as an option.
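
A much-simplified sketch of that idea is shown below: after a camera and a radar measurement have been associated, their positions are blended with per-axis weights that favor the camera laterally and the radar longitudinally. The weights are made-up placeholders, not Apollo's learned utility functions.

```python
def fuse_position(camera_xy, radar_xy, w_long_radar=0.9, w_lat_camera=0.9):
    """Blend an associated camera/radar pair into one (x, y) estimate.

    x is longitudinal (forward), y is lateral; the radar is trusted more in x,
    the camera more in y. Weights are illustrative placeholders.
    """
    cam_x, cam_y = camera_xy
    rad_x, rad_y = radar_xy
    fused_x = w_long_radar * rad_x + (1.0 - w_long_radar) * cam_x
    fused_y = w_lat_camera * cam_y + (1.0 - w_lat_camera) * rad_y
    return fused_x, fused_y

# Example: camera says (31.0 m, 0.40 m), radar says (29.6 m, 0.90 m).
print(fuse_position((31.0, 0.40), (29.6, 0.90)))  # ~(29.74, 0.45)
```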

### Pseudo lane
All lane detection results are combined spatially and temporally to induce the pseudo lane, which is fed to planning and control. Some lane lines may be incorrect or missing in a certain frame. To provide smooth lane line output, a history of lane lines is kept using vehicle odometry: as the vehicle moves, the odometry of each frame is saved, and the lane lines of previous frames are also saved in the history buffer. A detected lane line that does not match the history lane lines is removed, and the history output replaces it and is provided to the planning module.
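
A hedged sketch of that history check follows: previous lane-line points are transformed into the current frame using the ego motion, and a new detection is accepted only if it stays close to the history. Everything here (the 2-D rigid transform, the interpolation-based distance test, the threshold) is an assumption for illustration.

```python
import numpy as np

def motion_compensate(points, dx, dy, dyaw):
    """Move lane points from the previous ego frame into the current one,
    given the ego motion (dx, dy, dyaw) between the two frames."""
    c, s = np.cos(-dyaw), np.sin(-dyaw)
    shifted = points - np.array([dx, dy])
    return shifted @ np.array([[c, -s], [s, c]]).T

def accept_detection(detected, history, dx, dy, dyaw, max_mean_offset=0.5):
    """Accept a detected lane line only if it agrees with the compensated history."""
    compensated = motion_compensate(history, dx, dy, dyaw)
    # Compare laterally at matching longitudinal positions via interpolation.
    hist_y = np.interp(detected[:, 0], compensated[:, 0], compensated[:, 1])
    return float(np.mean(np.abs(detected[:, 1] - hist_y))) < max_mean_offset

history = np.array([[5.0, 1.8], [15.0, 1.8], [25.0, 1.8]])
detected = np.array([[5.0, 1.75], [15.0, 1.85], [25.0, 1.8]])
# Ego moved 1 m forward with no lateral motion or yaw since the last frame.
print(accept_detection(detected, history, dx=1.0, dy=0.0, dyaw=0.0))  # True
```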

### Ultrasonic sensors
Apollo 3.0 supports ultrasonic sensors. Each ultrasonic sensor provides the distance of a detected object through the CAN bus. The distance measurements from the ultrasonic sensors are gathered and broadcast as a ROS topic. In the future, after fusing the ultrasonic sensors, a map of objects and boundaries will be published as a ROS output.

## Output of perception
The input of PnC will be quite different from that of the previous lidar-based system.

- Lane line output
  - Polyline and/or a polynomial curve
  - Lane type by position: L1 (next left lane line), L0 (left lane line), R0 (right lane line), R1 (next right lane line)

- Object output
  - 3D rectangular cuboid
  - Relative velocity and direction
  - Type: CIPV, PIHP, others
  - Classification type: car, truck, bike, pedestrian
  - Drops: trajectory of an object

The world coordinate is the ego coordinate in 3D, where the center of the rear axle is the origin. An illustrative sketch of these output structures is shown below.
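
The field names below follow the lists above, but the container types, units, and defaults are assumptions made for illustration; the real interface is defined by Apollo's protobuf messages.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LaneLineOutput:
    position: str                          # "L1", "L0", "R0", or "R1"
    polyline: List[Tuple[float, float]]    # (x, y) points in ego coordinates, meters
    poly_coeffs: List[float] = field(default_factory=list)  # optional polynomial fit

@dataclass
class ObjectOutput:
    cuboid: Tuple[float, float, float]     # length, width, height in meters
    center: Tuple[float, float, float]     # position in ego coordinates, meters
    velocity: Tuple[float, float]          # relative velocity (vx, vy), m/s
    obj_type: str                          # "CIPV", "PIHP", or "others"
    classification: str                    # "car", "truck", "bike", "pedestrian"
    drops: List[Tuple[float, float]] = field(default_factory=list)  # past trajectory

cipv = ObjectOutput(cuboid=(4.6, 1.9, 1.5), center=(18.0, -0.5, 0.0),
                    velocity=(-1.2, 0.0), obj_type="CIPV", classification="car")
print(cipv.classification)
```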

## References
[1] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You only look once: Unified, real-time object detection," CVPR 2016.

[2] J. Redmon, A. Farhadi, "YOLO9000: Better, Faster, Stronger," arXiv preprint.