Investigate end-to-end delay in sudden obstacle braking response #5540

Closed · 4 of 8 tasks
xmfcx opened this issue Nov 9, 2023 · 7 comments

Labels: component:perception · component:planning · component:sensing · type:performance

Comments

xmfcx (Contributor) commented Nov 9, 2023

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I've agreed with the maintainers that I can plan this task.

Description

During several tests, a notable delay has been observed in the autonomous system's response to sudden obstacles, particularly when braking is required.

For instance, at 3:32 in a test discussion, the planning/perception components reacted later than expected.
This issue is not isolated, as searching for "reacts late" in the discussion page reveals multiple instances.

The proposed approach is a systematic test in both the planning simulator and AWSIM, measuring the time from the obstacle's appearance to the initiation of the braking maneuver by the ego vehicle.

Purpose

The purpose of this investigation is to ensure that the response time of the Autoware system to sudden obstacles meets the necessary safety requirements.
By identifying the sources of delay within the system, from perception to actuation, we aim to optimize the response time and enhance the safety and reliability of the autonomous driving system.

Possible approaches

  1. Planning Simulator Test:

    • Simulate the ego vehicle traveling at 50 km/h on a straight path.
    • Introduce a sudden obstacle at a predetermined emergency braking distance (e.g., 40 m).
    • Measure the latency from the obstacle's appearance to the initiation of the braking sequence with millisecond precision (see the measurement sketch after this list).
    • Analyze where the most significant delays occur within the system.
  2. AWSIM Test:

    • Develop a detailed test procedure that spans from point-cloud-level detection to the end of the control stack.
    • Measure and document the delays at each stage of the detection and response process.
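As a starting point, the sketch below shows how such a latency probe could look in Python/rclpy. The topic names, message types, the dummy-object spawn interface, and the braking threshold are all assumptions for illustration, not confirmed Autoware interfaces.

```python
# Hedged sketch of a brake-latency probe. Assumptions: the control command is
# an AckermannControlCommand on /control/command/control_cmd, and obstacles
# can be spawned via the dummy_perception_publisher Object message.
import rclpy
from rclpy.node import Node
from autoware_auto_control_msgs.msg import AckermannControlCommand
from dummy_perception_publisher.msg import Object

BRAKE_DECEL_THRESHOLD = -1.0  # m/s^2; commanded decel below this counts as braking

class BrakeLatencyProbe(Node):
    def __init__(self):
        super().__init__('brake_latency_probe')
        self.spawn_pub = self.create_publisher(
            Object, '/simulation/dummy_perception_publisher/object_info', 1)
        self.create_subscription(
            AckermannControlCommand, '/control/command/control_cmd', self.on_cmd, 1)
        self.spawn_time = None
        # Spawn once, a few seconds after startup, so the stack is fully up.
        self.spawn_timer = self.create_timer(5.0, self.spawn_obstacle)

    def spawn_obstacle(self):
        self.spawn_timer.cancel()  # one-shot
        self.spawn_pub.publish(Object())  # pose, shape, action omitted in this sketch
        self.spawn_time = self.get_clock().now()

    def on_cmd(self, msg):
        if self.spawn_time is None:
            return
        if msg.longitudinal.acceleration < BRAKE_DECEL_THRESHOLD:
            latency_ms = (self.get_clock().now() - self.spawn_time).nanoseconds / 1e6
            self.get_logger().info(f'brake reaction latency: {latency_ms:.1f} ms')
            self.spawn_time = None  # report only the first reaction

def main():
    rclpy.init()
    rclpy.spin(BrakeLatencyProbe())
```

Measuring both timestamps with the same node clock keeps the result consistent regardless of whether simulated or system time is in use.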

Definition of done

  • The systematic testing and measurement of delays have been completed in both the planning simulator and AWSIM.
  • A report detailing the delay times at each stage of the detection and reaction process has been compiled.
  • Recommendations for improvements, based on the findings, have been documented.
  • All tests and analyses have been peer-reviewed to ensure accuracy and reproducibility.
  • Any identified bottlenecks have been addressed with proposed solutions, and a follow-up plan for implementation and testing of these solutions is in place.
brkay54 (Member) commented Dec 25, 2023

We have started work on this issue. To see the reaction time, we are developing a package. Our current target design:

[Design diagram: PsimObserver]

For the first tests, we investigated only the reactions of the control topics.

Package: https://github.com/brkay54/autoware.universe/tree/feat/reaction-measure-tool/tools/reaction_analyzer

For an easy start, please read the usage section of the package's README. You can also use the LEO-VM-00001 map; you do not need to edit the entity position in the parameter file, as it is already set for the LEO-VM-00001 map by default.

First test results:
15 tests were made for each case: obstacle_stop_planner with use_predicted_objects, obstacle_stop_planner without use_predicted_objects, and obstacle_cruise_planner.

Case 1 - obstacle_stop_planner with use_predicted_objects:

[Result charts]

Case 2 - obstacle_stop_planner without use_predicted_objects:

[Result charts]

Case 3 - obstacle_cruise_planner:

[Result charts]

Test document:
https://docs.google.com/spreadsheets/d/1qkSAMAeYa1taIg3HsUJKYvRS2nIYbMWsq9-b7bN5Dfw/edit?usp=sharing

In the first reaction tests, obstacle_stop_planner with use_predicted_objects ran unstably: the reaction time was very high in some cases.

brkay54 (Member) commented Jan 4, 2024

The PR that fixes the high reaction times of obstacle_stop_planner when use_predicted_objects is true is ready:
#5794
cc @xmfcx @mitsudome-r

brkay54 (Member) commented Jan 23, 2024

Reaction Tests For Perception Pipeline

The purpose of these tests is to measure the reaction time of each node in the perception pipeline. To achieve this, we developed a test environment using AWSIM. We recorded rosbags in AWSIM so that the perception pipeline can easily be launched on local devices. The disadvantage of this method is that the time resolution is about ~10 ms, because the clock can be published at a maximum of 100 Hz. We used the sample_vehicle with awsim_sensor_kit (which has only 1 lidar), and the tests were made on the awsim-stable branch, but I am going to change the test environment so it can run on the main branches.

Firstly, we wanted to measure the processing time of the pointcloud_preprocessor to see how much time is spent on the sensing side. To measure the delay times in the pointcloud_preprocessor pipeline, we added accumulated_time debug information, which reports the delay inside each node. (PR is here)
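For illustration, here is a minimal sketch of the accumulated-time idea; the real change lives in the C++ pointcloud_preprocessor nodes, and the function and stage names below are hypothetical.

```python
# Each stage adds its own processing time to a running total carried through
# the pipeline, so any stage can report the delay accumulated so far.
import time

def timed_stage(name, fn, cloud, accumulated_ms):
    start = time.perf_counter()
    out = fn(cloud)
    elapsed_ms = (time.perf_counter() - start) * 1e3
    accumulated_ms += elapsed_ms
    print(f'{name}: +{elapsed_ms:.2f} ms (accumulated: {accumulated_ms:.2f} ms)')
    return out, accumulated_ms

# Hypothetical usage, threading the accumulator through the stages:
# cloud, acc = timed_stage('crop_box_filter', crop_box, raw_cloud, 0.0)
# cloud, acc = timed_stage('ring_outlier_filter', ring_outlier, cloud, acc)
# cloud, acc = timed_stage('concatenate', concatenate, cloud, acc)
```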

[Screenshot from 2024-01-23 17-19-22: pointcloud_preprocessor latency measurements]

Because the time resolution is about ~10 ms and the ring_outlier_filter and crop_box_filter processes mostly finished below 10 ms, the results above are not accurate for those nodes. However, the concatenation process takes longer than 10 ms, and we can see that the total pointcloud preprocessing finishes in ~115 ms on average. The most time-consuming process is concatenation.

To measure the times in the perception pipeline, we created a test environment using AWSIM. First, we recorded two different rosbag files: one in an empty area without any obstacle around the ego vehicle, and the other in an area with a car in front of the ego vehicle. In both environments the ego vehicle is stationary in the same position, so all topics except the pointcloud topic are identical.

In the reaction_analyzer node, we took sample messages from the pointcloud topic in the rosbags and stored both pointcloud messages (one without any object, and one with the car in front of the vehicle) at node initialization. The node publishes the obstacle-free pointcloud at the beginning; when we want to spawn the obstacle, it switches to publishing the pointcloud with the car in front of the ego vehicle. After reaction_analyzer starts publishing the pointcloud with the car, it starts to search for the first messages of the perception pipeline topics.
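A minimal sketch of this publishing pattern (the output topic name and the trigger mechanism are assumptions; the actual reaction_analyzer implementation differs in detail):

```python
# Replays one of two pre-recorded PointCloud2 messages: the "empty" cloud by
# default, the "obstacle" cloud after an external trigger.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2
from std_srvs.srv import Trigger

class CloudSwitcher(Node):
    def __init__(self, empty_cloud: PointCloud2, obstacle_cloud: PointCloud2):
        super().__init__('cloud_switcher')
        self.clouds = {'empty': empty_cloud, 'obstacle': obstacle_cloud}
        self.active = 'empty'
        self.spawn_time = None
        self.pub = self.create_publisher(
            PointCloud2, '/sensing/lidar/top/pointcloud_raw_ex', 1)
        self.create_service(Trigger, '~/spawn_obstacle', self.on_spawn)
        self.create_timer(0.1, self.tick)  # 10 Hz, a typical lidar rate

    def tick(self):
        msg = self.clouds[self.active]
        msg.header.stamp = self.get_clock().now().to_msg()  # re-stamp on each replay
        self.pub.publish(msg)

    def on_spawn(self, request, response):
        self.active = 'obstacle'
        self.spawn_time = self.get_clock().now()  # reference point for latency
        response.success = True
        return response
```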

The steps to run the whole test environment are explained below:

  • Run the rosbag file, which publishes the messages Autoware needs to run, such as vehicle_status and gnss_pose.
  • Run reaction_analyzer to publish the pointcloud messages (it publishes the empty one first, then the pointcloud with the car when we want to spawn it).
  • Run Autoware with: ros2 launch autoware_launch e2e_simulator.launch.xml vehicle_model:=sample_vehicle sensor_model:=awsim_sensor_kit map_path:=[PATH]

Test video:

2024-01-23.18-15-31.mp4

After the obstacle is spawned (i.e., once reaction_analyzer starts to publish the pointcloud with the car), reaction_analyzer searches for the obstacle in predefined topics. When it finds the reacted message, it calculates the time between the spawn command time and the header time of the reacted message. In the perception pipeline nodes, the header stamps do not reflect when processing finished, so I first had to change the header times of some nodes so that they contain the time at which processing was done.

With this test environment, we made 10 tests with some predefined checkpoints in the perception pipeline (a minimal subscriber sketch follows the list):

occupancy_grid_map_outlier: /perception/obstacle_segmentation/pointcloud
voxel_based_compare_map_filter: /perception/object_recognition/detection/pointcloud_map_filtered/pointcloud
lidar_centerpoint: /perception/object_recognition/detection/centerpoint/objects
obstacle_pointcloud_based_validator: /perception/object_recognition/detection/centerpoint/validation/objects
detected_object_feature_remover: /perception/object_recognition/detection/clustering/objects
detection_by_tracker: /perception/object_recognition/detection/detection_by_tracker/objects
object_lanelet_filter: /perception/object_recognition/detection/objects
map_based_prediction: /perception/object_recognition/objects
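A minimal sketch of this checkpoint measurement (the message packages and comparison logic are assumptions; see the actual reaction_analyzer for the real implementation):

```python
# Records, for each checkpoint topic, the first message stamped after the
# spawn time and reports the stamp-to-spawn latency.
import rclpy
from rclpy.node import Node
from rclpy.time import Time
from sensor_msgs.msg import PointCloud2
from autoware_auto_perception_msgs.msg import DetectedObjects, PredictedObjects

CHECKPOINTS = {
    '/perception/obstacle_segmentation/pointcloud': PointCloud2,
    '/perception/object_recognition/detection/objects': DetectedObjects,
    '/perception/object_recognition/objects': PredictedObjects,
    # ... remaining checkpoint topics from the list above
}

class CheckpointProbe(Node):
    def __init__(self, spawn_time: Time):
        super().__init__('checkpoint_probe')
        self.spawn_time = spawn_time
        self.reacted = set()
        for topic, msg_type in CHECKPOINTS.items():
            self.create_subscription(
                msg_type, topic, lambda msg, t=topic: self.on_msg(t, msg), 10)

    def on_msg(self, topic, msg):
        stamp = Time.from_msg(msg.header.stamp)
        if topic not in self.reacted and stamp > self.spawn_time:
            self.reacted.add(topic)
            latency_ms = (stamp - self.spawn_time).nanoseconds / 1e6
            self.get_logger().info(f'{topic}: {latency_ms:.1f} ms')
```

Comparing header stamps (rather than arrival times) against the spawn time only works once the nodes stamp their outputs at processing completion, which is exactly the header-time change described above.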

The results:

[Screenshot: per-checkpoint reaction times]

Statistics:

[Screenshot: reaction time statistics]

As we can see, the DetectedObjects outputs react in around ~200 ms; however, the PredictedObjects output takes much longer (around ~600 ms).

Future Work:

  • After solving some compatibility issues, I am planning to run these tests on the main branch.
  • The current sensor setup has only 1 lidar; I am planning to use another vehicle setup in AWSIM that has more than 1 lidar.

kaancolak (Contributor) commented Jan 24, 2024

@brkay54 Thank you for your great work. 🙏

Currently, the trust count is 3 in the multi-object tracker: an object should be detected 3 times before being published as a tracked object, so the result looks reasonable. Also, disabling delay_compensation inside the multi-object tracker module could give better results in this use case; it can add extra delay because tracked objects are published from a timer.
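For readers unfamiliar with the tracker's gating, here is an illustrative sketch of the trust-count idea; the real multi_object_tracker is C++ and its internals differ, so the names below are hypothetical.

```python
# A track becomes publishable only after it has been associated with a
# detection 3 times, which adds roughly 2 detection cycles of latency
# before the first tracked output appears.
PUBLISH_THRESHOLD = 3  # the "trust count"

class Track:
    def __init__(self, object_id):
        self.object_id = object_id
        self.hit_count = 0

    def on_associated_detection(self):
        self.hit_count += 1

    def is_publishable(self):
        return self.hit_count >= PUBLISH_THRESHOLD
```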

brkay54 (Member) commented Feb 5, 2024

Weekly Update

  • Last week, to run the perception pipeline reaction test, we ran the reaction_analyzer while playing a rosbag in another terminal. This limited the time resolution of the measured reaction times to ~10 ms. This week, I changed the reaction analyzer so it records the necessary messages from the rosbag and replays them inside the node (see the sketch after this list). This way, we can measure the reaction times in system time.

  • A 3-lidar test environment was created using AWSIM, and I can run the perception pipeline tests with it successfully.
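A sketch of the in-node replay idea using the rosbag2_py reading API (paths and topic names are placeholders; the real reaction_analyzer is C++):

```python
# Load the needed messages from a bag once at startup; they can then be
# republished from the node's own timers in system time, avoiding the ~10 ms
# resolution of an external `ros2 bag play` clock.
from rclpy.serialization import deserialize_message
from rosidl_runtime_py.utilities import get_message
import rosbag2_py

def load_messages(bag_path, wanted_topics):
    reader = rosbag2_py.SequentialReader()
    reader.open(rosbag2_py.StorageOptions(uri=bag_path, storage_id='sqlite3'),
                rosbag2_py.ConverterOptions('', ''))
    type_map = {t.name: t.type for t in reader.get_all_topics_and_types()}
    messages = {t: [] for t in wanted_topics}
    while reader.has_next():
        topic, data, _stamp = reader.read_next()
        if topic in messages:
            messages[topic].append(
                deserialize_message(data, get_message(type_map[topic])))
    return messages
```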

Sample Test Video:

reaction-analyzer-sample-test-perception.mp4
  • Firstly, I investigated the pointcloud_preprocessor pipeline to analyze how much delay it takes to get the concatenated pointcloud output. For this test, I used the 3-lidar setup and analyzed the pipeline_latency_ms that we added before:

When I published the 3 lidar pointcloud outputs, pipeline_latency_ms started with a low latency value:

[Screenshot from 2024-02-05 06-11-03]

After 12 minutes:

[Screenshot from 2024-02-05 06-23-43]

As you can see, the latency value of the concatenate nodelet for the right lidar keeps accumulating.

After 10 minutes:

[Screenshot from 2024-02-05 06-32-52]

After reaching this higher value, the concatenated pointcloud delay drops back down immediately, as you can see in the graph below:

[Screenshot from 2024-02-05 06-33-46]

This seemed like weird behavior to me. After some time, I suspected it was caused by phase differences between the lidar output publishers, so I added another feature to the reaction analyzer that lets it publish the pointcloud outputs synchronously.

After running the reaction analyzer with synchronously published pointclouds, I could no longer see this accumulating pipeline_latency in the concatenation process:

[Screenshot from 2024-02-05 06-57-55]

The concatenated pointcloud delay of the top lidar is higher than the others. It is caused by the higher delay of the distortion corrector: distortion correction for the top lidar takes more time than for the others because the pointcloud from the top lidar's crop_box_filtered_mirror is larger. The difference in width between the top lidar and the right/left lidars is ~13,000 points, which causes ~10 ms of extra delay. Because of this ~10 ms delay, the concatenation process always runs with a past pointcloud output of the top lidar. (I haven't tried it yet, but according to the documentation of the concatenate nodelet we can set offset values for specific pointclouds; I am going to try it.)
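A sketch of the synchronous publishing feature mentioned above (topic names are assumptions): driving all three lidars from a single timer with a shared stamp removes the phase differences between the publishers.

```python
# Replays pre-recorded clouds for all three lidars from one timer, giving
# every cloud the same header stamp (zero phase offset between lidars).
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2

TOPICS = ['/sensing/lidar/top/pointcloud_raw_ex',
          '/sensing/lidar/left/pointcloud_raw_ex',
          '/sensing/lidar/right/pointcloud_raw_ex']

class SyncedLidarReplay(Node):
    def __init__(self, clouds):  # clouds: dict mapping topic -> recorded PointCloud2
        super().__init__('synced_lidar_replay')
        self.clouds = clouds
        self.pubs = {t: self.create_publisher(PointCloud2, t, 1) for t in TOPICS}
        self.create_timer(0.1, self.tick)  # one 10 Hz timer drives all lidars

    def tick(self):
        stamp = self.get_clock().now().to_msg()
        for topic, pub in self.pubs.items():
            msg = self.clouds[topic]
            msg.header.stamp = stamp  # identical stamps across lidars
            pub.publish(msg)
```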


  • For the development phase of the reaction_analyzer, all features have been added; the code just needs to be cleaned up. We also need to decide how to implement the timestamp reporting process of the perception pipeline. I created an issue for it here.

  • The lidar-only pipeline was measured above; I will also measure the lidar-camera pipeline.

brkay54 (Member) commented Feb 12, 2024

Weekly Update

  • We added a reset() function to be able to restart the tests automatically. With it, we can measure the reaction times multiple times: after being launched once, the tool runs as many tests as iteration_number, calculates statistics, and creates a CSV file as shown below (a sketch of such an export follows the screenshot).

[Screenshot: sample CSV output]
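A minimal sketch of such a CSV export (column names are illustrative, not the tool's actual schema):

```python
import csv
import statistics

def write_reaction_csv(path, results):
    """results: dict mapping node/topic name -> list of per-iteration latencies in ms."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['node', 'min_ms', 'max_ms', 'mean_ms', 'median_ms', 'stdev_ms'])
        for node, samples in results.items():
            stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
            writer.writerow([node,
                             f'{min(samples):.1f}', f'{max(samples):.1f}',
                             f'{statistics.mean(samples):.1f}',
                             f'{statistics.median(samples):.1f}',
                             f'{stdev:.1f}'])
```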

xmfcx (Contributor, Author) commented Mar 5, 2024

Closing because the delays have been investigated.
