Investigate end-to-end delay in sudden obstacle braking response #5540

Closed · 4 of 8 tasks
xmfcx opened this issue Nov 9, 2023 · 7 comments

Labels: component:perception · component:planning · component:sensing · type:performance

Comments

xmfcx (Contributor) commented Nov 9, 2023

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I've agreed with the maintainers that I can plan this task.

Description

During several tests, a notable delay has been observed in the autonomous system's response to sudden obstacles, particularly when braking is required.

For instance, at 3:32 in a test discussion, the planning/perception components reacted later than expected.
This issue is not isolated, as searching for "reacts late" in the discussion page reveals multiple instances.

The proposed approach is a systematic test in both the planning simulator and AWSIM, measuring the time from the obstacle's appearance to the initiation of the braking maneuver by the ego vehicle.

Purpose

The purpose of this investigation is to ensure that the response time of the Autoware system to sudden obstacles meets the necessary safety requirements.
By identifying the sources of delay within the system, from perception to actuation, we aim to optimize the response time and enhance the safety and reliability of the autonomous driving system.

Possible approaches

  1. Planning Simulator Test:

    • Simulate the ego vehicle traveling at 50 km/h on a straight path.
    • Introduce a sudden obstacle at a predetermined emergency braking distance (e.g., 40 m).
    • Measure the latency from the obstacle's appearance to the initiation of the braking sequence with millisecond precision (see the measurement sketch after this list).
    • Analyze where the most significant delays occur within the system.
  2. AWSIM Test:

    • Develop a detailed test procedure that spans from point-cloud-level detection to the end of the control stack.
    • Measure and document the delays at each stage of the detection and response process.
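As a starting point, the sketch below shows how such a latency probe could look in Python/rclpy. The topic names, message types, the dummy-object spawn interface, and the braking threshold are all assumptions for illustration, not confirmed Autoware interfaces.

```python
# Hedged sketch of a brake-latency probe. Assumptions: the control command is
# an AckermannControlCommand on /control/command/control_cmd, and obstacles
# can be spawned via the dummy_perception_publisher Object message.
import rclpy
from rclpy.node import Node
from autoware_auto_control_msgs.msg import AckermannControlCommand
from dummy_perception_publisher.msg import Object

BRAKE_DECEL_THRESHOLD = -1.0  # m/s^2; commanded decel below this counts as braking

class BrakeLatencyProbe(Node):
    def __init__(self):
        super().__init__('brake_latency_probe')
        self.spawn_pub = self.create_publisher(
            Object, '/simulation/dummy_perception_publisher/object_info', 1)
        self.create_subscription(
            AckermannControlCommand, '/control/command/control_cmd', self.on_cmd, 1)
        self.spawn_time = None
        # Spawn once, a few seconds after startup, so the stack is fully up.
        self.spawn_timer = self.create_timer(5.0, self.spawn_obstacle)

    def spawn_obstacle(self):
        self.spawn_timer.cancel()  # one-shot
        self.spawn_pub.publish(Object())  # pose, shape, action omitted in this sketch
        self.spawn_time = self.get_clock().now()

    def on_cmd(self, msg):
        if self.spawn_time is None:
            return
        if msg.longitudinal.acceleration < BRAKE_DECEL_THRESHOLD:
            latency_ms = (self.get_clock().now() - self.spawn_time).nanoseconds / 1e6
            self.get_logger().info(f'brake reaction latency: {latency_ms:.1f} ms')
            self.spawn_time = None  # report only the first reaction

def main():
    rclpy.init()
    rclpy.spin(BrakeLatencyProbe())
```

Measuring both timestamps with the same node clock keeps the result consistent regardless of whether simulated or system time is in use.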

Definition of done

  • The systematic testing and measurement of delays have been completed in both the planning simulator and AWSIM.
  • A report detailing the delay times at each stage of the detection and reaction process has been compiled.
  • Recommendations for improvements, based on the findings, have been documented.
  • All tests and analyses have been peer-reviewed to ensure accuracy and reproducibility.
  • Any identified bottlenecks have been addressed with proposed solutions, and a follow-up plan for implementation and testing of these solutions is in place.
brkay54 (Member) commented Dec 25, 2023

We have started work on this issue. To see the reaction time, we are developing a package. Our current target design:

[Design diagram: PsimObserver]

For the first tests, we investigated only the reactions of the control topics.

Package: https://github.com/brkay54/autoware.universe/tree/feat/reaction-measure-tool/tools/reaction_analyzer

For an easy start, please read the usage section of the package's README. You can also use the LEO-VM-00001 map; you do not need to edit the entity position in the parameter file, as it is already set for the LEO-VM-00001 map by default.

First test results:
15 tests were made for each case: obstacle_stop_planner with use_predicted_objects, obstacle_stop_planner without use_predicted_objects, and obstacle_cruise_planner.

Case 1 - obstacle_stop_planner with use_predicted_objects:

[Result charts]

Case 2 - obstacle_stop_planner without use_predicted_objects:

[Result charts]

Case 3 - obstacle_cruise_planner:

[Result charts]

Test document:
https://docs.google.com/spreadsheets/d/1qkSAMAeYa1taIg3HsUJKYvRS2nIYbMWsq9-b7bN5Dfw/edit?usp=sharing

In the first reaction tests, obstacle_stop_planner with use_predicted_objects ran unstably: the reaction time was very high in some cases.

brkay54 (Member) commented Jan 4, 2024

The PR that fixes the high reaction times of obstacle_stop_planner when use_predicted_objects is true is ready:
#5794
cc @xmfcx @mitsudome-r

brkay54 (Member) commented Jan 23, 2024

Reaction Tests For Perception Pipeline

The purpose of these tests is to measure the reaction time of each node in the perception pipeline. To achieve this, we developed a test environment using AWSIM. We recorded rosbags in AWSIM so that the perception pipeline can easily be launched on local devices. The disadvantage of this method is that the time resolution is about ~10 ms, because the clock can be published at a maximum of 100 Hz. We used the sample_vehicle with awsim_sensor_kit (which has only 1 lidar), and the tests were made on the awsim-stable branch, but I am going to change the test environment so it can run on the main branches.

Firstly, we wanted to measure the processing time of the pointcloud_preprocessor to see how much time is spent on the sensing side. To measure the delay times in the pointcloud_preprocessor pipeline, we added accumulated_time debug information, which reports the delay inside each node. (PR is here)
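For illustration, here is a minimal sketch of the accumulated-time idea; the real change lives in the C++ pointcloud_preprocessor nodes, and the function and stage names below are hypothetical.

```python
# Each stage adds its own processing time to a running total carried through
# the pipeline, so any stage can report the delay accumulated so far.
import time

def timed_stage(name, fn, cloud, accumulated_ms):
    start = time.perf_counter()
    out = fn(cloud)
    elapsed_ms = (time.perf_counter() - start) * 1e3
    accumulated_ms += elapsed_ms
    print(f'{name}: +{elapsed_ms:.2f} ms (accumulated: {accumulated_ms:.2f} ms)')
    return out, accumulated_ms

# Hypothetical usage, threading the accumulator through the stages:
# cloud, acc = timed_stage('crop_box_filter', crop_box, raw_cloud, 0.0)
# cloud, acc = timed_stage('ring_outlier_filter', ring_outlier, cloud, acc)
# cloud, acc = timed_stage('concatenate', concatenate, cloud, acc)
```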

[Screenshot from 2024-01-23 17-19-22: pointcloud_preprocessor latency measurements]

Because the time resolution is about ~10 ms and the ring_outlier_filter and crop_box_filter processes mostly finished below 10 ms, the results above are not accurate for those nodes. However, the concatenation process takes longer than 10 ms, and we can see that the total pointcloud preprocessing finishes in ~115 ms on average. The most time-consuming process is concatenation.

To measure the times in the perception pipeline, we created a test environment using AWSIM. First, we recorded two different rosbag files: one in an empty area without any obstacle around the ego vehicle, and the other in an area with a car in front of the ego vehicle. In both environments the ego vehicle is stationary in the same position, so all topics except the pointcloud topic are identical.

In the reaction_analyzer node, we took sample messages from the pointcloud topic in the rosbags and stored both pointcloud messages (one without any object, and one with the car in front of the vehicle) at node initialization. The node publishes the obstacle-free pointcloud at the beginning; when we want to spawn the obstacle, it switches to publishing the pointcloud with the car in front of the ego vehicle. After reaction_analyzer starts publishing the pointcloud with the car, it starts to search for the first messages of the perception pipeline topics.
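A minimal sketch of this publishing pattern (the output topic name and the trigger mechanism are assumptions; the actual reaction_analyzer implementation differs in detail):

```python
# Replays one of two pre-recorded PointCloud2 messages: the "empty" cloud by
# default, the "obstacle" cloud after an external trigger.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2
from std_srvs.srv import Trigger

class CloudSwitcher(Node):
    def __init__(self, empty_cloud: PointCloud2, obstacle_cloud: PointCloud2):
        super().__init__('cloud_switcher')
        self.clouds = {'empty': empty_cloud, 'obstacle': obstacle_cloud}
        self.active = 'empty'
        self.spawn_time = None
        self.pub = self.create_publisher(
            PointCloud2, '/sensing/lidar/top/pointcloud_raw_ex', 1)
        self.create_service(Trigger, '~/spawn_obstacle', self.on_spawn)
        self.create_timer(0.1, self.tick)  # 10 Hz, a typical lidar rate

    def tick(self):
        msg = self.clouds[self.active]
        msg.header.stamp = self.get_clock().now().to_msg()  # re-stamp on each replay
        self.pub.publish(msg)

    def on_spawn(self, request, response):
        self.active = 'obstacle'
        self.spawn_time = self.get_clock().now()  # reference point for latency
        response.success = True
        return response
```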

The steps to run the whole test environment are explained below:

  • Run the rosbag file, which publishes the messages Autoware needs to run, such as vehicle_status and gnss_pose.
  • Run reaction_analyzer to publish the pointcloud messages (it publishes the empty one first, then the pointcloud with the car when we want to spawn it).
  • Run Autoware with: ros2 launch autoware_launch e2e_simulator.launch.xml vehicle_model:=sample_vehicle sensor_model:=awsim_sensor_kit map_path:=[PATH]

Test video:

2024-01-23.18-15-31.mp4

After the obstacle is spawned (i.e., once reaction_analyzer starts to publish the pointcloud with the car), reaction_analyzer searches for the obstacle in predefined topics. When it finds the reacted message, it calculates the time between the spawn command time and the header time of the reacted message. In the perception pipeline nodes, the header stamps do not reflect when processing finished, so I first had to change the header times of some nodes so that they contain the time at which processing was done.

With this test environment, we made 10 tests with some predefined checkpoints in the perception pipeline (a minimal subscriber sketch follows the list):

occupancy_grid_map_outlier: /perception/obstacle_segmentation/pointcloud
voxel_based_compare_map_filter: /perception/object_recognition/detection/pointcloud_map_filtered/pointcloud
lidar_centerpoint: /perception/object_recognition/detection/centerpoint/objects
obstacle_pointcloud_based_validator: /perception/object_recognition/detection/centerpoint/validation/objects
detected_object_feature_remover: /perception/object_recognition/detection/clustering/objects
detection_by_tracker: /perception/object_recognition/detection/detection_by_tracker/objects
object_lanelet_filter: /perception/object_recognition/detection/objects
map_based_prediction: /perception/object_recognition/objects
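A minimal sketch of this checkpoint measurement (the message packages and comparison logic are assumptions; see the actual reaction_analyzer for the real implementation):

```python
# Records, for each checkpoint topic, the first message stamped after the
# spawn time and reports the stamp-to-spawn latency.
import rclpy
from rclpy.node import Node
from rclpy.time import Time
from sensor_msgs.msg import PointCloud2
from autoware_auto_perception_msgs.msg import DetectedObjects, PredictedObjects

CHECKPOINTS = {
    '/perception/obstacle_segmentation/pointcloud': PointCloud2,
    '/perception/object_recognition/detection/objects': DetectedObjects,
    '/perception/object_recognition/objects': PredictedObjects,
    # ... remaining checkpoint topics from the list above
}

class CheckpointProbe(Node):
    def __init__(self, spawn_time: Time):
        super().__init__('checkpoint_probe')
        self.spawn_time = spawn_time
        self.reacted = set()
        for topic, msg_type in CHECKPOINTS.items():
            self.create_subscription(
                msg_type, topic, lambda msg, t=topic: self.on_msg(t, msg), 10)

    def on_msg(self, topic, msg):
        stamp = Time.from_msg(msg.header.stamp)
        if topic not in self.reacted and stamp > self.spawn_time:
            self.reacted.add(topic)
            latency_ms = (stamp - self.spawn_time).nanoseconds / 1e6
            self.get_logger().info(f'{topic}: {latency_ms:.1f} ms')
```

Comparing header stamps (rather than arrival times) against the spawn time only works once the nodes stamp their outputs at processing completion, which is exactly the header-time change described above.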

The results:

[Screenshot: per-checkpoint reaction times]

Statistics:

[Screenshot: reaction time statistics]

As we can see, the DetectedObjects outputs react in around ~200 ms; however, the PredictedObjects output takes much longer (around ~600 ms).

Future Work:

  • After solving some compatibility issues, I am planning to run these tests on the main branch.
  • The current sensor setup has only 1 lidar; I am planning to use another vehicle setup in AWSIM that has more than 1 lidar.

kaancolak (Contributor) commented Jan 24, 2024

@brkay54 Thank you for your great work. 🙏

Currently, the trust count is 3 in the multi-object tracker: an object should be detected 3 times before being published as a tracked object, so the result looks reasonable. Also, disabling delay_compensation inside the multi-object tracker module could give better results in this use case; it can add extra delay because tracked objects are published from a timer.
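For readers unfamiliar with the tracker's gating, here is an illustrative sketch of the trust-count idea; the real multi_object_tracker is C++ and its internals differ, so the names below are hypothetical.

```python
# A track becomes publishable only after it has been associated with a
# detection 3 times, which adds roughly 2 detection cycles of latency
# before the first tracked output appears.
PUBLISH_THRESHOLD = 3  # the "trust count"

class Track:
    def __init__(self, object_id):
        self.object_id = object_id
        self.hit_count = 0

    def on_associated_detection(self):
        self.hit_count += 1

    def is_publishable(self):
        return self.hit_count >= PUBLISH_THRESHOLD
```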

brkay54 (Member) commented Feb 5, 2024

Weekly Update

  • Last week, to run the perception pipeline reaction test, we ran the reaction_analyzer while playing a rosbag in another terminal. This limited the time resolution of the measured reaction times to ~10 ms. This week, I changed the reaction analyzer so it records the necessary messages from the rosbag and replays them inside the node (see the sketch after this list). This way, we can measure the reaction times in system time.

  • A 3-lidar test environment was created using AWSIM, and I can run the perception pipeline tests with it successfully.
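A sketch of the in-node replay idea using the rosbag2_py reading API (paths and topic names are placeholders; the real reaction_analyzer is C++):

```python
# Load the needed messages from a bag once at startup; they can then be
# republished from the node's own timers in system time, avoiding the ~10 ms
# resolution of an external `ros2 bag play` clock.
from rclpy.serialization import deserialize_message
from rosidl_runtime_py.utilities import get_message
import rosbag2_py

def load_messages(bag_path, wanted_topics):
    reader = rosbag2_py.SequentialReader()
    reader.open(rosbag2_py.StorageOptions(uri=bag_path, storage_id='sqlite3'),
                rosbag2_py.ConverterOptions('', ''))
    type_map = {t.name: t.type for t in reader.get_all_topics_and_types()}
    messages = {t: [] for t in wanted_topics}
    while reader.has_next():
        topic, data, _stamp = reader.read_next()
        if topic in messages:
            messages[topic].append(
                deserialize_message(data, get_message(type_map[topic])))
    return messages
```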

Sample Test Video:

reaction-analyzer-sample-test-perception.mp4
  • Firstly, I investigated the pointcloud_preprocessor pipeline to analyze how much delay it takes to get the concatenated pointcloud output. For this test, I used the 3-lidar setup and analyzed the pipeline_latency_ms that we added before:

When I published the 3 lidar pointcloud outputs, pipeline_latency_ms started with a low latency value:

[Screenshot from 2024-02-05 06-11-03]

After 12 minutes:

[Screenshot from 2024-02-05 06-23-43]

As you can see, the latency value of the concatenate nodelet for the right lidar keeps accumulating.

After 10 minutes:

[Screenshot from 2024-02-05 06-32-52]

After reaching this higher value, the concatenated pointcloud delay drops back down immediately, as you can see in the graph below:

[Screenshot from 2024-02-05 06-33-46]

This seemed like weird behavior to me. After some time, I suspected it was caused by phase differences between the lidar output publishers, so I added another feature to the reaction analyzer that lets it publish the pointcloud outputs synchronously.

After running the reaction analyzer with synchronously published pointclouds, I could no longer see this accumulating pipeline_latency in the concatenation process:

[Screenshot from 2024-02-05 06-57-55]

The concatenated pointcloud delay of the top lidar is higher than the others. It is caused by the higher delay of the distortion corrector: distortion correction for the top lidar takes more time than for the others because the pointcloud from the top lidar's crop_box_filtered_mirror is larger. The difference in width between the top lidar and the right/left lidars is ~13,000 points, which causes ~10 ms of extra delay. Because of this ~10 ms delay, the concatenation process always runs with a past pointcloud output of the top lidar. (I haven't tried it yet, but according to the documentation of the concatenate nodelet we can set offset values for specific pointclouds; I am going to try it.)
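A sketch of the synchronous publishing feature mentioned above (topic names are assumptions): driving all three lidars from a single timer with a shared stamp removes the phase differences between the publishers.

```python
# Replays pre-recorded clouds for all three lidars from one timer, giving
# every cloud the same header stamp (zero phase offset between lidars).
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2

TOPICS = ['/sensing/lidar/top/pointcloud_raw_ex',
          '/sensing/lidar/left/pointcloud_raw_ex',
          '/sensing/lidar/right/pointcloud_raw_ex']

class SyncedLidarReplay(Node):
    def __init__(self, clouds):  # clouds: dict mapping topic -> recorded PointCloud2
        super().__init__('synced_lidar_replay')
        self.clouds = clouds
        self.pubs = {t: self.create_publisher(PointCloud2, t, 1) for t in TOPICS}
        self.create_timer(0.1, self.tick)  # one 10 Hz timer drives all lidars

    def tick(self):
        stamp = self.get_clock().now().to_msg()
        for topic, pub in self.pubs.items():
            msg = self.clouds[topic]
            msg.header.stamp = stamp  # identical stamps across lidars
            pub.publish(msg)
```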


  • For the development phase of the reaction_analyzer, all features have been added; the code just needs to be cleaned up. We also need to decide how to implement the timestamp reporting process of the perception pipeline. I created an issue for it here.

  • The lidar-only pipeline was measured above; I will also measure the lidar-camera pipeline.

brkay54 (Member) commented Feb 12, 2024

Weekly Update

  • We added a reset() function to be able to restart the tests automatically. With it, we can measure the reaction times multiple times: after being launched once, the tool runs as many tests as iteration_number, calculates statistics, and creates a CSV file as shown below (a sketch of such an export follows the screenshot).

[Screenshot: sample CSV output]
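A minimal sketch of such a CSV export (column names are illustrative, not the tool's actual schema):

```python
import csv
import statistics

def write_reaction_csv(path, results):
    """results: dict mapping node/topic name -> list of per-iteration latencies in ms."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['node', 'min_ms', 'max_ms', 'mean_ms', 'median_ms', 'stdev_ms'])
        for node, samples in results.items():
            stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
            writer.writerow([node,
                             f'{min(samples):.1f}', f'{max(samples):.1f}',
                             f'{statistics.mean(samples):.1f}',
                             f'{statistics.median(samples):.1f}',
                             f'{stdev:.1f}'])
```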

xmfcx (Contributor, Author) commented Mar 5, 2024

Closing because the delays have been investigated.
