feat(autoware_lidar_centerpoint): added the cuda_blackboard to centerpoint #9453

Open
wants to merge 3 commits into
base: main

Conversation

knzo25
Contributor

@knzo25 knzo25 commented Nov 25, 2024

Description

This PR is part of a series of PRs that aim to accelerate the Sensing/Perception pipeline through an appropriate use of CUDA.

List of PRs:

To use these branches, the following additions to the autoware.repos are necessary:

  vendor/cuda_blackboard:
    type: git
    url: git@github.com:knzo25/cuda_blackboard.git
    version: main
  vendor/negotiated:
    type: git
    url: https://github.com/osrf/negotiated.git
    version: master

Depending on your machine and how many nodes are in a container, the following branch may also be required:
https://github.com/knzo25/launch_ros/tree/fix/load_composable_node
There seems to be a bug in ROS where, if too many service requests are sent at once, some of them are lost, and ros_launch cannot handle that.

Related links

Parent Issue:

  • Link

How was this PR tested?

The sensing/perception pipeline was tested up to centerpoint on TIER IV's taxi using the logging simulator.
The following tests were executed on a laptop equipped with an RTX 4060 (laptop) GPU and an Intel(R) Core(TM) Ultra 7 165H (22 cores) CPU.

| Node / processing time [ms] | Current | PR |
| --- | --- | --- |
| /sensing/lidar/top/crop_box_filter_self/debug/processing_time_ms | 5.81 | N/A |
| /sensing/lidar/top/crop_box_filter_mirror/debug/processing_time_ms | 4.59 | N/A |
| /sensing/lidar/top/distortion_corrector/debug/processing_time_ms | 10.96 | N/A |
| /sensing/lidar/top/ring_outlier_filter/debug/processing_time_ms | 10.69 | N/A |
| /sensing/lidar/top/cuda_organized_pointcloud_adapter/debug/processing_time_ms | N/A | 3.75 |
| /sensing/lidar/top/cuda_pointcloud_preprocessor/debug/processing_time_ms | N/A | 1.00 |
| /sensing/lidar/concatenate_data_synchronizer/debug/processing_time_ms | 7.83 | 0.70 |
| Total | 38.8 | 5.45 |
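
To put the reported totals in perspective, the following minimal sketch computes a speedup factor and a relative latency reduction from two processing times. The function names are hypothetical and not part of this PR; the inputs used in the note below are the totals as reported in the table.

```cpp
#include <cassert>

// Hypothetical helpers (not part of the PR) for comparing two processing times.

// Ratio of baseline latency to accelerated latency (2.0 means twice as fast).
double speedupFactor(double baseline_ms, double accelerated_ms)
{
  assert(accelerated_ms > 0.0);
  return baseline_ms / accelerated_ms;
}

// Fraction of the baseline latency that was removed (0.5 means half the time).
double latencyReduction(double baseline_ms, double accelerated_ms)
{
  assert(baseline_ms > 0.0);
  return (baseline_ms - accelerated_ms) / baseline_ms;
}
```

With the reported totals, `speedupFactor(38.8, 5.45)` is roughly 7.1 and `latencyReduction(38.8, 5.45)` is roughly 0.86, i.e. about an 86% reduction in summed processing time for this part of the pipeline.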

Notes for reviewers

The main branch that I used for development is feat/cuda_acceleration_and_transport_layer.
However, the changes were too big, so I split them into several PRs. That said, further development, if any, will continue on that branch (and then be cherry-picked to the respective PRs), and review changes will be cherry-picked back into the development branch.

Interface changes

An additional topic is added to perform type negotiation:
Example: input/pointcloud -> input/pointcloud and input/pointcloud/cuda
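
The naming convention in the example can be sketched as a small helper. This is only an illustration of the topic layout described above: `cudaTopicName` is a hypothetical function, not part of the cuda_blackboard API, and the `/cuda` suffix is taken from the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derive the name of the additional CUDA topic from the
// original topic name by appending "/cuda", as in the example above.
std::string cudaTopicName(const std::string & base_topic)
{
  assert(!base_topic.empty());
  // Avoid a double slash if the base topic already ends with '/'.
  if (base_topic.back() == '/') {
    return base_topic + "cuda";
  }
  return base_topic + "/cuda";
}
```

Under this assumption, `cudaTopicName("input/pointcloud")` yields `input/pointcloud/cuda`, while the original `input/pointcloud` topic remains available alongside it.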

Effects on system behavior

Enabling this preprocessing in the launchers should considerably reduce latency and CPU usage (at the cost of higher GPU usage).

@github-actions github-actions bot added component:perception Advanced sensor data processing and environment understanding. (auto-assigned) tag:require-cuda-build-and-test labels Nov 25, 2024

github-actions bot commented Nov 25, 2024

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@kminoda
Contributor

kminoda commented Nov 26, 2024

@knzo25 Thank you for your great work 🎉

Do you have any documentation for the cuda_blackboard package? Even a simple API reference and an overall design description in a README would be helpful for reviewing the PRs.

@knzo25
Contributor Author

knzo25 commented Nov 26, 2024

@kminoda
Yes, sorry. I am currently adding some results (processing time, memory consumption, CPU usage), documentation, etc., but I wanted to open the PRs first to get the ball rolling.

@knzo25
Contributor Author

knzo25 commented Dec 13, 2024

@kminoda
Added the cuda_blackboard documentation in https://github.com/knzo25/cuda_blackboard.
As discussed internally, I will create an issue to manage and discuss this series of PRs.

std::bind(&LidarCenterPointNode::pointCloudCallback, this, std::placeholders::_1));
pointcloud_sub_ =
  std::make_unique<cuda_blackboard::CudaBlackboardSubscriber<cuda_blackboard::CudaPointCloud2>>(
    *this, "~/input/pointcloud", false,
Contributor


Could you write documentation for bool add_compatible_sub in the cuda_blackboard repository? From the current documentation, it is difficult to tell whether this "false" value is OK.

}
inference();
postProcess(det_boxes3d);
return true;
Contributor


Don't we also perform voxel size validation?

Suggested change
return true;
// Check the actual number of pillars after inference to avoid unnecessary synchronization.
unsigned int num_pillars = 0;
CHECK_CUDA_ERROR(
  cudaMemcpy(&num_pillars, num_voxels_d_.get(), sizeof(unsigned int), cudaMemcpyDeviceToHost));
if (num_pillars >= config_.max_voxel_size_) {
  rclcpp::Clock clock{RCL_ROS_TIME};
  RCLCPP_WARN_THROTTLE(
    rclcpp::get_logger("image_projection_based_fusion"), clock, 1000,
    "The actual number of pillars (%u) exceeds its maximum value (%zu). "
    "Please considering increasing it since it may limit the detection performance.",
    num_pillars, config_.max_voxel_size_);
}
return true;
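
The comparison at the core of this suggestion can be isolated from CUDA and ROS for clarity. The sketch below is a hypothetical, host-only helper (the name `pillarLimitReached` is not from the PR); it assumes the pillar count has already been copied to the host, and it mirrors the `>=` comparison between the `unsigned int` count and the `std::size_t` limit used in the suggested change.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side helper: returns true when the number of pillars has
// reached the configured maximum, which, per the warning text in the
// suggestion, may limit detection performance.
bool pillarLimitReached(unsigned int num_pillars, std::size_t max_voxel_size)
{
  assert(max_voxel_size > 0);
  // Widen the count so both operands have the same type before comparing.
  return static_cast<std::size_t>(num_pillars) >= max_voxel_size;
}
```

Keeping the decision in a pure function like this would make the threshold easy to unit test without a GPU; the device-to-host copy and the throttled warning would stay in the node code.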

Labels
component:perception Advanced sensor data processing and environment understanding. (auto-assigned) tag:require-cuda-build-and-test
Projects
Status: In Progress

2 participants