Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(hesai)!: combine Hesai ROS wrappers into a single node #127

Merged
merged 57 commits into from
May 22, 2024

Conversation

mojomex
Copy link
Collaborator

@mojomex mojomex commented Mar 21, 2024

PR Type

  • Improvement

Related Links

TIER IV INTERNAL LINK -- corresponding issue
TIER IV INTERNAL LINK -- detailed article

This PR depends on the following PR being merged first:

This PR fixes the following issues (for Hesai only):

Description

This PR combines the three ROS wrapper nodes (Driver Wrapper, HW Interface Wrapper, HW Monitor Wrapper) into one.
Doing this has multiple maintainability, reliability and performance benefits:

Before This PR
Parameters and state have been duplicated across nodes, with the potential for inconsistencies All parameters are handled in one place, the current configuration is one object pointed to by all modules
Multiple concurrent TCP connections have been made to the sensor, increasing the risk of race conditions such as the one that motivated #126 The HW Interface ensures that only one TCP connection is made. There is only one HW interface instance which is pointed to by all modules
Received UDP packets were aggregated, and then pub/subbed between wrapper nodes, introducing unnecessary latency UDP packets are queued and processed 1-by-1

This PR only addresses Hesai. Once this PR has been accepted, others will follow for the other vendors.

Changes Made

  • transport_drivers
    • removed const qualifier from returned byte vector in asyncReceive to allow std::moveing packet data
  • HesaiCalibrationConfiguration and HesaiCorrection
    • made both inherit from a new HesaiCalibrationConfigurationBase type providing a unified interface for parsing/loading/storing
  • HesaiDecoder
    • removed dependencies on ROS message types, changed to parsing from std::vector<uint8_t> instead
    • initialize last_phase correctly as otherwise, sometimes an empty pointcloud is published on startup
  • HesaiDriver
    • changed constructor accept only one HesaiCalibrationConfigurationBase instead of HesaiCalibrationConfiguration + HesaiCorrection as the two are used mutually exclusively. The passed-in pointer is type-cast to the correct subtype
  • nebula_ros/hesai
    • created one single node class HesaiRosWrapper containing a HesaiDecoderWrapper, HesaiHwInterfaceWrapper and HesaiHwMonitorWrapper which are not nodes
    • each of the sub-wrappers manages functionality specific to its contained module:
      • HesaiDecoderWrapper receives raw packet data from HesaiRosWrapper, decodes and publishes it. it also uses the HW interface provided by HesaiHwInterfaceWrapper to get calibration data
      • HesaiHwInterfaceWrapper manages the one and only HW interface, and all sensor configuration getting/setting
      • HesaiHwMonitorWrapper uses the HW interface provided by HesaiHwInterfaceWrapper to make diagnostics requests and published diagnostic data
      • HesaiRosWrapper registers a callback for incoming UDP packets with the HW interface, maintains a thread-safe packet queue, and calls the decoder in a second thread when packets are in the queue
    • made each of the sub-wrappers define and get its own parameters. HesaiSensorConfiguration still exists and is populated in HesaiDecoderWrapper
  • mt_queue
    • created a thread-safe queue for producer-consumer problems like the HW interface receiving UDP packets in one thread while decoder is decoding them in another
    • added blocking functions pop() and push(), as well as a non-blocking try_push()
    • added a capacity argument, after which mt_queue::push() blocks and mt_queue::try_push() returns instantly
  • hesai_launch_all_hw.xml
    • removed delay_hw_ms and delay_monitor_ms as their communication is blocking and sequential now
    • refactored to contain only one node
  • hesai_launch_component.launch.xml
    • removed as only one node exists now
  • nebula_launch.py
    • made the file throw an error if used with Hesai. hesai_launch_all_hw.xml should be used instead
  • HesaiDecoderWrapper
    • made calibration fetching code cleaner and de-duplicated AT128/non-AT128 sections
    • removed dependency on HW interface when launch_hw:=false, so HW interface is not instantiated at all in that mode

Other changes:

  • Made all occurrences of HesaiSensorConfiguration const, swapping pointers to update the config if necessary.
  • Reduced boilerplate for parameter definitions
  • Split parameter definitions across sub-wrappers as necessary

Review Procedure

I structured the changelog above roughly in the order the files appear in Files changed on GitHub for easier back-referencing.

The sensor driver should behave as it did before, so testing sensor setup and pointcloud output with real sensors or PCAPs is advised. Also, testing with ros2 bag record /pandar_packets and ros2 bag play should be performed.

Pre-Review Checklist for the PR Author

PR Author should check the checkboxes below when creating the PR.

  • Assign PR to reviewer

Checklist for the PR Reviewer

Reviewers should check the checkboxes below before approval.

  • Commits are properly organized and messages are according to the guideline
  • (Optional) Unit tests have been written for new behavior
  • PR title describes the changes

Post-Review Checklist for the PR Author

PR Author should check the checkboxes below before merging.

  • All open points are addressed and tracked via issues or tickets

CI Checks

  • Build and test for PR: Required to pass before the merge.

@mojomex mojomex force-pushed the single-node-refactor branch from e27acbf to dd1bbdc Compare March 21, 2024 08:49
@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2024

Codecov Report

Attention: Patch coverage is 2.93040% with 795 lines in your changes are missing coverage. Please review.

Please upload report for BASE (develop@9e1a52a). Learn more about missing BASE report.

Files Patch % Lines
nebula_ros/src/hesai/hw_monitor_wrapper.cpp 0.00% 209 Missing ⚠️
nebula_ros/src/hesai/hesai_ros_wrapper.cpp 0.00% 165 Missing ⚠️
nebula_ros/src/hesai/decoder_wrapper.cpp 0.00% 153 Missing ⚠️
.../nebula_hesai_hw_interfaces/hesai_hw_interface.cpp 0.00% 75 Missing ⚠️
...es/src/hesai/hesai_ros_offline_extract_bag_pcd.cpp 0.00% 58 Missing ⚠️
nebula_ros/src/hesai/hw_interface_wrapper.cpp 0.00% 42 Missing ⚠️
nebula_ros/include/nebula_ros/common/mt_queue.hpp 0.00% 20 Missing ⚠️
...ecoders/src/nebula_decoders_hesai/hesai_driver.cpp 11.11% 5 Missing and 11 partials ⚠️
...a_ros/include/nebula_ros/common/watchdog_timer.hpp 0.00% 13 Missing ⚠️
...amples/src/hesai/hesai_ros_offline_extract_pcd.cpp 0.00% 8 Missing ⚠️
... and 9 more

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop     #127   +/-   ##
==========================================
  Coverage           ?   11.73%           
==========================================
  Files              ?       85           
  Lines              ?     8882           
  Branches           ?     1190           
==========================================
  Hits               ?     1042           
  Misses             ?     6995           
  Partials           ?      845           
Flag Coverage Δ
differential 11.73% <2.93%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mojomex mojomex force-pushed the single-node-refactor branch from cb2f0c8 to fdbeb45 Compare April 4, 2024 10:35
@mojomex mojomex changed the title refactor: combine ROS wrappers into a single node refactor(hesai): combine ROS wrappers into a single node Apr 8, 2024
@mojomex mojomex force-pushed the single-node-refactor branch 2 times, most recently from 600bc84 to 7fd450b Compare April 12, 2024 06:04
@mojomex mojomex changed the title refactor(hesai): combine ROS wrappers into a single node refactor(hesai): combine Hesai ROS wrappers into a single node Apr 18, 2024
@mojomex mojomex self-assigned this Apr 18, 2024
@mojomex mojomex marked this pull request as ready for review April 18, 2024 12:07
@mojomex mojomex requested review from amc-nu and drwnz April 18, 2024 12:07
@mojomex
Copy link
Collaborator Author

mojomex commented Apr 19, 2024

🟢 Evaluation

🟢 Latency

This section evaluates latency of this PR (ref. 1bfba05) compared to Nebula's current main branch (ref. 8fe029e).
The below figure shows topic delays akin to ros2 topic delay but measured from within the pointcloud container with intra-process communication and zero-copy. Also, the 100 ms it takes for the sensor to rotate have been subtracted.
single.log is this PR, main.log is the main branch.

Topic_Delay

The same data in table form:

Version Topic count mean std min 50% 90% 99% 99.9% max
single.log /pandar_points 1857 8.465221 1.388225 3.439506 9.182622 9.437252 9.783906 10.524514 10.561508
single.log /pandar_packets 1704 0.132777 0.076802 0.000069 0.125594 0.236806 0.341254 0.412680 0.421062
main.log /pandar_points 1762 14.752899 4.151622 11.655384 12.567231 22.301004 24.662492 27.849439 28.005953
main.log /pandar_packets 1590 0.168928 0.121366 0.000032 0.158946 0.297627 0.448383 1.536159 2.099693

The PR has a speedup of 75% (avg-avg) and 154% (max-max) for /pandar_points compared to main.
The speedup for /pandar_packets is 31% (avg-avg) and 425% (max-max).

🟢 TCP Communication

Because the same HWInterface is shared by the decoder, monitor and the HW interface itself, communication is done in one TCP stream.
The order of initialization has changed from decoder -> HW interface -> monitor to HW interface -> monitor -> decoder, so the order of requests (0x05 - get calibration data) has also changed. The rest of the communication stays the same.

Main This PR
image image

🟢 Error Logging

🟢 This PR includes #131, so parsing errors and TCP errors are printed or result in exceptions thrown with pretty-printed error messages.
🟢 Logging during the acquisition of calibration data changed slightly:

launch_hw

PTC command succeeded

Saved file from sensor available

Log Messages

true

✔️

n/a

[INFO] Downloaded calibration data from sensor. 
[INFO] Saved downloaded data to [...]_from_sensor.csv
[INFO] Using calibration data from [...]_from_sensor.csv

true

✔️

[ERROR] Could not download calibration data: [...]
[INFO] Using calibration data from [...]_from_sensor.csv

true

[ERROR] Could not download calibration data: [...]
[ERROR] No downloaded calibration data found.
[WARN] Falling back to generic calibration file.
[INFO] Using calibration data from [...].csv

false

n/a

✔️

[INFO] Using calibration data from [...]_from_sensor.csv

false

n/a

[ERROR] No downloaded calibration data found.
[WARN] Falling back to generic calibration file.
[INFO] Using calibration data from [...].csv

🟢 Warnings about pointclouds not being decoded are currently not printed are now (
37bc575
~) printed:

[WARN 1713859054.393181226] [nebula_hesai.HesaiDecoder]: Missed pointcloud output deadline

This is implemented using a watchdog timer that checks every 100ms if the last pointcloud has been decoded <= 100ms ago.

🟢 Parameter Handling

🟢 All parameters are correctly handled on startup. Parameters specific to a sub-wrapper are only defined and accessed if that sub-wrapper is instantiated (e.g. setup_sensor will not be queried if launch_hw:=false).

🟢 Updating parameters via ros2 param set and other runtime methods is not supported yet now (
5bba66d
~) supported. All config parameters that were writable before are correctly handled by this PR too.

When changing calibration/correction file path during runtime, it will override the configuration downloaded from the sensor, even when the sensor is connected.

@mojomex mojomex changed the title refactor(hesai): combine Hesai ROS wrappers into a single node refactor(hesai)!: combine Hesai ROS wrappers into a single node May 10, 2024
@drwnz drwnz changed the base branch from main to develop May 17, 2024 04:02
@mojomex mojomex force-pushed the single-node-refactor branch from 67225bd to 7a27587 Compare May 21, 2024 11:53
Copy link
Collaborator

@drwnz drwnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the new files, please run pre-commit to limit the syntax discrepancies (for example, using braces on newlines for if statements).

All the other comments are pedantic/can be considered notes for future development (I don't think the larger ones need addressing in this PR - will leave to your discretion).

nebula_ros/src/hesai/decoder_wrapper.cpp Outdated Show resolved Hide resolved
nebula_ros/src/hesai/hw_monitor_wrapper.cpp Outdated Show resolved Hide resolved
nebula_ros/src/hesai/decoder_wrapper.cpp Outdated Show resolved Hide resolved
nebula_tests/hesai/hesai_ros_decoder_test.cpp Outdated Show resolved Hide resolved
nebula_ros/src/hesai/hw_monitor_wrapper.cpp Outdated Show resolved Hide resolved
@mojomex mojomex force-pushed the single-node-refactor branch from 65be49b to 28b42fc Compare May 22, 2024 05:44
mojomex and others added 20 commits May 22, 2024 15:42
@mojomex mojomex force-pushed the single-node-refactor branch from 1f00fa6 to 18f193a Compare May 22, 2024 06:43
@mojomex mojomex requested a review from drwnz May 22, 2024 07:38
.cspell.json Show resolved Hide resolved
@drwnz drwnz merged commit 384d3c9 into tier4:develop May 22, 2024
6 checks passed
@mojomex mojomex deleted the single-node-refactor branch May 23, 2024 01:05
@mojomex mojomex mentioned this pull request Jul 18, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants