Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce the advanced publisher and subscriber #368

Merged
merged 13 commits into from
Feb 18, 2025

Conversation

YuanYuYuan
Copy link
Contributor

With this PR, we replace the QueryingSubscriber and PublicationCache (+ a normal publisher)
with the AdvancedSubscriber and AdvancedPublisher.

NOTE: A deadlock is found when undeclaring advanced subscribers. It will be fixed once eclipse-zenoh/zenoh#1685 is merged and synced with zenoh-c.

@clalancette clalancette marked this pull request as draft January 2, 2025 16:09
@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from e0c490f to 1f9357f Compare January 7, 2025 07:57
@YuanYuYuan
Copy link
Contributor Author

Update: The deadlock fix has been merged. Mark this PR as ready for review.

@YuanYuYuan YuanYuYuan marked this pull request as ready for review January 7, 2025 07:58
@YuanYuYuan
Copy link
Contributor Author

YuanYuYuan commented Jan 8, 2025

Test Failure

ros2.repos

  • rclcpp_action
    • 1 - test_client
  • rclc_parameter
    • 1 - rclc_parameter_test
  • test_rclcpp.log
    • 74 - test_n_nodes__rmw_zenoh_cpp (Failed) <--- Flaky

No new failure is found.

@YuanYuYuan
Copy link
Contributor Author

Update: bump up zenoh-cpp version to include the memory leak fix eclipse-zenoh/zenoh-cpp#363.

@YuanYuYuan
Copy link
Contributor Author

Will rebase once #405 is merged.

@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from 08b0906 to b76ee4c Compare January 11, 2025 17:05
Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see feedback inline.
I have the following additional questions

  1. It looks like this PR does not fix [Test Failure] rclcpp_action / test_client #326. But from my experience, it changes the behavior from deterministically failing to failing sometimes. ie, the test is now flaky. Do you observe the same?
  2. With AdvancedSubscribers do we no longer need to call Get() when we detect an AdvancedPublisher with publication cache? This was needed to fix Subscribers sometimes not getting messages from topics using durability transient_local #263.

@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from b76ee4c to 9cc776b Compare January 13, 2025 08:28
@YuanYuYuan
Copy link
Contributor Author

YuanYuYuan commented Jan 13, 2025

Please see feedback inline. I have the following additional questions

1. It looks like this PR does not fix [[Test Failure] rclcpp_action / test_client #326](https://github.com/ros2/rmw_zenoh/issues/326). But from my experience, it changes the behavior from deterministically failing to failing sometimes. ie, the test is now flaky. Do you observe the same?

2. With `AdvancedSubscribers` do we no longer need to call `Get()`  when we detect an `AdvancedPublisher` with publication cache? This was needed to fix [Subscribers sometimes not getting messages from topics using durability transient_local #263](https://github.com/ros2/rmw_zenoh/issues/263).

Here’s an improved version of your paragraph with clearer phrasing and a more professional tone:

  1. This PR does not resolve the issue reported in [Test Failure] rclcpp_action / test_client #326. Based on the investigation, the root cause lies in the inappropriate synchronization assumptions in the test. We have also observed similar failures (slightly more often) when using the previous QueryingSubscriber implementation. To address this issue, we can either make the local publisher-to-subscriber action a blocking call or modify the test itself.

  2. We expect the issue will be resolved. So no longer need the hack 😄

@YuanYuYuan
Copy link
Contributor Author

YuanYuYuan commented Jan 13, 2025

Redid the test with the repos. No new failures were found.

  • rclcpp_action/test_client
  • rclc_parameter/rclc_parameter_test

@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from 9cc776b to 9f72ee8 Compare January 22, 2025 08:37
@YuanYuYuan
Copy link
Contributor Author

YuanYuYuan commented Jan 22, 2025

Now we set the advanced pub/sub to map the ROS2 QoS according to the following table.

  • AdvPub
Option ROS2 QoS
publisher_detection Durability::TransientLocal
cache Durability::TransientLocal or Reliable
cache->max_samples Depth
sample_miss_detection Reliable
sample_miss_detection->heartbeat Reliable
Z_RELIABILITY_RELIABLE Reliable
Z_CONGESTION_CONTROL_BLOCK Reliable & History::KeepALL
  • AdvSub
Option ROS2 QoS
subscriber_detection Durability::TransientLocal
sample_miss_detection Durability::TransientLocal
history->detect_late_publishers Durability::TransientLocal
history->max_samples Depth
recovery Reliable
recovery->last_sample_miss_detection = heartbeat Reliable

Tested against Yadunund/ros2@87d1477. No regression was found.

@Yadunund
Copy link
Member

We discussed this today and we agreed to hold merging this PR until we test the changes here on more complex applications to gauge whether the constants adopted are appropriate.

@Yadunund Yadunund self-assigned this Feb 6, 2025
@Yadunund Yadunund marked this pull request as draft February 6, 2025 21:28
@Yadunund
Copy link
Member

Yadunund commented Feb 6, 2025

We discussed this again today and agreed to convert this to a draft PR until eclipse-zenoh/zenoh#1753 is merged and zenoh version here is updated. Along with this change, we will also be disabling recovery and sample missed options as it does not fit the typical ROS use case.

@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from c199a39 to 60042bf Compare February 7, 2025 07:40
@YuanYuYuan
Copy link
Contributor Author

The recovery option has been removed in 0e4b99d

@JEnoch
Copy link
Contributor

JEnoch commented Feb 12, 2025

I created #457 to address the possible usage of recovery and sample_miss_detection options.
Those options are only required to benefit end-to-end reliability.
They are not required to implement TRANSIENT_LOCAL QoS which is the goal of this PR, as a replacement of the deprecated QueryingSubscriber and Publication Cache.

As soon as #456 (that includes eclipse-zenoh/zenoh#1753) is merged, I think this PR is good to go.

Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with latest zenoh bump and I can confirm that rclcpp_action/test_client passes 🎉

However, I'm concerned with one change which i've flagged below.

@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from ab8a7aa to 6068096 Compare February 17, 2025 07:04
@Yadunund Yadunund force-pushed the feat/advanced-pub-sub branch from 15d522a to b1b2dd8 Compare February 18, 2025 19:25
Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for iterating on this PR to find the right fix! We can finally close out #326

@Yadunund Yadunund merged commit 77561d8 into ros2:rolling Feb 18, 2025
5 checks passed
@Yadunund
Copy link
Member

@Mergifyio backport jazzy humble

Copy link

mergify bot commented Feb 18, 2025

backport jazzy humble

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Feb 18, 2025
* feat: introduce the advanced publisher/subscriber

* chore: ament_uncrustify and ament_cpplint

* trigger CI

* refactor: remove the unneeded querying callbacks

* fix: reorder the advanced pub/sub options to match the ROS2 QoS

* chore: add tmp

* chore: apply the QoS mapping

* style: ament_cpplint + ament_uncrustify

* trigger CI

* refactor: remove the recovery option as it exceeds typical ROS use cases

* chore: remove the QoS for end-to-end reliability

* style: ament_uncrustify

* Cleanup

Signed-off-by: Yadunund <[email protected]>

---------

Signed-off-by: Yadunund <[email protected]>
Co-authored-by: Yadunund <[email protected]>
(cherry picked from commit 77561d8)

# Conflicts:
#	rmw_zenoh_cpp/src/detail/rmw_publisher_data.cpp
mergify bot pushed a commit that referenced this pull request Feb 18, 2025
* feat: introduce the advanced publisher/subscriber

* chore: ament_uncrustify and ament_cpplint

* trigger CI

* refactor: remove the unneeded querying callbacks

* fix: reorder the advanced pub/sub options to match the ROS2 QoS

* chore: add tmp

* chore: apply the QoS mapping

* style: ament_cpplint + ament_uncrustify

* trigger CI

* refactor: remove the recovery option as it exceeds typical ROS use cases

* chore: remove the QoS for end-to-end reliability

* style: ament_uncrustify

* Cleanup

Signed-off-by: Yadunund <[email protected]>

---------

Signed-off-by: Yadunund <[email protected]>
Co-authored-by: Yadunund <[email protected]>
(cherry picked from commit 77561d8)

# Conflicts:
#	rmw_zenoh_cpp/src/detail/rmw_publisher_data.cpp
Yadunund pushed a commit that referenced this pull request Feb 19, 2025
…#469)

* feat: introduce the advanced publisher and subscriber (#368)

* feat: introduce the advanced publisher/subscriber

* chore: ament_uncrustify and ament_cpplint

* trigger CI

* refactor: remove the unneeded querying callbacks

* fix: reorder the advanced pub/sub options to match the ROS2 QoS

* chore: add tmp

* chore: apply the QoS mapping

* style: ament_cpplint + ament_uncrustify

* trigger CI

* refactor: remove the recovery option as it exceeds typical ROS use cases

* chore: remove the QoS for end-to-end reliability

* style: ament_uncrustify

* Cleanup

Signed-off-by: Yadunund <[email protected]>

---------

Signed-off-by: Yadunund <[email protected]>
Co-authored-by: Yadunund <[email protected]>
(cherry picked from commit 77561d8)

# Conflicts:
#	rmw_zenoh_cpp/src/detail/rmw_publisher_data.cpp

* Fixed merge

Signed-off-by: Alejandro Hernandez Cordero <[email protected]>

---------

Signed-off-by: Alejandro Hernandez Cordero <[email protected]>
Co-authored-by: Yuyuan Yuan <[email protected]>
Co-authored-by: Alejandro Hernandez Cordero <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants