Joint Trajectory Controller misses subscriber callback #1414

Open
my-rice opened this issue Dec 9, 2024 · 3 comments


my-rice commented Dec 9, 2024

I tested the Joint Trajectory Controller (JTC) on the UR10 robot in simulation and encountered a rare bug with the controller subscriber (joint_command_subscriber_) associated with the /joint_trajectory topic. Occasionally, the callback bound to the subscriber is not triggered, with no apparent cause.
After further testing, I confirmed that the issue lies with the callback not being called. I suspect the ROS 2 middleware might not be triggering the callback for unknown reasons. Since all nodes are running on the same machine, I don't believe the issue is related to packet loss or network problems.
This bug is difficult to replicate as it occurs very infrequently. Do you have any insights into why this might be happening? Could it be related to a configuration issue in the controller manager that’s causing this bug?

To reproduce, load and activate the Joint Trajectory Controller on any simulated robot. Then, in another terminal, publish a valid message on the /joint_trajectory topic. After many successful executions, a message is occasionally lost for unknown reasons.

My environment is as follows:

OS: Ubuntu 22.04
ROS 2 Version: Humble
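
To confirm that a message was dropped rather than delayed, one option is to number the published messages and compare them against what the callback logged. The following is a minimal sketch in plain Python, not tied to rclpy or to the JTC code. The helper name `find_missing` is hypothetical, and how the counter is carried alongside each message is left open.

```python
from typing import Iterable, List


def find_missing(sent_count: int, received: Iterable[int]) -> List[int]:
    """Return the sequence numbers in [0, sent_count) that were never received."""
    seen = set(received)
    return [seq for seq in range(sent_count) if seq not in seen]


# Example: 6 messages were published, the callback logged 5 of them.
received_log = [0, 1, 2, 4, 5]
print(find_missing(6, received_log))  # [3]
```

Because the comparison uses a set, out-of-order arrival is not counted as a loss.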

my-rice added the bug label Dec 9, 2024
@christophfroehlich
Contributor

Please add which RMW you are using.

If we can't reproduce it, this is very hard to debug.
The JTC subscriber uses SystemDefaultsQoS. You could try changing the QoS settings to see whether that improves the behavior.
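
When experimenting with reliability settings, keep in mind that DDS matches endpoints with a request-vs-offered rule: a subscriber requesting RELIABLE does not match a publisher offering only BEST_EFFORT, so no messages arrive at all. The sketch below illustrates that rule in plain Python. It is not the rclcpp API, the function name is hypothetical, and it only covers the two explicit policies (not the system default).

```python
RELIABLE = "RELIABLE"
BEST_EFFORT = "BEST_EFFORT"


def reliability_compatible(publisher: str, subscriber: str) -> bool:
    """Return True if a publisher/subscriber reliability pair can match.

    The subscriber requests a level and the publisher must offer at least
    that level.
    """
    for policy in (publisher, subscriber):
        if policy not in (RELIABLE, BEST_EFFORT):
            raise ValueError(f"unsupported reliability policy: {policy}")
    # A RELIABLE subscriber cannot be served by a BEST_EFFORT publisher.
    return not (subscriber == RELIABLE and publisher == BEST_EFFORT)


for pub in (RELIABLE, BEST_EFFORT):
    for sub in (RELIABLE, BEST_EFFORT):
        print(pub, sub, reliability_compatible(pub, sub))
```

In practice this means that if the subscriber is switched to RELIABLE, the test publisher has to offer RELIABLE as well.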

Author

my-rice commented Dec 10, 2024

The RMW used for testing is the default one, rmw_fastrtps_cpp.

Based on your response, I conducted additional tests.
First, I changed the reliability QoS setting of the subscriber from BEST_EFFORT (the default value in SystemDefaultsQoS) to RELIABLE. In this case, I observed the same bug described in the issue, which I found quite strange.
Next, I ran another set of tests where I changed the middleware from rmw_fastrtps_cpp to rmw_cyclonedds_cpp, suspecting that the problem might be related to the middleware. With the new middleware, I was unable to reproduce the bug, even after extensive testing (I kept the reliability setting at RELIABLE). Additionally, I have a side project with the same issue, and switching to rmw_cyclonedds_cpp resolved the problem there as well.

At this point, I believe the issue is related to the rmw_fastrtps_cpp middleware. A similar problem with rmw_fastrtps_cpp has already been reported here, and the issue is still open. However, in their report, they mention 0% packet loss with the QoS option KEEP_ALL. I tried this option (along with RELIABLE and the rmw_fastrtps_cpp middleware) in my side project, but the problem persists.

In conclusion, rmw_cyclonedds_cpp seems to resolve the issue, but I cannot explain the behavior with rmw_fastrtps_cpp. If anyone has any insights into why this might be happening or potential causes, I would be happy to hear them.
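
For anyone repeating this comparison, the middleware is selected per process through the `RMW_IMPLEMENTATION` environment variable. Below is a minimal sketch of building such an environment from Python, for example to launch the same test under both middlewares. The helper name `env_with_rmw` is hypothetical, and the chosen RMW package is assumed to be installed.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_rmw(
    rmw_name: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with RMW_IMPLEMENTATION set."""
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base_env is None else base_env)
    env["RMW_IMPLEMENTATION"] = rmw_name
    return env


env = env_with_rmw("rmw_cyclonedds_cpp", {"PATH": "/usr/bin"})
print(env["RMW_IMPLEMENTATION"])  # rmw_cyclonedds_cpp
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the process under test. All processes in a test should normally use the same RMW.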

@firesurfer
Contributor

Even though this is not stated upstream, rmw_fastrtps_cpp seems rather difficult to configure properly, which leads to the perception that it is unreliable. The problems start with discovery and end with lost messages.

In our system, we found that it seemed to perform worse as the number of participants and topics increased. For simple demo setups I never ran into any issues with it. With Cyclone DDS we never had issues, even with an increasing number of endpoints.
Some of the bigger ROS projects, such as the Nav2 stack, also recommend using Cyclone DDS.

If you want or need to stay with fastrtps you might want to try working on Jazzy instead of Humble. I can imagine that there are some patches which might improve the behavior.

You might also want to play around with the configuration a bit: https://github.com/ros2/rmw_fastrtps?tab=readme-ov-file
