Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ClientData thread-safe #293

Merged
merged 10 commits into from
Nov 12, 2024
Merged

Make ClientData thread-safe #293

merged 10 commits into from
Nov 12, 2024

Conversation

Yadunund
Copy link
Member

@Yadunund Yadunund commented Oct 9, 2024

Make access to ClientData thread-safe.

I've not fully tested this PR yet but I know colcon test --event-handlers console_direct+ --packages-select rclcpp_action --ctest-args -R test_server fails. It has to do with how i'm handling cases where the client is shutdown when there are still queries in flight.

Base automatically changed from yadu/raii-service to rolling October 10, 2024 21:32
Copy link
Contributor

@ahcorde ahcorde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix conflicts

@Yadunund
Copy link
Member Author

@ahcorde could you help check if the expected tests are passing here?

@ahcorde
Copy link
Contributor

ahcorde commented Oct 14, 2024

ros2.repos

rmw_zenoh/yadu/raii-service

ROS Package Failing test name Failing test output
rcl None None
rcl_action None None
rcl_lifecycle None None
rcl_yaml_param_parser None None
rclcpp test_intra_process_manager Failed (Check logs)
rclcpp test_node_interfaces__node_graph Failed (Check logs)
rclcpp test_publisher Failed (Check logs)
rclcpp test_events_executor Timeout (Check logs)
rclcpp_action None None
rclcpp_components None None
rclcpp_lifecycle None -
rmw_zenoh_cpp None None
test_cli None None
test_cli_remap test_cli_remapping TBD
rclpy None

@clalancette
Copy link
Collaborator

Because of #298, this now needs a rebase and conflicts resolved. Once that is done, I'll do another review pass over it.

@Yadunund
Copy link
Member Author

@clalancette i've resolved the conflicts!

rmw_zenoh_cpp/src/detail/graph_cache.cpp Show resolved Hide resolved
rmw_zenoh_cpp/src/rmw_zenoh.cpp Outdated Show resolved Hide resolved
type_support,
qos_profile))
{
// Error already handled.
return nullptr;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a step below fails, do we need to remove this client from the node? In other words, do we need a rcpputils::scope_exit block here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nope! The shared_ptr we created within create_client_data will go out of scope and the destructor will clean everything up. The free_rmw_client block will take care of deallocating the rmw_client_t object.

rmw_zenoh_cpp/src/rmw_zenoh.cpp Outdated Show resolved Hide resolved
if (client_data == nullptr) {
RMW_SET_ERROR_MSG_WITH_FORMAT_STRING(
"Unable to retreive client_data from client for service %s", client->service_name);
return RMW_RET_INVALID_ARGUMENT;
}

std::string service_type = client_data->request_type_support->get_name();
size_t suffix_substring_position = service_type.find("Request_");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that we are no longer checking whether the service_type has Request_ in it. Do we need to add that back in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this check was moved inside ClientData::make() but there was typo there anyways which I have fixed 4c9b106. It would not have caused any problems since the substring we extract will be the same but now it's semantically correct.

rmw_zenoh_cpp/src/detail/rmw_client_data.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like the new direction here. I have some specific concerns on some of the data handling inline.

rmw_zenoh_cpp/src/detail/rmw_client_data.cpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/detail/rmw_client_data.cpp Outdated Show resolved Hide resolved
rmw_zenoh_cpp/src/detail/rmw_client_data.cpp Outdated Show resolved Hide resolved
@Yadunund
Copy link
Member Author

Yadunund commented Nov 1, 2024

nav2 demo crashes when the following steps are performed

  1. ros2 launch nav2_bringup tb3_simulation_launch.py headless:=False
  2. Localize the robot and wait for startup to complete
  3. Click Reset from the nav2 panel in RViz
[component_container_isolated-8] malloc(): unsorted double linked list corrupted
[component_container_isolated-8] Magick: abort due to signal 6 (SIGABRT) "Abort"...
[ERROR] [component_container_isolated-8]: process has died [pid 2257204, exit code -6, cmd '/opt/ros/rolling/lib/rclcpp_components/component_container_isolated --ros-args --log-level info --ros-args -r __node:=nav2_container --params-file /tmp/launch_params_xwa4_atv --params-file /tmp/launch_params_3tgew0_b -r /tf:=tf -r /tf_static:=tf_static'].

@Yadunund
Copy link
Member Author

Yadunund commented Nov 1, 2024

It seems like 082e64c introduced this behavior as I can't reproduce the crash if I drop that commit.

Copy link
Contributor

@ahcorde ahcorde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conflicts

@clalancette
Copy link
Collaborator

conflicts

I've now rebased it onto the latest.

@clalancette
Copy link
Collaborator

Besides rebasing this, I also did a local test where I rebased this onto #307 , and also reverted 19b84e9 . But I am running into shutdown issues in client_data_handler where it is deadlocking, so I'll need to look into that further.

Yadunund and others added 10 commits November 12, 2024 18:59
Signed-off-by: Yadunund <[email protected]>
* Additional fixes for yadu/raii-client.

Signed-off-by: Chris Lalancette <[email protected]>

* Apply suggestions from code review

Co-authored-by: yadunund <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>

* Fixups.

Signed-off-by: Chris Lalancette <[email protected]>

---------

Signed-off-by: Chris Lalancette <[email protected]>
Co-authored-by: yadunund <[email protected]>
Signed-off-by: Yadunund <[email protected]>
This just makes it more consistent with the rest of the
APIs.

Signed-off-by: Chris Lalancette <[email protected]>
We basically use the pointer as a key to lookup the
shared_ptr, and then always use the shared_ptr.  That
ensures that even if the pointer was dropped from the
node while the client callback is operating, it will
still be valid and won't be destroyed until that method
returns.

Signed-off-by: Chris Lalancette <[email protected]>
@clalancette
Copy link
Collaborator

All right, I've just rebased this and also pushed another commit to vastly simplify our handling of in-flight data. I think this should work a lot more reliably now. With that, I'm going to approve and merge this. Thanks for the long slog on this one.

@clalancette clalancette merged commit 91edead into rolling Nov 12, 2024
8 checks passed
@clalancette clalancette deleted the yadu/raii-client branch November 12, 2024 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants