Session close deadlock #35

Open
yellowhatter opened this issue Oct 18, 2024 · 1 comment

@yellowhatter

@fuzzypixelz : I finally understand the deadlock. It's between a TaskController::wait in zenoh (Rust) and a C++ std::recursive_mutex in rmw_zenoh.

  • some other peer shuts down
  • zenoh (Rust) receives a token undeclaration from that peer
  • the callback reaches into rmw_zenoh while still running on the RX runtime
  • the subscriber tries to get the context lock, but the rmw_shutdown function gets the lock first and then tries to close the session
  • the session close hangs: it waits for the RX task to complete, but the RX task cannot complete because it is waiting on the lock that rmw_shutdown just took
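
The cycle described above can be sketched in plain C++ without zenoh. This is only an illustration under stated assumptions: `context_mutex`, `rx_task` and `simulate_shutdown_race` are hypothetical names, the thread stands in for the zenoh RX task, and the unbounded wait of the real session close is replaced by a timed wait so the example terminates instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Outcome of the simulated race (hypothetical type, for illustration only).
struct RaceResult {
  bool close_finished_while_locked;  // did the "session close" wait succeed?
  bool callback_ran_after_unlock;    // did the callback run once the lock was freed?
};

RaceResult simulate_shutdown_race(std::chrono::milliseconds wait) {
  std::recursive_mutex context_mutex;  // stands in for the rmw_zenoh context lock
  std::promise<void> rx_done;
  std::future<void> rx_done_future = rx_done.get_future();
  bool callback_ran = false;

  // The shutdown path wins the race and takes the context lock first.
  std::unique_lock<std::recursive_mutex> shutdown_lock(context_mutex);

  // Stand-in for the RX task: the token-undeclaration callback needs the lock.
  std::thread rx_task([&] {
    {
      std::lock_guard<std::recursive_mutex> lock(context_mutex);
      callback_ran = true;
    }
    rx_done.set_value();
  });

  // Stand-in for the session close: wait for the RX task while still holding
  // the lock. The real wait is unbounded, which is what makes it a deadlock.
  const bool finished =
    rx_done_future.wait_for(wait) == std::future_status::ready;

  // Releasing the lock breaks the cycle and lets the callback finish.
  shutdown_lock.unlock();
  rx_task.join();
  return {finished, callback_ran};
}
```

Calling `simulate_shutdown_race(std::chrono::milliseconds(100))` reports that the wait did not finish while the lock was held, and that the callback only ran after the lock was released.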

@yellowhatter : Okay, let me take care of this.
I think we need to solve it on the Rust side, to make our core safer against user-dependent deadlocks like this.

@YuanYuYuan
Collaborator

To reproduce, build test_communication and run the test with the following command:

colcon test --event-handlers console_cohesion+ console_direct+ --packages-select test_rclcpp --ctest-args -R test_n_nodes__rmw_zenoh_cpp

The deadlock happens when acquiring the lock here:

std::lock_guard<std::recursive_mutex> lock(data_ptr->mutex_);
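
For illustration only, one possible mitigation on the caller side is to avoid holding the lock across the blocking close. This is not the fix chosen in this issue, where the plan is to handle it on the Rust side. The sketch below assumes a hypothetical `Context` class (not the rmw_zenoh type), and joining a thread stands in for closing the session.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical context object; names and members are assumptions.
class Context {
public:
  // Called from the simulated RX task when a token is undeclared.
  bool on_token_undeclared() {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    if (is_shutdown_) {
      return false;  // ignore events that arrive after shutdown
    }
    ++handled_events_;
    return true;
  }

  // Flip the state under the lock, release it, and only then block.
  void shutdown(std::thread & rx_task) {
    {
      std::lock_guard<std::recursive_mutex> lock(mutex_);
      is_shutdown_ = true;
    }  // lock released here, before the blocking wait
    if (rx_task.joinable()) {
      rx_task.join();  // stands in for the blocking session close
    }
  }

  int handled_events() {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    return handled_events_;
  }

private:
  std::recursive_mutex mutex_;
  bool is_shutdown_ = false;
  int handled_events_ = 0;
};

// Runs a callback-heavy RX thread concurrently with shutdown.
bool shutdown_completes() {
  Context ctx;
  std::thread rx_task([&ctx] {
    for (int i = 0; i < 1000; ++i) {
      ctx.on_token_undeclared();
    }
  });
  ctx.shutdown(rx_task);
  return !rx_task.joinable();  // true once the join has returned
}
```

Because the mutex is recursive, this only helps if the caller of `shutdown` does not already hold the lock further up the call stack. In that case the inner scope does not actually release it.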
