Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Restarting nodes causes inconsistent service states #326

Open
eliasdc opened this issue Nov 6, 2024 · 1 comment
Open

[Bug] Restarting nodes causes inconsistent service states #326

eliasdc opened this issue Nov 6, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@eliasdc
Copy link

eliasdc commented Nov 6, 2024

Describe the bug

When killing (SIGKILL) and restarting a node (or launch file with multiple nodes) quickly the Zenoh queryable can be removed from the newly discovered services. I noticed this by inspecting the zenoh admin space of the zenoh-bridge-ros2dds plugin from the router and client. We connect a zenoh-bridge-ros2dds in client mode to a zenoh-bridge-ros2dds in router mode to use ROS2 over WiFi.

When dynamically killing and spawning nodes their services sometimes fails to be picked up. What happens is the ROS2 srv and cli are being picked up and visible in the adminspace. But the queryable is initially spawned and than destroyed after 10s (from when the previous process was killed). I ivestigated where this 10s was coming from but can't seem to find a parameter to adjust it (tested queries_default_timeout and link lease without success).

I would expect the service to work even though a previously killed process had the exact same service server.

To reproduce

To reproduce we can show this just with a single router and a single node from the demo_nodes_cpp package.

Terminal 1:

ros2 run zenoh_bridge_ros2dds zenoh_bridge_ros2dds -l tcp/localhost:7447 --no-multicast-scouting --rest-http-port 8000

Terminal 2:

ros2 run demo_nodes_cpp add_two_ints_server; ros2 run demo_nodes_cpp add_two_ints_server

I use twice the same command with a ; to run the same command again after the first one is killed with SIGKILL.

Terminal 3:

pkill -SIGKILL -f add_two_ints_server

Result

Open http://localhost:8000/@/**/add_two_ints to monitor if the server and queryable is shown, but in this case the service is being removed after the new node has already started.

Logs

Logs only shown from start of demo node and redacted to only show relevant services and logs

2024-11-06T13:04:07.398437Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS Participant 0110fc28cf444780c3d7adb9000001c1)
2024-11-06T13:04:07.400068Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS publication 0110fc28cf444780c3d7adb900000403 from Participant 0110fc28cf444780c3d7adb9000001c1 on ros_discovery_info with type rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_ (keyless: true)
2024-11-06T13:04:07.400174Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS subscription 0110fc28cf444780c3d7adb900000504 from Participant 0110fc28cf444780c3d7adb9000001c1 on ros_discovery_info with type rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_ (keyless: true)
2024-11-06T13:04:07.400433Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS publication 0110fc28cf444780c3d7adb900001503 from Participant 0110fc28cf444780c3d7adb9000001c1 on rr/add_two_intsReply with type example_interfaces::srv::dds_::AddTwoInts_Response_ (keyless: true)
2024-11-06T13:04:07.400446Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS subscription 0110fc28cf444780c3d7adb900001604 from Participant 0110fc28cf444780c3d7adb9000001c1 on rq/add_two_intsRequest with type example_interfaces::srv::dds_::AddTwoInts_Request_ (keyless: true)
2024-11-06T13:04:07.497810Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::discovery_mgr: Received ros_discovery_info from participant 0110fc28cf444780c3d7adb9000001c1 with nodes: [/add_two_ints_server]
2024-11-06T13:04:07.497843Z  INFO tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::discovered_entities: Discovered ROS Node /add_two_ints_server
2024-11-06T13:04:07.497906Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::discovered_entities: ROS Node /add_two_ints_server declares a new Writer on rr/add_two_intsReply
2024-11-06T13:04:07.498231Z  INFO tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds: Node /add_two_ints_server declares Service Server /add_two_ints: example_interfaces/srv/AddTwoInts - Allowed
2024-11-06T13:04:07.498234Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints): creation with type example_interfaces/srv/AddTwoInts
2024-11-06T13:04:07.498417Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints): (local client_guid=10ec6480a573dd75)  id='1.10.b5.29.cb.7a.60.83.28.b7.15.f6.0.0.0.0' => USER_DATA=[99, 108, 105, 101, 110, 116, 105, 100, 61, 32, 49, 46, 49, 48, 46, 98, 53, 46, 50, 57, 46, 99, 98, 46, 55, 97, 46, 54, 48, 46, 56, 51, 46, 50, 56, 46, 98, 55, 46, 49, 53, 46, 102, 54, 46, 48, 46, 48, 46, 48, 46, 48, 59]
2024-11-06T13:04:07.498468Z  INFO tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::routes_mgr: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) created
2024-11-06T13:04:07.498474Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) now serving local nodes {"/add_two_ints_server"}
2024-11-06T13:04:07.498480Z DEBUG tokio-runtime-worker ThreadId(14) zenoh::net::routing::dispatcher::resource: Register resource add_two_ints
2024-11-06T13:04:07.498491Z DEBUG tokio-runtime-worker ThreadId(14) zenoh::net::routing::dispatcher::queries: Face{2, 616c459718f2192e62347ee552a02162} Declare queryable 19 (add_two_ints)
2024-11-06T13:04:07.498505Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) announce via token @/616c459718f2192e62347ee552a02162/@ros2_lv/SS/add_two_ints/example_interfaces§srv§AddTwoInts
2024-11-06T13:04:07.498507Z DEBUG tokio-runtime-worker ThreadId(14) zenoh::net::routing::dispatcher::token: Face{2, 616c459718f2192e62347ee552a02162} Declare token 20 (@/616c459718f2192e62347ee552a02162/@ros2_lv/SS/add_two_ints/example_interfaces§srv§AddTwoInts)
2024-11-06T13:04:07.498513Z DEBUG tokio-runtime-worker ThreadId(14) zenoh::net::routing::dispatcher::resource: Register resource @/616c459718f2192e62347ee552a02162/@ros2_lv/SS/add_two_ints/example_interfaces§srv§AddTwoInts
2024-11-06T13:04:07.502348Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::ros_discovery: Publish update on 'ros_discovery_info' with 8 writers and 7 readers

kill using SIGKILL, killing the process and automatically restarting the node in terminal 2.

2024-11-06T13:04:32.828578Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS Participant 0110be83cde1f5d46b537c0f000001c1)
2024-11-06T13:04:32.830165Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS publication 0110be83cde1f5d46b537c0f00000403 from Participant 0110be83cde1f5d46b537c0f000001c1 on ros_discovery_info with type rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_ (keyless: true)
2024-11-06T13:04:32.830290Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS subscription 0110be83cde1f5d46b537c0f00000504 from Participant 0110be83cde1f5d46b537c0f000001c1 on ros_discovery_info with type rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_ (keyless: true)
2024-11-06T13:04:32.830925Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS publication 0110be83cde1f5d46b537c0f00001503 from Participant 0110be83cde1f5d46b537c0f000001c1 on rr/add_two_intsReply with type example_interfaces::srv::dds_::AddTwoInts_Response_ (keyless: true)
2024-11-06T13:04:32.830937Z DEBUG ThreadId(22) zenoh_plugin_ros2dds::dds_discovery: Discovered DDS subscription 0110be83cde1f5d46b537c0f00001604 from Participant 0110be83cde1f5d46b537c0f000001c1 on rq/add_two_intsRequest with type example_interfaces::srv::dds_::AddTwoInts_Request_ (keyless: true)
2024-11-06T13:04:32.898372Z DEBUG tokio-runtime-worker ThreadId(10) zenoh_plugin_ros2dds::discovery_mgr: Received ros_discovery_info from participant 0110be83cde1f5d46b537c0f000001c1 with nodes: [/add_two_ints_server]
2024-11-06T13:04:32.898384Z  INFO tokio-runtime-worker ThreadId(10) zenoh_plugin_ros2dds::discovered_entities: Discovered ROS Node /add_two_ints_server
2024-11-06T13:04:32.898419Z DEBUG tokio-runtime-worker ThreadId(10) zenoh_plugin_ros2dds::discovered_entities: ROS Node /add_two_ints_server declares a new Writer on rr/add_two_intsReply
2024-11-06T13:04:32.898566Z  INFO tokio-runtime-worker ThreadId(10) zenoh_plugin_ros2dds: Node /add_two_ints_server declares Service Server /add_two_ints: example_interfaces/srv/AddTwoInts - Allowed
2024-11-06T13:04:32.898567Z DEBUG tokio-runtime-worker ThreadId(10) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) now serving local nodes {"/add_two_ints_server"}
2024-11-06T13:04:32.898569Z DEBUG tokio-runtime-worker ThreadId(10) zenoh::net::routing::dispatcher::queries: Face{2, 616c459718f2192e62347ee552a02162} Declare queryable 43 (add_two_ints)
2024-11-06T13:04:32.898573Z DEBUG tokio-runtime-worker ThreadId(10) zenoh::net::routing::dispatcher::queries: Face{2, 616c459718f2192e62347ee552a02162} Undeclare queryable 19 (add_two_ints)
2024-11-06T13:04:32.898577Z DEBUG tokio-runtime-worker ThreadId(10) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) announce via token @/616c459718f2192e62347ee552a02162/@ros2_lv/SS/add_two_ints/example_interfaces§srv§AddTwoInts
2024-11-06T13:04:32.898581Z DEBUG tokio-runtime-worker ThreadId(10) zenoh::net::routing::dispatcher::token: Face{2, 616c459718f2192e62347ee552a02162} Declare token 44 (@/616c459718f2192e62347ee552a02162/@ros2_lv/SS/add_two_ints/example_interfaces§srv§AddTwoInts)

some delay before the next happens

2024-11-06T13:04:41.498829Z  INFO tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::discovered_entities: Undiscovered ROS Node /add_two_ints_server
2024-11-06T13:04:41.499103Z DEBUG tokio-runtime-worker ThreadId(04) zenoh::net::routing::dispatcher::pubsub: Undeclare subscription Face{2, 616c459718f2192e62347ee552a02162}
2024-11-06T13:04:41.505248Z  INFO tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds: Node /add_two_ints_server undeclares Service Server /add_two_ints: example_interfaces/srv/AddTwoInts - Allowed
2024-11-06T13:04:41.505264Z DEBUG tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) now serving local nodes {}
2024-11-06T13:04:41.505274Z DEBUG tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::route_service_srv: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) retire
2024-11-06T13:04:41.505303Z DEBUG tokio-runtime-worker ThreadId(04) zenoh::net::routing::dispatcher::queries: Face{2, 616c459718f2192e62347ee552a02162} Undeclare queryable 43 (add_two_ints)
2024-11-06T13:04:41.505338Z DEBUG tokio-runtime-worker ThreadId(04) zenoh::net::routing::dispatcher::token: Face{2, 616c459718f2192e62347ee552a02162} Undeclare token 44 (@/616c459718f2192e62347ee552a02162/@ros2_lv/SS/add_two_ints/example_interfaces§srv§AddTwoInts)
2024-11-06T13:04:41.505345Z  INFO tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::routes_mgr: Route Service Server (ROS:/add_two_ints <-> Zenoh:add_two_ints) removed
2024-11-06T13:04:41.598112Z DEBUG tokio-runtime-worker ThreadId(14) zenoh_plugin_ros2dds::ros_discovery: Publish update on 'ros_discovery_info' with 0 writers and 0 readers

repeat kill command from terminal 3 to kill last node.

2024-11-06T13:05:14.929353Z  INFO tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds::discovered_entities: Undiscovered ROS Node /add_two_ints_server
2024-11-06T13:05:14.929778Z  INFO tokio-runtime-worker ThreadId(04) zenoh_plugin_ros2dds: Node /add_two_ints_server undeclares Service Server /add_two_ints: example_interfaces/srv/AddTwoInts - Allowed

System info

  • Platform: Ubuntu 22.04.5 LTS - 64bit
  • Zenoh version: tag 1.0.1 (from source)
  • ROS2 version: humble (binaries)
@eliasdc eliasdc added the bug Something isn't working label Nov 6, 2024
@JEnoch
Copy link
Member

JEnoch commented Dec 3, 2024

The root of the problem is that the bridge has not been designed to support several Nodes with the same name. Internally it uses maps with Node names as indexes.

In you scenario, when the 1st Node is killed, the 2nd Node starts, is discovered by the bridge and replaces the 1st Node in the internal maps. But as the 1st Node was not properly unregistering itself, its disappearance is detected later by the bridge (when CycloneDDS lease expires). And when it does, the bridge removes all about the Node with this name...

A workaround is to use different Node names. But I agree it's not realistic for a crashing Node which is automatically restarted.

The solution is to update the bridge to make it support several Nodes with the same name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants