
Adds base support for services. #86

Merged
merged 24 commits into rolling on Jan 12, 2024

Conversation

francocipollone
Collaborator

@francocipollone francocipollone commented Dec 15, 2023

Summary

  • Provides support for services
  • Graph cache update is not tackled in this PR.

Notes on the implementation

  • Queryables are used for the service client interaction (instead of topics, as DDS implementations do)
  • Main rmw methods to implement from the server side:
    • rmw_create_service
    • rmw_take_request
    • rmw_send_response
  • Main rmw methods to implement from the client side:
    • rmw_create_client
    • rmw_send_request
    • rmw_take_response

Pending

  • z_query_value fails to obtain the message when using the z_owned_query approach; the failure reproduces even in a simpler standalone script (notified the Zettascale folks) ❗(blocker)
  • Define headers (request/response) and sequence id information.
  • Verify the waits added to the rmw_wait method.
  • Verify the key expression for the service.
  • Fix style
  • rmw_service_server_is_available method needs graph information.
  • Update graph cache

Signed-off-by: Franco Cipollone <[email protected]>
@francocipollone force-pushed the francocipollone/service_support_via_queryable branch from 15b6ea4 to 81b4f70 on December 21, 2023 21:08
@francocipollone
Collaborator Author

francocipollone commented Dec 21, 2023

Test it!

Terminal 1
ros2 run demo_nodes_cpp add_two_ints_server

Terminal 2
ros2 run demo_nodes_cpp add_two_ints_client

The client calls the service with 2 and 3, so the expected result is 5.

Output

[INFO] [1703192805.637515098] [add_two_ints_client]: Result of add_two_ints: 5

Issue (FIXED)

This is roughly the log output:

[INFO] [1703192805.637014300] [rmw_zenoh_cpp]: [client_data_handler] triggered for /add_two_ints
[INFO] [1703192805.637093573] [rmw_zenoh_cpp]: [client_data_handler] triggered for /add_two_ints
[ERROR] [1703192805.637109494] [rmw_zenoh_cpp]: z_reply_is_ok returned False
[INFO] [1703192805.637376320] [rmw_zenoh_cpp]: [rmw_take_response]

I added a logging message to record when the client_data_handler callback is triggered, and for some reason it is called twice, with the second invocation containing a garbage reply. client_data_handler is the callback executed for a z_get call on the client side. Here is an explanation of the architecture so far:

(architecture diagram attached)

I added some comments in the code, as I still have to verify whether this is the correct workflow for a Zenoh queryable or whether Zenoh reply channels would be a better fit here.

Member

@Yadunund Yadunund left a comment


Some initial comments before I do a deeper review!

Signed-off-by: Franco Cipollone <[email protected]>
Member

@Yadunund Yadunund left a comment


Hi Franco,

Here's my second round of review. Most of my suggestions are about how we parse and store the replies from Zenoh. On the client side, instead of relying on a Zenoh callback that retains ownership of the reply, I think we could adopt the pattern in the non-blocking z_get example and directly store the z_owned_reply_ts within rmw_client_data_t.

@francocipollone
Collaborator Author

I think we could adopt the pattern in the non-blocking z_get example and directly store the z_owned_reply_ts within rmw_client_data_t.

Yes, I was actually planning to try that approach, as I am still dealing with the double-response issue.
Regarding the channel vs. closure approach: the recommendation (from the Discord chat) was to use closures: "In general, if your system is designed such that only one response should be sent to any query, and unless you want to go for an actor model, I'd suggest sticking to closures". So I was hesitant to change the approach.

I will switch to using that and see how it goes.

Signed-off-by: Franco Cipollone <[email protected]>
Signed-off-by: Franco Cipollone <[email protected]>
Signed-off-by: Franco Cipollone <[email protected]>
@francocipollone
Collaborator Author

Hey @Yadunund. Regarding using the FIFO channels and replicating the behavior in the examples:

I've been looking at this pattern and at how to bring it into the implementation. My concern is that it creates the need to continuously check whether a request has arrived (on the service side) or a reply has arrived (on the client side):

service side:

for (z_call(channel.recv, &oquery); z_check(oquery); z_call(channel.recv, &oquery)) {

(from z_queryable_with_channels.c)
client side:

for (bool call_success = z_call(channel.recv, &reply); !call_success || z_check(reply);

(from z_non_blocking_get.c)

So, for example, after creating the service I'd need to kick off a thread whose only job is to run that check in a loop. Once I get a result, I can unblock the condition variable backing that rmw wait-set event, and then rmw_take_request will be triggered.

That's why the callback pattern fits better: it is analogous to the use of publishers and subscribers for the implementation of services in the other rmw implementations.

@Yadunund
Member

I see, thanks for explaining those differences. I had misunderstood and thought we were not able to take ownership of the reply with the callback approach. It makes more sense to stick with this approach in that case.

@Yadunund
Member

I ran into a couple of issues when running some examples. Specifically:
The client node panicked when taking a response:

[INFO] [1703683133.857969473] [rmw_zenoh_cpp]: [client_data_handler] triggered for /add_two_ints
thread 'async-std/runtime' panicked at library/core/src/panicking.rs:136:5:
panic in a function that cannot unwind
stack backtrace:
   0:     0x7f1fdf3ed740 - std::backtrace_rs::backtrace::libunwind::trace::h67a838aed1f4d6ec
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x7f1fdf3ed740 - std::backtrace_rs::backtrace::trace_unsynchronized::h1d1786bb1962baf8
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f1fdf3ed740 - std::sys_common::backtrace::_print_fmt::h5a0b1f807a002d23
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x7f1fdf3ed740 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hf84ab6ad0b91784c
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7f1fdf1c632c - core::fmt::rt::Argument::fmt::h28f463bd1fdabed5
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/fmt/rt.rs:138:9
   5:     0x7f1fdf1c632c - core::fmt::write::ha37c23b175e921b3
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/fmt/mod.rs:1114:21
   6:     0x7f1fdf3bd2dd - std::io::Write::write_fmt::haa1b000741bcbbe1
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/io/mod.rs:1763:15
   7:     0x7f1fdf3eee4e - std::sys_common::backtrace::_print::h1ff1030b04dfb157
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7f1fdf3eee4e - std::sys_common::backtrace::print::hb982056c6f29541c
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x7f1fdf3eea30 - std::panicking::default_hook::{{closure}}::h11f92f82c62fbd68
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:272:22
  10:     0x7f1fdf3efa1a - std::panicking::default_hook::hb8810fe276772c66
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:292:9
  11:     0x7f1fdf3efa1a - std::panicking::rust_panic_with_hook::hd2f0efd2fec86cb0
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:731:13
  12:     0x7f1fdf3ef4f8 - std::panicking::begin_panic_handler::{{closure}}::h3651b7fc4f61d784
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:601:13
  13:     0x7f1fdf3ef486 - std::sys_common::backtrace::__rust_end_short_backtrace::hbc468e4b98c7ae04
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:170:18
  14:     0x7f1fdf3ef471 - rust_begin_unwind
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
  15:     0x7f1fdf08b077 - core::panicking::panic_nounwind_fmt::h802148bc84c1e34d
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:106:14
  16:     0x7f1fdf08b0e8 - core::panicking::panic_nounwind::h13bc2cb514e17671
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:136:5
  17:     0x7f1fdf08b092 - core::panicking::panic_cannot_unwind::hf0e1d75c7a4ef91a
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:213:5
  18:     0x7f1fdf3f0784 - std::sys::unix::thread::Thread::new::thread_start::hd28b46dbf5673d17
                               at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys/unix/thread.rs:102:9
  19:     0x7f1fe0a94ac3 - start_thread
                               at ./nptl/pthread_create.c:442:8
  20:     0x7f1fe0b26660 - clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  21:                0x0 - <unknown>
thread caused non-unwinding panic. aborting.

I'm also seeing undefined symbols when terminating the server node:

^C[INFO] [1703683208.856754851] [rclcpp]: signal_handler(signum=2)
/opt/ros/iron/lib/demo_nodes_cpp/add_two_ints_server: symbol lookup error: /home/yadunund/ws_rmw/install/rmw_zenoh_cpp/lib/librmw_zenoh_cpp.so: undefined symbol: _Z6z_dropISt10unique_ptrI15z_owned_query_tSt14default_deleteIS1_EEEN15zenoh_drop_typeIT_E4typeEPS6_
[ros2run]: Process exited with failure 127

I've opened #88 to fix these problems.

@francocipollone
Collaborator Author

I ran into a couple of issues when running some examples. Specifically, the client node panicked when taking a response

Just in case: have you deleted the zenohc_vendor build artifacts and rebuilt? Over the last few weeks, some fixes to the zenoh-c API were pushed.

I've opened #88 to fix these problems.

Great, I will take a look.

@Yadunund
Member

Just in case: have you deleted the zenohc_vendor build artifacts and rebuilt? Over the last few weeks, some fixes to the zenoh-c API were pushed.

Yep, this is on a fresh build from a few hours ago.

* Rely on channels for sending requests

Signed-off-by: Yadunund <[email protected]>

* Revert to callback for client with fixes

Signed-off-by: Yadunund <[email protected]>

* Cleanup service cb

Signed-off-by: Yadunund <[email protected]>

* Style

Signed-off-by: Yadunund <[email protected]>

* Cleanup comments

Signed-off-by: Yadunund <[email protected]>

---------

Signed-off-by: Yadunund <[email protected]>
@francocipollone francocipollone marked this pull request as ready for review January 3, 2024 15:17
Collaborator

@clalancette clalancette left a comment


Overall, this is a fantastic implementation. I've left a few thoughts and suggestions inline.

This just makes sure we always clean it up.

Signed-off-by: Chris Lalancette <[email protected]>
Member

@Yadunund Yadunund left a comment


Thanks for getting the sequence numbers generated and managed correctly! I have one clarifying comment, but it's not a blocker, so I will go ahead and merge this in.

RMW_CHECK_FOR_NULL_WITH_MSG(
service->data, "Unable to retrieve service_data from service", RMW_RET_INVALID_ARGUMENT);

z_owned_query_t query;
Member

@Yadunund Yadunund Jan 12, 2024


It looks like we want to "move" the z_owned_query_t from query_queue.front() to sequence_to_query_map when certain conditions hold. I'm a bit concerned about creating this z_owned_query_t object: do we need to drop it if the function returns before emplacing into the map?
Wondering if we should instantiate this as z_owned_query_t * query so that we won't have to drop anything and can just std::move(*query) when emplacing into the map?

Also, would it be better to emplace into the map AFTER serializing the payload? I'm wondering whether the loaned query created below this will have a dangling reference after moving the z_owned_query_t from the queue to the map.

Member


Alternatively, the map and queue can store unique_ptr<z_owned_query_t> to make sure we're transferring ownership?

Collaborator


I'm a bit concerned about creating this z_owned_query_t object and whether we need to drop it if the function returns before emplacing into the map?

So I don't think this is actually a worry, because we only pop_front the request if we totally succeeded in this function. Thus, if we were to fail somewhere else in here, we would just leave it at the front of the queue. That means we would reprocess it next time; that may or may not be what we want to do, but it is what we do now.

All of that said, we can make this a little clearer by using unique_ptr here, so I've gone ahead and done that (I'll push some fixes to your other PR). Also I realized that the locking in here was completely bogus, so I'm going to fix that up as well.

@Yadunund Yadunund merged commit b201d81 into rolling Jan 12, 2024
5 checks passed
@delete-merged-branch delete-merged-branch bot deleted the francocipollone/service_support_via_queryable branch January 12, 2024 03:32
Yadunund added a commit that referenced this pull request Jan 12, 2024
* Adds base support for services.

Signed-off-by: Franco Cipollone <[email protected]>

* Addresses Yadu's comments.

Signed-off-by: Franco Cipollone <[email protected]>

* Addresses Yadu's comments.

Signed-off-by: Franco Cipollone <[email protected]>

* Fixes memory leak.

Signed-off-by: Franco Cipollone <[email protected]>

* Removes unnecessary declaration.

Signed-off-by: Franco Cipollone <[email protected]>

* Cleanup services implementation (#88)

* Rely on channels for sending requests

Signed-off-by: Yadunund <[email protected]>

* Revert to callback for client with fixes

Signed-off-by: Yadunund <[email protected]>

* Cleanup service cb

Signed-off-by: Yadunund <[email protected]>

* Style

Signed-off-by: Yadunund <[email protected]>

* Cleanup comments

Signed-off-by: Yadunund <[email protected]>

---------

Signed-off-by: Yadunund <[email protected]>

* Use anonymous namespace instead of static functions.

Signed-off-by: Franco Cipollone <[email protected]>

* Fix style.

Signed-off-by: Franco Cipollone <[email protected]>

* Use zero_allocate where needed.

Signed-off-by: Franco Cipollone <[email protected]>

* Use a scope_exit to drop the keystr.

This just makes sure we always clean it up.

Signed-off-by: Chris Lalancette <[email protected]>

* Rename find_type_support to find_message_type_support.

Signed-off-by: Chris Lalancette <[email protected]>

* Add error checking into ros_topic_name_to_zenoh_key

Signed-off-by: Chris Lalancette <[email protected]>

* Make sure to always free response_bytes.

Signed-off-by: Chris Lalancette <[email protected]>

* Remove unnecessary TODO comment.

Signed-off-by: Chris Lalancette <[email protected]>

* Remember to free request_bytes.

Signed-off-by: Chris Lalancette <[email protected]>

* Small change to take requests from the front of the deque.

Signed-off-by: Chris Lalancette <[email protected]>

* Initial work for attachments and sequence numbers.

Signed-off-by: Chris Lalancette <[email protected]>

* Add in error checking for getting attachments.

Signed-off-by: Chris Lalancette <[email protected]>

* More error checking for attachments.

Signed-off-by: Chris Lalancette <[email protected]>

* Further cleanup.

Signed-off-by: Chris Lalancette <[email protected]>

* Remove unnecessary (and incorrect) copy of sequence_number

Signed-off-by: Chris Lalancette <[email protected]>

* Style

Signed-off-by: Yadunund <[email protected]>

---------

Signed-off-by: Franco Cipollone <[email protected]>
Signed-off-by: Yadunund <[email protected]>
Signed-off-by: Chris Lalancette <[email protected]>
Co-authored-by: Yadu <[email protected]>
Co-authored-by: Chris Lalancette <[email protected]>
@@ -2055,13 +2075,35 @@ rmw_take_response(

*taken = true;

for (z_owned_reply_t & reply : client_data->replies) {
Member


@clalancette looking back at this diff, I wonder if we should keep the original. If there is more than one server present, replies from other servers will still be stored in client_data->replies. Do we need to drop the other replies to let the Zenoh queryable know that we're done with the reply?

@@ -2341,8 +2383,8 @@ rmw_destroy_service(rmw_node_t * node, rmw_service_t * service)
// CLEANUP ================================================================
z_drop(z_move(service_data->keyexpr));
z_drop(z_move(service_data->qable));
for (auto & id_query : service_data->id_query_map) {
z_drop(z_move(id_query.second));
for (z_owned_query_t & query : service_data->query_queue) {
Member


@clalancette since we move ownership of taken requests from query_queue to sequence_id_map, should we also iterate over the map and properly drop each z_owned_query_t?
