Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add XRCE DDS client #19326

Merged
merged 1 commit into from
Mar 22, 2022
Merged

add XRCE DDS client #19326

merged 1 commit into from
Mar 22, 2022

Conversation

bkueng
Copy link
Member

@bkueng bkueng commented Mar 14, 2022

This adds eProsima's XRCE DDS bridge in order to bridge uORB to DDS.

Notes

  • on stm32f7 with 58KB/s TX rate, the CPU load is 4.5%. Seems quite high.
  • some of the microRTPS build infra is re-used (urtps_bridge_topics.yaml), which will need some cleanup.
  • during testing I experienced an uncaught exception in the agent (terminate called after throwing an instance of 'eprosima::fastcdr::exception::NotEnoughMemoryException').
  • when running via serial: pinging the agent after an FMU reboot does not work reliably
  • I also did not see a reliable way to re-init the client when the agent restarts

Serialization

CDR serialization code is directly generated from .msg files and templates instead of using XRCE-DDS Gen. This has some advantages:

  • Reduced flash overhead: e.g. for vehicle_odometry, Micro-XRCE-DDS-Gen adds 1.4KB flash whereas in this PR, the extra flash for the serialization method is 400B.
  • Reduced build dependencies (no java and gradle required) and complexity
  • Faster build

The only caveat is that the ucdrBuffer is accessed directly (somewhat internal API).

Tested

TODO

  • support nested topics
  • instead of using xml to setup the agent, use the binary interface

Overall I'm not yet convinced to go with XRCE, but with this it's in a state where we can test it further.

@spiderkeys @pablogs9

@dagar dagar self-requested a review March 14, 2022 13:35
@TSC21 TSC21 self-requested a review March 14, 2022 14:38
@spiderkeys
Copy link
Contributor

Trying to figure out how to get this built, but am running into an issue around what seems to be an improperly generated expected size for a static assertion check for certain message types:

PX4-Autopilot/build/matek_h743-slim_default/uORB/ucdr/vehicle_trajectory_waypoint.h:28:40: error: static assertion failed: size mismatch
   28 |  static_assert(sizeof(topic.waypoints) == 0, "size mismatch");
      |                ~~~~~~~~~~~~~~~~~~~~~~~~^~~~

The uORB input generated code for vehicle_trajectory_waypoint_s is:

	uint64_t timestamp;
	uint8_t type;
	uint8_t _padding0[7]; // required for logger
	struct trajectory_waypoint_s waypoints[5];

Whereas the generated ucdr code has a size of 0 for the waypoints field:

bool ucdr_serialize_vehicle_trajectory_waypoint(const vehicle_trajectory_waypoint_s& topic, ucdrBuffer& buf)
{
	if (ucdr_buffer_remaining(&buf) < 9) {
		return false;
	}
	static_assert(sizeof(topic.timestamp) == 8, "size mismatch");
	memcpy(buf.iterator, &topic.timestamp, sizeof(topic.timestamp));
	buf.iterator += sizeof(topic.timestamp);
	buf.offset += sizeof(topic.timestamp);
	static_assert(sizeof(topic.type) == 1, "size mismatch");
	memcpy(buf.iterator, &topic.type, sizeof(topic.type));
	buf.iterator += sizeof(topic.type);
	buf.offset += sizeof(topic.type);
	static_assert(sizeof(topic.waypoints) == 0, "size mismatch");
	memcpy(buf.iterator, &topic.waypoints, sizeof(topic.waypoints));
	buf.iterator += sizeof(topic.waypoints);
	buf.offset += sizeof(topic.waypoints);
	return true;
}

Looking at actuator_outputs, which also has an array field, the size values are correct (output = 16*4 = 64), so I'm wondering if the difference is that the field in the trajectory message is a struct array, rather than a primitive array.

print('\tstatic_assert(sizeof(topic.{0}) == {1}, "size mismatch");'.format(field.name, field_size))

@spiderkeys
Copy link
Contributor

spiderkeys commented Mar 14, 2022

Got things replicated on my side using the Matek H743 Wing. Seeing 2% CPU usage on this board at 25.8KB/s.

Configured the serial link to run at 1MBaud (required minor tweak to allowed bauds in microdds_client).

 712 microdds_client               144  2.099  6684/ 7904  96 ( 96)  READY  5

Processes: 30 total, 5 running, 25 sleeping
CPU usage: 24.80% tasks, 0.71% sched, 74.48% idle
DMA Memory: 5120 total, 1024 used 1536 peak
Uptime: 73.551s total, 56.700s idle
nsh> microdds_client status
INFO  [microdds_client] Running, connected
INFO  [microdds_client] Payload tx: 25868 B/s
INFO  [microdds_client] Payload rx: 0 B/s

Topics are showing up as expected in RTI Admin Console (and using only single Participant/Pub/Sub entities, which is good):
Screenshot from 2022-03-14 10-48-29

@spiderkeys
Copy link
Contributor

spiderkeys commented Mar 14, 2022

On the DDS/ROS2 side of things, users ultimately need to convert the .msg files to .idl files via rosidl_adapter. There is a convenient script for this:

ros2 run rosidl_adapter msg2idl.py px4/msg

However, there are two issues with the current formatting of the message files:

  1. ROS2 requires that the filenames conform to upper camel case
  2. Once this is done, the references to nested types in the messages (like trajectory_waypoint waypoints[5]) need to be updated to match the same upper camel case.

I wrote a quick python script that does the first step and manually corrected the 4-5 cases of nested type usage, which satisfied rosidl_adapter msg2idl.py

From there, I can now use the IDL type definitions in both pure DDS or ROS2.

I believe @dagar has taken care of this in: #18911

@spiderkeys
Copy link
Contributor

spiderkeys commented Mar 14, 2022

Edit: Disregard - This was a discrepancy based on the typenames contained within the micro client XML description vs the typesupport generated from the msg->IDL conversion (px4_msgs::msg::dds_::SensorCombined_ vs px4::msg::SensorCombined)

Unfortunately, there seems to be some kind of delta between the CDR representation of the type being published by the MicroXRCEAgent on behalf of the client, as neither RTI Admin Console, nor example code generated via fastrtps-gen are able to match their datareaders with the type published by the agent:

Screenshot from 2022-03-14 12-16-13

The example fastrtps-gen publisher for the given type is able to match with admin console:
Screenshot from 2022-03-14 12-12-41

Using the generated IDL file from rosidl_adapter msg2idl.py, I passed it into fastrtps-gen to generate example pub/sub code for the type using:

fastrtpsgen -example CMake ~/z/mr/gitlab/nadir/PX4-Autopilot/msg/idl/tmp/SensorCombined.idl
mkdir build
cd build
cmake ..
# edit topic names to match those published by the XRCE agent
make
./SensorCombined subscriber
./SensorCombined publisher

@spiderkeys
Copy link
Contributor

spiderkeys commented Mar 14, 2022

I think that what may need to happen is the replacement of

bool ucdr_serialize_sensor_combined(const sensor_combined_s& topic, ucdrBuffer& buf)
{
	if (ucdr_buffer_remaining(&buf) < 45) {
		return false;
	}
	static_assert(sizeof(topic.timestamp) == 8, "size mismatch");
	memcpy(buf.iterator, &topic.timestamp, sizeof(topic.timestamp));
	buf.iterator += sizeof(topic.timestamp);
	buf.offset += sizeof(topic.timestamp);
	static_assert(sizeof(topic.gyro_rad) == 12, "size mismatch");
	memcpy(buf.iterator, &topic.gyro_rad, sizeof(topic.gyro_rad));
	buf.iterator += sizeof(topic.gyro_rad);
	buf.offset += sizeof(topic.gyro_rad);
	static_assert(sizeof(topic.gyro_integral_dt) == 4, "size mismatch");
	memcpy(buf.iterator, &topic.gyro_integral_dt, sizeof(topic.gyro_integral_dt));
	buf.iterator += sizeof(topic.gyro_integral_dt);
	buf.offset += sizeof(topic.gyro_integral_dt);
	static_assert(sizeof(topic.accelerometer_timestamp_relative) == 4, "size mismatch");
	memcpy(buf.iterator, &topic.accelerometer_timestamp_relative, sizeof(topic.accelerometer_timestamp_relative));
	buf.iterator += sizeof(topic.accelerometer_timestamp_relative);
	buf.offset += sizeof(topic.accelerometer_timestamp_relative);
	static_assert(sizeof(topic.accelerometer_m_s2) == 12, "size mismatch");
	memcpy(buf.iterator, &topic.accelerometer_m_s2, sizeof(topic.accelerometer_m_s2));
	buf.iterator += sizeof(topic.accelerometer_m_s2);
	buf.offset += sizeof(topic.accelerometer_m_s2);
	static_assert(sizeof(topic.accelerometer_integral_dt) == 4, "size mismatch");
	memcpy(buf.iterator, &topic.accelerometer_integral_dt, sizeof(topic.accelerometer_integral_dt));
	buf.iterator += sizeof(topic.accelerometer_integral_dt);
	buf.offset += sizeof(topic.accelerometer_integral_dt);
	static_assert(sizeof(topic.accelerometer_clipping) == 1, "size mismatch");
	memcpy(buf.iterator, &topic.accelerometer_clipping, sizeof(topic.accelerometer_clipping));
	buf.iterator += sizeof(topic.accelerometer_clipping);
	buf.offset += sizeof(topic.accelerometer_clipping);
	return true;
}

with use of the ucdr_serialize_x() methods:

bool ucdr_serialize_sensor_combined(const sensor_combined_s& topic, ucdrBuffer& buf)
{
	bool ret = true;
	ret &= ucdr_serialize_uint64_t( &buf, topic.timestamp );
	ret &= ucdr_serialize_array_float( &buf, topic.gyro_rad, 3 );
	ret &= ucdr_serialize_uint32_t( &buf, topic.gyro_integral_dt );
	ret &= ucdr_serialize_int32_t( &buf, topic.accelerometer_timestamp_relative );
	ret &= ucdr_serialize_array_float( &buf, topic.accelerometer_m_s2, 3 );
	ret &= ucdr_serialize_uint32_t( &buf, topic.accelerometer_integral_dt );
	ret &= ucdr_serialize_uint8_t( &buf, topic.accelerometer_clipping );
	return ret;
}

@spiderkeys
Copy link
Contributor

spiderkeys commented Mar 14, 2022

Unfortunately, even after the above changes, the types are still mismatching. I've taken steps to ensure that the type names, module names, and CDR fields are matching on both sides, but there still seems to be some delta. Wondering if a flag needs to be used to relax type consistency checks relating to member names on the DDS side. I know RTI has the following options:

ignore_member_names: Controls whether member names are taken into consideration for type assignability.
force_type_validation: Controls whether type information must be available in order to complete matching between a DataWriter and this DataReader.

Given that the XRCE client code is not currently declaring type information for the topics, including their field names, this may be what is failing.

@bkueng as part of your testing, did you validate ROS2/DDS-side communications to the Agent? If so, can you share your steps?

@spiderkeys
Copy link
Contributor

spiderkeys commented Mar 14, 2022

I've now confirmed communication on the DDS side with both FastDDS and CycloneDDS. ConnextDDS is reporting that it is successfully matching, but samples don't seem to get delivered from XRCE Agent when requested by a Connext Datareader. Not sure what is going on here yet. Maybe XCDR vs XCDR2 settings. Opened an issue here: eProsima/Micro-XRCE-DDS-Agent#307

On the PX4 side, it looks like the remaining steps are generally:

  1. Land [WIP] uORB split topics and msg and full ROS2 compatibility #18911, allowing straightforward conformance with ROS2 msg naming conventions to support msg > IDL generation.
  2. (If actually necessary, depending on XCDR v XCDR2) Update code-gen to use ucdr_serialize_x() functions and improve the topic_size calculation code to follow the process used in XRCE Client examples (this can be codegen'd):https://github.com/eProsima/Micro-XRCE-DDS-Client/blob/master/examples/PublishHelloWorld/HelloWorld.c#L49
  3. If desired, update PX4 client code to use binary entity creation instead of xml
  4. Support for nested struct array types within PX4 messages

@pablogs9
Copy link

Hello @bkueng,

I have been checking this PR and I have some comments.

  1. Regarding the serialization, using memcpy in the buffer iterator is not a good idea since the Micro-CDR library is designed to be used using the API. In fact, by using this approach you are not taking into account the alignment requirements of the CDR standard and you are not using the fragmentation mechanisms provided by Micro XRCE-DDS. You will need to use ucdr_serialize_[type]() API in order to ensure correctness. If you need a reference on how to generate typesupport code from .msg files using python templates take a look at the rosidl_typesupport_microxrcedds.
  2. In order to avoid 'eprosima::fastcdr::exception::NotEnoughMemoryException' you will need to add <historyMemoryPolicy>PREALLOCATED_WITH_REALLOC</historyMemoryPolicy> to the Datawriter XML. You can find more details here.
  3. In order to use the ping functionality to ensure reconnection you will need to use uxr_ping_agent() when there is no established session and uxr_ping_agent_session() when you have an already available session. We can provide examples here if you need.

@bkueng
Copy link
Member Author

bkueng commented Mar 15, 2022

Thanks for checking @pablogs9.

In fact, by using this approach you are not taking into account the alignment requirements of the CDR standard and you are not using the fragmentation mechanisms provided by Micro XRCE-DDS.

Alignment is considered, and the message size is below the MTU, or rather I used the best-effort streams, and we're only using little-endian systems, thus I don't see a problem here. Can you be specific what you think is problematic?

In order to avoid 'eprosima::fastcdr::exception::NotEnoughMemoryException' you will need to add PREALLOCATED_WITH_REALLOC to the Datawriter XML. You can find more details here.

My concern here is reliability: if some (corrupt) data can cause an uncaught exception (crash), it is a problem.

In order to use the ping functionality to ensure reconnection you will need to use uxr_ping_agent() when there is no established session and uxr_ping_agent_session() when you have an already available session. We can provide examples here if you need.

uxr_ping_agent_session is a blocking call, is there a method to continue sending data?

@pablogs9
Copy link

pablogs9 commented Mar 15, 2022

Alignment is considered, and the message size is below the MTU, or rather I used the best-effort streams, and we're only using little-endian systems, thus I don't see a problem here. Can you be specific what you think is problematic?

Not sure if your alignment calculation is taking into account the CDR standard, check Micro CDR or Fast CDR
Edit: not a problem here. But, maybe as mentioned here if XCDR2 is expected, the max alignment is 4 B instead of 8 B (page 148 OMG standard)

My concern here is reliability: if some (corrupt) data can cause an uncaught exception (crash), it is a problem.

Corrupt data should not raise any uncaught exception that ends in a crash, but if you detect any case where it happens we will fix it ASAP.

uxr_ping_agent_session is a blocking call, is there a method to continue sending data?

There is not an active API for doing ping in a non-blocking manner. A workaround can be: uxr_ping_agent_session(&session, 0, 1); and then check session->on_pong_flag != NO_PONG_STATUS;. In any case, we are completely open to extending our APIs to your need and maybe creating a uxr_ping_agent_check_status() or even a callback approach, let's discuss this.

@pablogs9
Copy link

Possible fix for Fast DDS & RTI Connext issue here: eProsima/Fast-DDS#2580

Once it is merged we will update the Micro XRCE-DDS Agent

@bkueng
Copy link
Member Author

bkueng commented Mar 16, 2022

But, maybe as mentioned eProsima/Micro-XRCE-DDS-Agent#307 if XCDR2 is expected, the max alignment is 4 B instead of 8 B (page 148 OMG standard)v

Noted.

There is not an active API for doing ping in a non-blocking manner. A workaround can be: uxr_ping_agent_session(&session, 0, 1); and then check session->on_pong_flag != NO_PONG_STATUS;. In any case, we are completely open to extending our APIs to your need and maybe creating a uxr_ping_agent_check_status() or even a callback approach, let's discuss this.

Ok, I'm directly checking for check session->on_pong_flag for now, which is working. Ideally there would be an API to detect an agent restart, as a quick agent restart could be missed using a ping.

@pablogs9
Copy link

Noted.

According to what we have been talked here there should not be any alignment error in your implementation. Probably the issue cames from eProsima/Fast-DDS#2580

Ok, I'm directly checking for check session->on_pong_flag for now, which is working. Ideally there would be an API to detect an agent restart, as a quick agent restart could be missed using a ping.

Let us know which kind of API would fit your requirements and we will make a proposal.

@bkueng bkueng changed the title [WIP] add XRCE DDS client add XRCE DDS client Mar 17, 2022
@bkueng
Copy link
Member Author

bkueng commented Mar 18, 2022

This is getting ready to merge (I can change from xml to binary later, mostly a flash optimization).

Let us know which kind of API would fit your requirements and we will make a proposal.

I just realized I can also check for PONG_IN_SESSION_STATUS, so explicitly exposing the pong result would be enough (pong_status_t uxr_latest_pong_status(uxrSession* session)).

@bkueng bkueng merged commit 06a9be7 into master Mar 22, 2022
@bkueng bkueng deleted the xrce-dds branch March 22, 2022 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants