
Shared memory performance loss compared with standalone iceoryx #2155

Open
OscarMrZ opened this issue Jan 9, 2025 · 3 comments

OscarMrZ commented Jan 9, 2025

Hello everyone,

I’ve been testing the performance of CycloneDDS with shared memory (SHM) enabled to demonstrate the performance improvements it should provide.

Using the APEX performance test package and this dockerfile they provide, I ran the following test:

---
experiments:
  -
    com_mean: CycloneDDS
    process_configuration: INTER_PROCESS
    execution_strategy: INTER_THREAD
    sample_transport: 
      - BY_COPY
      - SHARED_MEMORY
      - LOANED_SAMPLES
    msg: Array8m
    pubs: 1
    subs: 
     - 1
     - 2
     - 4
     - 6
     - 8
     - 10
     - 12
     - 14
     - 16
     - 18
     - 20
     - 22
     - 24
     - 26
     - 28
     - 30
     - 32
    rate: 30
    reliability: BEST_EFFORT
    durability: VOLATILE
    history: KEEP_LAST
    history_depth: 5
    max_runtime: 30
    ignore_seconds: 5

This test measures the average latency between a publisher publishing an 8MB payload at 30Hz and an increasing number of subscribers, for the three sample transports available for Cyclone: UDP (copy), SHM, and SHM with loaned samples. The QoS settings are the ones specified in the YAML.

These are the results:

[plot: average latency vs. number of subscribers for BY_COPY, SHARED_MEMORY, and LOANED_SAMPLES]

I anticipated a substantial performance boost with SHM enabled, regardless of message size or the number of subscribers. However, from the previous plot we can see that:

  1. The latency is "only" reduced by about half compared to the default copy transport, but it remains in the range of tens of milliseconds. This is many orders of magnitude bigger than the results for standalone iceoryx.
  2. Latency increases as the number of subscribers grows. As far as my understanding goes, this could happen with shared memory, but the increase should be marginal.

For reference, these are the results of the same test with standalone iceoryx, using this dockerfile as a testing environment. In this case, only the latency for LOANED transport was measured as the other two are not supported.

[plot: average latency vs. number of subscribers for standalone iceoryx, LOANED transport only]

Both experiments were run with the RouDi mempools config provided here.
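(For anyone reproducing this without opening that link: a RouDi mempool config is a TOML file roughly like the sketch below. The segment/mempool layout is the standard iceoryx format, but the size and count values here are placeholders chosen for illustration, not the ones actually used; the chunks just need to be large enough for the 8 MB payload plus iceoryx's per-chunk header.)

    # hypothetical sketch, not the config that was actually used
    [general]
    version = 1

    [[segment]]

    [[segment.mempool]]
    size = 8389120   # room for the 8 MB payload plus the chunk header
    count = 64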

Notice that while the latency seems to increase, it is on the order of hundredths of a millisecond and the differences are minimal. While I could understand the latency increasing slightly due to the extra bookkeeping needed for the subscriber queues, the difference in magnitude escapes my understanding.

To the best of my knowledge, when enabling shared memory in Cyclone, the behavior should be very similar to the one in the second graph, since it uses iceoryx behind the scenes. Are you aware of any reason why this might not be the case?
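For context, outside of performance_test I would normally switch on the iceoryx-backed shared memory through CYCLONEDDS_URI, roughly like this (a sketch from memory of the 0.10-era shared-memory docs, so the element names may need double-checking against your version):

    <CycloneDDS>
      <Domain id="any">
        <SharedMemory>
          <Enable>true</Enable>
        </SharedMemory>
      </Domain>
    </CycloneDDS>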

Many thanks in advance!

eboasson (Contributor) commented:

Hi @OscarMrZ, thank you for the detailed information. I know it took a while to respond, but that's because it took a while to get everything in place to reproduce it. (I am using macOS, so I can't trivially reproduce it.) At least I think this counts as a reproduction 😀:

[plot: reproduced latency results]

Just from the general shape and numbers, it has to be (de)serialization. The flame graph we got fits with that:

[flame graph]

(serdata_to_sample is the deserialization; the other costly one is zero_samples, which is also really not necessary.)
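A rough back-of-the-envelope number makes that plausible (the ~10 GB/s of effective memory bandwidth below is an assumption, not a measurement): one full pass over an 8 MB sample already costs roughly

    8 MB / 10 GB/s ≈ 0.8 ms

and the deserialize-plus-zero path touches the payload several times per subscriber, so per-sample handling ends up in the millisecond range and grows with the subscriber count, consistent with the plots above rather than with the microsecond-level numbers standalone iceoryx reports.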

This is with Cyclone 0.9; the current master should definitely do better, and I think 0.10 will also be better. Now that we have everything set up, the next step is to have a look at whether it has already been fixed or whether there's a bug in there. More to come.


eboasson commented Jan 21, 2025

We tested it, and I was overly optimistic about the 0.10 version: it still has the old code for taking samples, which doesn't properly support loans. In master it finally got fixed:

[plot: latency results with current master]

The API is the same in principle, but we deprecated dds_loan_sample and recommend dds_request_loan. What really changed is that the hard-coded Iceoryx integration got replaced by loading a plugin with different configuration settings. The standard XML-based configuration still supports the old-style settings, but unfortunately, performance_test uses a lower-level way of setting the configuration if you enable shared memory: https://gitlab.com/ApexAI/performance_test/-/blob/master/performance_test/plugins/cyclonedds/cyclonedds_communicator.hpp?ref_type=heads#L133

If you replace the dds_create_domain_with_rawconfig call with:

        char *config_str;
        /* Append the PSMX iceoryx plugin settings to whatever CYCLONEDDS_URI already contains */
        (void) ddsrt_asprintf(&config_str, "${CYCLONEDDS_URI},<General><Interfaces><PubSubMessageExchange name=\"iox\" library=\"psmx_iox\" /></Interfaces></General>");
        /* Expand environment variable references (notably ${CYCLONEDDS_URI}) in the config string */
        char *config_exp = ddsrt_expand_envvars (config_str, ec.dds_domain_id);
        /* Create the domain from the expanded XML configuration instead of the raw config struct */
        (void) dds_create_domain (ec.dds_domain_id, config_exp);
        ddsrt_free (config_exp);
        ddsrt_free (config_str);

it'll work. It's not super pretty; it is only a quick hack. (And to be honest, I hacked the quick hack we used without trying out the hacked hack first, so ... 🤞)
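For completeness, here is a minimal sketch of the loan-based write path mentioned above; dds_request_loan and dds_write are the actual CycloneDDS C API calls, but the wrapper function, the memset fill, and the fixed-size 8 MB type are illustrative assumptions, not code from performance_test:

        #include <string.h>
        #include "dds/dds.h"

        /* Sketch only: assumes 'writer' was created for a fixed-size type
           (e.g. an 8 MB array), so the loan can be served from shared memory
           when a PSMX plugin such as psmx_iox is loaded. */
        static void publish_loaned (dds_entity_t writer)
        {
          void *sample = NULL;
          if (dds_request_loan (writer, &sample) < 0)
            return; /* no loan available for this writer */
          /* fill the payload in place -- no user-side copy of the 8 MB buffer */
          memset (sample, 0xA5, 8 * 1024 * 1024);
          /* dds_write takes ownership of the loan and delivers it to the subscribers */
          (void) dds_write (writer, sample);
        }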

The Cyclone DDS RMW layer for ROS 2 needs updating for master; that's sitting in ros2/rmw_cyclonedds#501 in case you want to play with it. (I've promised to merge that PR and do a release from master too many times now.)

OscarMrZ (Author) commented:

Hello @eboasson,

Thank you for this amazing response!

I'll try your patch for the performance tool and try to replicate your results. That release from master you mention would definitely be amazing; in the meantime I'll play around with the PR.
