
[Discussion] Poor scalability by thread number in PF #6

Open · facontidavide opened this issue Oct 21, 2019 · 2 comments
@facontidavide (Contributor)

This is just brainstorming, not really an "issue". You don't need to "solve" it; it is just an open discussion between nerds :)

I noticed that the PF SLAM scales quite poorly with the number of threads.

For instance, moving from 4 threads to 8 increases performance by only 50%. Note that the profiler still says we are using 100% of all 8 CPUs!

I do know that there is no such thing as perfect scalability, but in this case I think there "might" be a bottleneck somewhere.

I inspected the code and I couldn't find any mutex or potential false sharing, but of course I haven't done an exhaustive search.
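
In case it helps the search, here is a minimal sketch, not taken from this repo, of what false sharing typically looks like; it can hide without any explicit mutex:

```cpp
// False sharing: per-thread counters packed next to each other land on the
// same 64-byte cache line, so a write by one thread invalidates that line
// in every other core's cache, even though no element is logically shared.
struct Bad {
    long counters[8];        // e.g. 8 worker threads hammering one cache line
};

// The usual fix: align each per-thread slot to its own cache line.
struct alignas(64) PaddedSlot {
    long value;
};

struct Good {
    PaddedSlot counters[8];  // 8 threads, 8 independent cache lines
};
```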

@eupedrosa (Member)

I have an image that can help the discussion:

[Figure: measured multi-threading speedup vs. thread count (mt_speedup.png)]
The number of particles is 30.

In my opinion there are a few things that can explain this behavior:

  • Multi-threading does not speed up the full execution path. It parallelizes scan matching and ray integration (i.e. mapping), but normalizing the weights for resampling is a sequential step. Thus doubling the threads does not provide a ~2x speedup (see the Amdahl's law sketch after this list).

  • More threads can result in an execution penalty from managing the thread pool. From the image you can see that the speedup is asymptotic; adding threads can even degrade performance.

  • Each particle has a map with implicit data sharing (Copy-On-Write). Writing to a map can therefore trigger a concurrent-access sequence: mutex lock -> duplicate data -> mutex unlock. The more data is shared between particles, the more often this happens (a sketch of this write path follows the next comment below).

  • CPU affinity? If I am not mistaken, the Linux kernel may migrate a thread to a different logical core when scheduling it to run. This can result in cache misses.
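
To put a number on the first two points, this is a minimal Amdahl's law sketch, not code from this repo; the sequential fraction `s = 0.15` is a hypothetical value for illustration, not a measurement.

```cpp
// Amdahl's law: if a fraction `s` of each PF iteration is sequential
// (e.g. weight normalization and resampling), the best possible speedup
// with N threads is 1 / (s + (1 - s) / N), regardless of core count.
#include <cstdio>

int main() {
    const double s = 0.15;  // hypothetical sequential fraction
    for (int n : {1, 2, 4, 8, 16}) {
        const double speedup = 1.0 / (s + (1.0 - s) / n);
        std::printf("threads=%2d  ideal speedup=%.2fx\n", n, speedup);
    }
    return 0;
}
```

With s = 0.15 the ideal gain from 4 to 8 threads is only about 3.90x / 2.76x ≈ 1.4x, the same ballpark as the ~50% reported above; the actual sequential fraction would have to be profiled.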

@facontidavide (Contributor, Author)

I have the feeling that it is mostly related to point 3, but I might be wrong.

Anyway, the performance gain decreases rapidly above 4 threads.
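
To make point 3 concrete, here is a minimal sketch of that Copy-On-Write write path; the types and names are illustrative assumptions, not the actual lama API:

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Illustrative stand-in for a map patch shared between particles.
struct MapPatch {
    std::vector<float> cells;
};

// Copy-On-Write handle: each particle (thread) owns one of these, but the
// underlying MapPatch may still be shared with other particles.
class CowPatch {
public:
    explicit CowPatch(std::shared_ptr<MapPatch> p) : data_(std::move(p)) {}

    void write(std::size_t idx, float value) {
        // The "mutex lock -> duplicate data -> mutex unlock" step from
        // point 3: the first write to a shared patch must clone it, and
        // the clone is serialized by a mutex.
        if (data_.use_count() > 1) {
            std::lock_guard<std::mutex> lock(clone_mutex_);
            data_ = std::make_shared<MapPatch>(*data_);  // deep copy
        }
        data_->cells[idx] = value;  // private copy now, safe to mutate
    }

private:
    std::shared_ptr<MapPatch> data_;
    static std::mutex clone_mutex_;  // shared: a contention point under load
};
std::mutex CowPatch::clone_mutex_;
```

Right after resampling, many surviving particles share the same maps, so writer threads all funnel through that clone step at once; that would match the gains flattening above 4 threads.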

MatjazBostic pushed a commit to UbiquityRobotics/lama_core that referenced this issue on Oct 1, 2024: [Hybrid SLAM] Add support to mapping pause and custom map setting