[filters] The relationship between Convolution3D running speed and number of threads #6131
Comments
@Ru1yi I did a quick test with a different cloud with GCC on Linux, and I got 5141 ms, 4105 ms, 2448 ms, and 1426 ms with 1, 2, 4, and 8 threads respectively.
I installed PCL using PCL-1.12.1-AllInOne-msvc2019-win64.exe from the official release. If I want to use PCL with OpenMP, do I need to configure anything specifically? I enabled OpenMP support in the VS project configuration.
It looks like it was closed successfully. Could it be a problem with my data? Can you help me confirm whether MSVC2019 supports OpenMP? Thanks a lot.
@Ru1yi Can you post the point cloud you are using as a zipped PCD or PLY file? I will also try to test it.
@mvieth 3644.500049090.zip Here is the data I used for testing, which was collected using the Robosense 128-line mechanical lidar. Thank you for your support.
I did some testing (VS2022, PCL 1.13.0, with 3644.100042090.pcd), but I didn't notice any increasing run time with more threads. 2 threads are always faster than 1 thread. I did notice that at some point, more threads did not make it any faster, maybe around 4 threads (even though my computer has 6 physical cores with 2 hyperthreads each). I only had to enable OpenMP support at one place in the project configuration (set to |
Because I develop with Qt5 and VS2019, I use PCL 1.12.1. Using VS2022 with PCL 1.13.0 or above would require developing with Qt6. In my previous tests, PCL 1.12.1 did not support MSVC2022, and migrating the entire program to Qt6 is a lot of work, so I want to know whether the version affects this problem. I build this project in Debug and run it with debugging.
I don't think that PCL 1.12.1 vs PCL 1.13.0 or VS2019 vs VS2022 makes any difference for the multithreading performance. I would suggest also testing with your project built in Release configuration and run without debugging. I could imagine that both the Debug configuration and running under the debugger lead to more thread management overhead.
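For comparing thread counts across Debug/Release and with/without the debugger, it can help to take several timings per setting and look at the median rather than a single run. The following is only a minimal sketch using the standard library; `medianTimeMs` is a hypothetical helper name and is not part of PCL.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper (not part of PCL): calls fn() `repeats` times and
// returns the median wall-clock time of one call in milliseconds.
// For an even number of repeats, the upper of the two middle samples is returned.
template <typename Fn>
double medianTimeMs(Fn&& fn, int repeats) {
  std::vector<double> samples;
  for (int r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::milli>(stop - start).count());
  }
  if (samples.empty()) return 0.0;  // nothing was measured
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

The callable would wrap the filter call for one thread count, so the same helper can be reused for 1, 2, 4, and 8 threads in each build configuration.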
I tested the Release build both with and without the debugger attached.
Debugging mode:
Non-debugging mode:
The running time is much shorter than in Debug mode, and the shortest running time is achieved with 2 threads. But beyond 2 threads, I still don't see any speed-up from increasing the number of threads.
@Ru1yi Okay, this is getting closer to the results I had on MSVC. At least now, using 2 threads is faster than using 1 thread, which is promising. Can you say which CPU you are testing on (manufacturer, model number, etc.)?
Oh, and as the number of threads increases, more time is spent on allocating/deallocating and waiting 😢
This is the CPU model I used for testing: Intel(R) Core(TM) i7-14700HX 2.10 GHz
That is good to know, because then your CPU is one of the new models that has "Performance-cores" and "Efficient-cores" (see https://ark.intel.com/content/www/us/en/ark/products/235997/intel-core-i7-processor-14700hx-33m-cache-up-to-5-50-ghz.html ). Basically, P-cores are faster but more power hungry while E-cores are slower but more energy efficient. In contrast, my CPU is an older one where all cores are the same. Maybe on your computer there is some weird effect that if you run with one thread, a P-core is used, but if you run with more threads, E-cores are used. Here is some information I found about OpenMP with P-cores/E-cores:
The consensus seems to be to choose an appropriate work schedule (a dynamic schedule) that keeps all cores busy.
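As a rough illustration of what a dynamic schedule means in OpenMP: iterations are handed out in small chunks at run time, so a faster core simply takes more chunks than a slower one. The sketch below is standalone and does not reproduce PCL's code; `unevenWork`, `runWithDynamicSchedule`, and the chunk size of 64 are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-point cost that varies strongly,
// e.g. because some points have many more neighbours than others.
std::uint64_t unevenWork(int i) {
  const int steps = (i % 16 == 0) ? 2000 : 10;
  std::uint64_t acc = 0;
  for (int k = 0; k < steps; ++k)
    acc += static_cast<std::uint64_t>(k) * static_cast<std::uint64_t>(i + 1);
  return acc;
}

// Processes n items with `threads` threads (must be positive).
// schedule(dynamic, 64): each thread fetches the next 64 iterations when it
// becomes idle. The chunk size 64 is an arbitrary choice for this sketch.
// Without OpenMP enabled, the pragma is ignored and the loop runs serially.
std::vector<std::uint64_t> runWithDynamicSchedule(int n, int threads) {
  std::vector<std::uint64_t> out(static_cast<std::size_t>(n));
#pragma omp parallel for num_threads(threads) schedule(dynamic, 64)
  for (int i = 0; i < n; ++i)
    out[static_cast<std::size_t>(i)] = unevenWork(i);
  return out;
}
```

Each iteration writes only its own output element, so the result does not depend on the thread count or on which core picks up which chunk.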
Thank you very much for your help. I will continue to pay attention to this issue when I have time.
I want to emphasise this comment by Lars again: I just checked and with the search radius setting from above (0.01) and the point clouds you uploaded, almost all radius searches find no neighbours, only the search point itself. Setting the search radius to 0.1 seems to be more appropriate, though an even higher value is probably needed. I am not sure what good settings for the Gaussian kernel are.
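One way to sanity-check a search radius is to measure how many points have no neighbour other than themselves within that radius. The sketch below does this by brute force on plain coordinates, without PCL; `Point` and `fractionIsolated` are hypothetical names, and the quadratic loop is only suitable for a small sample of the cloud.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Point = std::array<double, 3>;

// Hypothetical helper: returns the fraction of points that have no other
// point within `radius` (brute force, O(n^2), for small samples only).
double fractionIsolated(const std::vector<Point>& cloud, double radius) {
  if (cloud.empty()) return 0.0;
  const double r2 = radius * radius;
  std::size_t isolated = 0;
  for (std::size_t i = 0; i < cloud.size(); ++i) {
    bool hasNeighbour = false;
    for (std::size_t j = 0; j < cloud.size() && !hasNeighbour; ++j) {
      if (i == j) continue;  // skip the search point itself
      const double dx = cloud[i][0] - cloud[j][0];
      const double dy = cloud[i][1] - cloud[j][1];
      const double dz = cloud[i][2] - cloud[j][2];
      hasNeighbour = (dx * dx + dy * dy + dz * dz) <= r2;
    }
    if (!hasNeighbour) ++isolated;
  }
  return static_cast<double>(isolated) / static_cast<double>(cloud.size());
}
```

If the returned fraction is close to 1 for the chosen radius, the convolution has almost nothing to average over, and the measured time mostly reflects search and threading overhead rather than useful work.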
Convolution3D now uses a dynamic schedule, which should result in better work balancing between the threads/cores: #6155
Describe the bug
I was trying to use the Gaussian convolution filtering algorithm in the filters module. When I set the number of threads, I found that Gaussian convolution runs fastest when setNumberOfThreads is set to 1, and that it gets slower as the number of threads increases. Below is my source code:
Test Result
My Environment:
Possible Solution
It looks like the time spent on thread management exceeds the time spent on the algorithm itself. Are my parameter settings wrong, or can Convolution3D not be used for organized point clouds?