-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for limiting num_threads in tbb task #252
Conversation
Hello @zzodo Thanks for the contribution. Do you have any dataset showing the behavior where KISS-ICP consumes all the resources? One of the design principles of It might be beneficial to expose the number of threads through the configuration file but this will add yet another parameter to tune. I will reply the other questions later :) Thanks again for opening the PR |
The some conditions that I mentioned in the earlier comment was not the dataset, but other parallelized pipelines. I can elaborate my point as below:
And I also agree to your concern that number of threads has to be configured outside.
|
@zzodo sorry for the late replies and back and forth. I'm a bit concerned about introducing this change (@tizianoGuadagnino , @benemer, please tell me if you guys think differently). Mainly because I will not have the time in the next weeks to fully test it and make sure nothing is slipping our hands. But I have a proposal for you to acknowledge all the great work you have done. Something we can do as a first step to address the problems you report is to completely disable multi-threading. In most cases, the pipeline will still run at the sensor frame rate or even faster, so it's still deployable. For this, we would need to merge this branch I always try to keep up-to-date: https://github.com/PRBonn/kiss-icp/tree/nacho/single_thread_implementation. I feel like this task is a bit more challenging but might be more beneficial for the community. The task should involve
What do you think? Would yo be interested in doing this bit? Thanks again! |
Well, I also think that those tasks may require days to be applied on this pull request. A few concerns immediately came up to my mind, and those concerns are mainly due to some kind of philosophy I can feel in this repository, such as cmake packaging rules, coding styles, and header/source file hierarchy. For detail, I got the impression that you have been tried to avoid using macros, conditional compilation, or other (unnecessary) utilizations, which could influence 'keep it simple'. Is that right? Nevertheless, I think I can deliever you a draft If other contributors agree with my proposal. And it would be my pleasure to hear your preferences, or some ideas that single-threaded version can be merged into master branch. Best. |
Additional details to my previous comment:
Do you want to remove tbb dependency in the single-threaded cases? |
For backwards compatibility, use a default value of "automatic"
@zzodo sorry for being SO late. I think I'm a bit in favor of changing this default behavior from the pipeline. The problem I'm currently having is a bit about our API design. We can indeed make the registration part a class, but we always tried to keep stuff simple and small... I'm trying to figure out if there is a better way to do this, since this would propagate a a ton of changes of the C++, Python, and ROS 1/2 APIs But in general I'm in favor of adding a parametrization for this bit. BTW: Where are all the nice plots? they are not on GH but I saw them on the email thread @tizianoGuadagnino , @benemer It would be nice if you guys could comment a bit on this one |
To test the single thread (or other number) with the Python API I used the following env variables
and pick I also kept an eye on The on a private dataset, testing with a
Please take these results with a grain of salt, more experiments would need to be conducted to extract valuable conclusions And @zzodo would be more than nice if you could add your experiments here!! |
Hi @nachovizzo, long time no see! It has been a while and trying to remind the previous issues I claimed. Re-cloned kiss-icp and built both cpp and python packages and tested the odometry pipeline on MulRan dataset. It still works perfectly. But one question here, I cannot understand your environment settings:
In my machine, the environmental variable And I will share my experiment within a couple of days. BTW: I deleted some of my past comments since I thought that the results of my private datasets were not enough 😅 |
@zzodo the environment thing is something new we automatically pulled with the new pydantic release. I'd say pip install - U pydantic should do the trick |
Understood, but does it work without any Python interfaces that set the maximum concurrency of tbb? I expected something like |
Oh I found |
Here is my experiment results. The experiment was conducted on my desktop computer, which has 12th Gen Intel i5-12600KF with example test scriptimport os
from pathlib import Path
import kiss_icp
import numpy as np
from kiss_icp.datasets import dataset_factory
from kiss_icp.pipeline import OdometryPipeline
from multiprocessing import Process
from kiss_icp_eval import run_sequence
pipelines = []
def run_mulran_sequence(sequence: str):
sequence_dir = data_dir / sequence
return OdometryPipeline(
dataset=dataset_factory(
dataloader="mulran",
data_dir=sequence_dir,
),
deskew=False,
n_scans=100,
jump=1500,
)
if __name__ == '__main__':
data_root = "/home/dohoon/datasets"
data_dir = Path(os.path.join(data_root, "MulRan"))
print(f"Reading datasets from : {data_root}")
all_sequences = {
"dcc": ["DCC01"], # or dcc/DCC01
}
num_processes = 4
processes = []
for sequence in all_sequences["dcc"]:
for _ in range(num_processes):
process = Process(target=run_sequence, args=(run_mulran_sequence, {}), kwargs={'sequence':sequence})
processes.append(process)
process.start()
for process in processes:
process.join() As you can see on the figure, there exists some kind of 'racing condition' in terms of computing resource management when we allow the CPU to manage them. |
Hey, Sorry for not getting back to you sooner, but I didn't find the time to dig through this PR yet. In general, I like the idea of merging the single and multi-thread implementations! Setting a maximum number of threads seems to make sense according to your experiments, thanks a lot for this insight. Also, having a separate Registration class makes sense in my view. I use the voxel hash map in a different project and do not always need to find correspondences, so this is more a feature the Registration requires. Of course, this would introduce a lot of changes in our Python and ROS wrappers... How do we proceed? |
I will split this into two PRs I guess so we review the two changes separately |
I am following up your discussions and feeling comfortable with the changes. Thanks for all your effort! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from my side now
Update: this pr has no effect in ROS, unless we merge #294. It's just a warning, this does not affect the scope of this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-approve? lol
Assuming 10 logical cores: Before Registration::max_num_threads_ == 0, and now Registration::max_num_threads_ == 10. This way we hide less what's going under the hood
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now, I renamed the global control variable in the Registration class
, although it was maybe clear I will rather not have single letter named variables. We can merge for me.
btw, a small detail that I post-poned is the fact that the deksew+preprocessing module also make use of tbb, luckily we've picked the right namespace for the max_trhreads params, but something to pay attention in upcoming ticket about this |
FYI: kiss-icp/cpp/kiss_icp/core/Registration.cpp Line 173 in 5c1d64a
This is not problematic at this moment, but may affect other tbb-parallelized pipelines.
We should keep our eyes on this! |
Thanks for noticing it. Yes, I know it's a nasty hack to control this variable with static duration.... but I didn't want to corrupt the registration class with minor details. This is the way I found to "hide' this implementation detail. If you have a better idea, please propose! |
EDIT for possible confusion: I would say it is better to keep this variable as a private member variable. class Registration {
public:
explicit Registration(int max_num_iteration, double convergence_criterion, int max_num_threads)
: max_num_iterations_(max_num_iteration),
convergence_criterion_(convergence_criterion),
tbb_control_(tbb::global_control::max_allowed_parallelism, max_num_threads) {}
Sophus::SE3d AlignPointsToMap(const std::vector<Eigen::Vector3d> &frame,
const VoxelHashMap &voxel_map,
const Sophus::SE3d &initial_guess,
double max_correspondence_distance,
double kernel);
private:
int max_num_iterations_;
double convergence_criterion_;
tbb::global_control tbb_control_;
}; The lifetime of this global settings for tbb pipelines ends when the dtor of Can I ask you about the reason why you are trying to avoid declaring |
kiss-icp uses tbb library to parallelize loop in correspondence search and point registration, and this have helped to achieve real-time capabilities in the application levels.
In some conditions, however, I was able to experience this tbb-parallelized lines consume all the computing resources.
So this feature, configuring a number of threads to tbb tasks, let us to leave some room for other processes outside of kiss-icp.
Can you confirm this?
And I have some more suggestions: