
Add support for limiting num_threads in tbb task #252

Merged
merged 50 commits · Mar 18, 2024

Conversation

@zzodo (Contributor) commented Nov 2, 2023

kiss-icp uses the tbb library to parallelize the loops in correspondence search and point registration, and this has helped achieve real-time capability at the application level.
Under some conditions, however, I have seen these tbb-parallelized sections consume all available computing resources.

So this feature, which configures the number of threads for tbb tasks, lets us leave some room for other processes running alongside kiss-icp.

Can you confirm this?


And I have some more suggestions:

  • Move the correspondence search method to Registration.cpp.
    • This can be done by adding an interface (e.g. voxel_hash_map.begin() and voxel_hash_map.end()) as a const_iterator.
    • In this way, registration takes full responsibility for searching correspondences between the point clouds.
    • Migrating this responsibility to the registration class may be useful for further customization, while removing hard-coded hyper-parameters such as the voxel search radius here.
  • Declare registration as a class.
    • This is also related to the customizability of the const-expressed variable in this line.
    • I personally believe that declaring registration as a class can help C++ library users.

@nachovizzo (Collaborator)

Hello @zzodo Thanks for the contribution.

Do you have any dataset showing the behavior where KISS-ICP consumes all the resources? One of the design principles of TBB is to avoid exposing the number of threads to the user, since if I'm running this on a cluster of thousands of CPUs I'd also like to use all of them.

It might be beneficial to expose the number of threads through the configuration file but this will add yet another parameter to tune.

I will reply to the other questions later :)

Thanks again for opening the PR

@zzodo (Contributor, author) commented Nov 3, 2023

The "some conditions" I mentioned in the earlier comment were not about the dataset, but about other parallelized pipelines.
This behavior would not happen if we used kiss-icp alone.

I can elaborate my point as below:

  • Let us assume we have kiss-icp as the LiDAR odometry pipeline and a relocalization pipeline from another open-source package, for an autonomous mobile robot application.
  • These two pipelines run asynchronously.
  • If we run this on a low-power computing unit, each pipeline spawns threads in a kind of race over the available cores.
  • If the thread priorities of the pipelines are set to the same level, the concurrency of kiss-icp can be affected by the relocalization module.
  • This can lead to frame drops in the kiss-icp pipeline.

I also agree with your concern that the number of threads has to be configured from outside.
Then how about keeping max_concurrency() (which lets tbb determine the maximum num_threads) unless the user configures it themselves?
For example:

const std::size_t num_threads = num_threads_ ? num_threads_ : max_concurrency();

@nachovizzo (Collaborator)

@zzodo sorry for the late replies and back and forth.

I'm a bit concerned about introducing this change (@tizianoGuadagnino, @benemer, please tell me if you think differently). Mainly because I will not have the time in the next weeks to fully test it and make sure nothing is slipping through our hands.

But I have a proposal for you to acknowledge all the great work you have done. Something we can do as a first step to address the problems you report is to completely disable multi-threading. In most cases, the pipeline will still run at the sensor frame rate or even faster, so it's still deployable.

For this, we would need to merge this branch that I always try to keep up to date: https://github.com/PRBonn/kiss-icp/tree/nacho/single_thread_implementation. I feel this task is a bit more challenging, but it might be more beneficial for the community. The task should involve:

  • Merging the single-thread and multi-thread implementations
  • Allowing the user to select between one or the other at build time (e.g., I don't need multithreading, so no need to pull in the entire TBB library, but let me decide if I want to)
  • Exposing this in the configuration of the pipeline
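As a sketch of how the last bullet might look, the pipeline configuration could grow a thread cap next to the other registration parameters. The key names and values below are assumptions for illustration, not the actual merged API:

```yaml
# Hypothetical excerpt of an advanced configuration file
registration:
  max_num_iterations: 500        # assumed existing ICP iteration cap
  convergence_criterion: 0.0001  # assumed value for illustration
  max_num_threads: 0             # 0 = let TBB pick; N > 0 caps the thread pool
```

With a default of 0, users who never touch the option keep today's behavior.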

What do you think? Would you be interested in doing this bit?

Thanks again!

@zzodo (Contributor, author) commented Nov 6, 2023

Well, I also think those tasks may require days to be applied to this pull request.

A few concerns immediately came to mind, mainly due to the philosophy I sense in this repository, such as the cmake packaging rules, coding style, and header/source file hierarchy.

In detail, I got the impression that you have tried to avoid macros, conditional compilation, or other (unnecessary) constructs that could compromise 'keep it simple'. Is that right?

Nevertheless, I think I can deliver a draft if the other contributors agree with my proposal. And it would be my pleasure to hear your preferences, or any ideas on how the single-threaded version can be merged into the master branch.

Best.

@zzodo (Contributor, author) commented Nov 6, 2023

Additional details to my previous comment:

  • The reason I am asking about your preference is that you mentioned disabling the tbb library.
  • There is another way to restrict the maximum concurrency: using tbb::global_control.

Do you want to remove the tbb dependency in the single-threaded case?

@nachovizzo (Collaborator)

@zzodo sorry for being SO late.

I think I'm a bit in favor of changing this default behavior of the pipeline. The problem I'm currently having is more about our API design.

We can indeed make the registration part a class, but we have always tried to keep stuff simple and small... I'm trying to figure out if there is a better way to do this, since it would propagate a ton of changes across the C++, Python, and ROS 1/2 APIs.

But in general I'm in favor of adding a parametrization for this bit.

BTW: Where are all the nice plots? They are not on GH, but I saw them in the email thread.

@tizianoGuadagnino , @benemer It would be nice if you guys could comment a bit on this one

@nachovizzo nachovizzo requested a review from benemer February 11, 2024 20:39
Review comments on cpp/kiss_icp/core/Registration.cpp and cpp/kiss_icp/core/VoxelHashMap.hpp (outdated, resolved)
@nachovizzo (Collaborator)

To test a single thread (or another number) with the Python API, I used the following env variables

export OPENBLAS_NUM_THREADS=N && export MKL_NUM_THREADS=N && export mapping='{"max_threads": N}'

and picked N.

I also kept an eye on htop to make sure the changes were having an effect.

Then, on a private dataset, testing with a 20-core Intel i7 CPU and processing 1000 frames, this was the output:

max_threads   Avg. Hz
1             26
4             61
8             79
16            96
20            103
-1            99


Please take these results with a grain of salt; more experiments would need to be conducted to draw solid conclusions.

And @zzodo, it would be more than nice if you could add your experiments here!!

@zzodo (Contributor, author) commented Feb 13, 2024

Hi @nachovizzo, long time no see!

It has been a while, and I am trying to recall the previous issues I raised, and the experiment results (plots) too.

I re-cloned kiss-icp, built both the cpp and python packages, and tested the odometry pipeline on the MulRan dataset. It still works perfectly.

But one question here: I cannot understand your environment settings:

export OPENBLAS_NUM_THREADS=N && export MKL_NUM_THREADS=N && export mapping='{"max_threads": N}'

On my machine, the environment variables OPENBLAS_NUM_THREADS and MKL_NUM_THREADS do not affect tbb's maximum concurrency nor the average frequency of the registration. And the variable mapping='{"max_threads": N}' seems to be one of your customized config parameters.

I will share my experiments within a couple of days.
Thanks!

BTW: I deleted some of my past comments, since I thought the results on my private datasets were not sufficient 😅

@nachovizzo (Collaborator)

@zzodo the environment thing is something new we automatically pulled in with the new pydantic release. I'd say pip install -U pydantic should do the trick.

@zzodo (Contributor, author) commented Feb 13, 2024

Understood, but does it work without any Python interface that sets the maximum concurrency of tbb?

I expected something like config.mapping.max_threads

@zzodo (Contributor, author) commented Feb 13, 2024

Oh, I found b327258, which has not been pushed to this merge request.
I'll try it later!

@zzodo (Contributor, author) commented Feb 14, 2024

Here are my experiment results.

The experiment was conducted on my desktop computer, which has a 12th Gen Intel i5-12600KF with 16 cores.
For the dataset, I used the KAIST MulRan DCC01 sequence, from the 1500th frame to the 1600th frame.
You can reproduce similar results using this:

example test script
import os
from multiprocessing import Process
from pathlib import Path

from kiss_icp.datasets import dataset_factory
from kiss_icp.pipeline import OdometryPipeline
from kiss_icp_eval import run_sequence

# Build the odometry pipeline for one MulRan sequence (consumed by run_sequence).
def run_mulran_sequence(sequence: str):
  sequence_dir = data_dir / sequence
  return OdometryPipeline(
      dataset=dataset_factory(
          dataloader="mulran",
          data_dir=sequence_dir,
      ),
      deskew=False,
      n_scans=100,
      jump=1500,
  )

if __name__ == '__main__':
  data_root = "/home/dohoon/datasets"
  data_dir = Path(os.path.join(data_root, "MulRan"))
  print(f"Reading datasets from : {data_root}")

  all_sequences = {
      "dcc": ["DCC01"], # or dcc/DCC01
  }
  
  num_processes = 4
  processes = []
  
  for sequence in all_sequences["dcc"]:
      for _ in range(num_processes):
          process = Process(target=run_sequence, args=(run_mulran_sequence, {}), kwargs={'sequence':sequence})
          processes.append(process)
          process.start()

  for process in processes:
      process.join()

[Figure_1: plot of the experiment results omitted]

As you can see in the figure, there is a kind of race over computing resources when we leave their management entirely to the scheduler.

@nachovizzo nachovizzo mentioned this pull request Feb 28, 2024
@benemer (Member) commented Mar 1, 2024

Hey,

Sorry for not getting back to you sooner, but I didn't find the time to dig through this PR yet.

In general, I like the idea of merging the single and multi-thread implementations! Setting a maximum number of threads seems to make sense according to your experiments, thanks a lot for this insight.

Also, having a separate Registration class makes sense in my view. I use the voxel hash map in a different project and do not always need to find correspondences, so this is more of a feature that the Registration requires. Of course, this would introduce a lot of changes in our Python and ROS wrappers...

How do we proceed?

config/advanced.yaml Outdated Show resolved Hide resolved
@nachovizzo (Collaborator)


I will split this into two PRs, I guess, so we can review the two changes separately.

@nachovizzo nachovizzo closed this Mar 5, 2024
@nachovizzo nachovizzo reopened this Mar 5, 2024
@nachovizzo nachovizzo changed the base branch from nacho/strip_nn_search_from_voxel_hash_map to main March 5, 2024 13:01
@zzodo (Contributor, author) commented Mar 5, 2024

@nachovizzo

I am following up on your discussions and feel comfortable with the changes.
Although I have different preferences for naming the methods, the overall changes to the Core library look nice.

Thanks for all your effort!
If there is anything I can help with, let me know 😄

@benemer (Member) previously approved these changes Mar 11, 2024 and left a comment:

Looks good from my side now

@nachovizzo (Collaborator)

Update: this PR has no effect in ROS unless we merge #294. It's just a warning; this does not affect the scope of this PR.

@nachovizzo (Collaborator) previously approved these changes Mar 11, 2024 and left a comment:

self-approve? lol

Assuming 10 logical cores: before, Registration::max_num_threads_ == 0; now, Registration::max_num_threads_ == 10.

This way we hide less of what's going on under the hood.
@nachovizzo nachovizzo dismissed stale reviews from benemer and themself via f550a1a March 11, 2024 18:15
@tizianoGuadagnino (Collaborator) left a comment:

Looks good to me now. I renamed the global control variable in the Registration class; although its meaning was perhaps clear, I would rather not have single-letter variable names. We can merge as far as I'm concerned.

@nachovizzo nachovizzo merged commit 994d232 into PRBonn:main Mar 18, 2024
17 checks passed
@nachovizzo (Collaborator)

btw, a small detail that I postponed is the fact that the deskew+preprocessing module also makes use of tbb. Luckily we've picked the right namespace for the max_threads params, but it is something to pay attention to in an upcoming ticket about this.

@zzodo zzodo deleted the feature/set-num-threads branch March 18, 2024 13:21
@zzodo (Contributor, author) commented Apr 4, 2024

FYI:
I just noticed that the tbb::global_control variable is declared as static:

static const auto tbb_control_settings = tbb::global_control(

This is not problematic at the moment, but it may affect other tbb-parallelized pipelines.
From the official document:

The current set of parameters that you can modify is defined by the global_control::parameter enumeration. The parameter and the value it should take are specified as arguments to the constructor of a control variable. The impact of the control variable ends when its lifetime is complete.

We should keep our eyes on this!

@nachovizzo (Collaborator)


Thanks for noticing it. Yes, I know it's a nasty hack to control this variable with static duration... but I didn't want to clutter the registration class with minor details. This is the way I found to "hide" this implementation detail. If you have a better idea, please propose!

@zzodo (Contributor, author) commented Apr 5, 2024

EDIT for possible confusion:
I think it is beneficial to declare Registration as a class, to hide implementation details and restrict access to those variables from outside the class.

I would say it is better to keep this variable as a private member.
We can replace max_num_threads_ with a tbb::global_control member.
e.g. Registration.hpp:

#include <vector>

#include <Eigen/Core>
#include <sophus/se3.hpp>
#include <tbb/global_control.h>

#include "VoxelHashMap.hpp"

class Registration {
  public:
    explicit Registration(int max_num_iteration, double convergence_criterion, int max_num_threads)
        : max_num_iterations_(max_num_iteration),
          convergence_criterion_(convergence_criterion),
          tbb_control_(tbb::global_control::max_allowed_parallelism, max_num_threads) {}

    Sophus::SE3d AlignPointsToMap(const std::vector<Eigen::Vector3d> &frame,
                                  const VoxelHashMap &voxel_map,
                                  const Sophus::SE3d &initial_guess,
                                  double max_correspondence_distance,
                                  double kernel);

  private:
    int max_num_iterations_;
    double convergence_criterion_;
    tbb::global_control tbb_control_;
};

The lifetime of this global setting for tbb pipelines ends when the dtor of the Registration instance is called.

May I ask why you are trying to avoid declaring the tbb::global_control variable as a private member of the Registration class?
