Allow to change default behavior for num.thread = NULL in ranger() #513

Open
cderv opened this issue May 19, 2020 · 4 comments

@cderv

cderv commented May 19, 2020

Hi,

Some users on our teams use ranger on our shared RStudio Server Pro cluster. Many R users are not familiar with threading and parallelization, so they rely on ranger's default behavior.
This means that they use all available hardware threads:

ranger/src/Forest.cpp

Lines 198 to 208 in d1ecade

  // Set number of threads
  if (num_threads == DEFAULT_NUM_THREADS) {
#ifdef OLD_WIN_R_BUILD
    this->num_threads = 1;
#else
    this->num_threads = std::thread::hardware_concurrency();
#endif
  } else {
    this->num_threads = num_threads;
  }

This is not ideal on shared servers, where several data scientists need to share resources.
Currently we have documentation warning them not to forget the num.threads argument when calling ranger().

However, it would be nice if, as the analytics admin of the service we provide to our users, I could change the default behavior so that ranger() does not let a single user take the server's full capacity.

I think it could be done:

  • either in the R package only, in the ranger() function, via an R option or an environment variable that would be used if num.threads is NULL (the default).
    Something like (with more control, I guess):
if (is.null(num.threads)) {
    num.threads <- as.numeric(Sys.getenv("R_RANGER_NUM_THREAD", getOption("ranger.num.thread", 0L)))
}
  • or on the C++ side, via an environment variable that would be used, if set, when num.threads has its default value of 0
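The second option above could look roughly like the sketch below. This is not ranger's actual code: `threads_from_env` and `resolve_num_threads` are hypothetical helpers, the variable name is passed in by the caller (for instance the `R_RANGER_NUM_THREAD` name proposed above), and `DEFAULT_NUM_THREADS = 0` is assumed from the quoted snippet and the "default value of 0" wording in this issue.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <thread>

// Assumed sentinel for "thread count not set by the caller".
const unsigned int DEFAULT_NUM_THREADS = 0;

// Hypothetical helper (not part of ranger): read a positive thread count
// from an environment variable. Returns 0 if the variable is unset, empty,
// not a whole number, out of range, or not positive.
unsigned int threads_from_env(const char* name) {
  const char* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0') {
    return 0;
  }
  char* end = nullptr;
  errno = 0;
  long value = std::strtol(raw, &end, 10);
  if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX) {
    return 0;
  }
  return static_cast<unsigned int>(value);
}

// Hypothetical helper: an explicit request wins, then the environment
// variable, then the hardware default used in the quoted Forest.cpp code.
unsigned int resolve_num_threads(unsigned int requested, const char* env_name) {
  if (requested != DEFAULT_NUM_THREADS) {
    return requested;
  }
  unsigned int from_env = threads_from_env(env_name);
  if (from_env > 0) {
    return from_env;
  }
  unsigned int hw = std::thread::hardware_concurrency();
  return hw > 0 ? hw : 1;  // hardware_concurrency() may return 0 if unknown
}
```

With something like this, an admin could export the variable server-wide (for example in Renviron.site) and a call that leaves the thread count at its default would pick it up, while an explicit num.threads value would still take precedence.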

This type of configuration is already available in other R packages, such as:

  • data.table => see ?getDTthreads and the associated C file. It uses a combination of data.table-specific environment variables on the C side and OpenMP's control features.
  • xgboost => uses OpenMP, which allows the value returned by the omp_get_* functions to be changed through an environment variable

This may be a specific use case, but it would help a lot in some shared environments.

Would you consider something like that?

Thank you very much.

@mnwright
Member

Thanks, that's a very good idea. I like your first idea (R side) and will have a look at how other packages solve this.

@mgoplerud

Hi! Has there been any update on this? I found this really helpful in figuring out some issues I was having when using ranger in conjunction with doParallel.

@mnwright
Member

Sorry, no update yet. A PR would be very welcome!

@mnwright
Member

mnwright commented Dec 6, 2023

Done in #713.
