Some users in our teams are using ranger on our shared RStudio Server Pro cluster. Many R users are not familiar with threading and parallelization, so they rely on the default behavior in ranger.
This means that they will use all available hardware threads (see ranger/src/Forest.cpp, lines 198 to 208 in d1ecade).
This is not ideal on shared servers where several data scientists need to share resources.
Currently we have some documentation to warn them so that they do not forget the num.threads argument when calling ranger().
However, it would be nice if, as an analytics admin of the service we provide to our users, I could change the default behavior so that ranger() does not use the full available capacity of the server for a single user.
I think it could be done:
either in the R package only, inside the ranger() function, by reading an R option or an environment variable that would be used if num.threads is NULL (the default).
Something like (with more control I guess)
if (is.null(num.threads)) {
  # The environment variable takes precedence, then the R option, then 0 (the current default)
  num.threads <- as.integer(Sys.getenv("R_RANGER_NUM_THREAD", getOption("ranger.num.thread", 0L)))
}
or on the C++ side, by reading an environment variable that would be used, if set, when num.threads has its default value of 0.
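To make the C++ option more concrete, here is a minimal sketch of how the thread count could be resolved. It is not ranger's actual code: the helper name resolve_num_threads is hypothetical, the environment variable name is simply reused from the R snippet above, and std::thread::hardware_concurrency() stands in for "all available hardware threads".

```cpp
#include <cerrno>    // errno, ERANGE
#include <cstdlib>   // std::getenv, std::strtol
#include <limits>    // std::numeric_limits
#include <thread>    // std::thread::hardware_concurrency

// Hypothetical helper (not part of ranger). `requested` mirrors num.threads,
// where 0 means "use the default". `env_name` is an assumed variable name.
unsigned int resolve_num_threads(unsigned int requested,
                                 const char* env_name = "R_RANGER_NUM_THREAD") {
  // An explicit, non-default request from the user always wins.
  if (requested > 0) {
    return requested;
  }

  // Otherwise use the admin-provided environment variable, if it is set
  // and parses as a positive integer that fits in unsigned int.
  const char* raw = std::getenv(env_name);
  if (raw != nullptr && *raw != '\0') {
    char* end = nullptr;
    errno = 0;
    const long parsed = std::strtol(raw, &end, 10);
    const bool fully_parsed = (*end == '\0') && (errno == 0);
    if (fully_parsed && parsed > 0 &&
        static_cast<unsigned long>(parsed) <=
            std::numeric_limits<unsigned int>::max()) {
      return static_cast<unsigned int>(parsed);
    }
  }

  // Fall back to all hardware threads; hardware_concurrency() may
  // return 0 when the value cannot be determined, so use at least 1.
  const unsigned int hw = std::thread::hardware_concurrency();
  return hw > 0 ? hw : 1;
}
```

With something like this, an admin could set the variable once for all sessions (for example in Renviron.site), while a user who passes num.threads explicitly would still get exactly what they asked for. Invalid or empty values are ignored rather than treated as errors, which is a design choice of this sketch.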
This type of configuration already exists in other R packages, for example:
data.table => see ?getDTthreads and the associated C file. It uses a combination of data.table-specific environment variables on the C side, or OpenMP control features.
xgboost => uses OpenMP, which allows changing the value returned by the omp_get_* functions through an environment variable.
This may be a specific use case, but it would help a lot in some shared environments.
Would you consider something like that?
Thank you very much.
Hi! Has there been any update on this? I found this really helpful in figuring out some issues I was having with using ranger in conjunction with doParallel