how to config max number of threads? #241
Good question. The default worker pool size is likely double the number of cores. I did not know it was possible to modify it, but I found this function in the py-polars docs and learned you can set the POLARS_MAX_THREADS environment variable before loading the package:
> polars:::test_get_pool_size()
[1] 8
Restarting R session...
> library(polars)
Restarting R session...
> Sys.setenv("POLARS_MAX_THREADS"=3)
> library(polars)
> polars:::test_get_pool_size()
[1] 3
The thread pool size could, I guess, be used to limit cores, but it is not completely exact:
On top of this, r-polars always reserves one extra thread for running the R interpreter. If you are not injecting R code into polars with $map() or $apply(), that thread will mostly be sleeping.
extendr also uses a thread for catching Rust panics, but that should be very lightweight.
rust-polars could potentially spawn a few threads here and there for special tasks. |
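For anyone who wants the session above as a reusable snippet, here is a minimal sketch in base R. The helper name set_polars_max_threads() is hypothetical and not part of r-polars. It assumes, as the session above suggests, that POLARS_MAX_THREADS is only read when the package is loaded, so it warns if polars is already loaded.

```r
# Hypothetical helper (not part of r-polars): set POLARS_MAX_THREADS
# before the package is loaded, as in the session above.
set_polars_max_threads <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1L, !is.na(n), n >= 1)
  # Assumption: the pool size is fixed once polars has been loaded.
  if ("polars" %in% loadedNamespaces()) {
    warning("polars is already loaded; the new value may not take effect in this session")
  }
  Sys.setenv(POLARS_MAX_THREADS = as.character(as.integer(n)))
  invisible(Sys.getenv("POLARS_MAX_THREADS"))
}

set_polars_max_threads(3)
# library(polars)  # load the package only after the variable is set
```

Putting the same Sys.setenv() call in .Rprofile, or exporting the variable in the shell that starts R, should have the same effect.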
Hi again, thanks for the quick response. That is helpful to know that we can set an environment variable to set the max number of threads before polars is loaded, but is there a way to change it after it has already been loaded/attached via library(polars)? (similar to data.table::setDTthreads) |
POLARS_MAX_THREADS could have been named POLARS_RAYON_POOL_SIZE. I don't know what would happen with 0; I guess either 0 is not allowed or rayon hangs. Rayon is a Rust crate that implements a thread pool. 1 would be the lowest meaningful value, I guess. In rust-polars, the CSV reader and other readers let you specify the number of threads explicitly, likely because these readers come from external crates with their own threading implementations. These are likely not bounded by POLARS_MAX_THREADS. I will look into whether r-polars could easily expose these reader thread settings. Let's also keep this issue open until these thread findings have been added to the docs. |
This issue may be necessary for the CRAN release. The CRAN policy requires that the number of logical threads used on a CRAN machine be limited to 2. (For example, see this thread on the R package devel mailing list https://stat.ethz.ch/pipermail/r-package-devel/2023q3/009454.html) When submitting polars to CRAN, we may need to limit the number of threads, e.g. for tests. |
Useful resources:
It's really surprising that CRAN is forcing package developers to set the default to 2. |
I came up with a solution for the default number of threads: introduce a new feature flag. Something like:
if (!check_feature("not_set_max_threads") && Sys.getenv("POLARS_MAX_THREADS") == "") Sys.setenv(POLARS_MAX_THREADS = 2)
@sorhawell @etiennebacher What do you think about this? |
Shouldn't the We want to set the number of threads to 2 if |
In my proposal:
[features]
default = []
not_set_max_threads = []
In other words, for builds that do not explicitly include
if (!check_feature("not_set_max_threads") && Sys.getenv("POLARS_MAX_THREADS") == "") Sys.setenv(POLARS_MAX_THREADS = 2) |
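To make the proposed default easier to reason about, here is a small self-contained sketch of the same logic in base R. Note that check_feature() below is only a stand-in that looks the name up in a character vector; the real function would presumably query the features the Rust library was compiled with. The wrapper apply_default_max_threads() is likewise a hypothetical name.

```r
# Stand-in for the proposed check_feature(); an assumption for illustration,
# here the "enabled" features are simply passed in as a character vector.
check_feature <- function(name, enabled = character()) {
  name %in% enabled
}

# Hypothetical wrapper around the one-liner from the proposal.
apply_default_max_threads <- function(enabled_features = character()) {
  # Limit to 2 threads only when the build did not opt out
  # and the user has not chosen a value themselves.
  if (!check_feature("not_set_max_threads", enabled_features) &&
      Sys.getenv("POLARS_MAX_THREADS") == "") {
    Sys.setenv(POLARS_MAX_THREADS = 2)
  }
  Sys.getenv("POLARS_MAX_THREADS")
}
```

With this shape, a value the user has already set in POLARS_MAX_THREADS always wins, and the limit of 2 only applies to builds without the feature when the variable is empty.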
Hi, I was a little confused by "not_set_max_threads"; I think of it as "do_not_limit_max_threads". I hope I got it right.
"not_set_max_threads" on R-universe channel?
As I read the email exchange, the CRAN team was given plenty of good reasons from smart people for why not to enforce such a limit, but it is what it is. I recommend also running with "not_set_max_threads" for R-universe, as that is our main recommended channel to install from. I guess data.table would not limit threads for any non-CRAN release channel either.
There are some other threads we may need to deal with eventually
POLARS_MAX_THREADS likely only applies to the polars thread pool size. Our own Rust code also spawns threads. When ever using
Suggestion: how about non-CRAN being the general default?
I understand CRAN refuses to have any "IS_CRAN" environment variable, and therefore everybody else just has to hardcode CRAN policies as the default and revert them with environment variables. This opens up some user/dev footguns. Just a suggestion: how about using an automated CRAN release script/macro which alters the hardcoded defaults of any code bundle submitted to CRAN? Then CRAN can have it its way, alongside a repository that retains the most sensible defaults. Of course there are also many wise, non-controversial CRAN policies, which should just be adopted immediately everywhere. |
@eitsupi it is you who put a lot of hard work and thought into this. If you see this as the best way, then I applaud that too. |
Thank you for your reply. I strongly oppose patching source code submitted to CRAN. In other words, the issue of whether the default number of threads is unlimited or 2 is irrelevant to CRAN, and I think users should be able to choose between them no matter where they install from. |
Once #693 is merged, users will be able to freely select features during installation, making it easier to add features. |
Implemented in #720. |
great, thanks! |
btw, gsoc'24 is coming up, so if you want to participate again with R project, please add an idea to our wiki, https://github.com/rstats-gsoc/gsoc2024/wiki/table%20of%20proposed%20coding%20projects |
I won't be able to participate, maybe @eitsupi ? |
@tdhock I was blessed with one more child this year, and somehow any spare time went up in smoke for me :) It was a pleasure to mentor @Sicheng-Pan last year! |
Sicheng may consider mentoring this year |
I don't think I'm good enough to be a mentor here. |
I would like to help, but I'm sure that I am way less experienced than anyone here |
well I'm sure you know more about polars than I do! I could co-mentor again, if you need somebody with basic R expertise. |
hi @sorhawell is there a function that I can call to tell polars the max number of threads I want it to use?
something like data.table::setDTthreads(4) if I only want to use 4 threads max.
Thanks!
I searched the documentation for "thread" and got a few hits but nothing that helped.