-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the calculation for the number of threads to be used #39
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @haoyu-zc. I left some comments requesting a few more changes. See if it makes sense for you.
libs/ccc/coef/impl.py
Outdated
@@ -640,7 +640,7 @@ def ccc( | |||
|
|||
# get number of cores to use | |||
n_jobs = os.cpu_count() if n_jobs is None else n_jobs | |||
default_n_threads = (os.cpu_count() - n_jobs) if n_jobs < 0 else n_jobs | |||
default_n_threads = (os.cpu_count() + n_jobs) if n_jobs < 0 else n_jobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the problem described in the PR. However, if n_jobs
has an absolute value equal or larger than os.cpu_count()
it will be zero or negative, right? We should check if the final value for default_n_threads
is within a valid range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think the docstring of this function needs to be updated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what I missed! I'll update the code and add test cases. Thanks!
Changes made:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Should we also update the version in setup.py
for a new release, or you plan to do that in another PR? (whatever you prefer is fine for me).
Also, after installing this version (pip install -e .
), I tested again the last issue #37 (with Python 3.12, not Python 3.9) and now everything works as expected.
Another issue that seems to be fixed now is #38 :) I wonder what happened.
n_workers = os.cpu_count() + n_jobs if n_jobs < 0 else n_jobs | ||
|
||
if n_workers < 1: | ||
raise ValueError(f"The number of threads/processes to use must be greater than 0. Got {n_workers}." | ||
"Please check the n_jobs argument provided") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. If a user specifies more threads than actual CPU cores, that's a problem for the user (using more resources than actual ones) and we should not check for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! As ThreadPoolExecutor(max_workers=n_workers)
also only complains when n_workers < 0. We don't need to check for that case
I think it might be some changes with Python itself. I don't have a clue now either... I'll do a new PR for the new release to clarify things. |
There's a bug when n_jobs is provided as a negative number. Here's the original contract and code:
Docstring:
n_jobs: number of CPU cores to use for parallelization. The value
None will use all available cores (
os.cpu_count()
), and negativevalues will use
os.cpu_count() - n_jobs
. Default is 1.Code:
It's clear that we should fix the subtraction to an addition: