Add threads options, why and how #54

Open
jfouret opened this issue Sep 25, 2024 · 2 comments
jfouret commented Sep 25, 2024

Hi,

You could easily add an option to control the number of threads.

Many people run on HPC clusters or under job schedulers and workflow systems (Slurm, Nextflow, AWS Batch, etc.), where one needs to reserve a precise number of threads (e.g. 8) but the job ultimately runs on a machine whose CPU count is higher.

It can sometimes be very tricky to work out how many CPUs to reserve based on the number of samples, especially when your tool is integrated into an automated workflow.

It appears to me that you could easily add this option, for example:

parser.add_argument(
  '--threads',
  type=int,
  default=os.cpu_count(),
  help='Number of threads to use (default: number of CPU cores)'
)

with:

with mpctx.Pool(processes=min(os.cpu_count(), args.threads, len(input_files))) as pool:

You can also combine the --sequential argument with --threads by switching to sequential mode when threads is equal to 1.
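A minimal sketch of that switch, under some assumptions: `resolve_workers` and `run` are hypothetical helper names, `process_file` stands in for whatever function the tool maps over its inputs, and plain `multiprocessing` is used in place of the `mpctx` context from the snippet above.

```python
import multiprocessing
import os


def resolve_workers(threads, n_inputs):
    """Return how many worker processes to start (hypothetical helper)."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # os.cpu_count() can return None when the count cannot be determined.
    available = os.cpu_count() or 1
    # Never exceed the request, the core count, or the number of inputs,
    # and never go below one worker.
    return max(1, min(available, threads, n_inputs))


def run(process_file, input_files, threads):
    """Process all inputs, falling back to a plain loop for one worker."""
    workers = resolve_workers(threads, len(input_files))
    if workers == 1:
        # Sequential path: no pool is created at all.
        return [process_file(f) for f in input_files]
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(process_file, input_files)
```

With this shape, `--threads 1` and a single input file both end up on the sequential path without a separate code branch in the caller.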

Best,

khyox self-assigned this Sep 25, 2024

khyox (Owner) commented Sep 25, 2024

Hi @jfouret, thanks for the suggestion! In the beginning there were typically more cores than samples, but this has changed drastically over time and the opposite situation is now the common one, so it makes sense to add that argument. Combining it with sequential can be a good alternative too. Do you want to send a PR for those changes?

jfouret (Author) commented Sep 25, 2024

I can try when I have some time soon.
For backward compatibility, let's keep --sequential, giving it priority over --threads, which defaults to os.cpu_count().
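A rough sketch of that priority rule, assuming `--sequential` is a boolean flag (its actual definition in the tool is not shown in this thread); `build_parser` and `use_sequential` are hypothetical names used only for illustration.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # Assumed shape of the existing flag: a simple on/off switch.
    parser.add_argument('--sequential', action='store_true',
                        help='Process inputs one at a time')
    parser.add_argument('--threads', type=int,
                        # Fall back to 1 if the core count is unknown.
                        default=os.cpu_count() or 1,
                        help='Number of threads to use '
                             '(default: number of CPU cores)')
    return parser


def use_sequential(args):
    # --sequential wins regardless of the value given to --threads;
    # --threads 1 is treated as an equivalent request.
    return args.sequential or args.threads == 1
```

Existing command lines that pass only `--sequential` would behave as before, since the new option is never consulted in that case.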
