Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First steps to make protocols scheduler agnostic #1011

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

t-reents
Copy link
Contributor

Addresses #1010

The current protocols implicitly rely on the Slurm scheduler, thus causing errors when users try to use ad different one, e.g. SGE as described in the issue.
Here, the documentation of the get_builder_from_protocol method is extended to make the users aware of this behavior. Moreover, a warning is added in case no explicit resources are provided via the options argument, as this will cause errors in future versions.

@sphuber @mbercx Suggestions for a different phrasing or place where to put the additional documentation are very welcome. Moreover, I used a UserWarning because a DeprecationWarning somehow didn't appear. I'm not sure whether they are filtered at some point?

@t-reents t-reents force-pushed the fix/slurm_independent_protocols branch from 844efbb to 142b74c Compare February 19, 2024 10:30
@sphuber
Copy link
Contributor

sphuber commented Feb 19, 2024

@sphuber @mbercx Suggestions for a different phrasing or place where to put the additional documentation are very welcome.

It would be great to also add this to the documentation in addition to the docstrings. I noticed that the get_builder_from_protocol functionality isn't really documented though.

Moreover, I used a UserWarning because a DeprecationWarning somehow didn't appear. I'm not sure whether they are filtered at some point?

Deprecation warnings are ignored by default in Python (https://docs.python.org/3/library/exceptions.html#DeprecationWarning). A user has to enable them manually to see them. I think a UserWarning is fine for now as an alternative.

@t-reents t-reents force-pushed the fix/slurm_independent_protocols branch from 142b74c to e5f1148 Compare May 16, 2024 08:28
@t-reents
Copy link
Contributor Author

Hi @sphuber ! Sorry for picking this up just now.

I added a general description of the get_builder_from_protocol in the How to - Workflows section. To be honest, I wasn't 100% sure where to put it, but since it's a general remark about the workflows, I decided to put it there. Happy to discuss/adjust this.

Moreover, I realized that the issue of the protocols depending on SLURM also applies to the other workflows, not just the pw ones. Hence, I also added the warning to those.

@t-reents t-reents force-pushed the fix/slurm_independent_protocols branch from 538bc59 to 300e4cc Compare May 16, 2024 08:45
t-reents and others added 3 commits January 30, 2025 13:56
The current protocols implicitly rely on the `Slurm` scheduler by specifying the number of machines
in `metadata.options.resources`. This can cause problems when using different schedulers
in combination with `get_builder_from_protocol`. Here, we add this information to the documentation
and add a warning that the options need to be explicitly specified in future versions.
In addition to the previous changes to the `pw` `WorkChain`s, this commit adds a warning to all the other `WorkChain`s that provide the `get_builder_from_protcol` method.
This commit adds some general remarks about the `get_builder_from_protocol` methods in the `How to - Workflows` section.
@mbercx
Copy link
Member

mbercx commented Feb 6, 2025

Thanks @t-reents, sorry for only looking at this PR now. A few preliminary comments, also after reading the corresponding issue:

  1. If we don't remove the Slurm scheduler defaults from the protocol files, trying to use a code on an SGE-scheduler computer in e.g. the PwRelaxWorkChain.get_builder_from_protocol method will still fail outright, even if the correct SGE resources are specified through the options, because the num_machines input is still there.

  2. Removing both these lines:

    will result in a failure in case the resources dictionary is not provided by the user. However, this should be a perfectly legitimate use case: for the direct scheduler specifying the resources is not really necessary. I think you can also set defaults for Slurm when setting up/configuring the computer (have to double-check, don't have Slurm locally).

  3. I think a custom warning is not necessary. If we simply set the resources option to an empty dictionary in the protocol YAML files (removing the num_machines line), the scheduler inputs will still be validated before the calculation is run. When this will happen depends if you are running the PwBaseWorkChain or a high-level work chain like the PwRelaxWorkChain. The former will only fail upon submission, but the latter will fail at this line:


    Because the validation does seem to happen at this point. Either way, a warning that checks the options doesn't seem necessary, because in case the user doesn't specify the correct resources the calculation will not run anyways?

@t-reents
Copy link
Contributor Author

t-reents commented Feb 6, 2025

Thanks @mbercx for your comments!

I agree that simply removing the num_machines would basically fix it and the validation would do the rest of the job. I think that I also mentioned this as a second option in the issue. However, the idea of the warnings is just to have them for an intermediate period, to make the users aware that in one of the next releases, we'll expect explicitly defined resources (btw, I agree with you that it should be fine for the local scheduler).

To be honest, I don't have a strong opinion on this, so if I understand your comments correctly, we can also directly adjust the protocol files and only add the information to the documentation, without introducing these warnings.

@mbercx
Copy link
Member

mbercx commented Feb 7, 2025

To be honest, I don't have a strong opinion on this, so if I understand your comments correctly, we can also directly adjust the protocol files and only add the information to the documentation, without introducing these warnings.

I think this should be the best way forward, considering that the warnings are sometimes unnecessary, and in case they are, the validation would stop the calculation from running anyways? Maybe I'm missing something though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants