Compute selection: deviceIndex & enforce 1 thread in vacuum #752
base: main
Conversation
Hello @IAlibay! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-07-04 00:06:38 UTC
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #752 +/- ##
==========================================
- Coverage 92.84% 92.83% -0.02%
==========================================
Files 134 134
Lines 9940 9961 +21
==========================================
+ Hits 9229 9247 +18
- Misses 711 714 +3
Flags with carried forward coverage won't be shown.
@mikemhenry when you get a chance please do have a look at this - I suspect it'll make life a bit easier in some cases.
    String with the platform name. If None, it will use the fastest
    platform supporting mixed precision.
    Default ``None``.
gpu_device_index : Optional[list[str]]
Actually, we should probably have a chat about how we handle this long term. This is a bit like MPI settings, where technically we shouldn't make this immutable, but maybe it's something we pick up at run time?
How can we go about handling this properly?
Something I don't think we abstracted well is "run time arguments". We have the split for settings that change the thermodynamics, but we didn't consider a category of non-thermo settings that make the most sense to pick at runtime. I haven't looked at the code yet and will update this comment, but I suspect what we should do is:
- have some default
- read this setting
- read an environment variable

If we do things in that order, we don't break anything old; when configuring your system you can make some choices, and then when running on HPC you can still set things if needed and override the settings.
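The precedence described here (built-in default, overridden by the configured setting, overridden in turn by an environment variable at run time) could be sketched roughly like this; the names `DEFAULT_DEVICE_INDEX`, `OFE_DEVICE_INDEX`, and `resolve_device_index` are hypothetical illustrations, not part of the openfe API:

```python
import os

# Hypothetical names for illustration only; not part of the openfe API.
DEFAULT_DEVICE_INDEX = "0"


def resolve_device_index(settings_value=None):
    """Resolve a runtime option with default < settings < env-var precedence."""
    value = DEFAULT_DEVICE_INDEX            # 1. start from a built-in default
    if settings_value is not None:          # 2. a configured setting overrides it
        value = settings_value
    env_value = os.environ.get("OFE_DEVICE_INDEX")
    if env_value is not None:               # 3. the environment wins at run time
        value = env_value
    return value
```

With this ordering, old inputs keep working, and an HPC job script can still override whatever was serialized in the settings.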
**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
Good example of a nice changelog entry, but since this was a private API, no semver major bump is needed.
**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
  to `openfe.protocols.openmm_utils.omm_compute`.
Do we want a private namespace or is this now in our public API?
Suggested change: `openfe.protocols.openmm_utils.omm_compute` → `openfe.protocols.openmm_utils._omm_compute`.
I'd say public developer API is fine; it was private because we were directly vendoring from perses.
This one is good; I have a few notes, but nothing blocking.
platform = compute.get_openmm_platform(
    settings['engine_settings'].compute_platform
# Restrict CPU count if running vacuum simulation
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
Another argument that we really should have an explicit way of saying "this is a vacuum simulation". I propose we add a setting somewhere for that: #904.
In the meantime, this seems like a pretty good heuristic.
We could do more logging; it would be nice to do a hackathon on it, but in the meantime I will just suggest things as I see them. It would be good to log what is going on here. It could be more verbose than what I suggest, but this seems like a spot where, if someone asks "why is this running on the CPU and not the GPU?", a log message could help.
Suggested change:
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
logging.info(f"{restrict_cpu=}")
Okay, so I think the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than one thread.
I think there is an argument for this being a "sane default" that we provide, but we need some user documentation explaining that, by default, this is how a vacuum transformation works for these protocol(s).
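A warn-only approach could look something like the following minimal sketch; the function name `warn_vacuum_compute` and its arguments are hypothetical, and the "nocutoff means vacuum" check mirrors the heuristic used in this PR:

```python
import warnings


def warn_vacuum_compute(nonbonded_method: str, platform_name: str, n_threads: int):
    """Warn (rather than enforce) when a vacuum simulation is configured
    for a GPU platform or for more than one CPU thread.

    Hypothetical helper for illustration; not part of the openfe codebase.
    """
    if nonbonded_method.lower() != "nocutoff":
        return  # not a vacuum simulation under this heuristic
    if platform_name.upper() in {"CUDA", "OPENCL"}:
        warnings.warn(
            "Vacuum simulations generally run fastest on the CPU platform; "
            f"the {platform_name} platform was requested."
        )
    if n_threads > 1:
        warnings.warn(
            "Vacuum simulations typically gain nothing from more than one "
            f"CPU thread; {n_threads} threads were requested."
        )
```

This keeps the user in control while still surfacing the "sane default" reasoning in the logs.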
Co-authored-by: Mike Henry <[email protected]>
Are you saying that you don't want to go with this enforced 1-thread approach? I'm happy to reconsider this change (and just stick to the deviceIndex stuff); what do you think?
Only bit needed: we should respect whatever the user has set for OPENMM_CPU_THREADS, if they have set it.
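`OPENMM_CPU_THREADS` is the environment variable OpenMM's CPU platform reads for its thread count, and `Threads` is the corresponding CPU platform property. A minimal sketch of "only force one thread when the user has not set it themselves" (the helper name `cpu_platform_properties` is hypothetical):

```python
import os


def cpu_platform_properties(restrict_cpu: bool) -> dict:
    """Build CPU platform properties, deferring to OPENMM_CPU_THREADS if set.

    Hypothetical helper for illustration; `Threads` is the OpenMM CPU
    platform property controlling the thread count.
    """
    if restrict_cpu and "OPENMM_CPU_THREADS" not in os.environ:
        # Vacuum simulation and no user override: enforce a single thread.
        return {"Threads": "1"}
    # Otherwise let OpenMM honour OPENMM_CPU_THREADS (or its own default).
    return {}
```

The resulting dict would then be passed as platform properties when creating the Context/Simulation.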
@mikemhenry could you have a look and check that the latest change is what you meant?
Fixes #739, #704
Checklist
- `news` entry
- Developers certificate of origin