
Compute selection: deviceIndex & enforce 1 thread in vacuum #752

Open

wants to merge 11 commits into base: main
Conversation

@IAlibay (Member) commented Mar 4, 2024

Fixes #739 #704

Checklist

  • Added a news entry

Developer's Certificate of Origin

@pep8speaks commented Mar 4, 2024

Hello @IAlibay! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 195:80: E501 line too long (93 > 79 characters)
Line 739:80: E501 line too long (81 > 79 characters)

Line 620:80: E501 line too long (81 > 79 characters)
Line 623:80: E501 line too long (80 > 79 characters)

Line 935:80: E501 line too long (81 > 79 characters)
Line 938:80: E501 line too long (80 > 79 characters)

Line 30:80: E501 line too long (136 > 79 characters)

Line 178:80: E501 line too long (132 > 79 characters)

Comment last updated at 2024-07-04 00:06:38 UTC

@IAlibay IAlibay linked an issue Mar 4, 2024 that may be closed by this pull request

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 89.74359% with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.83%. Comparing base (f24a252) to head (7f4d421).
Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
openfe/protocols/openmm_utils/omm_compute.py | 63.63% | 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #752      +/-   ##
==========================================
- Coverage   92.84%   92.83%   -0.02%     
==========================================
  Files         134      134              
  Lines        9940     9961      +21     
==========================================
+ Hits         9229     9247      +18     
- Misses        711      714       +3     
Flag | Coverage Δ
fast-tests | 92.83% <89.74%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@IAlibay IAlibay requested a review from mikemhenry July 4, 2024 00:08
@IAlibay (Member, Author) commented Jul 4, 2024

@mikemhenry when you get a chance please do have a look at this - I suspect it'll make life a bit easier in some cases.

String with the platform name. If None, it will use the fastest
platform supporting mixed precision.
Default ``None``.
gpu_device_index : Optional[list[str]]
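For context on what this new parameter controls: in OpenMM, a GPU device selection is typically applied through the CUDA platform's "DeviceIndex" property. A minimal sketch, using real OpenMM API but not the PR's actual implementation:

```python
import openmm

# Select the CUDA platform and pin the simulation to a specific GPU via
# the "DeviceIndex" platform property (comma-separated for multi-GPU).
platform = openmm.Platform.getPlatformByName("CUDA")
properties = {"DeviceIndex": "0", "Precision": "mixed"}

# Tiny one-particle system, just to show the properties being consumed.
system = openmm.System()
system.addParticle(1.0)
integrator = openmm.VerletIntegrator(0.001)
context = openmm.Context(system, integrator, platform, properties)
```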
@IAlibay (Member, Author) commented:

Actually, we should probably have a chat about how we handle this long term. This is a bit like MPI settings, where technically we shouldn't make this immutable, but perhaps pick it up at run time instead.

How can we go about handling this properly?

A Contributor replied:

Something I don't think we abstracted well is "run time arguments". We have the split for settings that change the thermodynamics, but we didn't consider a category of non-thermo settings that make the most sense to pick at runtime. I haven't looked at the code yet and will update this comment, but I suspect what we should do is:

  1. have some default
  2. read this setting
  3. read in an environmental variable

If we do things in that order, we don't break anything old; when configuring your system you can make some choices, and when running on HPC you can still override the settings if needed.
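A minimal sketch of that precedence, assuming a hypothetical OFE_DEVICE_INDEX environment variable and helper function (neither is actual openfe API):

```python
import os

def resolve_gpu_device_index(engine_settings, default=None):
    # 1. start from the hard-coded default
    value = default
    # 2. let the settings object override it
    configured = getattr(engine_settings, "gpu_device_index", None)
    if configured is not None:
        value = configured
    # 3. an environment variable set at run time wins over everything
    env = os.environ.get("OFE_DEVICE_INDEX")
    if env is not None:
        value = [s.strip() for s in env.split(",")]
    return value
```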


**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
A Contributor commented:

Good example of a nice changelog entry, but since this was a private API, no semver major bump is needed.

**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
to `openfe.protocols.openmm_utils.omm_compute`.
A Contributor commented:

Do we want a private namespace or is this now in our public API?

Suggested change:
- to `openfe.protocols.openmm_utils.omm_compute`.
+ to `openfe.protocols.openmm_utils._omm_compute`.

@IAlibay (Member, Author) replied:

I'd say public developer API is fine; it was private because we were directly vendoring from perses.

@mikemhenry (Contributor) left a comment:

This one is good; I have a few notes, but nothing blocking.

platform = compute.get_openmm_platform(
settings['engine_settings'].compute_platform
# Restrict CPU count if running vacuum simulation
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
A Contributor commented:

This is another argument for having an explicit way of saying "this is a vacuum simulation". I propose we add a setting somewhere for that: #904.

In the meantime, this seems like a pretty good heuristic.

We could do more logging; it would be nice to do a hackathon on it, but in the meantime I will just make suggestions as I see them. It would be good to log what is going on here, perhaps more verbosely than what I suggest, but this seems like a spot where, if someone asks "why is this running on the CPU and not the GPU?", a log message could help.

Suggested change:
- restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
+ restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
+ logging.info(f"{restrict_cpu=}")
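A slightly more verbose variant of that log line might look like the sketch below (hypothetical; it assumes `restrict_cpu` is in scope as in the diff above):

```python
import logging

logger = logging.getLogger(__name__)

if restrict_cpu:
    logger.info(
        "NoCutoff nonbonded method detected (vacuum simulation); "
        "restricting the simulation to a single CPU thread."
    )
```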

@mikemhenry (Contributor) left a comment:

Okay, so I think that the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than one thread.

I think there is an argument for this being a "sane default" that we provide, but we need some user docs explaining that, by default, this is how a vacuum transformation works for these protocol(s).
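A "warn rather than enforce" version could be as simple as the sketch below (hypothetical; assumes `restrict_cpu` and `platform` are in scope):

```python
import warnings

if restrict_cpu and platform.getName() != "CPU":
    warnings.warn(
        "Running a vacuum (NoCutoff) simulation on a non-CPU platform; "
        "a single CPU thread is usually just as fast here."
    )
```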

news/compute_selection_fixes.rst: outdated review thread (resolved)
openfe/tests/protocols/test_rfe_tokenization.py: outdated review thread (resolved)
@mikemhenry mikemhenry self-assigned this Nov 14, 2024
@IAlibay (Member, Author) commented Nov 20, 2024

> Okay, so I think that the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than one thread.
>
> I think there is an argument for this being a "sane default" that we provide, but we need some user docs explaining that, by default, this is how a vacuum transformation works for these protocol(s).

Are you saying that you don't want to go with this enforced one-thread approach?
The main thing is that we don't have a way to control CPU count either at run time or via our settings, so we're relying on folks knowing they should use OPENMM_CPU_THREADS to set this to 1 (which isn't super well documented).

I'm happy to reconsider this change (and just stick to the deviceIndex stuff) - what do you think?
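For reference, OpenMM already exposes two knobs for the CPU thread count: the OPENMM_CPU_THREADS environment variable, and the CPU platform's "Threads" property (both real OpenMM API):

```python
import openmm

# Equivalent to exporting OPENMM_CPU_THREADS=1 before launching;
# platform property values are passed as strings.
platform = openmm.Platform.getPlatformByName("CPU")
platform.setPropertyDefaultValue("Threads", "1")
```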

@mikemhenry (Contributor) left a comment:

Only one bit needed: we should respect whatever the user has set for OPENMM_CPU_THREADS, if they have set it.
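A sketch of that behaviour: only force a single thread when the user has not set OPENMM_CPU_THREADS themselves (uses real OpenMM API, but is not the PR's exact code):

```python
import os
import openmm

platform = openmm.Platform.getPlatformByName("CPU")
if os.environ.get("OPENMM_CPU_THREADS") is None:
    # No user preference: default vacuum simulations to one thread.
    platform.setPropertyDefaultValue("Threads", "1")
```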

@IAlibay (Member, Author) commented Nov 21, 2024

@mikemhenry could you have a look and check that the latest change is what you meant?

Development

Successfully merging this pull request may close these issues.

  • add cuda DeviceIndex in engine_settings
  • Make get_openmm_platform set threads to 1 if using NoCutoff
3 participants