
Compute selection: deviceIndex & enforce 1 thread in vacuum #752

Open

wants to merge 11 commits into base: main
Conversation

@IAlibay (Member) commented Mar 4, 2024

Fixes #739 #704

Checklist

  • Added a news entry

Developer's Certificate of Origin

@pep8speaks commented Mar 4, 2024

Hello @IAlibay! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 195:80: E501 line too long (93 > 79 characters)
Line 739:80: E501 line too long (81 > 79 characters)

Line 620:80: E501 line too long (81 > 79 characters)
Line 623:80: E501 line too long (80 > 79 characters)

Line 935:80: E501 line too long (81 > 79 characters)
Line 938:80: E501 line too long (80 > 79 characters)

Line 30:80: E501 line too long (136 > 79 characters)

Line 178:80: E501 line too long (132 > 79 characters)

Comment last updated at 2024-07-04 00:06:38 UTC

@IAlibay IAlibay linked an issue Mar 4, 2024 that may be closed by this pull request

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 89.74359% with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.83%. Comparing base (f24a252) to head (7f4d421).
Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
openfe/protocols/openmm_utils/omm_compute.py | 63.63% | 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #752      +/-   ##
==========================================
- Coverage   92.84%   92.83%   -0.02%     
==========================================
  Files         134      134              
  Lines        9940     9961      +21     
==========================================
+ Hits         9229     9247      +18     
- Misses        711      714       +3     
Flag | Coverage Δ
fast-tests | 92.83% <89.74%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@IAlibay IAlibay requested a review from mikemhenry July 4, 2024 00:08
@IAlibay (Member, Author) commented Jul 4, 2024

@mikemhenry when you get a chance please do have a look at this - I suspect it'll make life a bit easier in some cases.

String with the platform name. If None, it will use the fastest
platform supporting mixed precision.
Default ``None``.
gpu_device_index : Optional[list[str]]
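For context on what this new parameter controls: in OpenMM, a GPU device selection is typically applied through the CUDA platform's "DeviceIndex" property. A minimal sketch, using real OpenMM API but not the PR's actual implementation:

```python
import openmm

# Select the CUDA platform and pin the simulation to a specific GPU via
# the "DeviceIndex" platform property (comma-separated for multi-GPU).
platform = openmm.Platform.getPlatformByName("CUDA")
properties = {"DeviceIndex": "0", "Precision": "mixed"}

# Tiny one-particle system, just to show the properties being consumed.
system = openmm.System()
system.addParticle(1.0)
integrator = openmm.VerletIntegrator(0.001)
context = openmm.Context(system, integrator, platform, properties)
```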
@IAlibay (Member, Author) commented:

Actually, we should probably have a chat about how we handle this long term. This is a bit like MPI settings, where technically we shouldn't make this immutable, but perhaps pick it up at run time instead.

How can we go about handling this properly?

A Contributor replied:

Something I don't think we abstracted well is "run time arguments". We have the split for settings that change the thermodynamics, but we didn't consider a category of non-thermo settings that make the most sense to pick at runtime. I haven't looked at the code yet and will update this comment, but I suspect what we should do is:

  1. have some default
  2. read this setting
  3. read in an environmental variable

If we do things in that order, we don't break anything old; when configuring your system you can make some choices, and when running on HPC you can still override the settings if needed.
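A minimal sketch of that precedence, assuming a hypothetical OFE_DEVICE_INDEX environment variable and helper function (neither is actual openfe API):

```python
import os

def resolve_gpu_device_index(engine_settings, default=None):
    # 1. start from the hard-coded default
    value = default
    # 2. let the settings object override it
    configured = getattr(engine_settings, "gpu_device_index", None)
    if configured is not None:
        value = configured
    # 3. an environment variable set at run time wins over everything
    env = os.environ.get("OFE_DEVICE_INDEX")
    if env is not None:
        value = [s.strip() for s in env.split(",")]
    return value
```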


**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
A Contributor commented:

Good example of a nice changelog entry, but since this was a private API, no semver major bump is needed.

**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
to `openfe.protocols.openmm_utils.omm_compute`.
A Contributor commented:

Do we want a private namespace or is this now in our public API?

Suggested change:
- to `openfe.protocols.openmm_utils.omm_compute`.
+ to `openfe.protocols.openmm_utils._omm_compute`.

@IAlibay (Member, Author) replied:

I'd say public developer API is fine; it was private because we were directly vendoring from perses.

@mikemhenry (Contributor) left a comment:

This one is good; I have a few notes, but nothing blocking.

platform = compute.get_openmm_platform(
settings['engine_settings'].compute_platform
# Restrict CPU count if running vacuum simulation
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
A Contributor commented:

This is another argument for having an explicit way of saying "this is a vacuum simulation". I propose we add a setting somewhere for that: #904.

In the meantime, this seems like a pretty good heuristic.

We could do more logging; it would be nice to do a hackathon on it, but in the meantime I will just make suggestions as I see them. It would be good to log what is going on here, perhaps more verbosely than what I suggest, but this seems like a spot where, if someone asks "why is this running on the CPU and not the GPU?", a log message could help.

Suggested change:
- restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
+ restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
+ logging.info(f"{restrict_cpu=}")
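A slightly more verbose variant of that log line might look like the sketch below (hypothetical; it assumes `restrict_cpu` is in scope as in the diff above):

```python
import logging

logger = logging.getLogger(__name__)

if restrict_cpu:
    logger.info(
        "NoCutoff nonbonded method detected (vacuum simulation); "
        "restricting the simulation to a single CPU thread."
    )
```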

@mikemhenry (Contributor) left a comment:

Okay, so I think that the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than one thread.

I think there is an argument for this being a "sane default" that we provide, but we need some user docs explaining that, by default, this is how a vacuum transformation works for these protocol(s).
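A "warn rather than enforce" version could be as simple as the sketch below (hypothetical; assumes `restrict_cpu` and `platform` are in scope):

```python
import warnings

if restrict_cpu and platform.getName() != "CPU":
    warnings.warn(
        "Running a vacuum (NoCutoff) simulation on a non-CPU platform; "
        "a single CPU thread is usually just as fast here."
    )
```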

news/compute_selection_fixes.rst: outdated review thread (resolved)
openfe/tests/protocols/test_rfe_tokenization.py: outdated review thread (resolved)
@mikemhenry mikemhenry self-assigned this Nov 14, 2024
@IAlibay (Member, Author) commented Nov 20, 2024

> Okay, so I think that the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than one thread.
>
> I think there is an argument for this being a "sane default" that we provide, but we need some user docs explaining that, by default, this is how a vacuum transformation works for these protocol(s).

Are you saying that you don't want to go with this enforced one-thread approach?
The main thing is that we don't have a way to control CPU count either at run time or via our settings, so we're relying on folks knowing they should use OPENMM_CPU_THREADS to set this to 1 (which isn't super well documented).

I'm happy to reconsider this change (and just stick to the deviceIndex stuff) - what do you think?
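For reference, OpenMM already exposes two knobs for the CPU thread count: the OPENMM_CPU_THREADS environment variable, and the CPU platform's "Threads" property (both real OpenMM API):

```python
import openmm

# Equivalent to exporting OPENMM_CPU_THREADS=1 before launching;
# platform property values are passed as strings.
platform = openmm.Platform.getPlatformByName("CPU")
platform.setPropertyDefaultValue("Threads", "1")
```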

@mikemhenry (Contributor) left a comment:

Only one bit needed: we should respect whatever the user has set for OPENMM_CPU_THREADS, if they have set it.
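A sketch of that behaviour: only force a single thread when the user has not set OPENMM_CPU_THREADS themselves (uses real OpenMM API, but is not the PR's exact code):

```python
import os
import openmm

platform = openmm.Platform.getPlatformByName("CPU")
if os.environ.get("OPENMM_CPU_THREADS") is None:
    # No user preference: default vacuum simulations to one thread.
    platform.setPropertyDefaultValue("Threads", "1")
```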

@IAlibay (Member, Author) commented Nov 21, 2024

@mikemhenry could you have a look and check that the latest change is what you meant?

Development

Successfully merging this pull request may close these issues.

  • add cuda DeviceIndex in engine_settings
  • Make get_openmm_platform set threads to 1 if using NoCutoff
3 participants