-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add small molecule repex consistency tests #1065
Conversation
@ijpulidos : I've pushed the small molecule neutral and charge tests. The charge test isn't done yet, but I was thinking you could take over from here? The charge test is currently yielding the error described here. Once we figure out how to fix that error, you may have to debug a bit more to make sure the test is running without any more errors and then check that its actually passing. Note that I had to make some changes to |
…/perses into add-small-molecule-test
@ijpulidos : Update: I've fixed the parametrization error in the charge test that I mentioned above |
@zhang-ivy Thanks for this! I'll take a look and let you know if I encounter anything. |
We need to switch to the new LangevinMiddelIntegrator once we cut a new release of openmmtools. |
To speed up this test, see #1058 (comment) |
To use the change in |
From discussions with @zhang-ivy , I think we want to change all of the tests to use |
As discussed in our dev sync, I saw the following results According to log output it would be taking more than 4 hours for completion, which seems like too much compared to what @zhang-ivy has obtained in other tests. The script to generate the data is attached. I basically just copied the We should also try to run this using the nightly versions of Openmm to see further performance boosts. |
…s into add-small-molecule-test
perses/tests/test_repex.py
Outdated
@@ -54,16 +54,15 @@ def test_RESTCapableHybridTopologyFactory_repex_neutral_mutation(): | |||
# Set up repex simulation | |||
reporter_file = os.path.join(temp_dir, f"{wt_name}-{mutant_name}.nc") | |||
reporter = MultiStateReporter(reporter_file, checkpoint_interval=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set the checkpoint_interval
to be 100 here. There's no need to checkpoint this frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we don't ever use the positions, so we probably don't need to write to the checkpoint file at all here.
The energies should be written every iteration regardless of the checkpoint_interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using 100 here for the checkpoint_interval
worked okay in the preliminary tests
perses/tests/test_repex.py
Outdated
@@ -157,16 +155,15 @@ def test_RESTCapableHybridTopologyFactory_repex_charge_mutation(): | |||
# Set up repex simulation | |||
reporter_file = os.path.join(temp_dir, f"{wt_name}-{mutant_name}.nc") | |||
reporter = MultiStateReporter(reporter_file, checkpoint_interval=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set the checkpoint_interval
to 100 here.
Tests took 6hr30min https://github.com/choderalab/perses/actions/runs/4747425819/jobs/8432377018 with only one failure:
|
@mikemhenry that's great news. The failing test is just that we have the inverse order in the transformation, I made the changes but forgot to push them, these will add a few hours to the tests but I think we should be okay. I'll push them in a second. |
env: | ||
TEST_MODE: GPU | ||
OPENMM: ${{ matrix.cfg.openmm }} | ||
OPENMM: 8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need this for when we merge in #1186
@ijpulidos : I just reviewed again and your changes look good to me. A couple of other things to address: |
@mikemhenry Last run of the manually run GPU CI tests was successful https://github.com/choderalab/perses/actions/runs/4766605648/jobs/8473881768#step:9:571 @zhang-ivy thanks for the comments/review. I'll be adressing these as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU tests took 10 hours (~$5 per run), with 8.5 hours spent in these two tests:
16633.61s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_mutation
13661.38s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation
- Are we sure the tests are running on the GPU? We aren't logging debug output that would report the Platform that the tests are running on, but we should probably confirm this.
- Is there any way we can reduce the system size for these tests so that runtime is shortened?
- I've highlighted lines where we might be able to halve the number of steps per iteration, which would cut the runtime in half.
If we can't reduce the cost drastically, we might have to only enable these tests in special circumstances, such as once per day on main
.
…s into add-small-molecule-test
Good news, we can use 125 steps per iteration and still have the tests passing, we probably could do a bit of more refining here to lower the walltimes, but we moved from ~10 hours, to ~6.5 hours in the latest manual run of these tests here. |
And as we discussed in our dev sync, these tests would only be run on a nightly basis, and maybe we can discuss about running them less frequently if we need to. Monitoring the costs here shouldn't be hard on the AWS dashboard. |
Description
Adds neutral and charge changing small molecule tests
(the latter one has not been added yet).Motivation and context
Resolves #959
Resolves #1061
How has this been tested?
Change log