Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance stability of minimization #557

Closed
wants to merge 5 commits into from
Closed

Enhance stability of minimization #557

wants to merge 5 commits into from

Conversation

jfennick
Copy link
Contributor

@jfennick jfennick commented Mar 9, 2022

Description

I have been experiencing some instabilities which, after much difficulty, I was able to track down to the minimizer. The observed behavior is that the minimizer will finish 'successfully', but some time later, during equilibration and/or propagating replicas, the simulation may or may not randomly crash with NaNs or even throw CUDA_ERROR_ILLEGAL_ADDRESS (700).

Nearly identical behavior can be found here openmm/openmm#3414. In that ticket, the issue was that the box vectors were not being updated correctly. For stability, during minimization the box vectors should not be allowed to change at all. Subsequent equilibration at NPT should be used to adjust the box vectors.

In my case(s), the box vectors were changing only at the fourth significant figure, and yet that was sufficient to (usually) crash the simulations. Inserting Gradient Descent before FIRE seems to greatly ameliorate but not completely eliminate the instabilities. Replacing FIRE with L-BFGS (with or without Gradient Descent) also seems to fix the instabilities.

This PR adds three changes to improve the stability of minimization:

  1. First and foremost, it temporarily disables the barostat during minimization to keep the box vectors fixed. This is probably the only change that is absolutely necessary to fix the instabilities.
  2. It inserts a very short (24 timestep) Gradient Descent minimization before the main FIRE minimization. Gradient Descent is not asymptotically fast, but it is excellent for preconditioning a 'better' minimizer.
  3. I have replaced FIRE with L-BFGS. Unlike FIRE, L-BFGS does not modify the box vectors (even when the pressure is not None) and it's a fine minimizer anyway.

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelogNotable points that this PR has either accomplished or will accomplish.

Status

  • Ready to go

@jfennick
Copy link
Contributor Author

jfennick commented Mar 9, 2022

See also choderalab/yank#1267

@mikemhenry
Copy link
Contributor

@jfennick Thanks for this! I'll have @jchodera take a look!

@jfennick
Copy link
Contributor Author

jfennick commented Mar 9, 2022

FYI I just realized that I accidentally named this branch stable_equilibration instead of stable_minimization.

@jfennick jfennick changed the title Stable equilibration Stable minimization Mar 10, 2022
@jchodera jchodera self-requested a review March 17, 2022 17:09
@jchodera jchodera self-assigned this Mar 17, 2022
@jchodera jchodera changed the title Stable minimization Enhance stability of minimization Mar 17, 2022
@ijpulidos ijpulidos self-requested a review April 4, 2022 20:39
@ijpulidos ijpulidos added this to the 0.21.3 milestone Apr 4, 2022
Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ijpulidos : This is a great step forward, but there are a few risky things in here we should address before pulling it into a bugfix release. I'll look into this in more detail.

thermodynamic_state.pressure = None

# Use Gradient Descent first for numerical stability
integrator_grad = GradientDescentMinimizationIntegrator()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial step size here is not guaranteed to be robust. I wonder if there's a way we can automatically select a more robust initial step size, or if this is not a critical issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some of my other work with gromacs, I use emstep 0.0001 (nanometers) which is 10 times smaller than the openmmtools default of 0.01 Angstroms. gromacs also has an adaptive step size, and I have some plots showing that starting with 0.001 Angstroms, the adaptive step size increases for the first ~25 iterations before stabilizing. In other words, you're only 'wasting' at most ~25 iterations, which isn't even a rounding error in the total runtime. Of course, those iterations are well spent if they can prevent minimization failures.


# Use Gradient Descent first for numerical stability
integrator_grad = GradientDescentMinimizationIntegrator()
context_grad = thermodynamic_state.create_context(integrator_grad)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good that we're creating the context and then cleaning it up again, since we don't repeatedly re-use the minimization context, but it would be useful if we also specified platform properties to ensure we use mixed precision if needed for the propagation or energy computation context caches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jchodera jchodera removed this from the 0.21.3 milestone Apr 4, 2022
@jfennick
Copy link
Contributor Author

I'm not sure why this is failing CI.
pytest openmmtools/tests/test_sampling.py -k 'test_minimize'
works on my machine.

@ijpulidos ijpulidos requested a review from jchodera June 1, 2022 22:40
@jfennick
Copy link
Contributor Author

jfennick commented Jun 3, 2022

Are there any thoughts on this PR? Again, disabling the barostat is really the only thing that is critical, which is why I originally separated it out into it's own 3-line commit. Perhaps it was a mistake to add additional enhancements. Can we at least merge the pressure bug in the first commit?

@ijpulidos ijpulidos added this to the 0.22.0 milestone Jun 3, 2022
@ijpulidos
Copy link
Contributor

@jfennick Thanks for your contributions. I'm tagging @jchodera for him to take another look into these, sine we need to see if the previously raised concerns are solved. I will also review this in detail, since this can potentially change many things in our calculations. For now I'm marking this to be released on our 0.22 milestone.

@ijpulidos
Copy link
Contributor

Resolves #668

@ijpulidos ijpulidos removed this from the 0.22.0 milestone Apr 6, 2023
@ijpulidos ijpulidos added this to the 0.22.1 milestone Apr 6, 2023
@ijpulidos ijpulidos removed this from the 0.22.1 milestone May 31, 2023
@jfennick jfennick closed this by deleting the head repository Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants