Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

somd-freenrg crashes when ligand atoms are completely non-interacting #100

Open
fjclark opened this issue Aug 25, 2023 · 26 comments
Open

somd-freenrg crashes when ligand atoms are completely non-interacting #100

fjclark opened this issue Aug 25, 2023 · 26 comments
Labels
bug Something isn't working
Milestone

Comments

@fjclark
Copy link
Collaborator

fjclark commented Aug 25, 2023

Describe the bug
I've recently been trying to implement multiple distance restraints for absolute binding free energy calculations in BioSimSpace and SOMD (modified BSS, modified sire, and an example of how to run a complete calculation). The idea is that the ligand intermolecular interactions are removed with many distance restraints active, then there's a final stage where all but one of the restraints are released. The correction for releasing the ligand to the standard state volume can then easily be calculated.

During the final release stage, where the ligand is completely non-interacting I've been getting crashes. The config file I was using with my modified version of sire was:

save coordinates = True
ncycles = 50
nmoves = 25000
ncycles_per_snap = 4
buffered coordinates frequency = 500
timestep = 4 * femtosecond
reaction field dielectric = 78.4
cutoff type = cutoffperiodic
cutoff distance = 12 * angstrom
barostat = True
pressure = 1.00000 atm
thermostat = True
temperature = 300.00 kelvin
constraint = hbonds-notperturbed
energy frequency = 100
lambda array = (0.0, 0.125, 0.25, 0.375, 0.5, 1.0)
lambda_val = 0.0
perturbed residue number = 293
random seed = -1
gpu = 0
center solute = False
minimise = True
hydrogen mass repartitioning factor = 3
use distance restraints = True
use permanent distance restraints = True
distance restraints dictionary = {(1474, 4687): (4.432198728689354, 20.0, 0.5557321930560404), (1472, 4686): (3.164351608202915, 20.0, 0.6751681461642662), (1474, 4690): (4.700974930995326, 20.0, 0.6701990692583388), (1488, 4688): (6.7858681626072315, 20.0, 0.7488970818377805), (2444, 4677): (4.863475388033847, 20.0, 0.7671705708955612), (1472, 4691): (6.367026275053535, 20.0, 0.821252569114769), (2261, 4683): (6.773986513796716, 20.0, 0.8828616026118992), (1490, 4685): (7.720582022157998, 20.0, 0.9347068069890287), (286, 4674): (3.8127377709679453, 20.0, 0.9511268213071902), (374, 4680): (6.181106731821419, 20.0, 0.9637979205880889), (1521, 4692): (3.7598878700775633, 20.0, 0.9830902561404447), (1529, 4684): (7.87209938990775, 20.0, 1.0582870196044247), (374, 4693): (4.975946331072206, 20.0, 1.063773425343931), (346, 4676): (8.470528418491696, 20.0, 1.0492604125738865), (284, 4675): (4.725973194492003, 20.0, 1.1798376555744614), (390, 4682): (8.350568045090748, 20.0, 1.2118321370763052), (2436, 4679): (4.32826597759919, 20.0, 1.2565532487514752), (307, 4678): (7.55090821416431, 20.0, 1.3127014576568437), (1421, 4681): (8.80522670899781, 20.0, 1.3688291245090927), (259, 4673): (9.195983552051707, 20.0, 2.1342422686136686)}
permanent distance restraints dictionary = {(1474, 4689): (3.875183995573175, 20.0, 0.5185270232355128)}
turn on receptor-ligand restraints mode = True

And the pert file just mapped dummy atoms to dummy atoms, e.g.:

version 1
molecule LIG
    atom
        name           C
        initial_type   du
        final_type     du
        initial_LJ     0.00000 0.00000
        final_LJ       0.00000 0.00000
        initial_charge 0.00000
        final_charge   0.00000
    endatom
    atom
        name           C1
        initial_type   du
        final_type     du
        initial_LJ     0.00000 0.00000
        final_LJ       0.00000 0.00000
        initial_charge 0.00000
        final_charge   0.00000
    endatom
...

I still observed crashes at random values of lambda if I dropped the timestep to 1 fs and if I switched to Langevin dynamics.

To check if this was due to changes I had made to sire or the restraints, I ran some sets of calculations with Sire 2023.4.0 (a073701/GitHub/2023-08-07T17:45:41+01:00 | UNCLEAN) but with ligand still non-interacting and with no restraints. Please see the input files. For each test I ran four repeats of each window. I got two crashes at lambda = 0 (Nan) and all simulations at lambda =1 resulted in Segmentation fault (core dumped) somd-freenrg -C somd.cfg -p CUDA -m somd.pert -c somd.rst7 -t somd.prm7.

Have I chosen some configuration parameters poorly, or should this not be happening?

Also, when I watch the trajectories back, the ligand makes large jumps after every 50th frame. There are 600 frames, and 600 / 50 =12 and ncycles / ncycles_per_snap = 12.5, so I assume this is an artifact of how the trajectories are written?

To Reproduce
Please see the input files. Run run.sh. with sire 2023.4.0. (I observed two crashes in four repeats of the stage).

Expected behavior
Stable MD.

(please complete the following information):

  • OS: Ubuntu 22.04
  • Version of Python: 3.10.12
  • Version of sire: 2023.4.0 (a073701/GitHub/2023-08-07T17:45:41+01:00 | UNCLEAN)
  • I confirm that I have checked this bug still exists in the latest released version of sire: [no] (As I would rather avoid rerunning too many simulations unless there have been very recent changes which are likely to affect this)

Thank you.

@fjclark fjclark added the bug Something isn't working label Aug 25, 2023
@jmichel80
Copy link
Contributor

hi @fjclark - do you also observe crashes with MIF inputs from the MDR paper repo (that presumably did not cause runtime errors with earlier versions of sire used for the MDR paper) ?

@lohedges
Copy link
Contributor

Can confirm that I see the same segmentation fault at lambda = 1. Will see if I can trace it.

@lohedges
Copy link
Contributor

Just to note that I don't see a mapping from dummy ambertype to dummy ambertype in your example pert files, but this doesn't seem to be the source of the problem:

version 1
molecule LIG
    atom
        name           C
        initial_type   C3
        final_type     du
        initial_LJ     0.00000 0.00000
        final_LJ       0.00000 0.00000
        initial_charge 0.00000
        final_charge   0.00000
    endatom
    atom
        name           C1
        initial_type   C3
        final_type     du
        initial_LJ     0.00000 0.00000
        final_LJ       0.00000 0.00000
        initial_charge 0.00000
        final_charge   0.00000
    endatom
...

@fjclark
Copy link
Collaborator Author

fjclark commented Aug 25, 2023

hi @fjclark - do you also observe crashes with MIF inputs from the MDR paper repo (that presumably did not cause runtime errors with earlier versions of sire used for the MDR paper) ?

Hi @jmichel80 I'll check now (yes I did not observe these crashes previously).

@lohedges
Copy link
Contributor

lohedges commented Aug 25, 2023

For lambda = 1, I can confirm that the segfault occurs on this line of OpenMMMD.py. I'll add some debugging statements to the C++ code to see where it's blowing up. (Obviously this bit works at lambda = 0, so NaNs there might be completely unrelated.)

@fjclark
Copy link
Collaborator Author

fjclark commented Aug 25, 2023

Just to note that I don't see a mapping from dummy ambertype to dummy ambertype in your example pert files, but this doesn't seem to be the source of the problem:

version 1
molecule LIG
    atom
        name           C
        initial_type   C3
        final_type     du
        initial_LJ     0.00000 0.00000
        final_LJ       0.00000 0.00000
        initial_charge 0.00000
        final_charge   0.00000
    endatom
    atom
        name           C1
        initial_type   C3
        final_type     du
        initial_LJ     0.00000 0.00000
        final_LJ       0.00000 0.00000
        initial_charge 0.00000
        final_charge   0.00000
    endatom
...

Sorry, I should have explained - this is because the standard version of sire doesn't allow atoms to start and end as dummy atoms, so I've had to modify the pert file. For my modified version of sire, this is allowed (4513564) and the pert file is as shown in my original post.

@lohedges
Copy link
Contributor

Also, the pert files are missing an endmolecule record at the end, but again this doesn't seem to affect the segfault. Looking at the output of the parser it seems to get to the initial_charge of the final atom, then crashes:

...
"name O"
"initial_type O3"
"final_type du"
"initial_LJ 3.03981 0.21021"
"final_LJ 0.00000 0.00000"
"initial_charge -0.00000"
"final_charge -0.00000"
"endatom"
"atom"
"name O1"
"initial_type O3"
"final_type du"
"initial_LJ 0.00000 0.00000"
"final_LJ 0.00000 0.00000"
"initial_charge0.00000"
[1]    110167 segmentation fault (core dumped)  somd-freenrg -C somd.cfg -t somd.prm7 -c somd.rst7 -m somd.pert -p CUDA

@lohedges
Copy link
Contributor

Ah, I see, there is no space in initial_charge0.00000 for the final record. Printing the line and the words (the records after the line is split) makes it clear what's going wrong. Not sure why I didn't spot this above 🤦‍♂️

("atom")
"name O"
("name", "O")
"initial_type O3"
("initial_type", "O3")
"final_type du"
("final_type", "du")
"initial_LJ 3.03981 0.21021"
("initial_LJ", "3.03981", "0.21021")
"final_LJ 0.00000 0.00000"
("final_LJ", "0.00000", "0.00000")
"initial_charge -0.00000"
("initial_charge", "-0.00000")
"final_charge -0.00000"
("final_charge", "-0.00000")
"endatom"
("endatom")
"atom"
("atom")
"name O1"
("name", "O1")
"initial_type O3"
("initial_type", "O3")
"final_type du"
("final_type", "du")
"initial_LJ 0.00000 0.00000"
("initial_LJ", "0.00000", "0.00000")
"final_LJ 0.00000 0.00000"
("final_LJ", "0.00000", "0.00000")
"initial_charge0.00000"
("initial_charge0.00000")
[1]    114032 segmentation fault (core dumped)  somd-freenrg -C somd.cfg -t somd.prm7 -c somd.rst7 -m somd.pert -p CUDA

After fixing this, the lambda = 1 window runs. Not sure if will NaN at a later date, though.

@fjclark
Copy link
Collaborator Author

fjclark commented Aug 25, 2023

Sorry about this - it looks like I've messed up editing a few of the pert files so that initial LJ terms remain for O. The pert file does seem to be correct for lambda = 0 though, where the Nans occur.

@lohedges
Copy link
Contributor

No problem. I'll look at that one to see if I can get it to fail.

@lohedges
Copy link
Contributor

No crashes yet, but I can confirm that I see the ligand "jump" after 50 frames. It almost looks like everything is slightly rotated within the box. It's visible for the protein and ions too, but harder to tell when waters are included. Not sure if this is some buffering error, i.e. there is a larger jump in time between frames 49-50, 99-100, ..., than any other pair of frames, or whether something else is going on.

@fjclark
Copy link
Collaborator Author

fjclark commented Sep 15, 2023

It seems like this is due to the use of OFF vs GAFF. When I reparameterise the ligand with GAFF2, keeping the initial coordinates identical, I haven't yet observed any crashes. I think the original ligand FF is OFF2 (set up by a collaborator - I have asked for confirmation). Tests run:

  • Applying intermolecular restraints to the decoupled ligand (requires my modified version of Sire - happy to supply inputs if needed). OFF system: 2/ 6 simulations crash with Nan. GAFF2 system: 0 / 18 simulations crash. Using constraint=hbonds-notperturbed.
  • Setting constraint=allbonds. OFF system: immediately crashes in all cases. GAFF2 system: runs as expected in all cases.

The input systems are here.

This is another example of the issue discussed here, where constraint=allbonds causes OFF systems to crash.

Thanks.

@chryswoods chryswoods added this to the 2023.5.0 milestone Oct 7, 2023
@chryswoods
Copy link
Contributor

Is this still an open issue? I'm doing some spring cleaning ;-)

@fjclark
Copy link
Collaborator Author

fjclark commented Oct 26, 2023

I'm afraid this is still an open issue - using GAFF instead of any OFF seems to improve things substantially, but with more tests I still get the occasional crash with a fully decoupled ligand. I would also prefer to use OFF2 with constraint=allbonds, but this always causes crashes.

@chryswoods
Copy link
Contributor

Would it be worth trying this in somd2? If you don't see the crash then we can debug where in somd things are going wrong. And if you still see the crash then this is definitely something we need to fix in somd2 :-)

@fjclark
Copy link
Collaborator Author

fjclark commented Feb 12, 2024

Yes, definitely. As discussed, I'll have a shot in the next few weeks.

@fjclark
Copy link
Collaborator Author

fjclark commented Mar 19, 2024

I've tried some similar tests with SOMD2:

Overall Set-Up

Software

BioSimSpace: 2024.1.0.dev
Sire: 2024.1.0.dev
SOMD2: my modified SOMD2 branch, which has only very minor changes from SOMD2 main, which I merged in this morning before installing.
python: 3.11.8
OS: Ubuntu 22.04

Tests

GAFF2

Sanity check with many windows

Carried out once to check that we got a free energy change of roughly the expected size:

barostat_frequency: 25
charge_scale_factor: 0.2
checkpoint_frequency: 100 ps
com_reset_frequency: 10
constraint: h_bonds
coulomb_power: 0.0
cutoff: "12 \xC5"
cutoff_type: rf
dynamic_constraints: true
energy_frequency: 10 ps
equilibration_time: '0 '
equilibration_timestep: 0.001 ps
frame_frequency: 20 ps
h_mass_factor: 1.5
include_constrained_energies: false
integrator: langevin_middle
lambda_schedule: "LambdaSchedule(\n  morph: (-\u03BB + 1) * initial + \u03BB * final\n\
  \    restraint: 1\n)"
log_file: somd2.log
log_level: info
max_gpus: 1
max_threads: 20
minimise: true
num_lambda: 34
output_directory: output
overwrite: false
pert_file: null
**perturbable_constraint: bonds_not_heavy_perturbed**
platform: cuda
pressure: 1 atm
restart: false
**restraints: null**
run_parallel: true
**runtime: 100 ps**
save_trajectories: true
save_velocities: false
shift_delta: "2 \xC5"
somd1_compatibility: false
swap_end_states: false
temperature: 298 K
**timestep: 0.004 ps**
  • No crashes observed
  • Free energy change of roughly expected magnitude (~ 340 kcal / mol).

"Standard" Settings

10 repeats with two windows each:

barostat_frequency: 25
charge_scale_factor: 0.2
checkpoint_frequency: 100 ps
com_reset_frequency: 10
constraint: h_bonds
coulomb_power: 0.0
cutoff: "12 \xC5"
cutoff_type: rf
dynamic_constraints: true
energy_frequency: 10 ps
equilibration_time: '0 '
equilibration_timestep: 0.001 ps
frame_frequency: 20 ps
h_mass_factor: 1.5
include_constrained_energies: false
integrator: langevin_middle
lambda_schedule: "LambdaSchedule(\n  morph: (-\u03BB + 1) * initial + final * \u03BB\
  \n    restraint: 1\n)"
log_file: somd2.log
log_level: info
max_gpus: 1
max_threads: 20
minimise: true
num_lambda: 2
output_directory: output_1
overwrite: false
pert_file: null
**perturbable_constraint: bonds_not_heavy_perturbed**
platform: cuda
pressure: 1 atm
restart: false
**restraints: null**
run_parallel: true
**runtime: 100 ps**
save_trajectories: true
save_velocities: false
shift_delta: "2 \xC5"
somd1_compatibility: false
swap_end_states: false
temperature: 298 K
**timestep: 0.004 ps**
  • NaN crash observed in 1/10 runs
    • Crash observed at lambda = 1, so likely same issue as discussed in the somd2 ABFE issue where the molecule is not properly coupled to the heat bath (issue was slightly improved by using a Brownian integrator)

OFF2

"Standard" Settings

10 repeats with two windows each, same config as for GAFF2.

  • NaN crashes observed for all runs for all lambda windows during dynamics

With timestep of 0.5 fs

Config the same as "standard", but with a 0.5 fs timestep.

  • NaN crashes for all runs for all lambda windows during dynamics

With timestep of 0.5 fs and perturbable_constraint=none

Config the same as "standard", but with a 0.5 fs timestep and constraints removed via "perturbable_constraint=none".

  • NaN crash observed in 3/10 runs
  • Crashes only observed at lambda =1

Overall Points

  • Issues are broadly the same as SOMD1
  • GAFF2 simulations are generally much more stable than OFF2 simulations
  • GAFF2
    • Fairly stable with bonds_not_heavy_perturbed
    • A single crash observed when the ligand was fully decoupled, possibly the same cause as that observed for Benzene in the SOMD2 issue
  • OFF2
    • Very unstable with bonds_not_heavy_perturbed - all dynamics crash with NaN at lambda = 0 and 1. My understanding is that bonds_not_heavy_perturbed constrains heavy bonds, so this seems to be the same issue as when we use constraint=allbonds with SOMD1
    • Crashes at lambda=0 removed by using perturbable_constraint=none, but several crashes at lambda =1 persist, so still seems less stable than GAFF2.

Conclusion

Constraints on heavy bonds cause the OFF2 system to blow up at all values of lambda, but can be used on the GAFF2 system. Without constraints on heavy bonds, both systems seem stable when the ligand is fully interacting, but show crashes when the ligand is fully decoupled (and unrestrained).

@lohedges
Copy link
Contributor

Could you try with h_bonds_not_perturbed? I found that bonds_not_perturbed would end up constraining everything for the tyk2 set and I also couldn't run anything. (I think this was possibly due to minimising with constraints for a non-equilibrated system).

Also, it might be worth checking what constraints are applied for the different force fields, and whether things are better if you use the somd1_compatibility_mode. (Once a dynamics object is created, you can use d.get_constraints() to print what bonds are constrained.)

@jmichel80
Copy link
Contributor

hi @fjclark - you could try using a Nose Hoover integrator and add a sub-thermostat to control the temperature of the solute separately from the rest of the system, see http://docs.openmm.org/development/api-python/generated/openmm.openmm.NoseHooverIntegrator.html

see also openmm/openmm#2910

@fjclark
Copy link
Collaborator Author

fjclark commented Mar 20, 2024

Thanks for the suggestions.

h_bonds constraints

@lohedges:

Could you try with h_bonds_not_perturbed?

OFF2 h_bonds_not_heavy_perturbed

Same as "standard" protocol other than perturbable_constraint=h_bonds_not_heavy_perturbed and 20 repeats.

  • This removes the instabilities at lambda != 1. NaNs at lambda =1 observed in 2/20 runs.

GAFF2 h_bonds_not_perturbed

  • No NaNs observed over 20 repeats. So possibly more stable with with bonds_not_heavy_perturbed for GAFF2, but can't say for sure as we only saw one crash in that case.

OFF2 with SOMD1 Compatibility Mode

  • Everything blows up during every dynamics attempt as before. As expected given that this happened with SOMD1.

OFF2 with SOMD1 Compatibility Mode and h_bonds_not_perturbed

  • 0/10 repeats crash.

Constraints Applied

There are 33 bonds in the ligand: 11 to H and 22 with only heavy atoms:

image

OFF2/GAFF2 with bonds_not_heavy_perturbed: All 33 bonds constrained
OFF2/GAFF2 with h_bonds_not_heavy_perturbed: 11 bonds to H constrained only

So as with SOMD1, constraints between heavy atoms are unusable with OFF2 but don't seem to make the GAFF2 simulations any less stable.

@chryswoods
Copy link
Contributor

I'm going to bump this to 2024.2.0 as we've made a lot of progress, but more is needed before we put in all the fixes for 2024.1.0

@chryswoods chryswoods modified the milestones: 2024.1.0, 2024.2.0 Mar 31, 2024
@chryswoods
Copy link
Contributor

I am bumping this to 2024.3.0. Could you check if the latest fixes in 2024.2.0.dev have fixed the instabilities. They have fixed instabilities in other systems.

@chryswoods chryswoods modified the milestones: 2024.2.0, 2024.3.0 Jun 22, 2024
@fjclark
Copy link
Collaborator Author

fjclark commented Jun 25, 2024

I retested the OFF2 simulations above (10 repeats, "standard" settings) with sire 2024.2.0.dev, but unfortunately all 10 simulations fail with NaN, as before.

@lohedges
Copy link
Contributor

Just to check. Are you still using the somd1 compatibility mode? All recent fixes are applied in that layer.

@fjclark
Copy link
Collaborator Author

fjclark commented Jun 25, 2024

Thanks - I wasn't. However, they all still crash when I turn on somd1 compatibility mode. Full config:

barostat_frequency: 25
charge_scale_factor: 0.2
checkpoint_frequency: 100 ps
com_reset_frequency: 10
constraint: h_bonds
coulomb_power: 1.0
cutoff: "12 \xC5"
cutoff_type: rf
dynamic_constraints: true
energy_frequency: 10 ps
equilibration_constraints: false
equilibration_time: '0 '
equilibration_timestep: 0.001 ps
frame_frequency: 20 ps
h_mass_factor: 1.5
include_constrained_energies: false
integrator: langevin_middle
lambda_energy: null
lambda_schedule: "LambdaSchedule(\n  morph: initial * (-\u03BB + 1) + final * \u03BB\
  \n    restraint: 1\n)"
lambda_values: null
log_file: somd2.log
log_level: info
max_gpus: 1
max_threads: 20
minimise: true
num_lambda: 2
output_directory: output_1
overwrite: false
pert_file: null
perturbable_constraint: h_bonds_not_heavy_perturbed
platform: cuda
pressure: 1 atm
restart: false
restraints:
- !!python/object/apply:sire.legacy.MM._MM.BoreschRestraints
  state: !!python/tuple
  - sire_pickle_data: AAABSnheuxoUfZOBgYGxY+J8v1knHwQA2UxIbEaQnI1h6Tp0NUA+MxADAeshKH0YSh9B0g9SA1UnFAiRFwqC0sEO/Axx8QbP/yOpZwHZb//lZn3StnW19t+XrDgYcrINSZ4VZJ4DS0n7n53Kix3Ya3YqL5pTYf9t0ZzyWdmBDiIQk5HUs4HMg4giAJI8O8J96KoYGGzs2XpAoqs0n50G+R/keIYihlSGYoYSIJ3IkMmQx1ACUwUAAs1Nbw==
    sire_pickle_version: 1
run_parallel: true
runtime: 100 ps
save_energy_components: false
save_trajectories: true
save_velocities: false
shift_delta: "1 \xC5"
somd1_compatibility: true
swap_end_states: false
temperature: 298 K
timestep: 0.004 ps
write_config: true

@lohedges
Copy link
Contributor

No problem. Thanks for confirming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants