Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducibility and Stability of ABFE Calculations #18

Open
fjclark opened this issue Jan 11, 2024 · 7 comments
Open

Reproducibility and Stability of ABFE Calculations #18

fjclark opened this issue Jan 11, 2024 · 7 comments

Comments

@fjclark
Copy link
Contributor

fjclark commented Jan 11, 2024

I've had a quick shot at running ABFEs with SOMD2. The overall experience was a massive step up from SOMD! The recent sire documentation was also super clear and comprehensive. However, I've had a couple of issues:

  • Reproducing the results for the SOMD1 free leg
  • Crashes at lambda = 1 during the free leg

Overall Set-Up

  • The system is T4L lysozyme/ benzene, chosen since this is a standard ABFE test system, and I can get good results with 200 ps simulations with SOMD1. The input is the same as used for my SOMD1 calculations, and has been equilibrated for 5 ns.
  • I've created the merged systems by mocking RBFE - I created a copy of the ligand, deleted the charge and LJ properties, and set the atom types to du. I then followed the typical RBFE procedure in BSS. I did this because a) SOMD2 checks for perturbable, and not decoupled molecules, and b) I ran into errors relating to missing properties if I just added the "is_perturbable" property to the decoupled ligand (prepared as standard for ABFE through BSS). See the relevant script in the input files below.
  • I've used exactly the same Boresch restraints for both the SOMD1 and SOMD2 calculations, which were generated using the BSS algorithm.

To Reproduce

Input files and scripts: reproduce_somd2_abfe.tar.gz

  1. Install my feature_abfe branch of SOMD2 (tiny modifications to allow a Boresch restraint to be passed in)
  2. Uncompress input files above and cd reproduce_somd2_abfe
  3. Run the free leg with python run_somd2.py free input/pert_system_free.bss, or bound with python run_somd2.py bound input/pert_system_bound.bss

Optionally, recreate the perturbed systems from the SOMD1 input with:

cd input
python prep_pert_systems.py free free.prm7 free.rst7 
python prep_pert_systems.py bound bound.prm7 bound.rst7

Because I ran my tests before the issue with installing the 2024 conda packages was fixed, I'm using:

biosimspace               2023.5.0                py311_0    openbiosim
sire                      2023.5.1        py311h0666ab4_0    openbiosim
python                    3.11.7          hab00c5b_1_cpython    conda-forge

Running on Ubuntu 22.04.

SOMD1 Results

Experimental free energy: -5.19 +/- 0.16 kcal mol-1 (https://pubs.acs.org/doi/epdf/10.1021/ct060037v)

All uncertainties are 95 % t-based CIs from 5 replicate runs.
Used default lambda values shown here
Used these SOMD settings. 12 A cutoff with RF.
Analysed with SOMD "native" MBAR script, no subsampling.

Free energy of releasing restraint to standard state volume: -7.08 kcal mol-1
Symmetry correction for restraint: - kTln2 = -0.41 kcal mol-1 (as rotation around axis of 6-fold symmetry occurs as the restraints are turned on, but flipping of the ring is not sampled).

200 ps per window, no equilibration

Free: -5.07 +/- 0.08 kcal mol-1
Bound: 6.86 +/-0.30 kcal mol-1
Overall = free - bound - restraint_corr + symmetry_corr = -5.07 - 6.86 + 7.08 - 0.41 = -5.26 +/- 0.35 kcal mol-1

In good agreement with experiment.

30 ns per window, 10 ns of which discarded to equilibration

Free: -5.18 +/- 0.02 kcal mol-1
Bound: 5.52 +/- 0.80 kcal mol-1
Overall: -4.03 +/- 0.80 kcal mol-1

Larger error and drift to more positive free energies due to water starting to move into the binding site for some of the runs towards lambda = 1 in the bound vanish leg.

SOMD2 Results

Bound Leg

SOMD2 defaults other than RF with 12 A cutoff and 100 ps runs

Repeat 1: 6.2 +/- 0.3 kcal mol-1
Repeat 2: 6.9 +/- 0.4 kcal mol-1
Repeat 3: 7.1 +/- 0.4 kcal mol-1

In good agreement with the SOMD1 results for 200 ps, at least within the large errors.

The overlap is more than enough:

image

and the ligand clearly stays in the binding site when fully decoupled.

As above but without restraints

-1.8 +/- 0.2 kcal mol-1

The with_restraints - without_restraints difference is in the expected order of magnitude.

SOMD2 defaults other than RF with 12 A cutoff, 5 ns runs, and 10 ps energy frequency

Repeat 1: 5.5 +/- 0.2 kcal mol-1
Repeat 2: 5.3 +/- 0.2 kcal mol-1
Repeat 3: 5.3 +/- 0.2 kcal mol-1

A slight decrease compared to the shorter runs, as expected, and in good agreement with the SOMD1 30 ns runs.

Free Leg

SOMD2 defaults other than RF with 12 A cutoff and 100 ps runs

Repeat 1: -9.1 +/- 0.6 kcal mol-1
Repeat 2: -7.8 +/- 0.6 kcal mol-1
Repeat 3: -8.3 +/- 0.6 kcal mol-1

Substantially more negative than for SOMD1, producing a substantially more negative free energy of binding compared to experiment. Also, substantially greater variability in results than expected. As expected from the bound leg, the overlap was good:

image

SOMD2 defaults other than RF with 12 A cutoff, 5 ns runs, and 10 ps energy frequency

Repeat 1: -8.8 kcal mol-1

Attempted 5 more repeats, but all failed at lambda = 1.0 with:

image

or

image

Tried:

  • Dropping the timestep to 0.5 fs (still crashed)
  • Running with perturbable_constraints="none" and 0.5 fs timestep (still crashed)

Investigating lambda = 1 crashes with bound leg

  • Completed 5 further 5 ns runs with the bound leg at lambda = 1 - no crashes observed
  • Turned off the restraints - this introduced the lambda = 1 crashes (over three repeats, one crash and two warnings as above).

So we get crashes when the ligand is non-interacting which are prevented by coupling the non-interacting ligand to interacting particles - seems like energy is building up in the decoupled ligand. To try and dissipate this energy, I tried running tried free leg 5 ns runs with a Brownian integrator - this seemed to reduce the frequency of crashes - as 4 / 6 runs completed.

Questions and Comments on Software

A few questions and things I ran into while setting up and running the simulations, not all of which are related to SOMD2:

  • In sire, I got an error when trying to convert atoms in the perturbed molecule to vectors. This makes sense, but it would be helpful if the error message stated that this was why.
  • Might be helpful to have a more flexible BoreschRestraints object in sire which allows you to input the equilibrium values, rather than measuring them from the atoms supplied
  • How best to check the type of the restraints object in the SOMD2 config? BoreschRestraints didn't seem to be an instance of Restraints. Is there a better way than checking against all possible types of restraints?
  • Properties set by BSS for ABFEs. The standard Exscientia sandpit ABFE workflow sets the ligand property is_decoupled rather than is_perturbed and sets the charges, LJ terms, and atom types as end-state 1 properties in the ligand. However, as mentioned above, SOMD2 only checks for is_perturbed and I got a sire error (not all required properties present) when I just set the is_perturbed property in the decoupled ligand returned by the standard BSS workflow for ABFE. Given that the ABFE and RBFE workflows are very similar in BSS (handled by the same class) maybe it would be be better to have the properties set in a more similar way (e.g. have is_perturbed for the ABFE ligand).
  • Where should the restraint corrections be calculated? Given that sire can now hold Boresch restraints, maybe the correction calculation code should be moved from BSS to sire.
  • How will complex lambda schedules and Boresch restraints be input into the SOMD2 config?

Thanks!

@lohedges
Copy link
Contributor

Thanks for the details, this is really helpful. Hopefully we can discuss most of this in the meeting tomorrow.

Regarding the flexible lambda schedules, this is actually something that I've been meaning to talk to you about. I have a local feature branch that allows the user to explicitly specify the lambda windows. This means that you can run windows independently, or fill in gaps if you find that more sampling is required in certain regions. This works fine, but there are issues combining the data to form a consistent energy trajectory. (It works if all windows are run the first time around, but not if you later add additional windows.)

@fjclark
Copy link
Contributor Author

fjclark commented Mar 15, 2024

Following up on our discussion today:

As Christopher suggested, I attempted to rerun these simulations with SOMD2 to check for instabilities. Initially, I found that everything was crashing during minimisation. This was due to the mapping failing in prep_pert_systems.py. Printing the mapping for this ligand:

ghost_lig = decouple(lig, intramol=False)
mapping = BSS.Align.matchAtoms(lig, ghost_lig)

Showed that most of the atoms were not being matched ({16: 16, 17: 17, 18: 18, 19: 19, 21: 21, 29: 29, 30: 30, 31: 31}). However, when I instead did:

mapping = BSS.Align.matchAtoms(lig, lig)

I got the expected mapping with all 32 atoms matched, showing that the mapping failure was due to me changing the charge, LJ, element, or amber_type properties of the end state molecule in decouple. I couldn't see an obvious pattern in the atoms which were/ weren't matched:

[<BioSimSpace.Atom: name='C', molecule=13154, index=0>,
 <BioSimSpace.Atom: name='CHBR', molecule=13154, index=1>,
 <BioSimSpace.Atom: name='CPOI', molecule=13154, index=2>,
 <BioSimSpace.Atom: name='CG8F', molecule=13154, index=3>,
 <BioSimSpace.Atom: name='C1CB', molecule=13154, index=4>,
 <BioSimSpace.Atom: name='CFNO', molecule=13154, index=5>,
 <BioSimSpace.Atom: name='Cl', molecule=13154, index=6>,
 <BioSimSpace.Atom: name='C6B9', molecule=13154, index=7>,
 <BioSimSpace.Atom: name='O', molecule=13154, index=8>,
 <BioSimSpace.Atom: name='N', molecule=13154, index=9>,
 <BioSimSpace.Atom: name='CM80', molecule=13154, index=10>,
 <BioSimSpace.Atom: name='CO2R', molecule=13154, index=11>,
 <BioSimSpace.Atom: name='CAK1', molecule=13154, index=12>,
 <BioSimSpace.Atom: name='NVRJ', molecule=13154, index=13>,
 <BioSimSpace.Atom: name='CNVG', molecule=13154, index=14>,
 <BioSimSpace.Atom: name='CFYG', molecule=13154, index=15>,
 <BioSimSpace.Atom: name='NWWQ', molecule=13154, index=16>,
 <BioSimSpace.Atom: name='CC38', molecule=13154, index=17>,
 <BioSimSpace.Atom: name='OHYF', molecule=13154, index=18>,
 <BioSimSpace.Atom: name='C9SX', molecule=13154, index=19>,
 <BioSimSpace.Atom: name='ClME', molecule=13154, index=20>,
 <BioSimSpace.Atom: name='H', molecule=13154, index=21>,
 <BioSimSpace.Atom: name='HCOS', molecule=13154, index=22>,
 <BioSimSpace.Atom: name='HFOG', molecule=13154, index=23>,
 <BioSimSpace.Atom: name='HYR3', molecule=13154, index=24>,
 <BioSimSpace.Atom: name='HXKX', molecule=13154, index=25>,
 <BioSimSpace.Atom: name='HWNR', molecule=13154, index=26>,
 <BioSimSpace.Atom: name='HEK8', molecule=13154, index=27>,
 <BioSimSpace.Atom: name='HPK3', molecule=13154, index=28>,
 <BioSimSpace.Atom: name='HYR9', molecule=13154, index=29>,
 <BioSimSpace.Atom: name='HOUD', molecule=13154, index=30>,
 <BioSimSpace.Atom: name='HOCU', molecule=13154, index=31>]

For benzene (as in original example in this issue), mapping = BSS.Align.matchAtoms(lig, ghost_lig) worked as expected and matched all atoms.

Obviously, this is not a problem for me since it is easy to manually create the mapping, but it would be interesting to know why this is failing. The input is: gh_issue_mapping.tar.gz. To reproduce, run python prep_pert_systems.py bound somd.prm7 somd.rst7 (note that the ligand is fourth in this system, while benzene in the original issue is 1st, but I've changed the indices in the supplied prep_pert_systems.pyaccordingly).

Thanks.

@lohedges
Copy link
Contributor

I've tested and the mapping works if you don't modify the element property, i.e. don't convert the atoms to dummies. This must be used in the RDKit MCS matching criteria. Weirdly, it only appears to check the elements if AtomCompare.CompareAny is not set, which isn't the case for BioSimSpace. (See here for the RDKit setup.) Perhaps not having the correct elements messes up the inference of other atom properties that are required for the match.

@lohedges
Copy link
Contributor

For benzene, perhaps the MCS can correctly be mapped from the bonding alone.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 22, 2024

I've done some more investigations into the crashes at lambda =1. It turned out that the integrator wasn't being passed though to the underlying sire dynamics before. For each set of parameters, I use 10 replicates, and make modifications with respect to the config:

    config = Config(
        log_file="somd2.log",
        runtime="5 ns",
        temperature="298 K",
        cutoff_type="RF",
        cutoff="12 A",
        num_lambda=2,
        energy_frequency="10 ps",
        restraints=boresch_rest,
        lambda_schedule=l,
        perturbable_constraint="h_bonds_not_heavy_perturbed",
        output_directory=output_dir,
        run_parallel=False,
        integrator="langevin_middle",
        timestep="4 fs",
    )

I am using sire and biosimspace 2024.1.0.dev, and all tests are for the free benzene in the inputs above. I am using two lambda windows to concentrate sampling at lambda =1, while keeping the sanity check of lambda = 0 .

  • Default config 8/10 Nan fails at lam =1. I could see that the decoupled ligand was making pretty large jumps between frames, so decided to try playing with the friction coefficient to see if slowing it down fixed things.
  • Friction coefficient of 1E6 ps-1 3/10 failures at lam =1. Possibly an improvement, but need more repeats to be sure. Ligand moves very little between frames.
  • Friction coefficient of 1E-5 ps -1 7/10 NaN failures at lam=1.
  • brownian integrator Everything blows up on the first step of dynamics for all windows

I also got crashes for some quick NoseHoover and Andersen runs.

I also tried reproducing the earlier results with and without restraints for the bound leg, with more repeats:

Bound leg, default config with restraints: No crashes, 10 runs.
Bound leg, no restraints: 6/10 runs NaN crash at lambda =1.

I tried a Nose Hoover integrator on the whole free system, with a view to implementing @jmichel80's suggestion (OpenBioSim/sire#100 (comment)). I realised that non-stochastic integrators aren't suitable for the decoupled ligand, though - it just floats in a straight line according to the initial velocity.

So the crashes are fully prevented by "stiff" coupling to the protein with intermolecular restraints, but not by strongly damping the velocities. I tried to see if leaving one carbon with full charges and LJ terms sorted the crashes, but I still got 4/10 runs crashing with NaN.

To see if I could see high bond energies in the ligand at lambda = 1, I tried printing out the energy components every 10 ps for a ns, with:

or i in range(100):
                        kinetic_e = self._dyn.current_kinetic_energy()
                        potential_e = self._dyn.current_potential_energy()
                        _logger.info(f"Curr KE: {kinetic_e}, Curr PE: {potential_e}")
                        lig_energies = self._system[0].energy().components()
                        _logger.info(f"Lig energies: {lig_energies}")
                        _logger.info(f"Taking step {i} of 100")
                        self._dyn.run("10 ps")
                        self._system = self._dyn.commit()

But I was confused by the output, which did not show any bonded component, but seemed to show substantial intramolecular coulomb and LJ terms which were similar at lambda =0 and 1:

image

I assume I'm doing something wrong when printing the system energies, and I need to tell it which state it's at?

I think the next thing to try is checking that the expected terms are in the lambda =1 Hamiltonian, and trying Chrisotopher's decouple/ annihilate code, as he suggested before.

@lohedges
Copy link
Contributor

The system attribute of the Runner is linked to the reference (lambda = 0) system (see here) so all properties will be linked to those with a suffix of 0, e.g. bond0. When the dynamics are commited, the current coordinates and velocities from the OpenMM system are copied into the system. It's the dynamics object itself that does the perturbation to different lambdas, but this is only applied to the OpenMM context.

To expose the perturbed state, you will probably want to do something like the following in your loop when the lambda index corresponds to the lambda = 1 state:

perturbed = sr.morph.link_to_perturbed(self._system)
lig_energies = perturbed[0].energy().components()

(For reference, the various sire perturbation functions are found here.)

I'm not sure about the missing bonded components. I'll look at some merged systems locally to see what I get.

@fjclark
Copy link
Contributor Author

fjclark commented Mar 22, 2024

Thanks for the quick response and links. perturbed[0].energy() gives me 0 (and hence an empty dict for perturbed[0].energy().components()). I'll take a look this again next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants