Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bespokefit gives the torsion a worse fit than the OpenFF #304

Open
xiki-tempula opened this issue Dec 4, 2023 · 19 comments
Open

bespokefit gives the torsion a worse fit than the OpenFF #304

xiki-tempula opened this issue Dec 4, 2023 · 19 comments

Comments

@xiki-tempula
Copy link

xiki-tempula commented Dec 4, 2023

Description

So I have this ligand lig_ejm_55 from the TYK2 set from openff-benchmark. I found that the torsion profile that I got from bespokefit is a lot worse than the standard OpenFF one.

Here is the Torsion 5-11-17-13

Screenshot 2023-12-04 at 15 17 37

I generate the coordinate of this torsion profile with TorsionDrive

torsiondrive-launch xtbopt.xyz dihedrals.txt -g 15 -e xtb -v --native_opt

Where scan.xyz is the coordinate of this torsion profile.

Then I use psi4 (psi4.py) to compute the single point energy based on these coordinates to make the energy profile into energy.npy.

Then I use OpenMM (openmm.py) to generate the MM potential profile based on the same coordinate, where I obtained the Bespokefit FF from @jthorton #175 (comment), the combined.offxml.

Which gives the MM potential as bespokefit.npy and openff_unconstrained-2.1.npy.

Then I plot them with plot.py. However, it seems that the openFF is much closer to the target QM profile (B3LYP-D3(BJ)/DZVP) than the bespokefit torsion profile.
dihedral
Archive.zip

@j-wags
Copy link
Member

j-wags commented Dec 4, 2023

Do I understand correctly that these are minimum energy structures produced by XTB (presumably using GFN-2), which have their energies evaluated using B3LYP? If so, what may be happening is that the GFN-2 minima are somewhat different from the B3LYP minima, and so while the bespoke force field performs "worse" on the GFN-2 minima, it would be expected to perform "better" around the "truer" B3LYP minima that it was trained on.

Do you have the B3LYP minima to compare to? I expect that the energy profile would look better for those, and I wonder how different the geometries are.

(In a perfect world we'd make force fields that fit well for all parts of the energy surface, but short of that, we prioritize getting the minima right)

@xiki-tempula
Copy link
Author

Do I understand correctly that these are minimum energy structures produced by XTB (presumably using GFN-2), which have their energies evaluated using B3LYP?

Yes, but this seems to be really different, where the potential difference between the two minimum goes from 6 kcal/mol to 2 kcal/mol.

I tried to use TorsionDrive to optimise the geometry with psi4 but kind of struggle to get it to run. Since this is a ligand in the openff-benchmark and thus the QM data is already in QCArchive, do you mind check how good is the torsion fit for this particular torsion? Thanks.

@xiki-tempula
Copy link
Author

Ok, I still cannot got TorsionDrive to work with Psi4 but I did this. I got the xTB geometry optimised coordinate from each point. Then do a constrained geometry optimised where the dihedral has been constrained. So the coordinate shall be the psi4 energy minimum, but the result is the same.
energy

@chapincavender
Copy link

I think this result is expected based on how BespokeFit weights grid points in the objective function via ForceBalance. Briefly, grid points are weighted according to the QM energy with the minimum QM energy set to zero, and the weights are given by

          {                        1 for E_QM < 1 kcal/mol
w(E_QM) = { 1/sqrt(1 + (E_QM - 1)^2) for 1 kcal/mol <= E_QM < 10 kcal/mol
          {                        0 for 10 kcal/mol <= E_QM

See section 2.6 Parameter Optimization and Eq. 3 in the BespokeFit paper for more details.

For grid points where the QM energy is 6 kcal/mol above the QM minimum, the weight of the grid points drops to ~0.2. The goal of this grid weighting is to achieve greater fidelity for the energy landscape close to the QM minimum at the cost of lower fidelity for barriers and high energy local minima. Indeed, your result shows that the BespokeFit energies in blue are closer to QM than the OpenFF 2 energies in orange when the QM energy is less than ~5 kcal/mol. In particular, the BespokeFit force field has eliminated the hump near 0 deg that does not appear in the QM energy.

You can change the delimiters in the weighting scheme by passing the energy_denominator (default 1 kcal/mol) and energy_cutoff (default 10 kcal/mol) parameters when you construct a TorsionProfileTargetSchema API docs here. You can also turn off the weighting entirely by passing attenuate_weights=False to the same target schema.

@xiki-tempula
Copy link
Author

@chapincavender Thanks for the explanation. I can see the logical here.
I guess in this specific example it is really bad in that the high energy conformer torsion at pi angle is being sampled in the actual simulation which is not being observed when one simulate with OpenFF2.

@djcole56
Copy link

For the record, a couple of further checks/tests that we suggest doing are:

  • Perform MM optimisation at each grid point when comparing the MM with the QM scan. This is because there will generally be a mismatch between MM and QM bonds/angles, so it is better to compare at their respective minima;
  • Compare with the fragmented QM scans. You could run this compound through bespokefit and/or we can pull down the record from QCA.

@xiki-tempula
Copy link
Author

xiki-tempula commented Dec 11, 2023

@djcole56 Thank you for the comments. I tried but I cannot get the first point to work. This is my code, where I add a harmonic restraint to the torsion and restrain it to the specific torsion angle and do a geometry minimisation. However, it seems that the torsion is geometry minimised but the rest of the compound is distorted. I wonder if you mind give me some help, please?

Archive.zip

About the second point, do you know how to check the fitting of each fragment? Thanks.

@djcole56
Copy link

About point 2, you mean once you've run bespokefit? You can find the plots of the reference and fit torsion scans in a directory like this one:
bespoke-executor/2b7de200-889e-4b27-94e4-a51d65d83e1f/stage_0/optimize.tmp/torsion-1/iter_0002/plot_torsion.pdf

Where the 2nd directory is the job id, the 5th is the torsion ID (torsion 1 in this case) and the 6th is for the final force balance iteration (0002 in this case). We have a TODO to move those somewhere more easily found!

@djcole56
Copy link

About point 1, I can't immediately see the issue with your script, but I have run simpler unrestrained geometry optimisations at the two end points with your files. That gives:
-180deg: -618.0402221679688 kJ/mol
0deg: -623.3138427734375 kJ/mol
But let's see what the bespokefit scans look like.

@xiki-tempula
Copy link
Author

@djcole56 Thanks for the comment. I guess this shows the issue right?
The QM energy different between the two minimum is around 6 kcal/mol. However, the MM energy difference is only 5.27 kJ/mol, so roughly 1.25 kcal/mol, which means the FF is really bad.

Do you know if it is possible to do the fitting not using fragment but rather using the whole molecule? Then I could directly check the fitting of the whole molecule.

@xiki-tempula
Copy link
Author

I have rerun the fitting with
openff-bespoke executor run --directory 'bespoke-executor' --file "lig_ejm_55.sdf" --workflow "default" --output "lig_ejm_55.json" --output-force-field "lig_ejm_55.offxml"
But it seems that in the bespoke-executor, there is only
celery-fragmenter.log celery-optimizer.log celery-qcgenerator.log gateway.log redis.db redis.log timer.dat There is no fragment fitting data.
bespoke-executor.zip

@xiki-tempula
Copy link
Author

I cannot figure out how to do MM geometry optimisation with OpenMM, so I did it with Gromacs.
Openff

@djcole56
Copy link

Hi @xiki-tempula,

For keeping the intermediate files, you need:

BEFLOW_OPTIMIZER_KEEP_FILES=True openff-bespoke executor ...

when you launch the executor and job. I've run this compound using xtb, and that gives the attached scan and coordinates (note it seems to be shifted by 180deg relative to your scan). So both reference and bespoke FF have an energy diff between minima of 6 kcal/mol, as you expected. I'll try re-running it with QM, but I don't expect a big difference.

If the QM doesn't change anything then the issue seems to be with the bespoke FF that you circulated.

bespoke-xtb-data.zip

@xiki-tempula
Copy link
Author

@djcole56 The bespoke FF comes from #175 (comment). I have derived it myself but they are the same.

With regard to the goal of the Bespokefit, I guess in the ideal case, we would hope the MM FF to give the same potential as the QM potential for any conformation which is not possible in reality. I suspect that the fragment scan might be good.

  1. Do we expect the Torsion profile derived from the MM optimised geometry of the whole molecule with MM energy to match the Torsion profile from the QM optimised geometry of the whole molecule with QM energy.
  2. Do we expect the energy difference between the two energy minima, which in this case is the pi and 0 torsion angle to be the same between the MM optimised geometry of the whole molecule with MM energy and the QM optimised geometry with QM energy?

@xiki-tempula
Copy link
Author

Ok, it looks like a fragment issue. The fragment for this torsion is
image
Which seems to be giving a reasonable fit
plot_torsion.pdf

Is there any way to change the fragment size or not fragment the molecule?
In this case, it is really bad as the potential difference between the two energy minimum should have a potential difference of 5 kcal/mol but end up having a potential difference 1.5 kcal/mol.
max
min

@djcole56
Copy link

@xiki-tempula, thanks for this analysis, I think we're getting close. I've run bespokefit myself too, and agree with your QM scan above - the fragment scan gives a QM energy diff between minima of 8 kcal/mol, and bespokefit closely reproduces this. To answer your previous question, yes I would expect MM torsion profiles at MM optimised geometries to match the QM scans, and bespokefit is doing what it's supposed to here.

The question is then why is there an energy diff of 1.5 kcal/mol in the large molecule. That would be scientifically interesting, but also surprising if the MM barrier reduces by that much when transferring from fragment to whole. I think we need to rule out other possible errors first though. Have you checked whether you still get an energy diff of 1.5 kcal/mol using the new FF direct from bespokefit (rather than using combined.xml)?

@xiki-tempula
Copy link
Author

xiki-tempula commented Dec 13, 2023

@djcole56 Yes. I can verify that this is the case. All energy minimised to the minimum.
Openff

@djcole56
Copy link

Interesting, can you also post the final bespokefit FF xml file and we'll look into it

@xiki-tempula
Copy link
Author

Archive.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants