Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for intermolecular_interactions #321

Open
xiki-tempula opened this issue May 20, 2022 · 19 comments
Open

Support for intermolecular_interactions #321

xiki-tempula opened this issue May 20, 2022 · 19 comments

Comments

@xiki-tempula
Copy link
Collaborator

The gromacs topology supports the so called [intermolecular_interactions]
https://manual.gromacs.org/documentation/2019-rc1/reference-manual/topologies/topology-file-formats.html#id35

Which describes the interactions (bond/angle/dihedral) between molecules. @chryswoods I wonder if the current gromacs topology writer supports writing this? Thank you.

@lohedges
Copy link
Member

I see no reference to [ intermolecular_interactions ] in the Sire GroTop parser, so assume that this is unsupported for now. I imagine that optional sections could be patched in within BioSimSpace if needed, although this could be tricky if it needs to be done in a way that is self-consistent with other information in the file, e.g. if adding the section would modify something else. If this is the case, then it would be better added to the C++ parser instead.

The information written by the GroTop parser comes directly from the molecular properties, which in turn come from the force field information that is parsed from reading another topology file. Where would [ intermolecular_interactions ] come from? I assume that it would need to be user defined, since force field properties come from parameterising single molecules.

@xiki-tempula
Copy link
Collaborator Author

xiki-tempula commented May 20, 2022

@lohedges

this could be tricky if it needs to be done in a way that is self-consistent with other information in the file, e.g. if adding the section would modify something else. If this is the case, then it would be better added to the C++ parser instead.

Supporting it in BSS is relatively easy as it is last section in the topology file, so it could be appended to end of the file without affecting the original file. But it would obviously be better if it could be supported.

Where would [ intermolecular_interactions ] come from?

So it would be coming from the ABFE restraint generation section. I think an ideal API written in psudocode would be

bond = Sire_bond(Sire_atom1, Sire_atom2, function_type, equilibrium_length_0, equilibrium_length_1, fc_0, fc_1)
angle1 = Sire_angle(Sire_atom1, Sire_atom2, Sire_atom3, ...)
angle2 = Sire_angle(Sire_atom1, Sire_atom2, Sire_atom3, ...)
dihedral1 = Sire_dihedral(Sire_atom1, Sire_atom2, Sire_atom3, Sire_atom4, ...)
dihedral2 = Sire_dihedral(Sire_atom1, Sire_atom2, Sire_atom3, Sire_atom4, ...)
dihedral3 = Sire_dihedral(Sire_atom1, Sire_atom2, Sire_atom3, Sire_atom4, ...)

bonds = [bond, ]
angles = [angle1, angle2]
dihedrals = [dihedral1, dihedral2, dihedral3]
intermolecular_interactions = {'bonds': bonds, 'angles': angles, 'dihedrals': dihedrals}
Sire_system.setProperty('intermolecular_interactions', intermolecular_interactions)

Where the Sire_atom could come from any atom in the Sire_system even if they are not from the some molecule.

@lohedges
Copy link
Member

Yes, something like this should be possible. Are similar interactions needed for a general ABFE implementation, e.g. would you need to do a similar thing if working with another engine, e.g. AMBER? In Sire we parser information from the topology file into engine specific potential terms, e.g. GromacsBond, GromacsAngle, etc. These are then stored as molecular properties using computer algebra expressions. When writing to a different format, e.g. AMBER, the expressions are then converted to the appropriate potential terms, e.g. AmberBond, AmberAngle, etc. If these kind of inter-molecular interaction terms will be general for ABFE, then it would be good to do this in a general way.

For now, if you are only working with GROMACS, and it's possible to just append the records to the existing topology file, then I'd recommend just manually writing the lines for the terms that you need. This can be generalised and, if necessary, incorporated into Sire.IO.GroTop at a later stage. If you need to do this in one place, then pass the system to Process.Gromacs, then you could just some strings as a system property, then extract them to append to the topology in the process setup.

@xiki-tempula
Copy link
Collaborator Author

@lohedges Thanks. I will give it a thought.

@chryswoods
Copy link
Member

chryswoods commented May 20, 2022

Lester is right that the C++ parser doesn't support intermolecular interactions at the moment. Most of the functionality is there, as intermolecular interactions use the same format and parsing structure as the intramolecular interactions. The challenge was more where to place the interactions in our internal data model of the molecules. Molecules are held individually and can be moved between Systems of molecules as and when needed. We would have to think about where the intermolecular potentials should sit, and how or if they should move when one of the molecules involved in the interaction is moved between Systems.

I agree it is something that is worth adding and would be best suited to the C++ Sire layer. Getting the data model right for it is likely key.

My guess is that they would become properties of the System, and would be referenced via BondM, AngleM, DihedralM etc style classes (I am in the process of adding NameM style classes, e.g. SelectorM<Atom>, that provide multi-molecule versions of single molecule views, e.g. Selector<Atom>. I've recently added Bond, SelectorBond and SelectorMBond, and will be expanding the rest as I get to them in the 'feat_web' refresh).

@xiki-tempula
Copy link
Collaborator Author

@chryswoods

Molecules are held individually and can be moved between Systems of molecules as and when needed. We would have to think about where the intermolecular potentials should sit, and how or if they should move when one of the molecules involved in the interaction is moved between Systems.

Can we have some property associated with the system, which is not associated with any molecule within the system.
So have an API similar to

system = protein + ligand
system.setProperty('intermolecular', intermolecular)
new_protein = system[0]
new_ligand = system[1]
new_system = new_protein + new_ligand
assert new_system.hasProperty('intermolecular') == False

@lohedges
Copy link
Member

lohedges commented May 20, 2022

This is already the case with the existing API. System properties aren't associated with molecules and are only preserved when manipulating an existing system, e.g. adding a molecule. If you extract molecules to create a new system, then that system won't have an system-level properties from the old system. (Other than the file format originally associated with the molecules, which is preserved for convenience.)

import BioSImSpace as BSS

s = BSS.IO.readMolecules("amber/ala/*")
ss = ss = (s[0] + s[1]).toSystem()

priint(s._sire_object.propertyKeys())
['fileformat', 'space']

priint(ss._sire_object.propertyKeys())
['fileformat']

Here we've lost the information pertaining to the simulation box, since it is unrelated to the molecules in isolation. It would then be up to the user to set this property if it's needed, either directly, or indirectly by solvating.

@chryswoods
Copy link
Member

Yes, as Lester says, you are right. Nearly everything in BioSimSpace / Sire is derived from SireBase::Property, and most classes can act as Property containers. For example, you could even add a System as a property of a Molecule, which itself is embedded in another System. It is an extremely powerful and flexible system.

The downside of this flexibility is that we need to think carefully about how we constrain the user API so that there are no surprises. One of the reasons we wrote BioSimSpace was to provide a facade on top of Sire that provides an easy-to-use and more straightforward API. This is why Lester hides the Sire objects as, e.g. ._sire_object and doesn't expose them directly to a user of BioSimSpace. A BioSimSpace Molecule is a much more constrained and easier to use facade that is built on top of the very flexible and power (but harder to use) Sire Molecule.

To support intermolecular potentials in BioSimSpace we would need to think about how we could add the hooks that would keep the properties that hold intermolecular potentials consistent, e.g. so that intermolecular properties are copied across to new System objects as needed.

Note that this is also why Lester writes multiple redundant checks at multiple levels in BioSimSpace. We practice defensive coding, and try not to assume that another layer has done the validation correctly. Where performance isn't a problem, multiple repeated validation at different levels future-proofs us for cases where we reorganise the code, or repurpose one level of components against another higher level of components. We only look to remove redundant checks when these cause performance issues.

@xiki-tempula
Copy link
Collaborator Author

@fjclark It might be good for you to chip in as well.

I was initially thinking of such a data structure to hold the restraint, where the gromacs topology writer is f.

f.write('[ intermolecular_interactions ]')
restraint = Sire.System.Property('intermolecular')
f.write('[ bonds ]')
for bond in restraint.Property('GromacsBonds'):
    f.write(bond.toString())
    #;    ai   aj    type  bA         kA       bB    kB
    #2610  1606    6     0.648      0.0    0.648   4184.00
f.write('[ angles ]')
for angle in restraint.Property('GromacsAngles'):
    f.write(angle.toString())
    #;   ai    aj    ak    type    thA      fcA      thB       fcB
    #  2611  2610  1606       1   107.901    0.0   107.901     41.84
f.write('[ dihedrals ]')
for dihedral in restraint.Property('GromacsDihedrals'):
    f.write(dihedral.toString())
    #;   ai    aj    ak    al  type    phiA      fcA    phiB      fcB
    #  2612  2611  2610  1606     2    -160.764    0.0    -160.764    41.84

However, I think it might be very difficult as the bond, angle and dihedral will be containing atoms from different molecules and we will need to access the index of atom in the system.

For restraint generation, I think it would be probably easiest to save the the index of the atoms directly. So I wonder what would be an engine agonist way of saving the force constant and dihedral type? I would imagine that there is something similar to

restraint = {'Bond': [(1, 2), 
                      Sire.Bond(type=1, 
                                value=1.2 * Sire.Units.nm, 
                                fc=1000 * Sire.Units.kcal_per_mol), ],
             'Angle': [(1, 2, 3), 
                       Sire.Angle(type=1, 
                                  value=1.2 * Sire.Units.radian, 
                                  fc=1000 * Sire.Units.kcal_per_mol), ]}

And then I could use some hack to get it to gromacs output format

@lohedges
Copy link
Member

Just to note that the BioSimSpace System object has a getIndex method that can take objects that it contains, e.g. molecules, atoms, residues, and return the absolute index of that object in the system. I've required this information when setting up other configuration files, e.g. metadynamics collective variables, so it might be of use to you.

@xiki-tempula
Copy link
Collaborator Author

@lohedges Thanks. Then I can probably replace the atom index by sire atom object. I wonder if there is any engine agonist way of represent the function type and the relevant parameters?

@chryswoods
Copy link
Member

Also, we have a computer algebra system built in so store bond, angle and dihedral parameters as algebraic expressions. The SireMM.GromacsBond, SireMM.GromacsAngle and SireMM.GromacsDihedral extract the parameters from the algebraic expressions. This enables us to have an engine-portable representation of parameters.

You can create the potentials using

from Sire.CAS import Symbol
r = Symbol("r")
r0 = (1.2 * Sire.Units.nm).value()
k = (1000 * Sire.Units.kcal_per_mol).value()

bond = k * (r - r0)**2

and then something similar for the angle. The GromacsBond class can take an algebraic expression in the constructor and will extract the bond type and parameters for writing to a file. These expressions can then also be used for the Amber and other file topology writers, which extract the parameters they need (and convert to the appropriate units)

@xiki-tempula
Copy link
Collaborator Author

xiki-tempula commented May 23, 2022

@chryswoods Thanks. I wonder if you mind show me an example?
Say I have two sire.atom object atom1 and atom2 and BSS.system system, Gromacs bond type 6, which is the inter-molecular version of (bond = k * (r - r0)**2), force constance as k and equilibrium length as r0.

def construct_GromacsBond(system, atom1, atom2, k, r0, bond_type=6):
    return '2610  1606    6     0.648      0.0    0.648   4184.00'

So I could know how to set such class up.

@chryswoods
Copy link
Member

Will do - I will put something together later today :-)

@chryswoods
Copy link
Member

I've had a go, but I think it does depend on where you want to add this code. To go into the GroTop C++ class it would be better to add into the C++ layer. I am not sure where without thinking more, but GroSystem is a possibility. It would be held as a list of intermolecular potentials in GroSystem, so that the indexing for the atoms would be correct.

You would then also have to add intermolecular bond support to GromacsBond, which I think is as simple as adding the func_type option as 6 (it will default to 1 for harmonic bonds).

There would then be some work to create a new Property class derived from Property that could hold a QHash of Expression objects indexed by a pair of either integers (if the atoms in the two atoms could be mapped to integer indexes), or as actual Atom objects (from which those indexes could be obtained).

This is a reasonable amount of work, and it would depend on how far you wanted to progress with this?

The quicker route is writing something in the Python layer that either writes a separate file or that appends to the gromacs topology file after it has been written out by Sire. This could only be a temporary solution as it would need to go into the C++ layer eventually. This is both to prevent technical debt, and also because it would be challenging to do a multi-engine compatible solution that didn't go through the C++ layer.

Either way I would still store the bond potentials as Expression objects, as Expression is already derived from Property.

Assuming you have created the bond potential using

>>> from Sire.CAS import Symbol
>>> from Sire.Units import nanometer, kJ_per_mol

>>> bond = 1000 * (kJ_per_mol/(nanometer*nanometer)).value() * (r - (1.2*nanometer).value())**2
>>> print(bond)
2.39006 [r - 12]^2

(note how the equation has the internal units of Sire, namely kcal_per_mol / A^2 and A)

You can construct the GromacsBond class using

>>> from Sire.MM import GromacsBond

>>> g_bond = GromacsBond(bond, r)
>>> print(g_bond)
Gromacsbond( functionType() = bond, parameters() = [ 1.2, 2000 ] )

Note how the GromacsBond class has automatically converted the parameters into the units needed for the gromacs topology file, e.g. nanometers and kJ_per_mol per nm^2. Note also that the force constant has been doubled, because the bond equation used internally in gromacs is

E(r) = 0.5 * k (r - r0)**2

(https://manual.gromacs.org/current/reference-manual/functions/bonded-interactions.html#equation-eqnharmbondstretch)

This means that the bond force constant you want to use needs to be double the value in the equation, as Gromacs automatically multiplies the intramolecular bond force constant by 0.5. Keeping track of the correct units and half-or-not-half conversion across all molecular file formats is a pain, hence why we store intramolecular potentials as algebraic expressions. In contrast, if we created an AmberBond from this expression, e.g.

>>> from Sire.MM import AmberBond

>>> a_bond = AmberBond(bond, r)
>>> print(a_bond)
AmberBond( k = 2.39006, r0 = 12 )

we see that the parameters are in Amber units (kcal_per_mol / A^2 and A) and the force constant is not doubled, as Amber uses the intramolecular bond expression of

E(r) = k (r - r0)**2

It is not clear from the gromacs docs (https://manual.gromacs.org/documentation/2019/reference-manual/topologies/topology-file-formats.html#id13) whether type 6 bonds using 0.5 k (r - r0)**2 or k (r - r0)**2.

With the above, a possible construct_GromacsBond function could be;

from Sire.CAS import Symbol
from Sire.Units import kJ_per_mol, nanometer

def construct_GromacsBond(system, atom1, atom2, k, r0):
    atom1 = system.getIndex(atom1)
    atom2 = system.getIndex(atom2)
    bond = 0.5 * k * (kJ_per_mol/(nanometer*nanometer)).value() * (r - (r0 * nanometer).value()))**2

    return (atom1, atom2, bond)

and then, in your code to append to the topology file

from Sire.MM import GromacsBond

# open the topology file in appending mode
file = open("topology_file", "a")

file.write("\n[intermolecular interactions]")

for bond in intermolecular_bonds:
    atom0 = bond[0]
    atom1 = bond[1]

    gromacs_bond = GromacsBond(bond[2])

    params = gromacs_bond.parameters()
    bond_type = gromacs_bond.functionType()

    # the above has returned 1, as this is a harmonic bond.
    # We know this is an intermolecular bond, so should be type 6
    if bond_type != 1:
        raise IncompatibleError(f"We can only write intermolecular bonds of harmonic type!")

    bond_type = 6

    # now write the topology line in the right format. based on the above information.
    # r0 is params[0] and k is params[1]. GromacsBond returns the parameters in the order
    # that they are written to a gromacs topology file. Multiple parameters can be returned
    # as GromacsBond supports a number of different gromacs bond types
    file.write(...)

@xiki-tempula
Copy link
Collaborator Author

xiki-tempula commented May 24, 2022

@fjclark What is your thought on this?
Given that there doesn't seems to be easy way out. On my end, in the near future (Q2/Q3), I will use the separate class (BSS.FreeEnergy.Restraint) to store the restraint in some temporary format (probably a dictionary) and use restraint.toString(format='Gromacs') to alter the gromacs topology, and I might not have time to come back to this issue.

@fjclark
Copy link
Contributor

fjclark commented May 25, 2022

@xiki-tempula sorry about the slow reply. Does this sound like a reasonable initial approach?:

Given that representing the restraints might be more complex than we thought at first, maybe it would be sensible to have both BSS.FreeEnergy.RestraintSearch (what is currently BSS.FreeEnergy.Restraint), and BSS.FreeEnergy.Restraint (to hold an engine-agnostic representation of the restraints with potentials as algebraic expressions). This could be used as:

restraint_search = BSS.FreeEnergy.RestraintSearch(...)
#... run restraint_routine
restraint , best_frame = restraint_search.analyse() # where restraint is of the type BSS.FreeEnergy.Restraint
protocol = BSS.Protocol.FreeEnergy(restraint=restraint, ...) 
ABFE = BSS.FreeEnergy.Absolute(system, protocol, work_dir="output") # Handle writing restraints here based on the engine

@xiki-tempula I am aware that you are currently reusing BSS.FreeEnergy.Relative for ABFE (https://github.com/Exscientia/BioSimSpace/blob/0cd62867293fd2acd5cd23f4c27aca0985a7910e/test/Sandpit/Exscientia/Free_Energy/test_relative.py#L13), but I think it might be worth having BSS.FreeEnergy.Absolute, even if this is extremely similar to BSS.FreeEnergy.Relative, for the sake of clarity.

@xiki-tempula
Copy link
Collaborator Author

xiki-tempula commented May 25, 2022

@fjclark

but I think it might be worth having BSS.FreeEnergy.Absolute, even if this is extremely similar to BSS.FreeEnergy.Relative, for the sake of clarity.

This also my initial thought, then I realise that for Gromacs, a separate ABFE class will be quite problematic.
For decoupling a charged molecule, one need to "perturb" a alchemical ion at the same time, so we need to have decouple and perturbation in the same "thing".

With regard to

protocol = BSS.Protocol.FreeEnergy(restraint=restraint, ...) 
ABFE = BSS.FreeEnergy.Absolute(system, protocol, work_dir="output")

I'm not quite sure of whether

protocol = BSS.Protocol.FreeEnergy(restraint=restraint, ...) 
ABFE = BSS.FreeEnergy.Absolute(system, protocol, work_dir="output")

is better or

protocol = BSS.Protocol.FreeEnergy() 
ABFE = BSS.FreeEnergy.Absolute(system, protocol, work_dir="output", property_map={'restraint'=restraint)

is better? @lohedges @msuruzhon

@msuruzhon
Copy link
Collaborator

As a user that hasn't done much ABFE before I would certainly prefer an explicit parameter, rather than a key in a property_map @xiki-tempula .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants