Integrate OpenFF / OpenFE protein loaders so that entire system has Molecule representations #1182
Just to leave some breadcrumbs: the OpenFE approach is to follow the PDB-recommended route of matching mmCIF templates to the raw file residue by residue, in order to assign bond orders, aromaticity and formal charges. The repo for this is here: https://github.com/OpenFreeEnergy/pdbinf

An example of loading CDK2 (which features a nonstandard residue) is here: https://github.com/OpenFreeEnergy/pdbinf/blob/main/notebooks/tpo_load.ipynb

I've also experimented with whether you can still find and apply the correct template if the monomer or its atoms have incorrect labels: https://github.com/OpenFreeEnergy/pdbinf/blob/main/notebooks/tpo_guessing_demo.ipynb

It should currently handle standard amino acids, RNA and DNA, and, if you download the Chemical Component Dictionary (or supply any other template), any nonstandard component that has a standard definition. This all still hinges on the residues being correctly delimited; for example, if a cap had been merged with the neighbouring residue, this wouldn't be handled well.

The OpenFF approach is to provide SMARTS templates + atom names to
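A minimal, dependency-free sketch of the name-based template matching described above, assuming each template is keyed by residue name and lists formal charges and bond orders by atom name. `ResidueTemplate`, `Residue`, `apply_template` and `BondKey` are hypothetical names, not pdbinf's API, and the glycine template is a trimmed heavy-atom excerpt for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical alias: a bond is keyed by the unordered pair of atom names it joins.
BondKey = frozenset


@dataclass
class ResidueTemplate:
    """Hypothetical, simplified stand-in for one CCD component entry."""
    name: str
    formal_charges: dict[str, int]               # atom name -> formal charge
    bonds: dict[frozenset, tuple[int, bool]]     # bond -> (bond order, is_aromatic)


@dataclass
class Residue:
    """A residue as read from a PDB file: atom names and connectivity only."""
    name: str
    atom_names: list[str]
    bonds: list[tuple[str, str]]                 # intra-residue bonds, no orders
    formal_charges: dict[str, int] = field(default_factory=dict)
    bond_orders: dict[frozenset, tuple[int, bool]] = field(default_factory=dict)


def apply_template(residue: Residue, templates: dict[str, ResidueTemplate]) -> Residue:
    """Copy formal charges and bond orders onto `residue` by residue/atom name lookup."""
    template = templates.get(residue.name)
    if template is None:
        raise KeyError(f"no template for residue {residue.name!r}")
    # Templates are fully protonated, so the residue may have fewer atoms
    # (e.g. missing hydrogens or OXT), but never atoms the template lacks.
    unknown = set(residue.atom_names) - set(template.formal_charges)
    if unknown:
        raise ValueError(f"{residue.name}: atoms not in template: {sorted(unknown)}")
    for atom in residue.atom_names:
        residue.formal_charges[atom] = template.formal_charges[atom]
    for a, b in residue.bonds:
        key = BondKey((a, b))
        if key not in template.bonds:
            raise ValueError(f"{residue.name}: bond {a}-{b} not in template")
        residue.bond_orders[key] = template.bonds[key]
    return residue


# Heavy-atom-only glycine template, trimmed for illustration.
GLY = ResidueTemplate(
    name="GLY",
    formal_charges={"N": 0, "CA": 0, "C": 0, "O": 0, "OXT": 0},
    bonds={
        BondKey(("N", "CA")): (1, False),
        BondKey(("CA", "C")): (1, False),
        BondKey(("C", "O")): (2, False),
        BondKey(("C", "OXT")): (1, False),
    },
)
TEMPLATES = {"GLY": GLY}

# A mid-chain glycine from a PDB file: no OXT, no hydrogens, no bond orders.
res = apply_template(
    Residue("GLY", ["N", "CA", "C", "O"], [("N", "CA"), ("CA", "C"), ("C", "O")]),
    TEMPLATES,
)
print(res.bond_orders[BondKey(("C", "O"))])  # (2, False)
```

Because the lookup is purely by residue and atom name, a mislabelled residue or atom, or a cap merged into its neighbour, raises an error instead of being matched; that failure mode is what the guessing notebook above tries to recover from.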
@richardjgowers: As I understand it, the mmCIF templates are fully protonated forms of the non-polymeric (non-residue) form of each residue, meaning matching must be done based on canonical residue and atom names. Is this the strategy that OpenFE uses? This is a PDB-recommended approach, but it quickly breaks down when you are dealing with molecules not currently in the Chemical Component Dictionary (CCD), such as small molecules of interest, for which there may be no canonical naming of the entities.

Could you elaborate on the philosophy behind this approach that would enable someone to deal with small molecules or polymeric residues not currently in the CCD? Is the expectation that the user will provide a local set of additions to the CCD, establishing their own canonical residue and atom naming schemes that do not conflict with the official PDB CCD? What happens if the PDB later adds residue names that clash?

I don't think this is a bad approach, but I'd love to better understand how the workflow is envisioned to be usable, even under ideal circumstances, before diving into the technical details.
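One possible answer to the local-additions question, sketched under the assumption that both the official CCD and a user's additions are loaded as mappings keyed by residue name: merge them, but refuse to let a local entry silently shadow an official one. `merge_template_sources` and `TemplateClashError` are hypothetical names; this is not how OpenFE is known to handle the case.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


class TemplateClashError(ValueError):
    """A local template reuses a residue name already in the official dictionary."""


def merge_template_sources(
    official: Mapping[str, T],
    local: Mapping[str, T],
    allow_override: bool = False,
) -> Dict[str, T]:
    """Layer user-supplied templates over the official ones, refusing silent shadowing."""
    clashes = sorted(set(official) & set(local))
    if clashes and not allow_override:
        raise TemplateClashError(f"local templates shadow official entries: {clashes}")
    merged = dict(official)
    merged.update(local)  # overwrites official entries only if allow_override=True
    return merged


# Placeholder template objects; in practice these would be parsed template entries.
official = {"GLY": "ccd:GLY", "TPO": "ccd:TPO"}
local = {"LIG": "user:LIG"}
merged = merge_template_sources(official, local)

# Re-running after the official dictionary gains a "LIG" entry surfaces the clash.
try:
    merge_template_sources({**official, "LIG": "ccd:LIG"}, local)
except TemplateClashError as err:
    print(err)  # local templates shadow official entries: ['LIG']
```

Running the check every time templates are loaded, rather than once when the local file is written, is what catches a clash introduced by a later CCD release.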
We should try to integrate these loaders so that we can eventually use tools like OpenFF or Espaloma to parameterize the receptor/biomolecule as well as the ligand.