-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add explicit_hydrogen
parameter
#741
base: interfaces
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #741 will not alter performanceComparing Summary
|
mol = EditableMol(Mol()) | ||
|
||
has_charge_annot = "charge" in atoms.get_annotation_categories() | ||
for i in range(atoms.array_length()): | ||
rdkit_atom = Atom(atoms.element[i].capitalize()) | ||
if has_charge_annot: | ||
rdkit_atom.SetFormalCharge(atoms.charge[i].item()) | ||
if explicit_hydrogen: | ||
rdkit_atom.SetNoImplicit(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the base case where explicity_hydrogen=True
, what would this mean if I just call to_mol
for a typical crystal structure that won't have any hydrogens resolved? Will RDKit infer charges / valences in that case? If so this might lead to broken molecules -- if that were the case, should we check that if explicit hydrogen is true there have to be hydrogens in the structure? (I could see this being sth users will try -- at least I would have tried it :D )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Charges would also not be inferred automatically before this PR. But I am not sure what this means for valence: As the bond types are also explicitly set I guess RDKit assumes a radical, if explicit_hydrogen=True
, but hydrogen atoms are actually missing. Probably, I should test this.
Having a check there sounds like a good idea, especially as I would agree that this could be a common mistake. However, I also think strictly checking for the simple presence of hydrogen atoms might not be sensible enough, as there are valid molecules without hydrogen atoms, although they appear rarely. What do you think about raising a warning as a 'reminder' to the user to check the input?
I checked the different behaviors when an import biotite.interface.rdkit as rdkit_interface
import biotite.structure.info as info
import rdkit.Chem.AllChem as Chem
import rdkit.Chem.Draw as Draw
atoms = info.residue("C")
atoms = atoms[atoms.element != "H"]
mol = rdkit_interface.to_mol(atoms, explicit_hydrogen=True)
Chem.Compute2DCoords(mol)
Draw.MolToFile(mol, "explicit.png")
mol = rdkit_interface.to_mol(atoms, explicit_hydrogen=False)
Chem.Compute2DCoords(mol)
Draw.MolToFile(mol, "non_explicit.png")
So |
I added a warning in case the structure contains no hydrogen atoms. @Croydon-Brixton Could you have a look again? |
@@ -141,17 +139,29 @@ def to_mol( | |||
HB3 | |||
HXT | |||
""" | |||
mol = EditableMol(Mol()) | |||
hydrogen_mask = atoms.element == "H" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I just remembered that I've seen the very occasional structures with deuterium as well (e.g. 1wq2). How would we want to deal with these cases? Do we want to treat this isotope as hydrogen as well (probably chemically most appropriate) or not? If so, the PDB represents deuterium as the element "D"
for i in range(atoms.array_length()): | ||
rdkit_atom = Atom(atoms.element[i].capitalize()) | ||
rdkit_atom = Chem.Atom(atoms.element[i].capitalize()) | ||
if explicit_hydrogen: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if explicit_hydrogen: | |
if explicit_hydrogen: | |
# ... tell RDKit to not assume any implicit hydrogens |
if explicit_hydrogen: | ||
if not hydrogen_mask.any(): | ||
warnings.warn( | ||
"No hydrogen found in the input, although 'explicit_hydrogen' is 'True'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"No hydrogen found in the input, although 'explicit_hydrogen' is 'True'" | |
"No hydrogen found in the input, although 'explicit_hydrogen' is 'True'. " | |
"This may lead to atoms wrongly interpreted as radicals. Be careful." |
rdkit_atom = Atom(atoms.element[i].capitalize()) | ||
rdkit_atom = Chem.Atom(atoms.element[i].capitalize()) | ||
if explicit_hydrogen: | ||
rdkit_atom.SetNoImplicit(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm to my mind this option is still problematic, at least given the defaults of the to_mol function:
By default most PDBs won't have any hydrogens present, but the default in to_mol here is explicit_hydrogen=True. This will lead to radicals upon calling Chem.Sanitize, which is undesirable almost always and will probably catch a lot of users (especially those less well versed in RDKit) off guard.
I would suggest:
- Changing the default to not expect explicit hydrogens.
- Emitting a warning if a user says they're having explicit hydrogens, but no hydrogens are found, with a note that this can lead to radicals. After that, they're on their own ^^
Not directly related to the radicals, but another thing I noted while looking at this: |
@padix-key I've added my suggested changes here for your convenience: padix-key#13 |
This parameter in
interface.rdkit.to_mol()
defines whether hydrogen should be explicitly or implicitly included in the createdMol