
Generate same complex in different runs #99

Open
ClinicalAI opened this issue Nov 7, 2022 · 5 comments

@ClinicalAI

Hi,
I am wondering whether there is a parameter (such as a random seed) that makes molSimplify generate the same structure in different runs for the same metal core and ligands. In the current version, running the program repeatedly on the same metal-ligand complex gives slightly different results each time.

@ralf-meyer ralf-meyer self-assigned this Nov 7, 2022
@ralf-meyer
Member

ralf-meyer commented Nov 7, 2022

Hi,
I was not able to reproduce this problem in a first quick test (but I know I have encountered it before). Could you post an example where you encountered this behavior that I could include in future automated testing?

Regarding a solution: currently, the only way I know to achieve this is to set NumPy's global seed with `np.random.seed(0)` before launching molSimplify. A workaround might therefore be:

```python
import numpy as np
from molSimplify.Scripts.generator import startgen

# Fix NumPy's global seed before molSimplify draws any random numbers.
np.random.seed(0)
startgen(['main.py', '-i', 'CLIinput.inp'], True, gui=False)
```
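Because this workaround relies on process-global random state, a small helper that seeds both Python's `random` module and NumPy's legacy global generator in one place may be convenient. This is a minimal sketch: `seed_all` is a hypothetical helper, not part of molSimplify, and it cannot reach generators inside compiled libraries such as openbabel, which keep their own state.

```python
import random

import numpy as np


def seed_all(seed: int) -> None:
    """Seed the process-global RNGs of Python's `random` module and NumPy.

    Hypothetical helper: it does not affect random number generators that
    live inside compiled libraries (e.g. openbabel).
    """
    random.seed(seed)
    np.random.seed(seed)


if __name__ == "__main__":
    seed_all(0)
    first = (random.random(), np.random.rand(3).tolist())
    seed_all(0)
    second = (random.random(), np.random.rand(3).tolist())
    # Re-seeding reproduces exactly the same draws from both generators.
    print(first == second)  # True
```

In the workaround above, `seed_all(0)` would simply replace the `np.random.seed(0)` call right before `startgen(...)`.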

I will also add this to our list of feature requests for the next release!

@dbkchu

This comment was marked as duplicate.

@ClinicalAI
Author

ClinicalAI commented Nov 8, 2022

Hi,
Thanks. It would be good if you could add a command-line option for this purpose. A toy problem that reproduces it:

Metal Core: ZN
Ligand1: [O-]C(=O)CCC(=O)[O-]
Ligand2: c1ncn(c1)Cc1ccc(cc1)Cn1cncc1

Command:

```shell
molsimplify -core "Zn" -lig "[O-]C(=O)CCC(=O)[O-]" "c1ncn(c1)Cc1ccc(cc1)Cn1cncc1" -ligocc 2 -skipANN True
```

The output XYZ file differs on every run.
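To tell whether two runs really produced different geometries (rather than, say, differing only in the comment line), the coordinates can be compared directly. A minimal sketch, assuming single-frame XYZ text (atom count, comment line, then `symbol x y z` rows) with the same atom order in both files; `read_xyz` and `max_deviation` are hypothetical helpers.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def read_xyz(text: str) -> List[Atom]:
    """Parse a single-frame XYZ block: atom count, comment line, then atoms."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms = []
    for line in lines[2 : 2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def max_deviation(xyz_a: str, xyz_b: str) -> float:
    """Largest per-coordinate difference (same units as the file, e.g. Angstrom)."""
    a, b = read_xyz(xyz_a), read_xyz(xyz_b)
    if [s for s, *_ in a] != [s for s, *_ in b]:
        raise ValueError("structures have different atoms or atom order")
    return max(
        (abs(ca - cb) for (_, *pa), (_, *pb) in zip(a, b) for ca, cb in zip(pa, pb)),
        default=0.0,
    )
```

A result of exactly `0.0` means the two runs wrote identical coordinates; anything larger quantifies how far the runs drifted apart.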

I tried two approaches from Python, `startgen_pythonic` and `startgen`, setting the seeds in both:

```python
import random
import numpy as np

# Seed both Python's and NumPy's global random number generators.
seed = 12345
random.seed(seed)
np.random.seed(seed)

from molSimplify.Scripts.generator import startgen_pythonic

input_dict = {
    '-core': "zn",
    '-lig': "[O-]C(=O)CCC(=O)[O-]",
    '-ligocc': "2",
    '-skipANN': "True",
}
startgen_pythonic(input_dict)
```

and

```python
import random
import numpy as np

# Same seeding, but driving molSimplify through an input file.
seed = 12345
random.seed(seed)
np.random.seed(seed)

from molSimplify.Scripts.generator import startgen
startgen(['main.py', '-i', 'CLIinput.inp'], True, gui=False)
```

In both cases the XYZ output still differs between runs. Any help solving this would be appreciated.

Thanks,

@ralf-meyer
Member

Thanks for the follow-up!

Bad news up front: I was not able to fully eliminate all sources of randomness. As far as I can tell, this is mostly because we rely heavily on openbabel. The main source of randomness is the conversion from SMILES to a 3D structure, and openbabel currently does not provide a random seed for this transformation (see openbabel/openbabel#1934 for more details).

A workaround might be to add the ligand to a custom database; see http://hjkgrp.mit.edu/tutorials/2018-05-09-molsimplify-tutorial-10-adding-ligands-molsimplify/, section 2. (Due to a bug in that function, you might need to run this command twice if you have not yet set up a custom data path.) This way, the SMILES-to-3D conversion runs only once, when the ligand is added.
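The custom-database workaround amounts to caching: run the non-deterministic SMILES-to-3D step once, store the result, and reuse it afterwards. The sketch below shows that idea in isolation. It is not molSimplify's ligand database; `cached_geometry`, the cache directory, and the `generate` callback are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_geometry(smiles: str, generate: Callable[[str], str], cache_dir: Path) -> str:
    """Return an XYZ block for `smiles`, calling `generate` only on a cache miss.

    `generate` stands in for any (possibly non-deterministic) SMILES -> XYZ
    converter; once a result is stored, later calls return it unchanged.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the SMILES so arbitrary characters ([, ], =, /) are safe in a file name.
    key = hashlib.sha256(smiles.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.xyz"
    if path.exists():
        return path.read_text()
    xyz = generate(smiles)
    path.write_text(xyz)
    return xyz
```

The trade-off is the same as with the ligand database: runs become reproducible only because they all reuse one frozen geometry, not because the conversion itself became deterministic.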

After trying this (and setting the random seeds), I still found a small variation in the generated structures. I will have to investigate it further, but it most likely stems from the openbabel force-field optimization.
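One way to put a number on such a "small variation" is the RMSD between two runs after optimal superposition, so that a mere rigid rotation or translation of the whole complex does not count as a difference. A minimal NumPy sketch of the standard Kabsch algorithm, assuming both runs list the same atoms in the same order; `kabsch_rmsd` is a hypothetical helper.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both structures on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((aligned - q) ** 2, axis=1))))
```

An RMSD near zero would mean the runs differ only by rigid motion; a clearly nonzero value points to a genuine change in geometry, such as the force-field relaxation ending in a slightly different minimum.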

@ClinicalAI
Author

OK, thanks. I will try adding the ligands to the database.
Meanwhile, you might consider using RDKit instead of openbabel to convert SMILES to XYZ. RDKit appears to support a random seed that makes the generated 3D structure reproducible:
rdkit/rdkit#2575
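Following up on the RDKit suggestion, a minimal sketch of seeded SMILES-to-XYZ conversion. `Chem.MolFromSmiles`, `Chem.AddHs`, `AllChem.EmbedMolecule(..., randomSeed=...)`, `AllChem.MMFFOptimizeMolecule`, and `Chem.MolToXYZBlock` are RDKit functions; `smiles_to_xyz` is a hypothetical helper, and molSimplify does not currently use RDKit for this step. Identical output is expected for the same seed and RDKit version; other versions or platforms may produce different coordinates.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_xyz(smiles: str, seed: int = 42) -> str:
    """Embed a SMILES string in 3D with a fixed random seed and return an XYZ block."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    mol = Chem.AddHs(mol)
    # randomSeed makes the distance-geometry embedding (the random step) reproducible.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        raise RuntimeError("3D embedding failed")
    # MMFF94 clean-up of the embedded geometry; this minimization has no random component.
    AllChem.MMFFOptimizeMolecule(mol)
    return Chem.MolToXYZBlock(mol)
```

For the toy problem, `smiles_to_xyz("c1ncn(c1)Cc1ccc(cc1)Cn1cncc1", seed=12345)` should then return the same coordinates on every call.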
