:Author: Mohammed Azzouzi
:Docs: https://stk-search.readthedocs.io
stk_search is a Python package for searching the chemical space of molecules formed by stk. It is built on top of stk and stko. For more details on the use of the package, please refer to the corresponding publication as well as the documentation associated with it.
We use stk and stko to build the molecules and calculate their properties.
We use BoTorch to implement the Bayesian optimization.
For geometric modeling on 3D structures, we use the model implementations from Geom3D.
To install the package, follow these steps:
-
Open a terminal and change to the directory where the pyproject.toml file is located:
cd path/to/directory
-
Create a new conda environment
conda create -n stk_search python=3.8
-
Activate the environment
conda activate stk_search
-
Run the following command to install the package. In some cases, you may need to install gcc before installing the package.
pip install -e .
-
Install additional packages to use the GNN model:
For GPU:
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cu121.html # The torch version and CUDA tag in the URL must match your installed torch
For CPU:
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cpu.html
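The wheel index URL in the commands above encodes both the torch version and the compute platform. As a minimal sketch, the matching URL can be derived from the string reported by `torch.__version__`. The helper name `pyg_wheel_index` is hypothetical and not part of stk_search. The sketch assumes the version string has the usual `<version>+<platform>` form and treats a missing platform suffix as a CPU build, which may not hold for every torch distribution.

```python
def pyg_wheel_index(torch_version: str) -> str:
    """Return the data.pyg.org wheel index URL for a torch version string.

    `torch_version` is expected to look like torch.__version__,
    for example "2.3.0+cu121" or "2.3.0+cpu".
    """
    # Split "2.3.0+cu121" into the release ("2.3.0") and platform ("cu121").
    version, _, platform = torch_version.partition("+")
    if not platform:
        # Assumption: a version string without a suffix is a CPU-only build.
        platform = "cpu"
    return f"https://data.pyg.org/whl/torch-{version}+{platform}.html"
```

With torch installed, calling `pyg_wheel_index(torch.__version__)` gives the URL to pass to `pip install ... -f`. An index page may not exist for every patch release, so it is worth checking that the URL resolves before running the install.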
The example notebooks show, step by step, how to use the package to search a space of oligomers formed of 6 building blocks.
-
Notebook 0: Generate Building Blocks
- Shows how to go from a list of SMILES to generate a list of building blocks.
- Introduces a way to run calculations using xtb and xtb_stda to get the properties of the building blocks and save them in a database.
- Demonstrates how to generate a dataframe with the necessary data to form a representation of the constructed molecules for Bayesian optimization.
-
Notebook 1: Define Search Space
- Shows how to define the search space and generate a search space pickle that can be loaded later to run the search algorithm.
-
Notebook 2: Run Search Algorithm
- Shows how to run a search over the search space using different algorithms: BO, EA, SUEA.
-
Notebook 3: Representation Learning
- Shows how to run representation learning using a 3D geometry-based GNN.
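To make the workflow of Notebooks 1 and 2 concrete, the following is a minimal sketch in plain Python. It does not use the stk_search API. The fragment labels, the `toy_fitness` function, and the random-search baseline are all hypothetical stand-ins for the real building blocks, the property calculations, and the BO, EA, and SUEA algorithms. It only illustrates the pattern of enumerating a space of 6-block oligomers, saving it with pickle, loading it back, and searching it.

```python
import itertools
import pickle
import random

# Hypothetical fragment labels; real building blocks would come from Notebook 0.
FRAGMENTS = ["A", "B", "C", "D"]
OLIGOMER_LENGTH = 6  # oligomers formed of 6 building blocks


def build_search_space(fragments, length):
    """Enumerate every ordered sequence of `length` fragments."""
    return list(itertools.product(fragments, repeat=length))


def toy_fitness(oligomer):
    """Illustrative stand-in for an expensive property calculation.

    Counts how many neighbouring positions hold different fragments.
    """
    return sum(1 for left, right in zip(oligomer, oligomer[1:]) if left != right)


def random_search(space, n_evaluations, seed=0):
    """Evaluate randomly sampled candidates and return the best one found."""
    rng = random.Random(seed)
    candidates = rng.sample(space, n_evaluations)
    return max(candidates, key=toy_fitness)


space = build_search_space(FRAGMENTS, OLIGOMER_LENGTH)

# Round-trip through pickle, mirroring the save-then-load workflow.
restored = pickle.loads(pickle.dumps(space))

best = random_search(restored, n_evaluations=50)
print(f"{len(space)} candidates, best sampled oligomer: {best}")
```

Random search is used here only because it fits in a few lines. Even this toy space with four fragments has 4^6 = 4096 candidates, which is why guided strategies become useful once each evaluation is expensive.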
The GNN implementations follow the code in Geom3D. If you use any of the capabilities related to representation learning, please cite their paper:
@article{liu2023symmetry,
title={Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials},
author={Liu, Shengchao and Du, Weitao and Li, Yanjing and Li, Zhuoxinran and Zheng, Zhiling and Duan, Chenru and Ma, Zhiming and Yaghi, Omar and Anandkumar, Anima and Borgs, Christian and others},
journal={arXiv preprint arXiv:2306.09375},
year={2023}
}
For questions, please contact [email protected].