Merge branch 'cleaned_version'
mohammedazzouzi15 committed Nov 1, 2024
2 parents 8ff9d27 + db7d6cc commit 25c716e
Showing 16 changed files with 474 additions and 626 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -69,6 +69,10 @@ wandb/
*.gz
*.tar
*.gz

#extra folders
notebooks/*.ipynb
notebooks/*/*.ipynb
notebook/
lightning_logs/

776 changes: 204 additions & 572 deletions Example_notebooks/4_learn_molecules_representation.ipynb

Large diffs are not rendered by default.

@@ -0,0 +1,19 @@
Model: 6-frag_target_241101__SchNet_splitrand-nummol5000
Target: target
spliting function: topk_split
Number of molecules: 5000
Number of fragment: 6
Number of training data: 3496
Number of validation data: 388
Number of test data: 258
Number of training data loader: 27
Number of validation data loader: 3
Number of test data loader: 2
Best model: data_example/representation_learning//6-frag/target/241101//SchNet/splitrand-nummol5000/epoch=9-val_loss=0.29-other_metric=0.00.ckpt
Best model val loss: 0.29
Model loaded
Model evaluation done
Perfomance with learned embedding
MAE train: 0.58, MSE train: 0.63, R2 train: 0.17
MAE val: 0.62, MSE val: 0.77, R2 val: 0.14
MAE test: 1.63, MSE test: 2.77, R2 test: -138.10
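
The MAE, MSE and R2 values in this log follow the usual regression definitions. The sketch below is pure Python and uses a hypothetical helper name, `regression_metrics`; it is not the function used in the repository. It shows how R2 can become strongly negative when the targets span a narrow range and the predictions are offset, which is one possible reading of the test row above (an assumption, since the test targets are not shown here).

```python
import math
from typing import Sequence, Tuple


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (MAE, MSE, R2) for two equally long sequences.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R2 is undefined when all targets are identical.
    r2 = math.nan if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return mae, mse, r2


if __name__ == "__main__":
    # Narrow target range, constant offset of 1.0 in the predictions:
    # the absolute errors are modest, yet R2 is far below zero.
    print(regression_metrics([1.0, 1.1, 0.9], [2.0, 2.1, 1.9]))
```

Because R2 divides the residual sum of squares by the spread of the targets, a small test set with little variance can yield a large negative value even when MAE and MSE look moderate.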
Binary file not shown.
Binary file not shown.
Binary file not shown.
10 changes: 6 additions & 4 deletions docs/source/conf.py
@@ -6,8 +6,8 @@
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "stk_search"
project_copyright = "2024, Mohammed Azzouzi"
project = "stk-search"
project_copyright = "2023, Mohammed Azzouzi"
author = "Mohammed Azzouzi"

# -- General configuration ---------------------------------------------------
@@ -20,15 +20,17 @@
"sphinx.ext.intersphinx",
"sphinx.ext.viewcode",
"sphinx_copybutton",
"sphinx.ext.autodoc",
]

autosummary_imported_members = True

autodoc_typehints = "description"
autodoc_member_order = "groupwise"
autoclass_content = "class"

autodoc_type_aliases = {
"Properties": "dict[str, Json]",
"Json": "Json",
}

intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
95 changes: 95 additions & 0 deletions docs/source/index copy.md
@@ -0,0 +1,95 @@
Welcome to stk_search's documentation!
===================================

skt_search is a Python library for searching the chemical space of molecules formed by stk. It is built on top of stk and stk_optim.
<p align="center">
<img src="../../overview.svg" alt="Overview Image" />
</p>

.. toctree::
:maxdepth: 2
:caption: Contents:

Calculators <_autosummary/stk_search.Calculators>
utils <_autosummary/stk_search.utils>
Search_algorithm <_autosummary/stk_search.Search_algorithm>
Representation <_autosummary/stk_search.Representation>
SearchAlgorithm <_autosummary/stk_search.Search_algorithm>
SearchSpace <_autosummary/stk_search.SearchSpace>
Modules <modules>


.. tip::

⭐ Star us on GitHub! ⭐

GitHub: https://GitHub.com/mohammedazzouzi15/stk_search

## Overview

`stk_search` is a Python package for searching the chemical space of molecules formed by `stk`. It is built on top of `stk` and `stko`. For more details on the use of the package, please refer to the corresponding publication as well as the documentation associated with it.

We use [stk](https://github.com/lukasturcani/stk) and [stko](https://github.com/JelfsMaterialsGroup/stko) for building and calculating the properties of the molecules.

We use [BoTorch](https://botorch.org/) for the implementation of the Bayesian optimization.

For the implementation of the geometric modeling on 3D structure, we use the implementation of models in [GEOM3D](https://github.com/chao1224/Geom3D).

## Installation

To install the package, follow these steps:

1. **Open a terminal and change to the directory where the `pyproject.toml` file is located.**
```bash
cd path/to/directory
```

2. **Create a new conda environment**
```bash
conda create -n stk_search python=3.8
```

3. **Activate the environment**
```bash
conda activate stk_search
```

4. **Run the following command to install the package:**
In some cases, you may need to install `gcc` before installing the package.
```bash
pip install -e .
```

5. **Install additional packages to use the GNN model:**

**For GPU:**
```bash
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
# Make sure the torch version is the right one
```

**For CPU:**
```bash
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cpu.html
## Usage
Refer to the example notebooks where we show a step-by-step use of the package to search a space of oligomers formed of 6 building blocks.
1. **Notebook 0: Generate Building Blocks**
- Shows how to go from a list of SMILES to generate a list of building blocks.
- Introduces a way to run calculations using [`xtb`](command:_github.copilot.openSymbolFromReferences?%5B%22%22%2C%5B%7B%22uri%22%3A%7B%22scheme%22%3A%22file%22%2C%22authority%22%3A%22%22%2C%22path%22%3A%22%2Fc%3A%2FUsers%2Fma11115%2FOneDrive%20-%20Imperial%20College%20London%2Fgithub_folder%2FSTK_search%2FREADME.md%22%2C%22query%22%3A%22%22%2C%22fragment%22%3A%22%22%7D%2C%22pos%22%3A%7B%22line%22%3A55%2C%22character%22%3A74%7D%7D%5D%2C%220faed8b2-e29e-4f60-b965-c22999e98b01%22%5D "Go to definition") and [`xtb_stda`](command:_github.copilot.openSymbolFromReferences?%5B%22%22%2C%5B%7B%22uri%22%3A%7B%22scheme%22%3A%22file%22%2C%22authority%22%3A%22%22%2C%22path%22%3A%22%2Fc%3A%2FUsers%2Fma11115%2FOneDrive%20-%20Imperial%20College%20London%2Fgithub_folder%2FSTK_search%2FREADME.md%22%2C%22query%22%3A%22%22%2C%22fragment%22%3A%22%22%7D%2C%22pos%22%3A%7B%22line%22%3A55%2C%22character%22%3A82%7D%7D%5D%2C%220faed8b2-e29e-4f60-b965-c22999e98b01%22%5D "Go to definition") to get the properties of the building blocks and save them in a database.
- Demonstrates how to generate a dataframe with the necessary data to form a representation of the constructed molecules for Bayesian optimization.
2. **Notebook 1: Define Search Space**
- Shows how to define the search space and generate a search space pickle that can be loaded later to run the search algorithm.
3. **Notebook 2: Run Search Algorithm**
- Shows how to run the search algorithm on the search space using different search algorithms: BO, EA, SUEA.
4. **Notebook 3: Representation Learning**
- Shows how to run a representation learning using a 3D geometry-based GNN.
## Contact
For questions, please contact `[email protected]'
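
Step 5 of the installation instructions in this file asks the reader to make sure the torch version in the wheel URL is the right one. As a rough sketch, the URL can be derived from the version string that `torch.__version__` reports. The helper name `pyg_wheel_index` is hypothetical, and the fallback to `cpu` when the version has no local tag is an assumption; whether a wheel index actually exists for a given torch version is not guaranteed.

```python
def pyg_wheel_index(torch_version: str) -> str:
    """Build the PyG wheel index URL for a torch version string.

    Hypothetical helper. Accepts strings such as "2.3.0+cu121" or "2.3.0".
    """
    # Split "2.3.0+cu121" into the base version and the local build tag.
    base, _, local = torch_version.partition("+")
    # Assumption: a version without a local tag is treated as a CPU build.
    local = local or "cpu"
    return f"https://data.pyg.org/whl/torch-{base}+{local}.html"


if __name__ == "__main__":
    print(pyg_wheel_index("2.3.0+cu121"))
```

The installed version can be printed with `python -c "import torch; print(torch.__version__)"` and passed to the helper; the result is then used as the argument of `-f` in the `pip install` command.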
88 changes: 59 additions & 29 deletions docs/source/index.rst
@@ -2,6 +2,9 @@ Welcome to stk_search's documentation!
===================================

skt_search is a Python library for searching the chemical space of molecules formed by stk. It is built on top of stk and stk_optim.
<p align="center">
<img src="../../overview.svg" alt="Overview Image" />
</p>

.. toctree::
:maxdepth: 2
@@ -22,44 +25,71 @@ skt_search is a Python library for searching the chemical space of molecules for

GitHub: https://GitHub.com/mohammedazzouzi15/stk_search

Installation
============
## Overview

To install the package, follow these steps:
`stk_search` is a Python package for searching the chemical space of molecules formed by `stk`. It is built on top of `stk` and `stko`. For more details on the use of the package, please refer to the corresponding publication as well as the documentation associated with it.

We use [stk](https://github.com/lukasturcani/stk) and [stko](https://github.com/JelfsMaterialsGroup/stko) for building and calculating the properties of the molecules.

We use [BoTorch](https://botorch.org/) for the implementation of the Bayesian optimization.

For the implementation of the geometric modeling on 3D structure, we use the implementation of models in [GEOM3D](https://github.com/chao1224/Geom3D).

## Installation

To install the package, follow these steps:

1. **Open a terminal and change to the directory where the `pyproject.toml` file is located.**
cd path/to/directory
2. create a new conda environment
conda create -n stk_search python=3.8
3. activate the environment
conda activate stk_search
4. Run the following command to install the package:
pip install -e .
5. install additional package to use the GNN model:
for GPU:
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
Make sure the torch version is the right one
for CPU:
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cpu.html
```bash
cd path/to/directory
```

2. **Create a new conda environment**
```bash
conda create -n stk_search python=3.8
```

3. **Activate the environment**
```bash
conda activate stk_search
```

4. **Run the following command to install the package:**
In some cases, you may need to install `gcc` before installing the package.
```bash
pip install -e .
```

5. **Install additional packages to use the GNN model:**

**For GPU:**
```bash
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
# Make sure the torch version is the right one
```

**For CPU:**
```bash
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.3.0+cpu.html
## Usage
Usage
============
for the usage of the package, please refer to the example notebooks in the notebooks folder.
Refer to the example notebooks where we show a step-by-step use of the package to search a space of oligomers formed of 6 building blocks.
Contributing
============
[Provide guidelines for contributing to the project]
1. **Notebook 0: Generate Building Blocks**
- Shows how to go from a list of SMILES to generate a list of building blocks.
- Introduces a way to run calculations using [`xtb`](command:_github.copilot.openSymbolFromReferences?%5B%22%22%2C%5B%7B%22uri%22%3A%7B%22scheme%22%3A%22file%22%2C%22authority%22%3A%22%22%2C%22path%22%3A%22%2Fc%3A%2FUsers%2Fma11115%2FOneDrive%20-%20Imperial%20College%20London%2Fgithub_folder%2FSTK_search%2FREADME.md%22%2C%22query%22%3A%22%22%2C%22fragment%22%3A%22%22%7D%2C%22pos%22%3A%7B%22line%22%3A55%2C%22character%22%3A74%7D%7D%5D%2C%220faed8b2-e29e-4f60-b965-c22999e98b01%22%5D "Go to definition") and [`xtb_stda`](command:_github.copilot.openSymbolFromReferences?%5B%22%22%2C%5B%7B%22uri%22%3A%7B%22scheme%22%3A%22file%22%2C%22authority%22%3A%22%22%2C%22path%22%3A%22%2Fc%3A%2FUsers%2Fma11115%2FOneDrive%20-%20Imperial%20College%20London%2Fgithub_folder%2FSTK_search%2FREADME.md%22%2C%22query%22%3A%22%22%2C%22fragment%22%3A%22%22%7D%2C%22pos%22%3A%7B%22line%22%3A55%2C%22character%22%3A82%7D%7D%5D%2C%220faed8b2-e29e-4f60-b965-c22999e98b01%22%5D "Go to definition") to get the properties of the building blocks and save them in a database.
- Demonstrates how to generate a dataframe with the necessary data to form a representation of the constructed molecules for Bayesian optimization.
License
============
[Specify the license under which the package is distributed]
2. **Notebook 1: Define Search Space**
- Shows how to define the search space and generate a search space pickle that can be loaded later to run the search algorithm.
Contact
============
3. **Notebook 2: Run Search Algorithm**
- Shows how to run the search algorithm on the search space using different search algorithms: BO, EA, SUEA.
[Provide contact information or links to relevant resources]
4. **Notebook 3: Representation Learning**
- Shows how to run a representation learning using a 3D geometry-based GNN.
```
## Contact
For questions, please contact `[email protected]'
6 changes: 4 additions & 2 deletions src/dev_scripts/get_frag_encoding.py
@@ -9,17 +9,19 @@
to run it from the command line:
python get_frag_encoding.py --config_dir config_dir.
"""



import os
import Path
from pathlib import Path
import pathlib import Path

import numpy as np
import pandas as pd
import Path
import torch

from stk_search.geom3d import (
dataloader,
oligomer_encoding_with_transformer,
58 changes: 57 additions & 1 deletion src/dev_scripts/run_representation_learning_polymer.py
@@ -1,3 +1,51 @@
"""Trains a polymer representation learning model using a given configuration file.
Functions:
----------
- main(config_dir):
Train the model using the given configuration.
config_dir (str): The path to the directory containing the configuration file.
- load_model(config_dir, config):
Load the best model from the checkpoint.
config_dir (str): The path to the directory containing the configuration file.
config (dict): Configuration dictionary.
Returns
-------
pymodel: Loaded model.
- evaluate_pymodel(data, pymodel, device):
Evaluate the model on a given batch of data.
data: Batch of data.
pymodel: The model to evaluate.
device: Device to run the evaluation on.
Returns
-------
z, z_opt, data.y: Predictions and ground truth.
- evaluate_model_prediction(loader, pymodel, config_dir, name_df="train"):
Evaluate the model predictions on a given data loader.
loader: Data loader.
pymodel: The model to evaluate.
config_dir (str): The path to the directory containing the configuration file.
name_df (str): Name of the dataframe to save predictions.
Returns
-------
df_original: DataFrame containing predictions and ground truth.
- evaluale_model_performance(df_pred):
Evaluate the performance of the model.
df_pred: DataFrame containing predictions and ground truth.
Returns
-------
mae, mse, r2: Mean Absolute Error, Mean Squared Error, and R2 score.
"""

import os

import pandas as pd
@@ -11,9 +59,17 @@


def main(config_dir):
"""Train the model using the given configuration.
Args:
----
config_dir (str): The path to the directory containing the
configuration file.
"""
config = read_config(config_dir)
bbs_dict = polymer_GNN_architecture_utils.get_bbs_dict(
config["pymongo_client"] , config["precursor_database_name"]
config["pymongo_client"], config["precursor_database_name"]
)

(
3 changes: 2 additions & 1 deletion src/stk_search/Calculators/STDA_calculator.py
@@ -88,6 +88,7 @@ def __init__(
self.num_threads = num_threads
self._output_dir = output_dir
self.maxev_excitedenergy = maxev_excitedenergy
self.XTB4STDAHOME = "/media/mohammed/Work/xtb4stda_home"

def calculate(self, mol):
"""Calculate the excited state properties.
@@ -118,7 +119,7 @@ def calculate(self, mol):
env = os.environ.copy()
env["OMP_NUM_THREADS"] = str(self.num_threads)
env["MKL_NUM_THREADS"] = str(self.num_threads)
env["XTB4STDAHOME"] = "/media/mohammed/Work/bin/xtb4stda_home"
env["XTB4STDAHOME"] = self.XTB4STDAHOME
command = [self.stda_bin_path + "xtb4stda", xyz]
with Path("gen_wfn.out").open("w", encoding="utf-8") as f:
sp.run( # noqa: S603
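
The hunks above move the `XTB4STDAHOME` path from `calculate` into an attribute set in `__init__`, but the value is still a machine-specific string. One possible alternative, sketched here under assumptions, is to take the directory from an explicit argument or from the caller's environment. The helper names `resolve_xtb4stda_home` and `build_env` are hypothetical and do not exist in `stk_search`.

```python
import os
from typing import Mapping, Optional


def resolve_xtb4stda_home(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick the xtb4stda home directory.

    Precedence: an explicit argument, then the XTB4STDAHOME variable of
    the calling environment. There is no machine-specific default.
    """
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    if "XTB4STDAHOME" in environ:
        return environ["XTB4STDAHOME"]
    raise RuntimeError(
        "XTB4STDAHOME is not set; pass the directory explicitly"
    )


def build_env(
    num_threads: int,
    xtb4stda_home: str,
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Copy the environment and set the variables used in the hunk above."""
    env = dict(os.environ if environ is None else environ)
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["MKL_NUM_THREADS"] = str(num_threads)
    env["XTB4STDAHOME"] = xtb4stda_home
    return env
```

With helpers like these, the calculator could accept an optional constructor argument and fail early with a clear message when the directory is not configured, rather than pointing at a path that exists on one machine only.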