EquiScore is a generic protein-ligand interaction scoring method integrating physical prior knowledge with data augmentation modeling
Implementation of EquiScore, by Duanhua Cao 😊.
Some bugs (🐛 🐛) have been fixed, and bash commands are now provided to help users unfamiliar with Python quickly use EquiScore for virtual screening 😃
This repository contains all the code, instructions, and model weights needed to screen compounds with EquiScore, evaluate EquiScore, or retrain a new model.
If you have any questions, feel free to open an issue or reach out to us: [email protected] ✉️.
To train one of our models with the PDBscreen data:
- Download the preprocessed PDBscreen data from Zenodo.
- Uncompress the archive with the tar command and place it in data, so that you have the path /EquiScore/data/training_data/PDBscreen.
- See the retraining EquiScore part for details.
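The extraction step above can also be done from Python. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the downloaded archive contains a top-level PDBscreen directory.

```python
import tarfile
from pathlib import Path


def extract_training_data(archive_path, repo_root):
    """Extract a tar archive into <repo_root>/data/training_data and return that directory.

    Assumption: the archive holds a top-level "PDBscreen" directory.
    """
    target = Path(repo_root) / "data" / "training_data"
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:  # compression type is auto-detected
        tar.extractall(target)
    return target


def has_pdbscreen(repo_root):
    """Return True if the PDBscreen directory is where the instructions expect it."""
    return (Path(repo_root) / "data" / "training_data" / "PDBscreen").is_dir()
```

Calling `has_pdbscreen("path/to/EquiScore")` after extraction is a quick way to confirm the layout before starting a training run.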
We recommend setting up the environment using Anaconda.
Clone the current repo
git clone [email protected]:CAODH/EquiScore.git
The following is an example of how to set up a working conda environment to run the code (make sure to use the PyTorch, DGL, and CUDA versions, or CPU-only builds, that match your system):
conda create --name EquiScore python=3.8
conda activate EquiScore
Through our testing, the relevant environment can be successfully installed by executing the following commands in sequence:
conda install pytorch==1.11.0 cudatoolkit=11.3 -c pytorch
conda install -c dglteam dgl-cuda11.1
conda install -c conda-forge rdkit
conda install -c conda-forge biopython
conda install -c conda-forge scikit-learn
conda install -c conda-forge prolif
pip install prefetch-generator
pip install lmdb
pip install numpy==1.22.3
Alternatively, you can download the conda-packed environment file from Zenodo and unpack it into ${anaconda install dir}/anaconda3/envs/EquiScore, where ${anaconda install dir} is the directory in which Anaconda is installed. For me, ${anaconda install dir}=/root.
mkdir ${anaconda install dir}/anaconda3/envs/EquiScore
tar -xzvf EquiScore.tar.gz -C ${anaconda install dir}/anaconda3/envs/EquiScore
conda activate EquiScore
After entering the EquiScore environment, run:
conda-unpack
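Whichever installation route you take, it can help to confirm that the packages are visible from inside the activated environment. This is a small sketch using only the standard library; the helper name is hypothetical, and the module list simply mirrors the install commands above (note that some import names differ from the package names).

```python
import importlib.util

# Import names for the packages installed above. Some differ from the
# conda/pip package names (biopython -> Bio, scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "torch", "dgl", "rdkit", "Bio", "sklearn",
    "prolif", "prefetch_generator", "lmdb", "numpy",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```

This only checks that the modules can be located; it does not verify versions or that the CUDA build matches your driver.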
We implemented a Python script, Screening.py, to help anyone who wants to screen hits from a compound library.
We provide a toy example under the ./data/sample_data folder for illustration.
- Dock the compounds to the target protein to obtain docking poses. EquiScore is robust to the pose source, so you can use any method you are familiar with to generate poses (Glide, Vina, Surflex, Gold, LeDock), or try a deep learning method.
- Assuming you have the docking results from the previous step, extract the pocket region and compound poses by running:
python ./get_pocket/get_pocket.py --docking_result ./data/sample_data/sample_compounds.sdf --recptor_pdb ./data/sample_data/sample_protein.pdb --single_sdf_save_path ./data/sample_data/tmp_sdfs --pocket_save_dir ./data/sample_data/tmp_pockets
Alternatively, use the bash script in the bash_scripts directory; you only need to replace the corresponding parameters:
cd ~/EquiScore/bash_scripts
bash Getpocket.sh
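For readers unfamiliar with the idea of a pocket region, the sketch below illustrates the general concept: keep the protein atoms that lie within some distance of the docked ligand. It is not the logic of get_pocket.py. The function name and the 8.0 Å default cutoff are illustrative assumptions, and real pocket extraction typically keeps whole residues rather than individual atoms.

```python
import math


def pocket_atom_indices(protein_coords, ligand_coords, cutoff=8.0):
    """Return indices of protein atoms within `cutoff` angstroms of any ligand atom.

    The default cutoff is an illustrative assumption, not a value from the repository.
    """
    return [
        i
        for i, atom in enumerate(protein_coords)
        if any(math.dist(atom, lig) <= cutoff for lig in ligand_coords)
    ]


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    ligand = [(1.0, 0.0, 0.0)]
    # The third atom is 19 angstroms from the ligand, so it is excluded.
    print(pocket_atom_indices(protein, ligand))
```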
- You now have all the data needed to predict protein-ligand interactions with EquiScore! Be patient. This is the last step!
python Screening.py --ngpu 1 --test --test_path ./data/sample_data/ --test_name tmp_pockets --pred_save_path ./data/test_results/EquiScore_pred_for_tmp_pockets.csv
Alternatively, use the bash script in the bash_scripts directory; you only need to replace the corresponding parameters:
cd ~/EquiScore/bash_scripts
bash Screening.sh
- At this point, all prediction results are saved in pred_save_path!
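Once the prediction file exists, a common next step is to rank compounds by score. The sketch below is an assumption-laden example: the README does not document the CSV layout, so the function (a hypothetical helper) takes the score column name as a parameter and assumes the file has a header row. Whether a higher score is better should be checked against the actual output before relying on the default.

```python
import csv


def top_hits(csv_path, score_column, n=10, descending=True):
    """Return the n best-scoring rows from a CSV file that has a header row.

    `score_column` must be the name of a numeric column in the file; the
    column names of the real prediction file are not assumed here.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    rows.sort(key=lambda row: float(row[score_column]), reverse=descending)
    return rows[:n]
```

Each returned row is a dictionary keyed by the header names, so the compound identifier can be read from whichever column holds it.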
Benchmark datasets contain many targets, each screened just like a single target, so we implemented a script to calculate the results across them.
- We provide preprocessed pockets on Zenodo (download pockets from Zenodo). If you want the raw datasets, please download them from the reference papers.
- Download the preprocessed dataset and extract it to ./data/external_test_data. For example, all DEKOIS2.0 pockets docked with Glide SP should be extracted into one directory such as ./data/external_test_data/dekois2_pocket.
- If you preprocess the data yourself to get pockets, every pocket file name must contain '_active' for an active ligand or '_decoy' for a decoy, and all pockets for one benchmark dataset must be in a single directory.
- Run the script (you can use nohup and output redirection as usual):
python Independent_test.py --test --test_path ./data/external_test_data --test_name dekois2_pocket --test_mode multi_pose
The results will be saved in ~/EquiScore/workdir/official_weight/.
Alternatively, use the bash script in the bash_scripts directory; you only need to replace the corresponding parameters:
cd ~/EquiScore/bash_scripts
bash Benchmark_test.sh
Use the multi_pose argument if a ligand has multiple poses, and set pose_num and idx_style accordingly; see --help for more details.
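If you prepare your own benchmark pockets, the '_active' / '_decoy' naming convention described above can be checked before running the benchmark. The following is a small sketch with hypothetical helper names; it only inspects file names and makes no assumption about the file format.

```python
from pathlib import Path


def label_from_name(path):
    """Return 1 for an active and 0 for a decoy, based on the file name alone."""
    name = Path(path).name
    is_active = "_active" in name
    is_decoy = "_decoy" in name
    if is_active == is_decoy:  # neither tag present, or both
        raise ValueError(f"Cannot infer label from file name: {name}")
    return 1 if is_active else 0


def summarize_benchmark_dir(directory):
    """Count actives and decoys in one benchmark directory and list unlabeled names."""
    counts = {"active": 0, "decoy": 0}
    unlabeled = []
    for entry in sorted(Path(directory).iterdir()):
        try:
            counts["active" if label_from_name(entry) else "decoy"] += 1
        except ValueError:
            unlabeled.append(entry.name)
    return counts, unlabeled
```

A non-empty `unlabeled` list points to files that should be renamed or removed from the benchmark directory.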
- Download the training dataset and extract the pocket data to ./data/training_data/PDBscreen. (You can also use your own private data, as long as it can be processed into a form EquiScore accepts.)
- Use UniProt IDs to deduplicate and split the data with ./data/data_splits/screen_model/data_split_for_training.py. This script deduplicates the dataset by UniProt ID, splits it into train/val data, and saves the data paths to pkl files (such as "train_keys.pkl", "val_keys.pkl", "test_keys.pkl").
- Run the Train.py script:
python Train.py --ngpu 1 --train_keys your_keys_path --val_keys your_keys_path --test_keys your_keys_path --test
Alternatively, use the bash script in the bash_scripts directory; you only need to replace the corresponding parameters:
cd ~/EquiScore/bash_scripts
bash Training.sh
(In the first round of training, the data is processed and saved, so it may be slower, depending on your hardware. To speed up training, follow the preprocessing workflow in dataset.py to save the data to an LMDB database, then pass the LMDB path to the training script by adding --lmdb_cache lmdb_cache_path in place of --test, as we do in the bash script.)
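For users preparing their own data, the UniProt-based split described above can be approximated as follows. This is one plausible reading, not the logic of data_split_for_training.py: the sketch groups samples by UniProt ID so that no protein appears in both the training and validation sets, then pickles each list. The function names, the (path, uniprot_id) input format, and the choice to store plain lists of paths are all assumptions.

```python
import pickle
import random
from collections import defaultdict


def split_by_uniprot(samples, val_fraction=0.1, seed=0):
    """Split (path, uniprot_id) pairs so that no UniProt ID is in both sets."""
    groups = defaultdict(list)
    for path, uniprot_id in samples:
        groups[uniprot_id].append(path)
    ids = sorted(groups)                 # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    val_ids = set(ids[:n_val])
    train = [p for i in ids if i not in val_ids for p in groups[i]]
    val = [p for i in ids if i in val_ids for p in groups[i]]
    return train, val


def save_keys(keys, out_path):
    """Write a list of sample paths to a pickle file."""
    with open(out_path, "wb") as handle:
        pickle.dump(keys, handle)
```

The resulting files could then be passed to Train.py through --train_keys and --val_keys, provided the key format matches what the repository's dataset code expects.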
Cao D, Chen G, Jiang J, et al. EquiScore: A generic protein-ligand interaction scoring method integrating physical prior knowledge with data augmentation modeling[J]. bioRxiv, 2023: 2023.06.18.545464. doi: https://doi.org/10.1101/2023.06.18.545464
MIT