Table of Contents

Data EDA and Preprocessing

  1. EDA of logP logp.json dataset
  2. EDA of logP physprop dataset
  3. General usage of standardization scripts
  4. EDA of logP OChem dataset
  5. EDA of logP Diverse dataset
  6. EDA of logP NCI dataset
  7. Standardization of SMILES in datasets and merging of all logP data sources
  8. EDA of merged logP dataset with pH and temperature
  9. EDA of logD benchmark Lipophilicity dataset
  10. EDA of logD logD7.4 dataset
  11. EDA of logD OChem dataset
  12. Standardization of SMILES in datasets and merging of all logD data sources
  13. Analysis of logP+pH dataset
  14. Analysis of logP dataset with averaging of duplicate values (pH and temperature were dropped)
  15. Analysis of logP dataset with averaging of duplicate values (pH and temperature were not known)
  16. Split of logp_mean, logP_pH_range_mean, logP_wo_parameters and logD_pH datasets into train/val/test
  17. Split of logP dataset without averaging of duplicate values and logD dataset (Lipophilicity source only)
  18. Algorithm for identifying symmetric and asymmetric molecules
  19. Split dataset with ZINC molecules
  20. Selecting molecules with specific properties to test the model
  21. Calculation of the percentage of atoms belonging to more than one ring in the logP dataset
  22. Creation of MultiTask datasets and their splits
  23. EDA of benchmark ESOL and FreeSolv datasets
  24. Standardization, merging and splitting of benchmark ESOL and FreeSolv datasets
  25. Split of final logp_wo_logp_json_wo_averaging and logd_Lip_wo_averaging datasets

Analysis of molecules with hyper-atoms

  1. Count the percentage of molecules that were strongly merged (molecule length decreased significantly)
  2. Count the unique molecule representations and find the most common ones

Model notebooks

  1. Analyzing the best and the worst predictions
  2. Testing substructures extraction
  3. Comparison of StructGNN and D-MPNN predictions

StructGNN and D-MPNN

  1. Merging train and validation datasets for cross-validation
  2. Analyzing the best and the worst predictions
  3. Analysis of additional RDKit features in the model
  4. RDKit features + XGBoost and RDKit features + MLPRegressor models

Count Morgan Fingerprint

  1. Morgan Fingerprint + FFNN model and hyperparameter optimization
  2. Analyzing the best and the worst predictions
  3. Morgan Fingerprint + FFNN model with cross-validation

OTGNN

  1. Prepare data for training
  2. Analyzing the best and the worst predictions

JtVAE

  1. Get a list of all SMILES in the data
  2. Get SMILES of molecules containing substructures that are not present in the JTree vocabulary
  3. DRAFT: notebook training the JtVAE encoder part + FFNN
  4. Feature importance of fingerprints extracted by the pretrained JtVAE encoder