- EDA of logP logp.json dataset
- EDA of logP physprop dataset
- General usage of standardization scripts
- EDA of logP OChem dataset
- EDA of logP Diverse dataset
- EDA of logP NCI dataset
- Standardization of SMILES in datasets and merging of all logP data sources
- EDA of merged logP with pH and Temperature dataset
- EDA of logD benchmark Lipophilicity dataset
- EDA of logD logD7.4 dataset
- EDA of logD OChem dataset
- Standardization of SMILES in datasets and merging of all logD data sources
- Analysis of logP+pH dataset
- Analysis of logP dataset with averaging of duplicated values (pH and Temperature were dropped)
- Analysis of logP dataset with averaging of duplicated values (pH and Temperature were not known)
- Split logp_mean, logP_pH_range_mean, logP_wo_parameters and logD_pH datasets into train/val/test
- Split logP dataset without averaging of duplicated values and logD (only Lipophilicity source) datasets
- Algorithm for defining symmetric and asymmetric molecules
- Split dataset with ZINC molecules
- Selecting molecules with specific properties to test model
- Calculate the percentage of atoms in more than one ring in the logP dataset
- Creation of MultiTask datasets and their splits
- EDA of benchmark ESOL and FreeSolv datasets
- Standardization, merging and splitting of benchmark ESOL and FreeSolv datasets
- Split of final logp_wo_logp_json_wo_averaging and logd_Lip_wo_averaging datasets
- Count the percentage of molecules that were strongly merged (molecule length decreased significantly)
- Count the unique representations of molecules and get the most common ones
- Analyzing the best and the worst predictions
- Testing substructures extraction
- Comparison of StructGNN and D-MPNN predictions
- Merging train and validation datasets for cross-validation
- Analyzing the best and the worst predictions
- Analysis of additional RDKit features in model
- RDKit features + XGBoost and RDKit features + MLPRegressor models
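
Several of the notebooks above average duplicated measurements and then split the result into train/val/test. The notebooks' own implementation is not reproduced here; the following is only a minimal sketch of that idea using the Python standard library. The function names, the toy records, and the split fractions are all hypothetical and chosen for illustration, and a plain seeded random split is assumed.

```python
import random
from collections import defaultdict
from statistics import mean


def average_duplicates(records):
    """Collapse repeated SMILES into one entry holding the mean logP."""
    grouped = defaultdict(list)
    for smiles, logp in records:
        grouped[smiles].append(logp)
    return {smiles: mean(values) for smiles, values in grouped.items()}


def split_train_val_test(keys, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle keys reproducibly and cut them into train/val/test lists."""
    keys = sorted(keys)  # fixed starting order, so the seed fully determines the split
    random.Random(seed).shuffle(keys)
    n_val = int(len(keys) * val_frac)
    n_test = int(len(keys) * test_frac)
    val = keys[:n_val]
    test = keys[n_val:n_val + n_test]
    train = keys[n_val + n_test:]
    return train, val, test


# Hypothetical toy records (illustrative values only); "CCO" appears twice.
records = [
    ("CCO", -0.30),
    ("CCO", -0.32),
    ("c1ccccc1", 2.13),
    ("CC(=O)O", -0.17),
    ("CCC", 2.36),
]
averaged = average_duplicates(records)
train, val, test = split_train_val_test(averaged, val_frac=0.25, test_frac=0.25)
```

Averaging before splitting keeps the same molecule from landing in more than one subset, which is one reason to deduplicate first. This sketch matches duplicates by exact SMILES string, so it assumes the SMILES have already been standardized.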