This repository contains the official implementation of A User-Tunable Machine Learning Framework for Step-Wise Synthesis Planning, available on arXiv.
We introduce MHNpath, a machine learning-driven retrosynthetic tool designed for computer-aided synthesis planning. Leveraging modern Hopfield networks and novel comparative metrics, MHNpath efficiently prioritizes reaction templates, improving the scalability and accuracy of retrosynthetic predictions. The tool incorporates a tunable scoring system that allows users to prioritize pathways based on cost, reaction temperature, and toxicity, thereby facilitating the design of greener and cost-effective reaction routes. We demonstrate its effectiveness through case studies involving complex molecules from ChemByDesign, showcasing its ability to predict novel synthetic and enzymatic pathways. Furthermore, we benchmark MHNpath against existing frameworks, replicating experimentally validated "gold-standard" pathways from PaRoutes. Our case studies reveal that the tool can generate shorter, cheaper, moderate-temperature routes employing green solvents, as exemplified by compounds such as dronabinol, arformoterol, and lupinine.
- Setup Environment
Make the project directory your current working directory (this is important):
conda create -n "mhnpath" python=3.8
conda activate mhnpath
pip install -r requirements.txt
To use the pricing feature, obtain API keys from one or more of Mcule, Molport, and Chemspace, and add them to the config.yaml file. We highly recommend doing this for the most accurate results.
- Download Data and Models
Go to Figshare Dataset, click on Download All, and save the zip file with the default name 28673540.zip.
- Extract and Organize Files
Run the following command to unzip and move the data/models to the required locations:
python extract.py
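If extract.py fails, one likely cause is an incomplete or misnamed download. The sketch below is a hypothetical pre-flight check and not part of the repository: it only confirms that 28673540.zip exists in the current directory and is a readable zip archive. The function name check_archive is an assumption made for illustration, and the check does not reproduce anything extract.py itself does.

```python
import zipfile
from pathlib import Path


def check_archive(path="28673540.zip"):
    """Return the archive's member names, or raise if it is missing or corrupt.

    Hypothetical helper for illustration; not part of the MHNpath code base.
    """
    archive = Path(path)
    if not archive.is_file():
        raise FileNotFoundError(f"{archive} not found; download it from Figshare first")
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a valid zip file; the download may be incomplete")
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns the name of the first corrupt member, or None
        bad_member = zf.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupt member in archive: {bad_member}")
        return zf.namelist()


if __name__ == "__main__":
    names = check_archive()
    print(f"{len(names)} entries found, for example: {names[:5]}")
```

Run it from the project directory, since that is where the setup step above expects you to be working.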
- Inference
To perform inference, run:
python tree_search_global_greedy.py -product "compound" -n_enz 5 -n_syn 5 -max_depth 5 -json_pathway "tree.json" -device "cuda"
Parameters:
  - product: SMILES string of the target product. (Required)
  - n_enz: Number of enzyme reaction rules to consider. (Optional, default: 3)
  - n_syn: Number of synthetic reaction rules to consider. (Optional, default: 3)
  - max_depth: Maximum depth for the tree search. (Optional, default: 3)
  - json_pathway: Filename for saving the resulting pathway tree in JSON format. (Optional, default: "tree.json")
  - device: Device to run the model on; either "cpu" or "cuda". (Optional, default: "cpu")
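SMILES strings often contain characters such as parentheses, `=`, and `#` that a shell may interpret, so it can be convenient to launch the search from Python with an argument list instead of a quoted command line. The following is a minimal sketch under that assumption. The helper build_inference_command is hypothetical and not part of the repository; the flag names and defaults are taken from the parameter list above, and "CCO" (ethanol) is only a placeholder target.

```python
import subprocess
import sys


def build_inference_command(product, n_enz=3, n_syn=3, max_depth=3,
                            json_pathway="tree.json", device="cpu"):
    """Assemble the argument list for tree_search_global_greedy.py.

    Hypothetical helper; defaults mirror those documented in the README.
    """
    if device not in ("cpu", "cuda"):
        raise ValueError("device must be 'cpu' or 'cuda'")
    return [
        sys.executable, "tree_search_global_greedy.py",
        "-product", product,
        "-n_enz", str(n_enz),
        "-n_syn", str(n_syn),
        "-max_depth", str(max_depth),
        "-json_pathway", json_pathway,
        "-device", device,
    ]


if __name__ == "__main__":
    # "CCO" is a placeholder SMILES string; replace it with your target product.
    cmd = build_inference_command("CCO", n_enz=5, n_syn=5, max_depth=5)
    # Passing a list (no shell) means the SMILES string needs no extra quoting.
    subprocess.run(cmd, check=True)
```

Run this from the project directory so that the script path resolves.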
- Training
To train using the same hyperparameters as in our experiments, run the following commands:
python mhnreact/train.py --concat_rand_template_thresh 3 --exp_name enz_final --ssretroeval True --csv_path data/enz_mhn_shuffled.csv --save_model True --seed 0 --epoch 11 --dropout 0.01 --lr 1e-4 --hopf_beta 0.035 --hopf_association_activation 'Tanh' --norm_input False --temp_encoder_layers 2 --batch_size 32 > enz_final.txt
python mhnreact/train.py --concat_rand_template_thresh 3 --exp_name syn1_final --ssretroeval True --csv_path data/syn_mhn_split_1.csv --save_model True --seed 0 --epoch 11 --dropout 0.01 --lr 1e-4 --hopf_beta 0.035 --hopf_association_activation 'Tanh' --norm_input False --temp_encoder_layers 2 --batch_size 32 > syn1_final.txt
python mhnreact/train.py --concat_rand_template_thresh 3 --exp_name syn2_final --ssretroeval True --csv_path data/syn_mhn_split_2.csv --save_model True --seed 0 --epoch 11 --dropout 0.01 --lr 1e-4 --hopf_beta 0.035 --hopf_association_activation 'Tanh' --norm_input False --temp_encoder_layers 2 --batch_size 32 > syn2_final.txt
python mhnreact/train.py --concat_rand_template_thresh 3 --exp_name syn3_final --ssretroeval True --csv_path data/syn_mhn_split_3.csv --save_model True --seed 0 --epoch 11 --dropout 0.01 --lr 1e-4 --hopf_beta 0.035 --hopf_association_activation 'Tanh' --norm_input False --temp_encoder_layers 2 --batch_size 32 > syn3_final.txt
python mhnreact/train.py --concat_rand_template_thresh 3 --exp_name syn4_final --ssretroeval True --csv_path data/syn_mhn_split_4.csv --save_model True --seed 0 --epoch 11 --dropout 0.01 --lr 1e-4 --hopf_beta 0.035 --hopf_association_activation 'Tanh' --norm_input False --temp_encoder_layers 2 --batch_size 32 > syn4_final.txt
python mhnreact/train.py --concat_rand_template_thresh 3 --exp_name syn5_final --ssretroeval True --csv_path data/syn_mhn_split_5.csv --save_model True --seed 0 --epoch 11 --dropout 0.01 --lr 1e-4 --hopf_beta 0.035 --hopf_association_activation 'Tanh' --norm_input False --temp_encoder_layers 2 --batch_size 32 > syn5_final.txt
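The six commands above differ only in the experiment name, the CSV path, and the log file. As a convenience, they can be generated from a single template, which makes it harder for one run to drift from the others when a hyperparameter is edited. This is a sketch and not part of the repository; the names COMMON_FLAGS, RUNS, and training_commands are assumptions introduced here.

```python
# Flags shared by every run, copied from the commands above.
COMMON_FLAGS = (
    "--concat_rand_template_thresh 3 --exp_name {name} --ssretroeval True "
    "--csv_path {csv} --save_model True --seed 0 --epoch 11 --dropout 0.01 "
    "--lr 1e-4 --hopf_beta 0.035 --hopf_association_activation 'Tanh' "
    "--norm_input False --temp_encoder_layers 2 --batch_size 32"
)

# One enzymatic run plus the five synthetic splits.
RUNS = [("enz_final", "data/enz_mhn_shuffled.csv")] + [
    (f"syn{i}_final", f"data/syn_mhn_split_{i}.csv") for i in range(1, 6)
]


def training_commands():
    """Return the six shell commands, each redirecting output to <exp_name>.txt."""
    return [
        f"python mhnreact/train.py {COMMON_FLAGS.format(name=name, csv=csv)} > {name}.txt"
        for name, csv in RUNS
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or piped into a shell.
    for command in training_commands():
        print(command)
```

The script only prints the commands; it does not start training itself.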
This code base builds on the following repositories; we thank their authors for maintaining them:
If you find MHNpath helpful, please consider citing:
@misc{prakash2025usertunablemachinelearningframework,
title={A User-Tunable Machine Learning Framework for Step-Wise Synthesis Planning},
author={Shivesh Prakash and Hans-Arno Jacobsen and Viki Kumar Prasad},
year={2025},
eprint={2504.02191},
archivePrefix={arXiv},
primaryClass={cs.CE},
url={https://arxiv.org/abs/2504.02191},
}