Reproduction of benchmarks and inverse design case study using a docker image [as an example]. These calculations can also be run without the docker environment, but you need to edit the *.pbs files so that they work with the job management system on your PC/HPC.
Download this repo and unzip it.
Put Materials Project's new API key in "APIKEY.ini".
Edit "CPUs" in "slurm.conf" to set the number of CPU threads available to the docker container.
docker pull xiaohang07/slices:v9 # Download SLICES_docker with pre-installed SLICES and other relevant packages.
# Make entrypoint_set_cpus.sh executable
sudo chmod +x entrypoint_set_cpus.sh
# Replace "[]" with the absolute path of this repo's unzipped folder to set up the shared folder for the docker container.
docker run -it --privileged=true -h workq --gpus all --shm-size=0.1gb -v /[]:/crystal -w /crystal xiaohang07/slices:v9 /crystal/entrypoint_set_cpus.sh
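The path substitution in the command above can also be scripted. The following is a minimal sketch, not part of the repo: the helper name `docker_run_command` is hypothetical, and it assumes the unzipped repo folder is passed as `repo_dir`. The flags are copied from the command above.

```python
import os
import shlex


def docker_run_command(repo_dir, image="xiaohang07/slices:v9"):
    """Build the docker run argument list with repo_dir mounted at /crystal.

    Hypothetical helper: it only assembles the command shown in this README,
    filling in the absolute host path for the shared folder.
    """
    host_path = os.path.abspath(repo_dir)
    return [
        "docker", "run", "-it", "--privileged=true", "-h", "workq",
        "--gpus", "all", "--shm-size=0.1gb",
        "-v", f"{host_path}:/crystal",   # shared folder: host path -> /crystal
        "-w", "/crystal",
        image, "/crystal/entrypoint_set_cpus.sh",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the current directory.
    print(shlex.join(docker_run_command(".")))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without docker installed.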
Convert MP-20 dataset to json (cdvae/data/mp_20 at main · txie-93/cdvae. GitHub. https://github.com/txie-93/cdvae (accessed 2023-03-12))
cd /crystal/benchmark/Match_rate_MP-20/get_json/0_get_mp20_json
python 0_mp20.py
Rule out unsupported elements
cd /crystal/benchmark/Match_rate_MP-20/get_json/1_element_filter
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
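The "submit, watch qstat, then collect" pattern recurs in every step below, so it may be convenient to automate the waiting. The sketch below is an illustration only and not part of the repo: the function names are hypothetical, and it assumes the usual Torque/PBS table layout for `qstat` output (a dashed separator line followed by job rows whose second-to-last column is the status). Adjust the parsing if your `qstat` prints a different layout.

```python
import subprocess
import time


def all_jobs_completed(qstat_output):
    """Return True if every job row in the qstat output has status "C".

    Assumption: job rows follow a dashed separator line and the status is
    the second-to-last whitespace-separated field. Empty output (no jobs
    listed) is treated as finished.
    """
    statuses = []
    in_table = False
    for line in qstat_output.splitlines():
        if line.startswith("---"):
            in_table = True
            continue
        fields = line.split()
        if in_table and len(fields) >= 2:
            statuses.append(fields[-2])
    return all(status == "C" for status in statuses)


def wait_for_queue(poll_seconds=60):
    """Poll qstat until no listed job is still queued or running."""
    while True:
        result = subprocess.run(["qstat"], capture_output=True, text=True)
        if all_jobs_completed(result.stdout):
            return
        time.sleep(poll_seconds)
```

Calling `wait_for_queue()` after a submission script and before the matching collection script would replace the manual `qstat` checks.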
Convert to primitive cell
cd /crystal/benchmark/Match_rate_MP-20/get_json/2_primitive_cell_conversion
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Rule out crystals with low-dimensional units (e.g. molecular crystals or layered crystals)
cd /crystal/benchmark/Match_rate_MP-20/get_json/3_3d_filter
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Calculate the reconstruction rates of IAP-refined structures, ZL*-optimized structures, and rescaled structures under the strict and loose settings.
cd /crystal/benchmark/Match_rate_MP-20/matchcheck3
python 1_ini.py
# Running python 1_ini.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_grid_new.py
# After the computation has finished, run python 2_collect_grid_new.py to get "results_collection_matchcheck3.csv".
Calculate the reconstruction rates of IAP-refined structures, ZL*-optimized structures, IAP-refined rescaled structures, and rescaled structures under the strict and loose settings.
cd /crystal/benchmark/Match_rate_MP-20/matchcheck4
python 1_ini.py
# Running python 1_ini.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_grid_new.py
# After the computation has finished, run python 2_collect_grid_new.py to get "results_collection_matchcheck4.csv".
Reproduction of Table 1: the table below illustrates the correspondence between the data in "results_collection_matchcheck4.csv" and the match rates of SLI2Cry for the filtered MP-20 dataset (40,330 crystals) presented in Table 1.
Setting | Rescaled Structure | ZL*-Optimized Structure | IAP-Refined Structure | IAP-Refined Rescaled Structure |
---|---|---|---|---|
Strict | std_match_sum | opt_match_sum | opt2_match_sum | std2_match_sum |
Loose | std_match2_sum | opt_match2_sum | opt2_match2_sum | std2_match2_sum |
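The correspondence in the table above can be applied programmatically. The sketch below is illustrative only: the column names come from the table, but the CSV layout (one header row followed by one data row) is an assumption that should be checked against the actual "results_collection_matchcheck4.csv", and `table1_rows` is a hypothetical helper name.

```python
import csv
import io

# Column names taken from the table above, in the table's column order:
# Rescaled, ZL*-Optimized, IAP-Refined, IAP-Refined Rescaled.
TABLE1_COLUMNS = {
    "Strict": ["std_match_sum", "opt_match_sum", "opt2_match_sum", "std2_match_sum"],
    "Loose": ["std_match2_sum", "opt_match2_sum", "opt2_match2_sum", "std2_match2_sum"],
}


def table1_rows(csv_text):
    """Map CSV fields to the two rows of Table 1.

    Assumption: the CSV has a header row with the names above and a single
    data row; only that first data row is read.
    """
    row = next(csv.DictReader(io.StringIO(csv_text)))
    return {
        setting: [float(row[name]) for name in names]
        for setting, names in TABLE1_COLUMNS.items()
    }
```

To use it on the real file, pass the file contents, for example `table1_rows(open("results_collection_matchcheck4.csv").read())`.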
Reproduction of Table 2: the match rate of SLI2Cry for the MP-20 dataset (45,229 crystals) = opt2_match2_sum*40330/45229.
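The Table 2 formula scales the filtered-dataset value by the fraction of MP-20 that passed the filters (40,330 of 45,229 crystals). A minimal sketch of that arithmetic follows; the function name is hypothetical and the input in the usage line is a placeholder, not a result. The output carries the same unit as the input.

```python
FILTERED_MP20 = 40330  # crystals in the filtered MP-20 dataset
FULL_MP20 = 45229      # crystals in the full MP-20 dataset


def full_dataset_match_rate(opt2_match2_sum):
    """Apply the Table 2 formula: opt2_match2_sum * 40330 / 45229."""
    return opt2_match2_sum * FILTERED_MP20 / FULL_MP20


if __name__ == "__main__":
    # Placeholder input; substitute the opt2_match2_sum value from the CSV.
    print(full_dataset_match_rate(90.0))
```

Because 40330/45229 is below 1, the value for the full dataset is always lower than the filtered-dataset value.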
Download entries to build the filtered MP-21-40 dataset
cd /crystal/benchmark/Match_rate_MP-21-40/0_get_json_mp_api
python 0_mp21-40_dataset.py
!!! If “mp_api.client.core.client.MPRestError: REST query returned with error status code” occurs, update mp-api:
pip install -U mp-api
Rule out crystals with low-dimensional units (e.g. molecular crystals or layered crystals) in the general dataset
cd /crystal/benchmark/Match_rate_MP-21-40/0_get_json_mp_api/1_filter_prior_3d
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Calculate the reconstruction rates of IAP-refined structures, ZL*-optimized structures, and rescaled structures under the strict and loose settings.
cd /crystal/benchmark/Match_rate_MP-21-40/matchcheck3
python 1_ini.py
# Running python 1_ini.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_grid_new.py
# After the computation has finished, run python 2_collect_grid_new.py to collect the results.
Reproduction of Table S1: the table below illustrates the correspondence between the data in "results_collection_matchcheck3.csv" and the match rates of SLI2Cry for the filtered MP-21-40 dataset (23,560 crystals) presented in Table S1.
Setting | Filtered MP-21-40 |
---|---|
Strict | opt2_match_sum |
Loose | opt2_match2_sum |
Extract MOFs with 21-40 atoms per unit cell from the QMOF database to build the QMOF-21-40 dataset (Figshare: https://figshare.com/articles/dataset/QMOF_Database/13147324, Version 14)
cd /crystal/benchmark/Match_rate_QMOF-21-40/get_json/0_get_mof_json
python get_json.py
Rule out unsupported elements
cd /crystal/benchmark/Match_rate_QMOF-21-40/get_json/1_element_filter
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Convert to primitive cell
cd /crystal/benchmark/Match_rate_QMOF-21-40/get_json/2_primitive_cell_conversion
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Rule out crystals with low-dimensional units (e.g. molecular crystals or layered crystals)
cd /crystal/benchmark/Match_rate_QMOF-21-40/get_json/3_3d_filter
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Calculate the reconstruction rates of IAP-refined structures, ZL*-optimized structures, and rescaled structures under the strict and loose settings.
cd /crystal/benchmark/Match_rate_QMOF-21-40/matchcheck3
python 1_ini.py
# Running python 1_ini.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_grid_new.py
# After the computation has finished, run python 2_collect_grid_new.py to collect the results.
Reproduction of Table S1: the table below illustrates the correspondence between the data in "results_collection_matchcheck3.csv" and the match rates of SLI2Cry for the filtered QMOF-21-40 dataset (339 MOFs) presented in Table S1.
Setting | Filtered QMOF-21-40 |
---|---|
Strict | opt2_match_sum |
Loose | opt2_match2_sum |
Convert MP-20 dataset to json (cdvae/data/mp_20 at main · txie-93/cdvae. GitHub. https://github.com/txie-93/cdvae (accessed 2023-03-12))
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/0_get_json/0_get_mp20_json
python 0_mp20.py
Rule out unsupported elements
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/0_get_json/1_element_filter
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Convert to primitive cell
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/0_get_json/2_primitive_cell_conversion
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Rule out crystals with low-dimensional units (e.g. molecular crystals or layered crystals)
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/0_get_json/3_3d_filter
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Convert crystal structures in datasets to SLICES strings and conduct data augmentation
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/1_unconditioned_RNN/1_augmentation
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect.py
# After the computation has finished, run python 2_collect.py to collect the results.
Train unconditional RNN; sample 10000 SLICES strings
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/1_unconditioned_RNN/2_train_sample
sh 0_train_prior_model.sh
Modify ./workflow/2_sample_HTL_model_100x.py to define the number of SLICES to be sampled
sh 1_sample_in_parallel.sh
# Running sh 1_sample_in_parallel.sh only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_clean_glob_details.py
# After the computation has finished, run python 2_collect_clean_glob_details.py to collect the results.
Remove duplicate edges in SLICES strings to fix syntax errors
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/1_unconditioned_RNN/3_fix_syntax_check
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_clean_glob_details.py
# After the computation has finished, run python 2_collect_clean_glob_details.py to collect the results.
Reconstruct crystal structures from SLICES strings and calculate the number of reconstructed crystals (num_reconstructed)
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/1_unconditioned_RNN/4_inverse
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_clean_glob_details.py
# After the computation has finished, run python 2_collect_clean_glob_details.py to collect the results.
!!! To work around potential memory leaks in M3GNet, the Python script is restarted at regular intervals, with a batch size of 30.
python count.py #calculate the number of reconstructed crystals (num_reconstructed)
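The restart strategy mentioned in the note above can be pictured as follows. This is a generic sketch of the idea, not the repo's actual workflow code: `run_in_fresh_processes` and `worker_script` are hypothetical names, and passing the batch items as command-line arguments is an assumption made to keep the example short.

```python
import subprocess
import sys

BATCH_SIZE = 30  # batch size quoted in the note above


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of items with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_fresh_processes(items, worker_script):
    """Run worker_script once per batch in a new Python process.

    Each batch gets its own interpreter, so any memory leaked while
    processing one batch is released when that process exits.
    `items` must be strings because they are passed as command-line arguments.
    """
    for batch in batches(items):
        subprocess.run([sys.executable, worker_script, *batch], check=True)
```

With 65 inputs, for example, `batches` yields groups of 30, 30, and 5, and the worker script is started three times.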
Evaluate the compositional validity of the reconstructed crystals and calculate the number of compositionally valid reconstructed crystals (num_comp_valid)
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/1_unconditioned_RNN/5_check_comp_valid
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_clean_glob_details.py
# After the computation has finished, run python 2_collect_clean_glob_details.py to collect the results.
python count.py # calculate the number of compositionally valid reconstructed crystals (num_comp_valid)
Evaluate the structural validity of the reconstructed crystals and calculate the number of structurally valid reconstructed crystals (num_struc_valid)
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/1_unconditioned_RNN/6_check_struc_validity
python 1_splitRun.py
# Running python 1_splitRun.py only submits the jobs to the queue; it does not wait
# for them to finish. Use the qstat command to monitor their progress.
# When all tasks show a status of "C", the computation has finished.
python 2_collect_clean_glob_details.py
# After the computation has finished, run python 2_collect_clean_glob_details.py to collect the results.
python count.py # calculate the number of structurally valid reconstructed crystals (num_struc_valid)
Reproduction of Table 3: Structural validity (%) = num_struc_valid/num_reconstructed*100; Compositional validity (%) = num_comp_valid/num_reconstructed*100
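The two Table 3 formulas share the same form, so a single helper covers both. This is a minimal sketch with a hypothetical function name; the counts in the usage lines are placeholders to be replaced with the numbers printed by the three `count.py` runs.

```python
def validity_percent(num_valid, num_reconstructed):
    """Return num_valid / num_reconstructed * 100, as in the Table 3 formulas."""
    if num_reconstructed <= 0:
        raise ValueError("num_reconstructed must be positive")
    return num_valid / num_reconstructed * 100


if __name__ == "__main__":
    # Placeholder counts, not results from the paper.
    num_reconstructed, num_struc_valid, num_comp_valid = 200, 150, 100
    print("Structural validity (%):", validity_percent(num_struc_valid, num_reconstructed))
    print("Compositional validity (%):", validity_percent(num_comp_valid, num_reconstructed))
```

Both rates use num_reconstructed as the denominator, so they are fractions of the reconstructed crystals, not of the sampled SLICES strings.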
(1) Convert crystal structures in datasets to SLICES strings and conduct data augmentation
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/1_augmentation
python 1_splitRun.py # wait for jobs to finish (using qstat to check)
python 2_collect.py
(2) Train conditional RNN
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/2_train_sample
sh 0_train_prior_model.sh
(3) Sample 1000 SLICES strings with
sh 1_sample_in_parallel.sh # wait for jobs to finish (using qstat to check)
python 2_collect_clean_glob_details.py
(4) Remove duplicate edges in SLICES strings to fix syntax errors
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/3_fix_syntax_check
python 1_splitRun.py # wait for jobs to finish (using qstat to check)
python 2_collect_clean_glob_details.py
(5) Reconstruct crystal structures from SLICES strings and calculate the number of reconstructed crystals (num_reconstructed)
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/4_inverse
python 1_splitRun.py # wait for jobs to finish (using qstat to check)
python 2_collect_clean_glob_details.py
!!! To work around potential memory leaks in M3GNet, the Python script is restarted at regular intervals, with a batch size of 30.
python count.py #calculate the number of reconstructed crystals (num_reconstructed)
(6) Evaluate the formation energy distribution of the reconstructed crystals with the M3GNet model
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/5_eform_m3gnet
python 1_splitRun.py # wait for jobs to finish (using qstat to check)
python 2_collect_clean_glob_details.py
python 3_normal_distri_plot.py # plot the formation energy distribution (M3GNet) of the reconstructed crystals
(7) Evaluate the formation energy distribution of the reconstructed crystals at PBE level (took less than 1 day on an HPC with 36*26 cores; ./workflow/0_EnthalpyOfFormation*.py may need tweaking to handle some tricky cases of VASP cell optimization)
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/6_eform_PBE
python 1_splitRun.py # wait for jobs to finish (using qstat to check)
python 2_collect_clean_glob_details.py
python 3_normal_distri_plot.py # plot the formation energy distribution (PBE) of the reconstructed crystals
(8) Reproduction of Table 3: Calculate SR5, SR10, SR15 in Table S1 using formation energies (at PBE level) of crystals generated with a target of -4.5 eV/atom
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/7_calculate_FigureS2c
python calculate_SR5-10-15_TableS1.py # SR5, SR10, SR15 are printed in the terminal
(9) Reproduction of Fig. S2c: Repeat steps (3)-(6) with
cd /crystal/benchmark/Validity_rate_ucRNN__Success_rate_cRNN/2_conditioned_RNN/7_calculate_FigureS2c
python plot_FigureS1c.py # get Fig. S2c as test3.svg
The formation energy distributions with