Initial commit
BCSim committed Dec 18, 2021
0 parents commit ff5896e
Showing 225 changed files with 13,704 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
*.pyc
__pycache__
__pycache__/*
shazi_*
*.bak
131 changes: 131 additions & 0 deletions README.md
@@ -0,0 +1,131 @@
### Folder contents
- `oep-wy`: code for OEP and dataset generation
- `nn-train`: code to train and test an NN model
- `xcnn`: code to perform KS-DFT/NN using the trained NN model as the xc functional

### Example
An example is provided in the folder `example`, using 11 $\rm H_2$ and 11 $\rm HeH^+$ structures.

- OEP: `python run_oep.py`
- Generate dataset: `python gen_dataset.py`
- Training: `python run_train.py`
- Testing: `python run_test.py` (only checks training progress; this is not the same as KS-SCF/NN)
- KS-SCF/NN: `python run_xcnn.py`
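
The five steps above can also be chained from one script, since each step consumes the files written by the previous one. A minimal sketch, assuming it is run from inside `example/` with Python 3.5+ (for `subprocess.run`); `run_pipeline` is a hypothetical helper, not part of the repository:

```python
import subprocess
import sys

# Step order taken from the example list; scripts are assumed to sit in the
# current working directory (the `example` folder).
STEPS = ["run_oep.py", "gen_dataset.py", "run_train.py", "run_test.py", "run_xcnn.py"]


def run_pipeline(steps=STEPS, dry_run=False):
    """Run each step with the current interpreter, stopping at the first failure."""
    commands = [[sys.executable, script] for script in steps]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # only show what would be executed
            continue
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    run_pipeline()
```

Passing `dry_run=True` prints the commands without executing anything, which is a cheap way to check paths before a long OEP run.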

The model obtained from training can be used in KS-DFT/NN. A pre-trained model is also provided in `example/xcnn/saved_model/H2-HeH+_CNN_GGA_1_0.504-0.896-0.008_HFX_ll_0.9_9.dat`.


### Configuration
#### OEP and Dataset
**[OEP]**
Key | Value | Note
----|------ | ----
InputDensity | none | Density matrix in `ndarray` format. If `none` is given, a CCSD one-particle reduced density matrix (1-RDM) is computed.
Structure | structure/H2/d0500.str
OrbitalBasis | aug-cc-pvqz
PotentialBasis | aug-cc-pvqz
ReferencePotential | hfx | Coulomb and Hartree-Fock exchange matrices
PotentialCoefficientInit | zeros | A text or `ndarray` file can also be given
CheckPointPath | oep-wy/chk/H2/d0500
ConvergenceCriterion | 1.e-12 | Convergence threshold of the Newton optimization
SVDCutoff | 5.e-6 | Cutoff for truncated SVD
LambdaRegulation | 0 | Regularization parameter $\lambda$ used to obtain a smooth potential. Used for multi-electron systems.
ZeroForceConstrain | false | Using the zero-force constraint during the optimization does not seem to be a good choice
RealSpaceAnalysis | true | Output the real-space difference between the input and output densities
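
`SVDCutoff` and `ConvergenceCriterion` together suggest that each Newton step is solved with a truncated SVD of the Hessian. A minimal sketch of such an iteration, assuming an absolute cutoff on the singular values (the code above does not say whether its cutoff is absolute or relative); `truncated_svd_step` and `newton` are hypothetical names:

```python
import numpy as np


def truncated_svd_step(hess, grad, cutoff=5e-6):
    """Return the Newton step -H^+ g using a truncated pseudoinverse H^+.

    Singular values <= cutoff are discarded (assumption: absolute cutoff).
    """
    u, s, vt = np.linalg.svd(hess)
    inv_s = np.zeros_like(s)
    keep = s > cutoff
    inv_s[keep] = 1.0 / s[keep]
    # H^+ = V diag(1/s) U^T, restricted to the kept singular values.
    return -(vt.T * inv_s) @ (u.T @ grad)


def newton(grad_fn, hess_fn, b0, tol=1e-12, cutoff=5e-6, max_iter=100):
    """Iterate b <- b + step until the gradient norm drops below tol."""
    b = np.array(b0, dtype=float)
    for _ in range(max_iter):
        g = grad_fn(b)
        if np.linalg.norm(g) < tol:
            break
        b = b + truncated_svd_step(hess_fn(b), g, cutoff)
    return b
```

Discarding tiny singular values keeps the step finite when the Hessian is nearly singular, at the cost of never moving along those directions.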

**[DATASET]**
Key | Value | Note
----|------ | ----
MeshLevel | 3
CubeLength | 0.9 | in Bohr
CubePoint | 9 | Number of discrete points along each edge of the cube
OutputPath | oep-wy/dataset/H2
OutputName | d0500
Symmetric | xz | Transform $(x, y, z)$ to $(\sqrt{x^2 + y^2}, 0, z)$ and keep only unique points
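
A sketch of the two geometric ingredients in this table: a local cube of `CubePoint`³ points with edge `CubeLength`, and the `xz` symmetry reduction. It assumes the cube is centred on each grid point and includes both end points of every edge; neither detail is stated above, and `cube_offsets` and `symmetrize_xz` are hypothetical names:

```python
import numpy as np


def cube_offsets(length=0.9, n_points=9):
    """Offsets (in Bohr) of an n_points**3 cube centred on a grid point.

    Assumption: `length` is the edge length and both ends are included.
    """
    axis = np.linspace(-0.5 * length, 0.5 * length, n_points)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.column_stack([x.ravel(), y.ravel(), z.ravel()])


def symmetrize_xz(coords, decimals=10):
    """Map (x, y, z) -> (sqrt(x^2 + y^2), 0, z) and keep only unique points.

    Returns the unique mapped points and the indices of their first
    occurrence in `coords`.
    """
    coords = np.asarray(coords, dtype=float)
    rho = np.hypot(coords[:, 0], coords[:, 1])
    mapped = np.column_stack([rho, np.zeros_like(rho), coords[:, 2]])
    # Round before comparing so floating-point noise does not split
    # symmetry-equivalent points into distinct rows.
    _, first = np.unique(np.round(mapped, decimals), axis=0, return_index=True)
    first = np.sort(first)  # keep the original ordering
    return mapped[first], first
```

For a linear molecule on the $z$ axis, every point at the same distance from the axis and the same $z$ carries the same density, so only one representative per (ρ, z) pair is kept.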

#### Training and Testing
##### Training
**[OPTIONS]**
Key | Value | Note
----|------ | ----
prefix | nn-train
log_path | %(prefix)s/train/train.log
verbose | False
data_path | %(prefix)s/dataset/H2-HeH+_0.9_9.npy
model | CNN_GGA_1_zsym | The models with and without the `_zsym` suffix have the same architecture and differ only in their output. See `nn-train/model.py` and `nn-train/const_list.py`.
model_save_path | %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat
batch_size | 200
max_epoch | 200000
learning_rate | 5e-3
loss_function | MSELoss_zsym
optimiser | SGD
train_set_size | 78800
validate_set_size | 19600
enable_cuda | True
constrain | zsym | Must be `zsym` when using a model and loss function with the `_zsym` suffix
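
Entries such as `log_path` refer to `%(prefix)s`, which Python's `configparser` expands through its default `BasicInterpolation`. A minimal sketch of reading these options, assuming the training script uses a standard `ConfigParser` (the example `.cfg` files use the same `key: value` syntax); only a subset of keys is shown:

```python
import configparser

TRAIN_CFG = """\
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/train/train.log
data_path: %(prefix)s/dataset/H2-HeH+_0.9_9.npy
batch_size: 200
learning_rate: 5e-3
enable_cuda: True
constrain: zsym
"""

cfg = configparser.ConfigParser()  # default BasicInterpolation expands %(prefix)s
cfg.read_string(TRAIN_CFG)
opts = cfg["OPTIONS"]

log_path = opts["log_path"]                    # interpolated string
data_path = opts["data_path"]
batch_size = opts.getint("batch_size")         # converted to int
learning_rate = opts.getfloat("learning_rate")  # converted to float
enable_cuda = opts.getboolean("enable_cuda")   # "True" -> True
```

Values are always stored as strings, so the typed getters are what turn `200` and `True` into an `int` and a `bool`.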

##### Testing
**[OPTIONS]**
Key | Value | Note
----|------ | ----
prefix | nn-train
log_path | %(prefix)s/test/H2/d0500/test.log
verbose | False
data_path | %(prefix)s/dataset/H2/d0500.npy
model | CNN_GGA_1
restart | %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size | 1
loss_function | MSELoss
optimiser | SGD
test_set_size | 4920
enable_cuda | True
output_path | %(prefix)s/test/H2/d0500
constrain | none
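
The per-structure test configurations in `example/nn-train/test/` differ only in the molecule folder and the bond-length tag. A sketch that renders them from one template; `render_test_config`, `write_test_configs` and the tag range `d0500` to `d0900` in steps of 40 are assumptions read off the example files, not repository code:

```python
import os

TEMPLATE = """\
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/{mol}/{tag}/test.log
verbose: False
data_path: %(prefix)s/dataset/{mol}/{tag}.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/{mol}/{tag}

constrain: none
"""

# Bond-length tags d0500, d0540, ..., d0900 (11 structures per molecule).
TAGS = ["d%04d" % d for d in range(500, 901, 40)]


def render_test_config(mol, tag):
    """Fill the template; %(prefix)s is left for configparser to expand."""
    return TEMPLATE.format(mol=mol, tag=tag)


def write_test_configs(root, molecules=("H2", "HeH+"), tags=TAGS):
    """Write root/<mol>/<tag>.cfg for every molecule and tag; return the paths."""
    paths = []
    for mol in molecules:
        folder = os.path.join(root, mol)
        os.makedirs(folder, exist_ok=True)
        for tag in tags:
            path = os.path.join(folder, tag + ".cfg")
            with open(path, "w") as fh:
                fh.write(render_test_config(mol, tag))
            paths.append(path)
    return paths
```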

#### KS-SCF/NN
**[XCNN]**
Key | Value | Note
----|------ | ----
Verbose | True
CheckPointPath | xcnn/chk/H2/d0500
EnableCuda | True
Structure | structure/H2/d0500.str
OrbitalBasis | aug-cc-pVQZ
ReferencePotential | hfx | Should be the same as the one used in OEP
Model | cnn_gga_1
ModelPath | xcnn/saved_model/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
MeshLevel | 3 | Should be the same as the one used in training
CubeLength | 0.9 | Should be the same as the one used in training
CubePoint | 9 | Should be the same as the one used in training
Symmetric | xz+ | Similar to `xz` but keeps only the $z>0$ part. Only for $\rm H_2$
InitDensityMatrix | rks | Used together with `xcFunctional` (next row) to set up the initial density matrix for KS-SCF/NN
xcFunctional | b3lypg | Follows PySCF's convention: in PySCF, `b3lyp` differs from `b3lypg`, and the latter is the conventional B3LYP functional.
ConvergenceCriterion | 1.e-6 | For SCF procedure
MaxIteration | 99
ZeroForceConstrain | True | Typically enabled to keep zero force condition.
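
Several `[XCNN]` entries must agree with the values used earlier. A minimal consistency check, assuming both sets of settings are loaded as `configparser` objects with `[OEP]`, `[DATASET]` and `[XCNN]` sections laid out as in the tables above; `find_mismatches` is hypothetical. `Symmetric` is left out because the tables use `xz` for the dataset and `xz+` for KS-SCF/NN:

```python
import configparser

# (section in the OEP/dataset config, key) pairs that [XCNN] must repeat.
MUST_MATCH = [
    ("OEP", "ReferencePotential"),
    ("DATASET", "MeshLevel"),
    ("DATASET", "CubeLength"),
    ("DATASET", "CubePoint"),
]


def _same(a, b):
    """Compare numerically when possible, otherwise case-insensitively."""
    try:
        return float(a) == float(b)
    except ValueError:
        return a.strip().lower() == b.strip().lower()


def find_mismatches(oep_cfg, xcnn_cfg):
    """Return the keys whose [XCNN] value differs from the OEP/DATASET value."""
    return [key for section, key in MUST_MATCH
            if not _same(oep_cfg[section][key], xcnn_cfg["XCNN"][key])]
```

`configparser` looks option names up case-insensitively, so `MeshLevel` matches however the key is capitalised in the file.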

## Dependencies
- numpy
- scipy
- tqdm
- ConfigParser/configparser
- PyTorch with CUDA support
- PySCF > 1.5

### Note on PySCF
A customised version of libcint is required to support extra Gaussian integrals, so a PySCF installed via pip/conda/docker will not work and you may have to compile it from source. A straightforward (though perhaps not the most efficient) workaround is:
1. Download [PySCF](https://github.com/pyscf/pyscf) source code and follow its procedure to compile core module.
2. Go to `pyscf/lib/build/deps/src/libcint`, where the source code of libcint is placed.
3. Open `scripts/auto_intor.cl` and add the following two lines to the last `gen-cint` block:
```
'("int3c1e_ovlp" ( \, \, ))
'("int3c1e_ipovlp" (nabla \, \, ))
```
4. Follow the instructions under `Generating integrals` in libcint's `README` to generate the new code and place it accordingly. I chose NOT to update libcint here.
5. Go back to `pyscf/lib/build` where the command to compile PySCF core module is executed. Run `make` again and the libcint library will be updated.
6. Open `pyscf/gto/moleintor.py` and add the following two lines to `_INTOR_FUNCTIONS`:
```
'int3c1e_ovlp' : (1, 1),
'int3c1e_ipovlp' : (3, 3),
```
7. Done
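
After step 6, one way to confirm that the patched PySCF is the one being imported is to look the new names up in `_INTOR_FUNCTIONS` and then evaluate one of the integrals. A hedged sketch; `check_custom_intors` and `smoke_test` are hypothetical helpers, and `smoke_test` only succeeds if the libcint rebuilt in step 5 is the library actually loaded:

```python
def check_custom_intors(names=("int3c1e_ovlp", "int3c1e_ipovlp")):
    """Report whether each custom integral is registered in the local PySCF."""
    try:
        from pyscf.gto import moleintor
    except ImportError:
        return {name: False for name in names}
    registered = getattr(moleintor, "_INTOR_FUNCTIONS", {})
    return {name: name in registered for name in names}


def smoke_test():
    """Evaluate int3c1e_ovlp for a small H2 molecule and return its shape."""
    from pyscf import gto
    mol = gto.M(atom="H 0 0 0; H 0 0 0.74", basis="sto-3g")
    return mol.intor("int3c1e_ovlp").shape
```

If `check_custom_intors()` reports `True` but `smoke_test()` fails, the Python side was patched while the compiled library was not rebuilt.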
25 changes: 25 additions & 0 deletions example/gen_dataset.py
@@ -0,0 +1,25 @@
from __future__ import print_function
import os

import numpy as np


PATH_PREFIX = "./oep-wy/dataset"
# Expected number of columns per sample (9 x 9 x 9 cube, checked below).
N_COLUMNS = 4 * 9 * 9 * 9 + 1


def list_raw_files(subdir):
    """Sorted per-structure .npy files in PATH_PREFIX/subdir, skipping *coords.npy."""
    folder = os.path.join(PATH_PREFIX, subdir)
    return sorted(os.path.join(folder, f) for f in os.listdir(folder)
                  if f.endswith(".npy") and not f.endswith("coords.npy"))


raw_files = list_raw_files("H2") + list_raw_files("HeH+")

# Load all files and concatenate once instead of growing the array in a loop.
all_data = np.concatenate([np.load(f) for f in raw_files], axis=0)
assert all_data.shape[1] == N_COLUMNS

print("Dataset size:", all_data.shape)

np.random.shuffle(all_data)
# np.save appends the ".npy" extension.
np.save(os.path.join(PATH_PREFIX, "H2-HeH+_0.9_9"), all_data)
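
Each row of the shuffled array holds `4 * 9 * 9 * 9 + 1` values. A sketch of splitting it into training and validation parts sized like `train_set_size` and `validate_set_size`, assuming (this is not confirmed by the script) that the first `4 * 9**3` columns are cube features in channel-major order and the last column is the target; `split_dataset` is hypothetical:

```python
import numpy as np


def split_dataset(data, n_train, n_val, n_channels=4, n_points=9):
    """Split rows into (x, y) pairs for training and validation.

    Assumption: each row is n_channels * n_points**3 features followed by
    one target column, with features stored channel-major.
    """
    n_feat = n_channels * n_points ** 3
    assert data.shape[1] == n_feat + 1
    assert n_train + n_val <= data.shape[0]
    x = data[:, :n_feat].reshape(-1, n_channels, n_points, n_points, n_points)
    y = data[:, n_feat]
    train = (x[:n_train], y[:n_train])
    val = (x[n_train:n_train + n_val], y[n_train:n_train + n_val])
    return train, val
```

With the example training settings, `split_dataset(data, 78800, 19600)` needs at least 98,400 rows.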

15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0500.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0500/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0500.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0500

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0540.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0540/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0540.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0540

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0580.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0580/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0580.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0580

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0620.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0620/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0620.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0620

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0660.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0660/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0660.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0660

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0700.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0700/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0700.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0700

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0740.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0740/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0740.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0740

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0780.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0780/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0780.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0780

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0820.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0820/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0820.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0820

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0860.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0860/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0860.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0860

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/H2/d0900.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/H2/d0900/test.log
verbose: False
data_path: %(prefix)s/dataset/H2/d0900.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/H2/d0900

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/HeH+/d0500.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/HeH+/d0500/test.log
verbose: False
data_path: %(prefix)s/dataset/HeH+/d0500.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/HeH+/d0500

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/HeH+/d0540.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/HeH+/d0540/test.log
verbose: False
data_path: %(prefix)s/dataset/HeH+/d0540.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/HeH+/d0540

constrain: none
15 changes: 15 additions & 0 deletions example/nn-train/test/HeH+/d0580.cfg
@@ -0,0 +1,15 @@
[OPTIONS]
prefix: nn-train
log_path: %(prefix)s/test/HeH+/d0580/test.log
verbose: False
data_path: %(prefix)s/dataset/HeH+/d0580.npy
model: CNN_GGA_1
restart: %(prefix)s/train/model_chk/H2-HeH+_0.9_0_CNN_GGA_1.dat.restart10000
batch_size: 1
loss_function: MSELoss
optimiser: SGD
test_set_size: 4920
enable_cuda: True
output_path: %(prefix)s/test/HeH+/d0580

constrain: none