This folder contains the `graphs` and `utils` folders and the `gen_data.py` script.
The `graphs` folder contains a small set of graphs that can be used to verify that DeepStruc is running correctly. To generate a broader distribution of mono-metallic nanoparticles (MMNPs) for training, validation and testing, use the `gen_data.py` script.
DiffPy-CMI is required to simulate pair distribution functions (PDFs), and it only runs on Linux or macOS. To run it on a Windows computer, please use the Windows Subsystem for Linux (WSL) with Ubuntu. To generate more data, run the `gen_data.py` script. The script takes a range of arguments, all of which are described below; alternatively, use the help argument to print the parameter list together with the default values:
```
python gen_data.py --help
>>> usage: gen_data.py [-h] [-d DIRECTORY] [-a ATOMS [ATOMS ...]]
>>> [-t {SC,FCC,BCC,HCP,Ico,Dec,Oct} [{SC,FCC,BCC,HCP,Ico,Dec,Oct} ...]]
>>> [-n NUM_ATOMS] [-i INTERPOLATION] [-q QMIN] [-Q QMAX]
>>> [-r RMIN] [-R RMAX] [-rs RSTEP] [-b BISO]
>>>
>>> Generating structures, graphs and conditional PDFs for DeepStruc.
>>> ...
```
The possible arguments are listed in the table below; run the script with `--help` for additional information.
Arg | Description | Type | Example |
---|---|---|---|
`-h` or `--help` | Prints help message. | | |
`-d` or `--directory` | Directory in which the generated data is saved. | `str` | `-d new_data` |
`-a` or `--atoms` | An atom or list of atoms. | `str` | `-a Nb W Mo` |
`-t` or `--structure_type` | A single structure type or a list of structure types. Possible structure types are: SC, FCC, BCC, HCP, Ico, Dec and Oct. | `str` | `-t SC Ico` |
`-n` or `--num_atoms` | Maximum number of atoms in the generated structures. | `int` | `-n 200` |
`-i` or `--interpolation` | Interpolation setting. | `int` | `-i 3` |
`-q` or `--qmin` | Smallest scattering vector magnitude, Q, for simulated PDFs (Å⁻¹). | `float` | `-q 0.2` |
`-Q` or `--qmax` | Largest scattering vector magnitude, Q, for simulated PDFs (Å⁻¹). | `float` | `-Q 22.3` |
`-p` or `--qdamp` | PDF Gaussian damping factor due to limited Q-resolution. Not applied when equal to zero. | `float` | `-p 0.02` |
`-r` or `--rmin` | Smallest r-value for simulated PDFs (Å). | `float` | `-r 1.5` |
`-R` or `--rmax` | Largest r-value for simulated PDFs (Å). | `float` | `-R 20.0` |
`-s` or `--rstep` | r-grid spacing for simulated PDFs (Å). | `float` | `-s 0.1` |
`-e` or `--delta2` | Coefficient for the (1/r²) contribution to the peak sharpening. | `float` | `-e 3.5` |
`-b` or `--biso` | Isotropic atomic displacement parameter (Biso) for simulated PDFs (Å²). | `float` | `-b 0.2` |
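To illustrate how `--rmin`, `--rmax`, `--rstep` and `--qdamp` interact, here is a minimal sketch, assuming the PDFgui/DiffPy convention in which limited Q-resolution multiplies G(r) by the Gaussian envelope exp(-(Qdamp·r)²/2). The helpers `r_grid` and `qdamp_envelope` are hypothetical and not part of `gen_data.py`; whether the script's r-grid includes `rmax` is also an assumption.

```python
import math


def r_grid(rmin: float, rmax: float, rstep: float) -> list[float]:
    """Evenly spaced r-values from rmin up to and including rmax.

    Assumption: the endpoint is included; gen_data.py / DiffPy-CMI may
    use a different convention.
    """
    n_points = int(round((rmax - rmin) / rstep)) + 1
    return [rmin + i * rstep for i in range(n_points)]


def qdamp_envelope(r: float, qdamp: float) -> float:
    """Gaussian damping factor exp(-(qdamp * r)**2 / 2); equals 1 when qdamp == 0."""
    return math.exp(-0.5 * (qdamp * r) ** 2)


# Example values taken from the argument table above.
grid = r_grid(rmin=1.5, rmax=20.0, rstep=0.1)
envelope = [qdamp_envelope(r, qdamp=0.02) for r in grid]
print(len(grid), round(grid[-1], 6), round(envelope[-1], 4))  # prints: 186 20.0 0.9231
```

Because the envelope decays with r, a non-zero Qdamp mainly suppresses peaks at high r, which is why it only matters when `--rmax` is large relative to 1/Qdamp.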
A simplified description of the data format is given below. For a detailed description, please refer to the paper.
To simulate MMNPs we use the ASE library. All MMNPs are stored in XYZ format, which lists the element and the Cartesian coordinates of each atom:
```
Atom1 x1 y1 z1
Atom2 x2 y2 z2
...
AtomN xN yN zN
```
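A minimal sketch of reading this simplified layout in plain Python, assuming whitespace-separated columns with one atom per line. Standard XYZ files, including those written by ASE, usually begin with an atom-count line and a comment line, so the hypothetical `parse_xyz` helper below skips those two lines when the first line is an integer.

```python
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]


def parse_xyz(text: str) -> List[Atom]:
    """Parse 'Element x y z' lines into (element, (x, y, z)) tuples."""
    lines = text.splitlines()
    # Standard XYZ header: line 1 = atom count, line 2 = comment (may be blank).
    if lines and lines[0].strip().isdigit():
        lines = lines[2:]
    atoms: List[Atom] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        element, x, y, z = fields[:4]
        atoms.append((element, (float(x), float(y), float(z))))
    return atoms


example = """Au 0.0 0.0 0.0
Au 2.88 0.0 0.0
"""
atoms = parse_xyz(example)
print(atoms)  # [('Au', (0.0, 0.0, 0.0)), ('Au', (2.88, 0.0, 0.0))]
```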
Each structure in graph representation can be described as $G = (X, A)$, where $X \in \mathbb{R}^{N \times F}$ is the node feature matrix, which contains $F$ features that can describe each of the $N$ atoms in the structure. We use $F = 3$, comprising only the Euclidean coordinates of the atom in a 3-dimensional space. The interatomic relationships are captured using the adjacency matrix $A \in \mathbb{R}^{N \times N}$. In our case, the entries of the adjacency matrix are the Euclidean distances between pairs of atoms, resulting in a soft adjacency matrix. However, when the distance between any pair of nodes is larger than the lattice constant, the corresponding edge weight is set to zero.
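The representation described above can be sketched in plain Python as follows. The helper `build_graph` is hypothetical and not DeepStruc's own code. The usage example builds an idealised 7-atom decahedron (a pentagonal bipyramid with unit edge length) and assumes a cutoff of √2 edge lengths, i.e. the FCC relation between lattice constant and nearest-neighbour distance; both choices are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_graph(coords: Sequence[Sequence[float]], lattice_constant: float) -> Tuple[Matrix, Matrix]:
    """Return (X, A) for one structure.

    X: N x 3 node-feature matrix holding each atom's Cartesian coordinates.
    A: N x N soft adjacency matrix of pairwise Euclidean distances, with
       entries set to zero when the distance exceeds the lattice constant.
    """
    X = [[float(v) for v in c] for c in coords]
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(X[i], X[j])
            if d <= lattice_constant:
                A[i][j] = A[j][i] = d  # symmetric, distance-weighted edge
    return X, A


# Idealised 7-atom decahedron: a five-atom ring plus one apex atom above and below.
ring_radius = 1.0 / (2.0 * math.sin(math.pi / 5))
apex_height = math.sqrt(1.0 - ring_radius**2)
coords = [(ring_radius * math.cos(2 * math.pi * k / 5),
           ring_radius * math.sin(2 * math.pi * k / 5), 0.0) for k in range(5)]
coords += [(0.0, 0.0, apex_height), (0.0, 0.0, -apex_height)]

# Assumed cutoff: an FCC lattice constant is sqrt(2) times the nearest-neighbour distance.
X, A = build_graph(coords, lattice_constant=math.sqrt(2.0))
n = len(X)
n_edges = sum(1 for i in range(n) for j in range(i + 1, n) if A[i][j] > 0)
print(n_edges)  # 16: every pair except the five non-adjacent ring pairs (about 1.618 apart)
```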
The following figure shows a decahedron consisting of seven atoms alongside the components describing it in our chosen graph representation!
The PDF is the Fourier transform of total scattering data, which can be obtained through X-ray, neutron or electron scattering.
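For reference, the reduced PDF is conventionally defined as the sine Fourier transform of the total scattering structure function $S(Q)$ over the measured range $Q_{\min} \le Q \le Q_{\max}$; these finite limits correspond to the `--qmin` and `--qmax` arguments. This is the textbook definition only, not a statement of how DiffPy-CMI computes simulated PDFs internally.

$$
G(r) = \frac{2}{\pi} \int_{Q_{\min}}^{Q_{\max}} Q \left[ S(Q) - 1 \right] \sin(Qr) \, \mathrm{d}Q
$$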
The PDF, G(r), can be interpreted as a histogram of real-space interatomic distances, and the information it contains is equivalent to that of an unassigned distance matrix.
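To make the histogram interpretation concrete, here is a minimal sketch that bins every pairwise interatomic distance of a structure. It is illustrative only: it ignores scattering factors, thermal broadening (`--biso`) and the finite Q-range, all of which a real PDF simulation with DiffPy-CMI accounts for. The helper `distance_histogram` is hypothetical; its `rmax` and `rstep` parameters only mirror the names of the CLI arguments.

```python
import itertools
import math
from typing import List, Sequence


def distance_histogram(coords: Sequence[Sequence[float]], rmax: float, rstep: float) -> List[int]:
    """Count atom pairs whose separation falls in each bin [k*rstep, (k+1)*rstep)."""
    n_bins = int(math.ceil(rmax / rstep))
    counts = [0] * n_bins
    for a, b in itertools.combinations(coords, 2):  # each unordered pair once
        d = math.dist(a, b)
        k = int(d // rstep)
        if k < n_bins:
            counts[k] += 1
    return counts


# Four atoms on a unit square: four edges of length 1 and two diagonals of length sqrt(2).
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(distance_histogram(square, rmax=2.0, rstep=0.25))  # [0, 0, 0, 0, 4, 2, 0, 0]
```

Only the set of distances enters the histogram, not which atoms they belong to, which is exactly why the information is that of an unassigned distance matrix.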
A simulated PDF and how we normalise it are shown below: