NEP for organic system #808

Closed
Leo-678 opened this issue Nov 29, 2024 · 7 comments

@Leo-678

Leo-678 commented Nov 29, 2024

Dear Prof. Fan,

I am training a NEP model for FAPbI3. The RMSE during training stays within a normal range, but in molecular dynamics simulations the FA molecules tend to cluster together, which does not happen in AIMD simulations. I have tried changing the NEP training parameters, but the issue persists.

What parameters should I pay attention to when training models involving organic molecules like FA (formamidinium) in the structure?

Best Regards!
Leo

@zhyan0603
Contributor

Dear Leo,

Could you provide more details? For example, is your training set entirely from AIMD sampling? You can also try using this script plt_nep_train_results.py to plot the training results. Also, sharing the hyperparameters from your nep.in file would be helpful.

Finally, it might be better to move this issue to the "Discussions" board, so others can chime in too.

Best,
Zihan

@Leo-678
Author

Leo-678 commented Dec 2, 2024

Dear Zihan:
Thank you for your reply. I built the training set by first pretraining an MTP, then selecting structures with the D-optimal method using the pretrained potential, and finally training the NEP. I have tried two sets of parameters: one that I modified and the default one. However, when running MD with NEP, an error occurs after some time. In LAMMPS, no error is reported, but the FA molecules aggregate.

train-1
type 5 C Pb I N H
version 4
cutoff 9 5
n_max 8 6
l_max 4 2
neuron 50
batch 100
generation 2000000

train-2

type 5 C Pb I N H
version 4 # default
cutoff 8 4 # default
n_max 4 4 # default
basis_size 8 8 # default
l_max 4 2 0 # default
neuron 30 # default
lambda_e 1.0 # default
lambda_f 1.0 # default
lambda_v 0.1 # default
batch 1000 # default
population 50 # default
generation 1000000 # default

[Images: train-1, train-2]

Best Regards!
Leo

@zhyan0603
Contributor

Dear Leo,

Thank you for sharing the details of your setup. I have a few suggestions that might help with the FA molecule aggregation issue:

Your current cutoff (e.g., cutoff 8 4) might be too large and include too many neighbors, some of which may be periodic images of atoms already in the cell. This can impact model stability. Could you share the size of your system? It would help determine whether the cutoff needs adjustment. The log file from the training run also provides useful information, such as the maximum number of neighbors for one atom for the radial and angular descriptors. If this number is close to or even exceeds the number of atoms in the system, reducing the cutoff or adding larger structures to the training set may help.
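As a rough cross-check of the neighbor count described above, the sketch below estimates the average number of neighbors inside a cutoff sphere from the mean number density and compares it with the number of atoms in the cell. The function name and the box dimensions are hypothetical and chosen only for illustration; they are not taken from this thread. The estimate assumes a uniform density, so the maximum reported in the training log can be higher.

```python
import math


def estimated_neighbors(n_atoms: int, volume: float, cutoff: float) -> float:
    """Average number of neighbors within `cutoff` (in Angstrom).

    Uses the mean number density n_atoms / volume (volume in Angstrom^3)
    and excludes the central atom itself.
    """
    density = n_atoms / volume
    sphere_volume = (4.0 / 3.0) * math.pi * cutoff ** 3
    return density * sphere_volume - 1.0


# Hypothetical orthorhombic cell with 144 atoms (illustrative values only).
n_atoms = 144
lengths = (12.7, 12.7, 19.1)  # Angstrom, assumed
volume = lengths[0] * lengths[1] * lengths[2]

for cutoff in (8.0, 6.0, 4.0):
    estimate = estimated_neighbors(n_atoms, volume, cutoff)
    ratio = estimate / n_atoms
    print(f"cutoff {cutoff:.1f} A: ~{estimate:.0f} neighbors "
          f"({ratio:.0%} of the atoms in the cell)")
```

If the estimate approaches the number of atoms in the cell, many of the neighbors are necessarily periodic images, which is the situation described above.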

Also, I suggest trying active learning directly with NEP to add some new structures, which might be helpful.

Best,
Zihan

@Leo-678
Author

Leo-678 commented Dec 2, 2024

Dear Zihan:

Thanks for your reply. Yes, that's exactly the issue. For my AIMD simulation with 144 atoms in total, the cutoff value was indeed too large. I will report back after testing different cutoffs. However, since I'm studying phase transitions, I believe the choice of cutoff should have a significant impact on the phase transition behavior. Do you have any suggestions for selecting a cutoff value? Or does it need to be adjusted adaptively based on how the molecular dynamics simulation behaves?

Best Regards.
Leo

@zhyan0603
Contributor

Hi Leo,

For choosing the cutoff, it is best to keep it within half of the box size to avoid periodic image effects. If this is unacceptable, adding some supercell structures to the training set can help the model learn more diverse environments.
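The half-box rule can be checked for any cell shape with a few lines of code. The sketch below is a minimal illustration, not part of GPUMD: it computes the distance between opposite faces of a cell given as three lattice vectors and returns half of the smallest one. The function names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def perpendicular_widths(cell):
    """Distances between opposite faces of a cell.

    `cell` holds the three lattice vectors (a, b, c) in Angstrom.
    Each width is the cell volume divided by the area of the face
    spanned by the other two vectors.
    """
    a, b, c = cell
    volume = abs(dot(a, cross(b, c)))
    widths = []
    for u, v in ((b, c), (c, a), (a, b)):
        normal = cross(u, v)
        widths.append(volume / math.sqrt(dot(normal, normal)))
    return widths


def max_safe_cutoff(cell):
    """Largest cutoff that stays within half of the smallest cell width."""
    return 0.5 * min(perpendicular_widths(cell))


# Hypothetical orthorhombic cell, for illustration only.
cell = ((12.0, 0.0, 0.0), (0.0, 14.0, 0.0), (0.0, 0.0, 20.0))
print(f"largest cutoff within half the box: {max_safe_cutoff(cell):.2f} A")
```

For non-orthogonal cells the face-to-face widths are smaller than the lattice vector lengths, so checking the widths rather than the lengths is the safer choice.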

You can also test how the accuracy of the model changes with a smaller cutoff. If the RMSE is still acceptable, lowering the cutoff speeds up the simulation and generally has no adverse effects. For your system, you might try cutoff 6 4 and check whether it correctly describes the phase transition behavior. For some phase transitions, it may be sufficient.
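To compare the accuracy of models trained with different cutoffs, a force RMSE can be computed from predicted and reference forces. The sketch below assumes a six-column text layout (three predicted force components followed by three reference components per atom), which is my understanding of NEP's force output files and should be checked against the documentation. The function names and sample numbers are hypothetical. The RMSE itself does not depend on which half of the columns is the prediction.

```python
import math


def parse_rows(text):
    """Parse whitespace-separated numeric rows, skipping blank lines."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def force_rmse(rows):
    """Root-mean-square error over all force components.

    Each row is assumed to hold six numbers: fx fy fz of one source
    followed by fx fy fz of the other (prediction and reference).
    """
    squared_sum = 0.0
    count = 0
    for row in rows:
        first, second = row[:3], row[3:6]
        for p, r in zip(first, second):
            squared_sum += (p - r) ** 2
            count += 1
    if count == 0:
        raise ValueError("no force components found")
    return math.sqrt(squared_sum / count)


# Made-up sample data in the assumed layout, for illustration only.
sample = """
0.10 -0.20 0.05   0.12 -0.18 0.05
-0.30 0.00 0.25  -0.28 0.02 0.20
"""
print(f"force RMSE: {force_rmse(parse_rows(sample)):.4f} (same units as the input)")
```

Running this on the force output of each trained model gives one number per cutoff, which makes it easy to see whether a smaller cutoff costs accuracy.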

Best,
Zihan

@Leo-678
Author

Leo-678 commented Dec 12, 2024

Dear Zihan,

Thank you very much for your reply. After testing many parameters, I found that both whether the simulation runs successfully and how well the model performs are indeed closely related to the training parameters. Finally, I would like to ask whether there are any recommended active learning examples. On the official website, I only found the related commands. Examples would help me get started more quickly.

Best
Leo

@zhyan0603
Contributor

Dear Leo,

Regarding active learning examples: as of now, the official documentation does not include specific examples for active learning with NEP. However, we have assigned a dedicated team member to maintain and expand our examples, which will eventually include active learning.

In the meantime, I can suggest two approaches that might help you:

MD Simulations and Sampling:
Start by performing MD simulations with the current NEP model. Then extract some structures from the simulation trajectories, run DFT calculations on them, and use the results to verify the reliability of the current model (set prediction 1 in your nep.in file). For the sampling method, you can use random sampling, uniform sampling, or the descriptor-based farthest point sampling implemented in pynep. If the prediction error is very close to the training error, the NEP model is able to handle the current simulation conditions. You can then increase the simulation time (from ps to ns), temperature, pressure, or other conditions of interest so that NEP can learn more complex local atomic environments. If the predictions are not satisfactory, add the sampled structures to the training set for further training. The nep.restart file allows you to refine the force field incrementally. Repeat this sampling and prediction cycle until the force field meets your goals, such as accurately predicting phase transitions.

By the way, moderately reducing the batch size (e.g., batch 200) can speed up training without compromising too much on model accuracy.

On-the-fly active learning:
See activate command for more details.
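For the descriptor-based farthest point sampling mentioned in the first approach, the idea can be sketched in plain Python. This is a generic illustration and not the pynep API: the function name is hypothetical, and each structure is assumed to be already represented by one descriptor vector (for example, an average over its atoms).

```python
import math


def farthest_point_sampling(descriptors, n_select, start=0):
    """Greedy farthest point sampling.

    `descriptors` is a list of equal-length numeric vectors, one per
    structure. Starting from index `start`, each step picks the structure
    whose distance to its nearest already-selected structure is largest.
    Returns the selected indices in the order they were picked.
    """
    n = len(descriptors)
    if not 0 < n_select <= n:
        raise ValueError("n_select must be between 1 and the number of structures")
    selected = [start]
    # Distance from every structure to its nearest selected structure.
    min_dist = [math.dist(d, descriptors[start]) for d in descriptors]
    while len(selected) < n_select:
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, d in enumerate(descriptors):
            dist = math.dist(d, descriptors[nxt])
            if dist < min_dist[i]:
                min_dist[i] = dist
    return selected


# Made-up 2D descriptors, for illustration only.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(farthest_point_sampling(points, 3))
```

The selected structures are the ones to send to DFT. Near-duplicates of structures that are already selected are picked last, which keeps the number of DFT calculations small.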

In addition, our team members are implementing improved sampling methods similar to those used in the MTP active learning strategy. This feature is expected to debut in GPUMD 4.0 and will provide better tools for evaluating whether a structure should be included in the training set.

We appreciate your understanding and look forward to offering more comprehensive support and resources in the near future.

Best,
Zihan

@Leo-678 Leo-678 closed this as completed Jan 7, 2025