Prebuild dataset too large for RAM #88
Thanks for reaching out. The core routine is already developed (https://github.com/MDIL-SNU/SevenNet/blob/main/sevenn/train/collate.py), and I'm left with some extra work to leverage it for training.
Thank you for the quick reply. Will this also work for datasets where the number of structures is already too large for memory, even without the graphs?
Hi @JonathanSchmidt1. Unfortunately, in the case where the number of structures is already too large, even without graphs, the method I mentioned will also fail. Some operating systems try to use swap memory to handle the out-of-memory condition, but that is not a good solution. To overcome this, we need a technically elegant method that uses a database, such as MySQL, LMDB, SQLite, and so on. The good news is that ASE already has database interfaces. I'm personally trying to leverage the ASE db to do exactly what you are trying to do, but it's going to take some time. If you know of any other open-source MLIP package that is relevant to this topic, please let me know. It will help my development.
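The database-backed approach described above can be sketched with Python's built-in sqlite3 module. This is a generic illustration of streaming structures from disk rather than holding them in RAM, not SevenNet or ASE code; the table name `structures` and the dictionary layout are made up for the example:

```python
# Sketch: store structures in an on-disk SQLite database and stream
# them back one at a time, so memory use stays bounded regardless of
# dataset size. (Illustrative only; not SevenNet's implementation.)
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a real dataset
conn.execute("CREATE TABLE structures (id INTEGER PRIMARY KEY, data BLOB)")

# Write structures one at a time (a toy H2 dimer at several distances).
for d in (2.0, 2.5, 3.0):
    blob = pickle.dumps({"symbols": "H2",
                         "positions": [[0, 0, 0], [0, 0, d]]})
    conn.execute("INSERT INTO structures (data) VALUES (?)", (blob,))
conn.commit()

# Stream rows back; only the current structure lives in memory,
# so the total dataset can far exceed RAM.
count = 0
for (blob,) in conn.execute("SELECT data FROM structures"):
    structure = pickle.loads(blob)
    count += 1  # here one would build the graph for `structure` on the fly
```

The same read-one-row-at-a-time pattern is what an ASE db or LMDB backend would provide, just with a dedicated API instead of raw SQL.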
I guess SchNetPack would be an example of a package using the ASE db, and https://github.com/IntelLabs/matsciml/tree/main uses lmdb, which I generally prefer. I think ALIGNN should also have a branch using lmdb.
Thanks! I'll look around those repos. By the way, could you tell me why you prefer lmdb over the ASE db? I don't have experience with lmdb, but I have some with the ASE db.
Hi,
thank you for the great package. I am trying to pre-build the graphs for some larger datasets that do not fit into RAM. Is this already possible (and the training afterwards as well)?
Best,
Jonathan