Input pipeline overhaul #180

Closed
Tetracarbonylnickel opened this issue Oct 6, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@Tetracarbonylnickel
Contributor

Current Issue: Inputs, labels, and neighbors are currently loaded into RAM, which limits the maximum size of datasets that can be processed.

Proposed Solution: To address this issue, we plan to precompute the neighbors and store them as HDF5 files alongside the dataset. Additionally, we will save the values of max_r and min_r along with the neighbors. In the input pipeline, two TFDatasets will be generated: one from the atoms file (most formats possible exclusively .traj) and another from the precomputed neighbors. These datasets will then be merged.
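The precompute-and-store step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the `idx_{i}` dataset layout, the attribute names, and the brute-force, non-periodic neighbor search are all hypothetical stand-ins.

```python
import h5py
import numpy as np


def brute_force_neighbors(positions, min_r, max_r):
    """Return a (2, n_pairs) array of index pairs with min_r <= distance < max_r.

    Assumption: no periodic boundaries; a real pipeline would use a proper
    neighbor list implementation instead of this O(N^2) search.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = (dist >= min_r) & (dist < max_r)
    np.fill_diagonal(mask, False)  # never pair an atom with itself
    return np.stack(np.nonzero(mask)).astype(np.int32)


def precompute_neighbors(path, structures, min_r, max_r):
    """Write one neighbor dataset per structure, plus the cutoffs as attributes."""
    with h5py.File(path, "w") as f:
        f.attrs["min_r"] = min_r
        f.attrs["max_r"] = max_r
        f.attrs["n_structures"] = len(structures)
        for i, positions in enumerate(structures):
            idx = brute_force_neighbors(positions, min_r, max_r)
            f.create_dataset(f"idx_{i}", data=idx)


def iter_neighbors(path):
    """Yield neighbor indices one structure at a time, so only one is in RAM."""
    with h5py.File(path, "r") as f:
        for i in range(int(f.attrs["n_structures"])):
            yield f[f"idx_{i}"][...]
```

A generator like `iter_neighbors` could then be wrapped with `tf.data.Dataset.from_generator` and combined with the dataset built from the atoms file via `tf.data.Dataset.zip`, which is one possible reading of "merged" above.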

Advantages of the Proposed Solution: This approach offers several advantages. Firstly, it eliminates the need to load both the dataset and its neighbors into RAM, thereby mitigating memory constraints. Secondly, if the same dataset is used for multiple training sessions with the same max_r and min_r values, the precomputing step can be skipped, resulting in a more efficient workflow.
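The skip-if-already-computed check might look like the sketch below. The function name and the metadata dictionary are hypothetical; the sketch only assumes that the stored min_r and max_r can be read back from wherever they were saved next to the neighbors.

```python
import math


def can_reuse_neighbors(stored, min_r, max_r, tol=1e-9):
    """Return True if previously stored cutoffs match the requested ones.

    `stored` is a hypothetical mapping with "min_r" and "max_r" keys
    (for example, read from the attributes of the neighbor file),
    or None if no precomputed neighbors exist yet.
    """
    if stored is None:
        return False
    try:
        return (
            math.isclose(stored["min_r"], min_r, abs_tol=tol)
            and math.isclose(stored["max_r"], max_r, abs_tol=tol)
        )
    except KeyError:
        # Incomplete metadata: safer to recompute.
        return False


# Usage: recompute only when the cutoffs changed.
stored = {"min_r": 0.5, "max_r": 6.0}
if not can_reuse_neighbors(stored, min_r=0.5, max_r=6.0):
    print("cutoffs changed, recomputing neighbors")
```

A real check would presumably also need to confirm that the dataset itself is unchanged, which this sketch does not attempt.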

@Tetracarbonylnickel Tetracarbonylnickel added the enhancement New feature or request label Oct 6, 2023
@M-R-Schaefer
Contributor

#248

@M-R-Schaefer
Contributor

#281


2 participants