Input pipeline overhaul #180

Tetracarbonylnickel · 2023-10-06T14:46:42Z

Current Issue: Inputs, labels, and neighbors are currently all loaded into RAM, which limits the maximum size of the datasets that can be processed.

Proposed Solution: To address this issue, we plan to precompute the neighbors and store them as HDF5 files alongside the dataset. The values of max_r and min_r will be saved together with the neighbors. In the input pipeline, two TFDatasets will be generated: one from the atoms file (most formats possible, exclusively .traj) and one from the precomputed neighbors. These two datasets will then be merged.
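A minimal, stdlib-only sketch of the precompute-then-merge idea, for illustration. Assumptions: a JSON-lines file stands in for the HDF5 file (cutoff metadata on the first line, one neighbor list per structure after that), plain Python generators and `zip` stand in for the two TFDatasets and their merge, and structures are bare lists of Cartesian positions without periodic boundaries. All function names here are hypothetical and not part of the actual pipeline.

```python
import itertools
import json
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical types for illustration: a structure is a list of Cartesian positions.
Position = Tuple[float, float, float]
Pair = Tuple[int, int]


def neighbor_pairs(positions: Sequence[Position], max_r: float) -> List[Pair]:
    """Brute-force O(N^2) neighbor list, no periodic boundaries (illustration only)."""
    return [
        (i, j)
        for i, j in itertools.permutations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) < max_r
    ]


def precompute_neighbors(structures: Iterable[Sequence[Position]], path: str,
                         min_r: float, max_r: float) -> None:
    """Write cutoff metadata on line 1, then one neighbor list per structure per line."""
    with open(path, "w") as f:
        f.write(json.dumps({"min_r": min_r, "max_r": max_r}) + "\n")
        for positions in structures:
            f.write(json.dumps(neighbor_pairs(positions, max_r)) + "\n")


def stream_neighbors(path: str) -> Iterator[List[Pair]]:
    """Yield one neighbor list at a time instead of loading the whole file into RAM."""
    with open(path) as f:
        next(f)  # skip the metadata line
        for line in f:
            yield [tuple(pair) for pair in json.loads(line)]


def merged_stream(structures: Iterable[Sequence[Position]], neighbor_path: str):
    """Pair each structure with its precomputed neighbors, one sample at a time."""
    return zip(structures, stream_neighbors(neighbor_path))
```

Note that `zip` stops at the shorter input, so a mismatch between the number of structures and stored neighbor lists would pass silently; on Python 3.10+ `zip(..., strict=True)` raises instead.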

Advantages of the Proposed Solution: This approach has two advantages. First, neither the dataset nor its neighbors need to be loaded into RAM, which mitigates the memory constraint. Second, if the same dataset is used for several training runs with the same max_r and min_r values, the precomputation step can be skipped, making the workflow more efficient.
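The second advantage amounts to a cache check on the stored cutoffs. A minimal sketch, assuming the same stand-in file layout as a JSON-lines file whose first line holds {"min_r": ..., "max_r": ...}; with HDF5 these values would more naturally live in the file's attributes (h5py exposes them as `attrs`). `cache_matches`, `ensure_neighbors`, and the `compute` callback are hypothetical names.

```python
import json
import os
from typing import Callable

# Hypothetical signature: compute(path, min_r, max_r) writes the neighbor file,
# with {"min_r": ..., "max_r": ...} as JSON on its first line.
ComputeFn = Callable[[str, float, float], None]


def cache_matches(path: str, min_r: float, max_r: float) -> bool:
    """True if a neighbor file exists and was built with the same min_r and max_r."""
    if not os.path.exists(path):
        return False
    with open(path) as f:
        try:
            meta = json.loads(f.readline())
        except json.JSONDecodeError:
            return False  # empty or corrupt file: treat as a cache miss
    # JSON round-trips Python floats exactly, so plain equality is safe here.
    return (
        isinstance(meta, dict)
        and meta.get("min_r") == min_r
        and meta.get("max_r") == max_r
    )


def ensure_neighbors(path: str, min_r: float, max_r: float, compute: ComputeFn) -> bool:
    """Run the precomputation only when needed; return True if it actually ran."""
    if cache_matches(path, min_r, max_r):
        return False
    compute(path, min_r, max_r)
    return True
```

A second training run with unchanged cutoffs returns `False` without touching the file, while changing either max_r or min_r triggers a fresh precomputation.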

M-R-Schaefer · 2024-04-02T13:00:27Z

#248

M-R-Schaefer · 2024-06-09T11:09:27Z

#281

Tetracarbonylnickel added the enhancement New feature or request label Oct 6, 2023

M-R-Schaefer closed this as completed Jun 9, 2024
