
Kliff DNN torch trainer #185

Merged
4 commits merged into openkim:v1 on Jul 6, 2024
Conversation

@ipcamit commented Jul 5, 2024

Summary

Added a trainer and tests for training generic dense neural networks using libdescriptor and plain PyTorch (as opposed to Lightning).

  • Batched descriptor evaluation
  • DNN model using TorchML driver
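To illustrate the idea, here is a minimal, hypothetical sketch of training a generic dense NN on per-atom descriptors with plain PyTorch (no Lightning). The model name, layer sizes, and descriptor shapes are illustrative assumptions, not KLIFF's actual classes:

```python
import torch
import torch.nn as nn

class DenseModel(nn.Module):
    """Hypothetical dense NN mapping per-atom descriptors to an energy."""

    def __init__(self, descriptor_width: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(descriptor_width, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),  # per-atom energy contribution
        )

    def forward(self, descriptors: torch.Tensor) -> torch.Tensor:
        # Sum per-atom contributions to get the configuration energy
        return self.net(descriptors).sum()

torch.manual_seed(0)
model = DenseModel(descriptor_width=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random data: 10 atoms, 8 components
descriptors = torch.randn(10, 8)
target_energy = torch.tensor(-5.0)

optimizer.zero_grad()
loss = (model(descriptors) - target_energy) ** 2
loss.backward()
optimizer.step()
```

In a real trainer this step would loop over batches produced by the collate function discussed below.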

TODO (if any)

  • Need to add the capability to write the trained model as a valid DUNN driver model.

Checklist

Before a pull request can be merged, the following items must be checked:

  • Make sure your code is properly formatted; isort and black are used for this purpose. The simplest way is to use pre-commit (see the instructions in the repository).
  • Docstrings have been added to your code in the Google docstring format.
  • Type annotations are highly encouraged. Run mypy to type check your code.
  • Tests have been added for any new functionality or bug fixes.
  • All linting and tests pass.

Note that the CI system will run all of the above checks, but it is much more efficient to fix most errors before submitting the PR.

@mjwen mjwen self-requested a review July 6, 2024 01:37
@mjwen mjwen self-assigned this Jul 6, 2024
@@ -82,33 +185,22 @@ def collate(self, batch: Any) -> dict:
"""
# get fingerprint and consistent properties
config_0, property_dict_0 = batch[0]
device = config_0.device
ptr = torch.tensor([0], dtype=torch.int64, device=device)
ptr = np.array([0], dtype=np.intc)
Collaborator:

I think the collated data will be provided as the input to a torch NN model. Any reason to use np.array here but torch.tensor for all other variables?

Collaborator:

Update: NVM, now I understand that the batched data are passed to the descriptor in def _descriptor_eval_batch(self, batch) -> torch.Tensor, which requires numpy array.
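The pattern discussed above can be sketched roughly as follows. This is a hedged illustration of why `ptr` stays a numpy array: the descriptor library expects C-style integer offsets, while model inputs remain torch tensors. The function names and the stubbed descriptor evaluation are assumptions; libdescriptor's real API may differ:

```python
import numpy as np
import torch

def collate_ptr(num_atoms_per_config):
    """Build cumulative atom offsets as np.intc, e.g. [3, 2] -> [0, 3, 5]."""
    ptr = np.zeros(len(num_atoms_per_config) + 1, dtype=np.intc)
    ptr[1:] = np.cumsum(num_atoms_per_config)
    return ptr

def descriptor_eval_batch(coords: np.ndarray, ptr: np.ndarray) -> torch.Tensor:
    # Placeholder for the real per-batch descriptor computation, which
    # consumes numpy arrays; the result is converted back to torch for the NN.
    return torch.from_numpy(coords.astype(np.float64))

ptr = collate_ptr([3, 2])
print(ptr)  # [0 3 5]
```

The key point is the boundary: numpy (with C int offsets) on the descriptor side, torch tensors everywhere the NN model is involved.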

)
self.torchscript_file = None
self.train_dataloader = None
self.validation_dataloader = None
Collaborator:

Do we want to add test_dataloader and test the model at the end of training?

Author:

Yes, I will add it uniformly across all trainers after finalizing this release. The idea is to have a base method test() that can run checks on the test dataset: it will do simple energy and forces tests by default, but will not be limited to them, and could also leverage OpenKIM tests if the user requests.
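A minimal sketch of what such a base method might look like, assuming a dataloader yielding dicts with "descriptors" and "energy" keys (the batch layout and names are hypothetical, not KLIFF's actual API; the function is named run_test here to avoid clashing with Python's test conventions):

```python
import torch

class SumModel(torch.nn.Module):
    """Toy stand-in model: the 'energy' is just the descriptor sum."""

    def forward(self, descriptors: torch.Tensor) -> torch.Tensor:
        return descriptors.sum()

def run_test(model: torch.nn.Module, test_dataloader) -> float:
    """Mean absolute energy error over a test dataloader (hypothetical)."""
    total_err, n_configs = 0.0, 0
    model.eval()
    with torch.no_grad():
        for batch in test_dataloader:
            pred = model(batch["descriptors"])
            total_err += (pred - batch["energy"]).abs().item()
            n_configs += 1
    return total_err / max(n_configs, 1)

# One fake "configuration": predicted energy 3.0 vs reference 2.0
fake_loader = [{"descriptors": torch.ones(3), "energy": torch.tensor(2.0)}]
mae = run_test(SumModel(), fake_loader)
print(mae)  # 1.0
```

A forces check would follow the same loop, comparing per-atom gradients of the predicted energy against reference forces.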

@mjwen (Collaborator) commented Jul 6, 2024

Looks great! Merged.

@mjwen mjwen merged commit 697d54f into openkim:v1 Jul 6, 2024
1 of 4 checks passed