Unified OCP Trainer #520

mshuaibii · 2023-07-07T23:19:14Z

Currently, the ocp repo is limited to two hard-coded trainers: energy and forces. To make the codebase more flexible, this PR consolidates them into a single trainer that can handle any arbitrary target someone may be interested in training on. For the initial release, we aim to support properties up to rank-2 tensors; higher-order properties may be supported in the future.
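To illustrate the idea of a target-agnostic trainer, here is a minimal sketch in plain Python. It is not the code from this PR: the names `flatten`, `mae`, `LOSS_FNS`, and `total_loss`, and the layout of `loss_config`, are all hypothetical and chosen only for illustration. It shows how one loss routine can cover a scalar (e.g. energy), a vector per atom (e.g. forces), or a rank-2 tensor (e.g. stress) by treating every target as a nested list.

```python
from typing import Callable, Dict, List, Sequence, Union

# A target may be a scalar, a vector, or a rank-2 (nested) list.
Nested = Union[float, Sequence]


def flatten(value: Nested) -> List[float]:
    """Flatten a scalar or arbitrarily nested list into a flat list of floats."""
    if isinstance(value, (int, float)):
        return [float(value)]
    flat: List[float] = []
    for item in value:
        flat.extend(flatten(item))
    return flat


def mae(pred: Nested, target: Nested) -> float:
    """Mean absolute error averaged over all elements, whatever the shape."""
    p, t = flatten(pred), flatten(target)
    if len(p) != len(t) or not p:
        raise ValueError("prediction and target must be non-empty and equal in size")
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)


# Hypothetical registry mapping a config string to a loss function.
LOSS_FNS: Dict[str, Callable[[Nested, Nested], float]] = {"mae": mae}


def total_loss(
    predictions: Dict[str, Nested],
    targets: Dict[str, Nested],
    loss_config: Dict[str, Dict[str, object]],
) -> float:
    """Sum coefficient-weighted losses over every target named in loss_config."""
    total = 0.0
    for name, spec in loss_config.items():
        loss_fn = LOSS_FNS[str(spec["fn"])]
        total += float(spec["coefficient"]) * loss_fn(predictions[name], targets[name])
    return total
```

With this layout, adding a new property only requires a new entry in `loss_config` and matching keys in the prediction and target dictionaries; no per-target trainer class is needed. The config format the repo actually uses may differ.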

Tracking desired changes and improvements:

Test Plan

Multi-GPU tests to ensure the DataParallel (DP) deprecation went smoothly. Results are compiled in https://docs.google.com/spreadsheets/d/1NbonjL7pwC0kZDojpgLSwn9u6G4p9atU8OylslApisw/edit?usp=sharing with corresponding wandb links.

OC20

Training:
- S2EF-All
  - GemNet-OC
  - EqV2
- IS2RE-All
  - D++

Validation: Ensure consistent metrics on S2EF-Val-ID-30k
- GemNet-OC
- EqV2

Predictions: Compare MAE between two repo states on S2EF-Val-ID-30k
- GemNet-OC
- EqV2

Relaxations: Compare MAE of relaxed energies of two repo states on 100 batches of IS2RE-Val-ID
- GemNet-OC
- EqV2
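
The "compare MAE between two repo states" checks above could be scripted roughly as follows. This is a sketch under stated assumptions, not the script used for this PR: it assumes the predictions from each repo state have already been loaded into dictionaries of flat lists keyed by property name, and the function names and the tolerance value are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_absolute_error(a: Sequence[float], b: Sequence[float]) -> float:
    """MAE between two equally sized, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def compare_predictions(
    before: Dict[str, Sequence[float]],
    after: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Per-property MAE between predictions from two repo states."""
    if before.keys() != after.keys():
        raise ValueError("both states must predict the same properties")
    return {key: mean_absolute_error(before[key], after[key]) for key in before}


def exceeds_tolerance(report: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Names of properties whose MAE is above tol (tol is an assumed threshold)."""
    return sorted(key for key, value in report.items() if value > tol)
```

An empty list from `exceeds_tolerance` would indicate that the two repo states agree to within the chosen tolerance for every property compared.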


* initial single trainer commit
* more general evaluator
* backwards tasks
* debug config
* predict support, evaluator cleanup
* cleanup, remove hpo
* loss bugfix, cleanup hpo
* backwards compatability for old configs
* backwards breaking fix
* eval fix
* remove old imports
* default for get task metrics
* rebase cleanup
* config refactor support
* black
* reorganize free_atoms
* output config fix
* config naming
* support loss mean over all dimensions
* config backwards support
* equiformer can now run
* add example equiformer config
* handle arbitrary torch loss fns
* correct primary metric def
* update s2ef portion of OCP tutorial
* add type annotations
* cleanup
* Type annotations
* Abstract out _get_timestamp
* don't double ids when saving prediction results
* clip_grad_norm should be float
* model compatibility
* evaluator test fix
* lint
* remove old models
* pass calculator test
* remove DP, cleanup
* remove comments
* eqv2 support
* odac energy trainer merge fix
* is2re support
* cleanup
* config cleanup
* oc22 support
* introduce collater to handle otf_graph arg
* organize methods
* include parent in targets
* shape flexibility
* cleanup debug lines
* cleanup
* normalizer bugfix for new configs
* calculator normalization fix, backwards support for ckpt loads
* New weight_decay config -- defaults in BaseModel, extendable by others (e.g. EqV2)
* Doc update
* Throw a warning instead of a hard error for optim.weight_decay
* EqV2 readme update
* Config update
* don't need transform on inference lmdbs with no ground truth
* remove debug configs
* ocp-2.0 example.yml
* take out ocpdataparallel from fit.py
* linter
* update tutorials

---------

Co-authored-by: Janice Lan <[email protected]>
Co-authored-by: Richard Barnes <[email protected]>
Co-authored-by: Abhishek Das <[email protected]>

mshuaibii added the dont-close label Jul 13, 2023

mshuaibii and others added 12 commits July 18, 2023 10:47

initial single trainer commit

9599f42

more general evaluator

68afdeb

backwards tasks

3c62f4a

debug config

569375c

predict support, evaluator cleanup

2e284cc

cleanup, remove hpo

ba97e97

loss bugfix, cleanup hpo

8af0f90

backwards compatability for old configs

d452675

backwards breaking fix

adba02c

eval fix

8bac184

remove old imports

4961bb1

default for get task metrics

99eb482

mshuaibii force-pushed the ocp_trainer branch from c9f3980 to 99eb482 Compare July 18, 2023 18:45

rebase cleanup

a269544

emsunshine mentioned this pull request Jul 17, 2023

Generalized AtomsToData class #516

Closed

mshuaibii and others added 10 commits July 19, 2023 11:55

config refactor support

448c567

Merge branch 'main' into ocp_trainer

12ec31f

black

15fdc56

reorganize free_atoms

c47111f

output config fix

eacd66b

config naming

024bc86

support loss mean over all dimensions

5f47f8a

config backwards support

0a7d815

equiformer can now run

73fba56

add example equiformer config

efd956d

This was referenced Jul 27, 2023

Config trainer is overwritten to "forces" or "energy" #543

Closed

main.py issues/improvements for mass inference with an ase-db with a checkpoint #539

Open

mshuaibii added 2 commits July 27, 2023 14:26

handle arbitrary torch loss fns

4477f90

correct primary metric def

0bd8935

mshuaibii requested review from janiceblue and anuroopsriram November 7, 2023 23:34

mshuaibii added 2 commits November 7, 2023 16:12

cleanup debug lines

cc6c6c2

cleanup

d2bdc6e

anuroopsriram reviewed Nov 10, 2023

ocpmodels/common/utils.py (resolved)

ocpmodels/datasets/oc22_lmdb_dataset.py (resolved)

mshuaibii and others added 11 commits November 14, 2023 15:51

normalizer bugfix for new configs

9984ae7

calculator normalization fix, backwards support for ckpt loads

d278b6e

New weight_decay config -- defaults in BaseModel, extendable by others (e.g. EqV2)

caf611f

Doc update

e7e2282

Throw a warning instead of a hard error for optim.weight_decay

af06723

EqV2 readme update

ccda09f

Config update

e11dba6

don't need transform on inference lmdbs with no ground truth

9f86d2e

Merge branch 'main' into ocp_trainer

54d606e

remove debug configs

e8c1c6f

ocp-2.0 example.yml

d3d7e1c

abhshkdz previously approved these changes Jan 4, 2024

take out ocpdataparallel from fit.py

ddac40a

janiceblue dismissed abhshkdz’s stale review via ddac40a January 4, 2024 23:32

janiceblue and others added 2 commits January 5, 2024 00:09

linter

3ab12b4

update tutorials

bc7b5cf

abhshkdz approved these changes Jan 5, 2024

mshuaibii removed the request for review from janiceblue January 5, 2024 04:51

mshuaibii added this pull request to the merge queue Jan 5, 2024

Merged via the queue into main with commit 1382a35 Jan 5, 2024
5 checks passed

lbluque mentioned this pull request Jan 26, 2024

Ase dataset updates #622

Merged

janiceblue mentioned this pull request Apr 12, 2024

[BE] Update all configs to use ocp2.0 format #653

Merged
