Unified OCP Trainer #520

Merged: 67 commits merged into main on Jan 5, 2024

@mshuaibii (Collaborator) commented Jul 7, 2023

Currently, the ocp repo is limited to two hard-coded trainers: energy and forces. To make the codebase more flexible, this PR consolidates them into a single trainer that can handle any arbitrary targets someone may be interested in training on. For the initial release, we aim to support properties up to rank-2 tensors, with higher-order properties possibly supported in the future.

Tracking desired changes and improvements:

  • Trainer
    • compute_loss()
    • compute_metrics()
    • validate()
    • Evaluator() refactor
    • predict()
    • save()
  • Config backward compatibility support
  • Handle arbitrary loss functions
  • Remove the DataParallel (DP) implementation, retaining only DistributedDataParallel (DDP).
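
To make the "arbitrary targets" and "arbitrary loss functions" items concrete, here is a minimal sketch of how a single `compute_loss()` could sum per-target losses. It is not the implementation in this PR: `LossSpec`, the plain-Python `mae`/`mse` helpers, and the flattened-list data layout are all assumptions made for illustration, and a real trainer would operate on torch tensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A loss function maps (predictions, targets) to a single float.
LossFn = Callable[[Sequence[float], Sequence[float]], float]


def _check_lengths(pred: Sequence[float], target: Sequence[float]) -> None:
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error over all (flattened) components."""
    _check_lengths(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over all (flattened) components."""
    _check_lengths(pred, target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


@dataclass
class LossSpec:
    """Hypothetical description of one training target (not the PR's schema)."""
    target: str               # key into the prediction and batch dicts
    fn: LossFn                # any callable with the LossFn signature
    coefficient: float = 1.0  # weight of this term in the total loss


def compute_loss(
    predictions: Dict[str, Sequence[float]],
    batch: Dict[str, Sequence[float]],
    specs: List[LossSpec],
) -> float:
    """Weighted sum of per-target losses; targets are looked up by name."""
    total = 0.0
    for spec in specs:
        total += spec.coefficient * spec.fn(predictions[spec.target], batch[spec.target])
    return total


# Usage: energy (rank 0) and forces (rank 1, flattened) with different losses.
specs = [LossSpec("energy", mae, 1.0), LossSpec("forces", mse, 100.0)]
predictions = {"energy": [1.0], "forces": [0.0, 0.5, 1.0]}
batch = {"energy": [1.5], "forces": [0.0, 0.0, 1.0]}
total_loss = compute_loss(predictions, batch, specs)
```

Because targets are looked up by name, adding a rank-2 property in this sketch only requires another `LossSpec` entry and a flattened tensor under the same key in both dicts.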

Test Plan

Multi-GPU tests to ensure the DP deprecation went smoothly. Results are compiled in https://docs.google.com/spreadsheets/d/1NbonjL7pwC0kZDojpgLSwn9u6G4p9atU8OylslApisw/edit?usp=sharing, with corresponding wandb links.

OC20

  • Training
    • S2EF-All
      • GemNet-OC
      • EqV2
    • IS2RE-All
      • D++
  • Validation: Ensure consistent metrics on S2EF-Val-ID-30k
    • GemNet-OC
    • EqV2
  • Predictions: Compare MAE between two repo states on S2EF-Val-ID-30k
    • GemNet-OC
    • EqV2
  • Relaxations: Compare MAE of relaxed energies of two repo states on 100 batches of IS2RE-Val-ID
    • GemNet-OC
    • EqV2
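
The prediction and relaxation checks above amount to computing the MAE for each repo state against the same ground truth and confirming that the two values agree. A minimal sketch of that comparison follows. The dict-of-floats layout keyed by system id, the function names, and the tolerance are assumptions for illustration, not the scripts actually used for this PR.

```python
import math
from typing import Dict


def mae(pred: Dict[str, float], truth: Dict[str, float]) -> float:
    """Mean absolute error over every system id present in the ground truth."""
    missing = truth.keys() - pred.keys()
    if missing:
        raise KeyError(f"missing predictions for: {sorted(missing)}")
    return sum(abs(pred[k] - truth[k]) for k in truth) / len(truth)


def states_agree(
    pred_old: Dict[str, float],
    pred_new: Dict[str, float],
    truth: Dict[str, float],
    tol: float = 1e-6,  # assumed tolerance; choose one suited to the property
) -> bool:
    """True if both repo states yield the same MAE to within tol."""
    return math.isclose(mae(pred_old, truth), mae(pred_new, truth), abs_tol=tol)


# Usage with made-up ids and energies.
truth = {"sys_a": 1.0, "sys_b": 2.0}
old_state = {"sys_a": 1.1, "sys_b": 1.8}
new_state = {"sys_a": 1.1, "sys_b": 1.8}
same = states_agree(old_state, new_state, truth)
```

Matching MAEs do not prove the predictions are identical, so a stricter variant could compare the two prediction sets entry by entry.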

abhshkdz
abhshkdz previously approved these changes Jan 4, 2024
@mshuaibii mshuaibii removed the request for review from janiceblue January 5, 2024 04:51
@mshuaibii mshuaibii added this pull request to the merge queue Jan 5, 2024
Merged via the queue into main with commit 1382a35 Jan 5, 2024
5 checks passed
@lbluque lbluque mentioned this pull request Jan 26, 2024
levineds pushed a commit that referenced this pull request Jul 11, 2024
* initial single trainer commit

* more general evaluator

* backwards tasks

* debug config

* predict support, evaluator cleanup

* cleanup, remove hpo

* loss bugfix, cleanup hpo

* backwards compatibility for old configs

* backwards breaking fix

* eval fix

* remove old imports

* default for get task metrics

* rebase cleanup

* config refactor support

* black

* reorganize free_atoms

* output config fix

* config naming

* support loss mean over all dimensions

* config backwards support

* equiformer can now run

* add example equiformer config

* handle arbitrary torch loss fns

* correct primary metric def

* update s2ef portion of OCP tutorial

* add type annotations

* cleanup

* Type annotations

* Abstract out _get_timestamp

* don't double ids when saving prediction results

* clip_grad_norm should be float

* model compatibility

* evaluator test fix

* lint

* remove old models

* pass calculator test

* remove DP, cleanup

* remove comments

* eqv2 support

* odac energy trainer merge fix

* is2re support

* cleanup

* config cleanup

* oc22 support

* introduce collater to handle otf_graph arg

* organize methods

* include parent in targets

* shape flexibility

* cleanup debug lines

* cleanup

* normalizer bugfix for new configs

* calculator normalization fix, backwards support for ckpt loads

* New weight_decay config -- defaults in BaseModel, extendable by others (e.g. EqV2)

* Doc update

* Throw a warning instead of a hard error for optim.weight_decay

* EqV2 readme update

* Config update

* don't need transform on inference lmdbs with no ground truth

* remove debug configs

* ocp-2.0 example.yml

* take out ocpdataparallel from fit.py

* linter

* update tutorials

---------

Co-authored-by: Janice Lan <[email protected]>
Co-authored-by: Richard Barnes <[email protected]>
Co-authored-by: Abhishek Das <[email protected]>

7 participants