
Updated loss and eval metrics #896

Merged: 16 commits into main from loss_and_eval on Oct 30, 2024

Conversation

@wood-b (Collaborator) commented Oct 25, 2024

This PR builds on #821

Loss Updates:

  • Adds loss functions to the registry
  • Updates how the loss is actually computed: the loss should now be equivalent to not using our DDP wrapper, i.e., computing the MAE/MSE loss with the corresponding torch module should give the same answer.
  • Adds per atom mae loss
  • Renames L2MAELoss to L2NormLoss. l2mae is still part of the registry for backwards compatibility with existing configs.
  • Removes the AtomwiseL2Loss
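
A minimal sketch of the renamed and added losses described above, assuming typical tensor shapes and reductions (the actual fairchem implementations may differ):

```python
import torch

class L2NormLoss(torch.nn.Module):
    """Sketch of the L2MAELoss -> L2NormLoss rename: the L2 norm of each
    per-sample error vector, averaged over samples. The mean reduction is
    an assumption, not confirmed from the repo."""
    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return torch.linalg.norm(pred - target, dim=-1).mean()

def per_atom_mae(pred_energy: torch.Tensor,
                 target_energy: torch.Tensor,
                 natoms: torch.Tensor) -> torch.Tensor:
    """Sketch of a per-atom MAE: the energy error of each structure is
    normalized by its atom count before averaging."""
    return torch.mean(torch.abs(pred_energy - target_energy) / natoms)
```

For example, a force error vector of (3, 4) has L2 norm 5, and a 2 eV energy error on a 2-atom structure gives a per-atom MAE of 1 eV/atom.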

Evaluator Updates:

  • Adds new per-atom and other metrics
  • Adds metrics dicts function to the evaluator
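
As an illustration of the per-atom metrics (hypothetical key names, not the evaluator's actual schema), such a metric typically returns a dict carrying the mean plus the raw total and count so batches can be aggregated exactly:

```python
import torch

def per_atom_energy_mae(prediction: dict, target: dict) -> dict:
    # Hypothetical sketch: the "energy" and "natoms" keys are assumptions
    # about the batch dictionaries, not fairchem's actual evaluator API.
    error = torch.abs(prediction["energy"] - target["energy"]) / target["natoms"]
    return {
        "metric": error.mean().item(),  # batch-mean per-atom error
        "total": error.sum().item(),    # kept so cross-batch means stay exact
        "numel": error.numel(),
    }
```

Keeping "total" and "numel" alongside "metric" lets an evaluator compute a correct global mean even when batches have different sizes.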

Tests:

  • A bunch of new unit tests for losses, including DDP tests. One caveat: if PyTorch ever changes how gradient averaging is done, these tests won't capture that.
  • Comparing first batch E/F loss pre/post changes
  • Compare training of OC20 2M pre/post changes (see below)
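
The DDP-equivalence property being tested can be sketched without launching distributed processes (this is an illustration of the invariant, not the repo's actual test): combining per-shard loss sums by the global sample count must reproduce the plain torch loss on the full batch.

```python
import torch
import torch.nn.functional as F

# Full-batch MAE, as computed by the plain torch module.
pred, target = torch.arange(8.0), torch.zeros(8)
full_batch_loss = F.l1_loss(pred, target)

# Split the batch across two unequal "ranks", reduce with sum, then
# divide by the global sample count rather than averaging per-rank means.
shards = [(pred[:3], target[:3]), (pred[3:], target[3:])]
shard_sums = sum(F.l1_loss(p, t, reduction="sum") for p, t in shards)
combined = shard_sums / pred.numel()

assert torch.allclose(full_batch_loss, combined)
```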

Energy, forces, and loss all look comparable between eqv2-main (current main) and eqv2-new-loss-test (this PR), trained for 3 epochs on OC20 2M.

[Two screenshots of the training curves, captured Oct 27, 2024]

@wood-b wood-b requested review from lbluque and misko October 25, 2024 19:23
@wood-b wood-b added minor Minor version release enhancement New feature or request labels Oct 25, 2024

codecov bot commented Oct 26, 2024

Codecov Report

Attention: Patch coverage is 97.14286% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/fairchem/core/modules/evaluator.py | 93.54% | 2 Missing ⚠️ |
| src/fairchem/core/modules/loss.py | 98.14% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/fairchem/core/common/registry.py | 73.22% <100.00%> (+2.53%) ⬆️ |
| src/fairchem/core/common/utils.py | 68.26% <ø> (+0.02%) ⬆️ |
| src/fairchem/core/trainers/base_trainer.py | 87.75% <100.00%> (+0.16%) ⬆️ |
| src/fairchem/core/trainers/ocp_trainer.py | 69.66% <100.00%> (+0.30%) ⬆️ |
| src/fairchem/core/modules/loss.py | 95.77% <98.14%> (+29.70%) ⬆️ |
| src/fairchem/core/modules/evaluator.py | 91.80% <93.54%> (-1.66%) ⬇️ |

@wood-b wood-b requested a review from rayg1234 October 28, 2024 17:47
lbluque previously approved these changes Oct 29, 2024

@lbluque (Collaborator) left a comment:
Thanks @wood-b ! Just left minor comments.

I double-checked training with this, and it looks as expected; energy is slightly better, likely because the stress is now weighted slightly differently.

Resolved review threads on src/fairchem/core/modules/loss.py


class DDPLoss(nn.Module):
"""
This class is a wrapper around a loss function that does a few things
thanks for adding this comment, very helpful
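
The reason a wrapper like DDPLoss is needed can be illustrated with hypothetical numbers: DDP averages gradients uniformly across ranks, so naively averaging per-rank mean losses over unequal batch sizes does not equal the global mean.

```python
import torch

# Two "ranks" with unequal batch sizes (hypothetical loss values).
rank_losses = [torch.tensor([1.0, 3.0]), torch.tensor([5.0])]

# What uniform averaging across ranks would effectively compute.
naive = sum(l.mean() for l in rank_losses) / len(rank_losses)  # (2.0 + 5.0) / 2
# The answer a single-process run would give.
global_mean = torch.cat(rank_losses).mean()                    # 9.0 / 3

# Correct reduction: per-rank sums divided by the global sample count.
total_samples = sum(l.numel() for l in rank_losses)
corrected = sum(l.sum() for l in rank_losses) / total_samples
assert torch.isclose(corrected, global_mean)
```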

rayg1234 previously approved these changes Oct 29, 2024
@wood-b wood-b dismissed stale reviews from rayg1234 and lbluque via 3a109e6 October 29, 2024 17:32
@wood-b wood-b added this pull request to the merge queue Oct 30, 2024
Merged via the queue into main with commit fbec2d3 Oct 30, 2024
8 checks passed
@wood-b wood-b deleted the loss_and_eval branch October 30, 2024 18:06
4 participants