Unified OCP Trainer #520

Merged on Jan 5, 2024 (67 commits)
Commits
9599f42
initial single trainer commit
mshuaibii Jul 7, 2023
68afdeb
more general evaluator
mshuaibii Jul 11, 2023
3c62f4a
backwards tasks
mshuaibii Jul 11, 2023
569375c
debug config
mshuaibii Jul 11, 2023
2e284cc
predict support, evaluator cleanup
mshuaibii Jul 12, 2023
ba97e97
cleanup, remove hpo
mshuaibii Jul 12, 2023
8af0f90
loss bugfix, cleanup hpo
mshuaibii Jul 13, 2023
d452675
backwards compatibility for old configs
mshuaibii Jul 13, 2023
adba02c
backwards breaking fix
mshuaibii Jul 14, 2023
8bac184
eval fix
mshuaibii Jul 14, 2023
4961bb1
remove old imports
janiceblue Jul 17, 2023
99eb482
default for get task metrics
janiceblue Jul 18, 2023
a269544
rebase cleanup
mshuaibii Jul 18, 2023
448c567
config refactor support
mshuaibii Jul 19, 2023
12ec31f
Merge branch 'main' into ocp_trainer
mshuaibii Jul 19, 2023
15fdc56
black
mshuaibii Jul 19, 2023
c47111f
reorganize free_atoms
mshuaibii Jul 20, 2023
eacd66b
output config fix
mshuaibii Jul 20, 2023
024bc86
config naming
mshuaibii Jul 20, 2023
5f47f8a
support loss mean over all dimensions
janiceblue Jul 21, 2023
0a7d815
config backwards support
mshuaibii Jul 21, 2023
73fba56
equiformer can now run
janiceblue Jul 25, 2023
efd956d
add example equiformer config
janiceblue Jul 26, 2023
4477f90
handle arbitrary torch loss fns
mshuaibii Jul 27, 2023
0bd8935
correct primary metric def
mshuaibii Aug 1, 2023
ac13093
update s2ef portion of OCP tutorial
mshuaibii Aug 1, 2023
929c2fb
add type annotations
mshuaibii Aug 9, 2023
f7b76ec
cleanup
mshuaibii Aug 9, 2023
55e71b3
Type annotations
r-barnes Aug 9, 2023
4b5e2a0
Abstract out _get_timestamp
r-barnes Aug 9, 2023
32ef93c
don't double ids when saving prediction results
janiceblue Aug 31, 2023
18f77dc
clip_grad_norm should be float
janiceblue Sep 7, 2023
49076b5
Merge branch 'main' into ocp_trainer
mshuaibii Oct 27, 2023
c1d06aa
model compatibility
mshuaibii Oct 27, 2023
7fa3870
evaluator test fix
mshuaibii Oct 27, 2023
4371bfa
lint
mshuaibii Oct 27, 2023
1abf998
remove old models
mshuaibii Oct 27, 2023
8395a3a
pass calculator test
mshuaibii Nov 2, 2023
a49bb4a
remove DP, cleanup
mshuaibii Nov 3, 2023
1f5a6be
remove comments
mshuaibii Nov 3, 2023
72a90d7
eqv2 support
mshuaibii Nov 3, 2023
396c1e7
Merge branch 'main' into ocp_trainer
mshuaibii Nov 3, 2023
2a82f56
odac energy trainer merge fix
mshuaibii Nov 3, 2023
843fbbd
is2re support
mshuaibii Nov 6, 2023
4566c23
cleanup
mshuaibii Nov 6, 2023
92336ec
config cleanup
mshuaibii Nov 6, 2023
371ad84
oc22 support
mshuaibii Nov 7, 2023
de2a6ad
introduce collater to handle otf_graph arg
mshuaibii Nov 7, 2023
5df5120
organize methods
mshuaibii Nov 7, 2023
2f793a8
include parent in targets
mshuaibii Nov 7, 2023
26179df
shape flexibility
mshuaibii Nov 7, 2023
cc6c6c2
cleanup debug lines
mshuaibii Nov 8, 2023
d2bdc6e
cleanup
mshuaibii Nov 8, 2023
9984ae7
normalizer bugfix for new configs
mshuaibii Nov 14, 2023
d278b6e
calculator normalization fix, backwards support for ckpt loads
mshuaibii Nov 17, 2023
caf611f
New weight_decay config -- defaults in BaseModel, extendable by other…
abhshkdz Dec 11, 2023
e7e2282
Doc update
abhshkdz Dec 11, 2023
af06723
Throw a warning instead of a hard error for optim.weight_decay
abhshkdz Dec 11, 2023
ccda09f
EqV2 readme update
abhshkdz Dec 11, 2023
e11dba6
Config update
abhshkdz Dec 11, 2023
9f86d2e
don't need transform on inference lmdbs with no ground truth
janiceblue Dec 20, 2023
54d606e
Merge branch 'main' into ocp_trainer
mshuaibii Jan 4, 2024
e8c1c6f
remove debug configs
mshuaibii Jan 4, 2024
d3d7e1c
ocp-2.0 example.yml
mshuaibii Jan 4, 2024
ddac40a
take out ocpdataparallel from fit.py
janiceblue Jan 4, 2024
3ab12b4
linter
janiceblue Jan 5, 2024
bc7b5cf
update tutorials
mshuaibii Jan 5, 2024
1 change: 0 additions & 1 deletion .pre-commit-config.yaml
@@ -3,7 +3,6 @@ repos:
    rev: 22.3.0
    hooks:
      - id: black
        language_version: python3.8
        additional_dependencies: ['click==8.0.4']
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
129 changes: 129 additions & 0 deletions configs/goc_oc20_debug.yml
@@ -0,0 +1,129 @@
trainer: ocp

dataset:
  train:
    format: lmdb
    src: /datasets01/open_catalyst/oc20/082422/struct_to_energy_forces/train/2M
    key_mapping:
      y: energy
      force: forces
    transforms:
      normalizer:
        energy:
          mean: -0.7554450631141663
          stdev: 2.887317180633545
        forces:
          mean: 0
          stdev: 2.887317180633545
  val:
    src: /datasets01/open_catalyst/oc20/082422/struct_to_energy_forces/val/id_30k
  test:
    src: /datasets01/open_catalyst/oc20/082422/struct_to_energy_forces/val/id_30k

logger: tensorboard

loss_functions:
  - energy:
      fn: mae
      coefficient: 1
  - forces:
      fn: l2mae
      coefficient: 100

evaluation_metrics:
  metrics:
    energy:
      - mae
      - mse
      - energy_within_threshold
    forces:
      - mae
      - cosine_similarity
    misc:
      - energy_forces_within_threshold
  primary_metric: forces_mae

outputs:
  energy:
    shape: 1
    level: system
  forces:
    shape: 3
    level: atom
    train_on_free_atoms: True
    eval_on_free_atoms: True

model:
  name: gemnet_oc
  num_spherical: 7
  num_radial: 128
  num_blocks: 4
  emb_size_atom: 256
  emb_size_edge: 512
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_rbf: 16
  emb_size_cbf: 16
  emb_size_sbf: 32
  num_before_skip: 2
  num_after_skip: 2
  num_concat: 1
  num_atom: 3
  num_output_afteratom: 3
  cutoff: 12.0
  cutoff_qint: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  max_neighbors: 30
  max_neighbors_qint: 8
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  rbf:
    name: gaussian
  envelope:
    name: polynomial
    exponent: 5
  cbf:
    name: spherical_harmonics
  sbf:
    name: legendre_outer
  extensive: True
  output_init: HeOrthogonal
  activation: silu
  scale_file: configs/s2ef/all/gemnet/scaling_factors/gemnet-oc.pt

  regress_forces: True
  direct_forces: True
  forces_coupled: False

  quad_interaction: True
  atom_edge_interaction: True
  edge_atom_interaction: True
  atom_interaction: True

  num_atom_emb_layers: 2
  num_global_out_layers: 2
  qint_tags: [1, 2]
  otf_graph: True

optim:
  batch_size: 4
  eval_batch_size: 4
  load_balancing: atoms
  eval_every: 5000
  num_workers: 2
  lr_initial: 5.e-4
  optimizer: AdamW
  optimizer_params: {"amsgrad": True}
  scheduler: ReduceLROnPlateau
  mode: min
  factor: 0.8
  patience: 3
  max_epochs: 80
  ema_decay: 0.999
  clip_grad_norm: 10
  weight_decay: 0
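
Note on the `normalizer` and `loss_functions` entries in the config above: the following is a minimal sketch of how such settings could combine into one training loss. It assumes `mae` is the mean absolute error, `l2mae` is the mean over atoms of the Euclidean norm of the per-atom force error, and the total is the coefficient-weighted sum of the two terms. The function names (`normalize`, `mae`, `l2mae`, `total_loss`) are hypothetical and are not the trainer's actual API; only the numeric values are taken from the config.

```python
import math

# Values copied from configs/goc_oc20_debug.yml above.
NORMALIZER = {
    "energy": {"mean": -0.7554450631141663, "stdev": 2.887317180633545},
    "forces": {"mean": 0.0, "stdev": 2.887317180633545},
}
COEFFICIENTS = {"energy": 1.0, "forces": 100.0}


def normalize(value, key):
    """Shift and scale a raw target with the per-target mean and stdev."""
    stats = NORMALIZER[key]
    return (value - stats["mean"]) / stats["stdev"]


def mae(pred, target):
    """Mean absolute error over a flat list of scalars (assumed meaning of `mae`)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2mae(pred, target):
    """Mean over atoms of the L2 norm of the error vector (assumed meaning of `l2mae`)."""
    norms = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(norms) / len(norms)


def total_loss(pred, target):
    """Coefficient-weighted sum of the energy and force terms."""
    energy_term = COEFFICIENTS["energy"] * mae(pred["energy"], target["energy"])
    forces_term = COEFFICIENTS["forces"] * l2mae(pred["forces"], target["forces"])
    return energy_term + forces_term


# Toy example: one system with two atoms, values already in normalized units.
pred = {"energy": [0.0], "forces": [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]}
target = {"energy": [1.0], "forces": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]}
print(total_loss(pred, target))  # 1 * 1.0 + 100 * 2.5 = 251.0
```

With a force coefficient of 100, the force term dominates the total in this toy case, which illustrates why the relative scale of the two coefficients matters when reading these configs.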
162 changes: 162 additions & 0 deletions configs/goc_stress_debug.yml
@@ -0,0 +1,162 @@
trainer: ocp

dataset:
  train:
    format: lmdb
    src: /checkpoint/saro00/mpf_datasets/s2efs/0/train.lmdb
    key_mapping:
      y: energy
      force: forces
      stress: stress
    transforms:
      decompose_tensor:
        tensor: stress
        rank: 2
        decomposition:
          isotropic_stress:
            irrep_dim: 0
          anisotropic_stress:
            irrep_dim: 2
      normalizer:
        energy:
          mean: -5.9749126
          stdev: 1.866159
        forces:
          mean: 0
          stdev: 1.866159
        isotropic_stress:
          mean: 43.27065
          stdev: 674.1657344451734
        anisotropic_stress:
          stdev: 143.72764771869745
  val:
    src: /checkpoint/saro00/mpf_datasets/s2efs/0/val.lmdb
  test:
    src: /checkpoint/saro00/mpf_datasets/s2efs/0/val.lmdb

logger: tensorboard

loss_functions:
  - energy:
      fn: mae
      coefficient: 1
  - forces:
      fn: l2mae
      coefficient: 100
  - isotropic_stress:
      fn: mae
  - anisotropic_stress:
      fn: mae

evaluation_metrics:
  metrics:
    energy:
      - mae
      - mse
      - energy_within_threshold
    forces:
      - mae
      - cosine_similarity
    isotropic_stress:
      - mae
    anisotropic_stress:
      - mae
    stress:
      - stress_mae_from_decomposition
    misc:
      - energy_forces_within_threshold
  primary_metric: forces_mae

outputs:
  energy:
    shape: 1
    level: system
  forces:
    shape: 3
    level: atom
    train_on_free_atoms: True
    eval_on_free_atoms: True

  stress:
    level: system
    decomposition:
      isotropic_stress:
        irrep_dim: 0
      anisotropic_stress:
        irrep_dim: 2

model:
  name: gemnet_oc
  num_spherical: 7
  num_radial: 128
  num_blocks: 4
  emb_size_atom: 256
  emb_size_edge: 512
  emb_size_trip_in: 64
  emb_size_trip_out: 64
  emb_size_quad_in: 32
  emb_size_quad_out: 32
  emb_size_aint_in: 64
  emb_size_aint_out: 64
  emb_size_rbf: 16
  emb_size_cbf: 16
  emb_size_sbf: 32
  num_before_skip: 2
  num_after_skip: 2
  num_concat: 1
  num_atom: 3
  num_output_afteratom: 3
  cutoff: 12.0
  cutoff_qint: 12.0
  cutoff_aeaint: 12.0
  cutoff_aint: 12.0
  max_neighbors: 30
  max_neighbors_qint: 8
  max_neighbors_aeaint: 20
  max_neighbors_aint: 1000
  rbf:
    name: gaussian
  envelope:
    name: polynomial
    exponent: 5
  cbf:
    name: spherical_harmonics
  sbf:
    name: legendre_outer
  extensive: True
  output_init: HeOrthogonal
  activation: silu
  scale_file: configs/s2ef/all/gemnet/scaling_factors/gemnet-oc.pt

  regress_forces: True
  direct_forces: True
  forces_coupled: False

  quad_interaction: True
  atom_edge_interaction: True
  edge_atom_interaction: True
  atom_interaction: True

  num_elements: 100
  num_atom_emb_layers: 2
  num_global_out_layers: 2
  qint_tags: [1, 2]
  otf_graph: True

optim:
  batch_size: 4
  eval_batch_size: 4
  load_balancing: atoms
  eval_every: 5000
  num_workers: 2
  lr_initial: 5.e-4
  optimizer: AdamW
  optimizer_params: {"amsgrad": True}
  scheduler: ReduceLROnPlateau
  mode: min
  factor: 0.8
  patience: 3
  max_epochs: 80
  ema_decay: 0.999
  clip_grad_norm: 10
  weight_decay: 0
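
Note on the `decompose_tensor` transform and the `stress` output in the config above: the sketch below illustrates, at the matrix level, what splitting a rank-2 stress tensor into an isotropic part (`irrep_dim: 0`) and an anisotropic part (`irrep_dim: 2`) could look like. It assumes the stress tensor is symmetric, so the antisymmetric component vanishes. The isotropic part is then one third of the trace, and the anisotropic part is the symmetric traceless remainder with five independent components. The function names are hypothetical, and the repository's transform may use a different basis or normalization (for example a 5-component irrep vector rather than a 3x3 matrix).

```python
def decompose_stress(stress):
    """Split a 3x3 tensor into an isotropic scalar and a symmetric traceless matrix."""
    # Symmetrize first; for a physically symmetric stress tensor this is a no-op.
    sym = [[0.5 * (stress[i][j] + stress[j][i]) for j in range(3)] for i in range(3)]
    # Isotropic part: mean of the diagonal (one third of the trace).
    iso = (sym[0][0] + sym[1][1] + sym[2][2]) / 3.0
    # Anisotropic part: subtract the isotropic value from the diagonal only.
    aniso = [[sym[i][j] - (iso if i == j else 0.0) for j in range(3)] for i in range(3)]
    return iso, aniso


def recompose_stress(iso, aniso):
    """Rebuild the symmetric tensor from its two parts."""
    return [[aniso[i][j] + (iso if i == j else 0.0) for j in range(3)] for i in range(3)]


# Toy symmetric tensor (arbitrary units).
stress = [
    [3.0, 1.0, 0.0],
    [1.0, 6.0, 2.0],
    [0.0, 2.0, 0.0],
]
iso, aniso = decompose_stress(stress)
print(iso)                                   # 3.0
print(aniso[0][0] + aniso[1][1] + aniso[2][2])  # 0.0 (traceless)
print(recompose_stress(iso, aniso) == stress)   # True
```

Under this reading, a metric such as `stress_mae_from_decomposition` would presumably compare the recombined tensor against the reference stress, while the separate `isotropic_stress` and `anisotropic_stress` entries allow each part to be normalized and weighted on its own scale.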