-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression module for isolated atoms energies #45
Conversation
Looks good to me. I have tested it on openMLIP in my branch and it seems to be helpful. Related PR on openMLIP https://github.com/OpenDrugDiscovery/openMLIP/pull/149 @prtos waiting for your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great PR, There are some structural choices that are worth discussing but we can merge it and continue improving the structure.
@@ -192,34 +222,67 @@ def _precompute_statistics(self, overwrite_local_cache: bool = False): | |||
def _compute_average_nb_atoms(self): | |||
self.__average_nb_atoms__ = np.mean(self.data["n_atoms"]) | |||
|
|||
def _set_linear_e0s(self): | |||
new_e0s = [np.zeros((max(self.numbers) + 1, 21)) for _ in range(len(self.__energy_methods__))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the second dimension is 21
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the regressed isolated energies don't change with the charges and the isolated atom energy factory returns a matrix which columns define the charge of a species, I just fill up fill the row from -10 to 10 to avoid an index error
openqdc/utils/io.py
Outdated
@@ -66,7 +66,7 @@ def push_remote(local_path, overwrite=True): | |||
""" | |||
remote_path = local_path.replace(get_local_cache(), get_remote_cache(write_access=overwrite)) | |||
gcp_filesys.mkdirs(os.path.dirname(remote_path), exist_ok=False) | |||
# print(f"Pushing {local_path} file to {remote_path}, ({gcp_filesys.exists(os.path.dirname(remote_path))})") | |||
print(f"Pushing {local_path} file to {remote_path}, ({gcp_filesys.exists(os.path.dirname(remote_path))})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use logger.info here no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to remove it. Thanks for flagging
openqdc/utils/constants.py
Outdated
@@ -8,7 +8,7 @@ | |||
|
|||
BOHR2ANG: Final[float] = 0.52917721092 | |||
|
|||
POSSIBLE_NORMALIZATION: Final[List[str]] = ["formation", "total", "inter"] | |||
POSSIBLE_NORMALIZATION: Final[List[str]] = ["formation", "total", "inter", "regression", "regression_inter"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggest that we should have a separate preprocessing class that take any dataset and return it with a normalized energy entry during iteration no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The POSSIBLE_NORMALIZATION
is a list that I included to sync with openMLIP as openMLIP is dependent from openQDC and doesn't affect the data in any way. It is just checked when you call the get_statistics
method.
Right now we are avoiding to modify the total_energy and the "normalization" is done on openMLIP and imo it is the most flexible way to handle this stuffs as other people can try to reconstruct the total_loss
Thanks for this great PR @FNTwin - this brings us a big step towards a robust implementation of the different normalizations in openMLIP. |
Checklist: