Fundamental changes to KIM models, transforms, and parameter classes #140
Conversation
This looks great!
I will need to take a second look at parameter.py to ensure the parameters are in the correct space (original/transformed) in various functions. Will do it in the second round after the current comments are addressed...
self.name = name

def transform(
    self, model_param: Union["Parameter", np.ndarray]
I guess I asked this before, but any reason we want to use

if TYPE_CHECKING:
    from kliff.models.parameter import Parameter

and then use a string "Parameter" for annotation? Given that Parameter is a kliff class, would it be easier to remove the use of TYPE_CHECKING and directly use Parameter for annotation?
If I do that, we will run into the issue of circular imports: ParameterTransforms uses Parameter for type annotation, but Parameter also uses ParameterTransforms.
See, for example, line 52 in parameter.py:
def __new__(
    cls,
    input_array: Union[np.ndarray, float, int],
    name: str = None,
    transform: "ParameterTransform" = None,
    bounds: np.ndarray = None,
    is_mutable: bool = False,
    index: int = None,
    opt_mask: np.ndarray = None,
):
Got it.
Adding

if TYPE_CHECKING:
    from kliff.models.parameter import Parameter

can still lead to a circular import if TYPE_CHECKING evaluates to True. Given that the Parameter class is not used anywhere except for annotation, I believe we need to remove the above two lines?
As per PEP standards, TYPE_CHECKING is always False during interpreter execution, so I don't think it will be an issue. Adding these two lines just helps static analyzers pick up the appropriate modules for type checking (they match the string annotations and use the appropriate classes for analyzing attributes, etc.). Can remove if needed; it will not affect the functionality in any way.
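For reference, a minimal sketch of the pattern under discussion (the file layout and class body here are illustrative assumptions, not the actual kliff code):

# Minimal sketch of the TYPE_CHECKING pattern discussed above (illustrative only).
from typing import TYPE_CHECKING, Union

import numpy as np

if TYPE_CHECKING:
    # Seen only by static type checkers, never executed at runtime,
    # so it cannot trigger the circular import with parameter.py.
    from kliff.models.parameter import Parameter


class ParameterTransform:
    def transform(self, model_param: Union["Parameter", np.ndarray]) -> np.ndarray:
        # The string annotation "Parameter" is resolved lazily by the type checker.
        raise NotImplementedError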
kliff/models/parameter.py (outdated)

from typing import Any, Dict, List, Optional, Sequence, TextIO, Tuple, Union
from copy import deepcopy

# from pathlib import Path
to delete
Remove what?
I removed Any, Dict, Optional, Sequence, TextIO, and commented out the pathlib line.
I meant: remove the commented-out `from pathlib import Path` line.
kliff/models/model.py (outdated)

def _inverse_transform_params(self, model_params_transformed: Dict[str, Parameter]):
def parameters(self):
I love this a lot! To further mimic PyTorch, can we rename this as named_parameters? And we can add another method parameters() to return an iterator/a list of parameter values?
Ok.
Do you think something simple like this:

def parameters(self):
    param_opt_list = []
    for param_key in self.model_params:
        if self.model_params[param_key].is_mutable:
            param_opt_list.append(self.model_params[param_key])
    return param_opt_list

or this?:

def parameters(self):
    for param_key in self.model_params:
        if self.model_params[param_key].is_mutable:
            yield self.model_params[param_key]

I think the TorchScript implementation is closer to 2, whereas PyTorch is closer to 1.
2 will translate to something like:

for param in model.parameters():
    # do something ...

I am fine either way
Yes, PyTorch goes with the iterator one, even for named_parameters. I guess the major reason is that for ML models there can be many parameters, and they use iterators to save memory. For us, I prefer your first solution.
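For illustration, the two methods could then look roughly like this (a sketch only; it assumes self.model_params is a dict mapping names to Parameter objects, as in the snippets above):

def named_parameters(self):
    # Name -> Parameter mapping of all mutable (optimizable) parameters.
    return {
        name: param
        for name, param in self.model_params.items()
        if param.is_mutable
    }

def parameters(self):
    # Just the mutable Parameter objects themselves, as in option 1 above.
    return [param for param in self.model_params.values() if param.is_mutable]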
kliff/models/kim.py (outdated)

        settings = [['default', 0, 20], [2.0, 'fix'], [2.2, 'inf', 3.3]]
        instance.set_one(name, settings)
    """
    self.opt_params.set_one(name, settings)
    # Keeping this API intact for now, no improvement comes to mind
I agree that this function is a mess - hard to use and maintain. We will need to simplify it. Let's leave it for now and do it after the major stuff has been done.
OK.
A comment for us: the reason why it is difficult to use and maintain is that it lumps a lot of stuff together - default values, bounds, fixed or not.
To simplify, we'd better split this into multiple functions, e.g. as sketched below.
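As a rough illustration of such a split (the method names below are hypothetical, not existing KLIFF API):

# Hypothetical split of set_one into single-purpose methods, instead of one lumped call like
#     model.set_one(name, [['default', 0, 20], [2.0, 'fix'], [2.2, 'inf', 3.3]])
def set_param_value(self, name: str, component: int, value: float):
    """Set the value of one component of a parameter."""

def fix_param(self, name: str, component: int):
    """Mark one component of a parameter as fixed (not optimized)."""

def set_param_bounds(self, name: str, component: int, lower: float = None, upper: float = None):
    """Set lower/upper bounds for one component; None means unbounded."""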
I have addressed all the conversations. Once we have resolved the remaining ones, I can push a commit addressing the above.
I have two more comments regarding docs and annotation.
It would be very helpful to add such an explanation to the docstring of the class. At least the meanings of …
Ok. I am finishing all tests and required changes throughout the project, but before I do, here is what I am proposing: we define two spaces (see the glossary in the docstring below).
Below is the skeleton of the changes I made. Please comment. All needed functions will now have suffixes indicating the space they work in:

class Parameter(np.ndarray):
"""Parameter class for containing physics-based model parameters.
This class provides utilities for managing model parameters between the "model space"
and the "transformed space". See the glossary below for the definition of these spaces.
Modeled on `torch.nn.Parameters`, it inherits from `numpy.ndarray`. It is a numpy
array with additional attributes such as name, transform, etc. For maintaining compatibility,
use `get_numpy_array` (for getting a numpy array of parameters) and `get_opt_numpy_array`
(transformed numpy array but with only the optimizable values). `get_numpy_array` array
also helps in getting an instance of the Parameter to mutate safely, and pass it to
any impure function.
Glossary:
- Model space: The space in which the model expects the parameters to be in. Currently,
all models in OpenKIM expect the parameters to be in the affine cartesian space.
- Parameter space: The space in which you want to sample/optimize your parameters.
Most often parameters are transformed using bijective transformations of the model
inputs for ease of optimization/sampling. For example, the log transform is used in
searching for the optimal parameters of sloppy models. There can be cases where
transformation functions are not bijective, e.g. the ceiling function for mapping
continuous parameters to discrete values. Parameter space is mostly used for
optimization, and not by the model itself. If no transform_function is provided,
then parameter space and model space are the same.
All functions that need input/output in the model space will use the suffix
`_model_space`, and `_param_space` for the transformed parameter space.
Attributes:
name : Name of the parameter.
transform : Instance of ``ParameterTransform`` object to be applied to the parameter.
index : Index of the parameter in the parameter vector. Used for setting the parameter
in KIMPY.
bounds : Bounds for the parameter, must be numpy array of shape n x 2, with [n,0] as
lower bound, and [n,1] as the upper bound. If None, no bounds are applied.
is_mutable : If True, the parameter is trainable, and will be passed to scipy optimizer.
If False, the parameter is not trainable, and kept constant.
opt_mask : A boolean array of the same shape as the parameter. If True, the
parameter is optimized. If False, the parameter is not optimized.
"""
def __new__(
cls,
input_array: Union[np.ndarray, float, int],
name: str = None,
transform_function: "ParameterTransform" = None,
bounds: np.ndarray = None,
is_mutable: bool = False,
index: int = None,
opt_mask: np.ndarray = None,
):
"""Initializes and returns a new instance of Parameter.
Args:
input_array: Input numpy array to initialize the parameter with.
name: Name of the parameter
transform_function: Instance of ``ParameterTransform`` object to be applied to the parameter.
bounds: n x 2 array of lower and upper bounds for the parameter. If None, no
bounds are applied
is_mutable: boolean if the parameter is mutable. This implies that either
the parameter will be passed to the scipy optimizer, or the parameter values will be edited.
index: Index of the parameter in the parameter vector. Used for setting the
parameter in the KIMPY.
opt_mask: Boolean array of the same shape as the parameter. The values
marked ``True`` are optimized, and ``False`` are not optimized.
Returns:
A new instance of Parameter.
"""
def transform(self):
"""Apply the transform to the parameter.
This method simply applies the function ~:kliff.transforms.ParameterTransform.__call__
to the parameter (or equivalently, ~:kliff.transforms.ParameterTransform.transform).
"""
def inverse_transform(self):
"""Apply the inverse transform to the parameter.
def copy_from_param_space(self, arr: np.ndarray):
"""Copy array to self in the parameter space.
Array can be a numpy array or a Parameter object.
This method assumes that the array is of the same type and shape as self,
compensated for opt_mask. If not, it will raise an error.
This method also assumes that the incoming array is in the same space as the parameter
currently is (i.e. "parameter space"; see the glossary above for details).
Args:
arr: Array to copy to self.
"""
def copy_from_model_space(self, arr: np.array):
"""Copy arr from model space.
Array can be a numpy array or a Parameter object. This method assumes that the
incoming array is in the model space and would need transformation to the parameter
space before copying. It is a safer method to use in most cases. If the parameter
is not transformed, it will transform it first for maintaining consistency.
Args:
arr: Array to copy to self.
"""
def get_numpy_array_model_space(self) -> np.ndarray:
"""Get a numpy array of parameters in the model space.
This method should be used for getting the numpy array of parameters where the
``Parameter`` class might not work, e.g. for feeding values to the model.
Returns:
A numpy array of parameters in the original space.
"""
def get_numpy_array_param_space(self):
"""Applies the transform to the parameter, and returns the transformed array."""
def get_opt_numpy_array_param_space(self) -> np.ndarray:
"""Get a masked numpy array of parameters in the transformed space.
This method is similar to :get_numpy_array_param_space but additionally applies the
opt_mask, and returns the array. This ensures the correctness of the array for
optimization/other applications. *This should be the de-facto method for getting
the numpy array of parameters.*
Returns:
A masked numpy array of parameters in the parameter space.
"""
def copy_at_param_space(self, index: Union[int, List[int]], arr: Union[int, float, np.ndarray]):
"""Copy values at a particular index or indices in parameter space.
This method directly copies the provided data, and does not perform any checks.
Args:
index: Index or indices to copy the values at.
arr: Array to copy to self.
"""
def add_transform(self, transform: "ParameterTransform"):
"""Save a transform object with the parameter.
Args:
transform: Instance of ``ParameterTransform`` object to be applied to the parameter.
"""
def add_bounds_model_space(self, bounds: np.ndarray):
"""Add bounds to the parameter.
Bounds should be supplied in the model space. The bounds will be transformed if
the transform_function is provided to the parameter.
Args:
bounds: numpy array of shape (n, 2)
"""
def add_bounds_param_space(self, bounds: np.ndarray):
"""Add bounds to the parameter.
Add bounds to the parameter in parameter space. It does not do any additional checks
or perform any transformations.
Args:
bounds: numpy array of shape (n, 2)
"""
def add_opt_mask(self, mask: np.ndarray):
"""Set mask for optimizing vector quantities.
It expects an input array of shape (n,), where n is the dimension of the vector
quantity to be optimized. This array must contain n booleans indicating which
properties to optimize.
Args:
mask: boolean array of same shape as the vector quantity to be optimized
"""
def get_bounds_param_space(self) -> List[Tuple[int, int]]:
"""Returns bounds array that is used by scipy optimizer.
Returns:
A list of tuples of the form (lower_bound, upper_bound)
"""
def get_bounds_model_space(self) -> np.ndarray:
"""Get the bounds in the original space.
Returns:
A numpy array of bounds in the original space.
"""
def has_bounds(self) -> bool:
"""Check if bounds are set for optimizing quantities
Returns:
True if bounds are set, False otherwise.
"""
def as_dict(self):
"""Return a dictionary containing the state of the object."""
def save(self, filename):
"""Save the parameter to disk."""
@classmethod
def from_dict(cls, state_dict):
"""Update the object's attributes based on the provided state dictionary.
Args:
state_dict (dict): The dictionary containing the state of the object.
This dictionary should include the "value" key.
"""
@classmethod
def load(cls, filename):
"""Load a parameter from disk.
TODO: non classmethod version
"""
def get_opt_param_name_value_and_indices(
self,
) -> Tuple[str, Union[float, np.ndarray], int]:
"""Get the name, value, and indices of the optimizable parameters.
Returns:
A tuple of lists of names, values, and indices of the optimizable parameters.
"""
@property
def lower_bound(self):
"""Get the lower bounds of the parameter.
Always returns values in parameter space.
Returns:
A numpy array of lower bounds.
"""
@property
def upper_bound(self):
"""Get the upper bounds of the parameter.
Always returns values in parameter space.
Returns:
A numpy array of upper bounds.
"""
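A short usage sketch of the proposed API (the values and the LogTransform class are placeholders; only methods named in the skeleton above are used):

import numpy as np

from kliff.models.parameter import Parameter

# LogTransform stands in for whatever ParameterTransform subclass is actually provided.
A = Parameter(
    np.array([15.0, 2.1]),
    name="A",
    transform_function=LogTransform(),
    is_mutable=True,
)
A.add_opt_mask(np.array([True, False]))                  # optimize only the first component
A.add_bounds_model_space(np.array([[1.0, 20.0], [1.0, 5.0]]))

x0 = A.get_opt_numpy_array_param_space()                 # what the optimizer sees
new_x = x0 + 0.1                                         # optimizer proposes new values (parameter space)
A.copy_from_param_space(new_x)                           # write them back, opt_mask aware
model_values = A.get_numpy_array_model_space()           # what gets fed to the KIM model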
This is much cleaner! A couple of questions/comments: … For def copy_at_param_space, can we change the order of the arguments arr and index? Also, do we need a def copy_at_model_space?
I just meant simple real 3D Euclidean space. I thought affine cartesian space would be more unambiguous. Maybe I should just write Euclidean instead.
That sounds like a good idea, will give it a try. Just a minor comment: you meant np.any(self.opt_mask), right? For partially optimizable arrays.
That was added to give ease of adding data in any order. Otherwise the user has to know which space to copy data in, and it will become a non-commutative operation with … See also our previous discussions on a similar issue regarding …
What does it mean for a parameter to be in 3D Euclidean space? I would think it is just a vector space, and the dimension depends on the number of components. Anyway, probably we can simply use the model space and parameter space, without these adjectives?
ah, right!
To clarify, I saw copy_at_param_space included in your proposed new Parameter class, but not copy_at_model_space. I was asking to add the latter.
Let me gather my thoughts on that and get back to you. I just meant to distinguish between the transformed space and the model space. By Euclidean or affine space I meant that all the parameters have dimensions and values that are combinations of quantities in SI or metal units, etc., while the transformed space will be a combination of them in some other space, ideally with dimensionless quantities. Does that make sense?
Regarding `copy_at_model`, I guess that was not there as it is not used anywhere in the code and we always work in the transformed space. But I will include it in the next draft.
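Purely to illustrate the two spaces with a concrete (made-up) number:

import numpy as np

A_model = np.array([15.28])      # model space: value with physical units, e.g. an energy scale in eV
A_param = np.log(A_model)        # parameter space: what the optimizer/sampler actually works with
assert np.allclose(np.exp(A_param), A_model)   # the bijective transform is exactly invertible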
Alright, clear now.
Added changes as per above discussion. Happy holidays!
kliff/models/parameter.py (outdated)

    return
else:
    if self.transform is not None:
        for i in range(len(self)):
Update needed.
Thanks @ipcamit! The explicit use of … I don't think any of the comments are major - most of them are clarifying questions. Some of the comments (the early ones) are from my responses in the previous review. Please ignore them (I don't know why they appear). I did not take a look at …
Addressed all comments. Major changes: KIMModel -> Model, plus changes in parameter updates. The rest remains more or less the same. I need a couple of clarifications; other than that, I do not think anything is left. Pushing the updated code after this comment.
3rd element: float or `INF` (case insensitive)
    Upper bound of the parameter. `INF` indicates that the upper bound is
    positive infinite, i.e. no upper bound is applied.

def copy_at_param_space(
I do not understand this comment. copy_at_ just specifies that the incoming values will be copied in a mask-compensated fashion. The suffix tells the expected space of the incoming values. Could you give an example of an alternative name?
obj.index = index
obj._is_transformed = False
obj.bounds = bounds
if opt_mask is not None:
- Done.
- No. That would mean all parameters will be optimizable by default, not just all values of a parameter. Therefore all parameters are non-optimizable by default, unless explicitly stated. In the original KLIFF all parameters are non-optimizable by default, and unless you mark them using set_opt_param etc., they cannot be optimized (see the small sketch below).
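A small sketch of that default behavior, using only the proposed __new__ signature (values are illustrative):

import numpy as np

from kliff.models.parameter import Parameter

sigma = Parameter(np.array([2.0, 2.5, 3.0]), name="sigma")
# sigma.is_mutable is False: the parameter is not optimizable unless explicitly requested.

sigma_opt = Parameter(
    np.array([2.0, 2.5, 3.0]),
    name="sigma",
    is_mutable=True,                          # explicit opt-in, like set_opt_param in the old API
    opt_mask=np.array([True, False, True]),   # and even then, only components 0 and 2 are optimized
)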
Pushed latest changes as per comments. Please ignore the …
@ipcamit I've left two or three small comments; can you please address these? Also, can you remove the … Then, I think it is ready to be merged.
Removed trainer paths, and fixed …
I saw the tests are failing; these are the UQ tests. Are we planning to deal with the UQ stuff later and not worry about it now? If that is the case, I think we can merge it now.
I think a lot of the UQ stuff needs to be rewritten, so it is ok for me to ignore the UQ tests.
Yes, UQ I would deal with later. I had made it work earlier, but with the changes in function names, members, and certain functionality, I would need to redo it.
This particular set of changes is the most vast and intrusive one, so I think it will be easier to break it down into smaller incomplete commits and review them, before merging in one big PR.
I have split the total changes into 3 categories. Parts 2 and 3 will be submitted after the review of part 1.
Changes in this commit:
- OptimizingParameters: now parameters have an is_mutable flag which indicates if this parameter can be changed during optimization.
- transformations module is introduced, which will also hold coordinate and property transformations in the future.
- __call__ method, which can directly accept a Configuration object and returns a dictionary of energy, forces, and stresses (see the usage sketch at the end).
- Parameters will transform parameters now.
- transforms: new module for collecting all transformations. Currently it only contains ParameterTransforms.
- models: updated to reflect the newer API.

Let me know of suggestions. Once we have no actionable item on this, I will submit part 2 of this PR.
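For instance, the new __call__ interface described above might be used roughly like this (class names, dataset path, and dictionary keys are assumptions based on this summary, not verified API):

from kliff.dataset import Dataset
from kliff.models import KIMModel

model = KIMModel(model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006")
configs = Dataset("Si_training_set/").get_configs()

preds = model(configs[0])        # Configuration in, dict of predictions out
energy = preds["energy"]
forces = preds["forces"]
stress = preds.get("stress")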