-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interface to metatensor to use arbitrary machine learning models as collective variables #1082
Conversation
Thanks @Luthaf !!! A few comments below:
Then let's wait for the other checks that are running now. Thanks again! |
src/metatensor/metatensor.cpp
Outdated
//+ENDPLUMEDOC | ||
|
||
/*INDENT-OFF*/ | ||
#if !defined(__PLUMED_HAS_LIBTORCH) || !defined(__PLUMED_HAS_METATENSOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Luthaf this triggers a bug in our script that does not detect the __PLUMED_HAS_METATENSOR
if it's in the same line as another __PLUMED_HAS_*
string, could you please split the line? Maybe just reversing the two strings will stop triggering an error:
#if !defined(__PLUMED_HAS_METATENSOR) || !defined(__PLUMED_HAS_LIBTORCH)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try reversing
/*INDENT-OFF*/ | ||
#if !defined(__PLUMED_HAS_LIBTORCH) || !defined(__PLUMED_HAS_METATENSOR) | ||
|
||
namespace PLMD { namespace metatensor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Luthaf I think that you should place the namespace on a separate line for our script to detect it.
Notice that you can run very quickly the check on your computer with cd src ; make plumedcheck
. It just needs gawk to be installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
src/metatensor/vesin.h
Outdated
#ifndef VESIN_H | ||
#define VESIN_H | ||
|
||
#include <stddef.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Luthaf do you really need a C compatible header here? If possible I would remote the C case and replace these with includes from the C++ std library (cstddef
and cstdint
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be a C compatible header since the library only exposes a C API.
As discussed above (in the "Unresolved questions/issues" section), this code is included as a library and not developed specially for plumed. Ideally I'd like to exclude it from the code checks in plumed, is this possible? I suggested three potential solutions in the initial message, is there one you prefer?
If nothing else works I can change the code here, but any future update to the library (for bug fixes or better performance, …) will require re-doing the same set of changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing to this, I forgot this note in the original pull request.
Notice that you can exclude the codecheck adding metatensor/
similarly to other external packages here
However, we cannot accept the vesin
namespace as is, because it adds symbols that are not in the PLMD
namespace. This could create clashes if e.g. you combine PLUMED with another code using another version of the vesin
library. If you look at other "external" tools (e.g. molfile, blas, lapack, xdfile, etc) they all provide symbols hidden in the PLMD
namespace.
Given this, I would suggest:
- writing a script that fixes namespace when importing the vesin library. You might want to get inspiration from
src/*/import.sh
scripts. - if possible, write such script so that all the issues with codecheck are fixed. if not, we can disable codecheck as explained above
- run the
./header
script onmetatensor
and check if it breaks something
This would be very close to what we do with other external packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another idea that should handle the symbol clash as well: basically move the code to external/vesin-single-build.cpp
and then add a file neighbors.cpp
which looks like
namespace PLMD {
namespace metatensor {
#include "./external/vesin.h"
#include "./external/vesin-single-build.cpp"
}}
I can then add the metatensor/external
folder to the excluded folder list, and this seems to pass all checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the only source code file located in a subdirectory so I would avoid it if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I finally had time to come back to this. The code is now using an import.sh
file to import code from vesin
It is expected to be a matrix of zeros. We only use the box to apply PBCs. But certainly it is possible that some code incorrectly passes a box even when there are no PBCs applied. In any case, within PLUMED it is correct to assume that having non zero numbers here imply PBCs.
Yes (in C order). So box[2][0] is the first component of the third vector. |
This seems to be mostly working, we now need to test it and quick the tires!
It has an extremely hard time handling C++17 code, including using lambda inside function calls (in torch::from_blob) and using initialization lists constructors
It segfaults in `keywords.exists("NO_ACTION_LOG")`
They will live in https://github.com/lab-cosmo/plumed-metatensor-tests for now
dd85ec8
to
3ba18d7
Compare
@Luthaf sorry for not merging this yet. Could you please confirm that I can do it? Thanks!!! |
If you are happy with the code, please do merge! I'll send further PR to improve it/update dependencies as needed. |
Thanks for your contribution! |
And thanks for your review! |
Description
This PR add an interface using metatensor atomistic models to execute arbitrary machine learning models, with the intended use case of using these models as collective variables in PLUMED.
The core idea of metatensor is to facilitate data sharing between different machine learning ecosystems by annotating data with metadata which describe what exactly is being passed around. This is done through the
TensorMap
class, which contains one of moreTensorBlock
, each block containing the actual values, andLabels
describing the rows and columns of the values.On top of this data format, we defined an interface for exporting trained ML models, and then loading and executing such models from a variety of simulation engines. The initial objective was to share models that would compute the energy and forces of a system and use these as ML force fields, but the same API can also be used for arbitrary quantities, including collective variables. These models are based on PyTorch for three reasons:
There is some documentation on the workflow to load and execute a metatensor model here for reference: https://docs.metatensor.org/latest/atomistic/overview.html#data-flow-between-the-model-and-engine. The main bits of data a model needs from PLUMED are the types for all particles (for atoms, this can be the atomic number), the positions of all particles, the simulation cell/box and (potentially many) neighbor lists for given cutoffs.
Difference with the existing PyTorch module
Both this module and the existing PyTorch module in PLUMED are based on PyTorch (for the same reasons!), the main difference as far as I know is that the PyTorch module is intended to act as transformation on top of CV computed by other PLUMED actions; which the metatensor module is intended to start with positions/atom types/... and compute a new CV from scratch. This enables using metatensor to compute some representation of the system (such as SOAP) and send it to PLUMED for further processing.
Unresolved questions/issues
The main question remaining for me is what to do regarding the neighbor lists calculation. I currently vendor code from https://github.com/Luthaf/vesin in the metatensor module, and it works pretty well. However, the code is just copy-pasted from another repository, and might need to be updated in the future; but does not 100% conform to the code style of the plumed repository. I can see a couple of solutions here, ordered from favorite to least favorite:
codecheck
orheaders.sh
. Is there a mechanism to mark some files/directories as "external" code that should be ignored by such linters?configure
. This would make the process of updating the code a lot easier (just update the archive), however I don't know if this will help with removing the code formastyle
/codecheck
/... unless these scripts respect.gitignore
.What do you think? Any other idea on what to do regarding "external" code in the PLUMED repository?
The other questions I have concerns the unit cell storage as returned by
this->getPbc().getBox();
:Type of contribution
Copyright
I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.
the module I added or modified contains a
COPYRIGHT
file with the correct license information. Code should be released under an open source license. I also used the commandcd src && ./header.sh mymodulename
in order to make sure the headers of the module are correct.Tests
- As discussed with @gtribello, the regtest run in a separate repository (currently cloning plumed from the branch of this PR, I'll update to pull from the main repo once the code is merged): https://github.com/lab-cosmo/plumed-metatensor-tests