enable use of CVs defined by PyTorch neural network models #570

Merged (109 commits) on Nov 14, 2024

@zwpku (Member) commented Aug 28, 2023

This branch implements a class called torchANN, which allows CV components to be defined by loading pretrained PyTorch neural network models.

Installation Steps

  1. Download LibTorch. This package is required to enable the torchANN class. First, download the archive and unzip it.

         wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip
         unzip libtorch-cxx11-abi-shared-with-deps-latest.zip
    

    This unpacks the library into a libtorch directory under the current directory. Below, its location is assumed to be /path/to/libtorch.

  2. Patch the MD engine. As usual, this is done with the script update-colvars-code.sh. From the root of the Colvars source tree, run:

         ./update-colvars-code.sh /path/to/md-engine        
    
  3. Compilation. This step depends on the engine to be compiled.

    • NAMD: add "--with-colvars-torch --torch-prefix /path/to/libtorch" to the arguments of ./config

      Assuming that the packages required to build NAMD (e.g. charm, tcl/tcl-threaded) are already prepared,
      NAMD can be compiled with the following commands:

        ./config Linux-x86_64-g++ --charm-arch multicore-linux-x86_64 --with-colvars-torch    \
              --torch-prefix /path/to/libtorch  --with-fftw3 --fftw-prefix /path/to/fftw
        cd Linux-x86_64-g++
        make 
    
    • GROMACS: add "-DTorch_DIR=/path/to/libtorch/share/cmake/Torch" when running cmake

    An example of the command is:

        cmake .. -DCMAKE_INSTALL_PREFIX=/home/username/local/gromacs  \
                        -DFFTWF_LIBRARY=/home/username/mambaforge/lib/libfftw3f.so  \
                        -DFFTWF_INCLUDE_DIR=/home/username/mambaforge/include \
                        -DTorch_DIR=/path/to/libtorch/share/cmake/Torch/  \
                        -DCMAKE_CXX_COMPILER=/usr/bin/mpicxx \
                        -DOpenMP_gomp_LIBRARY=/home/username/mambaforge/lib/libgomp.so
    
    • LAMMPS: only the CMake build is supported. In the LAMMPS source directory, run:
         mkdir build && cd build
         cmake ../cmake -D PKG_COLVARS=yes -D COLVARS_TORCH=yes 
    

    and set the variable Torch_DIR in the file CMakeCache.txt. When a CPU build of the LibTorch library is used, it may
    also be necessary to set the MKL include path to empty:

         MKL_INCLUDE_DIR:PATH=
    

    Alternatively, one could combine these steps in one command:

         cmake ../cmake -D PKG_COLVARS=yes -D COLVARS_TORCH=yes      \
             -D Torch_DIR=/path/to/libtorch/share/cmake/Torch -D MKL_INCLUDE_DIR=
    

    After that, run make and make install to compile and install the package.

The class has only been tested with simple neural network models (i.e. an autoencoder on alanine dipeptide), with the NAMD and GROMACS engines. Feedback is welcome!

A (trivial) example

  1. Create a PyTorch model
import torch

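class MyModel(torch.nn.Module):
    """Identity map: returns its input unchanged."""
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return x

model = MyModel()
scripted_cv_filename = './identity.pt'
# Compile the model to TorchScript and save it so that Colvars can load it
torch.jit.script(model).save(scripted_cv_filename)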

This Python script creates a model that is simply the identity map and saves it to a file named identity.pt.
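As a quick sanity check before handing the file to Colvars, the scripted model can be reloaded in Python and evaluated on a test input. The sketch below is an illustration only: the input shape (one row holding two dihedral values) and the float64 dtype are assumptions made here, not a statement of how Colvars calls the model, and a temporary directory is used so that no existing file is overwritten.

```python
import os
import tempfile

import torch


class MyModel(torch.nn.Module):
    """Identity map, as in the example above."""

    def forward(self, x):
        return x


# Save the scripted model to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "identity.pt")
    torch.jit.script(MyModel()).save(path)
    loaded = torch.jit.load(path)

# Hypothetical test input: two dihedral angles in degrees (assumed layout).
x = torch.tensor([[-60.0, 45.0]], dtype=torch.float64, requires_grad=True)
y = loaded(x)

# Gradient of output component 0 with respect to the inputs, via autograd.
(grad0,) = torch.autograd.grad(y[0, 0], x)

print(y.detach().tolist())
print(grad0.tolist())
```

For the identity map, the output should equal the input, and the gradient of component 0 should be one for the first input and zero for the second.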

  2. Define the Colvars config file

This file defines two CVs using the torchann class, each taking other CV components (here, two dihedral angles) as inputs.

colvarsTrajFrequency    10000
colvarsRestartFrequency 10000

colvar {
  name nn_0
  lowerBoundary -180.0
  upperBoundary 180
  width 5.0
  extendedLagrangian on
  extendedFluctuation 5.0
  extendedTimeConstant 200

  torchann {
    modelFile identity.pt
    m_output_index 0
    period 360

    dihedral {
      group1 { 
	atomnumbers 5
      }
      group2 { 
	atomnumbers 7
      }
      group3 { 
	atomnumbers 9
      }
      group4 { 
	atomnumbers 15
      }
    }

    dihedral {
      group1 { 
	atomnumbers 7
      }
      group2 { 
	atomnumbers 9
      }
      group3 { 
	atomnumbers 15
      }
      group4 { 
	atomnumbers 17
      }
    }

  }
}

colvar {
  name nn_1
  lowerBoundary -180.0
  upperBoundary 180
  width 5.0
  extendedLagrangian on
  extendedFluctuation 5.0
  extendedTimeConstant 200

  torchann {
    modelFile identity.pt
    m_output_index 1
    period 360

    dihedral {
      group1 { 
	atomnumbers 5
      }
      group2 { 
	atomnumbers 7
      }
      group3 { 
	atomnumbers 9
      }
      group4 { 
	atomnumbers 15
      }
    }

    dihedral {
      group1 { 
	atomnumbers 7
      }
      group2 { 
	atomnumbers 9
      }
      group3 { 
	atomnumbers 15
      }
      group4 { 
	atomnumbers 17
      }
    }
  }
}

abf {
  colvars nn_0 nn_1
  fullSamples	200
}
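The two colvar blocks above differ only in name and m_output_index, so for models with more outputs the config can be generated rather than written by hand. A minimal sketch, assuming the keywords and values shown above; the helper names (dihedral_block, colvar_block, build_config) are hypothetical and not part of Colvars:

```python
# Atom numbers of the two dihedral angles used as inputs (from the config above).
DIHEDRALS = [(5, 7, 9, 15), (7, 9, 15, 17)]


def dihedral_block(atoms, indent="    "):
    """Return one 'dihedral { ... }' component for four atom numbers."""
    lines = [f"{indent}dihedral {{"]
    for i, atom in enumerate(atoms, start=1):
        lines.append(f"{indent}  group{i} {{")
        lines.append(f"{indent}    atomnumbers {atom}")
        lines.append(f"{indent}  }}")
    lines.append(f"{indent}}}")
    return "\n".join(lines)


def colvar_block(index, model_file="identity.pt"):
    """Return one colvar that selects output 'index' of the model."""
    header = [
        "colvar {",
        f"  name nn_{index}",
        "  lowerBoundary -180.0",
        "  upperBoundary 180",
        "  width 5.0",
        "  extendedLagrangian on",
        "  extendedFluctuation 5.0",
        "  extendedTimeConstant 200",
        "",
        "  torchann {",
        f"    modelFile {model_file}",
        f"    m_output_index {index}",
        "    period 360",
        "",
    ]
    body = "\n\n".join(dihedral_block(a) for a in DIHEDRALS)
    return "\n".join(header) + "\n" + body + "\n  }\n}"


def build_config(n_outputs=2):
    """Assemble the full config: one colvar per model output, plus ABF."""
    parts = ["colvarsTrajFrequency    10000", "colvarsRestartFrequency 10000", ""]
    names = []
    for i in range(n_outputs):
        parts += [colvar_block(i), ""]
        names.append(f"nn_{i}")
    parts += ["abf {", "  colvars " + " ".join(names), "  fullSamples 200", "}"]
    return "\n".join(parts)


config = build_config()
print(config)
```

With the default n_outputs=2, the printed text should reproduce the structure of the config above: two colvar blocks, each with two dihedral components, followed by the abf block.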

@giacomofiorin (Member) commented

The latest GROMACS test error is unrelated to Colvars: https://gitlab.com/gromacs/gromacs/-/issues/5204

@giacomofiorin (Member) commented

Hi there! GROMACS 2025 runs the torchann input from this branch without errors, but there are differences from the reference files generated by @zwpku and updated by @jhenin.

See the outputs here:
https://github.com/Colvars/colvars/actions/runs/11747796165/artifacts/2164700677

@zwpku (Member, Author) commented Nov 12, 2024

@giacomofiorin thanks for the work!
Any hint on what the reason might be? I don't have a good understanding of the regression tests... Could it be that the files under gromacs/tests/library/000_torchann/AutoDiff are outdated?

@giacomofiorin (Member) commented

@zwpku Yes, the reference files currently in that folder came from another build, with a different version of libTorch. Would you expect this kind of difference? It is small, but it did exceed our threshold (1.0e-6 relative error).
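To make the threshold concrete, a relative-error check of this kind can be sketched as follows. This is an illustration only: the helper names and sample numbers are hypothetical, and the comparison actually used by the Colvars regression tests may be implemented differently.

```python
def relative_error(value, reference):
    """Relative deviation of value from reference (absolute if reference is 0)."""
    if reference == 0.0:
        return abs(value)
    return abs(value - reference) / abs(reference)


def failing_entries(values, references, threshold=1.0e-6):
    """Indices of the entries whose relative error exceeds the threshold."""
    return [
        i
        for i, (v, r) in enumerate(zip(values, references))
        if relative_error(v, r) > threshold
    ]


# Made-up numbers: entry 1 deviates by about 2e-6 and is flagged,
# entry 0 deviates by about 1e-10 and is not.
reference = [1.0, 100.0, -57.3]
new_output = [1.0000000001, 100.0002, -57.3]
print(failing_entries(new_output, reference))
```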

@zwpku (Member, Author) commented Nov 12, 2024

@giacomofiorin If I see correctly, the torch model in that test is simply the identity map and the CV is a dihedral angle. So I expect there should be little difference due to different versions of libTorch. Could it also be caused by some changes in the source code or in the config files of that test? I can try to build and examine the test on my local machine.

@giacomofiorin (Member) commented

@zwpku Yes, if you could please check as soon as possible that would be very helpful! We may have a chance to convince the GROMACS people to include this in the 2025 release, but timing is very tight.

Updated reference files accordingly.

@giacomofiorin (Member) commented

Still getting deviations from the reference files that you just uploaded in the CI tests. It is probably due to differences in libTorch versions. The one in the container is 2.4:

# Download pre-built libTorch
rm -fr /opt/torch
curl -o /tmp/libtorch.zip https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.4.1%2Bcpu.zip
unzip /tmp/libtorch.zip -d /opt

Is the .pt file sensitive to the version? If so, I strongly recommend regenerating it with a matching version of libTorch.

@zwpku (Member, Author) commented Nov 14, 2024

@giacomofiorin I guess the original deviation you encountered was (partially) due to the random seed. The seed was fixed in gromacs/tests/library/Common/test.mdp in a previous commit by @jhenin, but somehow that change was reverted, possibly by a merge with master.
The deviation from yesterday was due to the reference files I uploaded, which were generated on my local machine using GROMACS in single precision (I then realized that GROMACS is compiled in double precision for the CI tests). After switching to double precision and regenerating the reference files, the test passes.

I tried libtorch 2.0.1, 2.3.0, and 2.4.1, and the results are the same. The .pt files generated using PyTorch 1.13.1 and 2.3.1 are different (in size). But when loaded in gromacs/colvars, the results (input/output/gradient) are the same.
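The precision point can be illustrated outside of GROMACS. The sketch below is a generic floating-point demonstration, not the computation performed in the test: it accumulates the same increment in emulated single precision and in double precision, then compares the relative difference with the 1.0e-6 threshold. The function names are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step, n_steps, single_precision):
    """Sum 'step' n_steps times, optionally rounding to float32 after each add."""
    if single_precision:
        step = to_float32(step)
    total = 0.0
    for _ in range(n_steps):
        total += step
        if single_precision:
            total = to_float32(total)
    return total


n_steps = 100_000
double_sum = accumulate(0.1, n_steps, single_precision=False)
single_sum = accumulate(0.1, n_steps, single_precision=True)

rel_diff = abs(single_sum - double_sum) / abs(double_sum)
print(f"double: {double_sum!r}")
print(f"single: {single_sum!r}")
print(f"relative difference exceeds 1e-6: {rel_diff > 1.0e-6}")
```

Round-off accumulated over many single-precision operations can therefore exceed a 1.0e-6 relative tolerance, which is consistent with the need to generate reference files at the same precision as the build under test.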

@zwpku (Member, Author) commented Nov 14, 2024

@giacomofiorin besides, I saw some differences (e.g. in colvar.cpp, colvarmodule.cpp, colvarmodule_refs.h) when I ran git diff master..torchann -- src. They seem to be unrelated to the torchann class. Should these files be updated?

@giacomofiorin (Member) commented

Yes. I just did that.

If the tests pass (thank you for addressing the GROMACS precision issue!) we can proceed to merge this PR into master. Given the many conflicts accumulated in this branch, we should use a squash merge but link the PR in the commit.

@giacomofiorin merged commit f718e65 into master on Nov 14, 2024 (15 checks passed)
@giacomofiorin deleted the torchann branch on November 14, 2024 16:02

@giacomofiorin (Member) commented

PR merged! Thanks so much @zwpku and everyone who helped get this PR done!

@giacomofiorin (Member) commented

FYI Lukas Mullender from the GROMACS team raised a couple of comments on the code regarding the use of GPU models and precision:
https://gitlab.com/gromacs/gromacs/-/merge_requests/4780#note_2213094685

acmnpv pushed a commit to gromacs/gromacs that referenced this pull request Nov 18, 2024
This MR includes small fixes and improvements to the copy of the Colvars library in `src/external`, as well as one feature (the `torchANN` collective variable type).

The Torch-Colvars interface was previously not included in !4611 because of failures in the Colvars CI runners. We have since confirmed with the main author of the feature that the culprit was the precision of the GROMACS build and he confirmed that the numerical results are consistent across libTorch versions (see Colvars/colvars#570 (comment)).

Matches [this commit in the Colvars repo](Colvars/colvars@3023d8e).

CC @HubLot @jhenin