Question about how to use CUDA graph #151
Comments
That should be all you need to do. CUDA graphs won't always help performance. It depends on whether the overhead from launching kernels is a significant bottleneck for your model. They have the most benefit for calculations that involve running a lot of very short kernels. If you use Nsight Compute to profile your code, I think there are ways to tell whether it's using graphs or not.
Thank you for the reply. I did some tests with a simple example, and I think in my case CUDA graphs were not used at all because I used a workaround to create the force instead of using it directly, so that Hamiltonian REMD would work (see #147). Specifically I did
If I just do
and run a regular MD, it was twice as fast with CUDA graphs. But when I use the above workaround, it's not faster at all. Any idea why my workaround is a problem? Thank you,
I can't think of any reason that wrapping it in a CustomCVForce would affect this. Aside from CUDA graphs, how much does wrapping it affect the speed? CustomCVForce does add overhead and require extra synchronization.
A less expensive workaround for #147 is to also define the same global parameter in another force. For example, you could use an empty CustomBondForce with no bonds:

```python
force = CustomBondForce('0')
force.addGlobalParameter('myparam', 0)
system.addForce(force)
```

The parameter should have the same name and default value as in the TorchForce. You're making a second force that uses the same parameter so openmmtools will be able to identify it.
I was only testing on a toy force, and wrapping it with CustomCVForce didn't affect the speed much. I am not sure yet how much it affects the speed for the actual ML force field. I tried your other workaround, but it's giving me another error:
although I have set the default parameters to be the same through
Any idea what is going wrong? The full test files are also attached.
I think this is openmmtools confusing it. You also use it to specify a different default value:

```python
param_a = GlobalParameterState.GlobalParameter('param_a', standard_value=1.0)
param_b = GlobalParameterState.GlobalParameter('param_b', standard_value=1.0)
```

As far as I can tell from the code, I think that causes it to loop over all the forces it has identified as having a global parameter with that name and call
I changed those two lines to the following but still got the same error...
I tried serializing the System, and I found openmmtools had changed the default values for the two parameters to 1 and 4:

```xml
<Force energy="0" forceGroup="0" name="CustomBondForce" type="CustomBondForce" usesPeriodic="0" version="3">
  <PerBondParameters/>
  <GlobalParameters>
    <Parameter default="1" name="param_a"/>
    <Parameter default="4" name="param_b"/>
  </GlobalParameters>
  <EnergyParameterDerivatives/>
  <Bonds/>
</Force>
```

I think it's because those are the first values in your schedule:

```python
lambda_schedule_a = np.array([1, 2, 3])
lambda_schedule_b = np.array([4, 5, 6])
```

If I change it to use those values both for the TorchForce and for the GlobalParameter objects, then it runs successfully.
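To make that constraint concrete, here is a minimal sketch in plain Python/NumPy (`param_a`, `param_b`, and the schedule names are taken from the attached test files; the rest is my own illustration) of keeping every declared default in sync with the first value of its schedule:

```python
import numpy as np

# Schedule names as they appear in the thread's test script.
lambda_schedule_a = np.array([1, 2, 3])
lambda_schedule_b = np.array([4, 5, 6])

# openmmtools resets each force's default to the first value it applies,
# so declare that same value as the default everywhere the parameter is
# defined (the TorchForce, the dummy CustomBondForce, and the
# GlobalParameter objects).
defaults = {
    "param_a": float(lambda_schedule_a[0]),
    "param_b": float(lambda_schedule_b[0]),
}
print(defaults)  # {'param_a': 1.0, 'param_b': 4.0}
```

Deriving the defaults from the schedules, rather than writing the numbers twice, keeps the two from silently drifting apart.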
Thank you so much, and you are right that this workaround is less expensive than wrapping with CustomCVForce (~0.85× the cost for this toy example on my desktop). Would you mind also testing turning
I am pretty sure CUDA graphs are not used in this workaround either. If I change my TorchForce to contain some offending operations (torch.inverse in this example), it does not error out.
Whereas if I run a regular MD, it gives the following error:
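The probe the poster describes can be sketched as follows: script a toy model containing an op that CUDA graph capture is known to reject, so that a graph-enabled run should fail loudly if capture is really happening. The module and file name here are my own construction, not the poster's actual model:

```python
import torch

class InverseProbe(torch.nn.Module):
    """Toy model containing torch.inverse, an op CUDA graph capture rejects."""

    def forward(self, positions):
        # Build a small positive-definite 3x3 matrix from the positions
        # and invert it; the result is a scalar "energy".
        m = positions.t() @ positions + torch.eye(3)
        return torch.inverse(m).sum()

# Script and save the module the same way a TorchForce model would be saved.
module = torch.jit.script(InverseProbe())
module.save("inverse_probe.pt")
```

If a simulation loading this model runs without error while `useCUDAGraphs` is set, that suggests graph capture is not actually happening.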
I think I see the problem. CustomCVForce uses
The fix is in #152. Can you try it out and see if it fixes the problem for you?
Thank you so much for the fix! Sorry for the late reply; I was on vacation last week. I am trying to test out your solution, but I am having trouble compiling the package from source (I assume conda install will not incorporate your fix?). Specifically, I am getting the following error:
I think I have librt already installed:
Any idea how to get around this error? Thank you!
I don't think librt has any connection to cudart. Do you have the CUDA toolkit installed? See http://docs.openmm.org/latest/userguide/library/02_compiling.html#cuda-or-opencl-support.
I started over and I don't see that error anymore, but I saw another error close to the end of the build.
I installed cudatoolkit, libtorch and pytorch through conda
I don't see

Is installing cudatoolkit/libtorch through conda a problem? Would you mind sharing some details on how to install libtorch from the zip file downloaded from the official website?
That isn't going to work correctly. Packages in conda-forge tend to be compiled differently than those in other channels, so conda-forge can't be mixed with other channels. It has its own builds of both PyTorch and the CUDA libraries, so you shouldn't need to mix.
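For reference, a conda-forge-only build environment along those lines might look like the following. This is a sketch, not the project's documented procedure; the environment name is arbitrary, and the package list (build tools plus the dependencies mentioned in the thread) is an assumption:

```shell
# Create everything from conda-forge only; mixing channels is what broke the build.
conda create -n openmm-torch-build -c conda-forge \
    openmm pytorch cudatoolkit cmake swig
conda activate openmm-torch-build
```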
Thank you, I was able to compile it by using only the conda-forge channel, and I think it is using CUDA graphs correctly now. I see a slight speedup, and it also errors out for the

Thank you so much for the fix!
Great, thanks!
Hi,
I would like to use the CUDA graph option to accelerate my simulation, and I was wondering whether there is anything I need to do when converting the model to TorchScript and saving it to a file in order to use CUDA graphs.
I tried simply adding

```python
force.setProperty("useCUDAGraphs", "true")
```

to the simulation script, but did not see any performance improvement compared to running without CUDA graphs. Is there any way I can investigate whether it is indeed using CUDA graphs? Thank you,
Xiaowei