The exchange probability gap is too large when using GPU for HREX #1177
Comments
Hi. This is known. My understanding is that energy calculation on the GPU is not reproducible, I guess because of differences in the way the energy terms are added up. In a large system, even a tiny relative difference can translate into a different acceptance. Empirically, I have never identified problems due to this. Formally, I am not aware of any justification. A couple of hand-waving considerations. First, the acceptances obtained from the Metropolis formula are sufficient to sample the correct distribution, but not necessary. For instance:
Strictly speaking, you should check that the ratio between the "GPU acceptances" and the "CPU acceptances" (= 1.0) is independent of the coordinates of the system. I don't know how to do that, or whether it is even possible. Second, I suspect that any such "GPU errors" will also be present when you integrate the equations of motion. So my feeling is that even if you introduce some small errors in the exchange procedure, they will be negligible anyway. I am not sure how convincing these arguments are.
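A toy sketch of the mechanism being described (this is illustrative Python, not GROMACS/PLUMED internals; the `beta` value and the term magnitudes are assumptions): floating-point addition is not associative, so summing the same energy terms in a different order, as a parallel GPU reduction effectively does, can yield a slightly different total energy than a serial CPU sum, and that spurious difference feeds straight into the Metropolis acceptance.

```python
import math
import random

# Toy illustration: the same energy terms summed in two different orders.
random.seed(0)
terms = [random.uniform(-1e4, 1e4) for _ in range(100_000)]

e_serial = sum(terms)            # one summation order ("CPU-like" serial sum)
e_other = sum(reversed(terms))   # another order (stand-in for a GPU reduction)
delta = e_serial - e_other       # tiny, and in general nonzero

# With identical Hamiltonians the Metropolis exponent should be exactly 0 and
# the acceptance exactly 1. A tiny spurious energy difference eps instead
# gives min(1, exp(-beta * eps)), which can fall just below 1.
beta = 1.0 / 2.494               # ~1/kT in kJ/mol at 300 K (illustrative value)
acceptance = min(1.0, math.exp(-beta * abs(delta)))
print(f"delta = {delta:.3e}, acceptance = {acceptance:.6f}")
```

The exact size of `delta` depends on the summation algorithm and the term magnitudes; the point is only that it need not be exactly zero, so the acceptance need not be exactly 1.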
Thank you very much for your response. However, I'm sorry, I didn't quite understand. Isn't the exchange probability supposed to reach a certain level, for example between 30% and 40%, for the HREX to be considered successful?
Sorry, I just noticed that these average acceptances are computed over 200 attempts, so I would not expect them to differ so much from each other. In addition, all replicas are identical except possibly for the initial coordinates, right? Can you please report:
Then, ideally, could you plot the histogram of the acceptance for each pair of replicas? It would be useful to know whether problems are present at all attempts or whether the distributions are bimodal. Finally, for each pair, the time series of the acceptance could also be useful (if it's not too messy). Thanks!!
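A minimal sketch of the kind of per-pair summary being asked for, using dummy data (in a real run the per-attempt acceptances would be parsed from the GROMACS log; the pair labels, bin count, and data here are all placeholders):

```python
import random

# Dummy per-attempt acceptance probabilities for a few neighbouring pairs.
# Replace this with values extracted from the GROMACS replica-exchange log.
random.seed(1)
n_attempts = 200
pairs = {f"{i}<->{i + 1}": [random.random() for _ in range(n_attempts)]
         for i in range(3)}

for label, acc in pairs.items():
    # Text histogram over 10 bins on [0, 1]; a bimodal shape would show up
    # as two separated clusters of large counts.
    bins = [0] * 10
    for a in acc:
        bins[min(int(a * 10), 9)] += 1   # clamp a == 1.0 into the last bin
    mean = sum(acc) / len(acc)
    print(f"pair {label}: mean = {mean:.2f}  hist:", " ".join(map(str, bins)))
```

The same lists, plotted against the attempt index, give the per-pair time series of the acceptance.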
Thank you very much for your response. However, I'm sorry, I couldn't understand your point. I ran another REST2 simulation with 12 replicas from 310 K to 510 K, and the exchange probability was also low:
Additionally, below is an image I plotted showing the replica traversal, where the y-axis indicates which position each replica is in; the md.mdp file is attached at the end.
Dear plumed users:
This is my configuration:
When performing Hamiltonian Replica Exchange (HREX), I set the scaling factor for all replicas to 1.0, so theoretically the exchange probabilities should all be 1.0.
Indeed, when using the CPU the exchange probabilities are all 1.0, but with GPU acceleration they vary significantly:
Replica exchange statistics
Is this normal? I recall that limited numerical precision on GPUs can lead to significant errors, but would such large discrepancies in exchange probabilities affect the final results? If not, why not?
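For reference, the expectation that the acceptance should be exactly 1.0 follows directly from the Metropolis criterion used in HREX. A small sketch (the function name and the energy values are illustrative, not taken from the run):

```python
import math

def hrex_acceptance(u_ii, u_ij, u_ji, u_jj, beta=1.0):
    """Metropolis acceptance for swapping replicas i and j:
    P = min(1, exp(-beta * delta)), with
    delta = (U_i(x_j) + U_j(x_i)) - (U_i(x_i) + U_j(x_j))."""
    delta = (u_ij + u_ji) - (u_ii + u_jj)
    return min(1.0, math.exp(-beta * delta))

# Identical Hamiltonians (scaling = 1.0): U_i = U_j, so the cross terms cancel
# exactly and delta = 0, giving acceptance = 1.0.
print(hrex_acceptance(u_ii=-5000.0, u_ij=-4000.0, u_ji=-5000.0, u_jj=-4000.0))  # 1.0

# The same swap, but with a tiny rounding discrepancy in one energy term:
# delta is no longer exactly 0, so the acceptance can drop below 1.
print(hrex_acceptance(u_ii=-5000.0, u_ij=-4000.0 + 1e-3, u_ji=-5000.0, u_jj=-4000.0))
```

With exact arithmetic the cancellation is exact; with non-reproducible GPU energies the four terms are not evaluated identically, which is how the acceptance deviates from 1.0.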
The command I used is as follows:
nohup mpirun --use-hwthread-cpus -np 12 gmx_mpi mdrun -v -deffnm rest -nb gpu -pin on -ntomp 1 -replex 1000 -hrex -multidir rest0 rest1 rest2 rest3 rest4 rest5 rest6 rest7 rest8 rest9 rest10 rest11 -dlb no -plumed plumed.dat > nohup.out 2>&1 &
I look forward to your replies.