Ewald calculator is extremely slow due to for loop #80
Comments
What is the maximum number of atoms in the system you used for testing?
1024
Actually, I didn't scroll all the way down; there are some cases, surprisingly on the small side, where PME edges out a win. I'll run bigger cases soon. For posterity, here's the benchmark script output for this change on an L40S (obviously, I disabled TorchScript):
1024_light_cpu:
  E: 0.028006192687826115
  PME: 0.05425682137502008
1024_light_cuda:
  E: 0.0021106005006004125
  PME: 0.004031811624372494
1024_tight_cpu:
  E: 0.24040848081131116
  PME: 0.3489146663760039
1024_tight_cuda:
  E: 0.003573990061340737
  PME: 0.006905024374646018
128_light_cpu:
  E: 0.002079558124023606
  PME: 0.046015231499040965
128_light_cuda:
  E: 0.001985459062780137
  PME: 0.004574535623760312
128_tight_cpu:
  E: 0.014728610187376034
  PME: 0.33456345718695957
128_tight_cuda:
  E: 0.0019395997496758355
  PME: 0.0066749854377121665
system:
  cpu: x86_64
  gpu: NVIDIA L40S
  node: kl003
  platform: Linux-5.14.0-70.30.1.el9_0.x86_64-x86_64-with-glibc2.34
  timestamp: '2024-10-15T15:24:30.801431'
version:
  torch: 2.4.1+cu121
  torch-pme-commit: not found
  torch-pme-status: not found

versus the results without the change:
1024_light_cpu:
  E: 0.5565416831886978
  E_MT: 0.5542255340005795
  PME: 0.05269402262638323
  PME_MT: 0.058186623187793884
1024_light_cuda:
  E: 0.21703302312380401
  E_MT: 0.21828696675038373
  PME: 0.0032142497493623523
  PME_MT: 0.003692456311910064
1024_tight_cpu:
  E: 15.574867601000733
  E_MT: 11.684244794687402
  PME: 0.32599988918809686
  PME_MT: 0.39711523550067795
1024_tight_cuda:
  E: 0.5313887392512697
  E_MT: 0.531285271186789
  PME: 0.006680479064016254
  PME_MT: 0.006621209626246127
128_light_cpu:
  E: 0.022671775999697275
  E_MT: 0.02297365800041007
  PME: 0.04760915656152065
  PME_MT: 0.048393513749033445
128_light_cuda:
  E: 0.028179472687043017
  E_MT: 0.02838066218646418
  PME: 0.003127433312329231
  PME_MT: 0.003241197062379797
128_tight_cpu:
  E: 0.0762400030016579
  E_MT: 0.07706811800017022
  PME: 0.3605894702486694
  PME_MT: 0.36393232043701573
128_tight_cuda:
  E: 0.027397436311730416
  E_MT: 0.027304522625854588
  PME: 0.005997346625008504
  PME_MT: 0.006475141062765033
system:
  cpu: x86_64
  gpu: NVIDIA L40S
  node: kl001
  platform: Linux-5.14.0-70.30.1.el9_0.x86_64-x86_64-with-glibc2.34
  timestamp: '2024-10-15T15:08:35.999047'
version:
  torch: 2.4.1+cu121
  torch-pme-commit: not found
  torch-pme-status: not found
For the script, see #81.
Are the numbers the total time per frame or the time per atom per frame? I would have guessed the former based on the script; I'm only asking because, for PME, the cost seems to be the same for N=128 and N=1024. Do we have such a huge overhead?
@kvhuguenin Let's move this discussion to #81 :)
Here's up to ~11k atoms.
... and the old version, until I got bored with waiting.
Can you try this?
This should compute the same stuff without vmap and should be TorchScript-compatible.
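For reference, a minimal sketch of what a vmap-free, scriptable reduction of this kind can look like (illustrative only: the function name, variable names, and shapes below are hypothetical and are not the snippet proposed in this comment or the actual torch-pme internals):

```python
import torch

# Hypothetical shapes (not the real calculator):
#   cos_k, sin_k: (n_atoms, n_k) per-atom Fourier factors
#   sf_re, sf_im: (n_k,) real/imaginary parts of the total structure factor
#   kernel:       (n_k,) reciprocal-space filter values
@torch.jit.script
def potentials_batched(
    cos_k: torch.Tensor,
    sin_k: torch.Tensor,
    sf_re: torch.Tensor,
    sf_im: torch.Tensor,
    kernel: torch.Tensor,
) -> torch.Tensor:
    # Two matrix-vector products replace the per-atom Python loop,
    # and plain tensor ops keep the function compilable by torch.jit.script.
    return cos_k @ (kernel * sf_re) + sin_k @ (kernel * sf_im)

pot = potentials_batched(
    torch.rand(256, 2000), torch.rand(256, 2000),
    torch.rand(2000), torch.rand(2000), torch.rand(2000),
)
print(pot.shape)  # torch.Size([256])
```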
10% faster, couldn't resist.
This doesn't work,
this passes all the tests! ... and is faster than
@kvhuguenin can you please check that @ceriottm's modification does the right thing? I'm not sure I really understand the charge-reshaping stuff. If we're sure, I'll make a PR to fix this.
Heya, I noticed an errant for loop in the Ewald calculator. Here:

The fix is simple: just replace the above with

This gives a speedup of about 10x to 1000x and passes all the correctness tests. In fact, Ewald outperforms PME in all my tests (EDIT: not quite) with this fix.
Unfortunately, it doesn't work with TorchScript:
Anyone want to get a 1000x speedup to their name?
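To make the pattern concrete, here is a rough sketch of the kind of loop-to-batch rewrite being described, assuming (as the later comments about vmap suggest) that the first fix relied on torch.vmap; the names and shapes are hypothetical, not the actual torch-pme code, and torch.vmap is indeed not supported by TorchScript:

```python
import torch

# Hypothetical shapes (not the real calculator):
#   cos_k, sin_k: (n_atoms, n_k) per-atom Fourier factors
#   sf_re, sf_im: (n_k,) total structure factor; kernel: (n_k,) k-space filter

def potential_one_atom(cos_i, sin_i, sf_re, sf_im, kernel):
    # reciprocal-space potential felt by a single atom
    return torch.sum(kernel * (cos_i * sf_re + sin_i * sf_im))

def potentials_loop(cos_k, sin_k, sf_re, sf_im, kernel):
    # slow pattern: one Python iteration (and many tiny kernel launches) per atom
    return torch.stack(
        [potential_one_atom(c, s, sf_re, sf_im, kernel) for c, s in zip(cos_k, sin_k)]
    )

# batched over the atom axis in a single call; not TorchScript-compatible
potentials_vmap = torch.vmap(potential_one_atom, in_dims=(0, 0, None, None, None))

cos_k, sin_k = torch.rand(512, 2000), torch.rand(512, 2000)
sf_re, sf_im, kernel = torch.rand(2000), torch.rand(2000), torch.rand(2000)
assert torch.allclose(
    potentials_loop(cos_k, sin_k, sf_re, sf_im, kernel),
    potentials_vmap(cos_k, sin_k, sf_re, sf_im, kernel),
    rtol=1e-4,
)
```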