Calling the vectorized versions from Julia #42
Hi Mikkel,

Try

`vals = hfmm3d(eps, zk, sources; charges=charges, nd=100)`

If `sources` is 3 x n, then the routine would expect `charges` to be an nd x n array (and throw an error otherwise). Let me know if that works! The vectorized call is untested, and I am not sure what to expect for the performance gains; these may be platform dependent. Would be interested to know how it goes.

Cheers,
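For reference, here is a minimal sketch of such a vectorized call, assuming the keyword interface described in the comment above (the `nd` keyword and an nd x n `charges` array). The `pg=1` keyword and the `vals.pot` output field are assumptions carried over from the standard single-density Julia interface and may differ in practice.

```julia
# Sketch of the vectorized (nd > 1) Helmholtz FMM call from Julia.
# The nd keyword and nd x n charge layout follow the comment above;
# pg=1 (request potentials at the sources) and vals.pot are assumptions.
using FMM3D

eps = 1e-6              # requested FMM precision
zk  = 1.0 + 0.0im       # Helmholtz parameter
n   = 2000              # number of sources
nd  = 100               # number of right-hand sides (charge vectors)

sources = rand(3, n)                  # 3 x n source locations
charges = rand(ComplexF64, nd, n)     # nd x n charge strengths, one row per rhs

# A single vectorized call evaluates all nd densities at once.
vals = hfmm3d(eps, zk, sources; charges=charges, nd=nd, pg=1)

# Assumed output layout: potentials as an nd x n array.
@show size(vals.pot)
```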
With regards to performance gains, we see a factor of 2+ for Helmholtz, and around a factor of 4 for Laplace. However, there is no benefit in cranking up nd indefinitely; the effect saturates at around 16-32 densities. The larger you make nd, the more memory the code needs, so I would recommend calling it in batches of 16-32, or whatever the memory on your system allows.

Regards,
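Following the batching advice above, a sketch of processing the ~100 right-hand sides in chunks of at most 32 densities. The call signature is taken from the earlier comment; the `pg=1` keyword, the `vals.pot` field, and the nd x n output layout are assumptions about the Julia wrapper.

```julia
# Sketch: process many right-hand sides in batches of <= 32 densities,
# per the memory recommendation above. pg=1 and vals.pot are assumptions.
using FMM3D

eps   = 1e-6
zk    = 1.0 + 0.0im
n     = 2000
nrhs  = 100
batch = 32

sources = rand(3, n)
charges = rand(ComplexF64, nrhs, n)   # all right-hand sides, one per row
pot     = zeros(ComplexF64, nrhs, n)  # potentials for every rhs

for lo in 1:batch:nrhs
    hi = min(lo + batch - 1, nrhs)
    nd = hi - lo + 1
    vals = hfmm3d(eps, zk, sources; charges=charges[lo:hi, :], nd=nd, pg=1)
    pot[lo:hi, :] .= vals.pot         # assumed nd x n output layout
end
```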
Hi again,

Thanks for the quick answers!

@askhamwhat Ahh, perfect. I tried looking into the code but did not realize that calling the existing routine with the `nd` keyword was already supported.

@mrachh Sounds about right. For a very crude implementation I got a 3-8x speedup (on an M1 Mac) for my specific application. However, not all of it can be directly attributed to the vectorized call, as the application includes additional computations which can be stacked/improved when calling a vectorized version.

Cheers,
Hey there,
I am trying to solve a problem with multiple right-hand sides. In short, this means that I need to evaluate hfmm3d for many different sets of charge strengths (or dipole strengths). I see that there is a vectorized version; however, it seems that there is no interface for it in Julia. Before I go and implement my own, I want to know whether there is a specific reason why there is no interface (other than that it takes time to implement), and whether there is any computational speed to be gained from using the vectorized version (currently I have around 100 right-hand sides, but this could increase in the future).
Cheers,
Mikkel