Calling the vectorized versions from Julia #42

Open
mipals opened this issue Sep 19, 2023 · 3 comments

@mipals

mipals commented Sep 19, 2023

Hey there,

I am trying to solve a problem with multiple right-hand sides. In short, this means I need to evaluate hfmm3d for many different sets of charge strengths (or dipole strengths). I see that there is a vectorized version; however, it seems there is no interface for it in Julia. Before I go and implement my own, I would like to know whether there is a specific reason the interface is missing (other than the time it takes to implement), and whether there is any computational speed to be gained from the vectorized version (currently I have around 100 right-hand sides, but this could increase in the future).

Cheers,
Mikkel

@askhamwhat
Collaborator

Hi Mikkel,
I think this may already be available with the current wrapper. You can pass a keyword argument to hfmm3d that lets the routine know you are making a vectorized call, so something like:

  vals = hfmm3d(eps,zk,sources;charges=charges,nd=100)

If sources was 3 x n, then the routine would expect charges to be an nd x n array (and throw an error otherwise).
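
For reference, a minimal sketch of the array setup might look like this (the pg keyword and the vals.pot field are assumptions about the wrapper's interface, so adjust as needed):

  using FMM3D   # Julia wrapper around the FMM3D library

  eps = 1e-6           # requested precision
  zk  = 1.0 + 0.0im    # Helmholtz wavenumber
  n   = 1000           # number of sources
  nd  = 100            # number of right-hand sides

  sources = rand(3, n)                  # 3 x n source locations
  charges = rand(ComplexF64, nd, n)     # nd x n charge strengths, one row per rhs

  # pg=1 requests potentials at the sources; this keyword and the `pot`
  # field below are assumptions about the wrapper's interface.
  vals = hfmm3d(eps, zk, sources; charges=charges, nd=nd, pg=1)
  pot  = vals.pot                       # expected to be nd x n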

Let me know if that works! The vectorized call is untested.

I am not sure what to expect for the performance gains; they may be platform dependent. I would be interested to know how it goes.

Cheers,
Travis

@mrachh
Member

mrachh commented Sep 20, 2023

Regarding the performance gain, we see a factor of 2+ for Helmholtz and around a factor of 4 for Laplace. However, there is no benefit to cranking up nd indefinitely; the effect saturates at around 16-32 densities, and the larger you make nd, the more memory the code needs. So I would recommend calling it in batches of 16-32, or whatever the memory on your system allows.
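
To illustrate the batching recommendation, a rough sketch might look like this (reusing eps, zk, sources, and n from the earlier sketch; all_charges is a hypothetical nrhs x n array holding every charge vector, and pg/pot remain assumptions about the wrapper):

  batchsize = 32                        # batches of 16-32, as recommended
  nrhs = size(all_charges, 1)
  pot  = zeros(ComplexF64, nrhs, n)     # potentials at the n sources

  for lo in 1:batchsize:nrhs
      hi = min(lo + batchsize - 1, nrhs)
      vals = hfmm3d(eps, zk, sources; charges=all_charges[lo:hi, :],
                    nd=hi - lo + 1, pg=1)
      pot[lo:hi, :] .= reshape(vals.pot, hi - lo + 1, :)  # reshape covers the nd == 1 case
  end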

Regards,
Manas

@mipals
Author

mipals commented Sep 20, 2023

Hi again,

Thanks for the quick answers!

@askhamwhat Ahh, perfect. I tried looking into the code but did not realize that calling hfmm3d_ also included the vectorized version (the documentation mentions something about a _vec variant, which I could not find anywhere). From the documentation it looks as though the charges need to be interleaved in one long vector, but given Julia's column-major layout I guess that is the same as an nd x n matrix. Additionally, the dipoles need to be interleaved component-wise rather than dipole-wise (something I missed the first time around).
After trying it out, there is one thing that annoys me slightly: if you give only one set of charges the function returns a vector (n x 1), but with multiple sets it returns an nd x n matrix. I guess this has to do with the underlying Fortran implementation. Thankfully, this is something a simple transpose can fix 😄
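
For reference, a small sketch of that fix might look like this (assuming the returned value exposes the potentials in a pot field, which is a guess):

  # Normalize the returned potential to the nd x n layout regardless of how
  # many charge vectors were passed; `vals.pot` is an assumed field name.
  pot = vals.pot
  if size(pot, 2) == 1            # single-rhs case: an n x 1 (or length-n) result
      pot = reshape(pot, 1, :)    # -> 1 x n, matching the nd x n convention
  end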

@mrachh Sounds about right. With a very crude implementation I got a 3-8x speedup (on an M1 Mac) for my specific application. However, not all of that can be directly attributed to the vectorized call, as the application includes additional computations that can be stacked/improved when calling a vectorized hfmm3d.

Cheers,
Mikkel
