Optimization for many calls #2

Open
j-proc opened this issue Mar 12, 2019 · 12 comments
Labels
enhancement

Comments

j-proc commented Mar 12, 2019

Pythia mostly contains calls to freud, so it does not seem that there would be substantial benefits to cythonizing it. However, there are places where freud objects are regenerated rather than reused, and it seems that many of the append and concatenate calls could be replaced by direct numpy allocation and assignment.

This would require replacing many functions with classes that perform the same task but store the freud objects and perhaps decide intelligently when to regenerate them (based on box changes?).
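
A minimal sketch of the kind of wrapper I have in mind (purely illustrative; CachedCompute and its factory argument are hypothetical names, not existing pythia or freud API):

    class CachedCompute:
        """Hypothetical wrapper: build a freud compute object once and reuse
        it, regenerating only when the box changes. All names illustrative."""

        def __init__(self, factory):
            self._factory = factory  # callable returning a fresh freud compute object
            self._box = None
            self._obj = None

        def compute(self, box, positions):
            # Regenerate the freud object only if the box changed since the
            # last call; this assumes boxes support equality comparison
            # (otherwise compare Lx/Ly/Lz and tilt factors explicitly).
            if self._obj is None or box != self._box:
                self._obj = self._factory()
                self._box = box
            return self._obj.compute(box, positions)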

I'm interested in other opinions on what else could speed it up and what API people would like to see.

j-proc added the enhancement label Mar 12, 2019
csadorf commented Mar 12, 2019

I think cythonizing Pythia might be optimizing at the wrong level. We use freud as the "engine" precisely to speed up the nearest-neighbor calculations and similar routines. Pythia, as I interpret it, just combines and reformulates all of these data as specific descriptors.

I believe that cythonizing would make it harder to contribute to and maintain this package, so we should be careful about where exactly we optimize. I'd rather adjust the freud API to make it easier to use freud efficiently, if needed.

j-proc (Author) commented Mar 12, 2019

@csadorf That's actually what I was suggesting: leave all the code optimization in Cython to freud, but use the calls more efficiently and reuse objects. I haven't looked at it enough, but a brief look showed a lot of object creation inside loops. I'm not sure exactly how inefficient this is, or whether Cython does some form of automatic caching that would make it a non-issue.

I'll perhaps do some profiling on previous code of mine that I expected to run much faster, to see how valid these suggestions are. But I think there should be a lot that's possible here.

klarh (Collaborator) commented Mar 12, 2019

I agree that adding cython would significantly increase the installation complexity and development overhead of the package. I would want to see compelling benchmarks for common use cases that currently run too slowly before being convinced that it's worth the trouble.

In some cases (I recall that the voronoi descriptors are particularly slow), perhaps it makes sense to move the core of some algorithms into freud proper?

bdice (Member) commented Mar 12, 2019

@j-proc Is there a set of descriptors you are particularly interested in using?

@klarh It doesn't surprise me that Voronoi-based descriptors are among the slowest: freud does not perform its own Voronoi calculations, but instead clones particles across the periodic boundaries to ensure proper periodicity (substantially increasing the problem size) and calls out to scipy in a way that is perhaps inefficient and likely not parallelized. I have been working with a different set of Voronoi-based descriptors that is not yet implemented in pythia, and have also observed that this is expensive.

csadorf commented Mar 12, 2019

I'm not sure whether we are doing this yet, but a good start might be to create a test suite that does some benchmarking (and optionally profiling) so that we can reliably identify hot spots and potential regressions.
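
Even something minimal would do as a starting point; for instance, a sketch using only the standard library (function names here are illustrative):

    import cProfile
    import pstats
    import timeit

    def benchmark(func, *args, repeat=5):
        """Return the best wall-clock time (in seconds) of several single runs."""
        return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))

    def profile(func, *args, n_lines=20):
        """Print the hottest calls from a single invocation of func."""
        profiler = cProfile.Profile()
        profiler.runcall(func, *args)
        pstats.Stats(profiler).sort_stats('cumulative').print_stats(n_lines)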

j-proc (Author) commented Mar 12, 2019

I could make that the first half of my hackathon time, if I don't get to it before then: just setting up some general benchmarking and profiling. I could solicit from the group some general freud scripts that people expect to run faster but don't. I expect a lot of this will come down to inefficient coding, but I think it would be good to know where everything slows down.

(Off topic for the package, but related to the discussion.)
One topic I haven't approached, because it somewhat violates the independence of freud from hoomd: for cases where freud is called while a simulation is running, a large speedup could be had by reusing the hoomd neighbor list instead of constructing a new one.

j-proc (Author) commented Mar 12, 2019

@bdice I was particularly thinking of the descriptors where a range of nearest neighbors is used. These seemed particularly slow, and I suspect they generate new neighbor lists for each neighbor count.

bdice (Member) commented Mar 12, 2019

@j-proc I searched the codebase for append and concatenate, which you mentioned in the first post. I see three places where append is used that could be preallocated instead. I didn't see any instances of concatenate in the package, only in example notebooks. This might help a bit.

  1. result.append(sphs)

  2. result.append(neighborhood_distance_singvals(box, positions, neighbors, rmax_guess))

  3. result.append(neighborhood_angle_singvals(box, positions, neighbors, rmax_guess))
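
For reference, the preallocation pattern those spots could adopt looks roughly like this (a sketch with stand-in data; compute_descriptor and descriptor_size are hypothetical):

    import numpy as np

    descriptor_size = 12  # hypothetical length of one per-frame descriptor
    frames = [np.random.rand(100, 3) for _ in range(50)]  # stand-in trajectory

    def compute_descriptor(frame):
        """Stand-in for a pythia descriptor call returning a 1D array."""
        return frame.mean(axis=0).repeat(descriptor_size // 3)

    # Current pattern: grow a Python list and convert at the end.
    result = np.asarray([compute_descriptor(f) for f in frames])

    # Proposed pattern: allocate the output once and assign into it.
    out = np.empty((len(frames), descriptor_size), dtype=np.float64)
    for i, f in enumerate(frames):
        out[i] = compute_descriptor(f)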

bdice (Member) commented Mar 12, 2019

> @bdice I was particularly thinking of the descriptors where a range of nearest neighbors is used. These seemed particularly slow, and I suspect they generate new neighbor lists for each neighbor count.

@j-proc The NearestNeighbors class in freud uses LinkCell internally and can be very slow in heterogeneous systems or when the LinkCell cell size (rmax_guess in pythia) is not chosen properly. @vyasr and I have a branch of freud (spatial_data_v2) where we plan to redo much of how freud finds neighbors, using AABB trees by default (borrowed from HOOMD's fast periodic implementation); this should make neighbor finding significantly faster in many cases, especially when the AABB tree is reused (which is now much easier to do). We hope this will also substantially speed up platoviz, where the same spatial data structures can then be reused for multiple neighbor queries of different types.
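
To illustrate the reuse pattern (using scipy's cKDTree purely as a stand-in, since the spatial_data_v2 API is still in flux): build the spatial data structure once, then issue several different neighbor queries against it.

    import numpy as np
    from scipy.spatial import cKDTree

    points = np.random.rand(1000, 3)

    # Build once (the expensive step); boxsize makes the tree periodic.
    tree = cKDTree(points, boxsize=1.0)

    # ...then reuse the same tree for several different kinds of queries:
    distances, indices = tree.query(points, k=6)       # 6 nearest neighbors of every point
    pairs = tree.query_pairs(r=0.1)                    # all pairs within a cutoff
    ball = tree.query_ball_point(points[0], r=0.2)     # neighbors of one point within r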

In general, having a good benchmark suite for both freud and pythia would be extremely helpful.

j-proc (Author) commented Mar 12, 2019

@bdice Apologies, I think I'm just inserting what I expect the common usage would be there. You're correct.

I'm honestly not sure what all can be done, so Simon makes a very good point about doing the profiling first to see whether there's actually much to do here. In the use cases I ran in the past, it seemed like there was. But since it's open source, one could always rewrite the functions to be faster for specific use cases.

j-proc (Author) commented Mar 12, 2019

@bdice @csadorf Agreed that benchmarking and possibly a bit of profiling would be a useful start. I'll make a similar issue in freud so they can be done separately if need be, although, as we said, the bulk of the work is in freud and should likely be done first.

bdice (Member) commented Mar 12, 2019

Another specific optimization that could be tested for pythia: replace the construction of explicit NearestNeighbors and LinkCell objects in the various _nlist_helper functions with freud.locality.make_default_nlist_nn and freud.locality.make_default_nlist. The AABB tree neighbor-finding algorithm is currently undocumented, since we are revising its API substantially, but it is used internally by those functions. Those functions should always use the fastest general algorithm we have for constructing neighbor lists; there are cases where LinkCell beats the (undocumented) _AABBQuery, but that's infrequent.

pythia/pythia/bonds.py

Lines 11 to 17 in f401f45

def _nlist_nn_helper(fbox, positions, neighbors, rmax_guess, exclude_ii=None):
    if isinstance(neighbors, int):
        nneigh = freud.locality.NearestNeighbors(rmax_guess, neighbors)
        nneigh.compute(fbox, positions, positions, exclude_ii)
        neighbors = nneigh.nlist
    return neighbors

https://github.com/glotzerlab/freud/blob/9f89a2211e18d192fdedd4121e9e65c62b92ce64/freud/locality.pyx#L368
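
For concreteness, the helper above might shrink to something like the following. This is only a sketch: the exact make_default_nlist_nn signature and return value are assumed from the linked locality.pyx and should be checked against the installed freud version.

    import freud

    def _nlist_nn_helper(fbox, positions, neighbors, rmax_guess, exclude_ii=None):
        if isinstance(neighbors, int):
            # Assumed signature and (nlist, owner) return value; verify
            # against the freud release before relying on this.
            neighbors, _ = freud.locality.make_default_nlist_nn(
                fbox, positions, positions, neighbors,
                exclude_ii=exclude_ii, rmax_guess=rmax_guess)
        return neighbors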
