all-nearest neighbours algorithm #32
Comments
Cobbled together a working prototype: https://github.com/schlegelp/aann
Quick & dirty benchmarks finding all nearest-neighbours between two neurons (120k and 23k nodes, respectively):
I think that looks promising. The Rust implementation gave me some headaches and there is definitely room for improvement if you have time to take a look. |
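The exact benchmark code isn't shown in this thread, but a minimal sketch of the kind of measurement described (assuming scipy's cKDTree as the reference implementation and random points standing in for the real node coordinates) might look like this:

```python
# All nearest-neighbour query between two neuron-sized point clouds,
# timed with scipy's cKDTree; the sizes mirror the 120k/23k-node example above.
import time
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
query = rng.uniform(0, 1e5, size=(120_000, 3))   # stand-in for the 120k-node neuron
target = rng.uniform(0, 1e5, size=(23_000, 3))   # stand-in for the 23k-node neuron

t0 = time.perf_counter()
tree = cKDTree(target)                           # build the index on the target
dists, idx = tree.query(query, k=1)              # nearest target node for every query node
print(f"build + query: {time.perf_counter() - t0:.3f}s")
```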
Quick update: I managed to get the Rust implementation from 1.5s down to ~0.5s in my test scenario. At this point I'm a bit stuck - with my limited experience in Rust I don't really see anything else that could be optimised. I also wrote another implementation in Cython (where I have slightly more of an inkling) but couldn't push it below ~0.5s either. |
All-by-all query with 100 random neurons from the hemibrain dataset: |
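For context, an all-by-all query of this kind amounts to building one spatial index per target neuron and querying it with every other neuron's nodes. A hedged sketch with scipy's cKDTree, where `neurons` is a hypothetical list of (N_i, 3) coordinate arrays:

```python
# All-by-all nearest-neighbour distances across a list of neurons using
# scipy's cKDTree. `neurons` is an illustrative list of (N_i, 3) arrays,
# e.g. the node coordinates of 100 hemibrain neurons.
import numpy as np
from scipy.spatial import cKDTree

def all_by_all_nn(neurons):
    trees = [cKDTree(pts) for pts in neurons]      # one index per target neuron
    results = {}
    for qi, q_pts in enumerate(neurons):
        for ti, tree in enumerate(trees):
            if qi == ti:
                continue
            d, _ = tree.query(q_pts, k=1)          # NN distance for every query node
            results[(qi, ti)] = d
    return results
```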
I had a look around and it seems there aren't any published 3D Delaunay triangulation implementations in Rust 😢 The tree topology wouldn't be available for LM->EM conversions and, as you say, would be inaccurate in cases where you needed to hop between nearby branches. |
I was always interested in trying to find/make an all-by-all implementation. Nevertheless, my reading of the paper you linked was that the advantage of their Java implementation really only appeared at the scale of ~100,000 points. I think in general the construction and querying of k-d indices is very efficient in the best nearest-neighbour packages – they have been very professionally optimised. Therefore we as amateurs in this area can probably only hope to be competitive when the big-O scaling really starts to tip in favour of the all-by-all approach, i.e. with a lot of points. But since we should normally be operating in the 200-5000 point range, I think we are unlikely to get a major speed-up.
Do you know the range/distribution of points-per-neuron in this sample? |
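One rough way to probe where that crossover might sit is to time index construction and querying separately across point counts; a sketch using scipy's cKDTree on synthetic 3D points (not the Java implementation from the paper):

```python
# How k-d tree build and query times scale with point count,
# using scipy's cKDTree on synthetic 3D points.
import time
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
for n in (200, 1_000, 5_000, 50_000, 500_000):
    pts = rng.uniform(0, 1e5, size=(n, 3))
    t0 = time.perf_counter()
    tree = cKDTree(pts)
    t1 = time.perf_counter()
    tree.query(pts, k=2)       # k=2: nearest neighbour other than the point itself
    t2 = time.perf_counter()
    print(f"n={n:>7}: build {t1 - t0:.4f}s, query {t2 - t1:.4f}s")
```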
Yes, doesn't Figure 1 suggest that tree-based implementations are faster in CPU time when working in the 0-10k point range, which is mainly where we're working? |
I thought I had seen at least one. What about simple_delaunay_lib or d3_geo_voronoi_rs?
Exactly. Could ameliorate the problem by introducing extra edges but that would increase the query time again. Not sure how often that makes a massive difference though - will test.
Definitely true for me but I have faith in @clbarnes and @aschampion ;) I was, for example, thinking about using SIMD (single instruction, multiple data) for the distance calculation - which is likely one of the bottlenecks - but that turned out to be too advanced for me. FWIW: I came across a couple of other papers on this topic. One described an implementation using R-trees but optimising the search using the local neighbourhood to speed up all-by-all queries.
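The Rust SIMD code itself isn't shown in this thread, but the inner loop it targets is essentially a batched squared-distance computation. Expressed in NumPy (whose array operations are themselves backed by vectorised kernels), with purely illustrative names:

```python
# Batched squared Euclidean distances from one query point to a block of
# candidate points - the kind of computation a SIMD'd inner loop would perform.
import numpy as np

def sq_dists(point, candidates):
    """Squared distances from `point` (3,) to `candidates` (n, 3)."""
    diff = candidates - point                    # broadcasts across the block
    return np.einsum('ij,ij->i', diff, diff)     # row-wise dot products
```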
I'm running a larger benchmark at the moment. Will share distribution once that's finished. |
This is the distribution of node counts for the sample of 100 neurons: average size is ~5.5k nodes.
I just quickly tried combining all 100 neurons into one large point cloud to use as the query - i.e. instead of 100 queries x 100 targets I run 1 combined query x 100 targets and split the results into the original neurons afterwards. Unfortunately that doesn't seem to make much of a difference. Looks like the size of the target is the determining factor. |
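Roughly what that combined-query experiment looks like (again with scipy's cKDTree and a hypothetical `neurons` list of coordinate arrays): concatenate all query neurons into one cloud, query each target tree once, and split the results back afterwards.

```python
# Combined-query variant of the all-by-all query: one big query cloud per
# target instead of one query per (query, target) pair.
import numpy as np
from scipy.spatial import cKDTree

def combined_query(neurons):
    sizes = [len(pts) for pts in neurons]
    combined = np.vstack(neurons)                   # 1 combined query cloud
    splits = np.cumsum(sizes)[:-1]                  # where to cut the results
    results = {}
    for ti, t_pts in enumerate(neurons):
        d, _ = cKDTree(t_pts).query(combined, k=1)  # one query per target
        for qi, d_q in enumerate(np.split(d, splits)):
            if qi != ti:
                results[(qi, ti)] = d_q             # distances per original query neuron
    return results
```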
Another small update: I made various small improvements such as avoiding unnecessary copies/initialisations and using SIMD for distance calculations. Here is a new benchmark - this time with 200 hemibrain neurons, which favours shorter query times:
As you can see, the graph-based approximate nearest-neighbour implementation is now in total ~2x faster than |
I'm tempted to implement this to see if it might be useful for NBLAST: All Nearest Neighbour Calculation Based on Delaunay Graphs (arXiv)
Have you by any chance already looked into/tried something like this @clbarnes @aschampion @jefferis?
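Not the paper's algorithm, but a minimal illustration of the property such Delaunay-based approaches build on: greedy descent on the Delaunay graph of the target cloud reaches the true nearest neighbour of a query point, and starting each query from the previous answer exploits spatial locality between consecutive queries.

```python
# Greedy nearest-neighbour search on the Delaunay graph of the target points,
# using scipy's qhull wrapper. Warm-starting from the previous result helps
# when consecutive query points are spatially close (as neuron nodes are).
import numpy as np
from scipy.spatial import Delaunay

def delaunay_nn(targets, queries):
    tri = Delaunay(targets)
    indptr, nbrs = tri.vertex_neighbor_vertices    # CSR-style adjacency of the Delaunay graph
    out = np.empty(len(queries), dtype=np.intp)
    best = 0                                       # warm start for the next query
    for qi, q in enumerate(queries):
        best_d = np.sum((targets[best] - q) ** 2)
        improved = True
        while improved:                            # walk to closer neighbours until stuck
            improved = False
            for nb in nbrs[indptr[best]:indptr[best + 1]]:
                d = np.sum((targets[nb] - q) ** 2)
                if d < best_d:
                    best, best_d, improved = nb, d, True
        out[qi] = best
    return out
```

The interesting part of the paper is presumably how this is organised efficiently for all points at once; the sketch above only shows the per-query walk.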