Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Ball Tree using Haversine distance as mentioned in #237 #263

Merged
merged 1 commit into from
Sep 23, 2024

Conversation

AtlasPilotPuppy
Copy link

@AtlasPilotPuppy AtlasPilotPuppy commented Sep 21, 2024

Resolves #237
This PR implements Ball Tree with Haversine and Euclidian distance.

Thie implmentation does not utilize the SpatialQueries Trait. But can be modified after some discussion on how to handle the differeing parameters
This does implement all the function utilized in the query_knn.py There is demo code at the bottom of basics.ipynb

There are also some Rust tests to test the functioning of the ball tree. This ball tree was influenced by this repo.

The current approach was chosen since it creates a ball tree and remain in memory for iterative querying. This is very efficient in our case since these are applied to all rows of the dataframe.
This is my first time writing a Polars extension so ideas and feedback is welcome.

…ctqqq#237

Thie implmentation does not utilize the SpatialQueries Trait. But can
be modified after somce discussion on how to handle the differeing
parameters
This does implement all the function utilized in the query_knn.py
There is demo code at the botton of basics.ipynb

There are also some Rust tests to test the functioning of the ball tree.
This ball tree was influenced by [this repo](https://github.com/grantslatton/ball-tree).
@abstractqqq
Copy link
Owner

abstractqqq commented Sep 23, 2024

@AtlasPilotPuppy The package compiles and I am merging it, but I will make some changes. e.g. move the file for better organization, reuse some exsisting code, etc.. Thanks a lot for the work.

@abstractqqq abstractqqq merged commit f9438fb into abstractqqq:main Sep 23, 2024
7 checks passed
@AtlasPilotPuppy
Copy link
Author

@abstractqqq much appreciated. Let me know if there are other issues you would like to look at.
I wanted to reuse more but my lack of familiarity was a hindrance.
Another thing is that the Leaf doesn't own the points array. Which leads to some interesting challenges. It may be something else to consider to have the leaf take the ownership of the points rather than the containing function.

@abstractqqq
Copy link
Owner

abstractqqq commented Sep 23, 2024

@abstractqqq much appreciated. Let me know if there are other issues you would like to look at. I wanted to reuse more but my lack of familiarity was a hindrance. Another thing is that the Leaf doesn't own the points array. Which leads to some interesting challenges. It may be something else to consider to have the leaf take the ownership of the points rather than the containing function.

There is a OwnedLeaf type FYI. For KNN style query in dataframe, I use the regular Leaf type because I don't need to copy data any more. Data is already copied from df to a row major slice. For the py_kdt model I expose to Python (in src/pymodels.rs), I use the OwnedLeaf type because when data is passed from Python to Rust, life time info is obscure. I think in general the lifetime system forces you to in-line some functions which you would normally factor out. This forces one to make only "really good factors" and in-line those that are only for convenience.. It has its pros and cons.

Thank you again and this is great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: BallTree algorithm with haversine distance
2 participants