Euclidean distance and Cosine similarity functions on dense vectors #23982

Open
wants to merge 8 commits into master

@snash4 commented Nov 7, 2024

Description

Added two functions, Euclidean distance and cosine similarity, that operate on dense vectors.
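For reference, these are the standard definitions of the two metrics for vectors $x, y \in \mathbb{R}^n$. The description does not state the formulas, so treat this as the conventional reading of the function names rather than a quote from the PR:

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

s_{\cos}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
```

A smaller $d$ means the vectors are closer, while a larger $s_{\cos}$ (range $[-1, 1]$) means they are more similar, so a top-k search sorts the two results in opposite directions.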

Motivation and Context

Feature requested in #23981.

Impact

Added two functions for the feature:

  1. euclidean_distance_dense()

     [Screenshot 2024-11-07 at 1:25:10 PM]

  2. cosine_similarity_dense()

     [Screenshot 2024-11-07 at 1:24:46 PM]
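A minimal Java sketch of what the two functions compute, assuming each dense vector is a double[] and both inputs have the same length. This is not the PR's actual Presto code; the class name DenseVectorFunctions is hypothetical, and the length check and the NaN result for zero-norm vectors are assumptions about edge-case handling that the description does not specify.

```java
/**
 * Hypothetical sketch of Euclidean distance and cosine similarity over
 * dense vectors stored as double[]. Not the PR's Presto implementation.
 */
public final class DenseVectorFunctions {
    private DenseVectorFunctions() {}

    // Euclidean (L2) distance: sqrt(sum((x_i - y_i)^2)).
    public static double euclideanDistance(double[] x, double[] y) {
        requireSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot(x, y) / (||x|| * ||y||).
    // Assumption: returns NaN when either vector has zero norm.
    public static double cosineSimilarity(double[] x, double[] y) {
        requireSameLength(x, y);
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        if (normX == 0.0 || normY == 0.0) {
            return Double.NaN;
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }

    // Assumption: mismatched dimensions are rejected rather than truncated.
    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException(
                "Vectors must have the same length: " + x.length + " vs " + y.length);
        }
    }

    public static void main(String[] args) {
        double[] a = {3.0, 0.0};
        double[] b = {0.0, 4.0};
        System.out.println(euclideanDistance(a, b)); // 5.0
        System.out.println(cosineSimilarity(a, b));  // 0.0 (orthogonal vectors)
    }
}
```

Both metrics are computed in a single pass over the arrays, which is the main advantage of a dense (array) representation over a sparse, map-based one.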

Test Plan

Inserted a vector dataset (the SIFT benchmark) into an Iceberg table and ran a top-k similarity search with each function. The table has embedding_id and vector columns, where vector is of ARRAY type. Ordering by each function's result against a query vector returns the top-k vectors most similar to that query vector.
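The top-k search from the test plan can be sketched in plain Java as an in-memory stand-in for an ORDER BY ... LIMIT k query over the table. The names TopKSearch, Row, Hit, and topK are hypothetical, and the exact SQL used in the PR is not shown in the description (it appears only in the screenshots). The sketch ranks by Euclidean distance and keeps a bounded max-heap, so memory stays at O(k) instead of sorting every row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public final class TopKSearch {
    // Hypothetical in-memory stand-in for a table row (embedding_id, vector).
    public record Row(long embeddingId, double[] vector) {}

    // Hypothetical result pair: row id and its distance to the query vector.
    public record Hit(long embeddingId, double distance) {}

    static double euclideanDistance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the k rows closest to the query, nearest first.
    // The heap is ordered farthest-first, so poll() evicts the worst kept hit.
    public static List<Hit> topK(List<Row> rows, double[] query, int k) {
        if (k <= 0) {
            return List.of();
        }
        PriorityQueue<Hit> heap = new PriorityQueue<>(
            Comparator.comparingDouble(Hit::distance).reversed());
        for (Row row : rows) {
            heap.offer(new Hit(row.embeddingId(), euclideanDistance(row.vector(), query)));
            if (heap.size() > k) {
                heap.poll(); // drop the current farthest hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Hit::distance));
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(1, new double[]{0.0, 0.0}),
            new Row(2, new double[]{1.0, 1.0}),
            new Row(3, new double[]{5.0, 5.0}));
        // Prints the two nearest rows, nearest first.
        for (Hit hit : topK(rows, new double[]{0.9, 1.0}, 2)) {
            System.out.println(hit.embeddingId() + " " + hit.distance());
        }
    }
}
```

For cosine similarity the ordering flips: keep the k largest scores (a min-heap) and return them in descending order.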

[Screenshot 2024-11-07 at 1:24:10 PM]

[Screenshot 2024-11-07 at 1:22:49 PM]

Added the relevant test cases.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==
presto-main operator changes
* Euclidean distance and cosine similarity functions for dense arrays - pr: `23982`

@snash4 requested review from steveburnett, elharo and a team as code owners November 7, 2024 21:41

CLA Not Signed

@snash4 changed the title from "Euclidean distance and Cosine similarity functions on dense vectors #23981" to "Euclidean distance and Cosine similarity functions on dense vectors" Nov 7, 2024
Contributor

@steveburnett left a comment

LGTM! (docs)

Pull branch, local docs build, review - looks good. Thanks for the doc!

@steveburnett
Contributor

Thanks for the contribution! A few things to help your PR be ready to merge:

== NO RELEASE NOTE ==

@steveburnett
Contributor

Thanks for the release note entry! A couple of suggestions to consider, to follow the Order of changes in the Release Note Guidelines.

== RELEASE NOTES ==
General Changes
* Add :func:`cosine_similarity_dense` and :func:`euclidean_distance_dense` functions for dense arrays.   pr:`23982`

@tdcmeehan tdcmeehan self-assigned this Nov 22, 2024
@tdcmeehan tdcmeehan added the from:IBM PR from IBM label Nov 22, 2024
@prestodb-ci prestodb-ci requested review from a team, jp-sivaprasad and auden-woolfson and removed request for a team November 22, 2024 16:01
@prestodb-ci

Saved that user @snash4 is from IBM

Labels
from:IBM PR from IBM

4 participants