Implement vector visualization in the AI tab #386

Open
deepbuzin opened this issue Dec 7, 2024 · 1 comment
Comments

@deepbuzin

The AI tab would benefit from an embedding vector visualization: whenever the user enables the AI extension and sets up a vector index, we could plot the resulting vectors in the UI using UMAP.

UMAP is a dimensionality-reduction technique that projects high-dimensional data (for example, embeddings with ~2,000 dimensions) down to 2 or 3 dimensions, i.e. points with 2 or 3 coordinates that we can plot. It tends to preserve local structure, so the user can see which pieces of text the embedding model considers similar to each other.
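UMAP's "local structure" starts from a k-nearest-neighbor graph over the input vectors, which it then tries to preserve in the low-dimensional layout. The sketch below builds that graph by brute force, assuming embeddings arrive as plain `number[]` arrays and using cosine similarity as the metric (the issue does not specify a metric). `cosineSimilarity` and `knnGraph` are hypothetical helper names, not part of any library.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// For every vector, return the indices of its k most similar *other* vectors.
// Brute force, O(n^2 * d): fine for illustration, but practical UMAP
// implementations use approximate nearest-neighbor search instead.
function knnGraph(vectors: number[][], k: number): number[][] {
  return vectors.map((v, i) =>
    vectors
      .map((w, j) => ({ j, sim: cosineSimilarity(v, w) }))
      .filter(({ j }) => j !== i) // a point is not its own neighbor
      .sort((x, y) => y.sim - x.sim) // most similar first
      .slice(0, k)
      .map(({ j }) => j),
  );
}

// Two tight pairs of 2-D "embeddings": each point's nearest neighbor is its partner.
const graph = knnGraph([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], 1);
// graph → [[1], [0], [3], [2]]
```

Points that end up as mutual neighbors here are the ones UMAP tries to keep close together in the 2-D plot.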

  • There is an explainer article with pictures
  • Here's a live demo on real data (make sure to select UMAP in the bottom left corner)
  • Here's a JS library the demo above is built on
[Screenshots: 2024-12-07 at 1:02:00 PM and 1:04:12 PM]

A potential workflow that such a visualization could enable:

  1. The user creates a vector index
  2. They open the visualization and select which type to visualize
  3. They enter a text query
  4. The query is embedded via the API and projected onto the visualization, so the user can see which points in the index are closest to their query.
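Step 4 needs a way to place a new point into an existing projection. A cheap approximation, sketched below, is to find the query's nearest neighbors among the indexed vectors and put the query at the similarity-weighted average of their 2-D positions. This is an assumption for illustration, not UMAP's own out-of-sample transform (a full UMAP implementation would optimize the new point's position against the existing layout). `placeQuery` and `Point2D` are hypothetical names, and cosine similarity is an assumed metric.

```typescript
type Point2D = { x: number; y: number };

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Approximate placement of an embedded query in an existing 2-D projection.
// `vectors[i]` is the original embedding whose projected position is `projected[i]`.
function placeQuery(
  query: number[],
  vectors: number[][],
  projected: Point2D[],
  k: number,
): { position: Point2D; neighbors: number[] } {
  // Rank indexed vectors by similarity to the query and keep the top k.
  const ranked = vectors
    .map((v, i) => ({ i, sim: cosineSimilarity(query, v) }))
    .sort((a, b) => b.sim - a.sim)
    .slice(0, k);
  // Clamp negative similarities to zero so dissimilar points cannot pull the query.
  const weights = ranked.map(({ sim }) => Math.max(sim, 0));
  const total = weights.reduce((s, w) => s + w, 0);
  let x = 0, y = 0;
  ranked.forEach(({ i }, n) => {
    // Fall back to a plain average if every weight is zero.
    const w = total > 0 ? weights[n] / total : 1 / ranked.length;
    x += w * projected[i].x;
    y += w * projected[i].y;
  });
  // `neighbors` can be highlighted in the UI as "closest to your query".
  return { position: { x, y }, neighbors: ranked.map(({ i }) => i) };
}
```

Returning the neighbor indices alongside the position lets the UI highlight the closest records directly, which is the part of step 4 the user actually cares about; the 2-D position is only a visual hint.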

Alternatively, they could browse the visualization, filter it by EdgeQL expressions, and see which points cluster together. This would help them adjust the content of the property that the database indexes.

The number of visualized samples would need to be capped at roughly 10,000; otherwise computing the projection would take too long. The samples should be picked uniformly at random across all records.
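If records are fetched as a stream or in pages, reservoir sampling (Algorithm R) yields a uniform random sample of at most `cap` items without knowing the total record count in advance: every record ends up in the sample with equal probability. A minimal sketch; `reservoirSample` and `MAX_SAMPLES` are hypothetical names, and the constant simply mirrors the ~10,000 cap above.

```typescript
// Algorithm R: uniform random sample of up to `cap` items from an iterable
// of unknown length. `random` is injectable so tests can be deterministic.
function reservoirSample<T>(
  items: Iterable<T>,
  cap: number,
  random: () => number = Math.random,
): T[] {
  const reservoir: T[] = [];
  let seen = 0;
  for (const item of items) {
    seen++;
    if (reservoir.length < cap) {
      // Fill the reservoir with the first `cap` items.
      reservoir.push(item);
    } else {
      // Pick a slot in [0, seen); replace only if it lands inside the reservoir,
      // which happens with probability cap / seen.
      const j = Math.floor(random() * seen);
      if (j < cap) reservoir[j] = item;
    }
  }
  return reservoir;
}

const MAX_SAMPLES = 10_000;
const recordIds = Array.from({ length: 50_000 }, (_, i) => i);
const sample = reservoirSample(recordIds, MAX_SAMPLES);
console.log(sample.length); // 10000
```

Sampling in a single pass like this avoids loading every record just to count them first; if the total count is cheap to obtain, sampling server-side may be preferable.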

@deepbuzin (author)

As with the other issue, please talk to @1st1 to clarify the scope of this feature.
