Updating/Avoiding to load the entire tree into RAM #8

Open
splitbrain opened this issue Jun 8, 2023 · 3 comments

Comments

@splitbrain

I'm wondering if I could use this implementation to efficiently store OpenAI embedding vectors for my documents. However, from what I understand from the documentation, the tree has to be built completely in RAM before it can be persisted to the filesystem and then searched more efficiently. It also seems that an already created tree can't be updated or extended.

Would it be feasible to implement updating/extending the tree on disk?

@splitbrain
Author

It seems my assumption that the search would be more memory efficient was false. Some data, in case others are interested:

  • Items in the Tree: 13375
  • Dimensions: 1536
  • .bin file size for FKDTree: 157 MB
  • Peak Memory during Tree creation: 550 MB
  • Peak Memory during Nearest Neighbor Search: 510 MB

It's super fast, but unfortunately the memory requirements aren't compatible with most hosted PHP setups.
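As a rough sanity check on the numbers above, the raw size of the vectors can be estimated with a few lines of Python. This is only a back-of-the-envelope sketch: the 8 bytes per coordinate (64-bit floats) is an assumption and not something the thread confirms about the .bin format, although the result happens to land close to the reported file size.

```python
# Figures taken from the list above.
ITEMS = 13375
DIMENSIONS = 1536

# Assumption: each coordinate is stored as a 64-bit float (8 bytes).
BYTES_PER_VALUE = 8


def raw_vector_bytes(items, dimensions, bytes_per_value):
    """Size of the bare coordinate data, without any tree overhead."""
    return items * dimensions * bytes_per_value


raw = raw_vector_bytes(ITEMS, DIMENSIONS, BYTES_PER_VALUE)
raw_mib = raw / (1024 * 1024)
print(f"{raw} bytes = {raw_mib:.1f} MiB")
```

Under that assumption the coordinates alone account for roughly the whole file, so the peak memory figures are about three and a half times the raw data size.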

@hexogen
Owner

hexogen commented Jun 10, 2023

It's possible to make a low-memory implementation.
For this, two things need to be done:

  1. Create a disk-based tree builder that does not build the whole tree in RAM.
  2. Remove caching during the search.
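To illustrate the general idea behind both points, here is a minimal Python sketch of reading a single vector from a file by seeking to it, so that only one record is in memory at a time. The file layout (fixed-size records of little-endian 64-bit floats) and the helper names `write_vectors` and `read_vector` are hypothetical and chosen for illustration; this is not the FKDTree .bin format or the library's API.

```python
import os
import struct
import tempfile

BYTES_PER_VALUE = 8  # assumption: 64-bit floats


def write_vectors(path, vectors):
    """Append-style writer: streams one record at a time to disk."""
    with open(path, "wb") as f:
        for vector in vectors:
            f.write(struct.pack(f"<{len(vector)}d", *vector))


def read_vector(f, index, dimensions):
    """Seek to record `index` and read only that record."""
    record_size = dimensions * BYTES_PER_VALUE
    f.seek(index * record_size)
    data = f.read(record_size)
    return list(struct.unpack(f"<{dimensions}d", data))


# Small demo with three 3-dimensional vectors.
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
try:
    write_vectors(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
    with open(path, "rb") as f:
        second = read_vector(f, 1, 3)
finally:
    os.remove(path)

print(second)
```

A real disk-based builder would also have to sort and partition the points without holding them all in memory, which this sketch does not attempt.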

@hexogen
Owner

hexogen commented Jun 10, 2023

Removing caching or constraining its depth is easy, but for the tree builder a new implementation is needed.
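One possible reading of constraining the cache is to cap how many entries it may hold and evict the least recently used one when the cap is exceeded. The Python sketch below shows that idea in isolation. The `BoundedCache` class and its `loader` callback are hypothetical names used for illustration and are not part of the library.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache holding at most `capacity` entries."""

    def __init__(self, capacity, loader):
        self._capacity = capacity
        self._loader = loader  # called on a miss, e.g. to read a node from disk
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            # Hit: mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        # Miss: load the value, then evict the oldest entry if over capacity.
        value = self._loader(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)
        return value

    def __len__(self):
        return len(self._items)


# Demo: record which keys actually hit the (simulated) disk.
loads = []


def load_node(key):
    loads.append(key)
    return key * 2


cache = BoundedCache(2, load_node)
results = [cache.get(k) for k in (1, 2, 1, 3, 2)]
print(results, loads, len(cache))
```

With a capacity of 2, memory stays bounded at the cost of reloading evicted entries, which trades some of the search speed for a smaller footprint.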


2 participants