Better handling of very large iterables and/or generators in the vectorstore's add methods #32

Open
hemidactylus opened this issue May 23, 2024 · 0 comments

Comments

@hemidactylus
Collaborator

In the current code, the iterables of documents to insert are materialized all at once. What if there are a billion documents?
Also, in some paths (e.g. the vectorize path), the iterable is consumed once and then used again later. That works only for re-iterable inputs such as lists; a generator is already exhausted by the second pass.

Both points need to be addressed, especially with very large numbers of documents in mind. Batching the iterable comes to mind: e.g. take batches of 1k docs or so and run "the usual thing" on each batch in turn, as is done now, but with more care about where inputs that may be one-shot iterables get materialized.
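
A rough sketch of the batching idea, to make the proposal concrete. This is not the library's current code: `batched`, `add_in_batches`, and the `insert_batch` callable are hypothetical names that stand in for whatever the add methods do today with a list of documents. The point is that only one batch is ever held in memory, and that each batch is a real list that can safely be read more than once.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)  # works for lists and one-shot generators alike
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def add_in_batches(
    documents: Iterable[T],
    insert_batch: Callable[[List[T]], List[str]],  # hypothetical per-batch insert
    batch_size: int = 1000,
) -> List[str]:
    """Insert documents batch by batch and collect the returned ids."""
    inserted_ids: List[str] = []
    for batch in batched(documents, batch_size):
        # `batch` is a list, so the per-batch logic may iterate it repeatedly
        # (e.g. once to compute vectors and once to build the insert payload).
        inserted_ids.extend(insert_batch(batch))
    return inserted_ids
```

With this shape, a caller could pass a generator, e.g. `add_in_batches((f"doc {i}" for i in range(10**9)), insert_batch)`, and only 1k documents would be materialized at any time. The list of returned ids still grows with the input, so for truly huge inserts it may be worth yielding ids per batch instead.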
