feat: introduce SparseEmbedding
#7382
@@ -114,7 +119,8 @@ def _create_id(self):
        mime_type = self.blob.mime_type if self.blob is not None else None
        meta = self.meta or {}
        embedding = self.embedding if self.embedding is not None else None
        data = f"{text}{dataframe}{blob}{mime_type}{meta}{embedding}"
        sparse_embedding = self.sparse_embedding.to_dict() if self.sparse_embedding is not None else ""
This differs a bit from the other fields: it falls back to an empty string instead of None, so that the ids of existing Documents are not altered.
I can change it if you think that's better.
Good catch! This approach looks good to me 👍
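To illustrate the point made in this thread, here is a minimal sketch of why an empty-string fallback keeps existing ids stable while a None fallback would not. The function names are hypothetical, the field list is reduced to three entries, and the use of SHA-256 is an assumption made for illustration; none of this is taken from the PR itself.

```python
import hashlib


def digest(payload: str) -> str:
    # Assumption: the id is a hash of the concatenated fields (SHA-256 here).
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def payload_before(text, meta, embedding) -> str:
    # Simplified stand-in for the id payload before the new field existed.
    return f"{text}{meta}{embedding}"


def payload_after(text, meta, embedding, sparse_embedding=None, fallback="") -> str:
    # The new field is appended; `fallback` is what gets formatted when it is unset.
    sparse = sparse_embedding if sparse_embedding is not None else fallback
    return f"{text}{meta}{embedding}{sparse}"


old_id = digest(payload_before("hello", {}, None))

# An empty-string fallback appends nothing, so the payload and the id are unchanged.
same_id = digest(payload_after("hello", {}, None, fallback=""))

# A None fallback would append the literal text "None" and change the id.
changed_id = digest(payload_after("hello", {}, None, fallback=None))

print(old_id == same_id)     # ids match with the "" fallback
print(old_id == changed_id)  # ids differ with the None fallback
```

Documents that do set a sparse embedding get a different id under either fallback, which is expected because their content differs.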
Pull Request Test Coverage Report for Build 8345165529 (Coveralls)
* introduce SparseEmbedding
* reno
* add to pydoc config
Related Issues
Proposed Changes:
Introduce a new class to store sparse embeddings.
Contains two fields: indices and values, which must have the same length.
How did you test it?
New unit tests, changes to some existing tests, CI.
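As a rough illustration of the class described under Proposed Changes, the sketch below models a container with indices and values and rejects inputs of different lengths. The class name SparseEmbeddingSketch, the use of a dataclass, the error message, and the shape of the dictionary returned by to_dict are assumptions for illustration; only the two field names, the length constraint, and the existence of a to_dict method come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SparseEmbeddingSketch:
    """Hypothetical stand-in for the class introduced in this PR."""

    indices: List[int]   # positions of the non-zero entries
    values: List[float]  # the value stored at each of those positions

    def __post_init__(self) -> None:
        # The two fields must have the same length.
        if len(self.indices) != len(self.values):
            raise ValueError("indices and values must have the same length")

    def to_dict(self) -> Dict[str, Any]:
        # Assumed serialization shape: one key per field.
        return {"indices": self.indices, "values": self.values}


emb = SparseEmbeddingSketch(indices=[0, 7, 42], values=[0.5, 0.25, 0.125])
print(emb.to_dict())

try:
    SparseEmbeddingSketch(indices=[0, 1], values=[0.5])
except ValueError as exc:
    print(f"rejected: {exc}")
```

Validating in the constructor means a malformed object cannot exist in the first place, so later code that consumes the two lists can rely on them being aligned.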
Checklist
Conventional commit types: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.