feat: introduce SparseEmbedding #7382

Merged
anakin87 merged 3 commits into main from sparse-embedding on Mar 19, 2024

@anakin87 (Member) commented Mar 19, 2024

Related Issues

Proposed Changes:

Introduce a new class, SparseEmbedding, to store sparse embeddings.
It contains two fields, indices and values, which must have the same length.
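
For readers skimming the description, here is a minimal sketch of what such a class could look like. It is not the merged implementation: the use of a dataclass, the type hints, and the error message are assumptions; only the two fields, the equal-length requirement, and the existence of a to_dict() method come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SparseEmbedding:
    """Illustrative container for a sparse vector (assumed shape, not the merged code)."""

    indices: List[int]   # positions of the non-zero entries
    values: List[float]  # value stored at each of those positions

    def __post_init__(self) -> None:
        # The PR description requires both fields to have the same length.
        if len(self.indices) != len(self.values):
            raise ValueError("Length of indices and values must be the same.")

    def to_dict(self) -> Dict[str, Any]:
        # Plain-dict form, e.g. for serialization or for building the Document id.
        return {"indices": self.indices, "values": self.values}


emb = SparseEmbedding(indices=[0, 5, 42], values=[0.1, 0.7, 0.3])
```

Constructing an instance with mismatched lengths, such as SparseEmbedding(indices=[0, 1], values=[0.5]), raises ValueError in this sketch.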

How did you test it?

Added new unit tests, updated some existing tests, and ran CI.

Checklist

@github-actions github-actions bot added topic:tests 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Mar 19, 2024
@@ -114,7 +119,8 @@ def _create_id(self):
mime_type = self.blob.mime_type if self.blob is not None else None
meta = self.meta or {}
embedding = self.embedding if self.embedding is not None else None
data = f"{text}{dataframe}{blob}{mime_type}{meta}{embedding}"
sparse_embedding = self.sparse_embedding.to_dict() if self.sparse_embedding is not None else ""
@anakin87 (Member, Author) commented:

This line differs slightly from the others: it falls back to an empty string instead of None, so the ids of existing Documents are not altered.
I can change it if you think that's better.
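
To make the point above concrete, here is a small sketch of why the fallback value matters. It assumes the id is a SHA-256 hash of the formatted string and uses a reduced set of fields; old_id, new_id, and the fallback parameter are hypothetical helpers for illustration, not functions from the codebase.

```python
import hashlib


def old_id(text, embedding):
    # Hash input before the change: no sparse embedding component.
    data = f"{text}{embedding}"
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def new_id(text, embedding, sparse_embedding=None, fallback=""):
    # Hash input after the change: the sparse embedding (or a fallback) is appended.
    sparse = sparse_embedding if sparse_embedding is not None else fallback
    data = f"{text}{embedding}{sparse}"
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# With "" as the fallback nothing is appended, so existing ids are preserved.
same = old_id("hello", None) == new_id("hello", None)

# With None as the fallback the literal text "None" is appended, so ids change.
changed = old_id("hello", None) != new_id("hello", None, fallback=None)
```

Both same and changed evaluate to True here, which is the behaviour the comment describes: Documents without a sparse embedding keep the id they had before.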

Contributor commented:

Good catch! This approach looks good to me 👍

@anakin87 anakin87 changed the title from "introduce SparseEmbedding" to "feat: introduce SparseEmbedding" Mar 19, 2024
@coveralls (Collaborator) commented Mar 19, 2024

Pull Request Test Coverage Report for Build 8345165529

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 6 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 89.233%

Files with Coverage Reduction    New Missed Lines    %
dataclasses/document.py          6                   93.4%

Totals (Coverage Status)
Change from base Build 8339327256: 0.02%
Covered Lines: 5395
Relevant Lines: 6046

💛 - Coveralls

@anakin87 anakin87 marked this pull request as ready for review March 19, 2024 15:13
@anakin87 anakin87 requested review from a team as code owners March 19, 2024 15:13
@anakin87 anakin87 requested review from dfokina, davidsbatista, masci and silvanocerza and removed request for a team and davidsbatista March 19, 2024 15:13
@anakin87 anakin87 merged commit dbfd351 into main Mar 19, 2024
39 checks passed
@anakin87 anakin87 deleted the sparse-embedding branch March 19, 2024 17:04
@anakin87 anakin87 added this to the 2.0.1 milestone Mar 20, 2024
silvanocerza pushed a commit that referenced this pull request Apr 8, 2024
* introduce SparseEmbedding

* reno

* add to pydoc config