Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvements to FastEmbed integration #558

Merged
merged 2 commits into from
Mar 7, 2024
Merged

Conversation

anakin87
Copy link
Member

@anakin87 anakin87 commented Mar 7, 2024

While looking at #554 and at the cookbook, I found several opportunities to improve this integration.

Fixes #554

@github-actions github-actions bot added type:documentation Improvements or additions to documentation integration:fastembed labels Mar 7, 2024
Comment on lines -161 to +167
source_pkgs = ["src", "tests"]
source = ["haystack_integrations"]
branch = true
parallel = true


[tool.coverage.paths]
fastembed_haystack = ["src/haystack_integrations", "*/fastembed-haystack/src"]
tests = ["tests", "*/fastembed-haystack/tests"]
parallel = false

[tool.coverage.report]
omit = ["*/tests/*", "*/__init__.py"]
show_missing=true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show coverage

@@ -35,7 +35,6 @@ def __init__(
threads: Optional[int] = None,
prefix: str = "",
suffix: str = "",
batch_size: int = 256,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the text embedder only accepts a single string,
so batch_size was misleading

Comment on lines +44 to +51
def embed(self, data: List[str], progress_bar=True, **kwargs) -> List[List[float]]:
# the embed method returns a Iterable[np.ndarray], so we convert it to a list of lists
embeddings = [np_array.tolist() for np_array in self.model.embed(data, **kwargs)]
embeddings = []
embeddings_iterable = self.model.embed(data, **kwargs)
for np_array in tqdm(
embeddings_iterable, disable=not progress_bar, desc="Calculating embeddings", total=len(data)
):
embeddings.append(np_array.tolist())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the original library does not provide this feature,
we create the progress bar using tqdm in the embedding backend.

@anakin87 anakin87 requested a review from davidsbatista March 7, 2024 15:06
@anakin87 anakin87 marked this pull request as ready for review March 7, 2024 15:06
@anakin87 anakin87 requested a review from a team as a code owner March 7, 2024 15:06
@anakin87 anakin87 requested review from silvanocerza and removed request for a team March 7, 2024 15:06
@anakin87 anakin87 merged commit 24c06fb into main Mar 7, 2024
7 checks passed
@anakin87 anakin87 deleted the fastembed-fix-progress-bar branch March 7, 2024 15:13
@davidsbatista
Copy link
Contributor

sorry I had to come to the office to handle packaging, only now I saw this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration:fastembed type:documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fastembed: progress_bar is not used
3 participants