improvements to FastEmbed integration #558
source_pkgs = ["src", "tests"]
source = ["haystack_integrations"]
branch = true
parallel = true

[tool.coverage.paths]
fastembed_haystack = ["src/haystack_integrations", "*/fastembed-haystack/src"]
tests = ["tests", "*/fastembed-haystack/tests"]
parallel = false

[tool.coverage.report]
omit = ["*/tests/*", "*/__init__.py"]
show_missing=true
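Show coverage details: `show_missing=true` makes the coverage report list the line numbers that the tests do not cover.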
@@ -35,7 +35,6 @@ def __init__(
         threads: Optional[int] = None,
         prefix: str = "",
         suffix: str = "",
-        batch_size: int = 256,
The text embedder only accepts a single string, so the `batch_size` parameter was misleading; this change removes it.
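A minimal sketch of why `batch_size` has no role in a single-string embedder, assuming a backend whose `embed` takes a list of strings (like the `embed(self, data: List[str], ...)` method in this PR). `FakeBackend` and `SingleTextEmbedder` are hypothetical stand-ins, not the integration's real classes, and the "embedding" is a toy: one query string always becomes a batch of one.

```python
from typing import List


class FakeBackend:
    """Hypothetical stand-in for the embedding backend: one vector per input text."""

    def embed(self, data: List[str], progress_bar: bool = True, **kwargs) -> List[List[float]]:
        # Toy "embedding": character count and whitespace-separated word count.
        return [[float(len(text)), float(len(text.split()))] for text in data]


class SingleTextEmbedder:
    """Hypothetical text embedder that, like the one in this PR, accepts one string."""

    def __init__(self, backend: FakeBackend, prefix: str = "", suffix: str = ""):
        self.backend = backend
        self.prefix = prefix
        self.suffix = suffix

    def run(self, text: str) -> dict:
        if not isinstance(text, str):
            raise TypeError("This embedder expects a single string; use a document embedder for lists.")
        # A single input is always a batch of one, so a batch_size parameter would have no effect.
        embedding = self.backend.embed([self.prefix + text + self.suffix], progress_bar=False)[0]
        return {"embedding": embedding}


embedder = SingleTextEmbedder(FakeBackend(), prefix="query: ")
print(embedder.run("hello world"))
```

Passing a list instead of a string raises `TypeError`, which is the behavior a batch-oriented parameter would otherwise have obscured.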
     def embed(self, data: List[str], progress_bar=True, **kwargs) -> List[List[float]]:
         # the embed method returns an Iterable[np.ndarray], so we convert it to a list of lists
-        embeddings = [np_array.tolist() for np_array in self.model.embed(data, **kwargs)]
+        embeddings = []
+        embeddings_iterable = self.model.embed(data, **kwargs)
+        for np_array in tqdm(
+            embeddings_iterable, disable=not progress_bar, desc="Calculating embeddings", total=len(data)
+        ):
+            embeddings.append(np_array.tolist())
Since the original library does not provide a progress bar, we create one with tqdm in the embedding backend.
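The progress-bar pattern from the diff can be sketched in isolation. Because the model's `embed` returns a lazy iterable with no length, tqdm cannot infer how many items to expect, which is why `total=len(data)` is passed explicitly. `FakeModel` is a hypothetical stand-in for the FastEmbed model, and `array.array` stands in for `np.ndarray` since both provide `tolist()`; the no-op fallback when tqdm is not installed is only there to keep the sketch self-contained and is not part of the PR.

```python
from array import array
from typing import Iterator, List

try:
    from tqdm import tqdm
except ImportError:  # fallback so the sketch runs without tqdm installed (not in the PR)
    def tqdm(iterable, **kwargs):
        return iterable


class FakeModel:
    """Hypothetical stand-in for a FastEmbed model whose embed() yields vectors lazily."""

    def embed(self, documents: List[str], **kwargs) -> Iterator[array]:
        for doc in documents:
            # A generator has no __len__, so tqdm needs an explicit total.
            yield array("d", [float(len(doc)), float(doc.count(" "))])


def embed(model: FakeModel, data: List[str], progress_bar: bool = True, **kwargs) -> List[List[float]]:
    embeddings = []
    embeddings_iterable = model.embed(data, **kwargs)
    for vector in tqdm(
        embeddings_iterable, disable=not progress_bar, desc="Calculating embeddings", total=len(data)
    ):
        # Convert each array-like vector to a plain list of floats.
        embeddings.append(vector.tolist())
    return embeddings


result = embed(FakeModel(), ["a b", "hello"], progress_bar=False)
print(result)
```

With `progress_bar=False`, tqdm's `disable` flag suppresses the bar while iteration proceeds unchanged, so callers can turn the bar off without a separate code path.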
Sorry, I had to come to the office to handle packaging and only saw this now.
While looking at #554 and at the cookbook, I found several opportunities to improve this integration.
Fixes #554