Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add INSTRUCTOR Embedder (v2) #32

Merged
merged 15 commits into from
Oct 3, 2023
Merged

feat: Add INSTRUCTOR Embedder (v2) #32

merged 15 commits into from
Oct 3, 2023

Conversation

awinml
Copy link
Contributor

@awinml awinml commented Sep 28, 2023

Code Migrated from deepset-ai/haystack#5836.

Related Issues

Fixes #5242

Proposed Changes:

Add support for the INSTRUCTOR family of Embedding Models. They tailor the embeddings for different tasks and domains using a prompt (instruction) for each embedding.

  • The implementation is based on the Embedders Proposal and Embedder Design (#5312).

  • Adds InstructorEmbeddingBackend, responsible for performing the actual embedding computation, implemented as a singleton class in order to reuse instances.

  • Adds InstructorTextEmbedder, a component that embeds a list of strings into a list of vectors.

  • Adds InstructorDocumentEmbedder, a component that embeds a list of Documents. The embedding of each Document is stored in the embedding field of the Document.

Since the INSTRUCTOR Embedding Models are a modification of the Sentence Transformers models, the implementation is similar to the Sentence Transformers implementation (#5567).

This code was written collaboratively with @vrunm.

How did you test it?

Unit Tests have been added for each component.

@awinml awinml requested a review from a team as a code owner September 28, 2023 12:08
@awinml awinml requested review from ZanSara and removed request for a team September 28, 2023 12:08
@anakin87 anakin87 self-requested a review September 28, 2023 12:36
@anakin87 anakin87 self-assigned this Sep 28, 2023
@ZanSara
Copy link
Contributor

ZanSara commented Sep 29, 2023

@anakin87 note that CI is not running, so let's run the tests locally at least before approving. We'll take care of the CI separately.


Maybe that's because this file is missing the .yaml extension 👀 https://github.com/deepset-ai/haystack-extras/blob/main/.github/workflows/components_instructor_embedders

Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@dfokina can you have a quick look at the docstrings?)

I felt free to make some minor changes, but
kudos again for a great job --> @awinml and @vrunm

It would also be nice if you open a PR in https://github.com/deepset-ai/haystack-integrations,
to showcase these new INSTRUCTOR Embedders, so that they can also appear on this web page.
Feel free to ask for help (if needed). 💙

@anakin87 anakin87 requested a review from dfokina October 2, 2023 14:30
@awinml
Copy link
Contributor Author

awinml commented Oct 2, 2023

@anakin87 Thanks!

The changes look good. I will open a PR in https://github.com/deepset-ai/haystack-integrations with examples to showcase these new INSTRUCTOR Embedders.

@anakin87 anakin87 merged commit 7621f3b into deepset-ai:main Oct 3, 2023
ElenaKusevska added a commit to Anant/haystack-core-integrations that referenced this pull request Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for new embedding models
4 participants