InstructorDocumentEmbedder is breaking #691

Closed
touhi99 opened this issue Apr 25, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@touhi99

touhi99 commented Apr 25, 2024

Hi, I am trying to load InstructorDocumentEmbedder:

doc_embedder = InstructorDocumentEmbedder(
    model="intfloat/multilingual-e5-large-instruct",
    instruction=doc_embedding_instruction,
    batch_size=32,
)

On sentence-transformers 2.7.0, however, I get this error:

TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'

I tried reverting to sentence-transformers 2.2.2, but then I get another error: .cache/torch/sentence_transformers/intfloat_multilingual-e5-large-instruct/sentence_xlnet_config.json'

Please let me know if there is a workaround to get this running, or an alternative way to run instruct-based embeddings. Much appreciated.

Describe your environment (please complete the following information):

  • OS: ubuntu 22.04
  • Haystack version: haystack-ai==2.0.1
  • Integration version: instructor-embedders-haystack==0.4.0
@touhi99 touhi99 added the bug Something isn't working label Apr 25, 2024
@anakin87
Member

Hey!

You should use https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder. It's well documented.

This integration focuses instead on INSTRUCTOR embedding models: https://haystack.deepset.ai/integrations/instructor-embedder

@touhi99
Author

touhi99 commented Apr 25, 2024

OK, should I use e5-instruct with this approach?

doc_embedder = SentenceTransformersDocumentEmbedder(
    model="intfloat/multilingual-e5-large-instruct",
    prefix=TASK_SPECIFIC_INSTRUCTION,
)

I saw the instruct embedding integration, so I got confused about whether I should use that one instead.

@anakin87
Member

I'm closing the issue.
Feel free to open another issue for bugs or a discussion at https://github.com/deepset-ai/haystack/discussions.
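
For anyone landing here with the same question, here is a minimal sketch of the Sentence Transformers route suggested above. It rests on assumptions that are not confirmed in this thread: the task description is a made-up example, and the "Instruct: ...\nQuery: " layout, along with the idea that only queries (not documents) get an instruction, is how I recall the E5-instruct model card. Please check the model card before relying on it. `build_query_prefix` and `build_embedders` are hypothetical helper names.

```python
# Sketch only, under the assumptions stated above.
MODEL = "intfloat/multilingual-e5-large-instruct"

# Hypothetical task description; replace it with one that fits your use case.
TASK = "Given a web search query, retrieve relevant passages that answer the query"


def build_query_prefix(task_description: str) -> str:
    """Return the instruction prefix placed in front of each query text.

    Assumed layout (from the model card as I recall it):
    "Instruct: <task>\nQuery: <query text>"
    """
    return f"Instruct: {task_description}\nQuery: "


def build_embedders():
    """Create a document embedder and a query embedder (needs haystack-ai)."""
    # Imported lazily so build_query_prefix() works without Haystack installed.
    from haystack.components.embedders import (
        SentenceTransformersDocumentEmbedder,
        SentenceTransformersTextEmbedder,
    )

    # Assumption: documents are embedded as-is, without an instruction prefix.
    doc_embedder = SentenceTransformersDocumentEmbedder(model=MODEL, batch_size=32)

    # Queries carry the instruction through the embedder's `prefix` argument.
    text_embedder = SentenceTransformersTextEmbedder(
        model=MODEL,
        prefix=build_query_prefix(TASK),
    )
    return doc_embedder, text_embedder
```

Both embedders need `warm_up()` called before `run()` when used outside a pipeline. If your documents do need an instruction for your task, the document embedder accepts a `prefix` argument in the same way.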
