Replies: 2 comments
- Ingestion is a bit trickier with neural search; it requires an ingest pipeline on the OpenSearch side.
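  For context, here is a minimal sketch of such a pipeline via `opensearch-py`; the pipeline name, model ID, and field names are placeholder assumptions, not values from this thread:

  ```python
  # Minimal sketch of a neural-search ingest pipeline, using opensearch-py.
  # Pipeline name, model ID, and field names below are placeholders.
  from opensearchpy import OpenSearch

  client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

  # The text_embedding processor invokes a deployed ML Commons model at
  # ingest time and writes the resulting vector into the mapped field,
  # so documents can be indexed as plain text.
  client.ingest.put_pipeline(
      id="neural-ingest-pipeline",  # placeholder pipeline name
      body={
          "description": "Generate embeddings from text at ingest time",
          "processors": [
              {
                  "text_embedding": {
                      "model_id": "<deployed-model-id>",  # placeholder
                      "field_map": {"text": "text_embedding"},
                  }
              }
          ],
      },
  )
  ```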
- I might look into it, though that's not a promise on my part, unfortunately. However, I'm currently working on (and have already submitted) an embeddings integration that uses an OpenSearch pretrained model; it's still waiting for review: #27025
Feature request
OpenSearch has plug-in support for local and externally hosted models that allows OpenSearch to generate its own embeddings from text during both ingest and search. It has a Predict API (for generating embeddings directly) as well as a Neural Search feature that can do similarity searches (and other searches) without having to pass in a pre-calculated vector.
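For illustration, a sketch of both capabilities via `opensearch-py`; the index name, vector field, and model ID are placeholders, and the exact Predict request body can vary by model type and OpenSearch version:

```python
# Sketch of the two server-side capabilities described above, using
# opensearch-py. Index name, vector field, and model ID are placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Predict API: ask a deployed ML Commons model for an embedding directly.
# (Request body shape varies by model type and OpenSearch version.)
prediction = client.transport.perform_request(
    "POST",
    "/_plugins/_ml/models/<deployed-model-id>/_predict",
    body={"text_docs": ["How do I reset my password?"]},
)

# Neural Search: similarity search from raw query text; OpenSearch embeds
# the query with the referenced model, so no vector is passed in.
results = client.search(
    index="my-index",  # placeholder
    body={
        "query": {
            "neural": {
                "text_embedding": {  # the kNN vector field in the index
                    "query_text": "How do I reset my password?",
                    "model_id": "<deployed-model-id>",
                    "k": 5,
                }
            }
        }
    },
)
```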
Motivation
We populate our OpenSearch indices outside of LangChain, and use a model-aware ingest pipeline inside OpenSearch to generate embeddings. When searching without LangChain, OpenSearch uses that same model internally to do similarity searching. It would be great to use LangChain's abstractions to access OpenSearch without having to define an embedding function on the LangChain side (since OpenSearch already has one it can use automatically).
Proposal (If applicable)
Since the existing `OpenSearchVectorSearch` class requires an embedding function and assumes that it needs to be in charge of creating and passing around the vectors used for both indexing and searching, maybe an entirely separate class (`OpenSearchNeuralSearch`, maybe?) would be best for this use case. It would be a pretty thin abstraction layer on top of `opensearch-py`'s regular client, passing in the appropriate ML-aware OpenSearch parameters (`model_id` for searching; optional `pipeline` for ingest) as needed. It would work with standalone OpenSearch clusters as well as AWS OpenSearch Service domains, but would not be compatible with AWS OpenSearch Serverless, because that service doesn't (yet) support the ML Commons plugin that drives the model-aware ingest and search functionality.