Replies: 2 comments
- Ingestion is a bit trickier with neural search; it requires an ingest pipeline on the OpenSearch side.
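  For context, here is a minimal sketch of such a pipeline via `opensearch-py`; the pipeline name, model ID, and field names are placeholder assumptions, not values from this thread:

  ```python
  # Minimal sketch of a neural-search ingest pipeline, using opensearch-py.
  # Pipeline name, model ID, and field names below are placeholders.
  from opensearchpy import OpenSearch

  client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

  # The text_embedding processor invokes a deployed ML Commons model at
  # ingest time and writes the resulting vector into the mapped field,
  # so documents can be indexed as plain text.
  client.ingest.put_pipeline(
      id="neural-ingest-pipeline",  # placeholder pipeline name
      body={
          "description": "Generate embeddings from text at ingest time",
          "processors": [
              {
                  "text_embedding": {
                      "model_id": "<deployed-model-id>",  # placeholder
                      "field_map": {"text": "text_embedding"},
                  }
              }
          ],
      },
  )
  ```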
- I might look into it, though that's not a promise on my part, unfortunately. However, I'm currently working on (and have already submitted) an embeddings integration that uses an OpenSearch pretrained model; it's still waiting for review: #27025
Feature request
OpenSearch has plug-in support for local and externally hosted models that allows OpenSearch to generate its own embeddings from text during both ingest and search. It has a Predict API (for generating embeddings directly) as well as a Neural Search feature that can do similarity searches (and other searches) without having to pass in a pre-calculated vector.
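For illustration, a sketch of both capabilities via `opensearch-py`; the index name, vector field, and model ID are placeholders, and the exact Predict request body can vary by model type and OpenSearch version:

```python
# Sketch of the two server-side capabilities described above, using
# opensearch-py. Index name, vector field, and model ID are placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Predict API: ask a deployed ML Commons model for an embedding directly.
# (Request body shape varies by model type and OpenSearch version.)
prediction = client.transport.perform_request(
    "POST",
    "/_plugins/_ml/models/<deployed-model-id>/_predict",
    body={"text_docs": ["How do I reset my password?"]},
)

# Neural Search: similarity search from raw query text; OpenSearch embeds
# the query with the referenced model, so no vector is passed in.
results = client.search(
    index="my-index",  # placeholder
    body={
        "query": {
            "neural": {
                "text_embedding": {  # the kNN vector field in the index
                    "query_text": "How do I reset my password?",
                    "model_id": "<deployed-model-id>",
                    "k": 5,
                }
            }
        }
    },
)
```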
Motivation
We populate our OpenSearch indices outside of LangChain, and use a model-aware ingest pipeline inside OpenSearch to generate embeddings. When searching without LangChain, OpenSearch uses that same model internally to do similarity searching. It would be great to use LangChain's abstractions to access OpenSearch without having to define an embedding function on the LangChain side (since OpenSearch already has one it can use automatically).
Proposal (If applicable)
Since the existing `OpenSearchVectorSearch` class requires an embedding function and assumes that it needs to be in charge of creating and passing around the vectors used for both indexing and searching, maybe an entirely separate class (`OpenSearchNeuralSearch`, maybe?) would be best for this use case. It would be a pretty thin abstraction layer on top of `opensearch-py`'s regular client, passing in the appropriate ML-aware OpenSearch parameters (`model_id` for searching; optional `pipeline` for ingest) as needed. It would work with standalone OpenSearch clusters as well as AWS OpenSearch Service domains, but would not be compatible with AWS OpenSearch Serverless, because that service doesn't (yet) support the ML Commons plugin that drives the model-aware ingest and search functionality.