Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Fixes default asimilarity_search_with_relevance_scores call #17330

Closed
wants to merge 2 commits into from

Conversation

dokato
Copy link
Contributor

@dokato dokato commented Feb 9, 2024

Description:
While using AzureSearch I realized that the asimilarity_search_with_relevance_scores is a direct copy of similarity_search_with_relevance_scores. When child class inherits VectorStore and overrides similarity_search_with_relevance_scores but without async version, then it refers to a private method that is not always defined and results with NotImplementedError.

Given that this is a problem affecting also others: #11539 , #3242 maybe it's okay to have a default implementation of async calls that wrap the non-async versions with run_in_executor. The child classes can amend it if needed.
Sorry I didn't find any unittests that cover that, but happy to add if suggested.

Details described in ⏬
Issue: #17329
**Dependencies:**Nope
Twitter handle: dokatox

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 9, 2024
Copy link

vercel bot commented Feb 9, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Feb 19, 2024 10:29pm

@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Feb 9, 2024
@dokato
Copy link
Contributor Author

dokato commented Feb 20, 2024

as asked in the PR message, pinging @baskaryan for review, thanks!

@@ -346,32 +345,13 @@ async def asimilarity_search_with_relevance_scores(
Returns:
List of Tuples of (doc, similarity_score)
"""
score_threshold = kwargs.pop("score_threshold", None)

docs_and_similarities = await self._asimilarity_search_with_relevance_scores(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe _asimilarity_search_with_relevance_scores already has a default implementation (that asynchronously runs _similarity_search_with_relevance_scores), so why is this change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this change is that it shouldn't default to the previous implementation. Here's why:

Copy link
Collaborator

@baskaryan baskaryan Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't _similarity_search_with_relevance_scores, which default implementation of similarity_search_with_relevance_scores relies on, make same assumption about _select_relevance_score_fn?

think the correct fix would be for classes that overwrite similarity_search_with_relevance_scores and don't implement _select_relevance_score_fn to also overwrite asimilarity_search_with_relevance_scores (could be as simple as using the implementation you've provided here). but don't think that belongs on the base class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't _similarity_search_with_relevance_scores, which default implementation of similarity_search_with_relevance_scores relies on, make same assumption about _select_relevance_score_fn?

It does, but if child class overrides similarity_search_with_relevance_scores such that it doesn't use _similarity_search_with_relevance_scores then it doesn't matter.

think the correct fix would be for classes that overwrite similarity_search_with_relevance_scores and don't implement _select_relevance_score_fn to also overwrite asimilarity_search_with_relevance_scores (could be as simple as using the implementation you've provided here).

So what you are saying is we need to fix all children classes (note that this affected not only azuresearch but elastic as well, maybe others that are not tested too), rather than fixing one parent class. To me this sounds against DRY principle, but if you stay strongly about it happy to revamp this change to fix my Azure problem and let the others worry about their async communications.

I still think my proposal is more robust fix that prevents similar problems to occur in the future and gives flexibility to implement custom async functions as needed. Plus, if you look what asimilarity_search_with_score, asimilarity_search, and asimilarity_search_by_vector are doing is exactly that. I don't see why asimilarity_search_with_relevance_scores should be an exception?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think async relying on sync should only happen at the lowest level functions if at all

class FooBar(ABC):
  def foo(self):
    raise NotImplementedError()
 
  async def afoo(self):
    # child classes should overwrite if they have native async support
    return await run_in_executor(None, self.foo, ...)
    
  def foo_plus_2(self):
    return self.foo() + 2
    
  # what i think we should do
  async def afoo_plus_2(self):
    return (await self.afoo()) + 2

  # what i think we should *not* do
  async def afoo_plus_2(self):
    return await run_in_executor(None, self.foo_plus_2, ...)

what do you think? cc @eyurtsev @nfcampos as well

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's unfortunate here is some integrations overwrite equivalent of foo_plus_2 without ever implementing foo, which is what's causing us problems now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason i prefer afoo_plus_2 relying on afoo instead of foo_plus_2 is that if someone implements afoo to use native async functionality then afoo_plus_2 will automatically get that as well, without having to also be overwritten

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yer, normally I'd agree with you but seems like we have empirical evidence already that afoo_plus_2 is not always self.afoo() + 2. Like Azuresearch doesn't need _select_relevance_score_fn to return relevance scores (which is a +2 part in your example).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so we either have to:

  1. update all child classes that don't implement _select_relevance_score_fn to use run_in_executor approach / otherwise implement asimilarity_search_with...
  2. update base class to use run_in_executor (this pr) and update all child classes that do implement _select_relevance_score_fn to use the _asimilarity_search_with... that's currently in base class (otherwise the base class change would make it so those classes that do implement _select_relevance... are losing the actual use of async)

either way we'll have to make updates to child classes, and my sense is option 1. would touch fewer classes than option 2. but we can check that explicitly if we want by counting how many classes don't override _select_relevance_score_fn

@elmargasimov
Copy link

Can someone give an update on the status of this issue? I get the NotImplementedError with SupabaseVectorStore when trying to use "similarity_score_threshold". I also can't use "mmr", but that is probably related to a separate issue.

@ccurme ccurme added the Ɑ: core Related to langchain-core label Jun 19, 2024
@hwchase17
Copy link
Contributor

we are revisiting vectorstore abstractions next month. closing this for now, we will address during that push

@hwchase17 hwchase17 closed this Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: core Related to langchain-core size:M This PR changes 30-99 lines, ignoring generated files. Ɑ: vector store Related to vector store module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants