-
Notifications
You must be signed in to change notification settings - Fork 15.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: Fixes default asimilarity_search_with_relevance_scores call #17330
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
as asked in the PR message, pinging @baskaryan for review, thanks! |
@@ -346,32 +345,13 @@ async def asimilarity_search_with_relevance_scores( | |||
Returns: | |||
List of Tuples of (doc, similarity_score) | |||
""" | |||
score_threshold = kwargs.pop("score_threshold", None) | |||
|
|||
docs_and_similarities = await self._asimilarity_search_with_relevance_scores( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe _asimilarity_search_with_relevance_scores already has a default implementation (that asynchronously runs _similarity_search_with_relevance_scores), so why is this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this change is that it shouldn't default to the previous implementation. Here's why:
_asimilarity_search_with_relevance_scores
makes some hidden assumptions about the existence of_select_relevance_score_fn
method. Some classes that inherit from the vector score extendsimilarity_search_with_relevance_scores
and without updatingasimilarity_search_with_relevance_scores
it gives errors, see asimilarity_search_with_relevance_scores returns NotImplementedError with AzureSearch #17329 for azure, and The ElasticsearchStore implementation is not correct. #11539 for elastic search.- rather than fixing children classes by c&p we can fix the root of the problem here and avoid similar issues in case someone decides to inherit in the future.
asimilarity_search_with_relevance_scores
is basically copy and paste fromsimilarity_search_with_relevance_scores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't _similarity_search_with_relevance_scores
, which default implementation of similarity_search_with_relevance_scores
relies on, make same assumption about _select_relevance_score_fn
?
think the correct fix would be for classes that overwrite similarity_search_with_relevance_scores
and don't implement _select_relevance_score_fn
to also overwrite asimilarity_search_with_relevance_scores
(could be as simple as using the implementation you've provided here). but don't think that belongs on the base class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't _similarity_search_with_relevance_scores, which default implementation of similarity_search_with_relevance_scores relies on, make same assumption about _select_relevance_score_fn?
It does, but if child class overrides similarity_search_with_relevance_scores
such that it doesn't use _similarity_search_with_relevance_scores
then it doesn't matter.
think the correct fix would be for classes that overwrite similarity_search_with_relevance_scores and don't implement _select_relevance_score_fn to also overwrite asimilarity_search_with_relevance_scores (could be as simple as using the implementation you've provided here).
So what you are saying is we need to fix all children classes (note that this affected not only azuresearch but elastic as well, maybe others that are not tested too), rather than fixing one parent class. To me this sounds against DRY principle, but if you stay strongly about it happy to revamp this change to fix my Azure problem and let the others worry about their async communications.
I still think my proposal is more robust fix that prevents similar problems to occur in the future and gives flexibility to implement custom async functions as needed. Plus, if you look what asimilarity_search_with_score
, asimilarity_search
, and asimilarity_search_by_vector
are doing is exactly that. I don't see why asimilarity_search_with_relevance_scores
should be an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think async relying on sync should only happen at the lowest level functions if at all
class FooBar(ABC):
def foo(self):
raise NotImplementedError()
async def afoo(self):
# child classes should overwrite if they have native async support
return await run_in_executor(None, self.foo, ...)
def foo_plus_2(self):
return self.foo() + 2
# what i think we should do
async def afoo_plus_2(self):
return (await self.afoo()) + 2
# what i think we should *not* do
async def afoo_plus_2(self):
return await run_in_executor(None, self.foo_plus_2, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's unfortunate here is some integrations overwrite equivalent of foo_plus_2
without ever implementing foo
, which is what's causing us problems now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason i prefer afoo_plus_2
relying on afoo
instead of foo_plus_2
is that if someone implements afoo
to use native async functionality then afoo_plus_2
will automatically get that as well, without having to also be overwritten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yer, normally I'd agree with you but seems like we have empirical evidence already that afoo_plus_2
is not always self.afoo() + 2
. Like Azuresearch doesn't need _select_relevance_score_fn
to return relevance scores (which is a +2
part in your example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so we either have to:
- update all child classes that don't implement _select_relevance_score_fn to use run_in_executor approach / otherwise implement asimilarity_search_with...
- update base class to use run_in_executor (this pr) and update all child classes that do implement _select_relevance_score_fn to use the _asimilarity_search_with... that's currently in base class (otherwise the base class change would make it so those classes that do implement _select_relevance... are losing the actual use of async)
either way we'll have to make updates to child classes, and my sense is option 1. would touch fewer classes than option 2. but we can check that explicitly if we want by counting how many classes don't override _select_relevance_score_fn
Can someone give an update on the status of this issue? I get the |
we are revisiting vectorstore abstractions next month. closing this for now, we will address during that push |
Description:
While using AzureSearch I realized that the
asimilarity_search_with_relevance_scores
is a direct copy ofsimilarity_search_with_relevance_scores
. When child class inheritsVectorStore
and overridessimilarity_search_with_relevance_scores
but without async version, then it refers to a private method that is not always defined and results with NotImplementedError.Given that this is a problem affecting also others: #11539 , #3242 maybe it's okay to have a default implementation of async calls that wrap the non-async versions with
run_in_executor
. The child classes can amend it if needed.Sorry I didn't find any unittests that cover that, but happy to add if suggested.
Details described in ⏬
Issue: #17329
**Dependencies:**Nope
Twitter handle: dokatox