Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(integrations): Add integration for qdrant #3623

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

mxrcooo
Copy link

@mxrcooo mxrcooo commented Oct 8, 2024

Adds an integration for Qdrant, supporting both REST and gRPC mode.

Qdrant is a vector-search database. Mentioned here: #3007 (comment)

Opening this as draft PR as communicated in the community Discord since there are still a few open questions from my side. Also, there's a lot of work (tests) to be done to finalize this.

  1. The Qdrant service offers two APIs: REST and gRPC. They are almost the same in terms of data sent and regarding Qdrant SDK usage they only differ on the prefer_grpc parameter when creating the (Async)QdrantClient. While the data sent to the server is almost the same, they are still structurally different. Question: I currently handle this by simply using different op arguments (db.qdrant.rest and db.qdrant.grpc respectively) when creating the span. Is that fine or should they be merged? If yes, what about the description? They differ and imo it is not clean to merge them.

  2. The HttpxIntegration captures the REST request caused by a Qdrant SDK call, leading to an almost duplicate span with less information than the one from our integration. QdrantIntegration offers the ability to mute this span which is done by accessing the _span_recorder of the current transaction and removing subsequent span from our current span. This feels very hacky.. is there a better way? Should this be done at all? Same goes for the GRPCIntegration when using Qdrant in gRPC mode.

  3. Any ideas on writing tests for this? Qdrant supports a :memory: option but the monkey patches do not apply in this case, since it doesn't simulate a server but instead does all operations locally (see https://github.com/qdrant/qdrant-client/tree/master/qdrant_client/local). Mocking the responses would work but would be extremely cumbersome as - I think - we'd have to write different mocks for every single endpoint to not break the Qdrant SDK when handling the response. This would also have to be done twice, once for REST and once for gRPC. I guess we could parse their docs and auto-generate mock responses?

Copy link
Member

@szokeasaurusrex szokeasaurusrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions, where I think we could simplify the logic. I also replied to your questions on Discord


# created from https://github.com/qdrant/qdrant/blob/master/docs/redoc/v1.11.x/openapi.json
# only used for qdrants REST API. gRPC is using other identifiers
_PATH_TO_OPERATION_ID = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it is such a good idea to hardcode this dictionary based on something from QDrant which could change in future QDrant versions. It would be better to somehow obtain this information from QDrant at runtime, to maintain compatibility with future versions.

from typing import Any, Dict, Optional, List


class TrieNode:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to define a custom data structure here? Is there no way to do this with one of the APIs exposed by QDrant or with one of the built-in data structures?

Comment on lines +43 to +56
else:
# Fake ParamSpec
class ParamSpec:
def __init__(self, _):
self.args = None
self.kwargs = None

# Callable[anything] will return None
class _Callable:
def __getitem__(self, _):
return None

# Make instances
Callable = _Callable()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can (and should) remove this, because you are using type comments.

Suggested change
else:
# Fake ParamSpec
class ParamSpec:
def __init__(self, _):
self.args = None
self.kwargs = None
# Callable[anything] will return None
class _Callable:
def __getitem__(self, _):
return None
# Make instances
Callable = _Callable()

Comment on lines +23 to +26
# Hack to get new Python features working in older versions
# without introducing a hard dependency on `typing_extensions`
# from: https://stackoverflow.com/a/71944042/300572
# taken from sentry_sdk.integrations.grpc.__init__.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed, see comment below

Suggested change
# Hack to get new Python features working in older versions
# without introducing a hard dependency on `typing_extensions`
# from: https://stackoverflow.com/a/71944042/300572
# taken from sentry_sdk.integrations.grpc.__init__.py

Comment on lines +37 to +42
from grpc import Channel, ClientCallDetails
from grpc.aio import UnaryUnaryCall
from grpc.aio import Channel as AsyncChannel
from google.protobuf.message import Message
from sentry_sdk.tracing import Span, Transaction, _SpanRecorder
from sentry_sdk.integrations.qdrant import QdrantIntegration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these only being used in type comments? If not, they should be imported outside the if TYPE_CHECKING block

return wrapper


# taken from grpc integration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use the grpc integration (or use this code from the grpc integration) directly?

return response


def _protobuf_to_dict(message, prefix=""):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does GRPC not provide a built-in way to do this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants