feat(integrations): Add integration for qdrant #3623

mxrcooo · 2024-10-08T00:23:08Z

Adds an integration for Qdrant, supporting both REST and gRPC mode.

Qdrant is a vector-search database. Mentioned here: #3007 (comment)

Opening this as a draft PR, as discussed in the community Discord, since I still have a few open questions. There is also a lot of work (tests) left to do before this can be finalized.

The Qdrant service offers two APIs: REST and gRPC. In terms of Qdrant SDK usage, they differ only in the prefer_grpc parameter passed when creating the (Async)QdrantClient. The data sent to the server is almost the same in both modes, but it is structured differently. Question: I currently handle this by using different op arguments (db.qdrant.rest and db.qdrant.grpc, respectively) when creating the span. Is that fine, or should they be merged? If they should be merged, what about the description? The descriptions differ, and imo merging them would not be clean.
The HttpxIntegration captures the REST request caused by a Qdrant SDK call, leading to an almost duplicate span with less information than the one from our integration. QdrantIntegration offers the option to mute this span, which is done by accessing the _span_recorder of the current transaction and removing the subsequent span recorded under our current span. This feels very hacky. Is there a better way? Should this be done at all? The same goes for the GRPCIntegration when using Qdrant in gRPC mode.
Any ideas on writing tests for this? Qdrant supports a :memory: option, but the monkey patches do not apply in that case, since it does not simulate a server and instead performs all operations locally (see https://github.com/qdrant/qdrant-client/tree/master/qdrant_client/local). Mocking the responses would work but would be extremely cumbersome since, I think, we'd have to write a separate mock for every single endpoint so that the Qdrant SDK does not break when handling the response. This would also have to be done twice, once for REST and once for gRPC. I guess we could parse their docs and auto-generate mock responses?
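
To make the span-muting question above easier to discuss, here is a minimal sketch of the idea using plain Python objects instead of the SDK's internals. `FakeSpan` and `mute_child_spans` are hypothetical names, the list only stands in for what a span recorder might hold, and `http.client` is assumed to be the op of the duplicate HTTP span. This is not how the PR implements it, only an illustration of "drop the child span that duplicates ours".

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a recorded span (not the sentry_sdk class)."""
    span_id: str
    parent_span_id: Optional[str]
    op: str


def mute_child_spans(
    spans: List[FakeSpan], parent_id: str, muted_ops: Set[str]
) -> List[FakeSpan]:
    """Return the spans without direct children of `parent_id` whose op is muted."""
    return [
        span
        for span in spans
        if not (span.parent_span_id == parent_id and span.op in muted_ops)
    ]


recorded = [
    FakeSpan("a", None, "db.qdrant.rest"),  # span created by the Qdrant integration
    FakeSpan("b", "a", "http.client"),      # near-duplicate child from the HTTP layer
    FakeSpan("c", None, "http.client"),     # unrelated HTTP span, must be kept
]
kept = mute_child_spans(recorded, parent_id="a", muted_ops={"http.client"})
```

Filtering by parent and op, rather than by position in the list, keeps unrelated HTTP spans intact, which may be less fragile than removing "the next span" after ours.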

szokeasaurusrex

Left a few suggestions where I think we could simplify the logic. I also replied to your questions on Discord.

szokeasaurusrex · 2024-10-14T11:53:55Z

sentry_sdk/integrations/qdrant/consts.py

+
+# created from https://github.com/qdrant/qdrant/blob/master/docs/redoc/v1.11.x/openapi.json
+# only used for qdrants REST API. gRPC is using other identifiers
+_PATH_TO_OPERATION_ID = {


I am not sure it is a good idea to hardcode this dictionary based on a Qdrant file that could change in future Qdrant versions. It would be better to obtain this information from Qdrant at runtime, so that we stay compatible with future versions.
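
As a rough illustration of the suggestion above, the mapping could be derived from an OpenAPI document instead of being written out by hand. The sketch below relies only on the standard OpenAPI layout (`paths` → path template → HTTP method → `operationId`). Where the document would come from at runtime is left open, and `build_operation_map` as well as the sample spec and its operation IDs are hypothetical.

```python
from typing import Any, Dict, Tuple

# Keys of an OpenAPI path item that denote HTTP operations.
_HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def build_operation_map(spec: Dict[str, Any]) -> Dict[Tuple[str, str], str]:
    """Map (HTTP method, path template) to the operationId found in `spec`."""
    mapping: Dict[Tuple[str, str], str] = {}
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            # Path items may also hold non-operation keys such as "parameters".
            if method.lower() not in _HTTP_METHODS or not isinstance(operation, dict):
                continue
            operation_id = operation.get("operationId")
            if operation_id:
                mapping[(method.upper(), path)] = operation_id
    return mapping


# Tiny illustrative spec; the operation IDs are made up for this example.
sample_spec = {
    "paths": {
        "/collections/{collection_name}": {
            "parameters": [],
            "get": {"operationId": "get_collection"},
            "delete": {"operationId": "delete_collection"},
        }
    }
}
operation_map = build_operation_map(sample_spec)
```

Keying by method and path is a design choice of this sketch, since one path template can carry several operations.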

szokeasaurusrex · 2024-10-14T11:58:40Z

sentry_sdk/integrations/qdrant/path_matching.py

+from typing import Any, Dict, Optional, List
+
+
+class TrieNode:


Why do we need to define a custom data structure here? Is there no way to do this with one of the APIs exposed by Qdrant or with one of the built-in data structures?
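
For illustration, one possible built-in alternative is to compile each path template into a regular expression with the standard `re` module and try the patterns in order. This is only a sketch under assumptions: the function names, the templates, and the operation IDs below are hypothetical, and a linear scan over patterns may be slower than a trie when there are many templates.

```python
import re
from typing import Dict, List, Optional, Tuple

_PARAM = re.compile(r"\{(\w+)\}")


def compile_template(template: str) -> "re.Pattern[str]":
    """Turn '/a/{x}/b' into a regex where each {x} matches exactly one path segment."""
    pieces = []
    pos = 0
    for match in _PARAM.finditer(template):
        pieces.append(re.escape(template[pos:match.start()]))
        pieces.append("(?P<{}>[^/]+)".format(match.group(1)))
        pos = match.end()
    pieces.append(re.escape(template[pos:]))
    return re.compile("".join(pieces))


def build_matchers(templates: Dict[str, str]) -> List[Tuple["re.Pattern[str]", str]]:
    """Compile all templates; those with fewer parameters are tried first."""
    ordered = sorted(templates, key=lambda t: t.count("{"))
    return [(compile_template(t), templates[t]) for t in ordered]


def match_path(
    path: str, matchers: List[Tuple["re.Pattern[str]", str]]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (operation_id, path parameters) for the first full match, else None."""
    for pattern, operation_id in matchers:
        found = pattern.fullmatch(path)
        if found:
            return operation_id, found.groupdict()
    return None


matchers = build_matchers({
    "/collections/{collection_name}": "get_collection",
    "/collections/{collection_name}/points/search": "search_points",
})
```

Sorting templates by parameter count means a fully literal path is tried before a parameterized one that would also match it.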

szokeasaurusrex · 2024-10-14T12:02:59Z