QdrantDocumentStore issue #642

Closed
marygriffus opened this issue Apr 4, 2024 · 2 comments

@marygriffus

Describe the bug
When creating a QdrantDocumentStore against a preexisting Qdrant instance that already holds data, if the params do not match precisely, recreate_index destroys the old index and creates a new one. This blows away the old data and leaves things in a state where incoming data does not match. Even after turning this off and updating the settings to match our Qdrant params, I ran into an issue that seemed to be a mismatch between the QdrantDocumentStore and the QdrantClient.

To Reproduce
Bring up a qdrant instance and start a collection with the config:

{
  "params": {
    "vectors": {
      "fast-all-minilm-l6-v2": {
        "size": 384,
        "distance": "Cosine"
      }
    },
    "shard_number": 1,
    "replication_factor": 1,
    "write_consistency_factor": 1,
    "on_disk_payload": true
  },
  "hnsw_config": {
    "m": 16,
    "ef_construct": 100,
    "full_scan_threshold": 10000,
    "max_indexing_threads": 0,
    "on_disk": false
  },
  "optimizer_config": {
    ...
  },
  "wal_config": {
    ...
  },
  "quantization_config": null
}

Then instantiate a QdrantDocumentStore like below and use it in a pipeline.

document_store = QdrantDocumentStore(
    url=qdrant_host,
    port=qdrant_port,
    index=fusion_payload.datastore,
    embedding_dim=384,
    similarity="cosine",
    recreate_index=False,
    hnsw_config={"m": 16, "ef_construct": 100},
)

At first, I had recreate_index=True and left embedding_dim and hnsw_config blank, which blew my collection away, and any new data then failed to be added. Ideally, I think the document store would default to the settings discovered through the Qdrant client. However, when I switched to recreate_index=False, I continued to have issues.
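To illustrate the kind of safeguard that would have helped here, below is a minimal sketch of comparing the settings already present in a collection with the settings a store is about to apply, and refusing to continue when they differ. It uses plain dictionaries only. The function `find_mismatches`, the flat key layout, and the mismatched `size` of 768 are all assumptions made up for this example; this is not how QdrantDocumentStore is implemented.

```python
from typing import Any


def find_mismatches(existing: dict[str, Any], requested: dict[str, Any]) -> list[str]:
    """Return human-readable differences between two flat settings dicts."""
    problems = []
    for key, wanted in requested.items():
        found = existing.get(key)
        if found != wanted:
            problems.append(f"{key}: collection has {found!r}, store requested {wanted!r}")
    return problems


# Values from the collection config above, flattened (hypothetical layout).
existing = {"size": 384, "distance": "Cosine", "m": 16, "ef_construct": 100}
# A deliberately wrong request (768 is an invented value for the example).
requested = {"size": 768, "distance": "Cosine", "m": 16, "ef_construct": 100}

mismatches = find_mismatches(existing, requested)
if mismatches:
    # Fail loudly instead of silently recreating the collection.
    print("Refusing to recreate collection:", "; ".join(mismatches))
```

A check like this would turn a silent, destructive recreate into an explicit error that names the setting that differs.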

The first error I ran into with this setup was this:

  File "/src/app/.venv/lib/python3.11/site-packages/haystack_integrations/document_stores/qdrant/document_store.py", line 138, in __init__
    self._set_up_collection(index, embedding_dim, recreate_index, similarity, on_disk, payload_fields_to_index)
  File "/src/app/.venv/lib/python3.11/site-packages/haystack_integrations/document_stores/qdrant/document_store.py", line 389, in _set_up_collection
    current_distance = collection_info.config.params.vectors.distance
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'distance'

I overrode those lines in _set_up_collection with these:

current_distance = collection_info.config.params.vectors["fast-all-minilm-l6-v2"].distance
current_vector_size = collection_info.config.params.vectors["fast-all-minilm-l6-v2"].size
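The override above hard-codes the vector name. A slightly more general sketch, assuming the `vectors` field is either a single settings object (unnamed vector) or a dict keyed by vector name (named vectors, as in this collection), could look like the following. `VectorParams` here is a local stand-in dataclass and `resolve_vector_params` is a hypothetical helper; neither is taken from the integration's code.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class VectorParams:
    """Stand-in for the vector settings object (hypothetical, for illustration)."""
    size: int
    distance: str


def resolve_vector_params(
    vectors: Union[VectorParams, dict], name: Optional[str] = None
) -> VectorParams:
    """Return the settings for one vector, whether or not the collection names its vectors."""
    if isinstance(vectors, dict):
        if name is None:
            raise ValueError(f"collection uses named vectors {sorted(vectors)}; a name is required")
        if name not in vectors:
            raise KeyError(f"no vector named {name!r}; available: {sorted(vectors)}")
        return vectors[name]
    # Single unnamed vector: the settings object is returned as-is.
    return vectors


named = {"fast-all-minilm-l6-v2": VectorParams(size=384, distance="Cosine")}
params = resolve_vector_params(named, "fast-all-minilm-l6-v2")
print(params.size, params.distance)
```

With this shape, the `AttributeError` on the dict is replaced by an error that says a vector name is needed.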

and I then instead got this error:

...
  File "/src/app/.venv/lib/python3.11/site-packages/haystack_integrations/document_stores/qdrant/document_store.py", line 311, in query_by_embedding
    points = self.client.search(
             ^^^^^^^^^^^^^^^^^^^
  File "/src/app/.venv/lib/python3.11/site-packages/qdrant_client/qdrant_client.py", line 336, in search
    return self._client.search(
           ^^^^^^^^^^^^^^^^^^^^
  File "/src/app/.venv/lib/python3.11/site-packages/qdrant_client/qdrant_remote.py", line 497, in search
    search_result = self.http.points_api.search_points(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/app/.venv/lib/python3.11/site-packages/qdrant_client/http/api/points_api.py", line 1388, in search_points
    return self._build_for_search_points(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/app/.venv/lib/python3.11/site-packages/qdrant_client/http/api/points_api.py", line 636, in _build_for_search_points
    return self.api_client.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/app/.venv/lib/python3.11/site-packages/qdrant_client/http/api_client.py", line 76, in request
    return self.send(request, type_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/app/.venv/lib/python3.11/site-packages/qdrant_client/http/api_client.py", line 99, in send
    raise UnexpectedResponse.for_response(response)
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request)
Raw response content:
b'{"status":{"error":"Wrong input: Vector params for  are not specified in config"},"time":0.007150167}'

From the context I would expect this to be an issue with the embedding model, but there is no method to add the embedding model to the document store, so I might be misunderstanding.
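One possible reading of the error: the name between "for" and "are" is empty, which may mean the search was sent for the default, unnamed vector while this collection only defines the named vector fast-all-minilm-l6-v2. As far as I can tell, qdrant-client's `search` accepts a `(name, vector)` tuple as `query_vector` to target a named vector. The sketch below only builds that argument; `build_query_vector` is a hypothetical helper, the zero vector is placeholder data, and the actual call is left as a comment because it needs a running Qdrant instance.

```python
from typing import Optional, Sequence, Union


def build_query_vector(
    embedding: Sequence[float], vector_name: Optional[str] = None
) -> Union[list, tuple]:
    """Build a query_vector argument: a plain list for an unnamed vector,
    or a (name, list) tuple for a named one."""
    values = [float(x) for x in embedding]
    if vector_name:
        return (vector_name, values)
    return values


# Placeholder 384-dimensional embedding, matching the collection's vector size.
query_vector = build_query_vector([0.0] * 384, "fast-all-minilm-l6-v2")
print(query_vector[0], len(query_vector[1]))

# Assumed usage against a live instance (not executed here; "my_collection" is a placeholder):
# client.search(collection_name="my_collection", query_vector=query_vector, limit=5)
```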

Describe your environment (please complete the following information):

  • OS: iOS
  • Haystack version: 2.0.0
  • Integration version:
  • qdrant-client: 1.7.3
  • qdrant: 1.7.3
@marygriffus marygriffus added the bug Something isn't working label Apr 4, 2024
@anakin87
Member

anakin87 commented Apr 16, 2024

Hey @marygriffus,
QdrantDocumentStore creates an opinionated Qdrant collection, which is meant to work well with Haystack.

The best way to use it is to create a new Document Store and then continue using it via Haystack.
If you already have a Qdrant collection, you will probably need to migrate it manually.

Resources:

@anakin87
Member

anakin87 commented May 6, 2024

I'm closing this issue.
Feel free to reopen it if something is unclear or does not work.

@anakin87 anakin87 closed this as completed May 6, 2024