-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AstraDB delete_documents performs an unnecessary count #1047
Comments
Removed the expensive check to see if the collection is non-empty by performing a full count. This is to fix issue deepset-ai#1047
Thank you @brendancicchi for opening this issue. I can see that you already worked on a fix. Would you like to open a pull request and contribute the fix to Haystack? 🙂 You can find our contributing guidelines here: https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md#contribute-code |
@julian-risch Yes, I'll open a PR. Due to this test https://github.com/deepset-ai/haystack/blob/main/haystack/testing/document_store.py#L161C5-L163C61, it looks like I can't just change the logic to do the same on an empty collection as when no document_ids are deleted. So, instead, I'll grab any single document as a check for a non-empty collection, which will be much cheaper than the full count. |
Removed the expensive check to see if the collection is non-empty by performing a full count. This is to fix issue deepset-ai#1047
Removed the expensive check to see if the collection is non-empty by performing a full count. This is to fix issue #1047 Co-authored-by: Vladimir Blagojevic <[email protected]>
Removed the expensive check to see if the collection is non-empty by performing a full count. This is to fix issue #1047 Co-authored-by: Vladimir Blagojevic <[email protected]>
https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/astra/src/haystack_integrations/document_stores/astra/document_store.py#L406-L427
In delete_documents, an unnecessary check is being done, which is performing a full count of the collection as a naive check to ensure the collection is not empty. This is an extremely expensive operation and not one that is necessary, as performing a delete on nonexisting documents is not harmful. In reality, the existing check does not actually check if the document exists or not anyway.
The count_documents check should be removed.
The text was updated successfully, but these errors were encountered: