community: ClickHouse: Make it possible to not specify a vector index #18381

Closed
mneedham wants to merge 4 commits from the clickhouse-community branch

mneedham (Contributor) commented Mar 1, 2024

Vector indexes in ClickHouse are experimental at the moment and can sometimes break or change behaviour. This PR makes it possible to omit the index type.

Without an index, any query against the embedding column is a brute-force linear scan, but that gives reasonable performance for small to medium datasets.

In the other PR (#17247), @efriis asked:

> index_type: Optional[str] = "annoy"

> What is this supposed to do? I think this will have unintended consequences in some string substitution stuff lower in the file

This makes index_type optional. I think I've defended against it being None inside the _schema function, which is where it's used. But let me know if you think I've missed somewhere else where it's used.
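
A minimal sketch of the kind of guard described here, assuming a simplified table layout. The function name `build_schema`, its parameters, and the column names are hypothetical and are not the actual `_schema` implementation; the point is only that the `INDEX` clause is emitted when `index_type` is set and left out when it is `None`.

```python
from typing import Optional


def build_schema(
    table: str,
    dim: int,
    index_type: Optional[str] = "annoy",
    index_params: str = "",
) -> str:
    """Return a CREATE TABLE statement, adding a vector index only if requested.

    Hypothetical helper for illustration; the column names are assumptions.
    """
    columns = [
        "document Nullable(String)",
        "embedding Array(Float32)",
        "uuid UUID DEFAULT generateUUIDv4()",
        f"CONSTRAINT cons_vec_len CHECK length(embedding) = {dim}",
    ]
    # Emit the INDEX clause only when an index type was given. With None,
    # queries on the embedding column fall back to a linear scan.
    if index_type:
        columns.append(
            f"INDEX vec_idx embedding TYPE {index_type}({index_params}) "
            "GRANULARITY 1000"
        )
    body = ",\n    ".join(columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table}(\n    {body}\n"
        ") ENGINE = MergeTree ORDER BY uuid"
    )
```

Calling `build_schema("docs", 3, None)` in this sketch yields a statement with no `INDEX` line and no literal `None` substituted into the SQL.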

@dosubot dosubot bot added labels on Mar 1, 2024: size:M (This PR changes 30-99 lines, ignoring generated files), Ɑ: vector store (Related to vector store module), 🤖:improvement (Medium size change to existing code to handle new use-cases)
@baskaryan baskaryan requested a review from efriis March 1, 2024 19:24
@@ -209,6 +199,41 @@ def __init__(
self.client.command(f"SET allow_experimental_{self.config.index_type}_index=1")
Member

@mneedham just confirming - I'm pretty sure this change turns this into self.client.command(f"SET allow_experimental_None_index=1") unless I'm mistaken. This seems undesirable?
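
The concern can be reproduced with a plain f-string, and one possible guard is sketched below. The helper name `experimental_index_command` is hypothetical and not part of the PR; it only illustrates skipping the `SET` command when no index type is configured.

```python
from typing import Optional

# Formatting None into an f-string produces the literal text "None".
unguarded = f"SET allow_experimental_{None}_index=1"


def experimental_index_command(index_type: Optional[str]) -> Optional[str]:
    """Return the SET command for an index type, or None if there is no index.

    Hypothetical helper: the caller would run the command only when the
    returned value is not None.
    """
    if not index_type:
        return None
    return f"SET allow_experimental_{index_type}_index=1"
```

A caller would then check the return value before passing it to `self.client.command(...)`, so no command is sent when `index_type` is `None`.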

@mneedham mneedham force-pushed the clickhouse-community branch from 761533f to 39cb90d Compare March 4, 2024 09:47
mneedham (Contributor, Author) commented Mar 4, 2024

@efriis I think I'm handling the empty index_type now. The linter was also reporting that the docstring I added for the _schema function was too wide. I think I've fixed that, but I'm not sure how to get the linter to run on CI again.

@mneedham mneedham force-pushed the clickhouse-community branch from 39cb90d to 69fc31d Compare March 5, 2024 09:05
mneedham (Contributor, Author) commented

@efriis how do I get the build to run? I think I've fixed the lint issues that were there before.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Mar 25, 2024
efriis (Member) commented Mar 25, 2024

Fixing in #19527

In the future, if you make PRs from a personal branch, we can push directly to your PR branch! https://python.langchain.com/docs/contributing/faq#how-do-i-allow-maintainers-to-edit-my-pr
