Binary maxsim via hamming #191

Open
michaelbridge opened this issue Feb 17, 2025 · 4 comments
Labels
type/question 🙋 Further information is requested

Comments

@michaelbridge

michaelbridge commented Feb 17, 2025

Binary quantization and Hamming distance are critical for scaling multi-vector representations (e.g., ColBERT).

It looks as though Hamming distance for binary vectors has already been implemented.

A Hamming-based MaxSim can be built on top of this with a Postgres function, per the approach here, but is this something that might be supported or optimized within the library itself?
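To make the question concrete, here is a minimal sketch of what a Hamming-based MaxSim computes, written in plain Python rather than as a Postgres function. It assumes each token embedding is a bit-packed integer of `dim` bits and scores a pair by the number of agreeing bits (`dim` minus the Hamming distance). The function names are hypothetical and are not part of the library's API.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bit-packed vectors."""
    return bin(a ^ b).count("1")


def hamming_maxsim(query: list[int], doc: list[int], dim: int) -> int:
    """For each query token, take the best bit agreement with any doc token,
    then sum those per-token maxima (late interaction over binary codes)."""
    if not doc:
        return 0
    return sum(dim - min(hamming(q, d) for d in doc) for q in query)


# Toy 8-bit token codes (illustrative values only).
query = [0b10110010, 0b01101100]
doc_a = [0b10110011, 0b01101100, 0b11110000]
doc_b = [0b01001101, 0b10010011]

score_a = hamming_maxsim(query, doc_a, dim=8)
score_b = hamming_maxsim(query, doc_b, dim=8)
```

Here `doc_a` scores higher than `doc_b` because each query token has a near-identical counterpart in `doc_a`. The cost is one Hamming computation per (query token, document token) pair, which is why doing this per candidate document without index support gets expensive.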

Beyond this, is an unpack_bits operation, which would convert a binary vector into a float representation (to improve accuracy in a subsequent rerank step), something that is being contemplated?
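As an illustration of the kind of operation meant here, the sketch below expands a bit-packed vector into floats and uses it to rerank against an unquantized query. The mapping of bit 1 to +1.0 and bit 0 to -1.0 is an assumption made for this example, as are the function names; nothing here reflects an existing library function.

```python
def unpack_bits(packed: int, dim: int) -> list[float]:
    """Expand a bit-packed vector into +1.0/-1.0 floats, most significant bit first."""
    return [1.0 if (packed >> (dim - 1 - i)) & 1 else -1.0 for i in range(dim)]


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length float vectors."""
    return sum(x * y for x, y in zip(a, b))


def rerank_maxsim(query_floats: list[list[float]], doc_packed: list[int], dim: int) -> float:
    """MaxSim with full-precision query tokens against unpacked binary doc tokens."""
    doc_floats = [unpack_bits(d, dim) for d in doc_packed]
    return sum(max(dot(q, d) for d in doc_floats) for q in query_floats)


# Toy 4-dimensional example (illustrative values only).
query_floats = [[0.9, -0.2, 0.4, -0.7]]
doc_packed = [0b1010, 0b0101]
rerank_score = rerank_maxsim(query_floats, doc_packed, dim=4)
```

With this sign convention, the dot product of two unpacked binary vectors equals `dim - 2 * hamming_distance`, so the binary-to-binary case ranks identically to Hamming. The accuracy gain in a rerank step comes from keeping the query side in full precision.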

@michaelbridge michaelbridge changed the title Binary quantization and hamming distance Binary maxsim via hamming Feb 17, 2025
@michaelbridge
Author

Ah, it looks like RaBitQ is an implementation of binary quantization, so perhaps this question is better restated as: how can ColBERT/ColPali late interaction be optimized within this framework?

@gaocegege gaocegege added the type/question 🙋 Further information is requested label Feb 27, 2025
@gaocegege
Member

@michaelbridge
Author

Thanks, but I linked that blog post above. Without a multi-vector index, that doesn't scale.

@VoVAllen
Member

We're working on an approach in #197, based on https://github.com/jlscheerer/xtr-warp/tree/main/warp/search. Please stay tuned!


3 participants