Summary
I am experiencing an issue with FAISS where batch retrieval of multiple embeddings using IndexIDMap(IndexFlatIP) behaves incorrectly. Specifically, while single-vector retrieval works flawlessly, retrieving multiple vectors simultaneously results in all queries returning the same ID, with similarity scores converging to zero as the batch size increases.
Expected Behavior:
When performing batch retrieval with multiple embeddings, each query should independently return its most similar ID with a high similarity score, just as in single-vector retrieval.
Actual Behavior:
Single Retrieval: Works correctly, returning the expected IDs with high similarity scores.
Batch Retrieval: As the number of queries grows, the results become erratic and certain IDs appear far more often than they should. Once the batch exceeds 50 queries, every query returns the same ID (31), and the similarity scores shrink toward zero as the batch size increases.
Additional Information:
Index Configuration
Using IndexIDMap with IndexFlatIP for inner product similarity.
Normalization
Both goal embeddings and query embeddings are L2-normalized using faiss.normalize_L2.
Data Characteristics:
Embedding dimension: 3072
Number of goal embeddings in index: ~2000
Each query embedding is a 1x3072 vector.
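Since each query embedding is a 1x3072 vector, a batch has to be assembled into a single (n, 3072) float32 matrix before calling index.search. The sketch below uses random stand-ins for the query vectors and shows one way to do this, along with a shape pitfall that is easy to hit when stacking (1, d) arrays.

```python
import numpy as np

d = 3072
rng = np.random.default_rng(2)
# Hypothetical stand-ins for per-query embeddings, each shaped (1, 3072).
query_list = [rng.standard_normal((1, d)) for _ in range(60)]

# Pitfall: np.array on a list of (1, d) arrays adds an extra axis.
wrong = np.array(query_list)
print(wrong.shape)  # (60, 1, 3072)

# Stack along axis 0, then convert to a C-contiguous float32 matrix for Faiss.
batch = np.ascontiguousarray(np.vstack(query_list), dtype=np.float32)
print(batch.shape, batch.dtype, batch.flags["C_CONTIGUOUS"])  # (60, 3072) float32 True
```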
Please try again after installing the faiss-gpu package from conda, following the directions here. The faiss-gpu PyPI package is not supported by this repository.
OS: Ubuntu 20.04.6 LTS (Focal Fossa)
Faiss version: 1.7.2
Installed from: pip (faiss-gpu package)
Faiss compilation options: GPU enabled, running on Python 3.8.10
Running on:
Interface: Python
Reproduction instructions
Setup FAISS Index:
Retrieve IDs for Single and Multiple Embeddings:
Observation:
Single Retrieval:
Batch Retrieval (e.g., 50 embeddings):