Faiss Index Lookup is a tool for querying a user-provided Faiss-based vector store. Combined with our Large Language Model (LLM) tool, it lets users extract contextually relevant information from a domain knowledge base.
- For AzureML users, the tool is installed in the default image, so you can use it without any extra installation.
- For local users, install the package that matches where your index is stored.
If your index is stored in a local path:
pip install promptflow-vectordb
If your index is stored in Azure storage:
pip install promptflow-vectordb[azure]
- Step 1. Prepare an accessible path on Azure Blob Storage. If you need to create a new storage account, see the guide: Azure Storage Account.
- Step 2. Create the Faiss-based index files on Azure Blob Storage. We support the LangChain format (index.faiss + index.pkl) for the index files, which can be prepared either with our promptflow-vectordb SDK or by following the quick guide in the LangChain documentation. To build an index with the promptflow-vectordb SDK, refer to An example code for creating Faiss index.
- Step 3. Depending on where you put your index files, the identity used by the promptflow runtime must be granted the corresponding role. Please refer to Steps to assign an Azure role:
Location | Role |
---|---|
workspace datastores or workspace default blob | AzureML Data Scientist |
other blobs | Storage Blob Data Reader |
- For a local index, only step 2 above is needed: create the Faiss-based index files in a local path.
The tool accepts the following inputs:
Name | Type | Description | Required |
---|---|---|---|
path | string | URL or path for the vector store. Local path (for local users): <local_path_to_the_index_folder>. Azure blob URL format (with [azure] extra installed): https://<account_name>.blob.core.windows.net/<container_name>/<path_and_folder_name>. AML datastore URL format (with [azure] extra installed): azureml://subscriptions/<your_subscription>/resourcegroups/<your_resource_group>/workspaces/<your_workspace>/data/<data_path>. Public http/https URL (for public demonstration): http(s)://<path_and_folder_name> | Yes |
vector | list[float] | The target vector to be queried, which can be generated by the LLM tool. | Yes |
top_k | integer | The count of top-scored entities to return. Default value is 3. | No |
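
The four accepted forms of path can be told apart by their scheme and host. The following is a minimal sketch of such a check, useful for deciding whether the [azure] extra is needed. The function names are hypothetical helpers written for illustration; they are not part of promptflow-vectordb, and the tool's own validation may differ.

```python
from urllib.parse import urlparse


def classify_index_path(path: str) -> str:
    """Guess which documented `path` form a string uses.

    Hypothetical helper for illustration only; not a promptflow-vectordb API.
    """
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    if scheme == "azureml":
        # azureml://subscriptions/.../workspaces/.../data/...
        return "aml_datastore"
    if scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host.endswith(".blob.core.windows.net"):
            return "azure_blob"
        return "public_url"
    # Anything else (relative paths, absolute paths, drive letters) is treated as local.
    return "local"


def needs_azure_extra(path: str) -> bool:
    """True when the path form requires `pip install promptflow-vectordb[azure]`."""
    return classify_index_path(path) in ("azure_blob", "aml_datastore")
```

For example, needs_azure_extra("./faiss_index") is False, while an https URL on a blob.core.windows.net host is classified as azure_blob and does require the extra.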
The following is an example of the JSON response returned by the tool, which includes the top-k scored entities. Each entity follows the generic vector search result schema provided by our promptflow-vectordb SDK. For the Faiss Index Search, the following fields are populated:
Field Name | Type | Description |
---|---|---|
text | string | Text of the entity |
score | float | Distance between the entity and the query vector |
metadata | dict | Customized key-value pairs provided by the user when creating the index |
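
Because score is a distance, a lower value means a closer match. The brute-force sketch below shows what a top_k lookup computes conceptually. It assumes squared Euclidean (L2) distance; the metric a real index uses depends on how that index was built. The function names and the in-memory entity layout are hypothetical and do not reflect how Faiss or the tool is implemented.

```python
from typing import Dict, List


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(query: List[float], entities: List[Dict], top_k: int = 3) -> List[Dict]:
    """Return the top_k entities closest to `query`, nearest first.

    Each entity is assumed to be a dict with "text", "vector" and optional "metadata".
    """
    scored = [
        {
            "text": entity["text"],
            "metadata": entity.get("metadata", {}),
            "score": squared_l2(query, entity["vector"]),
        }
        for entity in entities
    ]
    # Smaller distance means a better match, so sort ascending.
    scored.sort(key=lambda result: result["score"])
    return scored[:top_k]
```

A real Faiss index avoids this linear scan over every stored vector, but the ordering of the results follows the same idea.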
Output
[
  {
    "metadata": {
      "link": "http://sample_link_0",
      "title": "title0"
    },
    "original_entity": null,
    "score": 0,
    "text": "sample text #0",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_1",
      "title": "title1"
    },
    "original_entity": null,
    "score": 0.05000000447034836,
    "text": "sample text #1",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_2",
      "title": "title2"
    },
    "original_entity": null,
    "score": 0.20000001788139343,
    "text": "sample text #2",
    "vector": null
  }
]
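
A downstream step, such as a prompt for the LLM tool, usually needs only the text and a little metadata from this response. The sketch below parses a response of the shape shown above, keeps entities whose distance is below a threshold, and joins them into a context string. The function name, the max_score cutoff and the output layout are assumptions made for illustration; only the field names come from the response schema.

```python
import json


def build_context(response_json: str, max_score: float = 0.1) -> str:
    """Turn a Faiss Index Lookup response into a plain-text context block.

    Hypothetical helper: `max_score` is an illustrative cutoff, not a tool parameter.
    """
    results = json.loads(response_json)
    # score is a distance, so keep only the entities that are close enough.
    kept = [r for r in results if r["score"] <= max_score]
    kept.sort(key=lambda r: r["score"])
    lines = []
    for r in kept:
        metadata = r.get("metadata") or {}
        title = metadata.get("title", "")
        lines.append(f"[{title}] {r['text']}")
    return "\n".join(lines)
```

With the sample response above and the default cutoff of 0.1, the first two entities are kept and the third (score of about 0.2) is dropped.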