
Faiss Index Lookup

Faiss Index Lookup is a tool for querying a user-provided Faiss-based vector store. Combined with our Large Language Model (LLM) tool, it lets users retrieve contextually relevant information from a domain knowledge base.

Requirements

  • For AzureML users, the tool is installed in the default image, so you can use it without any extra installation.

  • For local users, if your index is stored in a local path, run

    pip install promptflow-vectordb

    If your index is stored in Azure storage, run

    pip install promptflow-vectordb[azure]

Prerequisites

For AzureML users,

  • Step 1. Prepare an accessible path on Azure Blob Storage. If you need to create a new storage account, see the guide: Azure Storage Account.

  • Step 2. Create the Faiss-based index files on Azure Blob Storage. We support the LangChain format (index.faiss + index.pkl) for the index files, which can be prepared either with our promptflow-vectordb SDK or by following the quick guide in the LangChain documentation. To build an index with the promptflow-vectordb SDK, refer to "An example code for creating Faiss index".

  • Step 3. Depending on where you put your index files, the identity used by the promptflow runtime must be granted the corresponding role. Please refer to "Steps to assign an Azure role":

    | Location | Role |
    | --- | --- |
    | workspace datastores or workspace default blob | AzureML Data Scientist |
    | other blobs | Storage Blob Data Reader |

For local users,

  • Create Faiss-based index files in a local path by following only step 2 above.
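
Before pointing the tool at a local folder, it can help to confirm that both files of the LangChain format are present. The following is a minimal sketch under that assumption: `missing_index_files` is a hypothetical helper, not part of promptflow-vectordb, and it only checks file names, not whether the files can actually be loaded.

```python
from pathlib import Path

# File names of the LangChain Faiss format named in step 2.
EXPECTED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(folder: str) -> list:
    """Return the expected index files that are absent from `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means both files were found; any returned name is a file that still needs to be created or copied into the folder.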

Inputs

The tool accepts the following inputs:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| path | string | URL or path for the vector store, in one of the formats listed below. | Yes |
| vector | list[float] | The target vector to be queried, which can be generated by the LLM tool. | Yes |
| top_k | integer | The count of top-scored entities to return. Default value is 3. | No |

Supported formats for path:

  • Local path (for local users):
    <local_path_to_the_index_folder>

  • Azure blob URL format (with [azure] extra installed):
    https://<account_name>.blob.core.windows.net/<container_name>/<path_and_folder_name>

  • AML datastore URL format (with [azure] extra installed):
    azureml://subscriptions/<your_subscription>/resourcegroups/<your_resource_group>/workspaces/<your_workspace>/data/<data_path>

  • Public http/https URL (for public demonstration):
    http(s)://<path_and_folder_name>
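
To make the four `path` formats concrete, here is a minimal sketch that guesses which format a given value uses. `classify_index_path` and its return labels are hypothetical names chosen for illustration; the tool itself may apply different rules when it resolves a path.

```python
from urllib.parse import urlparse


def classify_index_path(path: str) -> str:
    """Guess which of the documented `path` formats a value uses.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    if scheme == "azureml":
        return "aml_datastore"
    if scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        # Azure Blob Storage endpoints use this host suffix.
        if host.endswith(".blob.core.windows.net"):
            return "azure_blob"
        return "public_url"
    # Everything else, including Windows drive paths, is treated as local.
    return "local"
```

A check like this can be used to warn early when a value is classified as `azure_blob` or `aml_datastore` but the `[azure]` extra is not installed.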

Outputs

The following is an example of the JSON response returned by the tool, which contains the top-k scored entities. Each entity follows the generic vector search result schema provided by our promptflow-vectordb SDK. For Faiss Index Lookup, the following fields are populated:

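| Field Name | Type | Description |
| --- | --- | --- |
| text | string | Text of the entity |
| score | float | Distance between the entity and the query vector (a smaller value means a closer match) |
| metadata | dict | Customized key-value pairs provided by the user when creating the index |

Example output: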
[
  {
    "metadata": {
      "link": "http://sample_link_0",
      "title": "title0"
    },
    "original_entity": null,
    "score": 0,
    "text": "sample text #0",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_1",
      "title": "title1"
    },
    "original_entity": null,
    "score": 0.05000000447034836,
    "text": "sample text #1",
    "vector": null
  },
  {
    "metadata": {
      "link": "http://sample_link_2",
      "title": "title2"
    },
    "original_entity": null,
    "score": 0.20000001788139343,
    "text": "sample text #2",
    "vector": null
  }
]
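
Since the tool is meant to feed an LLM, a downstream step typically turns this response into a block of context text. The following is a minimal sketch of such a step; `build_context` and its `max_distance` parameter are hypothetical, and the `title` metadata key is taken from the sample above rather than guaranteed by the tool.

```python
import json
from typing import Optional


def build_context(results: list, max_distance: Optional[float] = None) -> str:
    """Join the `text` of each result, closest first, into one context string."""
    # `score` is a distance, so smaller values mean closer matches.
    ranked = sorted(results, key=lambda item: item["score"])
    if max_distance is not None:
        ranked = [item for item in ranked if item["score"] <= max_distance]
    lines = []
    for item in ranked:
        # `metadata` is user-defined, so fall back when `title` is missing.
        title = (item.get("metadata") or {}).get("title", "untitled")
        lines.append(f"[{title}] {item['text']}")
    return "\n".join(lines)


# Trimmed copy of the sample response above.
sample = json.loads("""
[
  {"metadata": {"title": "title0"}, "score": 0, "text": "sample text #0"},
  {"metadata": {"title": "title1"}, "score": 0.05000000447034836, "text": "sample text #1"},
  {"metadata": {"title": "title2"}, "score": 0.20000001788139343, "text": "sample text #2"}
]
""")

print(build_context(sample, max_distance=0.1))
# [title0] sample text #0
# [title1] sample text #1
```

The distance cutoff is optional; a suitable threshold depends on the embedding model and the metric the index was built with, so the value 0.1 here is only illustrative.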