From d2bb013b3a1218e3cd176b5e3e4e3611d457ac00 Mon Sep 17 00:00:00 2001 From: YanivHyper-Space <124336435+YanivHyper-Space@users.noreply.github.com> Date: Tue, 26 Mar 2024 08:36:47 +0200 Subject: [PATCH] Add files via upload --- .../BinaryVector/Binary_Vector_Search.ipynb | 1898 +++++++---------- 1 file changed, 821 insertions(+), 1077 deletions(-) diff --git a/DataSets/BinaryVector/Binary_Vector_Search.ipynb b/DataSets/BinaryVector/Binary_Vector_Search.ipynb index f48a661..a6fbe69 100644 --- a/DataSets/BinaryVector/Binary_Vector_Search.ipynb +++ b/DataSets/BinaryVector/Binary_Vector_Search.ipynb @@ -1,1098 +1,842 @@ { - "cells": [ - { - "cell_type": "markdown", - "source": [ - "![63f78014766fd30436c18a79_Hyperspace - navbar logo.png]()" - ], - "metadata": { - "id": "yZ7FEZEejbP9" - }, - "id": "yZ7FEZEejbP9" - }, - { - "cell_type": "markdown", - "source": [ - "# Binary Vector and Metadata Search with Hyperspace\n", - "This notebook demonstrates the use of Hyperspace engine for a hybrid search that combines vector search of binary vectors and metadata filtering over their corresponding metadata.\n", - "\n", - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hyper-space-io/QuickStart/blob/master/DataSets/BinaryVector/Binary_Vector_Search.ipynb)\n", - "# Hyperspace Hybrid search\n", - "This notenook combines approximate KNN (using IVF approximation) with metadata filtering. 
In this scheme, Hyperspace uses the post-filtering approach, by which the ANN matching is first perfomed, followed by metadata filtering.\n", - "\n", - "![PostFiltering.PNG]()\n", - "\n", - "This approach optimizes the query recall, at the expanse of latency, as applying the filtering first may cause the data graph to become sparse and omit relevant results from the search.\n", - "![SparseGraph.PNG]()\n" - ], - "metadata": { - "id": "DQf1g53MLo9r" - }, - "id": "DQf1g53MLo9r" - }, - { - "cell_type": "markdown", - "source": [ - "# The Dataset\n", - "The dataset includes randomly generated binary vectors of dimension 800 and corresponding metadata, that describes stores for a recommednation engine.\n", - "\n", - "## The Dataset Fields\n", - "The metadata can be downloaded from [here](https://github.com/hyper-space-io/QuickStart/blob/main/DataSets/BinaryVector/Generated_data.hsv) includes the following fields:\n", - "1. **country** [string] - The Country in which the store is located\n", - "2. **city** [string] - The city in which the store is located\n", - "3. **street** [keyword] - The street in which the store is located\n", - "4. **zip_code** [integer] - The store zipcode\n", - "5. **open_now** [boolean] - Is the store open\n", - "6. **vertical** [keyword] - The store vertical (industry)\n", - "\n", - "# Setting up the Hyperspace Environment\n", - "Setting the environment and running the query includes the following steps\n", - "\n", - "1. Download and install the client API\n", - "2. Connect to a server\n", - "3. Create data schema file\n", - "4. Create collection\n", - "5. Ingest data\n", - "6. Define Logic and Run a Query" - ], - "metadata": { - "id": "-JeIE7JqVTu_" - }, - "id": "-JeIE7JqVTu_" - }, - { - "cell_type": "markdown", - "source": [ - "## 1. 
Install the Hyperspace client API\n", - "Hyperspace API can be installed directly from git, using the following command:" - ], - "metadata": { - "id": "qol984iQq_O4" - }, - "id": "qol984iQq_O4" - }, - { - "cell_type": "code", - "source": [ - "pip install git+https://github.com/hyper-space-io/hyperspace-py" - ], - "metadata": { - "id": "U6p1YrVOL19j", - "outputId": "d46b3aa4-dc98-4be9-9bfc-da829b083830", - "colab": { - "base_uri": "https://localhost:8080/" + "cells": [ + { + "cell_type": "markdown", + "source": [ + "![63f78014766fd30436c18a79_Hyperspace - navbar logo.png]()" + ], + "metadata": { + "id": "yZ7FEZEejbP9" + }, + "id": "yZ7FEZEejbP9" }, - "ExecuteTime": { - "end_time": "2024-01-01T19:34:20.670682300Z", - "start_time": "2024-01-01T19:34:13.179071800Z" - } - }, - "id": "U6p1YrVOL19j", - "execution_count": 3, - "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Collecting git+https://github.com/hyper-space-io/hyperspace-py\n", - " Cloning https://github.com/hyper-space-io/hyperspace-py to c:\\users\\tamirbracha\\appdata\\local\\temp\\pip-req-build-5xyh181q\n", - " Resolved https://github.com/hyper-space-io/hyperspace-py to commit c49c83710c5b466a4bb299f7ad70f618d3a6df94\n", - " Preparing metadata (setup.py): started\n", - " Preparing metadata (setup.py): finished with status 'done'\n", - "Requirement already satisfied: certifi>=14.05.14 in c:\\users\\tamirbracha\\appdata\\local\\programs\\python\\python310\\lib\\site-packages (from hyperspace==1.0.0) (2023.7.22)\n", - "Requirement already satisfied: six>=1.10 in c:\\users\\tamirbracha\\appdata\\local\\programs\\python\\python310\\lib\\site-packages (from hyperspace==1.0.0) (1.12.0)\n", - "Requirement already satisfied: python_dateutil>=2.5.3 in c:\\users\\tamirbracha\\appdata\\local\\programs\\python\\python310\\lib\\site-packages (from hyperspace==1.0.0) (2.8.2)\n", - "Requirement already satisfied: setuptools>=21.0.0 in 
c:\\users\\tamirbracha\\appdata\\local\\programs\\python\\python310\\lib\\site-packages (from hyperspace==1.0.0) (68.0.0)\n", - "Requirement already satisfied: urllib3>=1.15.1 in c:\\users\\tamirbracha\\appdata\\local\\programs\\python\\python310\\lib\\site-packages (from hyperspace==1.0.0) (2.0.3)\n", - "Requirement already satisfied: msgpack in c:\\users\\tamirbracha\\appdata\\local\\programs\\python\\python310\\lib\\site-packages (from hyperspace==1.0.0) (1.0.7)\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] + "cell_type": "markdown", + "source": [ + "# Binary Vector and Metadata Search with Hyperspace\n", + "This notebook demonstrates the use of Hyperspace engine for a hybrid search that combines vector search of binary vectors and metadata filtering over their corresponding metadata.\n", + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hyper-space-io/QuickStart/blob/master/DataSets/BinaryVector/Binary_Vector_Search.ipynb)\n", + "# Hyperspace Hybrid search\n", + "This notebook combines approximate KNN (using IVF approximation) with metadata filtering. 
In this scheme, Hyperspace uses the post-filtering approach, by which the ANN matching is first performed, followed by metadata filtering.\n", + "\n", + "![PostFiltering.PNG]()\n", + "\n", + "This approach optimizes the query recall, at the expense of latency, as applying the filtering first may cause the data graph to become sparse and omit relevant results from the search.\n", + "![SparseGraph.PNG]()\n" + ], + "metadata": { + "id": "DQf1g53MLo9r" + }, + "id": "DQf1g53MLo9r" }, { - "name": "stderr", - "output_type": "stream", - "text": [ - " Running command git clone --filter=blob:none --quiet https://github.com/hyper-space-io/hyperspace-py 'C:\\Users\\TamirBracha\\AppData\\Local\\Temp\\pip-req-build-5xyh181q'\n", - "DEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\n", - "WARNING: There was an error checking the latest version of pip.\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## 2. Connect to a server\n", - "\n", - "Once the Hyperspace API is installed, the database can be accessed by creating a local instance of the Hyperspace client. This step requires host address, username and password." 
- ], - "metadata": { - "id": "EtEeKuxJ7E2Q" - }, - "id": "EtEeKuxJ7E2Q" - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "4161f9f8", - "metadata": { - "id": "4161f9f8", - "ExecuteTime": { - "end_time": "2024-01-01T19:34:21.863030600Z", - "start_time": "2024-01-01T19:34:20.675693200Z" - } - }, - "outputs": [], - "source": [ - "import hyperspace\n", - "from getpass import getpass\n", - "\n", - "username = \"USERNAME\"\n", - "host = \"HOST_URL\"\n", - "\n", - "hyperspace_client = hyperspace.HyperspaceClientApi(host=host, username=username, password=getpass())" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Before continuing, let us check that the cluster is live" - ], - "metadata": { - "id": "yQb-y862X7Ec" - }, - "id": "yQb-y862X7Ec" - }, - { - "cell_type": "code", - "source": [ - "collections_info = hyperspace_client.collections_info()\n", - "display(collections_info)" - ], - "metadata": { - "id": "pl7oBXaTYDMq", - "outputId": "daa8cca8-ed8e-4b0b-b54e-354c7fd77900", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 199 + "cell_type": "markdown", + "source": [ + "# The Dataset\n", + "The dataset includes randomly generated binary vectors of dimension 800 and corresponding metadata that describes stores for a recommendation engine.\n", + "\n", + "## The Dataset Fields\n", + "The metadata, which can be downloaded from [here](https://github.com/hyper-space-io/QuickStart/blob/main/DataSets/BinaryVector/Generated_data.hsv), includes the following fields:\n", + "1. **country** [string] - The Country in which the store is located\n", + "2. **city** [string] - The city in which the store is located\n", + "3. **street** [keyword] - The street in which the store is located\n", + "4. **zip_code** [integer] - The store zipcode\n", + "5. **open_now** [boolean] - Is the store open\n", + "6. 
**vertical** [keyword] - The store vertical (industry)\n", + "\n", + "# Setting up the Hyperspace Environment\n", + "Setting the environment and running the query includes the following steps\n", + "\n", + "1. Download and install the client API\n", + "2. Connect to a server\n", + "3. Create data schema file\n", + "4. Create collection\n", + "5. Ingest data\n", + "6. Define Logic and Run a Query" + ], + "metadata": { + "id": "-JeIE7JqVTu_" + }, + "id": "-JeIE7JqVTu_" }, - "ExecuteTime": { - "end_time": "2024-01-01T19:34:22.117269100Z", - "start_time": "2024-01-01T19:34:21.866029900Z" - } - }, - "id": "pl7oBXaTYDMq", - "execution_count": 5, - "outputs": [ { - "data": { - "text/plain": "{'collections': {'all-MiniLM-L6-v2_arXiv': {'creation_time': '2024-01-01T19:23:25Z',\n 'last_query_time': '2024-01-01T19:25:41Z',\n 'size': 100001}}}" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## 3. Create a Data Schema File\n", - "\n", - "Similarly to other search databases, Hyper-Space database requires a configuration file that outlines the data schema. Here, we create a config file that corresponds to the fields of the given dataset.\n", - "\n", - "For vector fields, we also provide the index type to be used, and the metric. . Current options for index include \"**brute_force**\", \"**hnsw**\", \"**ivf**\", and \"**bin_ivf**\" for binary vectors, and \"**IP**\" (inner product) as a metric for floating point vectors and \"**Hamming**\" ([hamming distance](https://en.wikipedia.org/wiki/Hamming_distance)) for binary vectors.\n", - "Note that the key 'low_cardinality' enables faster search for low cardinality fields." 
- ], - "metadata": { - "id": "6BG-FyOlujHE" - }, - "id": "6BG-FyOlujHE" - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "976b2177", - "metadata": { - "id": "976b2177", - "ExecuteTime": { - "end_time": "2024-01-01T19:34:22.189609900Z", - "start_time": "2024-01-01T19:34:22.122809Z" - } - }, - "outputs": [], - "source": [ - "import json\n", - "vector_dimension = 800 # bits\n", - "config = {\n", - " \"configuration\": {\n", - " \"id\": {\n", - " \"type\": \"keyword\",\n", - " \"id\": True\n", - " },\n", - " 'city': {\"type\": 'keyword'},\n", - " 'country': {\"type\": 'keyword'},\n", - " 'open_now': {\"type\": 'boolean'},\n", - " 'zip_code': {\"type\": 'integer'},\n", - " 'street': {\"type\": 'keyword'},\n", - " 'vertical': {\"type\": 'keyword', 'low_cardinality': True},\n", - " \"vector\": {\n", - " \"type\": \"dense_vector\",\n", - " \"index_type\": \"bin_ivf\",\n", - " \"dim\": vector_dimension,\n", - " \"metric\": \"hamming\"\n", - " }\n", - " }\n", - "}\n", - "\n", - "with open('config.json', 'w') as f:\n", - " f.write(json.dumps(config, indent=2))\n" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## 4. Create Collection\n", - "The Hyerspace engine stroes data in Collections, where each collecction commonly hosts data of similar context, etc. Each search is then perfomed within a collection. We create a collection using the command \"**create_collection**(schema_filename, collection_name)\"." 
- ], - "metadata": { - "id": "g1DHlx75uklY" - }, - "id": "g1DHlx75uklY" - }, - { - "cell_type": "code", - "source": [ - "collection_name = 'GeneratedData'\n", - "\n", - "if collection_name not in hyperspace_client.collections_info()[\"collections\"]:\n", - " hyperspace_client.create_collection('config.json', collection_name)\n", - "\n", - "hyperspace_client.collections_info()" - ], - "metadata": { - "id": "z43Vz3nabLZe", - "outputId": "75803675-218a-41a3-fa3e-89c3f67ad3a2", - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "markdown", + "source": [ + "## 1. Install the Hyperspace client API\n", + "Hyperspace API can be installed directly from git, using the following command:" + ], + "metadata": { + "id": "qol984iQq_O4" + }, + "id": "qol984iQq_O4" }, - "ExecuteTime": { - "end_time": "2024-01-01T19:34:22.583664500Z", - "start_time": "2024-01-01T19:34:22.149799700Z" - } - }, - "id": "z43Vz3nabLZe", - "execution_count": 7, - "outputs": [ { - "data": { - "text/plain": "{'collections': {'GeneratedData': {'creation_time': '2024-01-01T19:34:22Z',\n 'size': 0},\n 'all-MiniLM-L6-v2_arXiv': {'creation_time': '2024-01-01T19:23:25Z',\n 'last_query_time': '2024-01-01T19:25:41Z',\n 'size': 100001}}}" - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "# 5. Ingest data\n", - "\n", - "In the next step we ingest the dataset in batches of 250 documents. This number can be controlled by user, and in particular, can be increased in order improve ingestion time. We add batches of data using the command **add_batch**(batch, collection_name)." 
- ], - "metadata": { - "id": "9eUT0cBRu31m" - }, - "id": "9eUT0cBRu31m" - }, - { - "cell_type": "code", - "source": [ - "import random\n", - "import secrets\n", - "import base64\n", - "\n", - "def generate_data(metadata, vector_dimension):\n", - " data_point = random.choice(metadata)\n", - " random_bytes = secrets.token_bytes(vector_dimension // 8)\n", - " data_point['vector'] = base64.b64encode(random_bytes).decode()\n", - " return data_point" - ], - "metadata": { - "id": "_XCNBaIHLjZZ", - "ExecuteTime": { - "end_time": "2024-01-01T19:34:22.597661700Z", - "start_time": "2024-01-01T19:34:22.585667600Z" - } - }, - "id": "_XCNBaIHLjZZ", - "execution_count": 8, - "outputs": [] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "95086784", - "metadata": { - "id": "95086784", - "ExecuteTime": { - "end_time": "2024-01-01T19:35:07.446200300Z", - "start_time": "2024-01-01T19:34:22.603667200Z" - } - }, - "outputs": [ + "cell_type": "code", + "source": [ + "pip install git+https://github.com/hyper-space-io/hyperspace-py" + ], + "metadata": { + "id": "U6p1YrVOL19j", + "outputId": "16986439-fb1d-4884-927d-4fa40228ccf9", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "ExecuteTime": { + "end_time": "2024-01-01T19:34:20.670682300Z", + "start_time": "2024-01-01T19:34:13.179071800Z" + } + }, + "id": "U6p1YrVOL19j", + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting git+https://github.com/hyper-space-io/hyperspace-py\n", + " Cloning https://github.com/hyper-space-io/hyperspace-py to /tmp/pip-req-build-_8mrdqbv\n", + " Running command git clone --filter=blob:none --quiet https://github.com/hyper-space-io/hyperspace-py /tmp/pip-req-build-_8mrdqbv\n", + " Resolved https://github.com/hyper-space-io/hyperspace-py to commit 70d23409dc1b8be4a73845f17f6b8f84a104b4ea\n", + " Preparing metadata (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", + "Requirement already satisfied: certifi>=14.05.14 in /usr/local/lib/python3.10/dist-packages (from hyperspace==1.0.0) (2024.2.2)\n", + "Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.10/dist-packages (from hyperspace==1.0.0) (1.16.0)\n", + "Requirement already satisfied: python_dateutil>=2.5.3 in /usr/local/lib/python3.10/dist-packages (from hyperspace==1.0.0) (2.8.2)\n", + "Requirement already satisfied: setuptools>=21.0.0 in /usr/local/lib/python3.10/dist-packages (from hyperspace==1.0.0) (67.7.2)\n", + "Requirement already satisfied: urllib3>=1.15.1 in /usr/local/lib/python3.10/dist-packages (from hyperspace==1.0.0) (2.0.7)\n", + "Requirement already satisfied: msgpack in /usr/local/lib/python3.10/dist-packages (from hyperspace==1.0.0) (1.0.8)\n", + "Building wheels for collected packages: hyperspace\n", + " Building wheel for hyperspace (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for hyperspace: filename=hyperspace-1.0.0-py3-none-any.whl size=38874 sha256=3afc7add90e2b5b300414e43ff89ba6d65e03f1e9626725dda95efd1de900225\n", + " Stored in directory: /tmp/pip-ephem-wheel-cache-e67f250h/wheels/c4/96/59/f4b91d653fdbfc819e48a7dacbea1c9f3de59a1bc113aa840d\n", + "Successfully built hyperspace\n", + "Installing collected packages: hyperspace\n", + "Successfully installed hyperspace-1.0.0\n" + ] + } + ] + }, { - "name": "stdout", - "output_type": "stream", - "text": [ - "250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "1000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "1250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "1500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "1750 {'code': 200, 'message': 'Batch successfully 
added', 'status': 'OK'}\n", - "2000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "2250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "2500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "2750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "3000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "3250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "3500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "3750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "4000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "4250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "4500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "4750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "5000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "5250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "5500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "5750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "6000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "6250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "6500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "6750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "7000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "7250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "7500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "7750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "8000 {'code': 200, 
'message': 'Batch successfully added', 'status': 'OK'}\n", - "8250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "8500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "8750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "9000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "9250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "9500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "9750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "10000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "10250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "10500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "10750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "11000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "11250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "11500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "11750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "12000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "12250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "12500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "12750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "13000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "13250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "13500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "13750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "14000 {'code': 200, 'message': 'Batch successfully added', 
'status': 'OK'}\n", - "14250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "14500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "14750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "15000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "15250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "15500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "15750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "16000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "16250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "16500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "16750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "17000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "17250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "17500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "17750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "18000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "18250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "18500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "18750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "19000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "19250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "19500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "19750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "20000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "20250 
{'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "20500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "20750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "21000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "21250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "21500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "21750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "22000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "22250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "22500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "22750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "23000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "23250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "23500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "23750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "24000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "24250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "24500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "24750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "25000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "25250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "25500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "25750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "26000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "26250 {'code': 200, 'message': 'Batch 
successfully added', 'status': 'OK'}\n", - "26500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "26750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "27000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "27250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "27500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "27750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "28000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "28250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "28500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "28750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "29000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "29250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "29500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "29750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "30000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "30250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "30500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "30750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "31000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "31250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "31500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "31750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "32000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "32250 {'code': 200, 'message': 'Batch successfully added', 'status': 
'OK'}\n", - "32500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "32750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "33000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "33250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "33500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "33750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "34000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "34250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "34500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "34750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "35000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "35250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "35500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "35750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "36000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "36250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "36500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "36750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "37000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "37250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "37500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "37750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "38000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "38250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "38500 {'code': 
200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "38750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "39000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "39250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "39500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "39750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "40000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "40250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "40500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "40750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "41000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "41250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "41500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "41750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "42000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "42250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "42500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "42750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "43000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "43250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "43500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "43750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "44000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "44250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "44500 {'code': 200, 'message': 'Batch 
successfully added', 'status': 'OK'}\n", - "44750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "45000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "45250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "45500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "45750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "46000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "46250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "46500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "46750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "47000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "47250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "47500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "47750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "48000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "48250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "48500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "48750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "49000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "49250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "49500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "49750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "50000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "50250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "50500 {'code': 200, 'message': 'Batch successfully added', 'status': 
'OK'}\n", - "50750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "51000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "51250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "51500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "51750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "52000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "52250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "52500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "52750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "53000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "53250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "53500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "53750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "54000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "54250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "54500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "54750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "55000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "55250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "55500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "55750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "56000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "56250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "56500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "56750 {'code': 
200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "57000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "57250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "57500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "57750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "58000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "58250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "58500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "58750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "59000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "59250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "59500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "59750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "60000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "60250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "60500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "60750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "61000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "61250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "61500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "61750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "62000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "62250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "62500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "62750 {'code': 200, 'message': 'Batch 
successfully added', 'status': 'OK'}\n", - "63000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "63250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "63500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "63750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "64000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "64250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "64500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "64750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "65000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "65250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "65500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "65750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "66000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "66250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "66500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "66750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "67000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "67250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "67500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "67750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "68000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "68250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "68500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "68750 {'code': 200, 'message': 'Batch successfully added', 'status': 
'OK'}\n", - "69000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "69250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "69500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "69750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "70000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "70250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "70500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "70750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "71000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "71250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "71500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "71750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "72000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "72250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "72500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "72750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "73000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "73250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "73500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "73750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "74000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "74250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "74500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "74750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "75000 {'code': 
200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "75250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "75500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "75750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "76000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "76250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "76500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "76750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "77000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "77250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "77500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "77750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "78000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "78250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "78500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "78750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "79000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "79250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "79500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "79750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "80000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "80250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "80500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "80750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "81000 {'code': 200, 'message': 'Batch 
successfully added', 'status': 'OK'}\n", - "81250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "81500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "81750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "82000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "82250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "82500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "82750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "83000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "83250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "83500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "83750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "84000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "84250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "84500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "84750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "85000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "85250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "85500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "85750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "86000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "86250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "86500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "86750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "87000 {'code': 200, 'message': 'Batch successfully added', 'status': 
'OK'}\n", - "87250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "87500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "87750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "88000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "88250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "88500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "88750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "89000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "89250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "89500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "89750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "90000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "90250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "90500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "90750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "91000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "91250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "91500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "91750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "92000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "92250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "92500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "92750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "93000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "93250 {'code': 
200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "93500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "93750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "94000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "94250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "94500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "94750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "95000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "95250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "95500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "95750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "96000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "96250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "96500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "96750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "97000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "97250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "97500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "97750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "98000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "98250 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "98500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "98750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "99000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "99250 {'code': 200, 'message': 'Batch 
successfully added', 'status': 'OK'}\n", - "99500 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "99750 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n", - "100000 {'code': 200, 'message': 'Batch successfully added', 'status': 'OK'}\n" - ] + "cell_type": "markdown", + "source": [ + "## 2. Connect to a server\n", + "\n", + "Once the Hyperspace API is installed, the database can be accessed by creating a local instance of the Hyperspace client. This step requires host address, username and password." + ], + "metadata": { + "id": "EtEeKuxJ7E2Q" + }, + "id": "EtEeKuxJ7E2Q" }, { - "data": { - "text/plain": "{'code': 200, 'message': 'Dataset committed successfully', 'status': 'OK'}" - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import pickle\n", - "\n", - "data_path = 'Generated_data.hsv'\n", - "with open(data_path, 'rb') as file:\n", - " metadata = pickle.load(file)\n", - "\n", - "BATCH_SIZE = 250\n", - "\n", - "batch = []\n", - "data = []\n", - "\n", - "\n", - "for i, vec in enumerate(range(100000)):\n", - " data_point = generate_data(metadata, vector_dimension)\n", - " data_point[\"id\"] = str(i)\n", - " batch.append(dict(data_point))\n", - "\n", - " if (i+1) % BATCH_SIZE == 0:\n", - " response = hyperspace_client.add_batch(batch, collection_name)\n", - " print(i + 1, response)\n", - " batch.clear()\n", - "\n", - "if batch:\n", - " hyperspace_client.add_batch(batch, collection_name)\n", - " response = hyperspace_client.add_batch(batch, collection_name)\n", - "\n", - "hyperspace_client.commit(collection_name)\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "source": [ - "#6. Define Logic and Run a Query\n", - "We will build a hybrid search query using Hyper-space. In the query, we will select a document and find similar ones. We must first compile the score function using the \"set_function\" command." 
- ], - "metadata": { - "id": "Yjo-yZwXDjkS" - }, - "id": "Yjo-yZwXDjkS" - }, - { - "cell_type": "code", - "source": [ - "import inspect\n", - "\n", - "def set_score_function(func, collection_name, score_function_name='func'):\n", - " source = inspect.getsource(func)\n", - " with open('sf.py', 'w') as f:\n", - " f.write(source)\n", - " hyperspace_client.set_function('sf.py', collection_name, score_function_name)\n" - ], - "metadata": { - "id": "Q-DQqoBpeETW", - "ExecuteTime": { - "end_time": "2024-01-01T19:35:07.490319600Z", - "start_time": "2024-01-01T19:35:07.450154700Z" - } - }, - "id": "Q-DQqoBpeETW", - "execution_count": 10, - "outputs": [] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "5c6e2ccb", - "metadata": { - "id": "5c6e2ccb", - "ExecuteTime": { - "end_time": "2024-01-01T19:35:08.570400500Z", - "start_time": "2024-01-01T19:35:07.465928200Z" - } - }, - "outputs": [], - "source": [ - "def similarity_score(params, doc):\n", - " score = 0.0\n", - " if match(\"country\"):\n", - " score = 5.0\n", - " if match(\"street\"):\n", - " score = 10.0\n", - " return score\n", - "\n", - "set_score_function(similarity_score, collection_name, score_function_name='similarity_score')" - ] - }, - { - "cell_type": "markdown", - "source": [ - "The next step is to retrieve a document and perform simialrity search." 
- ], - "metadata": { - "id": "RnQ4hgtueZ2j" - }, - "id": "RnQ4hgtueZ2j" - }, - { - "cell_type": "code", - "source": [ - "input_document = hyperspace_client.get_document(collection_name, 65)\n", - "input_document" - ], - "metadata": { - "id": "JpPhJG0wMsNg", - "outputId": "0b20a2d0-f55a-413e-8519-db377d5646c4", - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "code", + "execution_count": 3, + "id": "4161f9f8", + "metadata": { + "id": "4161f9f8", + "ExecuteTime": { + "end_time": "2024-01-01T19:34:21.863030600Z", + "start_time": "2024-01-01T19:34:20.675693200Z" + } + }, + "outputs": [], + "source": [ + "import hyperspace\n", + "from getpass import getpass\n", + "\n", + "username = \"USERNAME\"\n", + "host = \"HOST_URL\"\n", + "\n", + "hyperspace_client = hyperspace.HyperspaceClientApi(host=host, username=username, password=getpass())" + ] }, - "ExecuteTime": { - "end_time": "2024-01-01T19:35:08.664430400Z", - "start_time": "2024-01-01T19:35:08.574444200Z" - } - }, - "id": "JpPhJG0wMsNg", - "execution_count": 12, - "outputs": [ { - "data": { - "text/plain": "{'city': 'North Andrew',\n 'country': 'Croatia',\n 'open_now': False,\n 'zip_code': 92543,\n 'vector': 'X5gYG7oS8VMDBzhYHRhtqIEylHrh/II+tXpBVtZFPotal7gRAq2++M1Bf5MgtxD3ebghfg1SD9H8IMWrVhJXnBr/kGx3pINOOjewX6Ei+yUB40fUx9QSA6aSMmHGt63svrzY0Q==',\n 'id': '65'}" - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "# Vector Search" - ], - "metadata": { - "id": "URXMq9eUHxKh" - }, - "id": "URXMq9eUHxKh" - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "5ff98e9d", - "metadata": { - "id": "5ff98e9d", - "outputId": "100117d8-839b-46d4-dee8-7fee5c413c39", - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "markdown", + "source": [ + "Before continuing, let us check that the cluster is live" + ], + "metadata": { + "id": "yQb-y862X7Ec" + }, + "id": "yQb-y862X7Ec" }, - "ExecuteTime": { - 
"end_time": "2024-01-01T19:35:08.757574300Z", - "start_time": "2024-01-01T19:35:08.664430400Z" - } - }, - "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Query run time: 2.00ms\n", - "[{'document_id': '17462', 'score': 356.0},\n", - " {'document_id': '99138', 'score': 356.0},\n", - " {'document_id': '35344', 'score': 355.0},\n", - " {'document_id': '74306', 'score': 355.0},\n", - " {'document_id': '96339', 'score': 355.0},\n", - " {'document_id': '99973', 'score': 355.0},\n", - " {'document_id': '43130', 'score': 354.0},\n", - " {'document_id': '97431', 'score': 354.0},\n", - " {'document_id': '44183', 'score': 353.0},\n", - " {'document_id': '17057', 'score': 351.0},\n", - " {'document_id': '37998', 'score': 349.0},\n", - " {'document_id': '88328', 'score': 349.0},\n", - " {'document_id': '68018', 'score': 347.0},\n", - " {'document_id': '76859', 'score': 346.0},\n", - " {'document_id': '65', 'score': 0.0}]\n" - ] - } - ], - "source": [ - "import random\n", - "from pprint import pprint\n", - "\n", - "\n", - "query_with_knn = {\n", - " 'params': input_document,\n", - " \"query\": {\"boost\": 0},\n", - " 'vector':{\"boost\": 10}\n", - "}\n", - "\n", - "results = hyperspace_client.search(query_with_knn,\n", - " size=15,\n", - " collection_name=collection_name)\n", - "candidates = results['candidates']\n", - "\n", - "print(f\"Query run time: {results['took_ms']:.2f}ms\")\n", - "pprint(results['similarity'])\n" - ] - }, - { - "cell_type": "markdown", - "source": [ - "# Results\n", - "The results are sorted by Hamming distance, as expected." 
- ], - "metadata": { - "id": "5lzGiQ3ON9bW" - }, - "id": "5lzGiQ3ON9bW" - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "ddd7d6a1", - "metadata": { - "id": "ddd7d6a1", - "outputId": "ad5c3ea4-aa01-4c6a-8208-6b66d5c3e5ea", - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "code", + "source": [ + "collections_info = hyperspace_client.collections_info()\n", + "display(collections_info)" + ], + "metadata": { + "id": "pl7oBXaTYDMq", + "outputId": "2476a6ca-3369-4fcf-a031-633f06c06774", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 121 + }, + "ExecuteTime": { + "end_time": "2024-01-01T19:34:22.117269100Z", + "start_time": "2024-01-01T19:34:21.866029900Z" + } + }, + "id": "pl7oBXaTYDMq", + "execution_count": 4, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "{'collections': {'DocRetrievalEmbedded': {'creation_time': '2024-03-20T10:50:16Z',\n", + " 'last_query_time': '2024-03-25T09:27:27Z',\n", + " 'size': 39},\n", + " 'GeneratedData': {'creation_time': '2024-03-26T05:27:09Z',\n", + " 'last_query_time': '2024-03-26T06:24:53Z',\n", + " 'size': 100000}}}" + ] + }, + "metadata": {} + } + ] }, - "ExecuteTime": { - "end_time": "2024-01-01T19:35:09.886289900Z", - "start_time": "2024-01-01T19:35:08.762573600Z" - } - }, - "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "0 {'city': 'South Steven', 'country': 'Honduras', 'open_now': True, 'zip_code': 58113, 'vertical': 'jungle', 'vector': 'XV1vmYSScRUYB5qk/i3y4TBwLiD1vZUcpxv6ZFluLKVXL3hrqWruMf7C3jn16Havay2JHFOLbCWscma0Y32jeIMWKWRjRxsMDjsQL2sig9QFhmD6kh+oEqWy8NbUNCF1eUq2lw==', 'id': '17462'}\n", - "1 {'city': 'Robertview', 'country': 'Zambia', 'open_now': False, 'zip_code': 81403, 'vector': 'zJgBmeiToUsRCigTjQsFO9ixMjcg0QXYp9IX+bJJjQHnTjRSTEYXgDR9z6PUlDeTT6eRdihEGo0im3TLAiKI6YqrGLPysb1KQR+poSiKOmwxwiiKW/XQnGSH9BWPM88pj0ZmXA==', 'id': '99138'}\n", - "2 {'city': 'East Tracy', 'country': 'Belgium', 'open_now': False, 
'zip_code': 64267, 'vertical': 'noodle', 'vector': 'W0HW2kU6OMmKl7RxZTjmqaN1Fq4l1u3l1J4sZqoovp2JDLrZSnrK/b8hO3BX5iOBE6ANYpowNAIkcfOtW5SPRw6TB3LjqQI+fBJT85hTkycl4PNH0BrK7trqUgLKqJRiopocuQ==', 'id': '35344'}\n", - "3 {'city': 'New Emilystad', 'country': 'Jersey', 'open_now': False, 'zip_code': 10183, 'street': 'zebra', 'vector': 'zSgcO42N0RcabLDbH36NesI1SSqYpOWGaiEbB/UhiUR6nm1On/g4OqrZL66hB25yWtT9ag5GCpHNTbiGuT4tjA9PjVqtlvAio/4umTw2lYUNpdSgC5UJUXOaIAbt/XjWvvbsGw==', 'id': '74306'}\n", - "4 {'city': 'Lake Johnstad', 'country': 'Ecuador', 'open_now': False, 'zip_code': 98539, 'street': 'ocean', 'vector': 'rpGVVsc6iZ1AN0dtIyiOiQVIyXAnXRoylaqRddykxkXcjbWc9ZQ3KTfW4AMc9luUcRuo+qh7N6h4IUrhySIIDVtb489Klkm0sn2kGkrT8/Vn4KZSs1bMi7JmcgrdvC/ICb8T6Q==', 'id': '96339'}\n", - "5 {'city': 'Bradleyton', 'country': 'Cape Verde', 'open_now': False, 'zip_code': 53080, 'street': 'xylophone', 'vector': 'SXRJsDJoM3NaJivcv8qvCtmxQCxAg4gA8+MBYA7TPxEo9IGyqvmKey8CTE2ThWrice1Lhnnebs7PbnTM7sl0yToSoLHUlEHctmy/viWzr6T7xlsXsqGm05MXsCEohhtYoKzLkA==', 'id': '99973'}\n", - "6 {'city': 'Jenniferberg', 'country': 'Congo', 'open_now': False, 'zip_code': 30878, 'vector': 'uHtIlBNl+GMbAyUexBLqX/bYWemiREVUI/55RTY/JBlWwbc0Sy4eWK7aZp/8sxrSedwzuNeQN1ckJ9+u6QJuCc93LVjWEpS9ghjUy6BDjkzvd025XYLWAP7BB2x5qqdN53QYdQ==', 'id': '43130'}\n", - "7 {'city': 'New Sheila', 'country': 'Belarus', 'open_now': False, 'zip_code': 16013, 'street': 'ice cream', 'vector': '75Tq/ZwCaH3VK3vVDln37khiZnCXuo/LOLlXJUB7Khu/cJDgDkf06Gy9eyPUCRlfYjQCD8XWF1LzDBwqqhHmqh0xWPFFjSVYvjrmClia3n8189oaOtwxQDPT8/AGMSeDvoFsPg==', 'id': '97431'}\n", - "8 {'city': 'Valdezborough', 'country': 'Serbia', 'open_now': True, 'zip_code': 44900, 'street': 'sun', 'vector': 'rq4LMe8ZDX2fx9F4uzC8uEMm8mtz/XBF96jBoZcjRnYWJ4pWa/OKk9RJAEEDlv49FQFRW4xzEaJ596m73tP+moOkxNDtlC8uPJoyMSgF4G6Hx0m5hXNZTTIxun1A9nodjWbNww==', 'id': '44183'}\n", - "9 {'city': 'Lake Michaelberg', 'country': 'Ghana', 'open_now': False, 'zip_code': 15316, 'street': 'rainbow', 'vector': 
'kpE3ihzWu04xHfjQ1aY34M1kZDXpxZ98N3KUNKXYlhN+QUhSDCsu5c8RtipY10b1vhb1z2DCFWjPYIOfnXrlpeV0miUxAZ7UPmqkX76t1bshMOfM0znFGLAJQONTQ7Z3jZKG8w==', 'id': '17057'}\n", - "10 {'city': 'West Markmouth', 'country': 'Sierra Leone', 'open_now': False, 'zip_code': 76021, 'vector': 'n8MiB7Itk3QPubsEAJ3kzr/6Th9j1w5GEGvWUsBQRIPlT9AGiovS2A+j19ciY9BzFuvj+/scTRtMPRLDpwUZuR/Fs7I7vsN12uXIQqGkaOXBunCRP4i6j2xwsHv4Ub4gvwJz4w==', 'id': '37998'}\n", - "11 {'city': 'Jonesville', 'country': 'Niger', 'open_now': True, 'zip_code': 25186, 'vertical': 'xylophone', 'street': 'sun', 'vector': '0gpQE/7HXBMrXzHGTSW8GgCwamuyboh5YfNldgQInRmUHt5ESj3ceA+rjxIkdXEn9YdyfG/AX+WO7CMbXufQH4fhzgLaIXHkGnvQbkXakXRyPlzP+NXGeM8j4cGlV4qHrRl2Rg==', 'id': '88328'}\n", - "12 {'city': 'East Rachel', 'country': 'Grenada', 'open_now': True, 'zip_code': 63920, 'vector': '93DenhewDQUyOjZo/SIimZDxpPhpvSkY9vRQtkDfeZ56GfBkm8meMrWPyPjg+CnX7jyYNANW+pozZXYhV9ZUA0j3XURYFkIETkHNNaAm9QxTp3eOr2bDa4kcIqDMbpJwo3PN/Q==', 'id': '68018'}\n", - "13 {'city': 'Lake Ericton', 'country': 'Bangladesh', 'open_now': True, 'zip_code': 82176, 'vertical': 'sun', 'street': 'noodle', 'vector': 'n5KTYYcSII+fJ+fXfIqZo+oVWCvhY4B/r2oP1eIRXpSqNmlYQz8j6eQtNgAO0REDd9Zv9LtOLtAT0WQQ8BO2fthprvx+ZuFg9ZJSXwh2peAGQylkb9WfN4nSTjqSDwOEpi8iWQ==', 'id': '76859'}\n", - "14 {'city': 'North Andrew', 'country': 'Croatia', 'open_now': False, 'zip_code': 92543, 'vector': 'X5gYG7oS8VMDBzhYHRhtqIEylHrh/II+tXpBVtZFPotal7gRAq2++M1Bf5MgtxD3ebghfg1SD9H8IMWrVhJXnBr/kGx3pINOOjewX6Ei+yUB40fUx9QSA6aSMmHGt63svrzY0Q==', 'id': '65'}\n" - ] - } - ], - "source": [ - "for i, x in enumerate(results['similarity']):\n", - " document = hyperspace_client.get_document(collection_name, x[\"document_id\"])\n", - " print(i, document)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "#Hybrid Search" - ], - "metadata": { - "id": "sd6AQ5Y3IEaH" - }, - "id": "sd6AQ5Y3IEaH" - }, - { - "cell_type": "code", - "source": [ - "import random\n", - "from pprint import pprint\n", - "\n", - "\n", - 
"query_with_knn = {\n", - " 'params': input_document,\n", - " \"query\": {\"boost\": 0},\n", - " 'vector':{\"boost\": 10}\n", - "}\n", - "\n", - "results = hyperspace_client.search(query_with_knn,\n", - " size=15,\n", - " function_name='similarity_score',\n", - " collection_name=collection_name)\n", - "candidates = results['candidates']\n", - "\n", - "print(f\"Query run time: {results['took_ms']:.2f}ms\")\n", - "pprint(results['similarity'])\n" - ], - "metadata": { - "id": "m6pMqw6NIH_K", - "ExecuteTime": { - "end_time": "2024-01-01T19:35:09.979062300Z", - "start_time": "2024-01-01T19:35:09.889289800Z" - } - }, - "id": "m6pMqw6NIH_K", - "execution_count": 15, - "outputs": [ + "cell_type": "markdown", + "source": [ + "## 3. Create a Data Schema File\n", + "\n", + "Similarly to other search databases, Hyper-Space database requires a configuration file that outlines the data schema. Here, we create a config file that corresponds to the fields of the given dataset.\n", + "\n", + "For vector fields, we also provide the index type to be used, and the metric. . Current options for index include \"**brute_force**\", \"**hnsw**\", \"**ivf**\", and \"**bin_ivf**\" for binary vectors, and \"**IP**\" (inner product) as a metric for floating point vectors and \"**Hamming**\" ([hamming distance](https://en.wikipedia.org/wiki/Hamming_distance)) for binary vectors.\n", + "Note that the key 'low_cardinality' enables faster search for low cardinality fields." 
+ ], + "metadata": { + "id": "6BG-FyOlujHE" + }, + "id": "6BG-FyOlujHE" + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "976b2177", + "metadata": { + "id": "976b2177", + "ExecuteTime": { + "end_time": "2024-01-01T19:34:22.189609900Z", + "start_time": "2024-01-01T19:34:22.122809Z" + } + }, + "outputs": [], + "source": [ + "import json\n", + "vector_dimension = 800 # bits\n", + "config = {\n", + " \"configuration\": {\n", + " \"id\": {\n", + " \"type\": \"keyword\",\n", + " \"id\": True\n", + " },\n", + " 'city': {\"type\": 'keyword'},\n", + " 'country': {\"type\": 'keyword'},\n", + " 'open_now': {\"type\": 'boolean'},\n", + " 'zip_code': {\"type\": 'integer'},\n", + " 'street': {\"type\": 'keyword'},\n", + " 'vertical': {\"type\": 'keyword', 'low_cardinality': True},\n", + " \"vector\": {\n", + " \"type\": \"dense_vector\",\n", + " \"index_type\": \"bin_ivf\",\n", + " \"dim\": vector_dimension,\n", + " \"metric\": \"hamming\"\n", + " }\n", + " }\n", + "}\n", + "\n", + "with open('config.json', 'w') as f:\n", + " f.write(json.dumps(config, indent=2))\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## 4. Create Collection\n", + "The Hyperspace engine stores data in collections, where each collection typically hosts data of a similar context. Each search is then performed within a collection. We create a collection using the command \"**create_collection**(schema_filename, collection_name)\"."
+ ], + "metadata": { + "id": "g1DHlx75uklY" + }, + "id": "g1DHlx75uklY" + }, + { + "cell_type": "code", + "source": [ + "collection_name = 'GeneratedData'\n", + "\n", + "if collection_name not in hyperspace_client.collections_info()[\"collections\"]:\n", + " hyperspace_client.create_collection('config.json', collection_name)\n", + "\n", + "hyperspace_client.collections_info()" + ], + "metadata": { + "id": "z43Vz3nabLZe", + "outputId": "3cfbf9e6-5b79-4bb3-fd30-76f7fe80f7b4", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "ExecuteTime": { + "end_time": "2024-01-01T19:34:22.583664500Z", + "start_time": "2024-01-01T19:34:22.149799700Z" + } + }, + "id": "z43Vz3nabLZe", + "execution_count": 6, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'collections': {'DocRetrievalEmbedded': {'creation_time': '2024-03-20T10:50:16Z',\n", + " 'last_query_time': '2024-03-25T09:27:27Z',\n", + " 'size': 39},\n", + " 'GeneratedData': {'creation_time': '2024-03-26T05:27:09Z',\n", + " 'last_query_time': '2024-03-26T06:24:53Z',\n", + " 'size': 100000}}}" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## 5. Ingest Data\n", + "\n", + "In the next step, we ingest the dataset in batches of 250 documents. The batch size is set by the user and can be increased to reduce ingestion time. We add batches of data using the command **add_batch**(batch, collection_name)."
+ ], + "metadata": { + "id": "9eUT0cBRu31m" + }, + "id": "9eUT0cBRu31m" + }, + { + "cell_type": "code", + "source": [ + "import random\n", + "import secrets\n", + "import base64\n", + "\n", + "def generate_data(metadata, vector_dimension):\n", + " data_point = random.choice(metadata)\n", + " random_bytes = secrets.token_bytes(vector_dimension // 8)\n", + " data_point['vector'] = base64.b64encode(random_bytes).decode()\n", + " return data_point" + ], + "metadata": { + "id": "_XCNBaIHLjZZ", + "ExecuteTime": { + "end_time": "2024-01-01T19:34:22.597661700Z", + "start_time": "2024-01-01T19:34:22.585667600Z" + } + }, + "id": "_XCNBaIHLjZZ", + "execution_count": 7, + "outputs": [] + }, { - "name": "stdout", - "output_type": "stream", - "text": [ - "Query run time: 2.43ms\n", - "[{'document_id': '65', 'score': 5.0}]\n" - ] + "cell_type": "code", + "source": [ + "pip install pandas requests\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nnl5DAtITBbX", + "outputId": "246ac488-2883-4508-b204-b3387ce0766d" + }, + "id": "nnl5DAtITBbX", + "execution_count": 24, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (1.5.3)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (2.31.0)\n", + "Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2023.4)\n", + "Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from pandas) (1.25.2)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests) (3.3.2)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests) (3.6)\n", + 
"Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests) (2.0.7)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests) (2024.2.2)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)\n" + ] + } + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "95086784", + "metadata": { + "id": "95086784", + "ExecuteTime": { + "end_time": "2024-01-01T19:35:07.446200300Z", + "start_time": "2024-01-01T19:34:22.603667200Z" + } + }, + "outputs": [], + "source": [ + "import pickle\n", + "\n", + "data_path = 'Generated_data.hsv'\n", + "with open(data_path, 'rb') as file:\n", + " metadata = pickle.load(file)\n", + "\n", + "BATCH_SIZE = 250\n", + "\n", + "batch = []\n", + "data = []\n", + "\n", + "\n", + "for i, vec in enumerate(range(100000)):\n", + " data_point = generate_data(metadata, vector_dimension)\n", + " data_point[\"id\"] = str(i)\n", + " batch.append(dict(data_point))\n", + "\n", + " if (i+1) % BATCH_SIZE == 0:\n", + " response = hyperspace_client.add_batch(batch, collection_name)\n", + " print(i + 1, response)\n", + " batch.clear()\n", + "\n", + "if batch:\n", + " hyperspace_client.add_batch(batch, collection_name)\n", + " response = hyperspace_client.add_batch(batch, collection_name)\n", + "\n", + "hyperspace_client.commit(collection_name)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#6. Define Logic and Run a Query\n", + "We will build a hybrid search query using Hyper-space. In the query, we will select a document and find similar ones. We must first compile the score function using the \"set_function\" command." 
+ ], + "metadata": { + "id": "Yjo-yZwXDjkS" + }, + "id": "Yjo-yZwXDjkS" + }, + { + "cell_type": "code", + "source": [ + "import inspect\n", + "\n", + "def set_score_function(func, collection_name, score_function_name='func'):\n", + " source = inspect.getsource(func)\n", + " with open('sf.py', 'w') as f:\n", + " f.write(source)\n", + " hyperspace_client.set_function('sf.py', collection_name, score_function_name)\n" + ], + "metadata": { + "id": "Q-DQqoBpeETW", + "ExecuteTime": { + "end_time": "2024-01-01T19:35:07.490319600Z", + "start_time": "2024-01-01T19:35:07.450154700Z" + } + }, + "id": "Q-DQqoBpeETW", + "execution_count": 11, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "5c6e2ccb", + "metadata": { + "id": "5c6e2ccb", + "ExecuteTime": { + "end_time": "2024-01-01T19:35:08.570400500Z", + "start_time": "2024-01-01T19:35:07.465928200Z" + } + }, + "outputs": [], + "source": [ + "def similarity_score(params, doc):\n", + " score = 0.0\n", + " if match(\"country\"):\n", + " score = 5.0\n", + " if match(\"street\"):\n", + " score = 10.0\n", + " return score\n", + "\n", + "set_score_function(similarity_score, collection_name, score_function_name='similarity_score')" + ] + }, + { + "cell_type": "markdown", + "source": [ + "The next step is to retrieve a document and perform a similarity search."
+ ], + "metadata": { + "id": "RnQ4hgtueZ2j" + }, + "id": "RnQ4hgtueZ2j" + }, + { + "cell_type": "code", + "source": [ + "input_document = hyperspace_client.get_document(collection_name, 65)\n", + "input_document" + ], + "metadata": { + "id": "JpPhJG0wMsNg", + "outputId": "de3ec953-eb9d-4b1f-f4b9-1255dd3fb41e", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "ExecuteTime": { + "end_time": "2024-01-01T19:35:08.664430400Z", + "start_time": "2024-01-01T19:35:08.574444200Z" + } + }, + "id": "JpPhJG0wMsNg", + "execution_count": 13, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'city': 'Valerieport',\n", + " 'country': 'Maldives',\n", + " 'open_now': False,\n", + " 'zip_code': 11157,\n", + " 'vertical': 'xylophone',\n", + " 'vector': 'XojoT/WswsFMjLMX5p8tjUMDcWuGBX3GnBqPQUN/13unesI2n96T3c/vnE+eprGwtCX4ygyu3dSLC2PmPcUc3tlE8sm9VuSuf3gM64qv6St2tjBlcK2qBo/UPtUGXcRnlUD9rA==',\n", + " 'id': '65'}" + ] + }, + "metadata": {}, + "execution_count": 13 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Vector Search" + ], + "metadata": { + "id": "URXMq9eUHxKh" + }, + "id": "URXMq9eUHxKh" + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "5ff98e9d", + "metadata": { + "id": "5ff98e9d", + "outputId": "9ec098ca-e6b0-40d6-ebd8-203777f996c0", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "ExecuteTime": { + "end_time": "2024-01-01T19:35:08.757574300Z", + "start_time": "2024-01-01T19:35:08.664430400Z" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Query run time: 1.90ms\n", + "[{'document_id': '10481', 'score': 357.0},\n", + " {'document_id': '48826', 'score': 357.0},\n", + " {'document_id': '61898', 'score': 357.0},\n", + " {'document_id': '80750', 'score': 357.0},\n", + " {'document_id': '65745', 'score': 356.0},\n", + " {'document_id': '62508', 'score': 354.0},\n", + " {'document_id': '85200', 'score': 354.0},\n", + " {'document_id': '9903', 'score': 
354.0},\n", + " {'document_id': '12331', 'score': 353.0},\n", + " {'document_id': '29742', 'score': 353.0},\n", + " {'document_id': '79680', 'score': 352.0},\n", + " {'document_id': '63826', 'score': 350.0},\n", + " {'document_id': '34345', 'score': 347.0},\n", + " {'document_id': '41859', 'score': 338.0},\n", + " {'document_id': '65', 'score': 0.0}]\n" + ] + } + ], + "source": [ + "import random\n", + "from pprint import pprint\n", + "\n", + "\n", + "query_with_knn = {\n", + " 'params': input_document,\n", + " \"knn\": [{'field': \"vector\", 'boost': 1.0}]\n", + "}\n", + "\n", + "results = hyperspace_client.search(query_with_knn,\n", + " size=15,\n", + " collection_name=collection_name)\n", + "candidates = results['candidates']\n", + "\n", + "print(f\"Query run time: {results['took_ms']:.2f}ms\")\n", + "pprint(results['similarity'])\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Results\n", + "The results are sorted by Hamming distance, as expected." + ], + "metadata": { + "id": "5lzGiQ3ON9bW" + }, + "id": "5lzGiQ3ON9bW" + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "ddd7d6a1", + "metadata": { + "id": "ddd7d6a1", + "outputId": "49a225f8-5e00-4e76-9b40-e3eb686f26db", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "ExecuteTime": { + "end_time": "2024-01-01T19:35:09.886289900Z", + "start_time": "2024-01-01T19:35:08.762573600Z" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "0 {'city': 'South Alexandriaborough', 'country': 'Norway', 'open_now': True, 'zip_code': 24436, 'vertical': 'rainbow', 'vector': 'zbmrb+ImfooU/XHWHoAh1UcBpMQiBsdyTCE9BpA1HN0t8ybwjB+qhUbEbCqJuLH7U0amg1f9XSRcuQXmaR15HWXsiTRouCr7f/3cr9u/qQPOvYvK0Pj7HOG8LJc3c8FEhmbaGw==', 'id': '10481'}\n", + "1 {'city': 'West Brenda', 'country': 'Senegal', 'open_now': False, 'zip_code': 99714, 'street': 'banana', 'vector': 
'14EkJWFiA3TaWbXuexvICxNxeWAN8a4RgFokbREu8sfLwxzEggeHTFCniJSPDstmrSDIRE5+ERd0CYX/CKeKH0omxnO2ifUKkekWnoCD0C8sN5bqAkmX5uZGTtdcffQHVNOF7A==', 'id': '48826'}\n", + "2 {'city': 'North Markton', 'country': 'Mexico', 'open_now': True, 'zip_code': 15235, 'vertical': 'house', 'vector': 'wC3UHIUu5YoqnT1jdXAAtfMVP/3TpcPfXHheHgsHqYoSWibf2QIDlTALue/FRYYlJCAxyT0cmlSemGjs+hudM7pEg4hhZ/Li47kr6gi+gS3bJ9V4GHP+3C9OfF83lEIt7bCwPg==', 'id': '61898'}\n", + "3 {'city': 'Berryville', 'country': 'Costa Rica', 'open_now': True, 'zip_code': 70504, 'vector': 'EpNYUK4hlbhneymAh1IBreIq4fEo79H0VCHWLOIP56pLPNpHl7BmsxmumNeYihZ8vXU66kW5xeK+uzTeM+fKkERcNy2750SkZMnViwK3QqNBhaHjUAa4A4FCeWWTZZ3Y2O+3Xw==', 'id': '80750'}\n", + "4 {'city': 'Amberside', 'country': 'Honduras', 'open_now': False, 'zip_code': 43653, 'vector': 'jromyt5Ay8BIQFU58qe1rHUZmpuSb0q3TjMLHI23gBKF/UscjgQWw3aIU1fXy+U+9QTQSIOAfCXmkoLIHw8REhwH3uXzyCSAwYNVuzAspLqWZlx5MXyI8ZwUvvrfbK4tx6bejQ==', 'id': '65745'}\n", + "5 {'city': 'Candaceberg', 'country': 'Serbia', 'open_now': True, 'zip_code': 37172, 'vertical': 'umbrella', 'vector': '2mJRDKfwhhyIxifTr9IgxeTe7NKWrw0hobTqNEBoHmuA18J4m5adgB7Tv4vXdiU0h887z7D6+UUwQlxODd1USmhi/aEA/AR/xkNYWXufiFvunEprSQs6duMYqusFveQJckRzAQ==', 'id': '62508'}\n", + "6 {'city': 'South Lucasshire', 'country': 'Swaziland', 'open_now': False, 'zip_code': 56503, 'vertical': 'piano', 'vector': '5yhVFQ3swS1yZJD3s64mvEbWYCBttVZK7/BPPJVYI0a0dsDGuCytmcmSbNbanoBcPWf2tS2781czGbNPpS9R+TBJ8Fm9F42Kzf2oUlXJ7KOmh0fjvH/wut3KOlmQIE3kPgTZ/A==', 'id': '85200'}\n", + "7 {'city': 'Randolphville', 'country': 'Turkmenistan', 'open_now': False, 'zip_code': 33400, 'vector': 'nnmy6MCmWiOYvfiu3jFcXK30IxwaLluCKd/L4FEcBVJOr8XDAtNRDe7eG1YDiomw4husv2Doi0CZBu2wF8cw/VpS6RHkIvTIf8B1znAgoytF7ZHOIKz/XDleptMPqoStnwuYKg==', 'id': '9903'}\n", + "8 {'city': 'Morrisstad', 'country': 'Pakistan', 'open_now': False, 'zip_code': 82241, 'street': 'banana', 'vector': 
'weIaSWP9SHFUQmLQ3Jlo/e+L8vlLZJWOXbzHVrEnuZfO4+MWK4bVjWCZLKO0RRWol17YzYTo3NIYQdSzPR9YUcBhwp+wldcalFEIGdf34cBskOVPGQLmC5rXNqcZ79coFpiIUQ==', 'id': '12331'}\n", + "9 {'city': 'Reginamouth', 'country': 'Kuwait', 'open_now': False, 'zip_code': 40156, 'street': 'ocean', 'vector': 'dsddzp1RhuUQHIl3xFSFqXCM5s4zQS32W0odxQkk2Ep0LH5QxtRZr0gTcIzlszuxhGt5TmpLD26YekanjMEAbA1DsimL2dToIyQEhQ5aiMartmNmPjA249+5cVPJMVwXuSwd7g==', 'id': '29742'}\n", + "10 {'city': 'Cochranhaven', 'country': 'Palau', 'open_now': True, 'zip_code': 97940, 'street': 'tiger', 'vector': 'PoiiJyifhc/vuY33c9u6Qd+i578fo8gq/cpplrp3nlv/06nT0m93AShP0pMh6HCdnbF5WSgDw4uvG3czeVgdYNrfMuedXeQJ42n8WoAvyu1W1E8EWMJDVb+QJ0VLjy7jpSpeLg==', 'id': '79680'}\n", + "11 {'city': 'South Paulbury', 'country': 'Timor-Leste', 'open_now': True, 'zip_code': 60407, 'street': 'rainbow', 'vector': 'Wbu6Z7IsH0fXxTBwkB4vi6PnMnqCkHihuJvQ2bCo8TmQ1fmi1o8Hmw6BzUG7fCHQuJh7kUsSt1e4OzzLXfyviOzGoqpB/vO4INw6aeie8q9RfsSWBW3LU8qQMMQbfyLLlTNr4A==', 'id': '63826'}\n", + "12 {'city': 'Knappshire', 'country': 'Bhutan', 'open_now': True, 'zip_code': 57177, 'vector': '2e52qRJ1cswHvznU7pmpDFtHoiSC110b/NoB3egK0DijNxH3kh6yMI+kImkmrCccBdU2ciWt+tzKidk7lbbwj/jQ7YXN16CoTOxh+xbJewuma+Wvss2O5ENT0sv9OMJPUYmhcA==', 'id': '34345'}\n", + "13 {'city': 'Barryport', 'country': 'South Georgia and the South Sandwich Islands', 'open_now': True, 'zip_code': 83769, 'street': 'rainbow', 'vector': 'jvplA+EtAr1JrFYfTgOvC4OZ+uPiUu413BjRzUtaZOvmRosxnA8oo+m6DUNXoNkh1txfeO2HLM8Zgh4qJ98K2uKYuNv5DzAwXH8AY2hFlQDdm5bcedgCtT7UZPiipbVNlWK6gA==', 'id': '41859'}\n", + "14 {'city': 'Valerieport', 'country': 'Maldives', 'open_now': False, 'zip_code': 11157, 'vertical': 'xylophone', 'vector': 'XojoT/WswsFMjLMX5p8tjUMDcWuGBX3GnBqPQUN/13unesI2n96T3c/vnE+eprGwtCX4ygyu3dSLC2PmPcUc3tlE8sm9VuSuf3gM64qv6St2tjBlcK2qBo/UPtUGXcRnlUD9rA==', 'id': '65'}\n" + ] + } + ], + "source": [ + "for i, x in enumerate(results['similarity']):\n", + " document = hyperspace_client.get_document(collection_name, 
x[\"document_id\"])\n", + " print(i, document)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Hybrid Search\n", + "In the next step, we perform hybrid search using two methods. The first method combines a classic score function with vector search through a linear combination of the scores. The second method uses a hybrid score function. Both methods use the pre-filtering approach, determined by the indexing method in the config schema file.\n", + "\n", + "## Hybrid Search Using Linear Combination of Scores" + ], + "metadata": { + "id": "sd6AQ5Y3IEaH" + }, + "id": "sd6AQ5Y3IEaH" + }, + { + "cell_type": "code", + "source": [ + "import random\n", + "from pprint import pprint\n", + "\n", + "\n", + "query_with_knn = {\n", + " 'params': input_document,\n", + " \"knn\": [{'field': \"vector\", 'boost': 1.0},\n", + " {'field': \"query\", 'boost': 0.1}]\n", + "}\n", + "\n", + "results = hyperspace_client.search(query_with_knn,\n", + " size=15,\n", + " function_name='similarity_score',\n", + " collection_name=collection_name)\n", + "candidates = results['candidates']\n", + "\n", + "print(f\"Query run time: {results['took_ms']:.2f}ms\")\n", + "pprint(results['similarity'])\n" + ], + "metadata": { + "id": "m6pMqw6NIH_K", + "ExecuteTime": { + "end_time": "2024-01-01T19:35:09.979062300Z", + "start_time": "2024-01-01T19:35:09.889289800Z" + }, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3f4422a7-90ec-4a27-c655-b0812cfb4438" + }, + "id": "m6pMqw6NIH_K", + "execution_count": 16, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Query run time: 2.71ms\n", + "[{'document_id': '1107', 'score': 0.5},\n", + " {'document_id': '1304', 'score': 0.5},\n", + " {'document_id': '1437', 'score': 0.5},\n", + " {'document_id': '1607', 'score': 0.5},\n", + " {'document_id': '1769', 'score': 0.5},\n", + " {'document_id': '1882', 'score': 0.5},\n", + " {'document_id': '2151', 'score':
0.5},\n", + " {'document_id': '312', 'score': 0.5},\n", + " {'document_id': '422', 'score': 0.5},\n", + " {'document_id': '459', 'score': 0.5},\n", + " {'document_id': '65', 'score': 0.5},\n", + " {'document_id': '696', 'score': 0.5},\n", + " {'document_id': '822', 'score': 0.5},\n", + " {'document_id': '825', 'score': 0.5},\n", + " {'document_id': '982', 'score': 0.5}]\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Hybrid Search Using Hybrid Score Function\n", + "The following function combines a classic score and manual boost with vector score." + ], + "metadata": { + "id": "PwtApSv-Y8yW" + }, + "id": "PwtApSv-Y8yW" + }, + { + "cell_type": "code", + "source": [ + "def hybrid_similarity_score(params, doc):\n", + " score = 0.0\n", + " boost = 1.0\n", + " if match(\"country\"):\n", + " score = 5.0\n", + " boost = 2.0\n", + " if match(\"street\"):\n", + " score = 10.0\n", + " return score + boost * distance(\"vector\")\n", + "\n", + "set_score_function(hybrid_similarity_score, collection_name, score_function_name='hybrid_similarity_score')" + ], + "metadata": { + "id": "GKgiEhC5bKwV" + }, + "id": "GKgiEhC5bKwV", + "execution_count": 22, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "import random\n", + "from pprint import pprint\n", + "\n", + "\n", + "query_with_knn = {\n", + " 'params': input_document\n", + "}\n", + "\n", + "results = hyperspace_client.search(query_with_knn,\n", + " size=15,\n", + " function_name='hybrid_similarity_score',\n", + " collection_name=collection_name)\n", + "candidates = results['candidates']\n", + "\n", + "print(f\"Query run time: {results['took_ms']:.2f}ms\")\n", + "pprint(results['similarity'])\n" + ], + "metadata": { + "id": "Q-jYAsUca2aW", + "outputId": "b4258a21-ff98-45f7-c03f-711e505a982c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "id": "Q-jYAsUca2aW", + "execution_count": 23, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Query run 
time: 3.00ms\n", + "[{'document_id': '1107', 'score': 5.0},\n", + " {'document_id': '1304', 'score': 5.0},\n", + " {'document_id': '1437', 'score': 5.0},\n", + " {'document_id': '1607', 'score': 5.0},\n", + " {'document_id': '1769', 'score': 5.0},\n", + " {'document_id': '1882', 'score': 5.0},\n", + " {'document_id': '2151', 'score': 5.0},\n", + " {'document_id': '312', 'score': 5.0},\n", + " {'document_id': '422', 'score': 5.0},\n", + " {'document_id': '459', 'score': 5.0},\n", + " {'document_id': '65', 'score': 5.0},\n", + " {'document_id': '696', 'score': 5.0},\n", + " {'document_id': '822', 'score': 5.0},\n", + " {'document_id': '825', 'score': 5.0},\n", + " {'document_id': '982', 'score': 5.0}]\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "This notebook gave a simple example of the use of the Hyperspace engine for hybrid search. Hyperspace can support significantly more complicated use cases with large databases at extremely low latency." + ], + "metadata": { + "id": "HHUPLwHjNE67" + }, + "id": "HHUPLwHjNE67" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + }, + "colab": { + "provenance": [] } - ] - }, - { - "cell_type": "markdown", - "source": [ - "This notebook gave a simple example of the use of the Hyperspace engine for hybrid search. Hyperspace can support signficantly more complicated use cases with large databases, in extremley low latency."
- ], - "metadata": { - "id": "HHUPLwHjNE67" - }, - "id": "HHUPLwHjNE67" - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.13" }, - "colab": { - "provenance": [] - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file