[Bug]: search vectors doesnt work with partition key #2331

nightblure · 2024-11-06T12:56:01Z

Is there an existing issue for this?

I have searched the existing issues

Describe the bug

hi!

I want to report that searching with a partition key does not work.

Let's first look at the partition key documentation (https://milvus.io/docs/use-partition-key.md#Use-Partition-Key). Notably, its example never calls the load_collection or load_partitions method to load data into memory. Apparently it is assumed that, because the collection has a partition key, Milvus will automatically load the partitions selected by the filter expression. That does not work, because Milvus throws errors such as <MilvusException: (code=65535, message=not support manually specifying the partition names if partition key mode is used)> and <MilvusException: (code=65535, message=disable load partitions if partition key mode is used)>.

Workaround

For example, my collection uses the category field as its partition key (see the Detections class in the code below).

I wrote my code following the example in the documentation: I call the search method without calling load_collection or load_partitions, and my filter expression filters on the partition key field. In this case Milvus throws the following error: <MilvusException: (code=101, message=failed to search: collection not loaded[collection=452934647721745216])>

OK... if Milvus asks me to load the collection, then, since the collection has a partition key, I don't want to load the entire collection, only the required partition. So I call the load_partitions method, but Milvus throws <MilvusException: (code=65535, message=disable load partitions if partition key mode is used)>. Passing partition_names to search is rejected as well, with <MilvusException: (code=65535, message=not support manually specifying the partition names if partition key mode is used)>. In this situation Milvus leaves me no choice and forces me to load the entire collection into memory.
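Given those errors, the route that remains is to load the whole collection and search with a filter on the partition key, without passing partition_names. The sketch below shows that call sequence. It assumes a pymilvus 2.4 MilvusClient (load_collection, and search with its filter and anns_field keyword arguments). The helper names are hypothetical. The field names category and vector come from the repro code further down.

```python
from typing import Any


def partition_key_search_kwargs(
    collection_name: str,
    category: str,
    vector: list[float],
    limit: int = 10,
) -> dict[str, Any]:
    """Build keyword arguments for MilvusClient.search on a partition-key collection."""
    return {
        "collection_name": collection_name,
        "data": [vector],
        # Filter on the partition key field instead of naming partitions;
        # Milvus uses this filter to narrow the search to matching keys.
        # (Naive quoting: assumes the value contains no double quotes.)
        "filter": f'category == "{category}"',
        "limit": limit,
        "anns_field": "vector",
        "search_params": {"metric_type": "COSINE", "params": {"nprobe": 128}},
        # Deliberately no "partition_names": it is rejected in partition key mode.
    }


def search_in_category(
    client: Any, collection_name: str, category: str, vector: list[float], limit: int = 10
) -> Any:
    """Load the whole collection, then search one partition key value."""
    # load_partitions is disabled in partition key mode, so load everything.
    client.load_collection(collection_name=collection_name)
    return client.search(**partition_key_search_kwargs(collection_name, category, vector, limit))
```

With client = MilvusClient(uri=..., token=...), calling search_in_category(client, "test_collection", "dairy", embedding) is the sequence to try in place of the three failing cases. Memory usage is then that of the full collection.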

I have two questions:

Is the example code in the documentation wrong?
How can I load only the required partition into memory if I have a partitioning key?

I made a full code example demonstrating each way I tried to search by partition:

Expected Behavior

I see two ways to fix the situation:

I want to be able to load partitions with the load_partitions method in the current configuration (when the collection has a partition key);
Milvus automatically loads the matching partitions into memory when the filter expression references the partition key field.
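The second option relies on Milvus recognizing the partition key inside the filter expression. The helper below builds that kind of expression from a list of key values, matching the repro's category in ['dairy']. It is hypothetical and not part of pymilvus. It assumes, without confirmation, that Milvus string literals accept JSON-style escaping.

```python
import json


def partition_key_filter(field: str, values: list[str]) -> str:
    """Build an expression like: category in ["dairy", "meat"].

    Each value is double-quoted and escaped with json.dumps; that Milvus
    accepts JSON-style escapes in string literals is an assumption.
    """
    if not values:
        raise ValueError("at least one partition key value is required")
    quoted = ", ".join(json.dumps(value, ensure_ascii=False) for value in values)
    return f"{field} in [{quoted}]"


print(partition_key_filter("category", ["dairy"]))  # category in ["dairy"]
```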

Steps/Code To Reproduce behavior

import logging
from random import Random

from pymilvus import MilvusClient, DataType


def create_milvus_client(uri: str, token: str) -> MilvusClient:
    return MilvusClient(uri=uri, token=token)


class BaseMilvusCollection:
    db_name: str = 'default'
    collection_name = None

    def __init__(self, *, client: MilvusClient):
        self.client = client

    def get_schema(self):
        raise NotImplementedError

    def get_index_params(self):
        raise NotImplementedError

    def create(self):
        schema = self.get_schema()
        index_params = self.get_index_params()

        self.client.create_collection(
            schema=schema,
            index_params=index_params,
            collection_name=self.collection_name,
        )


class Detections(BaseMilvusCollection):
    collection_name = 'test_collection'

    def get_schema(self):
        # https://milvus.io/docs/use-partition-key.md
        schema = self.client.create_schema(
            partition_key_field='category',
        )
        schema.add_field(
            field_name='id',
            datatype=DataType.INT64,
            auto_id=True,
            is_primary=True,
        )

        schema.add_field(
            field_name='category',
            datatype=DataType.VARCHAR,
            max_length=200,
        )

        schema.add_field(field_name='metadata', datatype=DataType.JSON)
        schema.add_field(field_name='detection_pk', datatype=DataType.INT64)
        schema.add_field(field_name='vector', datatype=DataType.FLOAT_VECTOR, dim=64)
        return schema

    def get_index_params(self):
        index_params = self.client.prepare_index_params()

        index_params.add_index(
            field_name='vector',
            index_type='IVF_FLAT',
            metric_type='COSINE',
            params={'nlist': 128},
        )

        return index_params


class BaseMilvusRepository:
    collection_name = None

    def __init__(self, *, client: MilvusClient):
        if self.collection_name is None:
            msg = 'Collection name not specified!'
            raise Exception(msg)

        self.logger = logging.getLogger(__name__)
        self.client = client

    def release_collection(self) -> None:
        self.client.release_collection(collection_name=self.collection_name)

    def insert(self, data: list[dict]) -> int:
        return self.client.insert(collection_name=self.collection_name, data=data)['insert_count']

    def load_partitions(self, partitions: list[str]):
        self.client.load_partitions(
            collection_name=self.collection_name, partition_names=partitions
        )


class MilvusDetections(BaseMilvusRepository):
    collection_name = 'test_collection'

    def search_vectors(
            self,
            *,
            vectors: list[list[float]],
            filter_expression: str,
            limit: int,
            partition_names: list[str] | None = None,
    ):
        search_params = {
            'metric_type': 'COSINE',
            'ignore_growing': False,
            'offset': 0,
            'params': {'nprobe': 128},
        }

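        search_result = self.client.search(
            limit=limit,
            data=vectors,
            # MilvusClient.search takes anns_field; there is no embedding_field keyword
            anns_field='vector',
            search_params=search_params,
            partition_names=partition_names,
            # the keyword is filter; filter_expression would not be applied as a filter
            filter=filter_expression,
            collection_name=self.collection_name,
        )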

        return search_result


def case_collection_not_loaded_by_docs_example(
        repo: MilvusDetections, filter_expression: str, embedding: list[float]
):
    # https://milvus.io/docs/use-partition-key.md#Use-Partition-Key
    return repo.search_vectors(
        filter_expression=filter_expression, vectors=[embedding], limit=100
    )


def case_with_error_on_load_partition(repo: MilvusDetections, filter_expression: str, embedding: list[float]):
    repo.load_partitions(['dairy'])
    return repo.search_vectors(
        filter_expression=filter_expression, vectors=[embedding], limit=100
    )


def case_with_error_search_with_partition(
        repo: MilvusDetections, filter_expression: str, embedding: list[float]
):
    return repo.search_vectors(
        filter_expression=filter_expression, vectors=[embedding], limit=100, partition_names=['dairy']
    )


def main():
    MILVUS_URI = ''
    MILVUS_TOKEN = ''
    client = create_milvus_client(uri=MILVUS_URI, token=MILVUS_TOKEN)

    test_collection = Detections(client=client)
    repo = MilvusDetections(client=client)

    test_collection.create()

    rnd = Random()
    dimension = 64
    embedding = [rnd.random() for _ in range(dimension)]
    filter_expression = "category in ['dairy']"

    item = {
        'category': 'dairy',
        'vector': embedding,
        'metadata': {},
        'detection_pk': 5,
    }

    assert repo.insert([item]) > 0
    repo.release_collection()

    # raises <MilvusException: (code=101, message=failed to search: collection not loaded[collection=452934647721745216])>
    result = case_collection_not_loaded_by_docs_example(repo, filter_expression, embedding)

    # raises <MilvusException: (code=65535, message=disable load partitions if partition key mode is used)>
    # result = case_with_error_on_load_partition(repo, filter_expression, embedding)

    # raises <MilvusException: (code=65535, message=not support manually specifying the partition names if partition key mode is used)>
    # result = case_with_error_search_with_partition(repo, filter_expression, embedding)

    print(result)


if __name__ == '__main__':
    main()

Environment details

- Pymilvus client version: v2.4.9
- Milvus configuration: Zilliz Cloud Cluster
- Cluster Compatible With: Milvus 2.4.x
- Cluster CU Type: Performance-optimized
- Cluster Plan: Dedicated (Enterprise)

Anything else?

No response


XuanYang-cn · 2024-11-15T04:08:48Z

@nightblure Hi,

Is the example code in the documentation wrong?

The code is correct: Milvus loads the collection into memory automatically if you create the collection with the schema and index together. See https://milvus.io/docs/manage-collections.md#Create-Collection, Customized setup, step 3.
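This may explain the difference between the docs and the repro. In the repro, main() calls repo.release_collection() right before the first search, which would account for the "collection not loaded" error even though the docs example, which never releases, works. The guard below reloads only when needed. It is a minimal sketch, assuming pymilvus 2.4 MilvusClient.get_load_state, which returns a dict whose "state" entry is a LoadState enum member. The helper name is hypothetical.

```python
from typing import Any


def ensure_loaded(client: Any, collection_name: str) -> bool:
    """Load the collection if it is not loaded; return True if a load was issued.

    Assumes MilvusClient.get_load_state returns {"state": LoadState.<member>},
    where the member is one of NotExist, NotLoad, Loading or Loaded.
    """
    state = client.get_load_state(collection_name=collection_name)["state"]
    if state.name == "Loaded":
        return False
    # With a partition key this loads every partition; load_partitions is disabled.
    client.load_collection(collection_name=collection_name)
    return True
```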

How can I load only the required partition into memory if I have a partitioning key?

Enabling mmap might be your answer. We introduced the partition key to support millions of partition key values in Milvus, and it's not feasible to manage all of those partition keys. To avoid mixing partitions and partition keys, enabling either one disables the other. So when you enable a partition key, you gain the ability to speed up searches by filtering on millions of partition key values, but you lose control over partitions entirely.
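To follow the mmap suggestion, the sketch below turns it on for one collection. It assumes the pymilvus 2.4 ORM Collection API (release, set_properties, load) and the "mmap.enabled" property key from the Milvus 2.4 mmap documentation. The thread does not say whether this setting is exposed on a Zilliz Cloud cluster. The helper name is hypothetical.

```python
from typing import Any


def enable_mmap(collection: Any) -> None:
    """Turn on memory-mapped storage for a collection, then load it again.

    `collection` is assumed to be a pymilvus ORM Collection (pymilvus 2.4);
    the "mmap.enabled" property key follows the Milvus 2.4 mmap docs.
    """
    # The mmap setting is changed while the collection is released.
    collection.release()
    collection.set_properties({"mmap.enabled": True})
    # After loading, eligible data is memory-mapped from local disk rather
    # than held entirely in RAM.
    collection.load()
```

Usage would be along the lines of connections.connect(uri=..., token=...) followed by enable_mmap(Collection("test_collection")), both imported from pymilvus.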

Since you're a Zilliz Cloud user, we recommend opening a ticket on Zilliz Cloud describing your requirements. We value our customers' needs, and perhaps features like "LazyLoad" could be on our roadmap.

XuanYang-cn · 2024-11-15T07:15:40Z

@nightblure I checked the recent Milvus release: the partial load feature might meet some of your requirements and save a lot of memory. https://milvus.io/docs/release_notes.md#v2410
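The partial load suggestion could look like the sketch below for the repro schema. Only the primary key, the partition key used in the filter, and the vector field are loaded. The metadata JSON field and detection_pk stay out of memory. This assumes Milvus 2.4.10 or later and a pymilvus version whose MilvusClient.load_collection forwards a load_fields keyword. As far as the partial-load documentation describes, unloaded fields cannot be used in filters or output_fields. The helper name is hypothetical.

```python
from typing import Any

# Fields the repro needs at search time: primary key, partition key (used in
# the filter) and the vector field. metadata and detection_pk are left out.
SEARCH_FIELDS = ["id", "category", "vector"]


def load_search_fields_only(client: Any, collection_name: str) -> None:
    """Partially load a collection, keeping unused fields out of memory.

    Assumes Milvus 2.4.10+ (partial load) and a pymilvus version whose
    load_collection forwards the load_fields keyword.
    """
    client.load_collection(
        collection_name=collection_name,
        load_fields=SEARCH_FIELDS,
    )
```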

nightblure added the kind/bug Something isn't working label Nov 6, 2024

nightblure changed the title ~~[Bug]: partitions doesnt work~~ [Bug]: search vectors doesnt work with partition key Nov 6, 2024

XuanYang-cn added kind/feature and removed kind/bug Something isn't working labels Nov 15, 2024

drawnwren mentioned this issue Nov 21, 2024

[Bug]: Partition Key doesn't appear to function in pymilvus? #2368

Closed
