Enhance Data Storage and Filtering by Additional Metadata Dimensions #548
Replies: 4 comments 2 replies
-
Thinking about this more, there really should be more metadata for storage and retrieval, and it should be expandable. There's so much possible metadata that you'd want to prefilter on. It looks like Pinecone has metadata filtering, and I'm assuming there will eventually be a Pinecone implementation for the memory store?
-
@shawncal can you take a look at this?
-
I found some documentation on Pinecone's metadata filtering. It looks powerful and seems to cover all the scenarios I have in mind: https://docs.pinecone.io/docs/metadata-filtering I haven't used it myself yet, but I came across this really interesting article where they explain how hard it is to do this filtering at the database level and how they handle it by combining the vector index and the metadata index: https://www.pinecone.io/learn/vector-search-filtering/ I believe I would like to use Pinecone for this, to quickly trim the data down and provide filtered, relevant context in my Semantic Kernel usage.
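To make the idea concrete, here is a rough sketch of what a MongoDB-style metadata filter of the kind described in the linked Pinecone documentation looks like, as far as I understand it. This is not the Pinecone client: `matches` is a hypothetical local helper that only handles `$eq`, `$in`, `$and` and `$or`, and the field names `tenant_id` and `permission_level` are assumptions made up for illustration.

```python
# Hypothetical local evaluator for a small subset of MongoDB-style filter
# operators ($eq, $in, $and, $or). It only illustrates the shape of such
# filters; it does not call Pinecone or any other service.
def matches(metadata, flt):
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            # A bare value is treated as shorthand for {"$eq": value}.
            if not isinstance(cond, dict):
                cond = {"$eq": cond}
            value = metadata.get(key)
            for op, operand in cond.items():
                if op == "$eq":
                    ok = value == operand
                elif op == "$in":
                    ok = value in operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True


# Assumed field names, for illustration only.
example_filter = {
    "tenant_id": {"$eq": "contoso"},
    "permission_level": {"$in": ["user", "admin"]},
}

record_metadata = {"tenant_id": "contoso", "permission_level": "admin"}
print(matches(record_metadata, example_filter))
```

In a real vector database the filter would be evaluated by the service alongside the vector search rather than in client code like this.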
-
My first reaction is that further query patterns, such as filtering and joining, are not part of Semantic Memory, where we deal with unstructured data. On the other hand, one could build more complex scenarios by going directly to the storage features without using SK Memory, or develop bespoke versions of Memory for personal scenarios. For instance, Azure Cognitive Search has a very advanced set of features that one could build on.
-
I'd like to propose a new feature that expands the current data storage and search capabilities in the repository by introducing additional dimensions for metadata. At present, we can store and retrieve data by collection name and ExternalSourceName for reference data, but I believe there are use cases that would benefit from even more flexible options.
Feature Request:
Add the ability to store and search data by more dimensions, such as TenantID and Permission Level.
Use Cases:
TenantID: This field could represent a customer code, tenant, or any other logical grouping, allowing users to better organize and filter data based on their specific needs. For example, consider a knowledge base containing data from multiple tenants or customers. With TenantID, users could easily filter search results to only display information relevant to a particular tenant.
Permission Level: This field could be used to store and filter data based on the allowed permissions for a particular group, such as admins and non-admins. This would provide better access control and security, ensuring that users only see the information they are authorized to view.
It might be tempting to assume the collection name could be used for this purpose, but there are scenarios where this is not sufficient. For instance, a developer might need to perform a global search across all tenants, which would not be possible if the collection name were used to represent both the tenant and the permission level. By introducing additional dimensions, we would have more flexibility and efficiency for users with diverse requirements.
Ideally, this optional metadata could be passed as part of the query so that the filtering happens on the datastore side, which would be more efficient.
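As a rough illustration of the request, here is a minimal in-memory sketch in Python. It is not the Semantic Kernel API: `MemoryRecord`, `search`, and the `tenant_id` and `permission_level` keys are hypothetical names chosen for this example. The point is only that an optional metadata filter narrows the candidates before similarity ranking, and that leaving the filter out still allows a global search across all tenants.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """Hypothetical record: text, its embedding, and free-form metadata."""
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_embedding, metadata_filter=None, limit=3):
    """Prefilter on exact metadata matches, then rank by similarity."""
    wanted = metadata_filter or {}
    candidates = [
        r for r in records
        if all(r.metadata.get(k) == v for k, v in wanted.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:limit]


# Toy data with made-up embeddings, for illustration only.
records = [
    MemoryRecord("Contoso refund policy", [1.0, 0.0],
                 {"tenant_id": "contoso", "permission_level": "user"}),
    MemoryRecord("Contoso admin runbook", [0.9, 0.1],
                 {"tenant_id": "contoso", "permission_level": "admin"}),
    MemoryRecord("Fabrikam refund policy", [1.0, 0.1],
                 {"tenant_id": "fabrikam", "permission_level": "user"}),
]

# Tenant-scoped, permission-scoped search.
scoped = search(records, [1.0, 0.0],
                {"tenant_id": "contoso", "permission_level": "user"})

# Global search across all tenants: simply omit the filter.
everything = search(records, [1.0, 0.0])
```

A real store would push the filter down to the database instead of scanning a Python list, but the query shape (embedding plus optional metadata dictionary) would be similar.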
In summary, I believe that extending the data storage and search functionality by adding the ability to filter by TenantID and Permission Level would greatly improve the user experience and enable a wider range of use cases. Let me know your thoughts on this proposal and if there are any additional considerations we should take into account.