.Net: Add PostgresVectorStore Memory connector. #9324

Open: wants to merge 73 commits into base: main
Showing changes from 6 commits
Commits (73)
8778d5f
Add PostgresVectorStore Memory connector.
Oct 18, 2024
ddad99a
Add UpsertBatch, GetBatch, and DeleteBatch
Oct 18, 2024
5447815
Remove unused CreateMapping
Oct 18, 2024
7533f8c
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 18, 2024
9a4f836
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 21, 2024
68a000e
Add vector search to PostgresVectorStore
Oct 22, 2024
317f6af
create index on collection creation
Oct 23, 2024
f4f5ba2
Support Guid, test distance functions
Oct 23, 2024
2acf118
Format tests
Oct 23, 2024
5db2c59
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 23, 2024
f4b4dc5
Add service and kernel extensions
Oct 24, 2024
5c58400
Default to Euclidean distance if no distance function is specified
Oct 24, 2024
8ea21cd
Add Postgres sample to concepts
Oct 24, 2024
4dcd222
Add docs for setting configuration in samples\Concepts
Oct 24, 2024
74b3764
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 24, 2024
5c3e63f
Enforce dimension size in index creation
Oct 24, 2024
6d9f1fd
Create index for CreateTableIfNotExistsAsyc
Oct 24, 2024
b4266cc
Log warning when index not created due to dimensions
Oct 24, 2024
f86613a
Refactor and tests; make SqlBuilder internal
Oct 24, 2024
716b794
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 24, 2024
8d8283b
Remove old migration note
Oct 25, 2024
89027fc
Fix docstring
Oct 25, 2024
8f45d9c
Use parameter for tableName
Oct 25, 2024
48811bd
Fix support for DateTime, DateTimeOffset
Oct 25, 2024
1d6082d
Fix warnings in test
Oct 25, 2024
eb0a683
Remove kernel extensions, improve service extensions
Oct 25, 2024
a66d835
Make PostgresSqlCommandInfo internal
Oct 25, 2024
53f1009
Default to a Hnsw index
Oct 25, 2024
08ea55f
Default to cosine distance
Oct 25, 2024
319648b
Consistently use includeVectors
Oct 25, 2024
5b52bdc
Simplify AsyncEnumerable return
Oct 25, 2024
cd845ee
Pass properties instead of full definition
Oct 25, 2024
1d09a21
Throw instead of log for too high dimensionality
Oct 25, 2024
74e9757
Remove DefaultVectorSize
Oct 25, 2024
ad5628c
Remove unused using statements
Oct 25, 2024
dbf1aef
Remove VectorStore constructor that creates datsaource
Oct 25, 2024
a355bf7
Fix duplicate mapper call
Oct 25, 2024
e499a80
Fix docstring typo
Oct 25, 2024
c95e2b3
Comment clarifying that multiple keys should be previously validated
Oct 25, 2024
9d972b3
Refactor ExecuteNonQueryAsync calls to reduce code dupe
Oct 25, 2024
6eb3793
Forward Schema option.
Oct 25, 2024
ed59fed
Make PostgresVectorStoreDbClient internal
Oct 25, 2024
1749adb
Support more enumerable types
Oct 25, 2024
ea7b01c
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 25, 2024
86486d7
Refactor to support default + transactions
Oct 28, 2024
b9b4a44
Fix issue with converting readonly array on upsert
Oct 28, 2024
c53a8ee
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 28, 2024
97ef60a
Fix SLN merge error
Oct 28, 2024
81e1805
Improve error handling
Oct 28, 2024
a587260
Avoid CA1859 in test class
Oct 28, 2024
e8fe800
Account for ngpsql missing func in .net std 2.0
Oct 28, 2024
96c088e
Fix servicecollection tests
Oct 28, 2024
0fc76f6
Logic for dimension max moved and tested elsewhere
Oct 28, 2024
266310b
Remove unused using statement
Oct 28, 2024
08f110c
Remove logger from PostgresVectorStoreRecordCollection
Oct 29, 2024
26516c5
Merge branch 'main' into feature/postgres-vector-store-dotnet
lossyrob Oct 30, 2024
5b44a80
Use Flat instead of None index kind
Oct 30, 2024
b9b2487
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 31, 2024
24577a0
Remove unnecessary overloads
Oct 31, 2024
60d6512
Change tests to be true to name
Oct 31, 2024
5a66a13
Remove reduntant key type based test
Oct 31, 2024
581b6ab
Remove unnecessary overloads
Oct 31, 2024
494a0d4
Better error handling for IAsyncEnumerable
Oct 31, 2024
5f19889
Default to Flat (no index) instead of Hnsw
Oct 31, 2024
62ac8eb
Add enumerable to record mapper test
Oct 31, 2024
364b592
Remove unused fixture properties
Oct 31, 2024
bf58cab
Test StoragePropertyName in sql builder tests
Oct 31, 2024
aa592de
Remove dynamic from integration test
Nov 1, 2024
9a3b216
Add test to read from manually inserted record
Nov 1, 2024
1ee09c1
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Nov 1, 2024
b037075
Formatting, spelling
Nov 1, 2024
29d91ba
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Nov 1, 2024
c2937e0
Fix test.
Nov 1, 2024
@@ -11,6 +11,11 @@ namespace Microsoft.SemanticKernel.Connectors.Postgres;
/// <summary>
/// Interface for client managing postgres database operations.
/// </summary>
/// <remarks>
/// This interface is used with the PostgresMemoryStore, which is being deprecated.
/// Use the <see cref="IPostgresVectorStoreDbClient"/> interface with the PostgresVectorStore
/// and related classes instead.
/// </remarks>
public interface IPostgresDbClient
Comment on lines +14 to 19
Member

IPostgresVectorStoreDbClient is internal, but this interface and its documentation are public. Instead of this XML documentation, we can mark this interface as Obsolete and recommend using the new PostgresVectorStore.
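
A minimal sketch of what that suggestion could look like, assuming the standard `System.ObsoleteAttribute` is used. The interface below is an empty stand-in for the real `IPostgresDbClient`, the message text is only illustrative, and `ObsoleteDemo` is a hypothetical helper added to read the attribute back:

```csharp
#nullable enable
using System;

// Stand-in for the public interface under review; its members are omitted here.
// The message wording is an assumption, not text taken from the PR.
[Obsolete("IPostgresDbClient is used by the deprecated PostgresMemoryStore. Use PostgresVectorStore instead.")]
public interface IPostgresDbClient
{
}

public static class ObsoleteDemo
{
    // Reads the attribute back via reflection so the message can be inspected.
    public static string? GetObsoleteMessage()
    {
        // Referencing an obsolete type raises warning CS0618, so suppress it locally.
#pragma warning disable CS0618
        var attribute = (ObsoleteAttribute?)Attribute.GetCustomAttribute(
            typeof(IPostgresDbClient), typeof(ObsoleteAttribute));
#pragma warning restore CS0618
        return attribute?.Message;
    }
}
```

With the attribute in place, callers get a compiler warning at each use site, so the guidance no longer depends on a `<see cref>` to a type they cannot access.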

{
/// <summary>
@@ -0,0 +1,125 @@
// Copyright (c) Microsoft. All rights reserved.

using System.Collections.Generic;
using Microsoft.Extensions.VectorData;
using Pgvector;

namespace Microsoft.SemanticKernel.Connectors.Postgres;

/// <summary>
/// Interface for constructing SQL commands for Postgres vector store collections.
/// </summary>
public interface IPostgresVectorStoreCollectionSqlBuilder
{
/// <summary>
/// Builds a SQL command to check if a table exists in the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <returns>The built SQL command.</returns>
/// <remarks>
/// The command must return a single row with a single column named "table_name" if the table exists.
/// </remarks>
PostgresSqlCommandInfo BuildDoesTableExistCommand(string schema, string tableName);

/// <summary>
/// Builds a SQL command to fetch all tables in the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the tables.</param>
PostgresSqlCommandInfo BuildGetTablesCommand(string schema);

/// <summary>
/// Builds a SQL command to create a table in the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="properties">The properties of the table.</param>
/// <param name="ifNotExists">Specifies whether to include IF NOT EXISTS in the command.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildCreateTableCommand(string schema, string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, bool ifNotExists = true);

/// <summary>
/// Builds a SQL command to drop a table in the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildDropTableCommand(string schema, string tableName);

/// <summary>
/// Builds a SQL command to upsert a record in the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="keyColumn">The key column of the table.</param>
/// <param name="row">The row to upsert.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildUpsertCommand(string schema, string tableName, string keyColumn, Dictionary<string, object?> row);

/// <summary>
/// Builds a SQL command to upsert a batch of records in the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="keyColumn">The key column of the table.</param>
/// <param name="rows">The rows to upsert.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildUpsertBatchCommand(string schema, string tableName, string keyColumn, List<Dictionary<string, object?>> rows);

/// <summary>
/// Builds a SQL command to get a record from the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="properties">The properties of the table.</param>
/// <param name="key">The key of the record to get.</param>
/// <param name="includeVectors">Specifies whether to include vectors in the record.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildGetCommand<TKey>(string schema, string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, TKey key, bool includeVectors = false) where TKey : notnull;

/// <summary>
/// Builds a SQL command to get a batch of records from the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="properties">The properties of the table.</param>
/// <param name="keys">The keys of the records to get.</param>
/// <param name="includeVectors">Specifies whether to include vectors in the records.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildGetBatchCommand<TKey>(string schema, string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, List<TKey> keys, bool includeVectors = false) where TKey : notnull;

/// <summary>
/// Builds a SQL command to delete a record from the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="keyColumn">The key column of the table.</param>
/// <param name="key">The key of the record to delete.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildDeleteCommand<TKey>(string schema, string tableName, string keyColumn, TKey key);

/// <summary>
/// Builds a SQL command to delete a batch of records from the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="keyColumn">The key column of the table.</param>
/// <param name="keys">The keys of the records to delete.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildDeleteBatchCommand<TKey>(string schema, string tableName, string keyColumn, List<TKey> keys);

/// <summary>
/// Builds a SQL command to get the nearest match from the Postgres vector store.
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
/// <param name="properties">The properties of the table.</param>
/// <param name="vectorProperty">The property which the vectors to compare are stored in.</param>
/// <param name="vectorValue">The vector to match.</param>
/// <param name="filter">The filter conditions for the query.</param>
/// <param name="skip">The number of records to skip.</param>
/// <param name="withEmbeddings">Specifies whether to include embeddings in the result.</param>
/// <param name="limit">The maximum number of records to return.</param>
/// <returns>The built SQL command info.</returns>
PostgresSqlCommandInfo BuildGetNearestMatchCommand(string schema, string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, VectorStoreRecordVectorProperty vectorProperty, Vector vectorValue, VectorSearchFilter? filter, int? skip, bool withEmbeddings, int limit);
Contributor

Nit: other methods are using includeVectors instead of withEmbeddings for naming.

}
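
For orientation, here is a rough sketch of the kind of command `BuildDoesTableExistCommand` is documented to produce: a query that returns one row with a `table_name` column when the table exists. `SqlCommandSketch` is a hypothetical stand-in for `PostgresSqlCommandInfo`, whose real shape is not shown in this diff, and the `information_schema.tables` query with positional `$1`/`$2` parameters is an assumption rather than the PR's actual SQL:

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for PostgresSqlCommandInfo: SQL text plus positional parameter values.
public sealed record SqlCommandSketch(string CommandText, IReadOnlyList<object> Parameters);

public static class TableExistsSketch
{
    // Builds a parameterized existence check; schema and table name are passed
    // as parameters instead of being concatenated into the SQL text.
    public static SqlCommandSketch BuildDoesTableExistCommand(string schema, string tableName)
    {
        const string Sql = @"SELECT table_name
FROM information_schema.tables
WHERE table_schema = $1
  AND table_type = 'BASE TABLE'
  AND table_name = $2";

        return new SqlCommandSketch(Sql, new object[] { schema, tableName });
    }
}
```

Keeping the identifiers in the parameter list is in the spirit of the "Use parameter for tableName" commit above, although the exact query used by the connector may differ.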
@@ -0,0 +1,164 @@
// Copyright (c) Microsoft. All rights reserved.

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;
using Pgvector;

namespace Microsoft.SemanticKernel.Connectors.Postgres;

/// <summary>
/// Internal interface for client managing postgres database operations.
/// </summary>
public interface IPostgresVectorStoreDbClient
{
/// <summary>
/// Check if a table exists.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns><c>true</c> if the table exists; otherwise, <c>false</c>.</returns>
Task<bool> DoesTableExistsAsync(string tableName, CancellationToken cancellationToken = default);

/// <summary>
/// Get all tables.
/// </summary>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns>A group of tables.</returns>
IAsyncEnumerable<string> GetTablesAsync(CancellationToken cancellationToken = default);
/// <summary>
/// Create a table.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="recordDefinition">The record definition of the table.</param>
/// <param name="ifNotExists">Specifies whether to include IF NOT EXISTS in the command.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns></returns>
Task CreateTableAsync(string tableName, VectorStoreRecordDefinition recordDefinition, bool ifNotExists = true, CancellationToken cancellationToken = default);

/// <summary>
/// Drop a table.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
Task DeleteTableAsync(string tableName, CancellationToken cancellationToken = default);

/// <summary>
/// Upsert entry into a table.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="row">The row to upsert into the table.</param>
/// <param name="keyColumn">The key column of the table.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns></returns>
Task UpsertAsync(string tableName, Dictionary<string, object?> row, string keyColumn, CancellationToken cancellationToken = default);

/// <summary>
/// Upsert multiple entries into a table.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="rows">The rows to upsert into the table.</param>
/// <param name="keyColumn">The key column of the table.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns></returns>
Task UpsertBatchAsync(string tableName, IEnumerable<Dictionary<string, object?>> rows, string keyColumn, CancellationToken cancellationToken = default);

/// <summary>
/// Get an entry by its key.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="key">The key of the entry to get.</param>
/// <param name="properties">The properties to include in the entry.</param>
/// <param name="includeVectors">If true, the vectors will be included in the entry.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns>The row if the key is found, otherwise null.</returns>
Task<Dictionary<string, object?>?> GetAsync<TKey>(string tableName, TKey key, IReadOnlyList<VectorStoreRecordProperty> properties, bool includeVectors = false, CancellationToken cancellationToken = default)
where TKey : notnull;

/// <summary>
/// Get multiple entries by their keys.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="keys">The keys of the entries to get.</param>
/// <param name="properties">The properties of the table.</param>
/// <param name="includeVectors">If true, the vectors will be included in the entries.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns>The rows that match the given keys.</returns>
IAsyncEnumerable<Dictionary<string, object?>> GetBatchAsync<TKey>(string tableName, IEnumerable<TKey> keys, IReadOnlyList<VectorStoreRecordProperty> properties, bool includeVectors = false, CancellationToken cancellationToken = default)
where TKey : notnull;

/// <summary>
/// Delete an entry by its key.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="keyColumn">The name of the key column.</param>
/// <param name="key">The key of the entry to delete.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns></returns>
Task DeleteAsync<TKey>(string tableName, string keyColumn, TKey key, CancellationToken cancellationToken = default);

/// <summary>
/// Delete multiple entries by their keys.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="keyColumn">The name of the key column.</param>
/// <param name="keys">The keys of the entries to delete.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns></returns>
Task DeleteBatchAsync<TKey>(string tableName, string keyColumn, IEnumerable<TKey> keys, CancellationToken cancellationToken = default);

/// <summary>
/// Gets the nearest matches to the <see cref="Vector"/>.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
/// <param name="properties">The properties to retrieve.</param>
/// <param name="vectorProperty">The property which the vectors to compare are stored in.</param>
/// <param name="vectorValue">The <see cref="Vector"/> to compare the table's vector with.</param>
/// <param name="limit">The maximum number of similarity results to return.</param>
/// <param name="filter">Optional conditions to filter the results.</param>
/// <param name="skip">The number of entries to skip.</param>
/// <param name="includeVectors">If true, the vectors will be returned in the entries.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns>An asynchronous stream of (row, distance) tuples for the nearest matches to the <see cref="Vector"/>.</returns>
IAsyncEnumerable<(Dictionary<string, object?> Row, double Distance)> GetNearestMatchesAsync(string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, VectorStoreRecordVectorProperty vectorProperty, Vector vectorValue, int limit,
VectorSearchFilter? filter = default, int? skip = default, bool includeVectors = false, CancellationToken cancellationToken = default);

// /// <summary>
// /// Read a entry by its key.
// /// </summary>
// /// <param name="tableName">The name assigned to a table of entries.</param>
// /// <param name="key">The key of the entry to read.</param>
// /// <param name="withEmbeddings">If true, the embeddings will be returned in the entry.</param>
// /// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
// /// <returns></returns>
// Task<PostgresMemoryEntry?> ReadAsync(string tableName, string key, bool withEmbeddings = false, CancellationToken cancellationToken = default);

// /// <summary>
// /// Read multiple entries by their keys.
// /// </summary>
// /// <param name="tableName">The name assigned to a table of entries.</param>
// /// <param name="keys">The keys of the entries to read.</param>
// /// <param name="withEmbeddings">If true, the embeddings will be returned in the entries.</param>
// /// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
// /// <returns>An asynchronous stream of <see cref="PostgresMemoryEntry"/> objects that match the given keys.</returns>
// IAsyncEnumerable<PostgresMemoryEntry> ReadBatchAsync(string tableName, IEnumerable<string> keys, bool withEmbeddings = false, CancellationToken cancellationToken = default);

// /// <summary>
// /// Delete a entry by its key.
// /// </summary>
// /// <param name="tableName">The name assigned to a table of entries.</param>
// /// <param name="key">The key of the entry to delete.</param>
// /// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
// /// <returns></returns>
// Task DeleteAsync(string tableName, string key, CancellationToken cancellationToken = default);

// /// <summary>
// /// Delete multiple entries by their key.
// /// </summary>
// /// <param name="tableName">The name assigned to a table of entries.</param>
// /// <param name="keys">The keys of the entries to delete.</param>
// /// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
// /// <returns></returns>
// Task DeleteBatchAsync(string tableName, IEnumerable<string> keys, CancellationToken cancellationToken = default);
}
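
To illustrate how a caller might consume the `(Row, Distance)` stream that `GetNearestMatchesAsync` returns, here is a small self-contained sketch. `FakeNearestMatchesAsync` is a hypothetical in-memory stand-in with canned data, not the connector's implementation, and the `"id"` column name and distance threshold are assumptions made for the example:

```csharp
#nullable enable
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class NearestMatchSketch
{
    // Hypothetical stand-in for GetNearestMatchesAsync: yields canned rows in order, up to limit.
    public static async IAsyncEnumerable<(Dictionary<string, object?> Row, double Distance)> FakeNearestMatchesAsync(
        int limit,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var rows = new (Dictionary<string, object?> Row, double Distance)[]
        {
            (new Dictionary<string, object?> { ["id"] = 1 }, 0.10),
            (new Dictionary<string, object?> { ["id"] = 2 }, 0.25),
            (new Dictionary<string, object?> { ["id"] = 3 }, 0.40),
        };

        for (var i = 0; i < rows.Length && i < limit; i++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield(); // simulate an asynchronous read
            yield return rows[i];
        }
    }

    // Streams the matches and keeps the keys whose distance is within maxDistance.
    public static async Task<List<object?>> CollectKeysAsync(int limit, double maxDistance)
    {
        var keys = new List<object?>();
        await foreach (var (row, distance) in FakeNearestMatchesAsync(limit))
        {
            if (distance <= maxDistance)
            {
                keys.Add(row["id"]);
            }
        }

        return keys;
    }
}
```

Because the results arrive as an `IAsyncEnumerable`, the caller can stop iterating early without materializing every row.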
@@ -0,0 +1,23 @@
// Copyright (c) Microsoft. All rights reserved.

using Microsoft.Extensions.VectorData;

namespace Microsoft.SemanticKernel.Connectors.Postgres;

/// <summary>
/// Interface for constructing <see cref="IVectorStoreRecordCollection{TKey, TRecord}"/> Postgres instances when using <see cref="IVectorStore"/> to retrieve these.
/// </summary>
public interface IPostgresVectorStoreRecordCollectionFactory
{
/// <summary>
/// Constructs a new instance of the <see cref="IVectorStoreRecordCollection{TKey, TRecord}"/>.
/// </summary>
/// <typeparam name="TKey">The data type of the record key.</typeparam>
/// <typeparam name="TRecord">The data model to use for adding, updating and retrieving data from storage.</typeparam>
/// <param name="client">The Postgres client.</param>
/// <param name="name">The name of the collection to connect to.</param>
/// <param name="vectorStoreRecordDefinition">An optional record definition that defines the schema of the record type. If not present, attributes on <typeparamref name="TRecord"/> will be used.</param>
/// <returns>The new instance of <see cref="IVectorStoreRecordCollection{TKey, TRecord}"/>.</returns>
IVectorStoreRecordCollection<TKey, TRecord> CreateVectorStoreRecordCollection<TKey, TRecord>(IPostgresVectorStoreDbClient client, string name, VectorStoreRecordDefinition? vectorStoreRecordDefinition)
where TKey : notnull;
}