Commit 25ae014

Merge branch 'main' into drop-table
2 parents 8a4f8d6 + 12a11d7 commit 25ae014

2 files changed: +330 −1 lines changed


README.md: +1 −1

@@ -28,7 +28,7 @@ pip install -U langchain-postgres
 > [!WARNING]
 > In v0.0.14+, `PGVector` is deprecated. Please migrate to `PGVectorStore`
 > for improved performance and manageability.
-> See the [migration guide](https://github.com/langchain-ai/langchain-postgres/blob/main/examples/migrate_pgvector_to_pgvectorstore.md) for details on how to migrate from `PGVector` to `PGVectorStore`.
+> See the [migration guide](https://github.com/langchain-ai/langchain-postgres/blob/main/examples/migrate_pgvector_to_pgvectorstore.ipynb) for details on how to migrate from `PGVector` to `PGVectorStore`.
 
 
 ### Documentation

examples/migrate_pgvector_to_pgvectorstore.ipynb: +329 −0
@@ -0,0 +1,329 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Migrate a `PGVector` vector store to `PGVectorStore`\n",
    "\n",
    "This guide shows how to migrate from the [`PGVector`](https://github.com/langchain-ai/langchain-postgres/blob/main/langchain_postgres/vectorstores.py) vector store class to the [`PGVectorStore`](https://github.com/langchain-ai/langchain-postgres/blob/main/langchain_postgres/vectorstore.py) class.\n",
    "\n",
    "## Why migrate?\n",
    "\n",
    "This guide explains how to migrate your vector data from a PGVector-style database (two tables) to a PGVectorStore-style database (one table per collection) for improved performance and manageability.\n",
    "\n",
    "Migrating to the PGVectorStore interface provides the following benefits:\n",
    "\n",
    "- **Simplified management**: A single table contains data corresponding to a single collection, making it easier to query, update, and maintain.\n",
    "- **Improved metadata handling**: It stores metadata in columns instead of JSON, resulting in significant performance improvements.\n",
    "- **Schema flexibility**: The interface allows users to add tables into any database schema.\n",
    "- **Improved performance**: The single-table schema can lead to faster query execution, especially for large collections.\n",
    "- **Clear separation**: Table creation and extension creation are clearly separated, allowing for distinct permissions and streamlined workflows.\n",
    "- **Secure connections**: The PGVectorStore interface creates a secure connection pool that can be easily shared across your application using the `engine` object.\n",
    "\n",
    "## Migration process\n",
    "\n",
    "> **_NOTE:_** The langchain-core library is installed to use the `FakeEmbeddings` service. To use a different embedding service, you'll need to install the appropriate library for your chosen provider. Choose an embedding service from [LangChain's Embedding models](https://python.langchain.com/v0.2/docs/integrations/text_embedding/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IR54BmgvdHT_"
   },
   "source": [
    "### Library Installation\n",
    "Install the integration library, `langchain-postgres`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "0ZITIDE160OD",
    "outputId": "e184bc0d-6541-4e0a-82d2-1e216db00a2d"
   },
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet langchain-postgres langchain-core SQLAlchemy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8f2830ee9ca1e01",
   "metadata": {
    "id": "f8f2830ee9ca1e01"
   },
   "source": [
    "## Data Migration"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "OMvzMWRrR6n7",
   "metadata": {
    "id": "OMvzMWRrR6n7"
   },
   "source": [
    "### Set the Postgres connection URL\n",
    "\n",
    "`PGVectorStore` can be used with the `asyncpg` and `psycopg3` drivers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "irl7eMFnSPZr",
   "metadata": {
    "id": "irl7eMFnSPZr"
   },
   "outputs": [],
   "source": [
    "# @title Set Your Values Here { display-mode: \"form\" }\n",
    "POSTGRES_USER = \"langchain\"  # @param {type: \"string\"}\n",
    "POSTGRES_PASSWORD = \"langchain\"  # @param {type: \"string\"}\n",
    "POSTGRES_HOST = \"localhost\"  # @param {type: \"string\"}\n",
    "POSTGRES_PORT = \"6024\"  # @param {type: \"string\"}\n",
    "POSTGRES_DB = \"langchain\"  # @param {type: \"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "QuQigs4UoFQ2",
   "metadata": {
    "id": "QuQigs4UoFQ2"
   },
   "source": [
    "### PGEngine Connection Pool\n",
    "\n",
    "One of the requirements for using PostgreSQL as a vector store is a `PGEngine` object. The `PGEngine` configures a shared connection pool to your Postgres database. This is an industry best practice to manage the number of connections and to reduce latency through cached database connections.\n",
    "\n",
    "To create a `PGEngine` using `PGEngine.from_connection_string()` you need to provide:\n",
    "\n",
    "1. `url`: A connection string using the `postgresql+asyncpg` driver.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** This tutorial demonstrates the async interface. All async methods have corresponding sync methods."
   ]
  },
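  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, here is a minimal sync sketch of the same steps used later in this guide. It is illustrative only: `CONNECTION_STRING` and `VECTOR_SIZE` are defined in the cells below, and `TABLE_NAME` is a placeholder for your destination table name.\n",
    "\n",
    "```python\n",
    "from langchain_core.embeddings import FakeEmbeddings\n",
    "from langchain_postgres import PGEngine, PGVectorStore\n",
    "\n",
    "# Sync counterparts of the async methods shown below.\n",
    "engine = PGEngine.from_connection_string(url=CONNECTION_STRING)\n",
    "engine.init_vectorstore_table(table_name=TABLE_NAME, vector_size=VECTOR_SIZE)\n",
    "store = PGVectorStore.create_sync(\n",
    "    engine,\n",
    "    embedding_service=FakeEmbeddings(size=VECTOR_SIZE),\n",
    "    table_name=TABLE_NAME,\n",
    ")\n",
    "```"
   ]
  },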
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# See the docker command in the README to launch a Postgres instance with pgvector enabled.\n",
    "CONNECTION_STRING = (\n",
    "    f\"postgresql+asyncpg://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}\"\n",
    "    f\":{POSTGRES_PORT}/{POSTGRES_DB}\"\n",
    ")\n",
    "# To use the psycopg3 driver, set your connection string to `postgresql+psycopg://`"
   ]
  },
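  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As noted in the comment above, the `psycopg3` driver can be used instead by changing only the connection string scheme. A minimal sketch, using the same values set earlier:\n",
    "\n",
    "```python\n",
    "# Alternative: psycopg3 driver instead of asyncpg.\n",
    "CONNECTION_STRING = (\n",
    "    f\"postgresql+psycopg://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}\"\n",
    "    f\":{POSTGRES_PORT}/{POSTGRES_DB}\"\n",
    ")\n",
    "```"
   ]
  },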
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_postgres import PGEngine\n",
    "\n",
    "engine = PGEngine.from_connection_string(url=CONNECTION_STRING)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create a `PGEngine` using `PGEngine.from_engine()` you need to provide:\n",
    "\n",
    "1. `engine`: An `AsyncEngine` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy.ext.asyncio import create_async_engine\n",
    "\n",
    "# Create an SQLAlchemy Async Engine\n",
    "pool = create_async_engine(\n",
    "    CONNECTION_STRING,\n",
    ")\n",
    "\n",
    "engine = PGEngine.from_engine(engine=pool)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get all collections\n",
    "\n",
    "List the existing `PGVector` collections; each one will be migrated to its own vector store table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_postgres.utils.pgvector_migrator import alist_pgvector_collection_names\n",
    "\n",
    "all_collection_names = await alist_pgvector_collection_names(engine)\n",
    "print(all_collection_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "D9Xs2qhm6X56"
   },
   "source": [
    "### Create new table(s) to migrate existing data\n",
    "The `PGVectorStore` class requires a database table. The `PGEngine` object has a helper method `ainit_vectorstore_table()` that creates a table with the proper schema for you."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also specify a schema name by passing `schema_name` wherever you pass `table_name`. E.g.:\n",
    "\n",
    "```python\n",
    "SCHEMA_NAME = \"my_schema\"\n",
    "\n",
    "await engine.ainit_vectorstore_table(\n",
    "    table_name=TABLE_NAME,\n",
    "    vector_size=768,\n",
    "    schema_name=SCHEMA_NAME,  # Default: \"public\"\n",
    ")\n",
    "```\n",
    "\n",
    "When creating your vectorstore table, you have the flexibility to define custom metadata and ID columns. This is particularly useful for:\n",
    "\n",
    "- **Filtering**: Metadata columns allow you to easily filter your data within the vectorstore. For example, you might store the document source, date, or author as metadata for efficient retrieval.\n",
    "- **Non-UUID identifiers**: By default, the `id_column` uses UUIDs. If you need to use a different type of ID (e.g., an integer or string), you can define a custom `id_column`.\n",
    "\n",
    "```python\n",
    "from langchain_postgres import Column\n",
    "\n",
    "metadata_columns = [\n",
    "    Column(f\"col_0_{collection_name}\", \"VARCHAR\"),\n",
    "    Column(f\"col_1_{collection_name}\", \"VARCHAR\"),\n",
    "]\n",
    "engine.init_vectorstore_table(\n",
    "    table_name=\"destination_table\",\n",
    "    vector_size=VECTOR_SIZE,\n",
    "    metadata_columns=metadata_columns,\n",
    "    id_column=Column(\"langchain_id\", \"VARCHAR\"),\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "avlyHEMn6gzU"
   },
   "outputs": [],
   "source": [
    "# Vertex AI embeddings use a vector size of 768.\n",
    "# Adjust this according to your embeddings service.\n",
    "VECTOR_SIZE = 768\n",
    "for collection_name in all_collection_names:\n",
    "    engine.init_vectorstore_table(\n",
    "        table_name=collection_name,\n",
    "        vector_size=VECTOR_SIZE,\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a vector store and migrate data\n",
    "\n",
    "> **_NOTE:_** The `FakeEmbeddings` embedding service is only used to initialize a vector store object, not to generate any embeddings. The embeddings are copied directly from the PGVector table.\n",
    "\n",
    "If you customized the metadata or ID columns, pass them to the vector store as follows:\n",
    "\n",
    "```python\n",
    "from langchain_postgres import PGVectorStore\n",
    "from langchain_core.embeddings import FakeEmbeddings\n",
    "\n",
    "destination_vector_store = PGVectorStore.create_sync(\n",
    "    engine,\n",
    "    embedding_service=FakeEmbeddings(size=VECTOR_SIZE),\n",
    "    table_name=DESTINATION_TABLE_NAME,\n",
    "    metadata_columns=[col.name for col in metadata_columns],\n",
    "    id_column=\"langchain_id\",\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "z-AZyzAQ7bsf"
   },
   "outputs": [],
   "source": [
    "from langchain_core.embeddings import FakeEmbeddings\n",
    "from langchain_postgres import PGVectorStore\n",
    "from langchain_postgres.utils.pgvector_migrator import amigrate_pgvector_collection\n",
    "\n",
    "for collection_name in all_collection_names:\n",
    "    destination_vector_store = await PGVectorStore.create(\n",
    "        engine,\n",
    "        embedding_service=FakeEmbeddings(size=VECTOR_SIZE),\n",
    "        table_name=collection_name,\n",
    "    )\n",
    "\n",
    "    await amigrate_pgvector_collection(\n",
    "        engine,\n",
    "        # Set collection name here\n",
    "        collection_name=collection_name,\n",
    "        vector_store=destination_vector_store,\n",
    "        # This deletes data from the original table upon migration. You can choose to turn it off.\n",
    "        # The data will only be deleted from the original table once all of it has been successfully copied to the destination table.\n",
    "        delete_pg_collection=True,\n",
    "    )"
   ]
  },
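  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Verify the migration (optional)\n",
    "\n",
    "As a quick sanity check (a minimal sketch, not part of the migration itself), read a few documents back from the last migrated table. `FakeEmbeddings` produces placeholder vectors, so the ranking below is not meaningful; it only confirms that rows were copied into the new table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ranking is meaningless with FakeEmbeddings; this only confirms the rows exist.\n",
    "docs = await destination_vector_store.asimilarity_search(\"test\", k=2)\n",
    "print(docs)"
   ]
  }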
 ],
 "metadata": {
  "colab": {
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
