diff --git a/docs/core_docs/docs/integrations/vectorstores/faiss.ipynb b/docs/core_docs/docs/integrations/vectorstores/faiss.ipynb new file mode 100644 index 000000000000..2d2fe01c7faf --- /dev/null +++ b/docs/core_docs/docs/integrations/vectorstores/faiss.ipynb @@ -0,0 +1,467 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "1957f5cb", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: Faiss\n", + "sidebar_class_name: node-only\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ef1f0986", + "metadata": {}, + "source": [ + "# FaissStore\n", + "\n", + "```{=mdx}\n", + "\n", + ":::tip Compatibility\n", + "Only available on Node.js.\n", + ":::\n", + "\n", + "```\n", + "\n", + "[Faiss](https://github.com/facebookresearch/faiss) is a library for efficient similarity search and clustering of dense vectors.\n", + "\n", + "LangChain.js supports using Faiss as a locally-running vectorstore that can be saved to a file. It also provides the ability to read the saved file from the [LangChain Python implementation](https://python.langchain.com/docs/integrations/vectorstores/faiss#saving-and-loading).\n", + "\n", + "This guide provides a quick overview for getting started with Faiss [vector stores](/docs/concepts/#vectorstores). For detailed documentation of all `FaissStore` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_faiss.FaissStore.html)." + ] + }, + { + "cell_type": "markdown", + "id": "c824838d", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | [PY support](https://python.langchain.com/docs/integrations/vectorstores/faiss) | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [`FaissStore`](https://api.js.langchain.com/classes/langchain_community_vectorstores_faiss.FaissStore.html) | [`@langchain/community`](https://npmjs.com/@langchain/community) | ✅ | ![NPM - Version](https://img.shields.io/npm/v/@langchain/community?style=flat-square&label=%20&) |" + ] + }, + { + "cell_type": "markdown", + "id": "36fdc060", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "To use Faiss vector stores, you'll need to install the `@langchain/community` integration package and the [`faiss-node`](https://github.com/ewfian/faiss-node) package as a peer dependency.\n", + "\n", + "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. 
You can also use [other supported embeddings models](/docs/integrations/text_embedding) if you wish.\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "\n", + "\n", + "\n", + " @langchain/community faiss-node @langchain/openai\n", + "\n", + "```\n", + "\n", + "### Credentials\n", + "\n", + "Because Faiss runs locally, you do not need any credentials to use it.\n", + "\n", + "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n", + "\n", + "```typescript\n", + "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n", + "```\n", + "\n", + "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n", + "\n", + "```typescript\n", + "// process.env.LANGCHAIN_TRACING_V2=\"true\"\n", + "// process.env.LANGCHAIN_API_KEY=\"your-api-key\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "93df377e", + "metadata": {}, + "source": [ + "## Instantiation" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import { FaissStore } from \"@langchain/community/vectorstores/faiss\";\n", + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "\n", + "const embeddings = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-small\",\n", + "});\n", + "\n", + "const vectorStore = new FaissStore(embeddings, {});" + ] + }, + { + "cell_type": "markdown", + "id": "ac6071d4", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "### Add items to vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "17f5efc0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ '1', '2', '3', '4' ]\n" + ] + } + ], + "source": [ + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const document1: Document = {\n", + " pageContent: \"The powerhouse of the cell is the mitochondria\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document2: Document = {\n", + " pageContent: \"Buildings are made out of brick\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document3: Document = {\n", + " pageContent: \"Mitochondria are made out of lipids\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document4: Document = {\n", + " pageContent: \"The 2024 Olympics are in Paris\",\n", + " metadata: { source: \"https://example.com\" }\n", + "}\n", + "\n", + "const documents = [document1, document2, document3, document4];\n", + "\n", + "await vectorStore.addDocuments(documents, { ids: [\"1\", \"2\", \"3\", \"4\"] });" + ] + }, + { + "cell_type": "markdown", + "id": "dcf1b905", + "metadata": {}, + "source": [ + "### Delete items from vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "ef61e188", + "metadata": {}, + "outputs": [], + "source": [ + "await vectorStore.delete({ ids: [\"4\"] });" + ] + }, + { + "cell_type": "markdown", + "id": "c3620501", + "metadata": {}, + "source": [ + "## Query vector store\n", + "\n", + "Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. 
\n", + "\n", + "### Query directly\n", + "\n", + "Performing a simple similarity search can be done as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "aa0a16fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n", + "* Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const similaritySearchResults = await vectorStore.similaritySearch(\"biology\", 2);\n", + "\n", + "for (const doc of similaritySearchResults) {\n", + " console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "3ed9d733", + "metadata": {}, + "source": [ + "Filtering by metadata is currently not supported.\n", + "\n", + "If you want to execute a similarity search and receive the corresponding scores you can run:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "5efd2eaa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* [SIM=1.671] The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n", + "* [SIM=1.705] Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const similaritySearchWithScoreResults = await vectorStore.similaritySearchWithScore(\"biology\", 2);\n", + "\n", + "for (const [doc, score] of similaritySearchWithScoreResults) {\n", + " console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0c235cdc", + "metadata": {}, + "source": [ + "### Query by turning into retriever\n", + "\n", + "You can also transform the vector store into a [retriever](/docs/concepts/#retrievers) for easier usage in your chains. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "f3460093", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " {\n", + " pageContent: 'The powerhouse of the cell is the mitochondria',\n", + " metadata: { source: 'https://example.com' }\n", + " },\n", + " {\n", + " pageContent: 'Mitochondria are made out of lipids',\n", + " metadata: { source: 'https://example.com' }\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "const retriever = vectorStore.asRetriever({\n", + " k: 2,\n", + "});\n", + "await retriever.invoke(\"biology\");" + ] + }, + { + "cell_type": "markdown", + "id": "e2e0a211", + "metadata": {}, + "source": [ + "### Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge).\n", + "- [How-to: Question and answer with RAG](/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](/docs/concepts#retrieval)" + ] + }, + { + "cell_type": "markdown", + "id": "58a88011", + "metadata": {}, + "source": [ + "## Merging indexes\n", + "\n", + "Faiss also supports merging existing indexes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "79a65a68", + "metadata": {}, + "outputs": [], + "source": [ + "// Create an initial vector store\n", + "const initialStore = await FaissStore.fromTexts(\n", + " [\"Hello world\", \"Bye bye\", \"hello nice world\"],\n", + " [{ id: 2 }, { id: 1 }, { id: 3 }],\n", + " new OpenAIEmbeddings()\n", + ");\n", + "\n", + "// Create another vector store from texts\n", + "const newStore = await FaissStore.fromTexts(\n", + " [\"Some text\"],\n", + " [{ id: 1 }],\n", + " new OpenAIEmbeddings()\n", + ");\n", + "\n", + "// merge the first vector store into vectorStore2\n", + "await newStore.mergeFrom(initialStore);\n", + "\n", + "// You can also create a new vector store from another FaissStore index\n", + "const newStore2 = await FaissStore.fromIndex(\n", + " newStore,\n", + " new OpenAIEmbeddings()\n", + ");\n", + "\n", + "await newStore2.similaritySearch(\"Bye bye\", 1);" + ] + }, + { + "cell_type": "markdown", + "id": "b92a2301", + "metadata": {}, + "source": [ + "## Save an index to file and load it again\n", + "\n", + "To persist an index on disk, use the `.save` and static `.load` methods:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9e4aecb9", + "metadata": {}, + "outputs": [], + "source": [ + "// Create a vector store through any method, here from texts as an example\n", + "const persistentStore = await FaissStore.fromTexts(\n", + " [\"Hello world\", \"Bye bye\", \"hello nice world\"],\n", + " [{ id: 2 }, { id: 1 }, { id: 3 }],\n", + " new OpenAIEmbeddings()\n", + ");\n", + "\n", + "// Save the vector store to a directory\n", + "const directory = \"your/directory/here\";\n", + "\n", + "await persistentStore.save(directory);\n", + "\n", + "// Load the vector store from the same directory\n", + "const loadedVectorStore = await FaissStore.load(\n", + " directory,\n", + " new OpenAIEmbeddings()\n", + ");\n", + "\n", + "// vectorStore and loadedVectorStore are identical\n", + "const result = await loadedVectorStore.similaritySearch(\"hello world\", 1);\n", + "console.log(result);" + ] + }, + { + "cell_type": "markdown", + "id": "069f1b5f", + "metadata": {}, + "source": [ + "## Reading saved files from Python\n", + 
"\n", + "To enable the ability to read the saved file from [LangChain Python's implementation](https://python.langchain.com/docs/integrations/vectorstores/faiss#saving-and-loading), you'll need to install the [`pickleparser`](https://github.com/ewfian/pickleparser) package.\n", + "\n", + "```{=mdx}\n", + "\n", + " pickleparser\n", + "\n", + "```\n", + "\n", + "Then you can use the `.loadFromPython` static method:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d959f997", + "metadata": {}, + "outputs": [], + "source": [ + "// The directory of data saved from Python\n", + "const directoryWithSavedPythonStore = \"your/directory/here\";\n", + "\n", + "// Load the vector store from the directory\n", + "const pythonLoadedStore = await FaissStore.loadFromPython(\n", + " directoryWithSavedPythonStore,\n", + " new OpenAIEmbeddings()\n", + ");\n", + "\n", + "// Search for the most similar document\n", + "await pythonLoadedStore.similaritySearch(\"test\", 2);" + ] + }, + { + "cell_type": "markdown", + "id": "8a27244f", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all `FaissStore` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_faiss.FaissStore.html)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/core_docs/docs/integrations/vectorstores/faiss.mdx b/docs/core_docs/docs/integrations/vectorstores/faiss.mdx deleted file mode 100644 index eca8a4f6d69b..000000000000 --- a/docs/core_docs/docs/integrations/vectorstores/faiss.mdx +++ /dev/null @@ -1,80 +0,0 @@ ---- -sidebar_class_name: node-only ---- - -import CodeBlock from "@theme/CodeBlock"; - -# Faiss - -:::tip Compatibility -Only available on Node.js. -::: - -[Faiss](https://github.com/facebookresearch/faiss) is a library for efficient similarity search and clustering of dense vectors. - -Langchainjs supports using Faiss as a vectorstore that can be saved to file. It also provides the ability to read the saved file from [Python's implementation](https://python.langchain.com/docs/integrations/vectorstores/faiss#saving-and-loading). - -## Setup - -Install the [faiss-node](https://github.com/ewfian/faiss-node), which is a Node.js bindings for [Faiss](https://github.com/facebookresearch/faiss). - -```bash npm2yarn -npm install -S faiss-node -``` - -To enable the ability to read the saved file from [Python's implementation](https://python.langchain.com/docs/integrations/vectorstores/faiss#saving-and-loading), the [pickleparser](https://github.com/ewfian/pickleparser) also needs to install. 
- -```bash npm2yarn -npm install -S pickleparser -``` - -## Usage - -import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx"; - - - -```bash npm2yarn -npm install @langchain/openai @langchain/community -``` - -### Create a new index from texts - -import ExampleTexts from "@examples/indexes/vector_stores/faiss.ts"; - -{ExampleTexts} - -### Create a new index from a loader - -import ExampleLoader from "@examples/indexes/vector_stores/faiss_fromdocs.ts"; - -{ExampleLoader} - -### Deleting vectors - -import ExampleDelete from "@examples/indexes/vector_stores/faiss_delete.ts"; - -{ExampleDelete} - -### Merging indexes and creating new index from another instance - -import ExampleMerge from "@examples/indexes/vector_stores/faiss_mergefrom.ts"; - -{ExampleMerge} - -### Save an index to file and load it again - -import ExampleSave from "@examples/indexes/vector_stores/faiss_saveload.ts"; - -{ExampleSave} - -### Load the saved file from [Python's implementation](https://python.langchain.com/docs/integrations/vectorstores/faiss#saving-and-loading) - -import ExamplePython from "@examples/indexes/vector_stores/faiss_loadfrompython.ts"; - -{ExamplePython} - -## Related - -- Vector store [conceptual guide](/docs/concepts/#vectorstores) -- Vector store [how-to guides](/docs/how_to/#vectorstores) diff --git a/docs/core_docs/docs/integrations/vectorstores/hnswlib.ipynb b/docs/core_docs/docs/integrations/vectorstores/hnswlib.ipynb new file mode 100644 index 000000000000..25b688bc82a4 --- /dev/null +++ b/docs/core_docs/docs/integrations/vectorstores/hnswlib.ipynb @@ -0,0 +1,381 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "1957f5cb", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: HNSWLib\n", + "sidebar_class_name: node-only\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ef1f0986", + "metadata": {}, + "source": [ + "# HNSWLib\n", + "\n", + "```{=mdx}\n", + ":::tip Compatibility\n", + "Only available on Node.js.\n", + ":::\n", + "```\n", + "\n", + "HNSWLib is an in-memory vector store that can be saved to a file. It uses the [HNSWLib library](https://github.com/nmslib/hnswlib).\n", + "\n", + "This guide provides a quick overview for getting started with HNSWLib [vector stores](/docs/concepts/#vectorstores). For detailed documentation of all `HNSWLib` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_hnswlib.HNSWLib.html)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c824838d", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | PY support | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [`HNSWLib`](https://api.js.langchain.com/classes/langchain_community_vectorstores_hnswlib.HNSWLib.html) | [`@langchain/community`](https://npmjs.com/@langchain/community) | ❌ | ![NPM - Version](https://img.shields.io/npm/v/@langchain/community?style=flat-square&label=%20&) |" + ] + }, + { + "cell_type": "markdown", + "id": "36fdc060", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "To use HNSWLib vector stores, you'll need to install the `@langchain/community` integration package with the [`hnswlib-node`](https://www.npmjs.com/package/hnswlib-node) package as a peer dependency.\n", + "\n", + "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. You can also use [other supported embeddings models](/docs/integrations/text_embedding) if you wish.\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "\n", + "\n", + "\n", + " @langchain/community hnswlib-node @langchain/openai\n", + "\n", + "```\n", + "\n", + "```{=mdx}\n", + ":::caution\n", + "\n", + "**On Windows**, you might need to install [Visual Studio](https://visualstudio.microsoft.com/downloads/) first in order to properly build the `hnswlib-node` package.\n", + "\n", + ":::\n", + "```\n", + "\n", + "### Credentials\n", + "\n", + "Because HNSWLib runs locally, you do not need any credentials to use it.\n", + "\n", + "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n", + "\n", + "```typescript\n", + "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n", + "```\n", + "\n", + "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n", + "\n", + "```typescript\n", + "// process.env.LANGCHAIN_TRACING_V2=\"true\"\n", + "// process.env.LANGCHAIN_API_KEY=\"your-api-key\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "93df377e", + "metadata": {}, + "source": [ + "## Instantiation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import { HNSWLib } from \"@langchain/community/vectorstores/hnswlib\";\n", + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "\n", + "const embeddings = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-small\",\n", + "});\n", + "\n", + "const vectorStore = await HNSWLib.fromDocuments([], embeddings);" + ] + }, + { + "cell_type": "markdown", + "id": "ac6071d4", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "### Add items to vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "17f5efc0", + "metadata": {}, + "outputs": [], + "source": [ + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const document1: Document = {\n", + " pageContent: \"The powerhouse of the cell is the mitochondria\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document2: Document = {\n", + " pageContent: \"Buildings are 
made out of brick\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document3: Document = {\n", + " pageContent: \"Mitochondria are made out of lipids\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document4: Document = {\n", + " pageContent: \"The 2024 Olympics are in Paris\",\n", + " metadata: { source: \"https://example.com\" }\n", + "}\n", + "\n", + "const documents = [document1, document2, document3, document4];\n", + "\n", + "await vectorStore.addDocuments(documents);" + ] + }, + { + "cell_type": "markdown", + "id": "c3620501", + "metadata": {}, + "source": [ + "Deletion and ids for individual documents are not currently supported.\n", + "\n", + "## Query vector store\n", + "\n", + "Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n", + "\n", + "### Query directly\n", + "\n", + "Performing a simple similarity search can be done as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "aa0a16fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n", + "* Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const filter = (doc) => doc.metadata.source === \"https://example.com\";\n", + "\n", + "const similaritySearchResults = await vectorStore.similaritySearch(\"biology\", 2, filter);\n", + "\n", + "for (const doc of similaritySearchResults) {\n", + " console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "3ed9d733", + "metadata": {}, + "source": [ + "The filter is optional, and must be a predicate function that takes a document as input, and returns `true` or `false` depending on whether the document should be returned.\n", + "\n", + "If you want to execute a similarity search and receive the corresponding scores you can run:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5efd2eaa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* [SIM=0.835] The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n", + "* [SIM=0.852] Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const similaritySearchWithScoreResults = await vectorStore.similaritySearchWithScore(\"biology\", 2, filter)\n", + "\n", + "for (const [doc, score] of similaritySearchWithScoreResults) {\n", + " console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0c235cdc", + "metadata": {}, + "source": [ + "### Query by turning into retriever\n", + "\n", + "You can also transform the vector store into a [retriever](/docs/concepts/#retrievers) for easier usage in your chains." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "f3460093", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " {\n", + " pageContent: 'The powerhouse of the cell is the mitochondria',\n", + " metadata: { source: 'https://example.com' }\n", + " },\n", + " {\n", + " pageContent: 'Mitochondria are made out of lipids',\n", + " metadata: { source: 'https://example.com' }\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "const retriever = vectorStore.asRetriever({\n", + " // Optional filter\n", + " filter: filter,\n", + " k: 2,\n", + "});\n", + "await retriever.invoke(\"biology\");" + ] + }, + { + "cell_type": "markdown", + "id": "e2e0a211", + "metadata": {}, + "source": [ + "### Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge).\n", + "- [How-to: Question and answer with RAG](/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](/docs/concepts#retrieval)" + ] + }, + { + "cell_type": "markdown", + "id": "069f1b5f", + "metadata": {}, + "source": [ + "## Save to/load from file\n", + "\n", + "HNSWLib supports saving your index to a file, then reloading it at a later date:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f71ce986", + "metadata": {}, + "outputs": [], + "source": [ + "// Save the vector store to a directory\n", + "const directory = \"your/directory/here\";\n", + "await vectorStore.save(directory);\n", + "\n", + "// Load the vector store from the same directory\n", + "const loadedVectorStore = await HNSWLib.load(directory, new OpenAIEmbeddings());\n", + "\n", + "// vectorStore and loadedVectorStore are identical\n", + "await loadedVectorStore.similaritySearch(\"hello world\", 1);" + ] + }, + { + "cell_type": "markdown", + "id": "22f0d74f", + "metadata": {}, + "source": [ + "### Delete a saved index\n", + "\n", + "You can use the `.delete` method to clear an index saved to a given directory:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "daabbffd", + "metadata": {}, + "outputs": [], + "source": [ + "// Load the vector store from the same directory\n", + "const savedVectorStore = await HNSWLib.load(directory, new OpenAIEmbeddings());\n", + "\n", + "await savedVectorStore.delete({ directory });" + ] + }, + { + "cell_type": "markdown", + "id": "8a27244f", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all `HNSWLib` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_hnswlib.HNSWLib.html)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/core_docs/docs/integrations/vectorstores/hnswlib.mdx b/docs/core_docs/docs/integrations/vectorstores/hnswlib.mdx deleted file mode 100644 index e2b6d152ff16..000000000000 --- a/docs/core_docs/docs/integrations/vectorstores/hnswlib.mdx +++ /dev/null @@ -1,72 +0,0 @@ ---- -sidebar_class_name: node-only ---- - -import CodeBlock from "@theme/CodeBlock"; - -# HNSWLib - -:::tip Compatibility -Only available on Node.js. -::: - -HNSWLib is an in-memory vectorstore that can be saved to a file. It uses [HNSWLib](https://github.com/nmslib/hnswlib). - -## Setup - -:::caution - -**On Windows**, you might need to install [Visual Studio](https://visualstudio.microsoft.com/downloads/) first in order to properly build the `hnswlib-node` package. - -::: - -You can install it with - -```bash npm2yarn -npm install hnswlib-node -``` - -import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx"; - - - -```bash npm2yarn -npm install @langchain/openai @langchain/community -``` - -## Usage - -### Create a new index from texts - -import ExampleTexts from "@examples/indexes/vector_stores/hnswlib.ts"; - -{ExampleTexts} - -### Create a new index from a loader - -import ExampleLoader from "@examples/indexes/vector_stores/hnswlib_fromdocs.ts"; - -{ExampleLoader} - -### Save an index to a file and load it again - -import ExampleSave from "@examples/indexes/vector_stores/hnswlib_saveload.ts"; - -{ExampleSave} - -### Filter documents - -import ExampleFilter from "@examples/indexes/vector_stores/hnswlib_filter.ts"; - -{ExampleFilter} - -### Delete index - -import ExampleDelete from "@examples/indexes/vector_stores/hnswlib_delete.ts"; - -{ExampleDelete} - -## Related - -- Vector store [conceptual guide](/docs/concepts/#vectorstores) -- Vector store [how-to guides](/docs/how_to/#vectorstores) diff --git a/docs/core_docs/docs/integrations/vectorstores/memory.ipynb b/docs/core_docs/docs/integrations/vectorstores/memory.ipynb index 4cab4eab62d8..d7c5424ba5aa 100644 --- a/docs/core_docs/docs/integrations/vectorstores/memory.ipynb +++ b/docs/core_docs/docs/integrations/vectorstores/memory.ipynb @@ -192,6 +192,8 @@ "id": "3ed9d733", "metadata": {}, "source": [ + "The filter is optional, and must be a predicate function that takes a document as input, and returns `true` or `false` depending on whether the document should be returned.\n", + "\n", "If you want to execute a similarity search and receive the corresponding scores you can run:" ] }, diff --git a/docs/core_docs/docs/integrations/vectorstores/mongodb_atlas.ipynb b/docs/core_docs/docs/integrations/vectorstores/mongodb_atlas.ipynb new file mode 100644 index 000000000000..00053103059f --- /dev/null +++ b/docs/core_docs/docs/integrations/vectorstores/mongodb_atlas.ipynb @@ -0,0 +1,495 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "1957f5cb", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: MongoDB Atlas\n", + "sidebar_class_name: node-only\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ef1f0986", + "metadata": {}, + "source": [ + "# MongoDB Atlas\n", + "\n", + 
"```{=mdx}\n", + ":::tip Compatibility\n", + "Only available on Node.js.\n", + "\n", + "You can still create API routes that use MongoDB with Next.js by setting the `runtime` variable to `nodejs` like so:\n", + "\n", + "`export const runtime = \"nodejs\";`\n", + "\n", + "You can read more about Edge runtimes in the Next.js documentation [here](https://nextjs.org/docs/app/building-your-application/rendering/edge-and-nodejs-runtimes).\n", + ":::\n", + "```\n", + "\n", + "This guide provides a quick overview for getting started with MongoDB Atlas [vector stores](/docs/concepts/#vectorstores). For detailed documentation of all `MongoDBAtlasVectorSearch` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_mongodb.MongoDBAtlasVectorSearch.html)." + ] + }, + { + "cell_type": "markdown", + "id": "c824838d", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | [PY support](https://python.langchain.com/v0.2/docs/integrations/vectorstores/mongodb_atlas/) | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [`MongoDBAtlasVectorSearch`](https://api.js.langchain.com/classes/langchain_mongodb.MongoDBAtlasVectorSearch.html) | [`@langchain/mongodb`](https://www.npmjs.com/package/@langchain/mongodb) | ✅ | ![NPM - Version](https://img.shields.io/npm/v/@langchain/mongodb?style=flat-square&label=%20&) |" + ] + }, + { + "cell_type": "markdown", + "id": "36fdc060", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "To use MongoDB Atlas vector stores, you'll need to configure a MongoDB Atlas cluster and install the `@langchain/mongodb` integration package.\n", + "\n", + "### Initial Cluster Configuration\n", + "\n", + "To create a MongoDB Atlas cluster, navigate to the [MongoDB Atlas website](https://www.mongodb.com/products/platform/atlas-database) and create an account if you don't already have one.\n", + "\n", + "Create and name a cluster when prompted, then find it under `Database`. Select `Browse Collections` and create either a blank collection or one from the provided sample data.\n", + "\n", + "**Note:** The cluster created must be MongoDB 7.0 or higher.\n", + "\n", + "### Creating an Index\n", + "\n", + "After configuring your cluster, you'll need to create an index on the collection field you want to search over.\n", + "\n", + "Switch to the `Atlas Search` tab and click `Create Search Index`. From there, make sure you select `Atlas Vector Search - JSON Editor`, then select the appropriate database and collection and paste the following into the textbox:\n", + "\n", + "```json\n", + "{\n", + " \"fields\": [\n", + " {\n", + " \"numDimensions\": 1536,\n", + " \"path\": \"embedding\",\n", + " \"similarity\": \"euclidean\",\n", + " \"type\": \"vector\"\n", + " }\n", + " ]\n", + "}\n", + "```\n", + "\n", + "Note that the dimensions property should match the dimensionality of the embeddings you are using. For example, Cohere embeddings have 1024 dimensions, and by default OpenAI embeddings have 1536:\n", + "\n", + "Note: By default the vector store expects an index name of default, an indexed collection field name of embedding, and a raw text field name of text. 
You should initialize the vector store with field names matching your index name collection schema as shown below.\n", + "\n", + "Finally, proceed to build the index.\n", + "\n", + "### Embeddings\n", + "\n", + "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. You can also use [other supported embeddings models](/docs/integrations/text_embedding) if you wish.\n", + "\n", + "### Installation\n", + "\n", + "Install the following packages:\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "\n", + "\n", + "\n", + " @langchain/mongodb mongodb @langchain/openai\n", + "\n", + "```\n", + "\n", + "### Credentials\n", + "\n", + "Once you've done the above, set the `MONGODB_ATLAS_URI` environment variable from the `Connect` button in Mongo's dashboard. You'll also need your DB name and collection name:\n", + "\n", + "```typescript\n", + "process.env.MONGODB_ATLAS_URI = \"your-atlas-url\";\n", + "process.env.MONGODB_ATLAS_COLLECTION_NAME = \"your-atlas-db-name\";\n", + "process.env.MONGODB_ATLAS_DB_NAME = \"your-atlas-db-name\";\n", + "```\n", + "\n", + "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n", + "\n", + "```typescript\n", + "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n", + "```\n", + "\n", + "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n", + "\n", + "```typescript\n", + "// process.env.LANGCHAIN_TRACING_V2=\"true\"\n", + "// process.env.LANGCHAIN_API_KEY=\"your-api-key\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "93df377e", + "metadata": {}, + "source": [ + "## Instantiation\n", + "\n", + "Once you've set up your cluster as shown above, you can initialize your vector store as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import { MongoDBAtlasVectorSearch } from \"@langchain/mongodb\";\n", + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "import { MongoClient } from \"mongodb\";\n", + "\n", + "const client = new MongoClient(process.env.MONGODB_ATLAS_URI || \"\");\n", + "const collection = client.db(process.env.MONGODB_ATLAS_DB_NAME)\n", + " .collection(process.env.MONGODB_ATLAS_COLLECTION_NAME);\n", + "\n", + "const embeddings = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-small\",\n", + "});\n", + "\n", + "const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {\n", + " collection: collection,\n", + " indexName: \"vector_index\", // The name of the Atlas search index. Defaults to \"default\"\n", + " textKey: \"text\", // The name of the collection field containing the raw content. Defaults to \"text\"\n", + " embeddingKey: \"embedding\", // The name of the collection field containing the embedded text. 
Defaults to \"embedding\"\n", + "});" + ] + }, + { + "cell_type": "markdown", + "id": "ac6071d4", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "### Add items to vector store\n", + "\n", + "You can now add documents to your vector store:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "17f5efc0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ '1', '2', '3', '4' ]\n" + ] + } + ], + "source": [ + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const document1: Document = {\n", + " pageContent: \"The powerhouse of the cell is the mitochondria\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document2: Document = {\n", + " pageContent: \"Buildings are made out of brick\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document3: Document = {\n", + " pageContent: \"Mitochondria are made out of lipids\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document4: Document = {\n", + " pageContent: \"The 2024 Olympics are in Paris\",\n", + " metadata: { source: \"https://example.com\" }\n", + "}\n", + "\n", + "const documents = [document1, document2, document3, document4];\n", + "\n", + "await vectorStore.addDocuments(documents, { ids: [\"1\", \"2\", \"3\", \"4\"] });" + ] + }, + { + "cell_type": "markdown", + "id": "dcf1b905", + "metadata": {}, + "source": [ + "Adding a document with the same `id` as an existing document will update the existing one.\n", + "\n", + "### Delete items from vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ef61e188", + "metadata": {}, + "outputs": [], + "source": [ + "await vectorStore.delete({ ids: [\"4\"] });" + ] + }, + { + "cell_type": "markdown", + "id": "c3620501", + "metadata": {}, + "source": [ + "## Query vector store\n", + "\n", + "Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n", + "\n", + "### Query directly\n", + "\n", + "Performing a simple similarity search can be done as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "aa0a16fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* The powerhouse of the cell is the mitochondria [{\"_id\":\"1\",\"source\":\"https://example.com\"}]\n", + "* Mitochondria are made out of lipids [{\"_id\":\"3\",\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const similaritySearchResults = await vectorStore.similaritySearch(\"biology\", 2);\n", + "\n", + "for (const doc of similaritySearchResults) {\n", + " console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "3ed9d733", + "metadata": {}, + "source": [ + "### Filtering\n", + "\n", + "MongoDB Atlas supports pre-filtering of results on other fields. They require you to define which metadata fields you plan to filter on by updating the index you created initially. 
Here's an example:\n", + "\n", + "```json\n", + "{\n", + " \"fields\": [\n", + " {\n", + " \"numDimensions\": 1024,\n", + " \"path\": \"embedding\",\n", + " \"similarity\": \"euclidean\",\n", + " \"type\": \"vector\"\n", + " },\n", + " {\n", + " \"path\": \"source\",\n", + " \"type\": \"filter\"\n", + " }\n", + " ]\n", + "}\n", + "```\n", + "\n", + "Above, the first item in `fields` is the vector index, and the second item is the metadata property you want to filter on. The name of the property is the value of the `path` key. So the above index would allow us to search on a metadata field named `source`.\n", + "\n", + "Then, in your code you can use [MQL Query Operators](https://www.mongodb.com/docs/manual/reference/operator/query/) for filtering.\n", + "\n", + "The below example illustrates this:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "bc8f242e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* The powerhouse of the cell is the mitochondria [{\"_id\":\"1\",\"source\":\"https://example.com\"}]\n", + "* Mitochondria are made out of lipids [{\"_id\":\"3\",\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const filter = {\n", + " preFilter: {\n", + " source: {\n", + " $eq: \"https://example.com\",\n", + " },\n", + " },\n", + "}\n", + "\n", + "const filteredResults = await vectorStore.similaritySearch(\"biology\", 2, filter);\n", + "\n", + "for (const doc of filteredResults) {\n", + " console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "69326bba", + "metadata": {}, + "source": [ + "### Returning scores\n", + "\n", + "If you want to execute a similarity search and receive the corresponding scores you can run:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "5efd2eaa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* [SIM=0.374] The powerhouse of the cell is the mitochondria [{\"_id\":\"1\",\"source\":\"https://example.com\"}]\n", + "* [SIM=0.370] Mitochondria are made out of lipids [{\"_id\":\"3\",\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const similaritySearchWithScoreResults = await vectorStore.similaritySearchWithScore(\"biology\", 2, filter)\n", + "\n", + "for (const [doc, score] of similaritySearchWithScoreResults) {\n", + " console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0c235cdc", + "metadata": {}, + "source": [ + "### Query by turning into retriever\n", + "\n", + "You can also transform the vector store into a [retriever](/docs/concepts/#retrievers) for easier usage in your chains. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "f3460093", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " Document {\n", + " pageContent: 'The powerhouse of the cell is the mitochondria',\n", + " metadata: { _id: '1', source: 'https://example.com' },\n", + " id: undefined\n", + " },\n", + " Document {\n", + " pageContent: 'Mitochondria are made out of lipids',\n", + " metadata: { _id: '3', source: 'https://example.com' },\n", + " id: undefined\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "const retriever = vectorStore.asRetriever({\n", + " // Optional filter\n", + " filter: filter,\n", + " k: 2,\n", + "});\n", + "await retriever.invoke(\"biology\");" + ] + }, + { + "cell_type": "markdown", + "id": "e2e0a211", + "metadata": {}, + "source": [ + "### Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge).\n", + "- [How-to: Question and answer with RAG](/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](/docs/concepts#retrieval)" + ] + }, + { + "cell_type": "markdown", + "id": "069f1b5f", + "metadata": {}, + "source": [ + "## Closing connections\n", + "\n", + "Make sure you close the client instance when you are finished to avoid excessive resource consumption:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "f71ce986", + "metadata": {}, + "outputs": [], + "source": [ + "await client.close();" + ] + }, + { + "cell_type": "markdown", + "id": "8a27244f", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all `MongoDBAtlasVectorSearch` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_mongodb.MongoDBAtlasVectorSearch.html)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/core_docs/docs/integrations/vectorstores/mongodb_atlas.mdx b/docs/core_docs/docs/integrations/vectorstores/mongodb_atlas.mdx deleted file mode 100644 index dfe680b54353..000000000000 --- a/docs/core_docs/docs/integrations/vectorstores/mongodb_atlas.mdx +++ /dev/null @@ -1,131 +0,0 @@ ---- -sidebar_class_name: node-only ---- - -# MongoDB Atlas - -:::tip Compatibility -Only available on Node.js. - -You can still create API routes that use MongoDB with Next.js by setting the `runtime` variable to `nodejs` like so: - -```typescript -export const runtime = "nodejs"; -``` - -You can read more about Edge runtimes in the Next.js documentation [here](https://nextjs.org/docs/app/building-your-application/rendering/edge-and-nodejs-runtimes). -::: - -LangChain.js supports MongoDB Atlas as a vector store, and supports both standard similarity search and maximal marginal relevance search, -which takes a combination of documents are most similar to the inputs, then reranks and optimizes for diversity. 
- -## Setup - -### Installation - -First, add the Node MongoDB SDK to your project: - -```bash npm2yarn -npm install -S mongodb -``` - -### Initial Cluster Configuration - -Next, you'll need create a MongoDB Atlas cluster. Navigate to the [MongoDB Atlas website](https://www.mongodb.com/atlas/database) and create an account if you don't already have one. - -Create and name a cluster when prompted, then find it under `Database`. Select `Collections` and create either a blank collection or one from the provided sample data. - -** Note ** The cluster created must be MongoDB 7.0 or higher. If you are using a pre-7.0 version of MongoDB, you must use a version of langchainjs<=0.0.163. - -### Creating an Index - -After configuring your cluster, you'll need to create an index on the collection field you want to search over. - -Switch to the `Atlas Search` tab and click `Create Search Index`. From there, make sure you select `Atlas Vector Search - JSON Editor`, -then select the appropriate database and collection and paste the following into the textbox: - -```json -{ - "fields": [ - { - "numDimensions": 1024, - "path": "embedding", - "similarity": "euclidean", - "type": "vector" - } - ] -} -``` - -Note that the `dimensions` property should match the dimensionality of the embeddings you are using. -For example, Cohere embeddings have 1024 dimensions, and by default OpenAI embeddings have 1536: - -**Note:** By default the vector store expects an index name of `default`, an indexed collection field name of `embedding`, and a raw text field name of `text`. -You should initialize the vector store with field names matching your index name collection schema as shown below. - -Finally, proceed to build the index. - -## Usage - -import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx"; - - - -```bash npm2yarn -npm install @langchain/community -``` - -### Ingestion - -import CodeBlock from "@theme/CodeBlock"; -import Ingestion from "@examples/indexes/vector_stores/mongodb_atlas_fromTexts.ts"; - -{Ingestion} - -### Search - -import Search from "@examples/indexes/vector_stores/mongodb_atlas_search.ts"; - -{Search} - -### Maximal marginal relevance - -import MMRExample from "@examples/indexes/vector_stores/mongodb_mmr.ts"; - -{MMRExample} - -### Metadata filtering - -MongoDB Atlas supports pre-filtering of results on other fields. They require you to define which metadata fields -you plan to filter on by updating the index. Here's an example: - -```json -{ - "fields": [ - { - "numDimensions": 1024, - "path": "embedding", - "similarity": "euclidean", - "type": "vector" - }, - { - "path": "docstore_document_id", - "type": "filter" - } - ] -} -``` - -Above, the first item in `fields` is the vector index, and the second item is the metadata property you want to filter on. -The name of the property is `path`, so the above index would allow us to search on a metadata field named `docstore_document_id`. - -Then, in your code you can use [MQL Query Operators](https://www.mongodb.com/docs/manual/reference/operator/query/) for filtering. 
Here's an example: - -import MetadataExample from "@examples/indexes/vector_stores/mongodb_metadata_filtering.ts"; - -{MetadataExample} - -## Related - -- Vector store [conceptual guide](/docs/concepts/#vectorstores) -- Vector store [how-to guides](/docs/how_to/#vectorstores) diff --git a/docs/core_docs/docs/integrations/vectorstores/pgvector.ipynb b/docs/core_docs/docs/integrations/vectorstores/pgvector.ipynb new file mode 100644 index 000000000000..19f0a2690373 --- /dev/null +++ b/docs/core_docs/docs/integrations/vectorstores/pgvector.ipynb @@ -0,0 +1,629 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "1957f5cb", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: PGVector\n", + "sidebar_class_name: node-only\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ef1f0986", + "metadata": {}, + "source": [ + "# PGVectorStore\n", + "\n", + "```{=mdx}\n", + ":::tip Compatibility\n", + "Only available on Node.js.\n", + ":::\n", + "```\n", + "\n", + "To enable vector search in generic PostgreSQL databases, LangChain.js supports using the [`pgvector`](https://github.com/pgvector/pgvector) Postgres extension.\n", + "\n", + "This guide provides a quick overview for getting started with PGVector [vector stores](/docs/concepts/#vectorstores). For detailed documentation of all `PGVectorStore` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_pgvector.PGVectorStore.html)." + ] + }, + { + "cell_type": "markdown", + "id": "c824838d", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | [PY support](https://python.langchain.com/v0.2/docs/integrations/vectorstores/pgvector/) | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [`PGVectorStore`](https://api.js.langchain.com/classes/langchain_community_vectorstores_pgvector.PGVectorStore.html) | [`@langchain/community`](https://npmjs.com/@langchain/community) | ✅ | ![NPM - Version](https://img.shields.io/npm/v/@langchain/community?style=flat-square&label=%20&) |" + ] + }, + { + "cell_type": "markdown", + "id": "36fdc060", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "To use PGVector vector stores, you'll need to set up a Postgres instance with the [`pgvector`](https://github.com/pgvector/pgvector) extension enabled. You'll also need to install the `@langchain/community` integration package with the [`pg`](https://www.npmjs.com/package/pg) package as a peer dependency.\n", + "\n", + "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. You can also use [other supported embeddings models](/docs/integrations/text_embedding) if you wish.\n", + "\n", + "We'll also use the [`uuid`](https://www.npmjs.com/package/uuid) package to generate ids in the required format.\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "\n", + "\n", + "\n", + " @langchain/community pg @langchain/openai uuid\n", + "\n", + "```\n", + "\n", + "### Setting up an instance\n", + "\n", + "There are many ways to connect to Postgres depending on how you've set up your instance. 
Here's one example of a local setup using a prebuilt Docker image provided by the `pgvector` team.\n", + "\n", + "Create a file with the below content named docker-compose.yml:\n", + "\n", + "```yaml\n", + "# Run this command to start the database:\n", + "# docker-compose up --build\n", + "version: \"3\"\n", + "services:\n", + " db:\n", + " hostname: 127.0.0.1\n", + " image: pgvector/pgvector:pg16\n", + " ports:\n", + " - 5432:5432\n", + " restart: always\n", + " environment:\n", + " - POSTGRES_DB=api\n", + " - POSTGRES_USER=myuser\n", + " - POSTGRES_PASSWORD=ChangeMe\n", + " volumes:\n", + " - ./init.sql:/docker-entrypoint-initdb.d/init.sql\n", + "```\n", + "\n", + "And then in the same directory, run docker compose up to start the container.\n", + "\n", + "You can find more information on how to setup pgvector in the [official repository](https://github.com/pgvector/pgvector/).\n", + "\n", + "### Credentials\n", + "\n", + "To connect to you Postgres instance, you'll need corresponding credentials. For a full list of supported options, see the [`node-postgres` docs](https://node-postgres.com/apis/client).\n", + "\n", + "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n", + "\n", + "```typescript\n", + "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n", + "```\n", + "\n", + "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n", + "\n", + "```typescript\n", + "// process.env.LANGCHAIN_TRACING_V2=\"true\"\n", + "// process.env.LANGCHAIN_API_KEY=\"your-api-key\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "93df377e", + "metadata": {}, + "source": [ + "## Instantiation\n", + "\n", + "To instantiate the vector store, call the `.initialize()` static method. This will automatically check for the presence of a table, given by `tableName` in the passed `config`. If it is not there, it will create it with the required columns.\n", + "\n", + "```{=mdx}\n", + "\n", + "::::danger Security\n", + "User-generated data such as usernames should not be used as input for table and column names. 
\n", + "**This may lead to SQL Injection!**\n", + "::::\n", + "\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import {\n", + " PGVectorStore,\n", + " DistanceStrategy,\n", + "} from \"@langchain/community/vectorstores/pgvector\";\n", + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "import { PoolConfig } from \"pg\";\n", + "\n", + "const embeddings = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-small\",\n", + "});\n", + "\n", + "// Sample config\n", + "const config = {\n", + " postgresConnectionOptions: {\n", + " type: \"postgres\",\n", + " host: \"127.0.0.1\",\n", + " port: 5433,\n", + " user: \"myuser\",\n", + " password: \"ChangeMe\",\n", + " database: \"api\",\n", + " } as PoolConfig,\n", + " tableName: \"testlangchainjs\",\n", + " columns: {\n", + " idColumnName: \"id\",\n", + " vectorColumnName: \"vector\",\n", + " contentColumnName: \"content\",\n", + " metadataColumnName: \"metadata\",\n", + " },\n", + " // supported distance strategies: cosine (default), innerProduct, or euclidean\n", + " distanceStrategy: \"cosine\" as DistanceStrategy,\n", + "};\n", + "\n", + "const vectorStore = await PGVectorStore.initialize(\n", + " embeddings,\n", + " config\n", + ");" + ] + }, + { + "cell_type": "markdown", + "id": "ac6071d4", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "### Add items to vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "17f5efc0", + "metadata": {}, + "outputs": [], + "source": [ + "import { v4 as uuidv4 } from \"uuid\";\n", + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const document1: Document = {\n", + " pageContent: \"The powerhouse of the cell is the mitochondria\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document2: Document = {\n", + " pageContent: \"Buildings are made out of brick\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document3: Document = {\n", + " pageContent: \"Mitochondria are made out of lipids\",\n", + " metadata: { source: \"https://example.com\" }\n", + "};\n", + "\n", + "const document4: Document = {\n", + " pageContent: \"The 2024 Olympics are in Paris\",\n", + " metadata: { source: \"https://example.com\" }\n", + "}\n", + "\n", + "const documents = [document1, document2, document3, document4];\n", + "\n", + "const ids = [uuidv4(), uuidv4(), uuidv4(), uuidv4()]\n", + "\n", + "await vectorStore.addDocuments(documents, { ids: ids });" + ] + }, + { + "cell_type": "markdown", + "id": "dcf1b905", + "metadata": {}, + "source": [ + "### Delete items from vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "ef61e188", + "metadata": {}, + "outputs": [], + "source": [ + "const id4 = ids[ids.length - 1];\n", + "\n", + "await vectorStore.delete({ ids: [id4] });" + ] + }, + { + "cell_type": "markdown", + "id": "c3620501", + "metadata": {}, + "source": [ + "## Query vector store\n", + "\n", + "Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. 
\n", + "\n", + "### Query directly\n", + "\n", + "Performing a simple similarity search can be done as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "aa0a16fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n", + "* Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const filter = { source: \"https://example.com\" };\n", + "\n", + "const similaritySearchResults = await vectorStore.similaritySearch(\"biology\", 2, filter);\n", + "\n", + "for (const doc of similaritySearchResults) {\n", + " console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "3ed9d733", + "metadata": {}, + "source": [ + "The above filter syntax supports exact match, but the following are also supported:\n", + "\n", + "#### Using the `in` operator\n", + "\n", + "```json\n", + "{\n", + " \"field\": {\n", + " \"in\": [\"value1\", \"value2\"],\n", + " }\n", + "}\n", + "```\n", + "\n", + "#### Using the `arrayContains` operator\n", + "\n", + "```json\n", + "{\n", + " \"field\": {\n", + " \"arrayContains\": [\"value1\", \"value2\"],\n", + " }\n", + "}\n", + "```\n", + "\n", + "If you want to execute a similarity search and receive the corresponding scores you can run:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "5efd2eaa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* [SIM=0.835] The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n", + "* [SIM=0.852] Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n" + ] + } + ], + "source": [ + "const similaritySearchWithScoreResults = await vectorStore.similaritySearchWithScore(\"biology\", 2, filter)\n", + "\n", + "for (const [doc, score] of similaritySearchWithScoreResults) {\n", + " console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0c235cdc", + "metadata": {}, + "source": [ + "### Query by turning into retriever\n", + "\n", + "You can also transform the vector store into a [retriever](/docs/concepts/#retrievers) for easier usage in your chains. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "f3460093", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " Document {\n", + " pageContent: 'The powerhouse of the cell is the mitochondria',\n", + " metadata: { source: 'https://example.com' },\n", + " id: undefined\n", + " },\n", + " Document {\n", + " pageContent: 'Mitochondria are made out of lipids',\n", + " metadata: { source: 'https://example.com' },\n", + " id: undefined\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "const retriever = vectorStore.asRetriever({\n", + " // Optional filter\n", + " filter: filter,\n", + " k: 2,\n", + "});\n", + "await retriever.invoke(\"biology\");" + ] + }, + { + "cell_type": "markdown", + "id": "e2e0a211", + "metadata": {}, + "source": [ + "### Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge).\n", + "- [How-to: Question and answer with RAG](/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](/docs/concepts#retrieval)" + ] + }, + { + "cell_type": "markdown", + "id": "371727a8", + "metadata": {}, + "source": [ + "## Advanced: reusing connections\n", + "\n", + "You can reuse connections by creating a pool, then creating new `PGVectorStore` instances directly via the constructor.\n", + "\n", + "Note that you should call `.initialize()` to set up your database at least once to set up your tables properly before using the constructor." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "09efeac4", + "metadata": {}, + "outputs": [], + "source": [ + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "import { PGVectorStore } from \"@langchain/community/vectorstores/pgvector\";\n", + "import pg from \"pg\";\n", + "\n", + "// First, follow set-up instructions at\n", + "// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector\n", + "\n", + "const reusablePool = new pg.Pool({\n", + " host: \"127.0.0.1\",\n", + " port: 5433,\n", + " user: \"myuser\",\n", + " password: \"ChangeMe\",\n", + " database: \"api\",\n", + "});\n", + "\n", + "const originalConfig = {\n", + " pool: reusablePool,\n", + " tableName: \"testlangchainjs\",\n", + " collectionName: \"sample\",\n", + " collectionTableName: \"collections\",\n", + " columns: {\n", + " idColumnName: \"id\",\n", + " vectorColumnName: \"vector\",\n", + " contentColumnName: \"content\",\n", + " metadataColumnName: \"metadata\",\n", + " },\n", + "};\n", + "\n", + "// Set up the DB.\n", + "// Can skip this step if you've already initialized the DB.\n", + "// await PGVectorStore.initialize(new OpenAIEmbeddings(), originalConfig);\n", + "const pgvectorStore = new PGVectorStore(new OpenAIEmbeddings(), originalConfig);\n", + "\n", + "await pgvectorStore.addDocuments([\n", + " { pageContent: \"what's this\", metadata: { a: 2 } },\n", + " { pageContent: \"Cat drinks milk\", metadata: { a: 1 } },\n", + "]);\n", + "\n", + "const results = await pgvectorStore.similaritySearch(\"water\", 1);\n", + "\n", + "console.log(results);\n", + "\n", + "/*\n", + " [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]\n", + "*/\n", + "\n", + "const pgvectorStore2 = new PGVectorStore(new OpenAIEmbeddings(), {\n", + " pool: reusablePool,\n", + " tableName: \"testlangchainjs\",\n", + " 
collectionTableName: \"collections\",\n", +    "  collectionName: \"some_other_collection\",\n", +    "  columns: {\n", +    "    idColumnName: \"id\",\n", +    "    vectorColumnName: \"vector\",\n", +    "    contentColumnName: \"content\",\n", +    "    metadataColumnName: \"metadata\",\n", +    "  },\n", +    "});\n", +    "\n", +    "const results2 = await pgvectorStore2.similaritySearch(\"water\", 1);\n", +    "\n", +    "console.log(results2);\n", +    "\n", +    "/*\n", +    "  []\n", +    "*/\n", +    "\n", +    "await reusablePool.end();" +   ] +  }, +  { +   "cell_type": "markdown", +   "id": "23bd7096", +   "metadata": {}, +   "source": [ +    "## Create HNSW Index\n", +    "\n", +    "By default, the extension performs a sequential scan search, with 100% recall. You might consider creating an HNSW index for approximate nearest neighbor (ANN) search to speed up `similaritySearchVectorWithScore` execution time. To create the HNSW index on your vector column, use the `createHnswIndex()` method.\n", +    "\n", +    "The method parameters include:\n", +    "\n", +    "- `dimensions`: Defines the number of dimensions in your vector data type, up to 2000. For example, use 1536 for OpenAI's `text-embedding-ada-002` and Amazon's `amazon.titan-embed-text-v1` models.\n", +    "\n", +    "- `m?`: The max number of connections per layer (16 by default). Index build time improves with smaller values, while higher values can speed up search queries.\n", +    "\n", +    "- `efConstruction?`: The size of the dynamic candidate list for constructing the graph (64 by default). A higher value can potentially improve the index quality at the cost of index build time.\n", +    "\n", +    "- `distanceFunction?`: The name of the distance function you want to use; if not provided, it is selected automatically based on the `distanceStrategy`.\n", +    "\n", +    "For more info, see the [pgvector GitHub repo](https://github.com/pgvector/pgvector?tab=readme-ov-file#hnsw) and the [HNSW paper from Malkov, Yu A. and Yashunin, D. A., 2020:
Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs](https://arxiv.org/pdf/1603.09320)." +   ] +  }, +  { +   "cell_type": "code", +   "execution_count": null, +   "id": "5e5b9595", +   "metadata": {}, +   "outputs": [], +   "source": [ +    "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", +    "import {\n", +    "  DistanceStrategy,\n", +    "  PGVectorStore,\n", +    "} from \"@langchain/community/vectorstores/pgvector\";\n", +    "import { PoolConfig } from \"pg\";\n", +    "\n", +    "// First, follow set-up instructions at\n", +    "// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector\n", +    "\n", +    "const hnswConfig = {\n", +    "  postgresConnectionOptions: {\n", +    "    type: \"postgres\",\n", +    "    host: \"127.0.0.1\",\n", +    "    port: 5433,\n", +    "    user: \"myuser\",\n", +    "    password: \"ChangeMe\",\n", +    "    database: \"api\",\n", +    "  } as PoolConfig,\n", +    "  tableName: \"testlangchainjs\",\n", +    "  columns: {\n", +    "    idColumnName: \"id\",\n", +    "    vectorColumnName: \"vector\",\n", +    "    contentColumnName: \"content\",\n", +    "    metadataColumnName: \"metadata\",\n", +    "  },\n", +    "  // supported distance strategies: cosine (default), innerProduct, or euclidean\n", +    "  distanceStrategy: \"cosine\" as DistanceStrategy,\n", +    "};\n", +    "\n", +    "const hnswPgVectorStore = await PGVectorStore.initialize(\n", +    "  new OpenAIEmbeddings(),\n", +    "  hnswConfig\n", +    ");\n", +    "\n", +    "// Create the HNSW index on the vector column\n", +    "await hnswPgVectorStore.createHnswIndex({\n", +    "  dimensions: 1536,\n", +    "  efConstruction: 64,\n", +    "  m: 16,\n", +    "});\n", +    "\n", +    "await hnswPgVectorStore.addDocuments([\n", +    "  { pageContent: \"what's this\", metadata: { a: 2, b: [\"tag1\", \"tag2\"] } },\n", +    "  { pageContent: \"Cat drinks milk\", metadata: { a: 1, b: [\"tag2\"] } },\n", +    "]);\n", +    "\n", +    "const model = new OpenAIEmbeddings();\n", +    "const query = await model.embedQuery(\"water\");\n", +    "const hnswResults = await hnswPgVectorStore.similaritySearchVectorWithScore(query, 1);\n", +    "\n", +    "console.log(hnswResults);\n", +    "\n", +    "await hnswPgVectorStore.end();" +   ] +  }, +  { +   "cell_type": "markdown", +   "id": "069f1b5f", +   "metadata": {}, +   "source": [ +    "## Closing connections\n", +    "\n", +    "Make sure you close the connection when you are finished to avoid excessive resource consumption:" +   ] +  }, +  { +   "cell_type": "code", +   "execution_count": null, +   "id": "f71ce986", +   "metadata": {}, +   "outputs": [], +   "source": [ +    "await vectorStore.end();" +   ] +  }, +  { +   "cell_type": "markdown", +   "id": "8a27244f", +   "metadata": {}, +   "source": [ +    "## API reference\n", +    "\n", +    "For detailed documentation of all `PGVectorStore` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_pgvector.PGVectorStore.html)."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/core_docs/docs/integrations/vectorstores/pgvector.mdx b/docs/core_docs/docs/integrations/vectorstores/pgvector.mdx deleted file mode 100644 index 049ac928db79..000000000000 --- a/docs/core_docs/docs/integrations/vectorstores/pgvector.mdx +++ /dev/null @@ -1,101 +0,0 @@ -# PGVector - -To enable vector search in a generic PostgreSQL database, LangChain.js supports using the [`pgvector`](https://github.com/pgvector/pgvector) Postgres extension. - -## Setup - -To work with PGVector, you need to install the `pg` package: - -```bash npm2yarn -npm install pg -``` - -### Setup a `pgvector` self hosted instance with `docker-compose` - -import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx"; - - - -```bash npm2yarn -npm install @langchain/openai @langchain/community -``` - -`pgvector` provides a prebuilt Docker image that can be used to quickly setup a self-hosted Postgres instance. -Create a file below named `docker-compose.yml`: - -import CodeBlock from "@theme/CodeBlock"; -import DockerExample from "@examples/indexes/vector_stores/pgvector_vectorstore/docker-compose.example.yml"; - -```yml -# Run this command to start the database: -# docker-compose up --build -version: "3" -services: - db: - hostname: 127.0.0.1 - image: ankane/pgvector - ports: - - 5432:5432 - restart: always - environment: - - POSTGRES_DB=api - - POSTGRES_USER=myuser - - POSTGRES_PASSWORD=ChangeMe - volumes: - - ./init.sql:/docker-entrypoint-initdb.d/init.sql -``` - -And then in the same directory, run `docker compose up` to start the container. - -You can find more information on how to setup `pgvector` in the [official repository](https://github.com/pgvector/pgvector). - -## Usage - -::::danger Security -User-generated data such as usernames should not be used as input for table and column names. -**This may lead to SQL Injection!** -:::: - -import Example from "@examples/indexes/vector_stores/pgvector_vectorstore/pgvector.ts"; - -One complete example of using `PGVectorStore` is the following: - -{Example} - -You can also specify a `collectionTableName` and a `collectionName` to partition vectors between multiple users or namespaces. - -### Advanced: reusing connections - -You can reuse connections by creating a pool, then creating new `PGVectorStore` instances directly via the constructor. - -Note that you should call `.initialize()` to set up your database at least once to set up your tables properly -before using the constructor. - -import ConnectionReuseExample from "@examples/indexes/vector_stores/pgvector_vectorstore/pgvector_pool.ts"; - -{ConnectionReuseExample} - -### Create HNSW Index - -By default, the extension performs a sequential scan search, with 100% recall. You might consider creating an HNSW index for approximate nearest neighbor (ANN) search to speed up similaritySearchVectorWithScore execution time. To create the HNSW index on your vector column, use the `createHnswIndex()` method: - -The method parameters include: - -**dimensions**: Defines the number of dimensions in your vector data type, up to 2000. 
For example, use 1536 for OpenAI's `text-embedding-ada-002` and Amazon's `amazon.titan-embed-text-v1` models. - -**m?**: The max number of connections per layer (16 by default). Index build time improves with smaller values, while higher values can speed up search queries. - -**efConstruction?**: The size of the dynamic candidate list for constructing the graph (64 by default). A higher value can potentially improve the index quality at the cost of index build time. - -**distanceFunction?**: The distance function name you want to use, is automatically selected based on the distanceStrategy. - -More info at the [`Pgvector GitHub project`](https://github.com/pgvector/pgvector?tab=readme-ov-file#hnsw) and the HNSW paper from Malkov Yu A. and Yashunin D. A.. 2020. [`Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs`](https://arxiv.org/pdf/1603.09320) - -import HnswExample from "@examples/indexes/vector_stores/pgvector_vectorstore/pgvector_hnsw.ts"; - -{HnswExample} - -## Related - -- Vector store [conceptual guide](/docs/concepts/#vectorstores) -- Vector store [how-to guides](/docs/how_to/#vectorstores) diff --git a/test-int-deps-docker-compose.yml b/test-int-deps-docker-compose.yml index c3a5901b6134..e14c4d65779a 100644 --- a/test-int-deps-docker-compose.yml +++ b/test-int-deps-docker-compose.yml @@ -24,7 +24,7 @@ services: DEFAULT_VECTORIZER_MODULE: 'none' CLUSTER_HOSTNAME: 'node1' db: - image: ankane/pgvector + image: pgvector/pgvector:pg16 ports: - 5433:5432 volumes: