diff --git a/docs/core_docs/docs/integrations/text_embedding/openai.ipynb b/docs/core_docs/docs/integrations/text_embedding/openai.ipynb
new file mode 100644
index 000000000000..3c820eec3d55
--- /dev/null
+++ b/docs/core_docs/docs/integrations/text_embedding/openai.ipynb
@@ -0,0 +1,418 @@
+{
+ "cells": [
+ {
+ "cell_type": "raw",
+ "id": "afaf8039",
+ "metadata": {
+ "vscode": {
+ "languageId": "raw"
+ }
+ },
+ "source": [
+ "---\n",
+ "sidebar_label: OpenAI\n",
+ "---"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a3d6f34",
+ "metadata": {},
+ "source": [
+ "# OpenAI\n",
+ "\n",
+ "This will help you get started with OpenAIEmbeddings [embedding models](/docs/concepts#embedding-models) using LangChain. For detailed documentation on `OpenAIEmbeddings` features and configuration options, please refer to the [API reference](https://api.js.langchain.com/classes/langchain_openai.OpenAIEmbeddings.html).\n",
+ "\n",
+ "## Overview\n",
+ "### Integration details\n",
+ "\n",
+ "| Class | Package | Local | [Py support](https://python.langchain.com/docs/integrations/text_embedding/openai/) | Package downloads | Package latest |\n",
+ "| :--- | :--- | :---: | :---: | :---: | :---: |\n",
+ "| [OpenAIEmbeddings](https://api.js.langchain.com/classes/langchain_openai.OpenAIEmbeddings.html) | [@langchain/openai](https://api.js.langchain.com/modules/langchain_openai.html) | ❌ | ✅ | ![NPM - Downloads](https://img.shields.io/npm/dm/@langchain/openai?style=flat-square&label=%20&) | ![NPM - Version](https://img.shields.io/npm/v/@langchain/openai?style=flat-square&label=%20&) |\n",
+ "\n",
+ "## Setup\n",
+ "\n",
+ "To access OpenAIEmbeddings embedding models you'll need to create an OpenAI account, get an API key, and install the `@langchain/openai` integration package.\n",
+ "\n",
+ "### Credentials\n",
+ "\n",
+ "Head to [platform.openai.com](https://platform.openai.com) to sign up to OpenAI and generate an API key. Once you've done this set the `OPENAI_API_KEY` environment variable:\n",
+ "\n",
+ "```bash\n",
+ "export OPENAI_API_KEY=\"your-api-key\"\n",
+ "```\n",
+ "\n",
+ "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n",
+ "\n",
+ "```bash\n",
+ "# export LANGCHAIN_TRACING_V2=\"true\"\n",
+ "# export LANGCHAIN_API_KEY=\"your-api-key\"\n",
+ "```\n",
+ "\n",
+ "### Installation\n",
+ "\n",
+ "The LangChain OpenAIEmbeddings integration lives in the `@langchain/openai` package:\n",
+ "\n",
+ "```{=mdx}\n",
+ "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
+ "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ " @langchain/openai\n",
+ "\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45dd1724",
+ "metadata": {},
+ "source": [
+ "## Instantiation\n",
+ "\n",
+ "Now we can instantiate our model object and generate chat completions:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "9ea7a09b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+ "\n",
+ "const embeddings = new OpenAIEmbeddings({\n",
+ " apiKey: \"YOUR-API-KEY\", // In Node.js defaults to process.env.OPENAI_API_KEY\n",
+ " batchSize: 512, // Default value if omitted is 512. Max is 2048\n",
+ " model: \"text-embedding-3-large\",\n",
+ "});"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fb4153d3",
+ "metadata": {},
+ "source": [
+ "If you're part of an organization, you can set `process.env.OPENAI_ORGANIZATION` to your OpenAI organization id, or pass it in as `organization` when\n",
+ "initializing the model."
+ ]
+ },
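+ {
+ "cell_type": "markdown",
+ "id": "fb4153d4",
+ "metadata": {},
+ "source": [
+ "Here is a minimal sketch; the organization ID shown is a placeholder, not a real value:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fb4153d5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+ "\n",
+ "// Pass the organization ID directly instead of relying on\n",
+ "// process.env.OPENAI_ORGANIZATION. \"org-...\" is a placeholder.\n",
+ "const embeddingsWithOrg = new OpenAIEmbeddings({\n",
+ " model: \"text-embedding-3-large\",\n",
+ " organization: \"org-...\",\n",
+ "});"
+ ]
+ },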
+ {
+ "cell_type": "markdown",
+ "id": "77d271b6",
+ "metadata": {},
+ "source": [
+ "## Indexing and Retrieval\n",
+ "\n",
+ "Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
+ "\n",
+ "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document using the demo [`MemoryVectorStore`](/docs/integrations/vectorstores/memory)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "d817716b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "LangChain is the framework for building context-aware reasoning applications\n"
+ ]
+ }
+ ],
+ "source": [
+ "// Create a vector store with a sample text\n",
+ "import { MemoryVectorStore } from \"langchain/vectorstores/memory\";\n",
+ "\n",
+ "const text = \"LangChain is the framework for building context-aware reasoning applications\";\n",
+ "\n",
+ "const vectorstore = await MemoryVectorStore.fromDocuments(\n",
+ " [{ pageContent: text, metadata: {} }],\n",
+ " embeddings,\n",
+ ");\n",
+ "\n",
+ "// Use the vector store as a retriever that returns a single document\n",
+ "const retriever = vectorstore.asRetriever(1);\n",
+ "\n",
+ "// Retrieve the most similar text\n",
+ "const retrievedDocuments = await retriever.invoke(\"What is LangChain?\");\n",
+ "\n",
+ "retrievedDocuments[0].pageContent;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e02b9855",
+ "metadata": {},
+ "source": [
+ "## Direct Usage\n",
+ "\n",
+ "Under the hood, the vectorstore and retriever implementations are calling `embeddings.embedDocument(...)` and `embeddings.embedQuery(...)` to create embeddings for the text(s) used in `fromDocuments` and the retriever's `invoke` operations, respectively.\n",
+ "\n",
+ "You can directly call these methods to get embeddings for your own use cases.\n",
+ "\n",
+ "### Embed single texts\n",
+ "\n",
+ "You can embed queries for search with `embedQuery`. This generates a vector representation specific to the query:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "0d2befcd",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[\n",
+ " -0.01927683, 0.0037708976, -0.032942563, 0.0037671267, 0.008175306,\n",
+ " -0.012511838, -0.009713832, 0.021403614, -0.015377721, 0.0018684798,\n",
+ " 0.020574018, 0.022399133, -0.02322873, -0.01524951, -0.00504169,\n",
+ " -0.007375876, -0.03448109, 0.00015130726, 0.021388533, -0.012564631,\n",
+ " -0.020031009, 0.027406884, -0.039217334, 0.03036327, 0.030393435,\n",
+ " -0.021750538, 0.032610722, -0.021162277, -0.025898525, 0.018869571,\n",
+ " 0.034179416, -0.013371604, 0.0037652412, -0.02146395, 0.0012641934,\n",
+ " -0.055688616, 0.05104287, 0.0024982197, -0.019095825, 0.0037369595,\n",
+ " 0.00088757504, 0.025189597, -0.018779071, 0.024978427, 0.016833287,\n",
+ " -0.0025868358, -0.011727491, -0.0021154736, -0.017738303, 0.0013839195,\n",
+ " -0.0131151825, -0.05405959, 0.029729757, -0.003393808, 0.019774588,\n",
+ " 0.028885076, 0.004355387, 0.026094612, 0.06479911, 0.038040817,\n",
+ " -0.03478276, -0.012594799, -0.024767255, -0.0031430433, 0.017874055,\n",
+ " -0.015294761, 0.005709139, 0.025355516, 0.044798266, 0.02549127,\n",
+ " -0.02524993, 0.00014553308, -0.019427665, -0.023545485, 0.008748483,\n",
+ " 0.019850006, -0.028417485, -0.001860938, -0.02318348, -0.010799851,\n",
+ " 0.04793565, -0.0048983963, 0.02193154, -0.026411368, 0.026426451,\n",
+ " -0.012149832, 0.035355937, -0.047814984, -0.027165547, -0.008228099,\n",
+ " -0.007737882, 0.023726488, -0.046487626, -0.007783133, -0.019638835,\n",
+ " 0.01793439, -0.018024892, 0.0030336871, -0.019578502, 0.0042837397\n",
+ "]\n"
+ ]
+ }
+ ],
+ "source": [
+ "const singleVector = await embeddings.embedQuery(text);\n",
+ "\n",
+ "console.log(singleVector.slice(0, 100));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b5a7d03",
+ "metadata": {},
+ "source": [
+ "### Embed multiple texts\n",
+ "\n",
+ "You can embed multiple texts for indexing with `embedDocuments`. The internals used for this method may (but do not have to) differ from embedding queries:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "2f4d6e97",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[\n",
+ " -0.01927683, 0.0037708976, -0.032942563, 0.0037671267, 0.008175306,\n",
+ " -0.012511838, -0.009713832, 0.021403614, -0.015377721, 0.0018684798,\n",
+ " 0.020574018, 0.022399133, -0.02322873, -0.01524951, -0.00504169,\n",
+ " -0.007375876, -0.03448109, 0.00015130726, 0.021388533, -0.012564631,\n",
+ " -0.020031009, 0.027406884, -0.039217334, 0.03036327, 0.030393435,\n",
+ " -0.021750538, 0.032610722, -0.021162277, -0.025898525, 0.018869571,\n",
+ " 0.034179416, -0.013371604, 0.0037652412, -0.02146395, 0.0012641934,\n",
+ " -0.055688616, 0.05104287, 0.0024982197, -0.019095825, 0.0037369595,\n",
+ " 0.00088757504, 0.025189597, -0.018779071, 0.024978427, 0.016833287,\n",
+ " -0.0025868358, -0.011727491, -0.0021154736, -0.017738303, 0.0013839195,\n",
+ " -0.0131151825, -0.05405959, 0.029729757, -0.003393808, 0.019774588,\n",
+ " 0.028885076, 0.004355387, 0.026094612, 0.06479911, 0.038040817,\n",
+ " -0.03478276, -0.012594799, -0.024767255, -0.0031430433, 0.017874055,\n",
+ " -0.015294761, 0.005709139, 0.025355516, 0.044798266, 0.02549127,\n",
+ " -0.02524993, 0.00014553308, -0.019427665, -0.023545485, 0.008748483,\n",
+ " 0.019850006, -0.028417485, -0.001860938, -0.02318348, -0.010799851,\n",
+ " 0.04793565, -0.0048983963, 0.02193154, -0.026411368, 0.026426451,\n",
+ " -0.012149832, 0.035355937, -0.047814984, -0.027165547, -0.008228099,\n",
+ " -0.007737882, 0.023726488, -0.046487626, -0.007783133, -0.019638835,\n",
+ " 0.01793439, -0.018024892, 0.0030336871, -0.019578502, 0.0042837397\n",
+ "]\n",
+ "[\n",
+ " -0.010181213, 0.023419594, -0.04215527, -0.0015320902, -0.023573855,\n",
+ " -0.0091644935, -0.014893179, 0.019016149, -0.023475688, 0.0010219777,\n",
+ " 0.009255648, 0.03996757, -0.04366983, -0.01640774, -0.020194141,\n",
+ " 0.019408813, -0.027977299, -0.022017224, 0.013539891, -0.007769135,\n",
+ " 0.032647192, -0.015089511, -0.022900717, 0.023798235, 0.026084099,\n",
+ " -0.024625633, 0.035003178, -0.017978394, -0.049615882, 0.013364594,\n",
+ " 0.031132633, 0.019142363, 0.023195215, -0.038396914, 0.005584942,\n",
+ " -0.031946007, 0.053682756, -0.0036356465, 0.011240003, 0.0056690844,\n",
+ " -0.0062791156, 0.044146635, -0.037387207, 0.01300699, 0.018946031,\n",
+ " 0.0050415234, 0.029618073, -0.021750772, -0.000649473, 0.00026951815,\n",
+ " -0.014710871, -0.029814405, 0.04204308, -0.014710871, 0.0039616977,\n",
+ " -0.021512369, 0.054608323, 0.021484323, 0.02790718, -0.010573876,\n",
+ " -0.023952495, -0.035143413, -0.048802506, -0.0075798146, 0.023279356,\n",
+ " -0.022690361, -0.016590048, 0.0060477243, 0.014100839, 0.005476258,\n",
+ " -0.017221114, -0.0100059165, -0.017922299, -0.021989176, 0.01830094,\n",
+ " 0.05516927, 0.001033372, 0.0017310516, -0.00960624, -0.037864015,\n",
+ " 0.013063084, 0.006591143, -0.010160177, 0.0011394264, 0.04953174,\n",
+ " 0.004806626, 0.029421741, -0.037751824, 0.003618117, 0.007162609,\n",
+ " 0.027696826, -0.0021070621, -0.024485396, -0.0042141243, -0.02801937,\n",
+ " -0.019605145, 0.016281527, -0.035143413, 0.01640774, 0.042323552\n",
+ "]\n"
+ ]
+ }
+ ],
+ "source": [
+ "const text2 = \"LangGraph is a library for building stateful, multi-actor applications with LLMs\";\n",
+ "\n",
+ "const vectors = await embeddings.embedDocuments([text, text2]);\n",
+ "\n",
+ "console.log(vectors[0].slice(0, 100));\n",
+ "console.log(vectors[1].slice(0, 100));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b1a3527",
+ "metadata": {},
+ "source": [
+ "## Specifying dimensions\n",
+ "\n",
+ "With the `text-embedding-3` class of models, you can specify the size of the embeddings you want returned. For example by default `text-embedding-3-large` returns embeddings of dimension 3072:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "a611fe1a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "3072\n"
+ ]
+ }
+ ],
+ "source": [
+ "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+ "\n",
+ "const embeddingsDefaultDimensions = new OpenAIEmbeddings({\n",
+ " model: \"text-embedding-3-large\",\n",
+ "});\n",
+ "\n",
+ "const vectorsDefaultDimensions = await embeddingsDefaultDimensions.embedDocuments([\"some text\"]);\n",
+ "console.log(vectorsDefaultDimensions[0].length);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "08efe771",
+ "metadata": {},
+ "source": [
+ "But by passing in `dimensions: 1024` we can reduce the size of our embeddings to 1024:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "19667fdb",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1024\n"
+ ]
+ }
+ ],
+ "source": [
+ "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+ "\n",
+ "const embeddings1024 = new OpenAIEmbeddings({\n",
+ " model: \"text-embedding-3-large\",\n",
+ " dimensions: 1024,\n",
+ "});\n",
+ "\n",
+ "const vectors1024 = await embeddings1024.embedDocuments([\"some text\"]);\n",
+ "console.log(vectors1024[0].length);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6b84c0df",
+ "metadata": {},
+ "source": [
+ "## Custom URLs\n",
+ "\n",
+ "You can customize the base URL the SDK sends requests to by passing a `configuration` parameter like this:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3bfa20a6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+ "\n",
+ "const model = new OpenAIEmbeddings({\n",
+ " configuration: {\n",
+ " baseURL: \"https://your_custom_url.com\",\n",
+ " },\n",
+ "});"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ac3cac9b",
+ "metadata": {},
+ "source": [
+ "You can also pass other `ClientOptions` parameters accepted by the official SDK.\n",
+ "\n",
+ "If you are hosting on Azure OpenAI, see the [dedicated page instead](/docs/integrations/text_embedding/azure_openai)."
+ ]
+ },
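+ {
+ "cell_type": "markdown",
+ "id": "ac3cac9c",
+ "metadata": {},
+ "source": [
+ "Below is a minimal sketch of passing extra client options. `timeout` and `maxRetries` are standard OpenAI SDK `ClientOptions` fields; the values here are illustrative only:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ac3cac9d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+ "\n",
+ "const modelWithOptions = new OpenAIEmbeddings({\n",
+ " configuration: {\n",
+ " timeout: 10000, // request timeout in milliseconds\n",
+ " maxRetries: 2, // automatic retries on transient errors\n",
+ " },\n",
+ "});"
+ ]
+ },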
+ {
+ "cell_type": "markdown",
+ "id": "8938e581",
+ "metadata": {},
+ "source": [
+ "## API reference\n",
+ "\n",
+ "For detailed documentation of all OpenAIEmbeddings features and configurations head to the API reference: https://api.js.langchain.com/classes/langchain_openai.OpenAIEmbeddings.html"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "TypeScript",
+ "language": "typescript",
+ "name": "tslab"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "mode": "typescript",
+ "name": "javascript",
+ "typescript": true
+ },
+ "file_extension": ".ts",
+ "mimetype": "text/typescript",
+ "name": "typescript",
+ "version": "3.7.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/core_docs/docs/integrations/text_embedding/openai.mdx b/docs/core_docs/docs/integrations/text_embedding/openai.mdx
deleted file mode 100644
index bba94b0777ee..000000000000
--- a/docs/core_docs/docs/integrations/text_embedding/openai.mdx
+++ /dev/null
@@ -1,77 +0,0 @@
----
-keywords: [openaiembeddings]
----
-
-# OpenAI
-
-The `OpenAIEmbeddings` class uses the OpenAI API to generate embeddings for a given text. By default it strips new line characters from the text, as recommended by OpenAI, but you can disable this by passing `stripNewLines: false` to the constructor.
-
-import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
-
-
-
-```bash npm2yarn
-npm install @langchain/openai
-```
-
-```typescript
-import { OpenAIEmbeddings } from "@langchain/openai";
-
-const embeddings = new OpenAIEmbeddings({
- apiKey: "YOUR-API-KEY", // In Node.js defaults to process.env.OPENAI_API_KEY
- batchSize: 512, // Default value if omitted is 512. Max is 2048
- model: "text-embedding-3-large",
-});
-```
-
-If you're part of an organization, you can set `process.env.OPENAI_ORGANIZATION` to your OpenAI organization id, or pass it in as `organization` when
-initializing the model.
-
-## Specifying dimensions
-
-With the `text-embedding-3` class of models, you can specify the size of the embeddings you want returned. For example by default `text-embedding-3-large` returns embeddings of dimension 3072:
-
-```typescript
-const embeddings = new OpenAIEmbeddings({
- model: "text-embedding-3-large",
-});
-
-const vectors = await embeddings.embedDocuments(["some text"]);
-console.log(vectors[0].length);
-```
-
-```
-3072
-```
-
-But by passing in `dimensions: 1024` we can reduce the size of our embeddings to 1024:
-
-```typescript
-const embeddings1024 = new OpenAIEmbeddings({
- model: "text-embedding-3-large",
- dimensions: 1024,
-});
-
-const vectors2 = await embeddings1024.embedDocuments(["some text"]);
-console.log(vectors2[0].length);
-```
-
-```
-1024
-```
-
-## Custom URLs
-
-You can customize the base URL the SDK sends requests to by passing a `configuration` parameter like this:
-
-```typescript
-const model = new OpenAIEmbeddings({
- configuration: {
- baseURL: "https://your_custom_url.com",
- },
-});
-```
-
-You can also pass other `ClientOptions` parameters accepted by the official SDK.
-
-If you are hosting on Azure OpenAI, see the [dedicated page instead](/docs/integrations/text_embedding/azure_openai).
diff --git a/libs/langchain-scripts/src/cli/docs/embeddings.ts b/libs/langchain-scripts/src/cli/docs/embeddings.ts
new file mode 100644
index 000000000000..03e1cc96f99e
--- /dev/null
+++ b/libs/langchain-scripts/src/cli/docs/embeddings.ts
@@ -0,0 +1,186 @@
+import * as path from "node:path";
+import * as fs from "node:fs";
+import {
+ boldText,
+ getUserInput,
+ greenText,
+ redBackground,
+} from "../utils/get-input.js";
+
+const PACKAGE_NAME_PLACEHOLDER = "__package_name__";
+const MODULE_NAME_PLACEHOLDER = "__ModuleName__";
+const SIDEBAR_LABEL_PLACEHOLDER = "__sidebar_label__";
+const FULL_IMPORT_PATH_PLACEHOLDER = "__full_import_path__";
+const LOCAL_PLACEHOLDER = "__local__";
+const PY_SUPPORT_PLACEHOLDER = "__py_support__";
+const ENV_VAR_NAME_PLACEHOLDER = "__env_var_name__";
+const API_REF_MODULE_PLACEHOLDER = "__api_ref_module__";
+const API_REF_PACKAGE_PLACEHOLDER = "__api_ref_package__";
+const PYTHON_DOC_URL_PLACEHOLDER = "__python_doc_url__";
+
+const TEMPLATE_PATH = path.resolve(
+ "./src/cli/docs/templates/text_embedding.ipynb"
+);
+const INTEGRATIONS_DOCS_PATH = path.resolve(
+ "../../docs/core_docs/docs/integrations/text_embedding"
+);
+
+const fetchAPIRefUrl = async (url: string): Promise<boolean> => {
+ try {
+ const res = await fetch(url);
+ if (res.status !== 200) {
+ throw new Error(`API Reference URL ${url} not found.`);
+ }
+ return true;
+ } catch (_) {
+ return false;
+ }
+};
+
+type ExtraFields = {
+ local: boolean;
+ pySupport: boolean;
+ packageName: string;
+ fullImportPath?: string;
+ envVarName: string;
+};
+
+async function promptExtraFields(fields: {
+ envVarGuess: string;
+ packageNameGuess: string;
+ isCommunity: boolean;
+}): Promise<ExtraFields> {
+ const { envVarGuess, packageNameGuess, isCommunity } = fields;
+ const canRunLocally = await getUserInput(
+ "Does this embeddings model support local usage? (y/n) ",
+ undefined,
+ true
+ );
+ const hasPySupport = await getUserInput(
+ "Does this integration have Python support? (y/n) ",
+ undefined,
+ true
+ );
+
+ let packageName = packageNameGuess;
+ if (!isCommunity) {
+ // If it's not community, get the package name.
+
+ const isOtherPackageName = await getUserInput(
+ `Is this integration part of the ${packageNameGuess} package? (y/n) `
+ );
+ if (isOtherPackageName.toLowerCase() === "n") {
+ packageName = await getUserInput(
+ "What is the name of the package this integration is located in? (e.g @langchain/openai) ",
+ undefined,
+ true
+ );
+ if (
+ !packageName.startsWith("@langchain/") &&
+ !packageName.startsWith("langchain/")
+ ) {
+ packageName = await getUserInput(
+ "Packages must start with either '@langchain/' or 'langchain/'. Please enter a valid package name: ",
+ undefined,
+ true
+ );
+ }
+ }
+ }
+
+ // If it's community or langchain, ask for the full import path
+ let fullImportPath: string | undefined;
+ if (
+ packageName.startsWith("@langchain/community") ||
+ packageName.startsWith("langchain/")
+ ) {
+ fullImportPath = await getUserInput(
+ "What is the full import path of the package? (e.g '@langchain/community/embeddings/togetherai') ",
+ undefined,
+ true
+ );
+ }
+
+ const envVarName = await getUserInput(
+ `Is the environment variable for the API key named ${envVarGuess}? If it is, reply with 'y', else reply with the correct name: `,
+ undefined,
+ true
+ );
+
+ return {
+ local: canRunLocally.toLowerCase() === "y",
+ pySupport: hasPySupport.toLowerCase() === "y",
+ packageName,
+ fullImportPath,
+ envVarName:
+ envVarName.toLowerCase() === "y" ? envVarGuess : envVarName.toUpperCase(),
+ };
+}
+
+export async function fillEmbeddingsIntegrationDocTemplate(fields: {
+ packageName: string;
+ moduleName: string;
+ isCommunity: boolean;
+}) {
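+ // Derive the sidebar label from the module name, e.g. "OpenAIEmbeddings" -> "OpenAI".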
+ const sidebarLabel = fields.moduleName.replace("Embeddings", "");
+ const pyDocUrl = `https://python.langchain.com/docs/integrations/text_embedding/${sidebarLabel.toLowerCase()}/`;
+ let envVarName = `${sidebarLabel.toUpperCase()}_API_KEY`;
+ const extraFields = await promptExtraFields({
+ packageNameGuess: `@langchain/${fields.packageName}`,
+ envVarGuess: envVarName,
+ isCommunity: fields.isCommunity,
+ });
+ envVarName = extraFields.envVarName;
+ const { pySupport } = extraFields;
+ const localSupport = extraFields.local;
+ const { packageName } = extraFields;
+ const fullImportPath = extraFields.fullImportPath ?? extraFields.packageName;
+
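+ // Convert the import path into the API ref URL slug, e.g.
+ // "@langchain/community/embeddings/togetherai" becomes
+ // "langchain_community_embeddings_togetherai".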
+ const apiRefModuleUrl = `https://api.js.langchain.com/classes/${fullImportPath
+ .replace("@", "")
+ .replaceAll("/", "_")
+ .replaceAll("-", "_")}.${fields.moduleName}.html`;
+ const apiRefPackageUrl = apiRefModuleUrl
+ .replace("/classes/", "/modules/")
+ .replace(`.${fields.moduleName}.html`, ".html");
+
+ const apiRefUrlSuccesses = await Promise.all([
+ fetchAPIRefUrl(apiRefModuleUrl),
+ fetchAPIRefUrl(apiRefPackageUrl),
+ ]);
+ if (apiRefUrlSuccesses.some((success) => !success)) {
+ console.warn(
+ "API ref URLs invalid. Please manually ensure they are correct."
+ );
+ }
+
+ const docTemplate = (await fs.promises.readFile(TEMPLATE_PATH, "utf-8"))
+ .replaceAll(PACKAGE_NAME_PLACEHOLDER, packageName)
+ .replaceAll(MODULE_NAME_PLACEHOLDER, fields.moduleName)
+ .replaceAll(SIDEBAR_LABEL_PLACEHOLDER, sidebarLabel)
+ .replaceAll(FULL_IMPORT_PATH_PLACEHOLDER, fullImportPath)
+ .replaceAll(LOCAL_PLACEHOLDER, localSupport ? "✅" : "❌")
+ .replaceAll(PY_SUPPORT_PLACEHOLDER, pySupport ? "✅" : "❌")
+ .replaceAll(ENV_VAR_NAME_PLACEHOLDER, envVarName)
+ .replaceAll(API_REF_MODULE_PLACEHOLDER, apiRefModuleUrl)
+ .replaceAll(API_REF_PACKAGE_PLACEHOLDER, apiRefPackageUrl)
+ .replaceAll(PYTHON_DOC_URL_PLACEHOLDER, pyDocUrl);
+
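+ // Name the generated doc after the last segment of the import path,
+ // e.g. "@langchain/community/embeddings/togetherai" -> "togetherai.ipynb".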
+ const docFileName = fullImportPath.split("/").pop();
+ const docPath = path.join(INTEGRATIONS_DOCS_PATH, `${docFileName}.ipynb`);
+ await fs.promises.writeFile(docPath, docTemplate);
+ const prettyDocPath = docPath.split("docs/core_docs/")[1];
+
+ const updatePythonDocUrlText = ` ${redBackground(
+ "- Update the Python documentation URL with the proper URL."
+ )}`;
+ const successText = `\nSuccessfully created new embeddings integration doc at ${prettyDocPath}.`;
+
+ console.log(
+ `${greenText(successText)}\n
+${boldText("Next steps:")}
+${extraFields?.pySupport ? updatePythonDocUrlText : ""}
+ - Run all code cells in the generated doc to record the outputs.
+ - Add extra sections on integration specific features.\n`
+ );
+}
diff --git a/libs/langchain-scripts/src/cli/docs/index.ts b/libs/langchain-scripts/src/cli/docs/index.ts
index d664a220a240..1baba8b900bf 100644
--- a/libs/langchain-scripts/src/cli/docs/index.ts
+++ b/libs/langchain-scripts/src/cli/docs/index.ts
@@ -5,6 +5,7 @@ import { Command } from "commander";
import { fillChatIntegrationDocTemplate } from "./chat.js";
import { fillDocLoaderIntegrationDocTemplate } from "./document_loaders.js";
import { fillLLMIntegrationDocTemplate } from "./llms.js";
+import { fillEmbeddingsIntegrationDocTemplate } from "./embeddings.js";
type CLIInput = {
package: string;
@@ -57,9 +58,16 @@ async function main() {
isCommunity,
});
break;
+ case "embeddings":
+ await fillEmbeddingsIntegrationDocTemplate({
+ packageName,
+ moduleName,
+ isCommunity,
+ });
+ break;
default:
console.error(
- `Invalid type: ${type}.\nOnly 'chat', 'llm' and 'doc_loader' are supported at this time.`
+ `Invalid type: ${type}.\nOnly 'chat', 'llm', 'embeddings' and 'doc_loader' are supported at this time.`
);
process.exit(1);
}
diff --git a/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
new file mode 100644
index 000000000000..48d85f8a3afc
--- /dev/null
+++ b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
@@ -0,0 +1,228 @@
+{
+ "cells": [
+ {
+ "cell_type": "raw",
+ "id": "afaf8039",
+ "metadata": {
+ "vscode": {
+ "languageId": "raw"
+ }
+ },
+ "source": [
+ "---\n",
+ "sidebar_label: __sidebar_label__\n",
+ "---"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a3d6f34",
+ "metadata": {},
+ "source": [
+ "# __ModuleName__\n",
+ "\n",
+ "- [ ] TODO: Make sure API reference link is correct\n",
+ "\n",
+ "This will help you get started with __ModuleName__ [embedding models](/docs/concepts#embedding-models) using LangChain. For detailed documentation on `__ModuleName__` features and configuration options, please refer to the [API reference](__api_ref_module__).\n",
+ "\n",
+ "## Overview\n",
+ "### Integration details\n",
+ "\n",
+ "- TODO: Fill in table features.\n",
+ "- TODO: Remove JS support link if not relevant, otherwise ensure link is correct.\n",
+ "- TODO: Make sure API reference links are correct.\n",
+ "\n",
+ "| Class | Package | Local | [Py support](__python_doc_url__) | Package downloads | Package latest |\n",
+ "| :--- | :--- | :---: | :---: | :---: | :---: |\n",
+ "| [__ModuleName__](__api_ref_module__) | [__package_name__](__api_ref_package__) | __local__ | __py_support__ | ![NPM - Downloads](https://img.shields.io/npm/dm/__package_name__?style=flat-square&label=%20&) | ![NPM - Version](https://img.shields.io/npm/v/__package_name__?style=flat-square&label=%20&) |\n",
+ "\n",
+ "## Setup\n",
+ "\n",
+ "- [ ] TODO: Update with relevant info.\n",
+ "\n",
+ "To access __sidebar_label__ embedding models you'll need to create a/an __ModuleName__ account, get an API key, and install the `__package_name__` integration package.\n",
+ "\n",
+ "### Credentials\n",
+ "\n",
+ "- TODO: Update with relevant info.\n",
+ "\n",
+ "Head to (TODO: link) to sign up to `__sidebar_label__` and generate an API key. Once you've done this set the `__env_var_name__` environment variable:\n",
+ "\n",
+ "```bash\n",
+ "export __env_var_name__=\"your-api-key\"\n",
+ "```\n",
+ "\n",
+ "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n",
+ "\n",
+ "```bash\n",
+ "# export LANGCHAIN_TRACING_V2=\"true\"\n",
+ "# export LANGCHAIN_API_KEY=\"your-api-key\"\n",
+ "```\n",
+ "\n",
+ "### Installation\n",
+ "\n",
+ "The LangChain __ModuleName__ integration lives in the `__package_name__` package:\n",
+ "\n",
+ "```{=mdx}\n",
+ "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
+ "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ " __package_name__\n",
+ "\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45dd1724",
+ "metadata": {},
+ "source": [
+ "## Instantiation\n",
+ "\n",
+ "Now we can instantiate our model object and generate chat completions:\n",
+ "\n",
+ "- TODO: Update model instantiation with relevant params."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9ea7a09b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import { __ModuleName__ } from \"__full_import_path__\";\n",
+ "\n",
+ "const embeddings = new __ModuleName__({\n",
+ " model: \"model-name\",\n",
+ " // ...\n",
+ "});"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "77d271b6",
+ "metadata": {},
+ "source": [
+ "## Indexing and Retrieval\n",
+ "\n",
+ "Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
+ "\n",
+ "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document using the demo [`MemoryVectorStore`](/docs/integrations/vectorstores/memory)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d817716b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "// Create a vector store with a sample text\n",
+ "import { MemoryVectorStore } from \"langchain/vectorstores/memory\";\n",
+ "\n",
+ "const text = \"LangChain is the framework for building context-aware reasoning applications\";\n",
+ "\n",
+ "const vectorstore = await MemoryVectorStore.fromDocuments(\n",
+ " [{ pageContent: text, metadata: {} }],\n",
+ " embeddings,\n",
+ ");\n",
+ "\n",
+ "// Use the vector store as a retriever that returns a single document\n",
+ "const retriever = vectorstore.asRetriever(1);\n",
+ "\n",
+ "// Retrieve the most similar text\n",
+ "const retrievedDocument = await retriever.invoke(\"What is LangChain?\");\n",
+ "\n",
+ "retrievedDocument.pageContent;"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e02b9855",
+ "metadata": {},
+ "source": [
+ "## Direct Usage\n",
+ "\n",
+ "Under the hood, the vectorstore and retriever implementations are calling `embeddings.embedDocument(...)` and `embeddings.embedQuery(...)` to create embeddings for the text(s) used in `fromDocuments` and the retriever's `invoke` operations, respectively.\n",
+ "\n",
+ "You can directly call these methods to get embeddings for your own use cases.\n",
+ "\n",
+ "### Embed single texts\n",
+ "\n",
+ "You can embed queries for search with `embedQuery`. This generates a vector representation specific to the query:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0d2befcd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "const singleVector = await embeddings.embedQuery(text);\n",
+ "\n",
+ "console.log(singleVector.slice(0, 100));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b5a7d03",
+ "metadata": {},
+ "source": [
+ "### Embed multiple texts\n",
+ "\n",
+ "You can embed multiple texts for indexing with `embedDocuments`. The internals used for this method may (but do not have to) differ from embedding queries:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2f4d6e97",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "const text2 = \"LangGraph is a library for building stateful, multi-actor applications with LLMs\";\n",
+ "\n",
+ "const vectors = await embeddings.embedDocuments([text, text2]);\n",
+ "\n",
+ "console.log(vectors[0].slice(0, 100));\n",
+ "console.log(vectors[1].slice(0, 100));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8938e581",
+ "metadata": {},
+ "source": [
+ "## API reference\n",
+ "\n",
+ "For detailed documentation of all __ModuleName__ features and configurations head to the API reference: __api_ref_module__"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "TypeScript",
+ "language": "typescript",
+ "name": "tslab"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "typescript",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}