Commit

Merge branch 'langchain-ai:master' into documentation/add_update_documentation_for_oci
raveharpaz authored Feb 13, 2024
2 parents 5754b5a + db6f266 commit e6b0923
Showing 94 changed files with 5,208 additions and 358 deletions.
922 changes: 922 additions & 0 deletions cookbook/apache_kafka_message_handling.ipynb

Large diffs are not rendered by default.

88 changes: 88 additions & 0 deletions docs/docs/integrations/document_loaders/pebblo.ipynb
@@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pebblo Safe DocumentLoader\n",
"\n",
"> [Pebblo](https://github.com/daxa-ai/pebblo) enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization’s compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them on the UI or a PDF report.\n",
"\n",
"Pebblo has two components.\n",
"\n",
"1. Pebblo Safe DocumentLoader for Langchain\n",
"1. Pebblo Daemon\n",
"\n",
"This document describes how to augment your existing Langchain DocumentLoader with Pebblo Safe DocumentLoader to get deep data visibility on the types of Topics and Entities ingested into the Gen-AI Langchain application. For details on `Pebblo Daemon` see this [pebblo daemon](https://daxa-ai.github.io/pebblo-docs/daemon.html) document.\n",
"\n",
"Pebblo Safeloader enables safe data ingestion for Langchain `DocumentLoader`. This is done by wrapping the document loader call with `Pebblo Safe DocumentLoader`.\n",
"\n",
"#### How to Pebblo enable Document Loading?\n",
"\n",
"Assume a Langchain RAG application snippet using `CSVLoader` to read a CSV document for inference.\n",
"\n",
"Here is the snippet of Document loading using `CSVLoader`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.csv_loader import CSVLoader\n",
"\n",
"loader = CSVLoader(\"data/corp_sens_data.csv\")\n",
"documents = loader.load()\n",
"print(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Pebblo SafeLoader can be enabled with few lines of code change to the above snippet."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.csv_loader import CSVLoader\n",
"from langchain_community.document_loaders import PebbloSafeLoader\n",
"\n",
"loader = PebbloSafeLoader(\n",
" CSVLoader(\"data/corp_sens_data.csv\"),\n",
" name=\"acme-corp-rag-1\", # App name (Mandatory)\n",
" owner=\"Joe Smith\", # Owner (Optional)\n",
" description=\"Support productivity RAG application\", # Description (Optional)\n",
")\n",
"documents = loader.load()\n",
"print(documents)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
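The safe-loading pattern above is plain delegation: the wrapper forwards `load()` to the inner loader and observes what passed through. A stdlib-only sketch of that wrapper shape (the class and attribute names below are illustrative stand-ins, not the real `PebbloSafeLoader` API):

```python
class InspectingLoader:
    """Illustrative stand-in for a 'safe' loader that wraps another loader.

    It delegates load() to the inner loader and records simple stats,
    mimicking the shape of the PebbloSafeLoader wrapper (not its real API).
    """

    def __init__(self, inner_loader, name, owner="", description=""):
        self.inner = inner_loader
        self.app = {"name": name, "owner": owner, "description": description}

    def load(self):
        docs = self.inner.load()   # delegate to the wrapped loader
        self.doc_count = len(docs) # record what passed through
        return docs


class FakeCSVLoader:
    """Tiny stand-in returning one 'document' per CSV row."""

    def __init__(self, rows):
        self.rows = rows

    def load(self):
        return [{"page_content": row} for row in self.rows]


loader = InspectingLoader(
    FakeCSVLoader(["alice,engineering", "bob,sales"]),
    name="acme-corp-rag-1",
)
documents = loader.load()
print(loader.doc_count)  # → 2
```

Because the wrapper only intercepts `load()`, any loader with that method can be dropped in unchanged, which is why the real integration works with existing LangChain document loaders.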
61 changes: 58 additions & 3 deletions docs/docs/integrations/document_loaders/source_code.ipynb
@@ -9,7 +35,7 @@
"\n",
"This notebook covers how to load source code files using a special approach with language parsing: each top-level function and class in the code is loaded into separate documents. Any remaining code top-level code outside the already loaded functions and classes will be loaded into a separate document.\n",
"\n",
"This approach can potentially improve the accuracy of QA models over source code. Currently, the supported languages for code parsing are Python and JavaScript. The language used for parsing can be configured, along with the minimum number of lines required to activate the splitting based on syntax."
"This approach can potentially improve the accuracy of QA models over source code.\n",
"\n",
"The supported languages for code parsing are:\n",
"\n",
"- C (*)\n",
"- C++ (*)\n",
"- C# (*)\n",
"- COBOL\n",
"- Go (*)\n",
"- Java (*)\n",
"- JavaScript (requires package `esprima`)\n",
"- Kotlin (*)\n",
"- Lua (*)\n",
"- Perl (*)\n",
"- Python\n",
"- Ruby (*)\n",
"- Rust (*)\n",
"- Scala (*)\n",
"- TypeScript (*)\n",
"\n",
"Items marked with (*) require the packages `tree_sitter` and `tree_sitter_languages`.\n",
"It is straightforward to add support for additional languages using `tree_sitter`,\n",
"although this currently requires modifying LangChain.\n",
"\n",
"The language used for parsing can be configured, along with the minimum number of\n",
"lines required to activate the splitting based on syntax.\n",
"\n",
"If a language is not explicitly specified, `LanguageParser` will infer one from\n",
"filename extensions, if present."
]
},
{
@@ -19,7 +47,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet esprima"
"%pip install -qU esprima esprima tree_sitter tree_sitter_languages"
]
},
{
@@ -395,6 +423,33 @@
"source": [
"print(\"\\n\\n--8<--\\n\\n\".join([document.page_content for document in result]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding Languages using Tree-sitter Template\n",
"\n",
"Expanding language support using the Tree-Sitter template involves a few essential steps:\n",
"\n",
"1. **Creating a New Language File**:\n",
" - Begin by creating a new file in the designated directory (langchain/libs/community/langchain_community/document_loaders/parsers/language).\n",
" - Model this file based on the structure and parsing logic of existing language files like **`cpp.py`**.\n",
" - You will also need to create a file in the langchain directory (langchain/libs/langchain/langchain/document_loaders/parsers/language).\n",
"2. **Parsing Language Specifics**:\n",
" - Mimic the structure used in the **`cpp.py`** file, adapting it to suit the language you are incorporating.\n",
" - The primary alteration involves adjusting the chunk query array to suit the syntax and structure of the language you are parsing.\n",
"3. **Testing the Language Parser**:\n",
" - For thorough validation, generate a test file specific to the new language. Create **`test_language.py`** in the designated directory(langchain/libs/community/tests/unit_tests/document_loaders/parsers/language).\n",
" - Follow the example set by **`test_cpp.py`** to establish fundamental tests for the parsed elements in the new language.\n",
"4. **Integration into the Parser and Text Splitter**:\n",
" - Incorporate your new language within the **`language_parser.py`** file. Ensure to update LANGUAGE_EXTENSIONS and LANGUAGE_SEGMENTERS along with the docstring for LanguageParser to recognize and handle the added language.\n",
" - Also, confirm that your language is included in **`text_splitter.py`** in class Language for proper parsing.\n",
"\n",
"By following these steps and ensuring comprehensive testing and integration, you'll successfully extend language support using the Tree-Sitter template.\n",
"\n",
"Best of luck!"
]
}
],
"metadata": {
@@ -413,7 +468,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.11.5"
}
},
"nbformat": 4,
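The extension-based inference this notebook mentions — `LanguageParser` picking a language from the filename when none is given — boils down to an extension lookup. A sketch under that assumption (the table below is hand-written for illustration; the real mapping is LANGUAGE_EXTENSIONS in `language_parser.py`):

```python
import os

# Illustrative extension-to-language table; LangChain's actual
# LANGUAGE_EXTENSIONS mapping lives in language_parser.py.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "js",
    ".ts": "ts",
    ".cpp": "cpp",
    ".go": "go",
    ".java": "java",
    ".rs": "rust",
}

def infer_language(filename):
    """Return the inferred language for a filename, or None if unknown."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_LANGUAGE.get(ext.lower())

print(infer_language("loader.py"))  # → python
print(infer_language("README"))     # → None
```

An unknown or missing extension yields `None`, in which case splitting falls back to treating the file as plain text rather than parsing it by syntax.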
2 changes: 0 additions & 2 deletions docs/docs/integrations/memory/aws_dynamodb.ipynb
@@ -274,8 +274,6 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_openai import ChatOpenAI"
@@ -133,8 +133,6 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_openai import ChatOpenAI"
6 changes: 6 additions & 0 deletions docs/docs/integrations/memory/remembrall.md
@@ -16,6 +16,12 @@ To get started, [sign in with Github on the Remembrall platform](https://remembr

Any request that you send with the modified `openai_api_base` (see below) and Remembrall API key will automatically be tracked in the Remembrall dashboard. You **never** have to share your OpenAI key with our platform and this information is **never** stored by the Remembrall systems.

To use this integration, we first need to install the following dependency:

```bash
pip install -U langchain-openai
```

### Enable Long Term Memory

In addition to setting the `openai_api_base` and Remembrall API key via `x-gp-api-key`, you should specify a UID to maintain memory for. This will usually be a unique user identifier (like email).
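The setup described on this page amounts to two client-side changes: point `openai_api_base` at the Remembrall proxy and attach the extra headers. A hedged sketch of that configuration — the base URL below is a placeholder, `x-gp-api-key` comes from this page, and the UID header name is hypothetical (check the Remembrall dashboard for the real one):

```python
# Placeholder values -- substitute the base URL and key shown in your
# Remembrall dashboard; the UID header name here is hypothetical.
REMEMBRALL_BASE = "https://<remembrall-proxy>/v1"  # placeholder, not a real URL

def remembrall_request_config(remembrall_api_key, user_uid):
    """Assemble client settings per the description above (sketch only)."""
    return {
        "openai_api_base": REMEMBRALL_BASE,
        "default_headers": {
            "x-gp-api-key": remembrall_api_key,  # header named on this page
            "x-remembrall-uid": user_uid,        # hypothetical header name
        },
    }

cfg = remembrall_request_config("my-remembrall-key", "user@example.com")
print(sorted(cfg["default_headers"]))  # → ['x-gp-api-key', 'x-remembrall-uid']
```

Because only the base URL and headers change, the OpenAI key itself is passed to the client as usual and never needs to be shared with the proxy, matching the guarantee stated above.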
14 changes: 3 additions & 11 deletions docs/docs/integrations/memory/sql_chat_message_history.ipynb
@@ -26,7 +26,7 @@
"The integration lives in the `langchain-community` package, so we need to install that. We also need to install the `SQLAlchemy` package.\n",
"\n",
"```bash\n",
"pip install -U langchain-community SQLAlchemy\n",
"pip install -U langchain-community SQLAlchemy langchain-openai\n",
"```"
]
},
@@ -71,10 +71,7 @@
"end_time": "2023-08-28T10:04:38.077748Z",
"start_time": "2023-08-28T10:04:36.105894Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
"collapsed": false
},
"outputs": [],
"source": [
@@ -97,10 +94,7 @@
"end_time": "2023-08-28T10:04:38.929396Z",
"start_time": "2023-08-28T10:04:38.915727Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
"collapsed": false
},
"outputs": [
{
@@ -137,8 +131,6 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_openai import ChatOpenAI"
2 changes: 0 additions & 2 deletions docs/docs/integrations/memory/sqlite.ipynb
@@ -119,8 +119,6 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_openai import ChatOpenAI"