Merge branch 'langchain-ai:master' into snova-jorgep/bind_tools_sambanova_chat_models
langchain-ai · Nov 8, 2024 · d833044
2 parents d4b50f4 + ff2152b
Showing 5 changed files with 57 additions and 83 deletions.
diff --git a/docs/docs/how_to/custom_tools.ipynb b/docs/docs/how_to/custom_tools.ipynb
@@ -20,7 +20,7 @@
     "\n",
     "1. Functions;\n",
     "2. LangChain [Runnables](/docs/concepts/runnables);\n",
-    "3. By sub-classing from [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.BaseTool.html) -- This is the most flexible method, it provides the largest degree of control, at the expense of more effort and code.\n",
+    "3. By sub-classing from [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html) -- This is the most flexible method, it provides the largest degree of control, at the expense of more effort and code.\n",
     "\n",
     "Creating tools from functions may be sufficient for most use cases, and can be done via a simple [@tool decorator](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.tool.html#langchain_core.tools.tool). If more configuration is needed-- e.g., specification of both sync and async implementations-- one can also use the [StructuredTool.from_function](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.structured.StructuredTool.html#langchain_core.tools.structured.StructuredTool.from_function) class method.\n",
     "\n",

diff --git a/docs/docs/how_to/document_loader_pdf.ipynb b/docs/docs/how_to/document_loader_pdf.ipynb
@@ -48,7 +48,7 @@
     "\n",
     "## Simple and fast text extraction\n",
     "\n",
-    "If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypydf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
+    "If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypdf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
     "\n",
     "LangChain [document loaders](/docs/concepts/document_loaders) implement `lazy_load` and its async variant, `alazy_load`, which return iterators of `Document` objects. We will use these below."
    ]

diff --git a/docs/docs/how_to/tool_calling.ipynb b/docs/docs/how_to/tool_calling.ipynb
@@ -55,7 +55,7 @@
    "source": [
     "## Defining tool schemas\n",
     "\n",
-    "For a model to be able to call tools, we need to pass in tool schemas that describe what the tool does and what it's arguments are. Chat models that support tool calling features implement a `.bind_tools()` method for passing tool schemas to the model. Tool schemas can be passed in as Python functions (with typehints and docstrings), Pydantic models, TypedDict classes, or LangChain [Tool objects](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.BaseTool.html#langchain_core.tools.BaseTool). Subsequent invocations of the model will pass in these tool schemas along with the prompt.\n",
+    "For a model to be able to call tools, we need to pass in tool schemas that describe what the tool does and what it's arguments are. Chat models that support tool calling features implement a `.bind_tools()` method for passing tool schemas to the model. Tool schemas can be passed in as Python functions (with typehints and docstrings), Pydantic models, TypedDict classes, or LangChain [Tool objects](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#basetool). Subsequent invocations of the model will pass in these tool schemas along with the prompt.\n",
     "\n",
     "### Python functions\n",
     "Our tool schemas can be Python functions:"

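The paragraph touched in the hunk above says tool schemas can be passed as Python functions with type hints and docstrings. As a minimal sketch of what such a function carries, the block below uses only the standard library. `multiply` is a hypothetical example, not part of this commit, and the `.bind_tools()` call is left out because it needs a chat model instance. How LangChain converts this metadata into a schema is not shown here.

```python
import inspect
from typing import get_type_hints


def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the product."""
    return a * b


# The metadata a tool schema could be derived from: name, description, arguments.
tool_name = multiply.__name__
tool_description = inspect.getdoc(multiply)
tool_arguments = {
    name: hint for name, hint in get_type_hints(multiply).items() if name != "return"
}

print(tool_name)
print(tool_description)
print(tool_arguments)
```

A function missing type hints or a docstring would leave `tool_arguments` or `tool_description` empty, which is presumably why the documentation calls both out.
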
diff --git a/docs/docs/tutorials/index.mdx b/docs/docs/tutorials/index.mdx
@@ -7,25 +7,25 @@ sidebar_class_name: hidden
 New to LangChain or LLM app development in general? Read this material to quickly get up and running.
 
 ## Basics
-- [Build a Simple LLM Application with LCEL](/docs/tutorials/llm_chain)
-- [Build a Chatbot](/docs/tutorials/chatbot)
-- [Build vector stores and retrievers](/docs/tutorials/retrievers)
-- [Build an Agent](/docs/tutorials/agents)
+- [LLM applications](/docs/tutorials/llm_chain): Build and deploy a simple LLM application.
+- [Chatbots](/docs/tutorials/chatbot): Build a chatbot that incorporates memory.
+- [Vector stores](/docs/tutorials/retrievers): Build vector stores and use them to retrieve data.
+- [Agents](/docs/tutorials/agents): Build an agent that interacts with external tools.
 
 ## Working with external knowledge
-- [Build a Retrieval Augmented Generation (RAG) Application](/docs/tutorials/rag)
-- [Build a Conversational RAG Application](/docs/tutorials/qa_chat_history)
-- [Build a Question/Answering system over SQL data](/docs/tutorials/sql_qa)
-- [Build a Query Analysis System](/docs/tutorials/query_analysis)
-- [Build a local RAG application](/docs/tutorials/local_rag)
-- [Build a Question Answering application over a Graph Database](/docs/tutorials/graph)
-- [Build a PDF ingestion and Question/Answering system](/docs/tutorials/pdf_qa/)
+- [Retrieval Augmented Generation (RAG)](/docs/tutorials/rag): Build an application that uses your own documents to inform its responses.
+- [Conversational RAG](/docs/tutorials/qa_chat_history): Build a RAG application that incorporates a memory of its user interactions.
+- [Question-Answering with SQL](/docs/tutorials/sql_qa): Build a question-answering system that executes SQL queries to inform its responses.
+- [Query Analysis](/docs/tutorials/query_analysis): Build a RAG application that analyzes questions to generate filters and other structured queries.
+- [Local RAG](/docs/tutorials/local_rag): Build a RAG application using LLMs running locally on your machine.
+- [Question-Answering with Graph Databases](/docs/tutorials/graph): Build a question-answering system that queries a graph database to inform its responses.
+- [Question-Answering with PDFs](/docs/tutorials/pdf_qa/): Build a question-answering system that ingests PDFs and uses them to inform its responses.
 
 ## Specialized tasks
-- [Build an Extraction Chain](/docs/tutorials/extraction)
-- [Generate synthetic data](/docs/tutorials/data_generation)
-- [Classify text into labels](/docs/tutorials/classification)
-- [Summarize text](/docs/tutorials/summarization)
+- [Extraction](/docs/tutorials/extraction): Extract structured data from text and other unstructured media.
+- [Synthetic data](/docs/tutorials/data_generation): Generate synthetic data using LLMs.
+- [Classification](/docs/tutorials/classification): Classify text into categories or labels.
+- [Summarization](/docs/tutorials/summarization): Generate summaries of (potentially long) texts.
 
 ## LangGraph
 

diff --git a/libs/community/langchain_community/vectorstores/opensearch_vector_search.py b/libs/community/langchain_community/vectorstores/opensearch_vector_search.py
@@ -2,7 +2,7 @@
 
 import uuid
 import warnings
-from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple
+from typing import TYPE_CHECKING, Any, Callable, Dict, Iterable, List, Optional, Tuple
 
 import numpy as np
 from langchain_core.documents import Document
@@ -23,57 +23,18 @@
 PAINLESS_SCRIPTING_SEARCH = "painless_scripting"
 MATCH_ALL_QUERY = {"match_all": {}}  # type: Dict
 
+if TYPE_CHECKING:
+    from opensearchpy import AsyncOpenSearch, OpenSearch
 
-def _import_opensearch() -> Any:
-    """Import OpenSearch if available, otherwise raise error."""
+
+def _get_opensearch_client(opensearch_url: str, **kwargs: Any) -> OpenSearch:
+    """Get OpenSearch client from the opensearch_url, otherwise raise error."""
     try:
         from opensearchpy import OpenSearch
-    except ImportError:
-        raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
-    return OpenSearch
-
 
-def _import_async_opensearch() -> Any:
-    """Import AsyncOpenSearch if available, otherwise raise error."""
-    try:
-        from opensearchpy import AsyncOpenSearch
-    except ImportError:
-        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
-    return AsyncOpenSearch
-
-
-def _import_bulk() -> Any:
-    """Import bulk if available, otherwise raise error."""
-    try:
-        from opensearchpy.helpers import bulk
+        client = OpenSearch(opensearch_url, **kwargs)
     except ImportError:
         raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
-    return bulk
-
-
-def _import_async_bulk() -> Any:
-    """Import async_bulk if available, otherwise raise error."""
-    try:
-        from opensearchpy.helpers import async_bulk
-    except ImportError:
-        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
-    return async_bulk
-
-
-def _import_not_found_error() -> Any:
-    """Import not found error if available, otherwise raise error."""
-    try:
-        from opensearchpy.exceptions import NotFoundError
-    except ImportError:
-        raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
-    return NotFoundError
-
-
-def _get_opensearch_client(opensearch_url: str, **kwargs: Any) -> Any:
-    """Get OpenSearch client from the opensearch_url, otherwise raise error."""
-    try:
-        opensearch = _import_opensearch()
-        client = opensearch(opensearch_url, **kwargs)
     except ValueError as e:
         raise ImportError(
             f"OpenSearch client string provided is not in proper format. "
@@ -82,11 +43,14 @@ def _get_opensearch_client(opensearch_url: str, **kwargs: Any) -> Any:
     return client
 
 
-def _get_async_opensearch_client(opensearch_url: str, **kwargs: Any) -> Any:
+def _get_async_opensearch_client(opensearch_url: str, **kwargs: Any) -> AsyncOpenSearch:
     """Get AsyncOpenSearch client from the opensearch_url, otherwise raise error."""
     try:
-        async_opensearch = _import_async_opensearch()
-        client = async_opensearch(opensearch_url, **kwargs)
+        from opensearchpy import AsyncOpenSearch
+
+        client = AsyncOpenSearch(opensearch_url, **kwargs)
+    except ImportError:
+        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
     except ValueError as e:
         raise ImportError(
             f"AsyncOpenSearch client string provided is not in proper format. "
@@ -127,7 +91,7 @@ def _is_aoss_enabled(http_auth: Any) -> bool:
 
 
 def _bulk_ingest_embeddings(
-    client: Any,
+    client: OpenSearch,
     index_name: str,
     embeddings: List[List[float]],
     texts: Iterable[str],
@@ -142,16 +106,19 @@ def _bulk_ingest_embeddings(
     """Bulk Ingest Embeddings into given index."""
     if not mapping:
         mapping = dict()
+    try:
+        from opensearchpy.exceptions import NotFoundError
+        from opensearchpy.helpers import bulk
+    except ImportError:
+        raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
 
-    bulk = _import_bulk()
-    not_found_error = _import_not_found_error()
     requests = []
     return_ids = []
     mapping = mapping
 
     try:
         client.indices.get(index=index_name)
-    except not_found_error:
+    except NotFoundError:
         client.indices.create(index=index_name, body=mapping)
 
     for i, text in enumerate(texts):
@@ -177,7 +144,7 @@ def _bulk_ingest_embeddings(
 
 
 async def _abulk_ingest_embeddings(
-    client: Any,
+    client: AsyncOpenSearch,
     index_name: str,
     embeddings: List[List[float]],
     texts: Iterable[str],
@@ -193,14 +160,18 @@ async def _abulk_ingest_embeddings(
     if not mapping:
         mapping = dict()
 
-    async_bulk = _import_async_bulk()
-    not_found_error = _import_not_found_error()
+    try:
+        from opensearchpy.exceptions import NotFoundError
+        from opensearchpy.helpers import async_bulk
+    except ImportError:
+        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
+
     requests = []
     return_ids = []
 
     try:
         await client.indices.get(index=index_name)
-    except not_found_error:
+    except NotFoundError:
         await client.indices.create(index=index_name, body=mapping)
 
     for i, text in enumerate(texts):
@@ -230,7 +201,7 @@ async def _abulk_ingest_embeddings(
 def _default_scripting_text_mapping(
     dim: int,
     vector_field: str = "vector_field",
-) -> Dict:
+) -> Dict[str, Any]:
     """For Painless Scripting or Script Scoring,the default mapping to create index."""
     return {
         "mappings": {
@@ -249,7 +220,7 @@ def _default_text_mapping(
     ef_construction: int = 512,
     m: int = 16,
     vector_field: str = "vector_field",
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, this is the default mapping to create index."""
     return {
         "settings": {"index": {"knn": True, "knn.algo_param.ef_search": ef_search}},
@@ -275,7 +246,7 @@ def _default_approximate_search_query(
     k: int = 4,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, this is the default query."""
     return {
         "size": k,
@@ -291,7 +262,7 @@ def _approximate_search_query_with_boolean_filter(
     vector_field: str = "vector_field",
     subquery_clause: str = "must",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, with Boolean Filter."""
     return {
         "size": k,
@@ -313,7 +284,7 @@ def _approximate_search_query_with_efficient_filter(
     k: int = 4,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, with Efficient Filter for Lucene and
     Faiss Engines."""
     search_query = _default_approximate_search_query(
@@ -330,7 +301,7 @@ def _default_script_query(
     pre_filter: Optional[Dict] = None,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Script Scoring Search, this is the default query."""
 
     if not pre_filter:
@@ -376,7 +347,7 @@ def _default_painless_scripting_query(
     pre_filter: Optional[Dict] = None,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Painless Scripting Search, this is the default query."""
 
     if not pre_filter:
@@ -692,7 +663,10 @@ def delete(
             refresh_indices: Whether to refresh the index
                             after deleting documents. Defaults to True.
         """
-        bulk = _import_bulk()
+        try:
+            from opensearchpy.helpers import bulk
+        except ImportError:
+            raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
 
         body = []
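
The opensearch_vector_search.py changes above replace the `_import_*` helper functions with two things: a `TYPE_CHECKING`-guarded import for annotations, and a local import inside each function that needs the package. A minimal sketch of that pattern follows, under stated assumptions: `hypothetical_search_sdk`, `SearchClient`, `IMPORT_SDK_ERROR`, and `get_client` are made-up names standing in for the optional dependency, and the return annotation is quoted so the name does not have to exist at runtime. The `from exc` chaining is also an addition of this sketch; the diff raises without it.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime, so importing
    # this module never requires the optional package.
    from hypothetical_search_sdk import SearchClient

# Hypothetical message, standing in for IMPORT_OPENSEARCH_PY_ERROR.
IMPORT_SDK_ERROR = (
    "Could not import hypothetical_search_sdk. "
    "Install it before creating a client."
)


def get_client(url: str, **kwargs: Any) -> "SearchClient":
    """Build a client, importing the optional package only when called."""
    try:
        from hypothetical_search_sdk import SearchClient
    except ImportError as exc:
        raise ImportError(IMPORT_SDK_ERROR) from exc
    return SearchClient(url, **kwargs)


# The package is absent here, so the call fails with the readable message.
try:
    get_client("http://localhost:9200")
except ImportError as exc:
    error_message = str(exc)
    print(error_message)
```

Compared with returning the class from a helper typed as `Any`, this keeps precise types visible to a type checker while the module still imports cleanly when the optional package is not installed.
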