Merge branch 'master' into erick/docs-platforms-providers
efriis committed Oct 15, 2024
2 parents 1666cf0 + d2cd436 commit 6ce41a6
Showing 44 changed files with 279 additions and 140 deletions.
8 changes: 7 additions & 1 deletion .github/workflows/api_doc_build.yml
@@ -73,6 +73,10 @@ jobs:
        with:
          repository: langchain-ai/langchain-unstructured
          path: langchain-unstructured
+       - uses: actions/checkout@v4
+         with:
+           repository: langchain-ai/langchain-databricks
+           path: langchain-databricks


- name: Set Git config
@@ -97,7 +101,8 @@ jobs:
            langchain/libs/standard-tests \
            langchain/libs/experimental \
            langchain/libs/partners/milvus \
-           langchain/libs/partners/unstructured
+           langchain/libs/partners/unstructured \
+           langchain/libs/databricks
          mv langchain-google/libs/genai langchain/libs/partners/google-genai
          mv langchain-google/libs/vertexai langchain/libs/partners/google-vertexai
          mv langchain-google/libs/community langchain/libs/partners/google-community
@@ -113,6 +118,7 @@ jobs:
          mv langchain-experimental/libs/experimental langchain/libs/experimental
          mv langchain-milvus/libs/milvus langchain/libs/partners/milvus
          mv langchain-unstructured/libs/unstructured langchain/libs/partners/unstructured
+         mv langchain-databricks/libs/databricks langchain/libs/partners/databricks
- name: Rm old html
run:
21 changes: 21 additions & 0 deletions docs/docs/integrations/providers/konlpy.mdx
@@ -0,0 +1,21 @@
# KoNLPy

>[KoNLPy](https://konlpy.org/) is a Python package for natural language processing (NLP)
> of the Korean language.

## Installation and Setup

You need to install the `konlpy` Python package.

```bash
pip install konlpy
```

## Text splitter

See a [usage example](/docs/how_to/split_by_token/#konlpy).

```python
from langchain_text_splitters import KonlpyTextSplitter
```
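A minimal sketch of the splitter in use (the sample text and chunk size are illustrative):

```python
from langchain_text_splitters import KonlpyTextSplitter

# Split Korean text on sentence boundaries detected via KoNLPy.
text_splitter = KonlpyTextSplitter(chunk_size=200, chunk_overlap=0)
chunks = text_splitter.split_text("안녕하세요. 이것은 한국어 문장 분리의 간단한 예시입니다.")
print(chunks)
```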
32 changes: 32 additions & 0 deletions docs/docs/integrations/providers/kuzu.mdx
@@ -0,0 +1,32 @@
# Kùzu

>[Kùzu](https://kuzudb.com/) is a company based in Waterloo, Ontario, Canada.
> It provides a highly scalable, extremely fast, easy-to-use [embeddable graph database](https://github.com/kuzudb/kuzu).


## Installation and Setup

You need to install the `kuzu` Python package.

```bash
pip install kuzu
```

## Graph database

See a [usage example](/docs/integrations/graphs/kuzu_db).

```python
from langchain_community.graphs import KuzuGraph
```

## Chain

See a [usage example](/docs/integrations/graphs/kuzu_db/#creating-kuzuqachain).

```python
from langchain.chains import KuzuQAChain
```
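
Putting the two imports together, a minimal end-to-end sketch (the database path, model name, and question are illustrative, and it assumes `langchain-openai` is installed):

```python
import kuzu
from langchain.chains import KuzuQAChain
from langchain_community.graphs import KuzuGraph
from langchain_openai import ChatOpenAI

# Open (or create) an embedded Kùzu database and wrap it for LangChain.
db = kuzu.Database("test_db")
graph = KuzuGraph(db)

# The chain translates natural-language questions into Cypher queries
# against the graph; running generated queries is an explicit opt-in.
chain = KuzuQAChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)
chain.invoke("Who acted in The Godfather?")
```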


32 changes: 32 additions & 0 deletions docs/docs/integrations/providers/llama_index.mdx
@@ -0,0 +1,32 @@
# LlamaIndex

>[LlamaIndex](https://www.llamaindex.ai/) is the leading data framework for building LLM applications.

## Installation and Setup

You need to install the `llama-index` Python package.

```bash
pip install llama-index
```

See the [installation instructions](https://docs.llamaindex.ai/en/stable/getting_started/installation/).

## Retrievers

### LlamaIndexRetriever

>It is used for question answering with sources over a LlamaIndex data structure.

```python
from langchain_community.retrievers.llama_index import LlamaIndexRetriever
```

### LlamaIndexGraphRetriever

>It is used for question answering with sources over a LlamaIndex graph data structure.

```python
from langchain_community.retrievers.llama_index import LlamaIndexGraphRetriever
```
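
A hedged sketch of how the retriever might be wired up (assumes a recent `llama-index` with the `llama_index.core` layout and a local `./data` folder of documents):

```python
from langchain_community.retrievers.llama_index import LlamaIndexRetriever
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build a LlamaIndex vector index, then expose it to LangChain as a retriever.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

retriever = LlamaIndexRetriever(index=index)
docs = retriever.invoke("What does the document say about pricing?")
print(docs[0].page_content)
```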
24 changes: 24 additions & 0 deletions docs/docs/integrations/providers/llamaedge.mdx
@@ -0,0 +1,24 @@
# LlamaEdge

>[LlamaEdge](https://llamaedge.com/docs/intro/) is the easiest & fastest way to run customized
> and fine-tuned LLMs locally or on the edge.
>
>* Lightweight inference apps. `LlamaEdge` is in MBs instead of GBs
>* Native and GPU accelerated performance
>* Supports many GPU and hardware accelerators
>* Supports many optimized inference libraries
>* Wide selection of AI / LLM models


## Installation and Setup

See the [installation instructions](https://llamaedge.com/docs/user-guide/quick-start-command).

## Chat models

See a [usage example](/docs/integrations/chat/llama_edge).

```python
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
```
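
A minimal sketch of calling a running LlamaEdge API server (the service URL is a placeholder for your own deployment):

```python
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage, SystemMessage

# Point the chat model at a LlamaEdge API server you have started.
chat = LlamaEdgeChatService(service_url="http://localhost:8080")

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is the capital of France?"),
]
response = chat.invoke(messages)
print(response.content)
```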
31 changes: 31 additions & 0 deletions docs/docs/integrations/providers/llamafile.mdx
@@ -0,0 +1,31 @@
# llamafile

>[llamafile](https://github.com/Mozilla-Ocho/llamafile) lets you distribute and run LLMs
> with a single file.
>`llamafile` makes open LLMs much more accessible to both developers and end users.
> `llamafile` does this by combining [llama.cpp](https://github.com/ggerganov/llama.cpp) with
> [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) into one framework that collapses
> all the complexity of LLMs down to a single-file executable (called a "llamafile")
> that runs locally on most computers, with no installation.

## Installation and Setup

See the [installation instructions](https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#quickstart).

## LLMs

See a [usage example](/docs/integrations/llms/llamafile).

```python
from langchain_community.llms.llamafile import Llamafile
```

## Embedding models

See a [usage example](/docs/integrations/text_embedding/llamafile).

```python
from langchain_community.embeddings import LlamafileEmbeddings
```
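
A minimal sketch of the embeddings class (assumes a llamafile server is already running locally with embeddings enabled, e.g. started with `--server --embedding`):

```python
from langchain_community.embeddings import LlamafileEmbeddings

# Talks to a local llamafile server, which listens on http://localhost:8080 by default.
embedder = LlamafileEmbeddings()

query_vector = embedder.embed_query("What is a llamafile?")
doc_vectors = embedder.embed_documents(
    ["A llamafile bundles model weights and a runtime into one executable."]
)
print(len(query_vector), len(doc_vectors))
```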
@@ -356,7 +356,7 @@ def _create_and_run_api_controller_agent(plan_str: str) -> str:
        for endpoint_name in endpoint_names:
            found_match = False
            for name, _, docs in api_spec.endpoints:
-               regex_name = re.compile(re.sub("\{.*?\}", ".*", name))
+               regex_name = re.compile(re.sub("\\{.*?\\}", ".*", name))
                if regex_name.match(endpoint_name):
                    found_match = True
                    docs_str += f"== Docs for {endpoint_name} == \n{yaml.dump(docs)}\n"
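
For context, a standalone sketch of what the doubled backslashes accomplish (the endpoint template is illustrative): in a plain string literal, `"\{"` is an invalid escape sequence that newer Pythons warn about, while `"\\{"` reliably passes a literal `\{` through to `re.sub`.

```python
import re

# "\\{.*?\\}" is the regex \{.*?\}: a literal "{", a lazy wildcard, a literal "}".
name = "/pets/{petId}/visits/{visitId}"  # hypothetical endpoint template
pattern = re.sub("\\{.*?\\}", ".*", name)
assert pattern == "/pets/.*/visits/.*"
assert re.compile(pattern).match("/pets/42/visits/7")
```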
24 changes: 17 additions & 7 deletions libs/community/langchain_community/chat_models/sambanova.py
@@ -174,10 +174,10 @@ class ChatSambaNovaCloud(BaseChatModel):
    temperature: float = Field(default=0.7)
    """model temperature"""

-   top_p: Optional[float] = Field()
+   top_p: Optional[float] = Field(default=None)
    """model top p"""

-   top_k: Optional[int] = Field()
+   top_k: Optional[int] = Field(default=None)
    """model top k"""

    stream_options: dict = Field(default={"include_usage": True})
@@ -593,7 +593,7 @@ class ChatSambaStudio(BaseChatModel):
    streaming_url: str = Field(default="", exclude=True)
    """SambaStudio streaming Url"""

-   model: Optional[str] = Field()
+   model: Optional[str] = Field(default=None)
    """The name of the model or expert to use (for CoE endpoints)"""

    streaming: bool = Field(default=False)
@@ -605,16 +605,16 @@ class ChatSambaStudio(BaseChatModel):
    temperature: Optional[float] = Field(default=0.7)
    """model temperature"""

-   top_p: Optional[float] = Field()
+   top_p: Optional[float] = Field(default=None)
    """model top p"""

-   top_k: Optional[int] = Field()
+   top_k: Optional[int] = Field(default=None)
    """model top k"""

-   do_sample: Optional[bool] = Field()
+   do_sample: Optional[bool] = Field(default=None)
    """whether to do sampling"""

-   process_prompt: Optional[bool] = Field()
+   process_prompt: Optional[bool] = Field(default=True)
    """whether process prompt (for CoE generic v1 and v2 endpoints)"""

    stream_options: dict = Field(default={"include_usage": True})
@@ -1012,6 +1012,16 @@ def _process_stream_response(
"system_fingerprint": data["system_fingerprint"],
"created": data["created"],
}
if data.get("usage") is not None:
content = ""
id = data["id"]
metadata = {
"finish_reason": finish_reason,
"usage": data.get("usage"),
"model_name": data["model"],
"system_fingerprint": data["system_fingerprint"],
"created": data["created"],
}
yield AIMessageChunk(
content=content,
id=id,
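The `Field()` → `Field(default=None)` edits above track pydantic v2 semantics, where a bare `Field()` leaves the field required even when its type is `Optional`. A minimal sketch of the difference (class names are illustrative; assumes pydantic v2):

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError

class Before(BaseModel):
    top_p: Optional[float] = Field()  # required despite the Optional annotation

class After(BaseModel):
    top_p: Optional[float] = Field(default=None)  # genuinely optional

After()  # ok: top_p defaults to None
try:
    Before()
except ValidationError as err:
    print(err)  # reports that top_p is a required field
```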
2 changes: 1 addition & 1 deletion libs/community/langchain_community/embeddings/text2vec.py
@@ -10,7 +10,7 @@ class Text2vecEmbeddings(Embeddings, BaseModel):
"""text2vec embedding models.
Install text2vec first, run 'pip install -U text2vec'.
The gitbub repository for text2vec is : https://github.com/shibing624/text2vec
The github repository for text2vec is : https://github.com/shibing624/text2vec
Example:
.. code-block:: python
48 changes: 21 additions & 27 deletions libs/community/langchain_community/utilities/arxiv.py
@@ -94,6 +94,16 @@ def validate_environment(cls, values: Dict) -> Any:
            )
        return values

+   def _fetch_results(self, query: str) -> Any:
+       """Helper function to fetch arxiv results based on query."""
+       if self.is_arxiv_identifier(query):
+           return self.arxiv_search(
+               id_list=query.split(), max_results=self.top_k_results
+           ).results()
+       return self.arxiv_search(
+           query[: self.ARXIV_MAX_QUERY_LENGTH], max_results=self.top_k_results
+       ).results()

    def get_summaries_as_docs(self, query: str) -> List[Document]:
        """
        Performs an arxiv search and returns list of
@@ -107,16 +117,11 @@ def get_summaries_as_docs(self, query: str) -> List[Document]:
            query: a plaintext search query
        """
        try:
-           if self.is_arxiv_identifier(query):
-               results = self.arxiv_search(
-                   id_list=query.split(),
-                   max_results=self.top_k_results,
-               ).results()
-           else:
-               results = self.arxiv_search(  # type: ignore
-                   query[: self.ARXIV_MAX_QUERY_LENGTH], max_results=self.top_k_results
-               ).results()
+           results = self._fetch_results(
+               query
+           )  # Using helper function to fetch results
        except self.arxiv_exceptions as ex:
+           logger.error(f"Arxiv exception: {ex}")  # Added error logging
            return [Document(page_content=f"Arxiv exception: {ex}")]
        docs = [
            Document(
@@ -146,16 +151,11 @@ def run(self, query: str) -> str:
            query: a plaintext search query
        """
        try:
-           if self.is_arxiv_identifier(query):
-               results = self.arxiv_search(
-                   id_list=query.split(),
-                   max_results=self.top_k_results,
-               ).results()
-           else:
-               results = self.arxiv_search(  # type: ignore
-                   query[: self.ARXIV_MAX_QUERY_LENGTH], max_results=self.top_k_results
-               ).results()
+           results = self._fetch_results(
+               query
+           )  # Using helper function to fetch results
        except self.arxiv_exceptions as ex:
+           logger.error(f"Arxiv exception: {ex}")  # Added error logging
            return f"Arxiv exception: {ex}"
        docs = [
            f"Published: {result.updated.date()}\n"
@@ -208,15 +208,9 @@ def lazy_load(self, query: str) -> Iterator[Document]:
        try:
            # Remove the ":" and "-" from the query, as they can cause search problems
            query = query.replace(":", "").replace("-", "")
-           if self.is_arxiv_identifier(query):
-               results = self.arxiv_search(
-                   id_list=query[: self.ARXIV_MAX_QUERY_LENGTH].split(),
-                   max_results=self.load_max_docs,
-               ).results()
-           else:
-               results = self.arxiv_search(  # type: ignore
-                   query[: self.ARXIV_MAX_QUERY_LENGTH], max_results=self.load_max_docs
-               ).results()
+           results = self._fetch_results(
+               query
+           )  # Using helper function to fetch results
        except self.arxiv_exceptions as ex:
            logger.debug("Error on arxiv: %s", ex)
            return
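A short sketch of the two call paths `_fetch_results` unifies, one free-text query and one arXiv identifier (the example inputs are illustrative):

```python
from langchain_community.utilities.arxiv import ArxivAPIWrapper

arxiv = ArxivAPIWrapper(top_k_results=2)

# Free-text query: searched as-is (truncated to ARXIV_MAX_QUERY_LENGTH).
print(arxiv.run("quantum error correction"))

# arXiv identifier: detected by is_arxiv_identifier and routed through id_list.
print(arxiv.run("2302.13971"))
```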
12 changes: 6 additions & 6 deletions libs/community/langchain_community/vectorstores/pgvector.py
@@ -219,12 +219,12 @@ def _results_to_docs(docs_and_scores: Any) -> List[Document]:
    since="0.0.31",
    message=(
        "This class is pending deprecation and may be removed in a future version. "
-       "You can swap to using the `PGVector`"
-       " implementation in `langchain_postgres`. "
+       "You can swap to using the `PGVector` "
+       "implementation in `langchain_postgres`. "
        "Please read the guidelines in the doc-string of this class "
        "to follow prior to migrating as there are some differences "
        "between the implementations. "
-       "See <https://github.com/langchain-ai/langchain-postgres> for details about"
+       "See <https://github.com/langchain-ai/langchain-postgres> for details about "
        "the new implementation."
    ),
    alternative="from langchain_postgres import PGVector;",
@@ -331,11 +331,11 @@ def __init__(
        message=(
            "Please use JSONB instead of JSON for metadata. "
            "This change will allow for more efficient querying that "
-           "involves filtering based on metadata."
+           "involves filtering based on metadata. "
            "Please note that filtering operators have been changed "
-           "when using JSOB metadata to be prefixed with a $ sign "
+           "when using JSONB metadata to be prefixed with a $ sign "
            "to avoid name collisions with columns. "
-           "If you're using an existing database, you will need to create a"
+           "If you're using an existing database, you will need to create a "
            "db migration for your metadata column to be JSONB and update your "
            "queries to use the new operators. "
        ),
@@ -266,7 +266,6 @@ def test_litellm_router_streaming_callback(
    fake_completion.check_inputs(expected_num_calls=1)


-@pytest.mark.asyncio
@pytest.mark.scheduled
async def test_async_litellm_router(
    fake_completion: FakeCompletion, litellm_router: Any
@@ -295,7 +294,6 @@ async def test_async_litellm_router(
    fake_completion.check_inputs(expected_num_calls=2)


-@pytest.mark.asyncio
@pytest.mark.scheduled
async def test_async_litellm_router_streaming(
    fake_completion: FakeCompletion, litellm_router: Any