Merge branch 'langchain-ai:master' into snova-jorgep/bind_tools_samba…
…nova_chat_models
jhpiedrahitao authored Nov 8, 2024
2 parents d4b50f4 + ff2152b commit d833044
Showing 5 changed files with 57 additions and 83 deletions.
2 changes: 1 addition & 1 deletion docs/docs/how_to/custom_tools.ipynb
@@ -20,7 +20,7 @@
     "\n",
     "1. Functions;\n",
     "2. LangChain [Runnables](/docs/concepts/runnables);\n",
-    "3. By sub-classing from [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.BaseTool.html) -- This is the most flexible method, it provides the largest degree of control, at the expense of more effort and code.\n",
+    "3. By sub-classing from [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html) -- This is the most flexible method, it provides the largest degree of control, at the expense of more effort and code.\n",
     "\n",
     "Creating tools from functions may be sufficient for most use cases, and can be done via a simple [@tool decorator](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.tool.html#langchain_core.tools.tool). If more configuration is needed-- e.g., specification of both sync and async implementations-- one can also use the [StructuredTool.from_function](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.structured.StructuredTool.html#langchain_core.tools.structured.StructuredTool.from_function) class method.\n",
     "\n",
2 changes: 1 addition & 1 deletion docs/docs/how_to/document_loader_pdf.ipynb
@@ -48,7 +48,7 @@
     "\n",
     "## Simple and fast text extraction\n",
     "\n",
-    "If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypydf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
+    "If you are looking for a simple string representation of text that is embedded in a PDF, the method below is appropriate. It will return a list of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects-- one per page-- containing a single string of the page's text in the Document's `page_content` attribute. It will not parse text in images or scanned PDF pages. Under the hood it uses the [pypdf](https://pypdf.readthedocs.io/en/stable/) Python library.\n",
     "\n",
     "LangChain [document loaders](/docs/concepts/document_loaders) implement `lazy_load` and its async variant, `alazy_load`, which return iterators of `Document` objects. We will use these below."
    ]
2 changes: 1 addition & 1 deletion docs/docs/how_to/tool_calling.ipynb
@@ -55,7 +55,7 @@
    "source": [
     "## Defining tool schemas\n",
     "\n",
-    "For a model to be able to call tools, we need to pass in tool schemas that describe what the tool does and what it's arguments are. Chat models that support tool calling features implement a `.bind_tools()` method for passing tool schemas to the model. Tool schemas can be passed in as Python functions (with typehints and docstrings), Pydantic models, TypedDict classes, or LangChain [Tool objects](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.BaseTool.html#langchain_core.tools.BaseTool). Subsequent invocations of the model will pass in these tool schemas along with the prompt.\n",
+    "For a model to be able to call tools, we need to pass in tool schemas that describe what the tool does and what it's arguments are. Chat models that support tool calling features implement a `.bind_tools()` method for passing tool schemas to the model. Tool schemas can be passed in as Python functions (with typehints and docstrings), Pydantic models, TypedDict classes, or LangChain [Tool objects](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#basetool). Subsequent invocations of the model will pass in these tool schemas along with the prompt.\n",
     "\n",
     "### Python functions\n",
     "Our tool schemas can be Python functions:"
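The notebook text in this hunk says a tool schema can be given as a Python function with type hints and a docstring. As a rough illustration of what such a function carries, the sketch below collects the name, docstring, and argument types into a plain dictionary using only the standard library. The helper `describe_function` and the output layout are hypothetical; this is not how LangChain builds its schemas and it does not call `.bind_tools()`.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


def describe_function(func: Callable[..., Any]) -> Dict[str, Any]:
    """Hypothetical helper: gather name, docstring, and argument types."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # keep only the argument annotations
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {name: tp.__name__ for name, tp in hints.items()},
    }


schema = describe_function(multiply)
print(schema)
```

The point of the sketch is only that a well-annotated function already contains everything a schema needs, which is why the docs recommend type hints and docstrings for function-based tools.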
30 changes: 15 additions & 15 deletions docs/docs/tutorials/index.mdx
@@ -7,25 +7,25 @@ sidebar_class_name: hidden
 New to LangChain or LLM app development in general? Read this material to quickly get up and running.
 
 ## Basics
-- [Build a Simple LLM Application with LCEL](/docs/tutorials/llm_chain)
-- [Build a Chatbot](/docs/tutorials/chatbot)
-- [Build vector stores and retrievers](/docs/tutorials/retrievers)
-- [Build an Agent](/docs/tutorials/agents)
+- [LLM applications](/docs/tutorials/llm_chain): Build and deploy a simple LLM application.
+- [Chatbots](/docs/tutorials/chatbot): Build a chatbot that incorporates memory.
+- [Vector stores](/docs/tutorials/retrievers): Build vector stores and use them to retrieve data.
+- [Agents](/docs/tutorials/agents): Build an agent that interacts with external tools.
 
 ## Working with external knowledge
-- [Build a Retrieval Augmented Generation (RAG) Application](/docs/tutorials/rag)
-- [Build a Conversational RAG Application](/docs/tutorials/qa_chat_history)
-- [Build a Question/Answering system over SQL data](/docs/tutorials/sql_qa)
-- [Build a Query Analysis System](/docs/tutorials/query_analysis)
-- [Build a local RAG application](/docs/tutorials/local_rag)
-- [Build a Question Answering application over a Graph Database](/docs/tutorials/graph)
-- [Build a PDF ingestion and Question/Answering system](/docs/tutorials/pdf_qa/)
+- [Retrieval Augmented Generation (RAG)](/docs/tutorials/rag): Build an application that uses your own documents to inform its responses.
+- [Conversational RAG](/docs/tutorials/qa_chat_history): Build a RAG application that incorporates a memory of its user interactions.
+- [Question-Answering with SQL](/docs/tutorials/sql_qa): Build a question-answering system that executes SQL queries to inform its responses.
+- [Query Analysis](/docs/tutorials/query_analysis): Build a RAG application that analyzes questions to generate filters and other structured queries.
+- [Local RAG](/docs/tutorials/local_rag): Build a RAG application using LLMs running locally on your machine.
+- [Question-Answering with Graph Databases](/docs/tutorials/graph): Build a question-answering system that queries a graph database to inform its responses.
+- [Question-Answering with PDFs](/docs/tutorials/pdf_qa/): Build a question-answering system that ingests PDFs and uses them to inform its responses.
 
 ## Specialized tasks
-- [Build an Extraction Chain](/docs/tutorials/extraction)
-- [Generate synthetic data](/docs/tutorials/data_generation)
-- [Classify text into labels](/docs/tutorials/classification)
-- [Summarize text](/docs/tutorials/summarization)
+- [Extraction](/docs/tutorials/extraction): Extract structured data from text and other unstructured media.
+- [Synthetic data](/docs/tutorials/data_generation): Generate synthetic data using LLMs.
+- [Classification](/docs/tutorials/classification): Classify text into categories or labels.
+- [Summarization](/docs/tutorials/summarization): Generate summaries of (potentially long) texts.
 
 ## LangGraph
 
@@ -2,7 +2,7 @@
 
 import uuid
 import warnings
-from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple
+from typing import TYPE_CHECKING, Any, Callable, Dict, Iterable, List, Optional, Tuple
 
 import numpy as np
 from langchain_core.documents import Document
@@ -23,57 +23,18 @@
 PAINLESS_SCRIPTING_SEARCH = "painless_scripting"
 MATCH_ALL_QUERY = {"match_all": {}}  # type: Dict
 
+if TYPE_CHECKING:
+    from opensearchpy import AsyncOpenSearch, OpenSearch
 
-def _import_opensearch() -> Any:
-    """Import OpenSearch if available, otherwise raise error."""
+
+def _get_opensearch_client(opensearch_url: str, **kwargs: Any) -> OpenSearch:
+    """Get OpenSearch client from the opensearch_url, otherwise raise error."""
     try:
         from opensearchpy import OpenSearch
-    except ImportError:
-        raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
-    return OpenSearch
-
-
-def _import_async_opensearch() -> Any:
-    """Import AsyncOpenSearch if available, otherwise raise error."""
-    try:
-        from opensearchpy import AsyncOpenSearch
-    except ImportError:
-        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
-    return AsyncOpenSearch
-
 
-def _import_bulk() -> Any:
-    """Import bulk if available, otherwise raise error."""
-    try:
-        from opensearchpy.helpers import bulk
+        client = OpenSearch(opensearch_url, **kwargs)
     except ImportError:
         raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
-    return bulk
-
-
-def _import_async_bulk() -> Any:
-    """Import async_bulk if available, otherwise raise error."""
-    try:
-        from opensearchpy.helpers import async_bulk
-    except ImportError:
-        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
-    return async_bulk
-
-
-def _import_not_found_error() -> Any:
-    """Import not found error if available, otherwise raise error."""
-    try:
-        from opensearchpy.exceptions import NotFoundError
-    except ImportError:
-        raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
-    return NotFoundError
-
-
-def _get_opensearch_client(opensearch_url: str, **kwargs: Any) -> Any:
-    """Get OpenSearch client from the opensearch_url, otherwise raise error."""
-    try:
-        opensearch = _import_opensearch()
-        client = opensearch(opensearch_url, **kwargs)
     except ValueError as e:
         raise ImportError(
             f"OpenSearch client string provided is not in proper format. "
@@ -82,11 +43,14 @@ def _get_opensearch_client(opensearch_url: str, **kwargs: Any) -> Any:
     return client
 
 
-def _get_async_opensearch_client(opensearch_url: str, **kwargs: Any) -> Any:
+def _get_async_opensearch_client(opensearch_url: str, **kwargs: Any) -> AsyncOpenSearch:
     """Get AsyncOpenSearch client from the opensearch_url, otherwise raise error."""
     try:
-        async_opensearch = _import_async_opensearch()
-        client = async_opensearch(opensearch_url, **kwargs)
+        from opensearchpy import AsyncOpenSearch
+
+        client = AsyncOpenSearch(opensearch_url, **kwargs)
+    except ImportError:
+        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
     except ValueError as e:
         raise ImportError(
             f"AsyncOpenSearch client string provided is not in proper format. "
@@ -127,7 +91,7 @@ def _is_aoss_enabled(http_auth: Any) -> bool:
 
 
 def _bulk_ingest_embeddings(
-    client: Any,
+    client: OpenSearch,
     index_name: str,
     embeddings: List[List[float]],
     texts: Iterable[str],
@@ -142,16 +106,19 @@ def _bulk_ingest_embeddings(
     """Bulk Ingest Embeddings into given index."""
     if not mapping:
         mapping = dict()
+    try:
+        from opensearchpy.exceptions import NotFoundError
+        from opensearchpy.helpers import bulk
+    except ImportError:
+        raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
 
-    bulk = _import_bulk()
-    not_found_error = _import_not_found_error()
     requests = []
     return_ids = []
     mapping = mapping
 
     try:
         client.indices.get(index=index_name)
-    except not_found_error:
+    except NotFoundError:
         client.indices.create(index=index_name, body=mapping)
 
     for i, text in enumerate(texts):
@@ -177,7 +144,7 @@ def _bulk_ingest_embeddings(
 
 
 async def _abulk_ingest_embeddings(
-    client: Any,
+    client: AsyncOpenSearch,
     index_name: str,
     embeddings: List[List[float]],
     texts: Iterable[str],
@@ -193,14 +160,18 @@ async def _abulk_ingest_embeddings(
     if not mapping:
         mapping = dict()
 
-    async_bulk = _import_async_bulk()
-    not_found_error = _import_not_found_error()
+    try:
+        from opensearchpy.exceptions import NotFoundError
+        from opensearchpy.helpers import async_bulk
+    except ImportError:
+        raise ImportError(IMPORT_ASYNC_OPENSEARCH_PY_ERROR)
+
     requests = []
     return_ids = []
 
     try:
         await client.indices.get(index=index_name)
-    except not_found_error:
+    except NotFoundError:
         await client.indices.create(index=index_name, body=mapping)
 
     for i, text in enumerate(texts):
@@ -230,7 +201,7 @@ async def _abulk_ingest_embeddings(
 def _default_scripting_text_mapping(
     dim: int,
     vector_field: str = "vector_field",
-) -> Dict:
+) -> Dict[str, Any]:
     """For Painless Scripting or Script Scoring,the default mapping to create index."""
     return {
         "mappings": {
@@ -249,7 +220,7 @@ def _default_text_mapping(
     ef_construction: int = 512,
     m: int = 16,
     vector_field: str = "vector_field",
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, this is the default mapping to create index."""
     return {
         "settings": {"index": {"knn": True, "knn.algo_param.ef_search": ef_search}},
@@ -275,7 +246,7 @@ def _default_approximate_search_query(
     k: int = 4,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, this is the default query."""
     return {
         "size": k,
@@ -291,7 +262,7 @@ def _approximate_search_query_with_boolean_filter(
     vector_field: str = "vector_field",
     subquery_clause: str = "must",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, with Boolean Filter."""
     return {
         "size": k,
@@ -313,7 +284,7 @@ def _approximate_search_query_with_efficient_filter(
     k: int = 4,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Approximate k-NN Search, with Efficient Filter for Lucene and
     Faiss Engines."""
     search_query = _default_approximate_search_query(
@@ -330,7 +301,7 @@ def _default_script_query(
     pre_filter: Optional[Dict] = None,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Script Scoring Search, this is the default query."""
 
     if not pre_filter:
@@ -376,7 +347,7 @@ def _default_painless_scripting_query(
     pre_filter: Optional[Dict] = None,
     vector_field: str = "vector_field",
     score_threshold: Optional[float] = 0.0,
-) -> Dict:
+) -> Dict[str, Any]:
     """For Painless Scripting Search, this is the default query."""
 
     if not pre_filter:
@@ -692,7 +663,10 @@ def delete(
             refresh_indices: Whether to refresh the index
                 after deleting documents. Defaults to True.
         """
-        bulk = _import_bulk()
+        try:
+            from opensearchpy.helpers import bulk
+        except ImportError:
+            raise ImportError(IMPORT_OPENSEARCH_PY_ERROR)
 
         body = []
 
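The hunks above replace the `_import_*` helper functions with imports placed inside the functions that use them, and add an `if TYPE_CHECKING:` block so signatures can name `OpenSearch` and `AsyncOpenSearch` instead of `Any`. A minimal sketch of that pattern follows. It uses the standard-library `decimal` module as a stand-in for `opensearchpy` so it runs anywhere; `make_number` and `IMPORT_ERROR_MESSAGE` are hypothetical names, not part of the commit. The return annotation is quoted here so it is never evaluated at runtime, which is an assumption of this sketch rather than something shown in the diff.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime, so importing
    # this module does not require the optional dependency.
    from decimal import Decimal  # stand-in for an optional package

IMPORT_ERROR_MESSAGE = "Could not import the optional package."  # hypothetical


def make_number(value: str) -> "Decimal":
    """Build a Decimal, importing the dependency only when called."""
    try:
        from decimal import Decimal  # deferred runtime import

        number = Decimal(value)
    except ImportError:
        # Re-raise with an installation hint, as the commit's functions do.
        raise ImportError(IMPORT_ERROR_MESSAGE)
    return number


print(make_number("1.5"))
```

Compared with helper functions that return the imported class typed as `Any`, this keeps the precise type visible to type checkers while still deferring the import until the function is called.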
