docs: tool, retriever contributing docs (#28602)

langchain-ai · Dec 7, 2024 · 9b84849
1 parent 5e8553c
Showing 2 changed files with 100 additions and 1 deletion.
diff --git a/docs/docs/contributing/how_to/integrations/index.mdx b/docs/docs/contributing/how_to/integrations/index.mdx
@@ -1,4 +1,5 @@
 ---
+pagination_prev: null
 pagination_next: contributing/how_to/integrations/package
 ---
 
@@ -37,14 +38,14 @@ While any component can be integrated into LangChain, there are specific types o
         <li>Chat Models</li>
         <li>Tools/Toolkits</li>
         <li>Retrievers</li>
-        <li>Document Loaders</li>
         <li>Vector Stores</li>
         <li>Embedding Models</li>
       </ul>
     </td>
     <td>
       <ul>
         <li>LLMs (Text-Completion Models)</li>
+        <li>Document Loaders</li>
         <li>Key-Value Stores</li>
         <li>Document Transformers</li>
         <li>Model Caches</li>

diff --git a/docs/docs/contributing/how_to/integrations/package.mdx b/docs/docs/contributing/how_to/integrations/package.mdx
@@ -175,6 +175,60 @@ import EmbeddingsSource from '/src/theme/integration_template/integration_templa
     </TabItem>
     <TabItem value="tools" label="Tools">
 
+Tools are used in 2 main ways:
+
+1. To define an "input schema" or "args schema" to pass to a chat model's tool calling
+feature along with a text request, such that the chat model can generate a "tool call",
+or parameters to call the tool with.
+2. To take a "tool call" as generated above, perform some action, and return a response
+that can be passed back to the chat model as a ToolMessage.
+
+Your tool class must inherit from the [BaseTool](https://python.langchain.com/api_reference/core/tools/langchain_core.tools.base.BaseTool.html#langchain_core.tools.base.BaseTool) base class. This interface has 3 properties and 2 methods that should be implemented in a
+subclass.
+
+| Method/Property         | Description                                          |
+|------------------------ |------------------------------------------------------|
+| `name`                  | Name of the tool (passed to the LLM too).            |
+| `description`           | Description of the tool (passed to the LLM too).     |
+| `args_schema`           | Define the schema for the tool's input arguments.    |
+| `_run`                  | Run the tool with the given arguments.               |
+| `_arun`                 | Asynchronously run the tool with the given arguments.|
+
+### Properties
+
+`name`, `description`, and `args_schema` are all properties that should be implemented
+in the subclass. `name` and `description` are strings that are used to identify the tool
+and provide a description of what the tool does. Both of these are passed to the LLM,
+and users may override these values depending on the LLM they are using as a form of
+"prompt engineering." A concise, LLM-usable name and description are
+important for the initial user experience of the tool.
+
+`args_schema` is a Pydantic `BaseModel` that defines the schema for the tool's input
+arguments. This is used to validate the input arguments to the tool, and to provide
+a schema for the LLM to fill out when calling the tool. Similar to the `name` and
+`description` of the overall Tool class, the fields' names (the variable name) and
+description (part of `Field(..., description="description")`) are passed to the LLM, 
+and the values in these fields should be concise and LLM-usable.
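
As a rough sketch of what an `args_schema` might look like, assuming Pydantic 2: the `WeatherInput` model and its fields are hypothetical and not taken from the LangChain templates. The field names and `description` values end up in the JSON schema, which is the part an LLM would see.

```python
from pydantic import BaseModel, Field


class WeatherInput(BaseModel):
    """Input for a hypothetical weather lookup tool."""

    city: str = Field(..., description="City name, e.g. 'Paris'")
    units: str = Field(default="metric", description="Either 'metric' or 'imperial'")


# The JSON schema carries each field's name, type, and description.
schema = WeatherInput.model_json_schema()
field_descriptions = {
    name: spec["description"] for name, spec in schema["properties"].items()
}

# The same model validates incoming arguments and fills in defaults.
validated = WeatherInput(city="Paris")
print(field_descriptions)
print(validated.units)
```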
+
+### Run Methods
+
+`_run` is the main method that should be implemented in the subclass. This method
+takes in the arguments from `args_schema` and runs the tool, returning a string
+response. This method is usually called in a LangGraph [`ToolNode`](https://langchain-ai.github.io/langgraph/how-tos/tool-calling/), and can also be called in a legacy
+`langchain.agents.AgentExecutor`.
+
+`_arun` is optional because by default, `_run` will be run in an async executor.
+However, if your tool calls any APIs or does any async work, you should implement
+this method to run the tool asynchronously in addition to `_run`.
+
+### Implementation
+
+The `langchain-cli` package contains [template integrations](https://github.com/langchain-ai/langchain/tree/master/libs/cli/langchain_cli/integration_template/integration_template)
+for major LangChain components that are tested against the standard unit and
+integration tests in the LangChain GitHub repository. You can access the starter
+tool implementation [here](https://github.com/langchain-ai/langchain/blob/master/libs/cli/langchain_cli/integration_template/integration_template/tools.py).
+For convenience, we also include the code below.
+
         <details>
             <summary>Example tool code</summary>
 
@@ -194,6 +248,50 @@ import ToolSource from '/src/theme/integration_template/integration_template/too
     </TabItem>
     <TabItem value="retrievers" label="Retrievers">
 
+Retrievers are used to retrieve documents from APIs, databases, or other sources
+based on a query. The `Retriever` class must inherit from the [BaseRetriever](https://python.langchain.com/api_reference/core/retrievers/langchain_core.retrievers.BaseRetriever.html) base class. This interface has 1 attribute and 2 methods that should be implemented in a subclass.
+
+| Method/Property            | Description                                             |
+|----------------------------|---------------------------------------------------------|
+| `k`                        | Default number of documents to retrieve (configurable). |
+| `_get_relevant_documents`  | Retrieve documents based on a query.                    |
+| `_aget_relevant_documents` | Asynchronously retrieve documents based on a query.     |
+
+### Attributes
+
+`k` is an attribute that should be implemented in the subclass. This attribute
+can simply be defined at the top of the class with a default value like
+`k: int = 5`. This attribute is the default number of documents to retrieve
+from the retriever, and can be overridden by the user when constructing or calling
+the retriever.
+
+### Methods
+
+`_get_relevant_documents` is the main method that should be implemented in the subclass.
+
+This method takes in a query and returns a list of `Document` objects, which have 2
+main properties:
+
+- `page_content` - the text content of the document
+- `metadata` - a dictionary of metadata about the document
+
+Retrievers are typically directly invoked by a user, e.g. as
+`MyRetriever(k=4).invoke("query")`, which will automatically call `_get_relevant_documents`
+under the hood.
+
+`_aget_relevant_documents` is optional because by default, `_get_relevant_documents` will
+be run in an async executor. However, if your retriever calls any APIs or does
+any async work, you should implement this method to run the retriever asynchronously
+in addition to `_get_relevant_documents` for performance reasons.
+
+### Implementation
+
+The `langchain-cli` package contains [template integrations](https://github.com/langchain-ai/langchain/tree/master/libs/cli/langchain_cli/integration_template/integration_template)
+for major LangChain components that are tested against the standard unit and
+integration tests in the LangChain GitHub repository. You can access the starter
+retriever implementation [here](https://github.com/langchain-ai/langchain/blob/master/libs/cli/langchain_cli/integration_template/integration_template/retrievers.py).
+For convenience, we also include the code below.
+
         <details>
             <summary>Example retriever code</summary>