Update docstrings for store classes (#2616)
hinthornw authored Dec 4, 2024
1 parent 584d927 commit 5fa196a
Showing 13 changed files with 533 additions and 68 deletions.


123 changes: 123 additions & 0 deletions docs/docs/cloud/deployment/semantic_search.md
@@ -0,0 +1,123 @@
# How to add semantic search to your LangGraph deployment

This guide explains how to add semantic search to your LangGraph deployment's cross-thread [store](../../concepts/persistence.md#memory-store), so that your agent can search for memories and other documents by semantic similarity.

## Prerequisites

- A LangGraph deployment (see [how to deploy](setup_pyproject.md))
- API keys for your embedding provider (in this case, OpenAI)
- `langchain >= 0.3.8` (required if you configure embeddings with the string format shown below)

## Steps

1. Update your `langgraph.json` configuration file to include the store configuration:

```json
{
  ...
  "store": {
    "index": {
      "embed": "openai:text-embedding-3-small",
      "dims": 1536,
      "fields": ["$"]
    }
  }
}
```

This configuration:

- Uses OpenAI's `text-embedding-3-small` model for generating embeddings
- Sets the embedding dimension to 1536 (matching the model's output)
- Indexes all fields in your stored data (`["$"]` means index everything; to index only particular fields, list them instead, like `["text", "metadata.title"]`)

2. To use the string embedding format above, make sure your dependencies include `langchain >= 0.3.8`:

```toml
# In pyproject.toml
[project]
dependencies = [
"langchain>=0.3.8"
]
```

Or if using requirements.txt:

```
langchain>=0.3.8
```

## Usage

Once configured, you can use semantic search in your LangGraph nodes. The store requires a namespace tuple to organize memories:

```python
def search_memory(state: State, *, store: BaseStore):
    # Search the store using semantic similarity.
    # The namespace tuple helps organize different types of memories,
    # e.g., ("user_facts", "preferences") or ("conversation", "summaries")
    results = store.search(
        ("memory", "facts"),  # namespace prefix to search within
        query="your search query",
        limit=3,  # number of results to return
    )
    return results
```
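
The search above only returns items that were previously written to the store. As a minimal sketch (the namespace, key, and value here are illustrative), a node can write memories with `put`:

```python
def save_memory(state: State, *, store: BaseStore):
    # put() upserts by (namespace, key): writing the same key again
    # overwrites the item and re-indexes its embedded fields
    store.put(
        ("memory", "facts"),  # same namespace the search above reads from
        "fact-1",  # key, unique within the namespace
        {"text": "User prefers short, direct answers"},
    )
```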

## Custom Embeddings

If you want to use custom embeddings, you can pass a path to a custom embedding function:

```json
{
  ...
  "store": {
    "index": {
      "embed": "path/to/embedding_function.py:aembed_texts",
      "dims": 1536,
      "fields": ["$"]
    }
  }
}
```

The deployment loads the function from the specified path; the part after the colon must match the function's name. The function must be async and accept a list of strings:

```python
# path/to/embedding_function.py
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def aembed_texts(texts: list[str]) -> list[list[float]]:
    """Custom embedding function that must:
    1. Be async
    2. Accept a list of strings
    3. Return a list of float arrays (embeddings)
    """
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [e.embedding for e in response.data]
```
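
Before deploying, it can help to sanity-check the function locally. A quick sketch, assuming the file above is importable as a module named `embedding_function` (hypothetical name) and `OPENAI_API_KEY` is set:

```python
import asyncio

from embedding_function import aembed_texts  # hypothetical module name

vectors = asyncio.run(aembed_texts(["hello", "world"]))
assert len(vectors) == 2  # one embedding per input text
assert len(vectors[0]) == 1536  # must match "dims" in langgraph.json
```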

## Querying via the API

You can also query the store using the LangGraph SDK. Because the SDK's store operations are async, call them from an async context:

```python
from langgraph_sdk import get_client

async def search_store():
    client = get_client()
    results = await client.store.search_items(
        ("memory", "facts"),
        query="your search query",
        limit=3,  # number of results to return
    )
    return results

# Use in an async context
results = await search_store()
```
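
Search only finds items that exist in the store. A sketch of seeding data through the SDK (this assumes the store client exposes a `put_item` method with the signature shown; check your SDK version):

```python
from langgraph_sdk import get_client

async def seed_store():
    client = get_client()
    # Assumed method/signature: put_item(namespace, key=..., value=...)
    await client.store.put_item(
        ("memory", "facts"),
        key="fact-1",
        value={"text": "your memory content"},
    )
```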
30 changes: 24 additions & 6 deletions docs/docs/concepts/memory.md
@@ -171,7 +171,7 @@ trim_messages(

## Long-term memory

Long-term memory in LangGraph allows systems to retain information across different conversations or sessions. Unlike short-term memory, which is thread-scoped, long-term memory is saved within custom "namespaces."
Long-term memory in LangGraph allows systems to retain information across different conversations or sessions. Unlike short-term memory, which is **thread-scoped**, long-term memory is saved within custom "namespaces."

### Storing memories

@@ -180,16 +180,34 @@ LangGraph stores long-term memories as JSON documents in a [store](persistence.m
```python
from langgraph.store.memory import InMemoryStore


def embed(texts: list[str]) -> list[list[float]]:
    # Replace with an actual embedding function or LangChain embeddings object
    return [[1.0, 2.0] for _ in texts]  # one 2-dim vector per input text


# InMemoryStore saves data to an in-memory dictionary. Use a DB-backed store in production.
store = InMemoryStore()
store = InMemoryStore(index={"embed": embed, "dims": 2})
user_id = "my-user"
application_context = "chitchat"
namespace = (user_id, application_context)
store.put(namespace, "a-memory", {"rules": ["User likes short, direct language", "User only speaks English & python"], "my-key": "my-value"})
store.put(
    namespace,
    "a-memory",
    {
        "rules": [
            "User likes short, direct language",
            "User only speaks English & python",
        ],
        "my-key": "my-value",
    },
)
# get the "memory" by ID
item = store.get(namespace, "a-memory")
# list "memories" within this namespace, filtering on content equivalence
items = store.search(namespace, filter={"my-key": "my-value"})
# search for "memories" within this namespace, filtering on content equivalence, sorted by vector similarity
items = store.search(
    namespace, filter={"my-key": "my-value"}, query="language preferences"
)
```

### Framework for thinking about long-term memory
@@ -232,7 +250,7 @@ Alternatively, memories can be a collection of documents that are continuously u

However, this shifts some complexity to memory updating. The model must now _delete_ or _update_ existing items in the list, which can be tricky. In addition, some models may default to over-inserting and others may default to over-updating. See the [Trustcall](https://github.com/hinthornw/trustcall) package for one way to manage this and consider evaluation (e.g., with a tool like [LangSmith](https://docs.smith.langchain.com/tutorials/Developers/evaluation)) to help you tune the behavior.
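
Concretely, with the `Store` API an update is a `put` to an existing key and a removal is a `delete`; a minimal sketch (the key is illustrative, reusing `namespace` from the example above):

```python
# put() upserts by (namespace, key), so re-writing a key updates the memory
store.put(namespace, "a-memory", {"rules": ["User now prefers formal language"]})
# delete() removes a memory that no longer applies
store.delete(namespace, "a-memory")
```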

Working with document collections also shifts complexity to memory **search** over the list. The `Store` currently supports [filtering by metadata](https://langchain-ai.github.io/langgraph/reference/store/#storage) and will soon add [semantic search shortly](https://python.langchain.com/docs/concepts/vectorstores/), but selecting the most relevant documents can be tricky as the list grows.
Working with document collections also shifts complexity to memory **search** over the list. The `Store` currently supports both [semantic search](https://langchain-ai.github.io/langgraph/reference/store/#langgraph.store.base.SearchOp.query) and [filtering by content](https://langchain-ai.github.io/langgraph/reference/store/#langgraph.store.base.SearchOp.filter).

Finally, using a collection of memories can make it challenging to provide comprehensive context to the model. While individual memories may follow a specific schema, this structure might not capture the full context or relationships between memories. As a result, when using these memories to generate responses, the model may lack important contextual information that would be more readily available in a unified profile approach.

15 changes: 12 additions & 3 deletions docs/docs/how-tos/cross-thread-persistence.ipynb
@@ -41,6 +41,9 @@
" <p>\n",
" Support for the <code><a href=\"https://langchain-ai.github.io/langgraph/reference/store/#langgraph.store.base.BaseStore\">Store</a></code> API that is used in this guide was added in LangGraph <code>v0.2.32</code>.\n",
" </p>\n",
" <p>\n",
" Support for <b>index</b> and <b>query</b> arguments of the <code><a href=\"https://langchain-ai.github.io/langgraph/reference/store/#langgraph.store.base.BaseStore\">Store</a></code> API that is used in this guide was added in LangGraph <code>v0.2.54</code>.\n",
" </p>\n",
"</div>\n",
"\n",
"## Setup\n",
@@ -114,7 +117,7 @@
"\n",
"Importantly, to determine the user, we will be passing `user_id` via the config keyword argument of the node function.\n",
"\n",
"Let's first define an `InMemoryStore` which is already populated with some memories about the users."
"Let's first define an `InMemoryStore` already populated with some memories about the users."
]
},
{
@@ -125,8 +128,14 @@
"outputs": [],
"source": [
"from langgraph.store.memory import InMemoryStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"in_memory_store = InMemoryStore()"
"in_memory_store = InMemoryStore(\n",
" index={\n",
" \"embed\": OpenAIEmbeddings(model=\"text-embedding-3-small\"),\n",
" \"dims\": 1536,\n",
" }\n",
")"
]
},
{
@@ -163,7 +172,7 @@
"def call_model(state: MessagesState, config: RunnableConfig, *, store: BaseStore):\n",
" user_id = config[\"configurable\"][\"user_id\"]\n",
" namespace = (\"memories\", user_id)\n",
" memories = store.search(namespace)\n",
" memories = store.search(namespace, query=str(state[\"messages\"][-1].content))\n",
" info = \"\\n\".join([d.value[\"data\"] for d in memories])\n",
" system_msg = f\"You are a helpful assistant talking to the user. User info: {info}\"\n",
"\n",
2 changes: 2 additions & 0 deletions docs/docs/how-tos/index.md
@@ -39,6 +39,7 @@ LangGraph makes it easy to manage conversation [memory](../concepts/memory.md) i
- [How to manage conversation history](memory/manage-conversation-history.ipynb)
- [How to delete messages](memory/delete-messages.ipynb)
- [How to add summary conversation memory](memory/add-summary-conversation-history.ipynb)
- [Add long-term memory (cross-thread)](cross-thread-persistence.ipynb)

### Human-in-the-loop

@@ -139,6 +140,7 @@ Learn how to set up your app for deployment to LangGraph Platform:
- [How to set up app for deployment (requirements.txt)](../cloud/deployment/setup.md)
- [How to set up app for deployment (pyproject.toml)](../cloud/deployment/setup_pyproject.md)
- [How to set up app for deployment (JavaScript)](../cloud/deployment/setup_javascript.md)
- [How to add semantic search](../cloud/deployment/semantic_search.md)
- [How to customize Dockerfile](../cloud/deployment/custom_docker.md)
- [How to test locally](../cloud/deployment/test_locally.md)
- [How to rebuild graph at runtime](../cloud/deployment/graph_rebuild.md)
62 changes: 62 additions & 0 deletions libs/checkpoint-postgres/langgraph/store/postgres/aio.py
@@ -37,6 +37,68 @@


class AsyncPostgresStore(AsyncBatchedBaseStore, BasePostgresStore[_ainternal.Conn]):
"""Asynchronous Postgres-backed store with optional vector search using pgvector.
!!! example "Examples"
Basic setup and key-value storage:
```python
from langgraph.store.postgres import AsyncPostgresStore
async with AsyncPostgresStore.from_conn_string(
"postgresql://user:pass@localhost:5432/dbname"
) as store:
await store.setup()
# Store and retrieve data
await store.aput(("users", "123"), "prefs", {"theme": "dark"})
item = await store.aget(("users", "123"), "prefs")
```
Vector search using LangChain embeddings:
```python
from langchain.embeddings import init_embeddings
from langgraph.store.postgres import AsyncPostgresStore
async with AsyncPostgresStore.from_conn_string(
"postgresql://user:pass@localhost:5432/dbname",
index={
"dims": 1536,
"embed": init_embeddings("openai:text-embedding-3-small"),
"fields": ["text"] # specify which fields to embed. Default is the whole serialized value
}
) as store:
await store.setup() # Do this once to run migrations
# Store documents
await store.aput(("docs",), "doc1", {"text": "Python tutorial"})
await store.aput(("docs",), "doc2", {"text": "TypeScript guide"})
# Search by similarity
results = await store.asearch(("docs",), query="python programming")
```
Using connection pooling for better performance:
```python
from langgraph.store.postgres import AsyncPostgresStore, PoolConfig
async with AsyncPostgresStore.from_conn_string(
"postgresql://user:pass@localhost:5432/dbname",
pool_config=PoolConfig(
min_size=5,
max_size=20
)
) as store:
await store.setup()
# Use store with connection pooling...
```
Warning:
Make sure to:
1. Call `setup()` before first use to create necessary tables and indexes
2. Have the pgvector extension available to use vector search
3. Use Python 3.10+ for async functionality
"""

__slots__ = (
"_deserializer",
"pipe",
46 changes: 46 additions & 0 deletions libs/checkpoint-postgres/langgraph/store/postgres/base.py
@@ -534,6 +534,52 @@ def _get_filter_condition(self, key: str, op: str, value: Any) -> tuple[str, lis


class PostgresStore(BaseStore, BasePostgresStore[_pg_internal.Conn]):
"""Postgres-backed store with optional vector search using pgvector.
!!! example "Examples"
Basic setup and key-value storage:
```python
from langgraph.store.postgres import PostgresStore
store = PostgresStore(
connection_string="postgresql://user:pass@localhost:5432/dbname"
)
store.setup()
# Store and retrieve data
store.put(("users", "123"), "prefs", {"theme": "dark"})
item = store.get(("users", "123"), "prefs")
```
Vector search using LangChain embeddings:
```python
from langchain.embeddings import init_embeddings
from langgraph.store.postgres import PostgresStore
store = PostgresStore(
connection_string="postgresql://user:pass@localhost:5432/dbname",
index={
"dims": 1536,
"embed": init_embeddings("openai:text-embedding-3-small"),
"fields": ["text"] # specify which fields to embed. Default is the whole serialized value
}
)
store.setup() # Do this once to run migrations
# Store documents
store.put(("docs",), "doc1", {"text": "Python tutorial"})
store.put(("docs",), "doc2", {"text": "TypeScript guide"})
# Search by similarity
results = store.search(("docs",), query="python programming")
```
Warning:
Make sure to call `setup()` before first use to create necessary tables and indexes.
The pgvector extension must be available to use vector search.
"""

__slots__ = (
"_deserializer",
"pipe",
