feat: BigtableChatMessageHistory implementation #6

Merged
merged 27 commits into from
Feb 7, 2024
Changes from 16 commits
127 changes: 112 additions & 15 deletions docs/chat_message_history.ipynb
@@ -4,55 +4,152 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google DATABASE\n",
"# Bigtable\n",
"\n",
"[Google DATABASE](https://cloud.google.com/DATABASE).\n",
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up"
Collaborator
nit: In Cloud Docs we call this "Before you begin" -- should we be consistent here?

Contributor Author
Done

]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run this notebook, you will need a Google Cloud Project, a Bigtable instance, and Google credentials."
Collaborator
Can we provide some links to how to create or set these things?

Contributor Author
Done

]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install langchain-google-bigtable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import (\n",
" BigtableChatMessageHistory,\n",
")\n",
"\n",
"Save chat messages into `DATABASE`."
"message_history = BigtableChatMessageHistory(\n",
" instance_id=\"my-instance\",\n",
" table_id=\"my-table\",\n",
" session_id=\"user-session-id\",\n",
")\n",
"\n",
"message_history.add_user_message(\"hi!\")\n",
"message_history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='hi!'),\n",
" HumanMessage(content='hi!'),\n",
" AIMessage(content='whats up?')]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"message_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pre-reqs"
"## Setting up the schema\n",
Collaborator
Suggested change
"## Setting up the schema\n",
"## Initializing schema\n",

Contributor Author
Done

"\n",
"The chat history will be written to a column called `history` in a column family called `langchain`. If this column family does not exist in your table, you will need to call init_schema:"
Collaborator
Does this imply the table should already be created? Or will init_schema also create the table?

Contributor Author
The table is not created by init_schema, only the column family. I don't mind creating the table as well, though, but at some point we said we shouldn't. I have no preference :)

Contributor Author
Table created in ctor.

]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"%pip install PACKAGE_NAME"
"message_history = BigtableChatMessageHistory(\n",
" instance_id=\"my-instance\",\n",
" table_id=\"my-table\",\n",
")\n",
"\n",
"message_history.init_schema()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom client\n",
"The client created by default is the default client, using only admin=True option. To use a non-default, a [custom client](https://cloud.google.com/python/docs/reference/bigtable/latest/client#class-googlecloudbigtableclientclientprojectnone-credentialsnone-readonlyfalse-adminfalse-clientinfonone-clientoptionsnone-adminclientoptionsnone-channelnone) can be passed to the constructor."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from PACKAGE import LOADER"
"from google.cloud import bigtable\n",
"\n",
"custom_client_message_history = BigtableChatMessageHistory(\n",
" instance_id=\"my-instance\",\n",
" table_id=\"my-table\",\n",
" client=bigtable.Client(...),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
"## Cleaning up\n",

Would it flow better to add this section to the bottom of "Basic Usage"? It would then be: initialize ChatMessageHistory -> add messages -> show messages -> clear messages. This way you don't need to initialize another class object and can just use the one you already have...

message_history.clear()

Contributor Author
Done

"\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"Note: Once deleted, the data is no longer stored in Bigtable and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"source": [
"message_history = BigtableChatMessageHistory(\n",
" instance_id=\"my-instance\",\n",
" table_id=\"my-table\",\n",
" session_id=\"obsolete-session-id\",\n",
")\n",
"\n",
"message_history.clear()"
]
}
],
"metadata": {
5 changes: 5 additions & 0 deletions src/langchain_google_bigtable/__init__.py
@@ -11,3 +11,8 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from langchain_google_bigtable.chat_message_history import BigtableChatMessageHistory

__all__ = ["BigtableChatMessageHistory"]
99 changes: 99 additions & 0 deletions src/langchain_google_bigtable/chat_message_history.py
@@ -0,0 +1,99 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Bigtable-based chat message history"""
from __future__ import annotations

import json
import re
import time
import uuid
from typing import List, Optional

from google.cloud import bigtable
from google.cloud.bigtable.row_filters import RowKeyRegexFilter
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, messages_from_dict

COLUMN_FAMILY = "langchain"
COLUMN_NAME = "history"


class BigtableChatMessageHistory(BaseChatMessageHistory):
    """Chat message history that stores history in Bigtable.

    Args:
        instance_id: The Bigtable instance to use for chat message history.
        table_id: The Bigtable table to use for chat message history.
        session_id: The session ID.
        client: Optional Bigtable client; defaults to bigtable.Client(admin=True).
    """

    def __init__(
        self,
        instance_id: str,
        table_id: str,
        session_id: str,
        client: Optional[bigtable.Client] = None,
    ) -> None:
        # Despite its name, self.client ends up holding the Table handle.
        self.client = (
            (client or bigtable.Client(admin=True))
Collaborator
Can we memoize the bigtable Client between multiple integrations?

Contributor Author
Done

            .instance(instance_id)
            .table(table_id)
        )

        self.session_id = session_id

    @property
    def messages(self) -> List[BaseMessage]:  # type: ignore
        """Retrieve all session messages from DB"""
        rows = self.client.read_rows(
            filter_=RowKeyRegexFilter(
                str.encode("^" + re.escape(self.session_id) + "#.*")
            )
        )
        items = [
            json.loads(row.cells[COLUMN_FAMILY][COLUMN_NAME.encode()][0].value.decode())
            for row in rows
        ]
        messages = messages_from_dict(
            [{"type": item["type"], "data": item} for item in items]
        )
        return messages
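
The row-key filter built in `messages` can be tried out locally. The sketch below is only an approximation: Bigtable evaluates the expression server-side, whereas this uses Python's `re` module, and the helper names and row keys are hypothetical. It illustrates why `re.escape` and the trailing `#` matter for keeping sessions apart.

```python
import re


def session_key_pattern(session_id: str) -> bytes:
    """Build the same expression that `messages` passes to RowKeyRegexFilter."""
    return str.encode("^" + re.escape(session_id) + "#.*")


# Hypothetical row keys belonging to three different sessions.
ROW_KEYS = [b"user#0001#aaaa", b"user2#0001#bbbb", b"u.er#0001#cccc"]


def keys_for_session(session_id: str) -> list:
    """Local approximation of the filter: keep keys the pattern fully matches."""
    pattern = re.compile(session_key_pattern(session_id))
    return [key for key in ROW_KEYS if pattern.fullmatch(key)]


user_keys = keys_for_session("user")
dotted_keys = keys_for_session("u.er")

# Without re.escape, the "." in "u.er" would act as a wildcard and
# also match keys of the "user" session.
unescaped_hit = re.fullmatch(b"^u.er#.*", b"user#0001#aaaa") is not None
```

Under these assumptions, each session ID selects only its own keys, while the unescaped pattern would leak across sessions.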

    def init_schema(self):
        """Create the column family if the table does not have it yet"""
        families = self.client.list_column_families()
        if COLUMN_FAMILY not in families:
            self.client.column_family(
                COLUMN_FAMILY, gc_rule=bigtable.column_family.MaxVersionsGCRule(1)
            ).create()

    def add_message(self, message: BaseMessage) -> None:
        """Write a message to the table"""

        row_key = str.encode(
            self.session_id
            + "#"
            + str(time.time_ns()).rjust(25, "0")
            + "#"
            + uuid.uuid4().hex
        )
        row = self.client.direct_row(row_key)
        value = str.encode(message.json())
        row.set_cell(COLUMN_FAMILY, COLUMN_NAME, value)
        row.commit()
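
The row key written by `add_message` has the shape `session_id#timestamp#uuid`, with the nanosecond timestamp left-padded to 25 digits. A minimal sketch of why the padding matters, assuming rows are read back in byte-wise key order; `make_row_key` is a hypothetical helper that mirrors the expression above, and the fixed timestamps are made-up values chosen to keep the example deterministic:

```python
import time
import uuid


def make_row_key(session_id: str, timestamp_ns: int) -> bytes:
    """Build a key shaped like the one in add_message: session#time#uuid."""
    return str.encode(
        session_id + "#" + str(timestamp_ns).rjust(25, "0") + "#" + uuid.uuid4().hex
    )


# Fixed, made-up timestamps; add_message itself uses time.time_ns().
early = make_row_key("user-session-id", 999)
late = make_row_key("user-session-id", 1_000)

# Zero padding makes byte-wise order agree with numeric order.
in_order = sorted([late, early]) == [early, late]

# Without padding, "999" would sort after "1000" because "9" > "1".
unpadded_out_of_order = b"999" > b"1000"

# A key built from the real clock has a 25-character timestamp field.
now_key = make_row_key("user-session-id", time.time_ns())
```

The random UUID suffix only disambiguates keys that share a timestamp; it does not affect the ordering between different timestamps.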

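    def clear(self) -> None:
        """Clear session memory from DB"""
        # Include the "#" delimiter so that a session ID that is a prefix of
        # another one (e.g. "user" and "user2") does not delete both sessions.
        row_key_prefix = self.session_id + "#"
        self.client.drop_by_prefix(row_key_prefix, timeout=200)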
Collaborator
should timeout be configurable?

Contributor Author
dropped it altogether. It probably shouldn't be configurable, as this operation should be quick enough.
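
Because row keys have the form `session_id#timestamp#uuid`, a prefix-based delete such as the one in `clear` is scoped to a single session only when the prefix includes the `#` delimiter. A minimal sketch of the difference, using an in-memory dict as a stand-in for the table; the data and the `keys_with_prefix` helper are hypothetical, and nothing here calls Bigtable:

```python
# In-memory stand-in for a table: row key -> value (hypothetical data).
rows = {
    b"user#0001#aaaa": b"hi!",
    b"user2#0001#bbbb": b"hello",
}


def keys_with_prefix(prefix: bytes) -> list:
    """Return the row keys that a prefix-based delete would touch."""
    return sorted(key for key in rows if key.startswith(prefix))


# A bare session ID also matches every session whose ID starts with it.
bare = keys_with_prefix(b"user")

# Appending the delimiter limits the match to the intended session.
delimited = keys_with_prefix(b"user#")
```

With the bare prefix both sessions are selected; with the delimiter only the `user` session is.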
