feat: BigtableChatMessageHistory implementation #6
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -4,55 +4,152 @@ | |||||
"cell_type": "markdown", | ||||||
"metadata": {}, | ||||||
"source": [ | ||||||
"# Google DATABASE\n", | ||||||
"# Bigtable\n", | ||||||
"\n", | ||||||
"[Google DATABASE](https://cloud.google.com/DATABASE).\n", | ||||||
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "markdown", | ||||||
"metadata": {}, | ||||||
"source": [ | ||||||
"## Setting up" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "markdown", | ||||||
"metadata": {}, | ||||||
"source": [ | ||||||
"To run this notebook, you will need a Google Cloud Project, a Bigtable instance, and Google credentials." | ||||||
Review comment: Can we provide some links to how to create or set these things?
Reply: Done
||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "code", | ||||||
"execution_count": null, | ||||||
"metadata": {}, | ||||||
"outputs": [], | ||||||
"source": [ | ||||||
"%pip install langchain-google-bigtable" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "markdown", | ||||||
"metadata": {}, | ||||||
"source": [ | ||||||
"## Basic Usage" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "code", | ||||||
"execution_count": null, | ||||||
"metadata": {}, | ||||||
"outputs": [], | ||||||
"source": [ | ||||||
"from langchain_google_bigtable import (\n", | ||||||
" BigtableChatMessageHistory,\n", | ||||||
")\n", | ||||||
"\n", | ||||||
"Save chat messages into `DATABASE`." | ||||||
"message_history = BigtableChatMessageHistory(\n", | ||||||
" instance_id=\"my-instance\",\n", | ||||||
" table_id=\"my-table\",\n", | ||||||
" session_id=\"user-session-id\",\n", | ||||||
")\n", | ||||||
"\n", | ||||||
"message_history.add_user_message(\"hi!\")\n", | ||||||
"message_history.add_ai_message(\"whats up?\")" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "code", | ||||||
"execution_count": null, | ||||||
"metadata": {}, | ||||||
"outputs": [ | ||||||
{ | ||||||
"data": { | ||||||
"text/plain": [ | ||||||
"[HumanMessage(content='hi!'),\n", | ||||||
" HumanMessage(content='hi!'),\n", | ||||||
" AIMessage(content='whats up?')]" | ||||||
] | ||||||
}, | ||||||
"metadata": {}, | ||||||
"output_type": "display_data" | ||||||
} | ||||||
], | ||||||
"source": [ | ||||||
"message_history.messages" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "markdown", | ||||||
"metadata": {}, | ||||||
"source": [ | ||||||
"## Pre-reqs" | ||||||
"## Setting up the schema\n", | ||||||
Review comment: Suggested change
Reply: Done
||||||
"\n", | ||||||
"The chat history will be written to a column called `history` in a column family called `langchain`. If this column family does not exist in your table, you will need to call `init_schema`:" | ||||||
Review comment: Does this imply the table should already be created? Or will init_schema also create the table?
Reply: The table is not created by init_schema, only the column family. I don't mind creating the table as well, but at some point we said we shouldn't. I have no preference :)
Reply: Table created in ctor.
||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "code", | ||||||
"execution_count": null, | ||||||
"metadata": { | ||||||
"tags": [] | ||||||
}, | ||||||
"metadata": {}, | ||||||
"outputs": [], | ||||||
"source": [ | ||||||
"%pip install PACKAGE_NAME" | ||||||
"message_history = BigtableChatMessageHistory(\n", | ||||||
" instance_id=\"my-instance\",\n", | ||||||
" table_id=\"my-table\",\n", | ||||||
" session_id=\"user-session-id\",\n", | ||||||
")\n", | ||||||
"\n", | ||||||
"message_history.init_schema()" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "markdown", | ||||||
"metadata": {}, | ||||||
"source": [ | ||||||
"## Custom client\n", | ||||||
"By default, the constructor creates a client with only the `admin=True` option set. To use different settings, pass a [custom client](https://cloud.google.com/python/docs/reference/bigtable/latest/client#class-googlecloudbigtableclientclientprojectnone-credentialsnone-readonlyfalse-adminfalse-clientinfonone-clientoptionsnone-adminclientoptionsnone-channelnone) to the constructor." | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "code", | ||||||
"execution_count": 3, | ||||||
"metadata": { | ||||||
"tags": [] | ||||||
}, | ||||||
"execution_count": null, | ||||||
"metadata": {}, | ||||||
"outputs": [], | ||||||
"source": [ | ||||||
"from PACKAGE import LOADER" | ||||||
"from google.cloud import bigtable\n", | ||||||
"\n", | ||||||
"custom_client_message_history = BigtableChatMessageHistory(\n", | ||||||
" instance_id=\"my-instance\",\n", | ||||||
" table_id=\"my-table\",\n", | ||||||
" session_id=\"user-session-id\",\n", | ||||||
" client=bigtable.Client(...),\n", | ||||||
")" | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "markdown", | ||||||
"metadata": {}, | ||||||
"source": [ | ||||||
"## Basic Usage" | ||||||
"## Cleaning up\n", | ||||||
Review comment: Would it flow better to add this section to the bottom of "Basic Usage"? It would then be message_history.clear()
Reply: Done
||||||
"\n", | ||||||
"When the history of a specific session is obsolete, it can be deleted as follows.\n", | ||||||
"Note: Once deleted, the data is no longer stored in Bigtable and cannot be recovered." | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"cell_type": "code", | ||||||
"execution_count": null, | ||||||
"metadata": {}, | ||||||
"outputs": [], | ||||||
"source": [] | ||||||
"source": [ | ||||||
"message_history = BigtableChatMessageHistory(\n", | ||||||
" instance_id=\"my-instance\",\n", | ||||||
" table_id=\"my-table\",\n", | ||||||
" session_id=\"obsolete-session-id\",\n", | ||||||
")\n", | ||||||
"\n", | ||||||
"message_history.clear()" | ||||||
] | ||||||
} | ||||||
], | ||||||
"metadata": { | ||||||
|
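The review thread on `init_schema` above establishes that the method creates only the column family, not the table. A minimal sketch of a helper that covers both cases follows. `ensure_table_and_family` is a hypothetical name that is not part of this PR, and the sketch assumes `table` is a `google.cloud.bigtable` `Table` obtained from a client created with `admin=True`.

```python
def ensure_table_and_family(table, family_id, gc_rule):
    """Create the table and/or the column family if either is missing.

    `table` is assumed to be a google.cloud.bigtable Table from an admin client.
    Returns a short string naming what was created (for illustration only).
    """
    if not table.exists():
        # Create the table together with its column family in one call.
        table.create(column_families={family_id: gc_rule})
        return "table"
    if family_id not in table.list_column_families():
        # The table exists but lacks the family: add only the family.
        table.column_family(family_id, gc_rule=gc_rule).create()
        return "family"
    return "nothing"


# Usage (needs google-cloud-bigtable and credentials, so it is left commented out):
# from google.cloud import bigtable
# from google.cloud.bigtable.column_family import MaxVersionsGCRule
# table = bigtable.Client(admin=True).instance("my-instance").table("my-table")
# ensure_table_and_family(table, "langchain", MaxVersionsGCRule(1))
```

The helper takes the table object as a parameter rather than building a client itself, so it can be exercised against a stand-in object without touching a real instance.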
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
"""Bigtable-based chat message history""" | ||
from __future__ import annotations | ||
|
||
import json | ||
import re | ||
import time | ||
import uuid | ||
from typing import List, Optional | ||
|
||
from google.cloud import bigtable | ||
from google.cloud.bigtable.row_filters import RowKeyRegexFilter | ||
from langchain_core.chat_history import BaseChatMessageHistory | ||
from langchain_core.messages import BaseMessage, messages_from_dict | ||
|
||
COLUMN_FAMILY = "langchain" | ||
COLUMN_NAME = "history" | ||
|
||
|
||
class BigtableChatMessageHistory(BaseChatMessageHistory): | ||
    """Chat message history that stores history in Bigtable. | ||
|
||
    Args: | ||
        instance_id: The Bigtable instance to use for chat message history. | ||
        table_id: The Bigtable table to use for chat message history. | ||
        session_id: The session ID. | ||
        client: Optional Bigtable client; defaults to bigtable.Client(admin=True). | ||
    """ | ||
|
||
    def __init__( | ||
        self, | ||
        instance_id: str, | ||
        table_id: str, | ||
        session_id: str, | ||
        client: Optional[bigtable.Client] = None, | ||
    ) -> None: | ||
        self.client = ( | ||
            (client or bigtable.Client(admin=True)) | ||
Review comment: Can we memoize the bigtable Client between multiple integrations?
Reply: Done
||
            .instance(instance_id) | ||
            .table(table_id) | ||
        ) | ||
|
||
        self.session_id = session_id | ||
|
||
    @property | ||
    def messages(self) -> List[BaseMessage]:  # type: ignore | ||
        """Retrieve all session messages from DB""" | ||
        rows = self.client.read_rows( | ||
            filter_=RowKeyRegexFilter( | ||
                str.encode("^" + re.escape(self.session_id) + "#.*") | ||
            ) | ||
        ) | ||
        items = [ | ||
            json.loads(row.cells[COLUMN_FAMILY][COLUMN_NAME.encode()][0].value.decode()) | ||
            for row in rows | ||
        ] | ||
        messages = messages_from_dict( | ||
            [{"type": item["type"], "data": item} for item in items] | ||
        ) | ||
        return messages | ||
|
||
    def init_schema(self) -> None: | ||
        """Create the column family for chat history if it does not exist""" | ||
        families = self.client.list_column_families() | ||
        if COLUMN_FAMILY not in families: | ||
            self.client.column_family( | ||
                COLUMN_FAMILY, gc_rule=bigtable.column_family.MaxVersionsGCRule(1) | ||
            ).create() | ||
|
||
    def add_message(self, message: BaseMessage) -> None: | ||
        """Write a message to the table""" | ||
|
||
        row_key = str.encode( | ||
            self.session_id | ||
            + "#" | ||
            + str(time.time_ns()).rjust(25, "0") | ||
            + "#" | ||
            + uuid.uuid4().hex | ||
        ) | ||
        row = self.client.direct_row(row_key) | ||
        value = str.encode(message.json()) | ||
        row.set_cell(COLUMN_FAMILY, COLUMN_NAME, value) | ||
        row.commit() | ||
|
||
    def clear(self) -> None: | ||
        """Clear session memory from DB""" | ||
        # Include the "#" separator so sessions whose IDs merely start with this ID are kept. | ||
        row_key_prefix = self.session_id + "#" | ||
        self.client.drop_by_prefix(row_key_prefix, timeout=200) | ||
Review comment: Should timeout be configurable?
Reply: Dropped it altogether. It probably shouldn't be configurable, as this operation should be quick enough.
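To make the row-key scheme shared by `add_message`, `messages`, and `clear` concrete, here is a small standard-library sketch. It uses Python's `re` module as a local stand-in for Bigtable's `RowKeyRegexFilter`, which is an assumption since Bigtable evaluates the pattern server-side. `make_row_key` and `session_pattern` are hypothetical helpers that mirror the expressions in the diff; they are not code from this PR.

```python
import re
import time
import uuid


def make_row_key(session_id: str, timestamp_ns: int) -> bytes:
    """Mirror the key layout from add_message: session#padded-time#uuid."""
    return str.encode(
        session_id + "#" + str(timestamp_ns).rjust(25, "0") + "#" + uuid.uuid4().hex
    )


def session_pattern(session_id: str) -> bytes:
    """Mirror the regex built in the messages property."""
    return str.encode("^" + re.escape(session_id) + "#.*")


now = time.time_ns()
keys = [
    make_row_key("user1", now + 10),
    make_row_key("user1", now),
    make_row_key("user10", now + 5),  # a different session that shares a prefix
]

# Zero-padding makes lexicographic order match chronological order within a session.
user1_keys = sorted(k for k in keys if re.match(session_pattern("user1"), k))

# A bare prefix without the "#" separator also matches session "user10".
bare_prefix_hits = [k for k in keys if k.startswith(b"user1")]
separator_hits = [k for k in keys if k.startswith(b"user1#")]
```

The trailing UUID presumably keeps keys unique when two messages in the same session land on the same nanosecond timestamp. The last two lists show why a prefix-based operation needs the `#` separator to stay within one session.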
Review comment: nit: In Cloud Docs we call this "Before you begin" -- should we be consistent here?
Reply: Done