langchain-ai · vbarda · Nov 15, 2024 · Sep 10, 2024 · Sep 10, 2024 · Sep 11, 2024
diff --git a/docs/docs/integrations/chat/reka.ipynb b/docs/docs/integrations/chat/reka.ipynb
@@ -73,7 +73,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Initialize a client"
+    "## Instantiation"
    ]
   },
   {
@@ -110,7 +110,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -123,21 +123,21 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Single turn text message"
+    "## Invocation"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "AIMessage(content=' Hello! How can I help you today? If you have a question, need assistance, or just want to chat, feel free to let me know. Have a great day!\\n\\n', additional_kwargs={}, response_metadata={}, id='run-b40e505a-5110-451a-92e6-a2a34988472c-0')"
+       "AIMessage(content=' Hello! How can I help you today? If you have a question, need assistance, or just want to chat, feel free to let me know. Have a great day!\\n\\n', additional_kwargs={}, response_metadata={}, id='run-61522ec2-0587-4fd5-a492-5b205fd8860c-0')"
       ]
      },
-     "execution_count": 3,
+     "execution_count": 2,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -155,14 +155,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      " The image shows an indoor setting with no visible weather elements. It features a cat on a desk licking a computer keyboard. The background includes a computer monitor, a desk with a few items like a pen holder and a mobile phone, and a glimpse of a window with blinds partially drawn.\n"
+      " The image shows an indoor setting with no visible windows or natural light, and there are no indicators of weather conditions. The focus is on a cat sitting on a computer keyboard, and the background includes a computer monitor and various office supplies.\n"
      ]
     }
    ],
@@ -193,18 +193,18 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      " The first image shows two German Shepherds, one adult and one puppy, in a grassy field. The adult dog is carrying a large stick in its mouth, indicating playfulness or a game being played. The background features a natural, leafy environment, suggesting an outdoor setting conducive to activities like running or training.\n",
+      " The first image features two German Shepherds, one adult and one puppy, in a vibrant, lush green setting. The adult dog is carrying a large stick in its mouth, running through what appears to be a grassy field, with the puppy following close behind. Both dogs exhibit striking physical characteristics typical of the breed, such as pointed ears and dense fur.\n",
       "\n",
-      "The second image features a close-up of a single cat with striking blue eyes, set against a background of dry leaves or grass. The cat has a calm and somewhat intense expression, with its fur neatly groomed and whiskers prominently visible. The focus is on the cat's face, capturing its serene demeanor in a quiet, natural outdoor setting.\n",
+      "The second image shows a close-up of a single cat with striking blue eyes, likely a breed like the Siberian or Maine Coon, in a natural outdoor setting. The cat's fur is lighter, possibly a mix of white and gray, and it has a more subdued expression compared to the dogs. The background is blurred, suggesting a focus on the cat's face.\n",
       "\n",
-      "The main differences lie in the subjects (dogs vs. cat) and their expressions (playful vs. serene), as well as the composition and focus of the images (outdoor play vs. close-up portrait).\n"
+      "Overall, the differences lie in the subjects (two dogs vs. one cat), the setting (lush, vibrant grassy field vs. a more muted outdoor background), and the overall mood and activity depicted (playful and active vs. serene and focused).\n"
      ]
     }
    ],
@@ -230,6 +230,52 @@
     "print(response.content)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Chaining"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "AIMessage(content=' Ich liebe Programmieren.\\n\\n', additional_kwargs={}, response_metadata={}, id='run-ffc4ace1-b73a-4fb3-ad0f-57e60a0f9b8d-0')"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from langchain_core.prompts import ChatPromptTemplate\n",
+    "\n",
+    "prompt = ChatPromptTemplate(\n",
+    "    [\n",
+    "        (\n",
+    "            \"system\",\n",
+    "            \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
+    "        ),\n",
+    "        (\"human\", \"{input}\"),\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "chain = prompt | model\n",
+    "chain.invoke(\n",
+    "    {\n",
+    "        \"input_language\": \"English\",\n",
+    "        \"output_language\": \"German\",\n",
+    "        \"input\": \"I love programming.\",\n",
+    "    }\n",
+    ")"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -507,6 +553,20 @@
     "    print(chunk)\n",
     "    print(\"----\")"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## API reference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For the underlying Reka API, see the Reka quick start: https://docs.reka.ai/quick-start"
+   ]
   }
  ],
  "metadata": {

diff --git a/libs/community/extended_testing_deps.txt b/libs/community/extended_testing_deps.txt
@@ -87,6 +87,7 @@ telethon>=1.28.5,<2
 tidb-vector>=0.0.3,<1.0.0
 timescale-vector==0.0.1
 tqdm>=4.48.0
+tiktoken>=0.8.0
 tree-sitter>=0.20.2,<0.21
 tree-sitter-languages>=1.8.0,<2
 upstash-redis>=1.1.0,<2

diff --git a/libs/community/langchain_community/chat_models/reka.py b/libs/community/langchain_community/chat_models/reka.py
@@ -13,7 +13,6 @@
     Union,
 )
 
-import tiktoken
 from langchain_core.callbacks import (
     AsyncCallbackManagerForLLMRun,
     CallbackManagerForLLMRun,
@@ -156,7 +155,9 @@ class ChatReka(BaseChatModel):
     reka_api_key: Optional[str] = None
     model_kwargs: Dict[str, Any] = Field(default_factory=dict)
     model_config = ConfigDict(extra="forbid")
-    _tiktoken_encoder = None
+    token_counter: Optional[
+        Callable[[Union[str, BaseMessage, List[BaseMessage]]], int]
+    ] = None
 
     @model_validator(mode="before")
     @classmethod
@@ -329,11 +330,29 @@ async def _agenerate(
 
         return ChatResult(generations=[ChatGeneration(message=message)])
 
-    def get_num_tokens(self, text: str) -> int:
+    def get_num_tokens(self, input: Union[str, BaseMessage, List[BaseMessage]]) -> int:
         """Calculate number of tokens."""
-        if self._tiktoken_encoder is None:
-            self._tiktoken_encoder = tiktoken.get_encoding("cl100k_base")
-        return len(self._tiktoken_encoder.encode(text))
+        # No custom token_counter: fall back to tiktoken's cl100k_base encoding
+
+        if self.token_counter is None:
+            try:
+                import tiktoken
+            except ImportError as e:
+                raise ImportError(
+                    "Could not import tiktoken Python package. "
+                    "Please install it with `pip install tiktoken`."
+                ) from e
+            encoding = tiktoken.get_encoding("cl100k_base")
+
+            if isinstance(input, str):
+                return len(encoding.encode(input))
+            elif isinstance(input, BaseMessage):
+                return len(encoding.encode(input.content))
+            elif isinstance(input, list):
+                return sum(len(encoding.encode(msg.content)) for msg in input)
+            raise ValueError(f"Got unexpected type for input: {type(input)}")
+
+        return self.token_counter(input)
 
     def bind_tools(
         self,

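The input dispatch added to `get_num_tokens` above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the `ChatReka` implementation: `Message` stands in for `BaseMessage`, and `whitespace_encode` stands in for tiktoken's `cl100k_base` encoder so the sketch runs without `tiktoken` or `langchain_core` installed. It makes one point explicit: when `token_counter` is set, the patched method passes the input through unchanged, so a custom counter should be prepared to receive a `str`, a single message, or a list of messages. Like the patched method, the sketch assumes each message's content is a plain string.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Message:
    """Hypothetical stand-in for langchain_core's BaseMessage (string content only)."""

    content: str


TokenInput = Union[str, Message, List[Message]]


def whitespace_encode(text: str) -> List[str]:
    """Hypothetical stand-in for tiktoken's cl100k_base encoder: splits on whitespace."""
    return text.split()


def count_tokens(
    value: TokenInput,
    token_counter: Optional[Callable[[TokenInput], int]] = None,
    encode: Callable[[str], List[str]] = whitespace_encode,
) -> int:
    """Mirror the input dispatch of the patched ChatReka.get_num_tokens."""
    if token_counter is not None:
        # A custom counter wins and receives the raw input, whatever its shape.
        return token_counter(value)
    # Fallback path: encode a string directly, or each message's content.
    if isinstance(value, str):
        return len(encode(value))
    if isinstance(value, Message):
        return len(encode(value.content))
    if isinstance(value, list):
        return sum(len(encode(msg.content)) for msg in value)
    raise ValueError(f"Got unexpected type for input: {type(value)}")


def count_messages(value: TokenInput) -> int:
    """Hypothetical custom counter: one unit per message, one for a bare input."""
    return len(value) if isinstance(value, list) else 1


history = [Message("You are a helpful assistant."), Message("Hi!")]
print(count_tokens("Hello, world!"))  # 2 whitespace-separated tokens
print(count_tokens(history))  # 5 + 1 = 6
print(count_tokens(history, token_counter=count_messages))  # 2
```

Keeping the fallback behind `token_counter is None` means the tiktoken import is only attempted when no custom counter is supplied, which is what allows the PR to drop the module-level `import tiktoken`.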
diff --git a/libs/community/tests/unit_tests/chat_models/test_imports.py b/libs/community/tests/unit_tests/chat_models/test_imports.py
@@ -45,7 +45,7 @@
     "ChatVertexAI",
     "ChatYandexGPT",
     "ChatYuan2",
-    "ChatReKa",
+    "ChatReka",
     "ChatZhipuAI",
     "ErnieBotChat",
     "FakeListChatModel",

diff --git a/libs/community/tests/unit_tests/chat_models/test_reka.py b/libs/community/tests/unit_tests/chat_models/test_reka.py
@@ -4,7 +4,6 @@
 from unittest.mock import MagicMock, patch
 
 import pytest
-import tiktoken
 from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
 from pydantic import ValidationError
 
@@ -305,20 +304,44 @@ def test_multiple_system_messages_error() -> None:
         convert_to_reka_messages(messages)
 
 
+@pytest.mark.requires("tiktoken")
 @pytest.mark.requires("reka")
 def test_get_num_tokens() -> None:
-    """Test that token counting works correctly."""
+    """Test that token counting works correctly for different input types."""
     llm = ChatReka()
+    import tiktoken
 
-    # Test basic text
+    encoding = tiktoken.get_encoding("cl100k_base")
+
+    # Test string input
     text = "Hello, world!"
-    expected_tokens = len(tiktoken.get_encoding("cl100k_base").encode(text))
+    expected_tokens = len(encoding.encode(text))
     assert llm.get_num_tokens(text) == expected_tokens
 
-    # Test empty string
+    # Test BaseMessage input
+    message = HumanMessage(content="What is the weather like today?")
+    expected_tokens = len(encoding.encode(message.content))
+    assert llm.get_num_tokens(message) == expected_tokens
+
+    # Test List[BaseMessage] input
+    messages = [
+        SystemMessage(content="You are a helpful assistant."),
+        HumanMessage(content="Hi!"),
+        AIMessage(content="Hello! How can I help you today?"),
+    ]
+    expected_tokens = sum(len(encoding.encode(msg.content)) for msg in messages)
+    assert llm.get_num_tokens(messages) == expected_tokens
+
+    # Test empty inputs
     assert llm.get_num_tokens("") == 0
+    assert llm.get_num_tokens(HumanMessage(content="")) == 0
+    assert llm.get_num_tokens([]) == 0
 
-    # Test longer text with special characters
+    # Test complex text with special characters
     complex_text = "Hello 🌍! This is a test of the token counting"
-    expected_tokens = len(tiktoken.get_encoding("cl100k_base").encode(complex_text))
+    expected_tokens = len(encoding.encode(complex_text))
     assert llm.get_num_tokens(complex_text) == expected_tokens
+
+    # Test invalid input type
+    with pytest.raises(ValueError, match="Got unexpected type for input:"):
+        llm.get_num_tokens(123)  # type: ignore