chore: add deepseek reasoner separate answer and deepseek chat model example (#1498)
Co-authored-by: Wendong <[email protected]>
Co-authored-by: Wendong-Fan <[email protected]>
1 parent 167a836, commit b369100

Showing 3 changed files with 273 additions and 0 deletions.
@@ -0,0 +1,110 @@
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========

from camel.agents import ChatAgent
from camel.configs import DeepSeekConfig
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType

"""
Please set the environment variable below before running:
export DEEPSEEK_API_KEY=""
"""
model = ModelFactory.create(
    model_platform=ModelPlatformType.DEEPSEEK,
    model_type=ModelType.DEEPSEEK_CHAT,
    model_config_dict=DeepSeekConfig(temperature=0.2).as_dict(),
)

# Define system message
sys_msg = "You are a helpful assistant."

# Set agent
camel_agent = ChatAgent(system_message=sys_msg, model=model)

user_msg = """How many Rs are there in the word 'strawberry'?"""

# Get response information
response = camel_agent.step(user_msg)
print(response.msgs[0].content)
# ruff: noqa: E501
'''
### Step 1: Understanding the Problem
Before jumping into counting, it's essential to understand what's being asked. The question is: **"How many Rs are there in the word 'strawberry'?"** This means I need to look at the word "strawberry" and count how many times the letter 'R' appears in it.
### Step 2: Writing Down the Word
To avoid missing any letters, I'll write down the word clearly:
**S T R A W B E R R Y**
Breaking it down like this helps me visualize each letter individually.
### Step 3: Identifying Each Letter
Now, I'll go through each letter one by one to identify if it's an 'R':
1. **S** - Not an 'R'.
2. **T** - Not an 'R'.
3. **R** - This is an 'R'. (First 'R' found)
4. **A** - Not an 'R'.
5. **W** - Not an 'R'.
6. **B** - Not an 'R'.
7. **E** - Not an 'R'.
8. **R** - This is an 'R'. (Second 'R' found)
9. **R** - This is another 'R'. (Third 'R' found)
10. **Y** - Not an 'R'.
### Step 4: Counting the Rs
From the above identification:
- The first 'R' is the 3rd letter.
- The second 'R' is the 8th letter.
- The third 'R' is the 9th letter.
So, there are **three** instances of the letter 'R' in "strawberry."
### Step 5: Double-Checking
To ensure accuracy, I'll recount:
1. **S** - Not 'R'.
2. **T** - Not 'R'.
3. **R** - 1st 'R'.
4. **A** - Not 'R'.
5. **W** - Not 'R'.
6. **B** - Not 'R'.
7. **E** - Not 'R'.
8. **R** - 2nd 'R'.
9. **R** - 3rd 'R'.
10. **Y** - Not 'R'.
Yes, the count remains consistent at three 'R's.
### Step 6: Considering Pronunciation
Sometimes, pronunciation can be misleading. In "strawberry," the 'R's are pronounced, but that doesn't affect the count. Whether silent or pronounced, each 'R' is still a distinct letter in the spelling.
### Step 7: Final Answer
After carefully analyzing and recounting, I conclude that there are **three** 'R's in the word "strawberry."
---
**Final Answer:** There are **three** 'R's in the word "strawberry."
'''
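As a brief aside (not part of the committed example), the count the model works out step by step above can be confirmed in one line of Python:

# Quick sanity check of the count reasoned through above; the word is
# already lowercase, so a plain count of "r" suffices.
print("strawberry".count("r"))  # -> 3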
File renamed without changes.
163 changes: 163 additions & 0 deletions
examples/models/deepseek_reasoner_model_separate_answers.py
@@ -0,0 +1,163 @@
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========

import re

from camel.agents import ChatAgent
from camel.configs import DeepSeekConfig
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType

"""
Please set the environment variable below before running:
export DEEPSEEK_API_KEY=""
"""
model = ModelFactory.create(
    model_platform=ModelPlatformType.DEEPSEEK,
    model_type=ModelType.DEEPSEEK_REASONER,
    model_config_dict=DeepSeekConfig(temperature=0.2).as_dict(),
)

# Define system message
sys_msg = "You are a helpful assistant."

# Set agent
camel_agent = ChatAgent(system_message=sys_msg, model=model)

user_msg = """Please explain in detail how the output sequence of transformer becomes longer"""

# Get response information
response = camel_agent.step(user_msg)


def extract_original_response(content):
    # Split the content at the marker
    parts = content.split("BELOW IS THE REASONING CONTENT:")
    if len(parts) > 0:
        return parts[0].strip()
    return ""


# Extract original response
original_response = extract_original_response(response.msgs[0].content)
print("Original Response:")
print(original_response)

# Extract reasoning content using regex
reasoning_pattern = (
    r"BELOW IS THE REASONING CONTENT:(.*?)(?=BELOW IS THE FINAL ANSWER:|$)"
)
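# The lazy capture plus lookahead grabs everything after the reasoning
# marker, stopping at the final-answer marker if it is present, otherwise
# at the end of the message.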
reasoning_match = re.search(
    reasoning_pattern, response.msgs[0].content, re.DOTALL
)
reasoning_response = (
    reasoning_match.group(1).strip() if reasoning_match else ""
)

print("Reasoning Response:")
print(reasoning_response)
# ruff: noqa: E501,RUF001
"""
Original Response:
The output sequence of a transformer model can dynamically grow longer due to its **autoregressive generation mechanism** and architectural design. Here's a detailed breakdown of how this works:
---
### 1. **Autoregressive Generation**
Transformers (especially decoder-only models like GPT or encoder-decoder models like T5) generate sequences **token-by-token** in a left-to-right manner:
- **Step 1**: Start with an initial token (e.g., `<START>`).
- **Step 2**: Feed the current sequence into the model to predict the **next token**.
- **Step 3**: Append the new token to the sequence.
- **Step 4**: Repeat until a stopping condition is met (e.g., an `<END>` token or maximum length).
This iterative process inherently allows the output to grow incrementally.
---
### 2. **Key Architectural Features**
#### a. **Masked Self-Attention**
- The decoder uses **masked self-attention** to ensure each token only attends to previous positions (not future ones).
- **During inference**: At each step, the model processes the entire sequence generated so far, but the mask restricts attention to existing tokens. This allows the model to handle sequences of arbitrary length.
#### b. **Positional Encodings**
- Positional embeddings (absolute or relative) assign a unique representation to each token’s position.
- As new tokens are added, their positions are dynamically encoded, enabling the model to distinguish between tokens at different steps.
#### c. **Cached Key-Value States**
- To optimize efficiency, many implementations cache the **key-value (KV) states** of previous tokens.
- This avoids recomputing all previous states at each step, making autoregressive generation faster while still allowing the sequence to grow.
---
### 3. **Training vs. Inference Dynamics**
- **Training**: The model is trained with **teacher forcing**, where the entire target sequence is provided, and a causal mask ensures each token only sees prior tokens.
- **Inference**: The model generates tokens sequentially, building the output incrementally. The lack of a fixed output length constraint allows flexibility.
---
### 4. **Stopping Conditions**
The sequence grows until one of these conditions is met:
- A predefined **`<END>` token** is generated.
- A **maximum length** threshold is reached (e.g., 512 tokens).
- A **probability threshold** (e.g., low confidence in next token).
---
### 5. **Example Workflow**
Let’s say we’re generating text with GPT-3:
1. Input: `"The cat sat on the"`
2. Step 1: Model predicts `"mat"` → Output: `"The cat sat on the mat"`
3. Step 2: Model predicts `"."` → Output: `"The cat sat on the mat."`
4. Step 3: Model predicts `<END>` → Generation stops.
At each step, the sequence length increases by 1.
---
### 6. **Non-Autoregressive Extensions (Optional)**
Some variants (e.g., **non-autoregressive transformers**) generate all tokens in parallel but require tricks like:
- **Iterative refinement** (e.g., Mask-Predict) to revise outputs over multiple steps.
- **Length prediction** upfront (e.g., for machine translation).
---
### Summary
Transformers produce longer sequences by:
1. Autoregressively generating one token at a time.
2. Using masked attention and positional encodings to handle variable lengths.
3. Caching intermediate states for efficiency.
4. Halting based on dynamic stopping conditions.
This design balances flexibility with computational feasibility, enabling applications like chatbots, translation, and text generation.
Reasoning Response:
Okay, so the user wants to know how the output sequence of a transformer becomes longer. Hmm, let me start by recalling what I know about transformers. Transformers are used in models like GPT and BERT for tasks like translation, text generation, etc. They process sequences in parallel using self-attention mechanisms. But when generating output, especially in autoregressive models, they produce tokens one by one.
Wait, the output sequence length... Normally, in machine translation, the model might have a fixed output length, but sometimes it's variable. How does that happen? Oh right, in autoregressive models, each generated token is fed back into the model to predict the next one. So the sequence grows step by step. But how exactly does the architecture allow this?
Maybe it's because the decoder uses masked self-attention to prevent looking at future tokens. So during training, the model learns to predict the next token given the previous ones. At inference time, it starts with a start token, generates the first token, then appends that to the input and generates the next, and so on until an end token is produced. That way, the output sequence can keep growing until the stop condition is met.
But wait, what about models that generate non-autoregressively? Like some parallel decoding methods. Oh right, some approaches like the Transformer-XL or models using iterative refinement might generate sequences in chunks or revise previous outputs. But the standard transformer for tasks like translation is autoregressive.
Another thing: positional encodings. Since each new token is added to the output, the positional embeddings would increment accordingly. So the model can handle variable lengths because the positional encodings are applied based on the position, not fixed during training. During training, they might use teacher forcing with varying sequence lengths, so the model is trained to handle different positions.
Also, during generation, the model's self-attention in the decoder is masked so each step only attends to previous positions. That allows the model to process sequences of increasing length step by step. The key is that the model isn't constrained by a fixed window size, thanks to the positional embeddings and the autoregressive mechanism.
Wait, but how does the model manage memory or computational constraints when the sequence gets very long? Oh, right, in practice, there might be a maximum length set due to memory limits, but theoretically, it can keep generating until the stop token. So the output sequence length is dynamic based on the generation process.
So putting it all together, the transformer decoder generates tokens one at a time, each time taking the previously generated tokens as input, using positional encodings to handle the growing length, and masking to ensure each step only depends on prior tokens. This iterative process allows the output sequence to become longer until a termination condition is met.
"""