Merge pull request #1222 from Codium-ai/tr/docs_and_fixes

enhance: cap patch extra lines and update documentation with separato…
Codium-ai · Sep 12, 2024 · 5047d07
2 parents dd8d78e + 7de6bb0
commit 5047d07

Showing 4 changed files with 31 additions and 5 deletions.
diff --git a/docs/docs/faq/index.md b/docs/docs/faq/index.md
@@ -20,7 +20,7 @@
 
     Read more about this issue in our [blog](https://www.codium.ai/blog/understanding-the-challenges-and-pain-points-of-the-pull-request-cycle/)
 
-   
+___
 
 ??? note "Question: I received an incorrect or irrelevant suggestion. Why?"
 
@@ -38,22 +38,30 @@
     - In addition, we recommend to use the [`extra_instructions`](https://pr-agent-docs.codium.ai/tools/improve/#extra-instructions-and-best-practices) field to guide the model to suggestions that are more relevant to the specific needs of the project. 
     - The interactive [PR chat](https://pr-agent-docs.codium.ai/chrome-extension/) also provides an easy way to get more tailored suggestions and feedback from the AI model.
 
+___
+
 ??? note "Question: How can I get more tailored suggestions?"
     #### Answer:<span style="display:none;">3</span>
 
     See [here](https://pr-agent-docs.codium.ai/tools/improve/#extra-instructions-and-best-practices) for more information on how to use the `extra_instructions` and `best_practices` configuration options, to guide the model to more tailored suggestions.
 
+___
+
 ??? note "Question: Will you store my code ? Are you using my code to train models?"
     #### Answer:<span style="display:none;">4</span>
 
     No. PR-Agent strict privacy policy ensures that your code is not stored or used for training purposes.
 
     For a detailed overview of our data privacy policy, please refer to [this link](https://pr-agent-docs.codium.ai/overview/data_privacy/)
 
+___
+
 ??? note "Question: Can I use my own LLM keys with PR-Agent?"
     #### Answer:<span style="display:none;">5</span>
 
     When you self-host, you use your own keys. 
 
     PR-Agent Pro with SaaS deployment is a hosted version of PR-Agent, where Codium AI manages the infrastructure and the keys.
     For enterprise customers, on-prem deployment is also available. [Contact us](https://www.codium.ai/contact/#pricing) for more information.
+
+___
diff --git a/docs/docs/tools/review.md b/docs/docs/tools/review.md
@@ -8,6 +8,9 @@ The tool can be triggered automatically every time a new PR is [opened](../usage
 
 Note that the main purpose of the `review` tool is to provide the **PR reviewer** with useful feedbacks and insights. The PR author, in contrast, may prefer to save time and focus on the output of the [improve](./improve.md) tool, which provides actionable code suggestions.
 
+(Read more about the different personas in the PR process and how PR-Agent aims to assist them in our [blog](https://www.codium.ai/blog/understanding-the-challenges-and-pain-points-of-the-pull-request-cycle/))
+
+
 ## Example usage
 
 ### Manual triggering

diff --git a/docs/docs/usage-guide/additional_configurations.md b/docs/docs/usage-guide/additional_configurations.md
@@ -92,8 +92,8 @@ patch_extra_lines_before=4
 patch_extra_lines_after=2
 ```
 
-Increasing this number provides more context to the model, but will also increase the token budget.
-If the PR is too large (see [PR Compression strategy](https://github.com/Codium-ai/pr-agent/blob/main/PR_COMPRESSION.md)), PR-Agent automatically sets this number to 0, using the original git patch.
+Increasing this number provides more context to the model, but it also increases the token budget and may overwhelm the model with information unrelated to the actual PR code changes.
+If the PR is too large (see [PR Compression strategy](https://github.com/Codium-ai/pr-agent/blob/main/PR_COMPRESSION.md)), PR-Agent may automatically set this number to 0 and use the original git patch.
 
 
 ## Editing the prompts
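
To illustrate what the `patch_extra_lines_before` and `patch_extra_lines_after` settings in the hunk above control, here is a minimal sketch of widening a hunk's line range with extra context. The helper `extend_hunk_range` and its parameters are hypothetical and are not taken from PR-Agent's code; it only shows why larger values mean more lines (and therefore more tokens) per hunk.

```python
def extend_hunk_range(start: int, length: int, total_lines: int,
                      extra_before: int, extra_after: int) -> tuple[int, int]:
    """Widen a hunk (1-based start, given length) by extra context lines.

    Hypothetical helper for illustration only, not PR-Agent's implementation.
    The widened range is clamped to the bounds of the file.
    """
    new_start = max(1, start - extra_before)       # do not go above line 1
    end = start + length - 1                       # last line of the original hunk
    new_end = min(total_lines, end + extra_after)  # do not run past end of file
    return new_start, new_end - new_start + 1


# A 3-line hunk starting at line 10, with 4 lines before and 2 lines after
print(extend_hunk_range(10, 3, 100, extra_before=4, extra_after=2))  # (6, 9)
```

In this sketch the 3-line hunk grows to 9 lines, so every hunk in the PR contributes proportionally more text to the prompt.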

diff --git a/pr_agent/algo/pr_processing.py b/pr_agent/algo/pr_processing.py
@@ -23,8 +23,15 @@
 
 OUTPUT_BUFFER_TOKENS_SOFT_THRESHOLD = 1500
 OUTPUT_BUFFER_TOKENS_HARD_THRESHOLD = 1000
+MAX_EXTRA_LINES = 10
 
 
+def cap_and_log_extra_lines(value, direction) -> int:
+    if value > MAX_EXTRA_LINES:
+        get_logger().warning(f"patch_extra_lines_{direction} was {value}, capping to {MAX_EXTRA_LINES}")
+        return MAX_EXTRA_LINES
+    return value
+
 
 def get_pr_diff(git_provider: GitProvider, token_handler: TokenHandler,
                 model: str,
@@ -38,6 +45,8 @@ def get_pr_diff(git_provider: GitProvider, token_handler: TokenHandler,
     else:
         PATCH_EXTRA_LINES_BEFORE = get_settings().config.patch_extra_lines_before
         PATCH_EXTRA_LINES_AFTER = get_settings().config.patch_extra_lines_after
+        PATCH_EXTRA_LINES_BEFORE = cap_and_log_extra_lines(PATCH_EXTRA_LINES_BEFORE, "before")
+        PATCH_EXTRA_LINES_AFTER = cap_and_log_extra_lines(PATCH_EXTRA_LINES_AFTER, "after")
 
     try:
         diff_files_original = git_provider.get_diff_files()
@@ -408,11 +417,17 @@ def get_pr_multi_diffs(git_provider: GitProvider,
     for lang in pr_languages:
         sorted_files.extend(sorted(lang['files'], key=lambda x: x.tokens, reverse=True))
 
+    # Get the maximum number of extra lines before and after the patch
+    PATCH_EXTRA_LINES_BEFORE = get_settings().config.patch_extra_lines_before
+    PATCH_EXTRA_LINES_AFTER = get_settings().config.patch_extra_lines_after
+    PATCH_EXTRA_LINES_BEFORE = cap_and_log_extra_lines(PATCH_EXTRA_LINES_BEFORE, "before")
+    PATCH_EXTRA_LINES_AFTER = cap_and_log_extra_lines(PATCH_EXTRA_LINES_AFTER, "after")
+
     # try first a single run with standard diff string, with patch extension, and no deletions
     patches_extended, total_tokens, patches_extended_tokens = pr_generate_extended_diff(
         pr_languages, token_handler, add_line_numbers_to_hunks=True,
-        patch_extra_lines_before=get_settings().config.patch_extra_lines_before,
-        patch_extra_lines_after=get_settings().config.patch_extra_lines_after)
+        patch_extra_lines_before=PATCH_EXTRA_LINES_BEFORE,
+        patch_extra_lines_after=PATCH_EXTRA_LINES_AFTER)
 
     # if we are under the limit, return the full diff
     if total_tokens + OUTPUT_BUFFER_TOKENS_SOFT_THRESHOLD < get_max_tokens(model):
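
The capping behaviour added in this diff can be tried in isolation with the sketch below. It mirrors `cap_and_log_extra_lines` from the hunk above, but as an assumption it swaps PR-Agent's `get_logger()` for the standard-library `logging` module so that it runs on its own; the logger name `pr_agent_sketch` and the function name `cap_extra_lines` are made up for this example.

```python
import logging

MAX_EXTRA_LINES = 10  # same cap value as in the diff above

# Assumption: stdlib logger used in place of PR-Agent's get_logger()
logger = logging.getLogger("pr_agent_sketch")


def cap_extra_lines(value: int, direction: str) -> int:
    """Return value, limited to MAX_EXTRA_LINES, warning when it is capped."""
    if value > MAX_EXTRA_LINES:
        logger.warning(
            "patch_extra_lines_%s was %s, capping to %s",
            direction, value, MAX_EXTRA_LINES,
        )
        return MAX_EXTRA_LINES
    return value


# Values at or below the cap pass through unchanged; larger ones are capped.
print(cap_extra_lines(4, "before"))  # 4
print(cap_extra_lines(25, "after"))  # 10
```

Note that, as in the diff, only the upper bound is enforced; values at or below the cap, including zero or negative ones, are returned unchanged.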