
Comment generation logic improved. #36

Open · wants to merge 7 commits into main

Conversation

@Nisarg1112 commented Jan 31, 2024

Issues this PR focuses on:

  1. In certain instances, the tool truncates large methods during comment generation, impacting the accuracy of the code logic.
  2. If the goal is only to generate comments above functions (not inline comments), asking the model to echo the original method implementation does not serve that goal. Requesting the implementation alongside the comment noticeably increases token utilization, escalating by up to 5x on large repositories, with a corresponding impact on cost.
  3. When using the Azure OpenAI or OpenAI API, frequent rate-limiting errors occur, leading to langchain errors after multiple retries. The current approach of writing all generated comments back to the file only after every function has been processed risks losing comments for which tokens have already been spent and Azure or OpenAI has already charged.
  4. The lack of visibility into the number of tokens generated prevents a proper cost analysis when using this tool on large repositories.

Improvements added to solve the above issues:

  1. By default, the tool now requests comment generation only. The response is parsed with a regex that matches the generated markdown block and extracts the comments for the specified language. This fixes the problem seen when requesting comments together with the method implementation, where the model sometimes regenerated the implementation incorrectly and altered the logic. Focusing solely on comment generation keeps the comments accurate, reduces token utilization, and shortens generation time.
  2. If the --inline or --comment_with_source_code argument is provided, the tool generates comments along with the corresponding code.
  3. When writing comments back, the tool now inserts them above the original functions, preserving the existing code. If --inline or --comment_with_source_code is specified, the tool instead replaces the original code block in the file.
  4. Comments are written back to the file immediately after generation, mitigating the risk of data loss from rate-limiting or other errors.
  5. If the program is halted by a rate-limiting or any other error, it can seamlessly resume comment generation from the function where it stopped, ensuring continuity in the process.
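The markdown-block extraction described in item 1 could be sketched roughly like this. The function name matches the one discussed later in the review, but the body, the comment patterns, and the regex are illustrative assumptions, not the PR's actual implementation:

```python
import re

def extract_comments_from_markdown_code_block(language, markdown):
    """Sketch: pull only the comment lines for `language` out of a
    fenced markdown code block returned by the LLM."""
    # Hypothetical per-language comment patterns; the real mapping
    # in the PR covers more languages.
    comment_patterns = {
        "python": r"#.*",
        "haskell": r"--.*",
    }
    pattern = comment_patterns.get(language)
    if pattern is None:
        return ""
    # Grab the body of the first fenced code block in the response
    block = re.search(r"```[a-zA-Z]*\n(.*?)```", markdown, re.DOTALL)
    if block is None:
        return ""
    # Keep only the lines matching the language's comment pattern
    comments = re.findall(pattern, block.group(1))
    return "\n".join(comments)
```

Because only the comment lines are kept, the model never has to reproduce the method body, which is where both the token savings and the accuracy improvement come from.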

…generated comment only while maintaining backward compatibility. Improved logic to write generated comments back to the file, saving tokens and directly reducing the cost of comment generation.
@fynnfluegge (Owner)

Hey @Nisarg1112, awesome work! I will have a close look asap. Your concerns are justified and the improvements make sense to me. Very good suggestions!

@Nisarg1112 (Author)

Hey, @fynnfluegge Did you get a chance to look at these changes?

@fynnfluegge (Owner) commented Feb 4, 2024

Hey @Nisarg1112, I had a closer look now and also did some manual tests. I think there is a bug in extract_comments_from_markdown_code_block: after generating docs for some methods, it always returns empty strings for me, even though the LLM returns only the docs without the code of the method itself. That part is a great improvement, it saves lots of tokens!
We need some unit tests for utils.extract_comments_from_markdown_code_block, similar to the tests in doc-comments-ai/tests/response_parser_test.py.

Here I also started writing some unit tests for writing the comments back to the file: https://github.com/fynnfluegge/doc-comments-ai/tree/chore/write_code_to_file_tests.
In the end we could maybe write some nice integration tests as well. Either way, we need unit tests on both ends.

Let me know if you need some help 🙌
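As a sketch of the kind of write-back unit test meant here, assuming a hypothetical helper `write_comment_above` that inserts a generated comment on the line above a method definition (the actual write-back code and tests in the repo will differ):

```python
def write_comment_above(source, method_name, comment):
    """Hypothetical helper: insert `comment` on the line above the
    definition of `method_name`, leaving the original code intact."""
    out = []
    for line in source.splitlines():
        if line.lstrip().startswith(f"def {method_name}("):
            # Reuse the method's indentation for the comment line
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + comment)
        out.append(line)
    return "\n".join(out)

def test_write_comment_above():
    src = "def foo():\n    return 1"
    result = write_comment_above(src, "foo", "# Returns one")
    # The comment lands directly above the untouched definition
    assert result.splitlines()[0] == "# Returns one"
    assert "def foo():" in result
    assert "return 1" in result
```

Tests in this shape would cover the non-inline path (comment prepended, code preserved); the --inline / --comment_with_source_code path needs a separate case asserting that the original block is replaced.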

```python
# This function retrieves the comment pattern for a specified programming language
def get_comments_pattern_for_language(language):
    comment_patterns = {
        "python": r"#.*",
```
@fynnfluegge (Owner)
Here I would prefer to use the values of the constants.Language enum as keys.

@Nisarg1112 (Author)

Okay cool!
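A sketch of what the enum-keyed mapping might look like. The `Language` enum here is a hypothetical stand-in for the repo's constants.Language, whose actual members and values may differ:

```python
from enum import Enum

# Hypothetical stand-in for constants.Language
class Language(Enum):
    PYTHON = "python"
    HASKELL = "haskell"

# Keyed by the enum values, as suggested in the review
comment_patterns = {
    Language.PYTHON.value: r"#.*",
    Language.HASKELL.value: r"--.*",
}

def get_comments_pattern_for_language(language):
    # Look up by the enum's string value, so callers can pass
    # either Language.PYTHON.value or a plain "python"
    return comment_patterns.get(language)
```

Using the enum values instead of bare string literals keeps the pattern table in sync with the languages the tool officially supports.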

```python
print(f"✅ Doc comment for {method_name} generated.")

print(f"📊 Total Input Tokens: {total_original_tokens}")
print(f"🚀 Total Generated Tokens: {total_generated_tokens}")
```
@fynnfluegge (Owner)

I like these outputs! Maybe we can add a --verbose argument and only output token details in verbose mode.

@Nisarg1112 (Author)

Sure, nice idea! I'll make this change.
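The suggested --verbose flag could be wired up roughly like this, with placeholder token counts for illustration; the actual CLI in doc-comments-ai may structure its arguments differently:

```python
import argparse

parser = argparse.ArgumentParser(prog="doc-comments-ai")
# Hypothetical flag: only print token statistics when requested
parser.add_argument(
    "--verbose",
    action="store_true",
    help="print token usage details",
)
args = parser.parse_args(["--verbose"])  # simulate `--verbose` on the CLI

# Placeholder values standing in for the PR's running totals
total_original_tokens = 1234
total_generated_tokens = 567

if args.verbose:
    print(f"📊 Total Input Tokens: {total_original_tokens}")
    print(f"🚀 Total Generated Tokens: {total_generated_tokens}")
```

With `action="store_true"` the flag defaults to False, so the token summary stays hidden unless the user opts in.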

@Nisarg1112 (Author)

> Hey @Nisarg1112, I had a closer look now and also did some manual tests. I think there is a bug in extract_comments_from_markdown_code_block: after generating docs for some methods, it always returns empty strings for me, even though the LLM returns only the docs without the code of the method itself. That part is a great improvement, it saves lots of tokens! We need some unit tests for utils.extract_comments_from_markdown_code_block, similar to the tests in doc-comments-ai/tests/response_parser_test.py.
>
> Here I also started writing some unit tests for writing the comments back to the file: https://github.com/fynnfluegge/doc-comments-ai/tree/chore/write_code_to_file_tests. In the end we could maybe write some nice integration tests as well. Either way, we need unit tests on both ends.
>
> Let me know if you need some help 🙌

Yeah, right! I tested it for Haskell only; we should definitely add unit tests. I'll work on it.
