
Comment generation logic improved. #36

Open · wants to merge 7 commits into main

Conversation

@Nisarg1112 commented Jan 31, 2024

Issues this PR focuses on:

  1. In certain instances, the tool truncates large methods during comment generation, impacting the accuracy of the code logic.
  2. If the goal is only to generate comments above functions (not inline comments), asking the model to echo the original method implementation does not serve that goal. Requesting the implementation alongside the comment noticeably increases token utilization, escalating by up to 5x on large repositories, with a corresponding impact on cost.
  3. When using the Azure OpenAI or OpenAI API, frequent rate-limiting errors occur, leading to langchain errors after multiple retries. The current approach of writing all generated comments back to the file only after every function has been processed risks losing comments for which tokens have already been spent and Azure or OpenAI has already charged.
  4. The lack of visibility into the number of tokens generated prevents a proper cost analysis when using this tool on large repositories.

Improvements added to solve the above issues:

  1. By default, the tool now requests comment generation only. The response is parsed with a regex that matches the generated markdown block and extracts the comments for the specified language. This fixes the problem seen when requesting comments together with the method implementation, where the model sometimes regenerated the implementation incorrectly and altered the logic. Focusing solely on comment generation keeps the comments accurate, reduces token utilization, and shortens generation time.
  2. If the --inline or --comment_with_source_code argument is provided, the tool generates comments along with the corresponding code.
  3. When writing comments back, the tool now inserts them above the original functions, preserving the existing code. If --inline or --comment_with_source_code is specified, the tool instead replaces the original code block in the file.
  4. Comments are written back to the file immediately after generation, mitigating the risk of data loss from rate-limiting or other errors.
  5. If the program is halted by a rate-limiting or any other error, it can seamlessly resume comment generation from the function where it stopped, ensuring continuity in the process.
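The markdown-block extraction described in item 1 could be sketched roughly like this. The function name matches the one discussed later in the review, but the body, the comment patterns, and the regex are illustrative assumptions, not the PR's actual implementation:

```python
import re

def extract_comments_from_markdown_code_block(language, markdown):
    """Sketch: pull only the comment lines for `language` out of a
    fenced markdown code block returned by the LLM."""
    # Hypothetical per-language comment patterns; the real mapping
    # in the PR covers more languages.
    comment_patterns = {
        "python": r"#.*",
        "haskell": r"--.*",
    }
    pattern = comment_patterns.get(language)
    if pattern is None:
        return ""
    # Grab the body of the first fenced code block in the response
    block = re.search(r"```[a-zA-Z]*\n(.*?)```", markdown, re.DOTALL)
    if block is None:
        return ""
    # Keep only the lines matching the language's comment pattern
    comments = re.findall(pattern, block.group(1))
    return "\n".join(comments)
```

Because only the comment lines are kept, the model never has to reproduce the method body, which is where both the token savings and the accuracy improvement come from.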

…generated comment only while maintaining backward compatibility. Improved logic to write generated comments back to the file, saving tokens and directly reducing the cost of comment generation.
@fynnfluegge (Owner)

Hey @Nisarg1112, awesome work! I will have a close look asap. Your concerns are justified and the improvements make sense to me. Very good suggestions!

@Nisarg1112 (Author)

Hey, @fynnfluegge Did you get a chance to look at these changes?

@fynnfluegge (Owner) commented Feb 4, 2024

Hey @Nisarg1112, I had a closer look now and also did some manual tests. I think there is a bug in extract_comments_from_markdown_code_block: after generating docs for some methods, it always returns empty strings for me, even though the LLM returns only the docs without the code of the method itself. That part is a great improvement, it saves lots of tokens!
We need some unit tests for utils.extract_comments_from_markdown_code_block, similar to the tests in doc-comments-ai/tests/response_parser_test.py.

Here I also started writing some unit tests for writing the comments back to the file: https://github.com/fynnfluegge/doc-comments-ai/tree/chore/write_code_to_file_tests.
In the end we could maybe write some nice integration tests as well. Either way, we need unit tests on both ends.

Let me know if you need some help 🙌
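As a sketch of the kind of write-back unit test meant here, assuming a hypothetical helper `write_comment_above` that inserts a generated comment on the line above a method definition (the actual write-back code and tests in the repo will differ):

```python
def write_comment_above(source, method_name, comment):
    """Hypothetical helper: insert `comment` on the line above the
    definition of `method_name`, leaving the original code intact."""
    out = []
    for line in source.splitlines():
        if line.lstrip().startswith(f"def {method_name}("):
            # Reuse the method's indentation for the comment line
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + comment)
        out.append(line)
    return "\n".join(out)

def test_write_comment_above():
    src = "def foo():\n    return 1"
    result = write_comment_above(src, "foo", "# Returns one")
    # The comment lands directly above the untouched definition
    assert result.splitlines()[0] == "# Returns one"
    assert "def foo():" in result
    assert "return 1" in result
```

Tests in this shape would cover the non-inline path (comment prepended, code preserved); the --inline / --comment_with_source_code path needs a separate case asserting that the original block is replaced.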

```python
# This function retrieves the comment pattern for a specified programming language
def get_comments_pattern_for_language(language):
    comment_patterns = {
        "python": r"#.*",
```
@fynnfluegge (Owner)
Here I would prefer to use the values of the constants.Language enum as keys.

@Nisarg1112 (Author)

Okay cool!
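A sketch of what the enum-keyed mapping might look like. The `Language` enum here is a hypothetical stand-in for the repo's constants.Language, whose actual members and values may differ:

```python
from enum import Enum

# Hypothetical stand-in for constants.Language
class Language(Enum):
    PYTHON = "python"
    HASKELL = "haskell"

# Keyed by the enum values, as suggested in the review
comment_patterns = {
    Language.PYTHON.value: r"#.*",
    Language.HASKELL.value: r"--.*",
}

def get_comments_pattern_for_language(language):
    # Look up by the enum's string value, so callers can pass
    # either Language.PYTHON.value or a plain "python"
    return comment_patterns.get(language)
```

Using the enum values instead of bare string literals keeps the pattern table in sync with the languages the tool officially supports.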

```python
print(f"✅ Doc comment for {method_name} generated.")

print(f"📊 Total Input Tokens: {total_original_tokens}")
print(f"🚀 Total Generated Tokens: {total_generated_tokens}")
```
@fynnfluegge (Owner)

I like these outputs! Maybe we can add a --verbose argument and only output token details in verbose mode.

@Nisarg1112 (Author)

Sure, nice idea! I'll make this change.
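The suggested --verbose flag could be wired up roughly like this, with placeholder token counts for illustration; the actual CLI in doc-comments-ai may structure its arguments differently:

```python
import argparse

parser = argparse.ArgumentParser(prog="doc-comments-ai")
# Hypothetical flag: only print token statistics when requested
parser.add_argument(
    "--verbose",
    action="store_true",
    help="print token usage details",
)
args = parser.parse_args(["--verbose"])  # simulate `--verbose` on the CLI

# Placeholder values standing in for the PR's running totals
total_original_tokens = 1234
total_generated_tokens = 567

if args.verbose:
    print(f"📊 Total Input Tokens: {total_original_tokens}")
    print(f"🚀 Total Generated Tokens: {total_generated_tokens}")
```

With `action="store_true"` the flag defaults to False, so the token summary stays hidden unless the user opts in.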

@Nisarg1112 (Author)

> Hey @Nisarg1112, I had a closer look now and also did some manual tests. I think there is a bug in extract_comments_from_markdown_code_block: after generating docs for some methods, it always returns empty strings for me, even though the LLM returns only the docs without the code of the method itself. That part is a great improvement, it saves lots of tokens! We need some unit tests for utils.extract_comments_from_markdown_code_block, similar to the tests in doc-comments-ai/tests/response_parser_test.py.
>
> Here I also started writing some unit tests for writing the comments back to the file: https://github.com/fynnfluegge/doc-comments-ai/tree/chore/write_code_to_file_tests. In the end we could maybe write some nice integration tests as well. Either way, we need unit tests on both ends.
>
> Let me know if you need some help 🙌

Yeah, right! I tested it for Haskell only; we should definitely add unit tests. I'll work on it.
