Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🤖 Add Encoding Detection to File Reading in Code Search #1495

Conversation

sentry-autofix[bot]
Copy link
Contributor

👋 Hi there! This PR was automatically generated by Autofix 🤖

This fix was triggered by Rohan Agarwal

Fixes SEER-9E

This update introduces enhancements to the file reading functionality in the code search module, allowing it to detect and handle various file encodings gracefully. The changes include:

  1. Dependency Addition: The chardet library is now added as a dependency in the pyproject.toml file to facilitate encoding detection.

  2. Default Encoding Parameter: A new parameter default_encoding has been added to the CodeSearch class constructor, allowing users to specify a fallback encoding when file reading fails.

  3. Encoding-Aware File Reading: The method _read_file_with_encoding has been introduced to read files intelligently by first attempting to auto-detect the encoding and then falling back on common encodings if detection fails. This method improves robustness when dealing with diverse file types.

  4. Error Handling Improvements: Enhanced error handling within the search_file method to gracefully log issues and avoid crashes during encoding errors, providing a smoother experience when handling files.

If you have any questions or feedback for the Sentry team about this fix, please email [email protected] with the Run ID: 1579.

@roaga
Copy link
Member

roaga commented Nov 22, 2024

Going to pull locally and double-check/tweak. Looks good!

@jennmueng jennmueng self-assigned this Nov 22, 2024
@jennmueng jennmueng marked this pull request as ready for review November 25, 2024 18:54
@jennmueng jennmueng requested a review from a team as a code owner November 25, 2024 18:54
@jennmueng jennmueng enabled auto-merge (squash) November 25, 2024 18:54
@@ -115,3 +115,4 @@ prophet==1.1.*
rapidfuzz==3.10.*
pytest-vcr==1.*
vcrpy==6.*
chardet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be pinned?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm let's pin to major+minor

@jennmueng
Copy link
Member

If tests pass then this should be good

assert "Hello in UTF-8" in result1.matches[0].context
assert "Hello in Latin-1" in result2.matches[0].context

def test_read_file_with_invalid_encoding(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would love if we used production error examples here ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course haha

@jennmueng jennmueng merged commit a4d438a into main Nov 25, 2024
5 checks passed
@jennmueng jennmueng deleted the autofix/add-encoding-detection-to-file-reading-in-code-search/kZK9lq branch November 25, 2024 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants