-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🤖 Add Encoding Detection to File Reading in Code Search #1495
🤖 Add Encoding Detection to File Reading in Code Search #1495
Conversation
Going to pull locally and double-check/tweak. Looks good! |
requirements-constraints.txt
Outdated
@@ -115,3 +115,4 @@ prophet==1.1.* | |||
rapidfuzz==3.10.* | |||
pytest-vcr==1.* | |||
vcrpy==6.* | |||
chardet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be pinned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm let's pin to major+minor
If tests pass then this should be good |
assert "Hello in UTF-8" in result1.matches[0].context | ||
assert "Hello in Latin-1" in result2.matches[0].context | ||
|
||
def test_read_file_with_invalid_encoding(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would love if we used production error examples here ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course haha
👋 Hi there! This PR was automatically generated by Autofix 🤖
This fix was triggered by Rohan Agarwal
Fixes SEER-9E
This update introduces enhancements to the file reading functionality in the code search module, allowing it to detect and handle various file encodings gracefully. The changes include:
Dependency Addition: The
chardet
library is now added as a dependency in thepyproject.toml
file to facilitate encoding detection.Default Encoding Parameter: A new parameter
default_encoding
has been added to theCodeSearch
class constructor, allowing users to specify a fallback encoding when file reading fails.Encoding-Aware File Reading: The method
_read_file_with_encoding
has been introduced to read files intelligently by first attempting to auto-detect the encoding and then falling back on common encodings if detection fails. This method improves robustness when dealing with diverse file types.Error Handling Improvements: Enhanced error handling within the
search_file
method to gracefully log issues and avoid crashes during encoding errors, providing a smoother experience when handling files.If you have any questions or feedback for the Sentry team about this fix, please email [email protected] with the Run ID: 1579.