🤖 Add Encoding Detection to File Reading in Code Search #1495

sentry-autofix · 2024-11-22T19:00:55Z

👋 Hi there! This PR was automatically generated by Autofix 🤖

This fix was triggered by Rohan Agarwal

This update makes file reading in the code search module encoding-aware, so files in different encodings are detected and handled instead of causing failures. The changes include:

- Dependency Addition: The chardet library is added as a dependency in the pyproject.toml file to support encoding detection.
- Default Encoding Parameter: A new default_encoding parameter has been added to the CodeSearch class constructor, allowing users to specify a fallback encoding when file reading fails.
- Encoding-Aware File Reading: A new method, _read_file_with_encoding, reads a file by first attempting to auto-detect its encoding and then falling back to common encodings if detection fails. This makes reading more robust across files with diverse encodings.
- Error Handling Improvements: The search_file method now logs encoding errors and continues instead of crashing.
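
For readers who want a concrete picture of the detect-then-fall-back approach described above, here is a minimal sketch. It is not the code from this PR: the standalone functions read_file_with_encoding and search_file, the FALLBACK_ENCODINGS tuple, and the logging calls are illustrative assumptions. Only chardet.detect is a real library call, and the sketch skips detection if chardet is not installed.

```python
import logging
from pathlib import Path

try:
    import chardet  # third-party dependency added by this PR
except ImportError:  # keep the sketch runnable without it
    chardet = None

logger = logging.getLogger(__name__)

# Assumed fallback order, not taken from the PR. latin-1 maps every byte
# to a character, so it always succeeds and acts as the last resort.
FALLBACK_ENCODINGS = ("utf-8", "latin-1")


def read_file_with_encoding(path, default_encoding="utf-8"):
    """Return the decoded text of `path`, or None if no candidate works."""
    raw = Path(path).read_bytes()

    candidates = []
    if chardet is not None:
        # chardet.detect returns a dict; "encoding" may be None.
        detected = chardet.detect(raw).get("encoding")
        if detected:
            candidates.append(detected)
    candidates.extend([default_encoding, *FALLBACK_ENCODINGS])

    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a codec name Python does not know: try the next.
            continue
    # Only reachable if FALLBACK_ENCODINGS is changed to omit latin-1.
    return None


def search_file(path, keyword, default_encoding="utf-8"):
    """Return lines of `path` containing `keyword`; log and return [] on errors."""
    try:
        text = read_file_with_encoding(path, default_encoding)
    except OSError as exc:
        logger.warning("Could not read %s: %s", path, exc)
        return []
    if text is None:
        logger.warning("Could not decode %s with any candidate encoding", path)
        return []
    return [line for line in text.splitlines() if keyword in line]
```

Trying the detected encoding first and the caller's default second keeps the detector's guess from being final: a wrong guess that fails to decode simply falls through to the next candidate.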

If you have any questions or feedback for the Sentry team about this fix, please email [email protected] with the Run ID: 1579.

roaga · 2024-11-22T19:01:18Z

Going to pull locally and double-check/tweak. Looks good!

trillville · 2024-11-25T18:54:58Z

requirements-constraints.txt

@@ -115,3 +115,4 @@ prophet==1.1.*
 rapidfuzz==3.10.*
 pytest-vcr==1.*
 vcrpy==6.*
+chardet


should this be pinned?

hmm let's pin to major+minor

jennmueng · 2024-11-25T18:58:18Z

If tests pass then this should be good

trillville · 2024-11-25T18:59:05Z

tests/automation/codebase/test_code_search.py

+        assert "Hello in UTF-8" in result1.matches[0].context
+        assert "Hello in Latin-1" in result2.matches[0].context
+
+    def test_read_file_with_invalid_encoding(self):


would love if we used production error examples here ;)

of course haha

sentry-autofix bot added 2 commits November 22, 2024 19:00

File change

f2620e2

File change

d9bd7c7

jennmueng self-assigned this Nov 22, 2024

update requirements and add test cases

74f953f

jennmueng marked this pull request as ready for review November 25, 2024 18:54

jennmueng requested a review from a team as a code owner November 25, 2024 18:54

jennmueng enabled auto-merge (squash) November 25, 2024 18:54

trillville reviewed Nov 25, 2024

pin chardet

c3484a5

trillville reviewed Nov 25, 2024

trillville approved these changes Nov 25, 2024

fix mypy

3d04e7c

jennmueng merged commit a4d438a into main Nov 25, 2024
5 checks passed

jennmueng deleted the autofix/add-encoding-detection-to-file-reading-in-code-search/kZK9lq branch November 25, 2024 22:18
