Added BOM sniffing check to file load #565

AngelOnFira · 2024-05-25T03:34:29Z

On Windows, config files can fail to load because of a byte-order mark (BOM) that may appear at the start of the file. I don't fully understand the cause, but in our case a file that loads fine on macOS/Linux produces an error on Windows.

This fix detects this zero-width mark and skips it.
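As a rough illustration of what "BOM sniffing" means, the sketch below inspects the first bytes of a file for the common byte-order marks and reports how many bytes to skip. The `Bom` enum and `sniff_bom` function are hypothetical names for illustration; this is neither the PR's code nor the `encoding_rs_io` implementation.

```rust
/// Byte-order marks this sketch recognizes (hypothetical enum for illustration).
#[derive(Debug, PartialEq)]
pub enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Inspect the first bytes of `data` and report which BOM (if any) is present,
/// together with its length in bytes so the caller can skip past it.
pub fn sniff_bom(data: &[u8]) -> Option<(Bom, usize)> {
    if data.starts_with(&[0xEF, 0xBB, 0xBF]) {
        // U+FEFF encoded as UTF-8.
        Some((Bom::Utf8, 3))
    } else if data.starts_with(&[0xFF, 0xFE]) {
        // U+FEFF encoded as UTF-16, little-endian.
        Some((Bom::Utf16Le, 2))
    } else if data.starts_with(&[0xFE, 0xFF]) {
        // U+FEFF encoded as UTF-16, big-endian.
        Some((Bom::Utf16Be, 2))
    } else {
        None
    }
}
```

The UTF-8 BOM is the encoding of U+FEFF (ZERO WIDTH NO-BREAK SPACE), which is why it is invisible in most editors. For the UTF-8 case, the remaining bytes `&data[3..]` can be passed straight to `std::str::from_utf8`; the UTF-16 cases would additionally need transcoding.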

ldemidov · 2025-02-20T20:59:45Z

We have run into the same issue reading UTF-8 files with a BOM in our product.

This PR seems like a decent fix. Is there any reason it hasn't been merged yet? If the maintainers would prefer another way to handle this issue, I could look into doing that work.

polarathene · 2025-02-20T21:29:38Z

I'm not actively maintaining here at the moment, but when this PR was raised the project had no active maintainers (it does now, since the move to the rust-cli org).

The author probably just needs to rebase the work and so long as there aren't any concerns in review, it'll likely get merged 👍

epage · 2025-02-20T22:33:28Z

src/file/source/file.rs

+        DecodeReaderBytesBuilder::new()
+            // On Windows, we need to check for added Byte Order Mark (BOM). To
+            // do this, we don't specify an encoding, and enable BOM sniffing.
+            .encoding(None)
+            .bom_sniffing(true)
+            .build(std::fs::File::open(filename)?)
+            .read_to_string(&mut text)?;


If I'm reading the docs correctly, this does more than sniff and strip a BOM: it also handles other encodings. That should be a separate question, discussed in an issue first. It also feels like a heavyweight dependency just for handling a BOM.
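For comparison, a std-only sketch of the lighter alternative, assuming only a leading UTF-8 BOM needs handling and that other encodings stay out of scope. The names `strip_utf8_bom` and `read_config_text` are hypothetical, not existing functions in the crate:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// U+FEFF, which UTF-8 encodes as the bytes EF BB BF when used as a BOM.
const UTF8_BOM: char = '\u{feff}';

/// Remove a single leading BOM, if present; otherwise return the input unchanged.
pub fn strip_utf8_bom(text: &str) -> &str {
    text.strip_prefix(UTF8_BOM).unwrap_or(text)
}

/// Read a file as UTF-8 and drop a leading BOM in place.
/// Non-UTF-8 input still fails in `read_to_string` with `ErrorKind::InvalidData`.
pub fn read_config_text(path: &Path) -> io::Result<String> {
    let mut text = fs::read_to_string(path)?;
    if text.starts_with(UTF8_BOM) {
        // Remove the BOM's bytes (3 in UTF-8) without reallocating.
        text.replace_range(..UTF8_BOM.len_utf8(), "");
    }
    Ok(text)
}
```

This keeps behavior identical for files without a BOM and strips only one mark, so any further U+FEFF characters are left for the parser to report.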

epage · 2025-02-20T22:35:02Z

src/file/source/file.rs

+            // On Windows, we need to check for added Byte Order Mark (BOM). To
+            // do this, we don't specify an encoding, and enable BOM sniffing.
+            .encoding(None)
+            .bom_sniffing(true)


FYI, this should have tests. We ask that the PR be split into two commits:

1. Add tests showing the bad behavior (i.e., the tests pass against the current behavior).

2. Change the behavior and update the tests to reflect that change.

In this case, I'd want the first commit to add a test that parses content with a BOM and shows the resulting failure. The follow-up commit should then strip the BOM and update the test so it no longer errors.
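A minimal sketch of that two-commit test shape, using a hypothetical toy `key = value` parser in place of the real config formats (`parse_kv` and `parse_kv_bom_aware` are illustrative names, not project code). Both tests are shown side by side here; in the actual PR the second commit would update the first test rather than add a new one.

```rust
/// Hypothetical stand-in for a config parser: accepts one `key = value` line and
/// rejects keys containing anything other than ASCII alphanumerics or `_`.
pub fn parse_kv(text: &str) -> Result<(String, String), String> {
    let (key, value) = text
        .split_once('=')
        .ok_or_else(|| "expected `key = value`".to_string())?;
    // `trim` removes Unicode whitespace, and U+FEFF is not whitespace,
    // so a leading BOM survives here and ends up inside the key.
    let key = key.trim();
    if key.is_empty() || !key.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return Err(format!("invalid key {key:?}"));
    }
    Ok((key.to_string(), value.trim().to_string()))
}

/// Hypothetical fix for the second commit: drop one leading BOM, then parse.
pub fn parse_kv_bom_aware(text: &str) -> Result<(String, String), String> {
    parse_kv(text.strip_prefix('\u{feff}').unwrap_or(text))
}

#[cfg(test)]
mod tests {
    use super::*;

    const WITH_BOM: &str = "\u{feff}key = 1";

    // Commit 1: capture the current behavior. This test passes because the
    // BOM makes parsing fail.
    #[test]
    fn bom_prefixed_input_errors() {
        assert!(parse_kv(WITH_BOM).is_err());
    }

    // Commit 2: after the fix, the same input parses successfully.
    #[test]
    fn bom_prefixed_input_parses_after_fix() {
        assert_eq!(
            parse_kv_bom_aware(WITH_BOM),
            Ok(("key".to_string(), "1".to_string()))
        );
    }
}
```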

Added BOM sniffing check to file load

04e6135

epage reviewed Feb 20, 2025
