Added BOM sniffing check to file load #565

Open: wants to merge 1 commit into main


AngelOnFira

On Windows, config files can fail to load because of a byte order mark (BOM) that may appear at the start of the file. I don't fully understand the cause, but it produces the error shown below:

[screenshot of the error]

The same file loads fine on macOS/Linux but shows this error on Windows. This fix detects the zero-width mark and skips it.
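For context, a UTF-8 BOM is the three bytes `EF BB BF`, which decode to the zero-width character U+FEFF. Rust's UTF-8 decoding keeps it as the first `char`, so a format parser presumably sees an unexpected character before the first key. A minimal std-only sketch of detecting and skipping it at the byte level; `UTF8_BOM` and `skip_utf8_bom` are hypothetical names, not part of this PR:

```rust
/// The UTF-8 encoding of the byte order mark, U+FEFF.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Returns `bytes` with a leading UTF-8 BOM removed, if one is present.
/// Input without a BOM is returned unchanged.
fn skip_utf8_bom(bytes: &[u8]) -> &[u8] {
    bytes.strip_prefix(&UTF8_BOM).unwrap_or(bytes)
}
```

Note that `std::str::from_utf8` accepts BOM-prefixed input as valid UTF-8; it just keeps U+FEFF as the first character, which is why the mark has to be skipped explicitly.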

@ldemidov

We have run into the same issue reading UTF-8 files with a BOM in our product.

This PR seems like a decent fix. Is there any reason it hasn't been merged yet? If the maintainers would prefer to handle this issue another way, I could look into doing that work.

@polarathene
Collaborator

I'm not actively maintaining here at the moment, but when the PR was raised the project had no active maintainers (it does now, since the move to the rust-cli org).

The author probably just needs to rebase the work and so long as there aren't any concerns in review, it'll likely get merged 👍

Comment on lines +121 to +127
DecodeReaderBytesBuilder::new()
// On Windows, we need to check for added Byte Order Mark (BOM). To
// do this, we don't specify an encoding, and enable BOM sniffing.
.encoding(None)
.bom_sniffing(true)
.build(std::fs::File::open(filename)?)
.read_to_string(&mut text)?;
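In general, BOM sniffing means inspecting the first bytes for any known BOM (UTF-8, UTF-16LE, UTF-16BE) and decoding the rest accordingly, which is why it implies more than stripping a UTF-8 mark. The std-only sketch below is a simplified illustration of that idea, not `encoding_rs_io`'s implementation; `BomKind`, `sniff_bom`, and `decode_utf16le` are hypothetical:

```rust
/// Encodings a BOM can identify (hypothetical, simplified; UTF-32 is ignored).
#[derive(Debug, PartialEq, Eq)]
enum BomKind {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Inspects the leading bytes for a BOM. Returns the detected encoding and
/// the number of BOM bytes to skip, or `None` if there is no BOM.
fn sniff_bom(bytes: &[u8]) -> Option<(BomKind, usize)> {
    match bytes {
        [0xEF, 0xBB, 0xBF, ..] => Some((BomKind::Utf8, 3)),
        [0xFF, 0xFE, ..] => Some((BomKind::Utf16Le, 2)),
        [0xFE, 0xFF, ..] => Some((BomKind::Utf16Be, 2)),
        _ => None,
    }
}

/// Decodes UTF-16LE bytes (BOM already skipped) into a `String`.
/// This transcoding step is what goes beyond simply removing a BOM.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None; // odd length cannot be valid UTF-16
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```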
Contributor

If I'm reading the docs correctly, this does more than BOM sniffing/stripping: it also handles other encodings. That should be a separate question, discussed in an issue first. This also feels like a heavyweight dependency just for handling a BOM.
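A lighter alternative along the lines of this comment could use only the standard library: read the file as UTF-8 and drop a leading U+FEFF. This is a sketch under that assumption; `read_config_text` is a hypothetical helper, not config-rs API, and unlike sniffing it does not handle UTF-16 files:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads a config file as UTF-8 and removes a leading BOM (U+FEFF) if present.
/// Non-UTF-8 input (e.g. UTF-16) makes `fs::read_to_string` return an error.
fn read_config_text(path: &Path) -> io::Result<String> {
    let mut text = fs::read_to_string(path)?;
    if text.starts_with('\u{feff}') {
        // U+FEFF is 3 bytes in UTF-8, so this removes exactly the BOM.
        text.replace_range(..'\u{feff}'.len_utf8(), "");
    }
    Ok(text)
}
```

The trade-off is scope: this fixes the UTF-8-with-BOM case reported here without pulling in an encoding crate, and leaves any wider encoding support to the separate discussion suggested above.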

// On Windows, we need to check for added Byte Order Mark (BOM). To
// do this, we don't specify an encoding, and enable BOM sniffing.
.encoding(None)
.bom_sniffing(true)
Contributor

FYI, this should have tests. We ask that the PR be split into two commits:

  1. Add tests showing the bad behavior (i.e., they pass)
  2. Change the behavior and update the tests to reflect that change

In this case, I would want the first commit to have a test that parses content with a BOM and shows the failure that happens. The follow-up commit should then strip the BOM and update the test so it no longer errors.
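A sketch of that two-commit test pattern, assuming a hypothetical `parse_pair` function as a stand-in for the real config format parsers (config-rs's actual loader and test layout are not shown here). Both states sit side by side for illustration; in the PR itself, the second commit would change the loader and flip the first test in place:

```rust
/// Hypothetical stand-in parser: accepts `key = value` where the key is
/// ASCII alphanumeric or `_`. U+FEFF is not whitespace, so `trim` keeps it
/// and a leading BOM makes the key invalid, mimicking the load failure.
fn parse_pair(text: &str) -> Result<(String, String), String> {
    let (key, value) = text.split_once('=').ok_or("missing '='")?;
    let key = key.trim();
    if key.is_empty() || !key.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return Err(format!("invalid key {key:?}"));
    }
    Ok((key.to_string(), value.trim().to_string()))
}

/// Commit 2's behavior: strip a leading BOM before parsing.
fn parse_pair_bom_aware(text: &str) -> Result<(String, String), String> {
    parse_pair(text.strip_prefix('\u{feff}').unwrap_or(text))
}

#[cfg(test)]
mod tests {
    use super::*;

    const WITH_BOM: &str = "\u{feff}name = demo";

    // Commit 1: documents today's (bad) behavior; this test passes.
    #[test]
    fn bom_input_currently_fails() {
        assert!(parse_pair(WITH_BOM).is_err());
    }

    // Commit 2: once the BOM is stripped, the expectation is flipped.
    #[test]
    fn bom_input_parses_after_fix() {
        assert_eq!(
            parse_pair_bom_aware(WITH_BOM),
            Ok(("name".to_string(), "demo".to_string()))
        );
    }
}
```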

Labels
None yet
Projects
None yet

4 participants