Added BOM sniffing check to file load #565
We have run into the same issue reading UTF-8 files with a BOM in our product too. It seems like this PR would fix it in a decent way. Is there any reason it hasn't been merged yet? If the maintainers would prefer to handle this issue another way, I could look into doing that work.
I'm not actively maintaining here atm, but when the PR was raised the project had no active maintainers (it does now since the move to [...]). The author probably just needs to rebase the work, and so long as there aren't any concerns in review, it'll likely get merged 👍
DecodeReaderBytesBuilder::new()
    // On Windows, we need to check for added Byte Order Mark (BOM). To
    // do this, we don't specify an encoding, and enable BOM sniffing.
    .encoding(None)
    .bom_sniffing(true)
    .build(std::fs::File::open(filename)?)
    .read_to_string(&mut text)?;
If I'm reading the docs correctly, this does more than BOM sniffing / stripping: it also handles other encodings. That should be a separate question, discussed in an issue first. It also feels like a heavyweight dependency just for handling a BOM.
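For comparison, here is a minimal sketch of stripping only a UTF-8 BOM using the standard library, with no added dependency. The names `strip_utf8_bom` and `read_config_text` are hypothetical and not part of this PR or the project. The sketch assumes the file is already valid UTF-8, and it does not handle UTF-16 BOMs or other encodings.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// U+FEFF, which is encoded in UTF-8 as the bytes EF BB BF.
const UTF8_BOM: &str = "\u{feff}";

/// Returns `text` without a leading BOM character, if one is present.
/// (Hypothetical helper, not taken from the PR.)
fn strip_utf8_bom(text: &str) -> &str {
    text.strip_prefix(UTF8_BOM).unwrap_or(text)
}

/// Reads a file as UTF-8 and drops a leading BOM before returning the text.
/// (Hypothetical helper; assumes the file is valid UTF-8.)
fn read_config_text(path: &Path) -> io::Result<String> {
    let text = fs::read_to_string(path)?;
    Ok(strip_utf8_bom(&text).to_owned())
}
```

Only the first character is inspected, so a U+FEFF that appears later in the file is left untouched.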
    // On Windows, we need to check for added Byte Order Mark (BOM). To
    // do this, we don't specify an encoding, and enable BOM sniffing.
    .encoding(None)
    .bom_sniffing(true)
FYI this should have tests. We ask that the PR be split into two commits:
- Add the tests showing the bad behavior (i.e. they pass against the current behavior)
- Change the behavior and update the tests to reflect that change
In this case, I would want the first commit to have a test that parses content with a BOM and shows the failure that happens. The follow-up commit should then strip the BOM and update the test so it no longer errors.
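A rough sketch of what that two-step test could pin down. `parse_key` is a hypothetical stand-in for the real parser (it is not the project's API), and `parse_key_skipping_bom` is an assumed shape for the fix. The first commit would assert the error on BOM-prefixed input; the second would switch to the BOM-skipping path and assert success.

```rust
/// Hypothetical stand-in for the real parser: returns the key of a
/// `key = value` line and rejects keys that do not start with an
/// ASCII letter or underscore.
fn parse_key(line: &str) -> Result<&str, String> {
    let (key, _value) = line
        .split_once('=')
        .ok_or_else(|| format!("missing '=' in {line:?}"))?;
    // `trim` only removes whitespace; U+FEFF is not whitespace, so a
    // leading BOM stays attached to the key.
    let key = key.trim();
    match key.chars().next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => Ok(key),
        _ => Err(format!("invalid key {key:?}")),
    }
}

/// Assumed shape of the fix: drop a leading U+FEFF, then parse as before.
fn parse_key_skipping_bom(line: &str) -> Result<&str, String> {
    parse_key(line.strip_prefix('\u{feff}').unwrap_or(line))
}
```

In the first commit the test would assert `parse_key("\u{feff}name = 1").is_err()`. In the second, the same input goes through `parse_key_skipping_bom` and the assertion becomes an equality check against `Ok("name")`.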
On Windows, config files can fail to load because of a byte order mark (BOM) that may appear at the start of the file. I don't understand this fully, but we would run into an issue from it below:
A file would load just fine on Mac/Linux but fail with this error on Windows. This fix detects the zero-width mark and skips it.