Be more careful about file reading #4476

mwichmann · 2024-02-08T15:26:44Z

If SCons reads a file to interpret its contents, codecs are a concern. The File node class has a get_text_contents() method which makes a best effort at decoding bytes data, but other places don't get their file contents via that method. Those places should do their own careful decoding but don't: they just read as text and hope it's okay.

Move the decode-bytes portion out of File.get_text_contents() to SCons.Util.to_Text() so that everyone that needs this can call it. Add a couple of additional known BOM codes (after consulting Python's codecs module).

Note that while get_text_contents acts on nodes, the new (moved) routine to_Text acts on passed bytes, so it can be used in a non-Node context as well. For example, the Java tool initializer reads a file and tries to decode it, and can get it wrong (see #3569); this change gives it some help.
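For readers unfamiliar with the approach, here is a minimal sketch of BOM-aware, best-effort decoding of raw bytes. It is not the code in this PR: the name `decode_bytes_sketch`, the exact list of BOMs, and the fallback order (UTF-8, then latin-1) are assumptions chosen for illustration.

```python
import codecs

# Hypothetical helper for illustration only; not the SCons implementation.
# UTF-32 BOMs are listed before UTF-16 ones because BOM_UTF32_LE begins
# with the same two bytes as BOM_UTF16_LE.
_BOM_ENCODINGS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_bytes_sketch(data: bytes) -> str:
    """Best-effort decode of raw file bytes to str."""
    for bom, encoding in _BOM_ENCODINGS:
        if data.startswith(bom):
            # Strip the BOM explicitly so it never appears in the result.
            return data[len(bom):].decode(encoding, errors="backslashreplace")
    try:
        # No BOM seen: assume UTF-8 first (an assumption of this sketch).
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin-1")
```

A caller outside the Node machinery would open the file in binary mode (`open(path, "rb")`), read the bytes, and pass them to a helper like this instead of opening the file in text mode and relying on the default encoding.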

Fixes #3569
Fixes #4462

No docs impact: documented behavior does not change, just makes SCons less error-prone in certain situations.

Contributor Checklist:

I have created a new test or updated the unit tests to cover the new/changed functionality.
I have updated CHANGES.txt (and read the README.rst)
I have updated the appropriate documentation

If SCons reads a file to interpret its contents, codecs are a concern. The File node class has a get_text_contents() method which makes a best effort at decoding bytes data, but other places don't get their file contents via that method. Those places should do their own careful decoding but don't: they just read as text and hope it's okay.

Move the decode-bytes portion out of File.get_text_contents() to SCons.Util.to_Text() so that everyone that needs this can call it. Add a couple of additional known BOM codes (after consulting Python's codecs module).

Note that while get_text_contents acts on nodes, the new (moved) routine to_Text acts on passed bytes, so it can be used in a non-Node context as well. For example, the Java tool initializer reads a file and tries to decode it, and can get it wrong (see SCons#3569); this change gives it some help.

Fixes SCons#3569
Fixes SCons#4462

Signed-off-by: Mats Wichmann <[email protected]>

mwichmann · 2024-02-13T15:01:39Z

The Windows failure was the builder timing out after an hour.

mwichmann added the scanner and Java (Java tools and language support) labels Feb 8, 2024

mwichmann force-pushed the rawfile-convert branch from 5c6bb9a to 180c601 Compare February 8, 2024 16:30

Merge branch 'master' into rawfile-convert

3e60ee1

bdbaddog merged commit 759ed8c into SCons:master Feb 14, 2024
4 of 6 checks passed

mwichmann added this to the 4.7 milestone Feb 14, 2024

mwichmann deleted the rawfile-convert branch February 14, 2024 15:04
