Be more careful about file reading #4476

Merged
merged 2 commits into from
Feb 14, 2024

mwichmann
Collaborator

@mwichmann mwichmann commented Feb 8, 2024

When SCons reads a file in order to interpret its contents, the file's encoding matters. The File node class has a get_text_contents() method that makes a best effort at decoding bytes data, but other places in the code do not get their file contents through that method. Those callers should do their own careful decoding; instead, they just read the file as text and hope it's okay.

Move the decode-bytes portion out of File.get_text_contents() to SCons.Util.to_Text() so that everyone that needs this can call it. Add a couple of additional known BOM codes (after consulting Python's codecs module).
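
As a rough illustration of BOM-aware decoding in general, here is a minimal sketch. It is not the code from this PR: the name `decode_with_bom`, the table of BOMs, and the UTF-8 then Latin-1 fallback are assumptions made for the example.

```python
import codecs

# Hypothetical BOM table for illustration. Longer BOMs come first: the
# UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM,
# so checking UTF-16 first would misidentify UTF-32 LE data.
_BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes) -> str:
    """Best-effort decode of raw bytes (illustrative helper, not SCons API)."""
    # If the data starts with a known BOM, strip it and use that encoding.
    for bom, encoding in _BOM_TABLE:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding, errors="backslashreplace")
    # No BOM: assume UTF-8, then fall back to Latin-1, which accepts any byte.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")
```

The point of working on bytes is that the caller never depends on the platform's default text encoding.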

Note that while get_text_contents() acts on nodes, the new (moved) routine to_Text() acts on the bytes passed to it, so it can be used in a non-Node context as well. For example, the Java tool initializer reads a file and tries to decode it, and can get it wrong (see #3569); this change gives it some help.
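
The calling pattern for a non-Node context might look roughly like the sketch below: read the file in binary mode and hand the bytes to a decoder, rather than opening it in text mode. The helper `read_file_text` and its `decode` parameter are hypothetical; in SCons the decoder would presumably be the to_Text() routine described above, while the demo here substitutes Python's built-in "utf-16" codec as a stand-in.

```python
import codecs
import os
import tempfile


def read_file_text(path, decode):
    """Read `path` as raw bytes and decode with the supplied callable.

    Illustrative helper only; `decode` is any bytes -> str function.
    """
    with open(path, "rb") as f:  # binary mode: no implicit locale decoding
        return decode(f.read())


# Demo: a small file saved as UTF-16 LE with a BOM, as some editors do.
payload = codecs.BOM_UTF16_LE + "class Foo {}".encode("utf-16-le")
with tempfile.NamedTemporaryFile(delete=False, suffix=".java") as tmp:
    tmp.write(payload)
try:
    # Stand-in decoder: the "utf-16" codec reads and strips the BOM itself.
    text = read_file_text(tmp.name, decode=lambda raw: raw.decode("utf-16"))
finally:
    os.unlink(tmp.name)
```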

Fixes #3569
Fixes #4462

No docs impact: documented behavior does not change; the change just makes SCons less error-prone in certain situations.

Contributor Checklist:

  • I have created a new test or updated the unit tests to cover the new/changed functionality.
  • I have updated CHANGES.txt (and read the README.rst)
  • I have updated the appropriate documentation

@mwichmann mwichmann added the scanner and Java (Java tools and language support) labels Feb 8, 2024
@mwichmann
Collaborator Author

The Windows failure was the builder timing out after an hour.

@bdbaddog bdbaddog merged commit 759ed8c into SCons:master Feb 14, 2024
4 of 6 checks passed
@mwichmann mwichmann added this to the 4.7 milestone Feb 14, 2024
@mwichmann mwichmann deleted the rawfile-convert branch February 14, 2024 15:04
2 participants