Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Unicode encoding issues in Bazel's use of Starlark #24417

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Nov 20, 2024

Bazel internally uses String as a container for raw bytes assumed to be UTF-8, which differs from ordinary usage of String as a container for UTF-16 characters. This requires special implementations of certain Starlark functions that care about the notion of a "character":

  • {l,r,}split must not strip non-ASCII whitespace as those may be part of a UTF-8-encoded non-whitespace character.
  • json.decode has to emit UTF-8 bytes rather than UTF-16 characters.

Compatibility is verified by running all script-based tests both parsed as UTF-8 and using Bazel's internal encoding.

@fmeum fmeum marked this pull request as ready for review November 25, 2024 15:49
@fmeum fmeum requested review from tjgq and removed request for tetromino and brandjon November 25, 2024 15:49
@github-actions github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Nov 25, 2024
@tjgq tjgq requested a review from tetromino November 25, 2024 21:33
@iancha1992 iancha1992 added the team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols label Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Starlark-Integration Issues involving Bazel's integration with Starlark, excluding builtin symbols
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants