Fix Unicode encoding issues in Bazel's use of Starlark #24417

fmeum · 2024-11-20T17:06:42Z

Bazel internally uses String as a container for raw bytes assumed to be UTF-8, which differs from ordinary usage of String as a container for UTF-16 characters. This requires special implementations of certain Starlark functions that care about the notion of a "character":

{l,r,}strip must not strip non-ASCII whitespace, as such a value may be part of a UTF-8-encoded non-whitespace character.
json.decode has to emit UTF-8 bytes rather than UTF-16 characters.
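
A minimal sketch of both points, written in Python for brevity rather than taken from Bazel's Java sources. It assumes the internal form stores each UTF-8 byte as one character in the range U+0000..U+00FF; the helper names `to_internal`, `from_internal`, `ascii_strip` and `json_decode_internal` are hypothetical and exist only for this illustration.

```python
import json

# Assumption for this sketch: the internal form stores each UTF-8 byte as one
# character in the range U+0000..U+00FF (a Latin-1 reinterpretation).

ASCII_WHITESPACE = " \t\n\r\x0b\x0c"


def to_internal(text: str) -> str:
    """Encode text as UTF-8 and keep every byte as its own character."""
    return text.encode("utf-8").decode("latin-1")


def from_internal(raw: str) -> str:
    """Reinterpret the byte-per-character form as UTF-8 (raises if invalid)."""
    return raw.encode("latin-1").decode("utf-8")


def ascii_strip(raw: str) -> str:
    """Strip only ASCII whitespace, so bytes >= 0x80 are never removed."""
    return raw.strip(ASCII_WHITESPACE)


def json_decode_internal(doc: str) -> str:
    """Parse JSON given in the internal form and return the string in that form.

    Only handles a top-level JSON string, which is enough for the illustration.
    """
    value = json.loads(from_internal(doc))
    if not isinstance(value, str):
        raise TypeError("this sketch only handles a top-level JSON string")
    return to_internal(value)


if __name__ == "__main__":
    raw = to_internal("à")            # UTF-8 bytes C3 A0
    print(repr(raw.strip()))          # Unicode-aware strip drops the A0 byte
    print(repr(ascii_strip(raw)))     # ASCII-only strip keeps both bytes
    print(repr(json_decode_internal(to_internal('"\\u00e9"'))))
```

In this sketch, "à" becomes the two characters U+00C3 and U+00A0. A Unicode-aware strip treats U+00A0 as a no-break space and removes it, leaving a lone C3 byte that is no longer valid UTF-8. Likewise, a decoder that turns the escape `\u00e9` into the single character U+00E9 produces a lone E9 byte in the byte-per-character view, whereas re-encoding the decoded value yields the expected C3 A9 pair.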

Compatibility is verified by running all script-based tests twice: once with the scripts parsed as UTF-8 and once using Bazel's internal encoding.

fmeum added 3 commits November 20, 2024 18:05

Fix Unicode encoding issues in Bazel's use of Starlark

d2bb6c3

Fix JSON tests

1f2bb58

Enable in Bazel

67867c9

fmeum force-pushed the 23859-unicode-starlark branch from 8a1ba67 to 67867c9 Compare November 21, 2024 21:30

Fix test

78b8b63

fmeum force-pushed the 23859-unicode-starlark branch from 69925e5 to 78b8b63 Compare November 25, 2024 15:39

Drop Encoder argument

0435c09

fmeum force-pushed the 23859-unicode-starlark branch from c62a5eb to 0435c09 Compare November 25, 2024 15:49

fmeum marked this pull request as ready for review November 25, 2024 15:49

fmeum requested review from brandjon and tetromino as code owners November 25, 2024 15:49

fmeum requested review from tjgq and removed request for tetromino and brandjon November 25, 2024 15:49

github-actions bot added the awaiting-review label (PR is awaiting review from an assigned reviewer) Nov 25, 2024

fmeum mentioned this pull request Nov 25, 2024

Avoid char array allocation in Starlark format #23763

Open

tjgq requested a review from tetromino November 25, 2024 21:33

iancha1992 added the team-Starlark-Integration label (Issues involving Bazel's integration with Starlark, excluding builtin symbols) Nov 25, 2024
