This regex includes the character class [^a-zA-Z0-9_\\x7f-\\xff] (with backslashes escaped for JSON). The \x7f-\xff range in this negated class triggers an Oniguruma bug, which in turn causes a bug in the Hack grammar. Although the range looks as if it excludes code points U+7F to U+FF, it actually excludes U+7F to U+10FFFF. The reason is described in this comment.
This can be fixed by simply changing the \xff to \x{ff}, which changes Oniguruma from interpreting it as an invalid standalone encoded byte value (true for unenclosed \xHH for values above 7F) to a code point value (always true for the enclosed form \x{...}). Note that this handling is specific to Oniguruma, not other regex flavors.
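To make the difference concrete, here is a small sketch in Python, whose `re` module always treats `\xHH` as a code point. It does not run Oniguruma; the second pattern is only a model of the behavior described above, with the upper end of the range written out as U+10FFFF. The names `intended`, `observed_model`, and `matches` are made up for this illustration.

```python
import re

# What the class appears to mean: exclude only U+007F..U+00FF.
intended = re.compile(r"[^a-zA-Z0-9_\x7f-\xff]")

# Model of the behavior described in this issue (not Oniguruma itself):
# the range effectively extends to U+10FFFF.
observed_model = re.compile(r"[^a-zA-Z0-9_\x7f-\U0010ffff]")


def matches(pattern: re.Pattern, ch: str) -> bool:
    """Return True if the single character ch is matched by the class."""
    return pattern.fullmatch(ch) is not None


for ch in ["a", " ", "\u00e9", "\u0100", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", matches(intended, ch), matches(observed_model, ch))
```

The two patterns agree up to U+FF and disagree for everything above it, for example U+0100 and U+1F600, which is where the edge case bugs come from.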
Also note that this is the only place in the grammar that \\x7f-\\xff appears, but the correct version \\x{7f}-\\x{ff} appears 28 times.
To fix this, the \\xff should be replaced with \\x{ff}. It will then work correctly in Oniguruma. Replacing the \\x7f with \\x{7f} as well is optional.
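If it helps to check for other occurrences, the replacement can be sketched as a small Python helper. This is only an illustration, not part of the grammar's tooling: `enclose_hex_escapes` and its `enclose_all` flag are hypothetical names, and the helper expects the regex after JSON decoding (single backslashes). It walks the escapes in order so that an escaped backslash followed by a literal `xff` is left alone.

```python
import re

# Consume one escape sequence at a time: either \xHH (two hex digits,
# captured) or a backslash followed by any other single character.
_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|.)", re.DOTALL)


def enclose_hex_escapes(pattern: str, enclose_all: bool = False) -> str:
    """Rewrite unenclosed \\xHH escapes above 7F to the enclosed \\x{HH} form.

    With enclose_all=True, escapes at or below 7F are rewritten as well.
    """

    def fix(m: re.Match) -> str:
        hex_digits = m.group(1)
        if hex_digits is not None and (enclose_all or int(hex_digits, 16) > 0x7F):
            return "\\x{" + hex_digits + "}"
        return m.group(0)  # leave every other escape untouched

    return _ESCAPE.sub(fix, pattern)


print(enclose_hex_escapes(r"[^a-zA-Z0-9_\x7f-\xff]"))
print(enclose_hex_escapes(r"[^a-zA-Z0-9_\x7f-\xff]", enclose_all=True))
```

The default only touches values above 7F, which is the minimal change; passing `enclose_all=True` produces the `\x{7f}-\x{ff}` form already used elsewhere in the grammar.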
In addition to causing edge case bugs, this issue is also preventing the Hack grammar from running in Shiki when using its JS engine (which transpiles Oniguruma regexes to JS using Oniguruma-To-ES). Oniguruma-To-ES intentionally doesn't reproduce Oniguruma's bugs related to handling of unenclosed \xF5 through \xFF, and instead throws for them (as Oniguruma does for \x80 through \xF4).